What is - , 07-11-2023

What Are 'Association Rules' for Social Scientists? A Data Mining Discussion

Speaker(s): Wendy Olsen

Bio: Wendy Olsen is a Professor of Socio-Economics at the University of Manchester, in the department of Social Statistics. She researches in development studies, sociology, and socio-economics. Her current focus is on explaining the hours that people work and the diversity of wage rates. Her research thus covers employment, informal work, gender, and labour markets in India, South Asia, and the UK. She has also created new designs for 'mixed methods' research that include quantitative methods, in her book Systematic Mixed Methods Research (Palgrave, 2022). Key publications also include Data Collection (Sage, 2012), Rural Indian Social Relations (Oxford, 1996), and Realist Methodology (ed., 4 volumes, Sage, 2010). Her key theoretical frames include gender and development theory, institutional change, moral reasoning in relation to the use of quantitative data, and statistical methodology. Her profile is at: https://research.manchester.ac.uk/en/persons/wendy-kay-olsen

Abstract:

Association rules are used in market research to guide marketing strategy toward better sales and revenue. The methods of 'association rules' discernment can be used to work out which factors predict (or act as precursors of) other outcomes. These methods are occasionally used in social science. For example, one study used association rules to find correlations between social group memberships and occupations, and another application would be to predict occupational moves from a person's existing occupation and spending pattern. In this workshop, I explore 'Consumer Expenditure Surveys' as a social-science data source and explain what 'association rules' are. When applying association rules to variables, I use a hypothesis-testing approach. It is thus a supervised, social data analytics application. (By contrast, typical data-science methods of data mining are unsupervised. I explain all this terminology.) I show results for two settings: 1) Indian consumer data and the occupational groups for 2014/5, with adoption of mobile phones as a key indicator on the consumer-spending side; and 2) Indian spending on a range of technical goods, which have very low cross-correlations. In the first setting, association rules are useful but regression is a close alternative. In the second setting, I develop advice for data cleaning to discover patterns in a large-data context, without assuming unsupervised data mining. The workshop is open to all; no prerequisite knowledge is required.
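
For readers new to the term, the following is a minimal sketch of how a single association rule such as "clerical occupation => owns a mobile phone" is usually scored, using the standard support, confidence, and lift measures. It is not taken from the workshop materials: the household records, the attribute names, and the helper functions `support` and `rule_metrics` are all made up for illustration. The rule is stated in advance and then checked, in the spirit of the hypothesis-testing approach described above, rather than mined from the data.

```python
# Toy, made-up household records (NOT from any survey). Each set lists
# the attributes observed for one hypothetical household.
households = [
    {"farm_worker", "owns_mobile"},
    {"farm_worker"},
    {"clerical", "owns_mobile"},
    {"clerical", "owns_mobile"},
    {"farm_worker", "owns_mobile"},
    {"clerical"},
    {"clerical", "owns_mobile"},
    {"farm_worker"},
]


def support(itemset, records):
    """Share of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Score the rule `antecedent => consequent` on the given records."""
    s_a = support(antecedent, records)
    s_c = support(consequent, records)
    s_both = support(set(antecedent) | set(consequent), records)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = s_both / s_a   # P(consequent | antecedent)
    lift = confidence / s_c     # > 1 means a positive association
    return {"support": s_both, "confidence": confidence, "lift": lift}


# The rule is chosen beforehand (a hypothesis), then evaluated.
metrics = rule_metrics({"clerical"}, {"owns_mobile"}, households)
print(metrics)
```

A lift above 1 indicates that the consequent is more common among households matching the antecedent than in the sample as a whole; a lift near 1 indicates little association, which is the situation the abstract describes for goods with very low cross-correlations.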