Linking cohort study data to administrative records: the challenges of consent and coverage
Principal Investigator: Tarek Mostafa, Institute of Education, University of London
Co-Investigators: Lucinda Platt and John Micklewright (Institute of Education)
Project duration: 1 April 2013 - 30 September 2014
The information present in sample surveys can often be enriched by linking the data for each person in the survey to administrative registers that record individuals’ educational histories, their use of hospitals or other health services, and their national insurance contributions, tax payments and receipt of state benefits. Linking survey data to administrative records in this way is increasingly common.
The additional data obtained through data linkage provide a wealth of information at a limited extra cost. However, data linkage faces three problems. First, survey respondents may refuse permission for their administrative records to be accessed and linked to the survey data, something known as ‘non-consent’. Second, it may be impossible to link their records even when consent is given, e.g. a person may not be found in the administrative register, a phenomenon known as ‘non-coverage’. Third, even when records are successfully linked, some of the variables in the administrative data may contain ‘missing values’, i.e. no information. These three sources of missing data may generate biases in analyses of data formed by linking surveys with administrative records. This project aims to examine patterns of non-consent and non-coverage in more detail than existing research has done and to explore weighting and imputation techniques used to adjust for biases from all three sources. The research deals with an under-researched area of survey methodology that has become more important due to increased efforts to link survey and administrative data.
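As a rough illustration of how the three sources of missing data nest within one another, the following Python sketch classifies each respondent by the first stage at which their linked data are lost. The field names (`consented`, `admin_record`, `ks1_score`, `school_id`) and the toy respondents are hypothetical and are not taken from the MCS or any administrative register.

```python
from collections import Counter

def linkage_status(person):
    """Classify one respondent by the first reason their linked data are missing."""
    if not person["consented"]:
        return "non-consent"       # permission to link was refused
    if person["admin_record"] is None:
        return "non-coverage"      # consent given, but no record was found
    if any(value is None for value in person["admin_record"].values()):
        return "missing values"    # linked, but some variables are empty
    return "complete"

# Hypothetical toy respondents, for illustration only.
respondents = [
    {"consented": False, "admin_record": None},
    {"consented": True, "admin_record": None},
    {"consented": True, "admin_record": {"ks1_score": None, "school_id": "A1"}},
    {"consented": True, "admin_record": {"ks1_score": 15, "school_id": "B2"}},
]

status_counts = Counter(linkage_status(p) for p in respondents)
print(status_counts)
```

Each stage only applies to respondents who passed the previous one, which is why the checks are ordered as consent, then coverage, then item-level missingness.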
The project investigators use data from the Millennium Cohort Study (MCS), which follows the lives of around 19,000 children born in the UK in 2000-01. The MCS attempts to augment its data on the study children, their siblings, and their parents by linking to various sources of administrative data on health, education, and taxes and benefits. The findings from our research have lessons for other surveys which are linked to administrative records.
The project investigators analyse patterns of non-consent and non-coverage in different ‘domains’ (education, health, and the labour market), for different persons for whom consent was sought (the study children, siblings, parents), and over time (consent to link has been sought at different times during the study children’s lives). The investigators expect that consent and coverage are related to the characteristics of the study children, their families, and possibly also of the interviewers who ask for consent. Finding such relations means that the missing data resulting from non-consent and non-coverage will cause biases in analyses of the linked data.
The project investigators then construct ‘weights’ that re-adjust the composition of the reduced sample of individuals for whom data can be linked, so as to allow for such biases. They also compare the use of these weights with methods that instead ‘impute’ (fill in) the missing information.
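One common way to build such weights is a weighting-class (inverse-probability) adjustment: each successfully linked case is weighted by the inverse of the linkage rate in its cell. The Python sketch below illustrates that general idea only; it is not the project's own method, and the `tenure` variable, the `linkage_weights` helper, and the toy sample are hypothetical.

```python
from collections import defaultdict

def linkage_weights(records, cell_key, linked_key):
    """Weight each linked record by 1 / (linkage rate in its cell); unlinked records get 0."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for r in records:
        totals[r[cell_key]] += 1
        if r[linked_key]:
            linked[r[cell_key]] += 1
    return [
        totals[r[cell_key]] / linked[r[cell_key]] if r[linked_key] else 0.0
        for r in records
    ]

def share_renters(records, weights):
    """Weighted share of renters among records with a positive weight."""
    renters = sum(w for r, w in zip(records, weights) if r["tenure"] == "renter")
    return renters / sum(weights)

# Hypothetical toy sample: renters are linked less often than owners.
sample = (
    [{"tenure": "owner", "linked": True}] * 8
    + [{"tenure": "owner", "linked": False}] * 2
    + [{"tenure": "renter", "linked": True}] * 5
    + [{"tenure": "renter", "linked": False}] * 5
)

unweighted = [1.0 if r["linked"] else 0.0 for r in sample]
weights = linkage_weights(sample, "tenure", "linked")

print(share_renters(sample, unweighted))  # renters under-represented in the linked sample
print(share_renters(sample, weights))     # weights restore the full-sample share
```

In this toy sample renters make up half of all respondents but only 5 of the 13 linked cases; the weights bring their share among linked cases back to one half. In practice the linkage probabilities would more plausibly be estimated from a model using several respondent and interviewer characteristics rather than from a single cell variable.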