Combining Data from Multiple Administrative and Survey Sources for Statistical Purposes (Swansea)
Date:
07/11/2017 - 08/11/2017
Organised by:
University of Southampton/ADRC-E
Presenter:
Prof Li-Chun Zhang
Level:
Intermediate (some prior knowledge)
Contact:
Map:
View in Google Maps (SA2 8PP)
Venue:
Data Science Building, DS05 (Floor 1), Swansea University Medical School, Singleton Park, Swansea
Description:
Course number: ADRCE-training040 Zhang
Course places are limited and registration by 31st October 2017 is strongly recommended.
Summary of Course:
Social science research increasingly draws on data held in multiple sources, including sample surveys, censuses and administrative registers. A major benefit is a wider scope of analysis than would be feasible with the data from any single source. However, the combined data may contain many apparent inconsistencies and shortcomings that need to be overcome, and the data linkage and integration process may generate errors of its own. Analysing such imperfect data as-is generally leads to incorrect inference.
Day one provides a general introduction to combining multiple administrative and survey datasets for statistical purposes. A total-error framework is presented for integrated statistical data, which provides a systematic overview of the origin and nature of the various potential errors. The most typical data configurations are illustrated and the relevant statistical methods reviewed.
Day two covers a selection of statistical methods. Training will be given on the techniques of data fusion, or statistical matching, by which joint statistical data are created from separate marginal observations. Participants will also be introduced to several imputation and adjustment techniques that apply when overlapping data sources impose constraints.
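To give a flavour of the uncertainty involved in statistical matching, the sketch below is an illustration only and is not taken from the course notes. It assumes two hypothetical sources: one observes (X, Y) and the other observes (X, Z), so Y and Z are never seen together. For each cell it reports the value of P(Y, Z | X) implied by the conditional independence assumption, together with the Fréchet bounds that hold without that assumption. All variable names and the toy records are invented for the example.

```python
from collections import Counter


def conditional_probs(records):
    """Estimate P(v | x) by relative frequency from a list of (x, v) pairs."""
    pair_counts = Counter(records)
    x_counts = Counter(x for x, _ in records)
    return {(x, v): n / x_counts[x] for (x, v), n in pair_counts.items()}


def fusion_bounds(source_a, source_b):
    """Bounds on P(y, z | x) when Y and Z are never observed jointly.

    source_a holds (x, y) pairs and source_b holds (x, z) pairs.
    Returns, for each (x, y, z), the Frechet lower and upper bounds and the
    point value implied by conditional independence of Y and Z given X.
    """
    p_y = conditional_probs(source_a)
    p_z = conditional_probs(source_b)
    result = {}
    for (xa, y), py in p_y.items():
        for (xb, z), pz in p_z.items():
            if xa != xb:
                continue  # only combine cells that share the same X category
            result[(xa, y, z)] = {
                "lower": max(0.0, py + pz - 1.0),
                "cia": py * pz,
                "upper": min(py, pz),
            }
    return result


# Hypothetical toy data: X = age group, Y = employment status, Z = income band.
source_a = (
    [("young", "employed")] * 3
    + [("young", "unemployed")] * 1
    + [("old", "employed")] * 1
    + [("old", "unemployed")] * 1
)
source_b = (
    [("young", "high")] * 2
    + [("young", "low")] * 2
    + [("old", "high")] * 1
    + [("old", "low")] * 3
)

bounds = fusion_bounds(source_a, source_b)
for cell in sorted(bounds):
    print(cell, bounds[cell])
```

The width of each interval shows how much the joint distribution remains undetermined by the two marginal sources; the conditional independence value is just one admissible point inside it.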
The training introduction podcast can be viewed here: https://adrn.ac.uk/about/research-centre-england/training-podcasts/
Course Contents:
- Life-cycle of integrated statistical data and transformation processes
- A framework of error sources associated with data integration
- Population coverage and unit errors
- Uncertainty and techniques of categorical data fusion, or statistical matching
- Imputation and adjustment methods subject to micro- and macro-level constraints
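As an illustration of adjustment under macro-level constraints, the sketch below uses iterative proportional fitting (raking). This is one common technique and is not necessarily the method taught in the course. It assumes a hypothetical survey cross-tabulation that must be made consistent with row and column totals known from another source; the table and totals are invented.

```python
def ipf(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting of a two-way table to known margins.

    Rows and columns are rescaled in turn until the row sums and column sums
    match the targets. The targets must share the same grand total.
    """
    fitted = [row[:] for row in table]  # work on a copy of the input
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            if row_sum > 0:
                fitted[i] = [v * target / row_sum for v in fitted[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                for row in fitted:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if gap < tol:
            break
    return fitted


# Hypothetical survey counts and register-based totals.
survey_table = [[40.0, 10.0], [20.0, 30.0]]
row_targets = [60.0, 40.0]
col_targets = [50.0, 50.0]

adjusted = ipf(survey_table, row_targets, col_targets)
for row in adjusted:
    print([round(v, 3) for v in row])
```

Raking changes the cell values as little as the margins allow in a multiplicative sense, so the association in the survey table (its odds ratio) is preserved while the totals agree with the external source.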
By the end of the course participants will have:
- An understanding of the potential errors and statistical uncertainty involved in data integration
- The ability to apply relevant concepts and methods in practice
- An appreciation of the opportunities and challenges of inference based on data integration
Computer Software and Computer workshops:
This event includes computer workshops.
The course will involve practical exercises using R (https://www.r-project.org/).
Presenter:
Li-Chun Zhang is Professor of Social Statistics at the Southampton Statistical Sciences Research Institute (S3RI) at the University of Southampton, and senior methodologist at Statistics Norway. He has participated in a number of EU framework projects and Eurostat ESSnet projects. His research interests include data integration, statistical uses of administrative sources, sampling, sample coordination, estimation and imputation, treatment of non-sampling errors, small area estimation, statistical data editing, and statistical modelling. He obtained his Dr. Scient. in Statistics from the University of Tromsø, Norway.
Target Audience:
Social and medical researchers with interests in combining data from multiple sources or analysing data from different sources; staff at National Statistical Institutes (or similar organisations) who are involved in the design, management and quality assurance of statistical processes based on data from multiple sources including censuses, administrative data and sample surveys. Methodological training, knowledge and experience will be helpful.
Pre-requisites:
An understanding of the central concepts of statistical uncertainty (such as bias, variance and confidence intervals) and of distributions; basic knowledge of data cleaning and imputation; basic experience of using R for statistical computing.
Provisional Programme:
Day One
09:30 – 10:00 Registration and coffee
10:00 – 13:00 Introduction to integrated statistical data and potential error sources
(with tea break at 11:30)
13:00 – 14:00 Lunch
14:00 – 17:00 Most typical situations of multi-source data and relevant statistical methods
Exercise, Q&A
(with tea break at 15:45)
Day Two
09:30 – 09:45 Recap
09:45 – 13:00 Statistical matching: Uncertainty and techniques with exercise
(with tea break at 11:30)
13:00 – 14:00 Lunch
14:00 – 17:00 Some constrained imputation or adjustment methods
(with tea break at 15:45)
Course Materials:
Participants will receive written course notes.
Podcast:
Podcasts for some of our previous courses can be found at https://adrn.ac.uk/about/network/england/training-podcasts/
--
Our courses are very popular and are often oversubscribed. If you cannot attend a course you have registered for, please notify us at least 30 days in advance so that your place can be released to another attendee. Details of our cancellation policy are here: http://store.southampton.ac.uk/help/?HelpID=1. Please see our full course list here: http://store.southampton.ac.uk/browse/product.asp?compid=1&modid=5&catid=113.
Cost:
The fee per day is:
1. £30 - For UK registered postgraduate students
2. £60 - For staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
3. £220 - For all other participants
4. Free place - For ADRC/ADRN/ADS staff
All fees include event materials, lunch, morning and afternoon tea. *They do not include travel and accommodation costs.*
Website and registration:
Region:
Wales
Keywords:
Imputation, Data fusion (statistical matching), Data integration, Total error framework, Statistical methods, Uncertainty assessment, Linkage error