
Regression modelling: learning pathway

Regression modelling is one of the most frequently used tools for examining relationships between variables. On this page, you’ll find a series of learning resources that introduce the key regression modelling techniques and show you how to apply them in your research.

With our video tutorials and publications, we’ll guide you through core, intermediate and advanced methods, helping you to choose the right approach for every project.

The information on this page is organised in five stages:

  1. Planning your project
  2. Linear regression
  3. Logistic regression
  4. Multilevel modelling
  5. Advanced regression methods

Looking for online and in-person training? Search our programme of short courses


Stage 1: Planning your project



Prepare your workflow

Before beginning your project, we suggest taking the time to prepare your workflow. To help you with this, we have created a step-by-step guide that outlines the common stages in the statistical data analysis workflow.

You’ll need to download some statistics software in order to do regression modelling. Several tools are widely used in the social sciences, many of which are free, but the one we recommend you start with is the Statistical Package for the Social Sciences (SPSS).

Access our guide on the statistical data analysis workflow

Data sources

As you form a research question, you will need to plan on either collecting new primary data or using secondary data, that is, data collected by another person or team. In the tutorials below, we focus on secondary data analysis, which is frequently used in the social sciences.

If you’re looking for secondary datasets for your research, we recommend exploring the UK Data Service website, where you can register for a free account, explore their extensive archives and download the data you need. Explore the UK Data Service website


Data types

Data are usually classified as continuous, nominal or ordinal. The type of data you have access to will influence your analysis plan. Continuous data include measures such as age in single years. Grouped (categorical) data can be nominal or ordinal. Nominal data are categories with no inherent order, such as ethnic groups, and ordinal data are categories that follow an order, such as age groups.

You can convert some continuous data to ordinal for the purpose of statistical analysis. If you do this, consider the impact on interpretation of the outputs, as you will lose some detail by creating categorical variables.
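As a rough illustration of the conversion described above, the following Python sketch groups single years of age into ordered bands. The band boundaries and labels are assumptions chosen for this example, not a recommended coding scheme.

```python
from bisect import bisect_right

# Hypothetical age bands, chosen for illustration only.
# Each boundary is the first age that belongs to the next band.
BOUNDARIES = [18, 35, 65]
LABELS = ["0-17", "18-34", "35-64", "65+"]


def age_group(age: int) -> str:
    """Return the ordered age band that a single year of age falls into."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return LABELS[bisect_right(BOUNDARIES, age)]


ages = [4, 17, 18, 42, 65, 90]           # continuous: single year of age
groups = [age_group(a) for a in ages]    # ordinal: ordered age bands
print(groups)
```

Note that ages 4 and 17 end up in the same band, which is the loss of detail mentioned above.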


Preparing your data

Before you start using regression models, you will first need to do some data preparation, initial data exploration and correlation analysis. Correlation analysis will indicate the strength of a relationship (association) between two variables, and whether it is statistically significant.

When you identify a significant relationship between variables, you can extend your testing using regression modelling, which enables you to examine how well one variable predicts another.

Scatterplots are the best way to visualise correlation between two continuous variables. To learn more about scatterplots and how to produce them, read the guide on our ReStore website.
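To make the idea of correlation strength concrete, here is a minimal Python sketch that computes Pearson's correlation coefficient and the t statistic commonly used to test it. The variable names and data values are made up for illustration, and in practice your statistics software will report these figures, together with a p-value, for you.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Made-up example data: hours of study and test score for five people.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
t = t_statistic(r, len(hours))
print(f"r = {r:.3f}, t = {t:.3f} on {len(hours) - 2} degrees of freedom")
```

Values of r close to 1 or -1 indicate a strong linear association, and values close to 0 indicate a weak one.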


Stage 2: Linear regression



Simple regression

The first type of regression modelling that you should get comfortable with is simple linear regression. This is referred to as simple as there are only two variables included: the outcome and a single predictor.

The process can be repeated for different predictors, with the same outcome, to help develop a multiple regression model that allows you to account for other variables which may influence the outcome. Here, the variables are usually continuous, such as income. See an example of the simple regression model.

The above example is from a larger resource that offers a step-by-step introduction to various types of regression. It is designed for researchers and students with limited experience in quantitative research methods, and although it is from an early NCRM project, the resource still provides a great overview. It includes information on the assumptions that must be met for linear regression.

View the step-by-step introduction to simple regression
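The following Python sketch shows what fitting a simple linear regression involves: estimating an intercept and a slope by ordinary least squares, then checking how much of the variation in the outcome the predictor explains. The function names and data are assumptions for illustration only, and this is not the method used in the linked resource, which works through the same ideas in SPSS.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the predictor must take more than one value")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that is explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total


# Made-up example data: one predictor and one outcome.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_regression(predictor, outcome)
r2 = r_squared(predictor, outcome, intercept, slope)
print(f"outcome = {intercept:.2f} + {slope:.2f} * predictor, R squared = {r2:.2f}")
```

The slope is the expected change in the outcome for a one-unit increase in the predictor.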

Multiple regression

When you want to include more than one predictor variable, you can use multiple regression. It is important to choose the predictor variables carefully, based on the exploratory data analysis, correlation analysis and literature review. To provide learners with an introduction to multiple linear regression, NCRM has an online video tutorial, with downloadable worksheets and handbook.

Watch our tutorial on multiple linear regression

For an in-depth walk-through, view the step-by-step introduction to multiple regression. We also recommend testing your knowledge using worksheets and practical examples with the Statistical Package for the Social Sciences (SPSS), or trying a short quiz. Access a worksheet and take a quiz.
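If you are curious about what the software does when it fits a multiple regression, the sketch below solves the least-squares normal equations for two predictors in plain Python. The helper names and data are hypothetical, and the outcome values were generated without noise as 1 + 2 * x1 + 3 * x2 so that the recovered coefficients are easy to check. A statistics package would normally do this for you and also report standard errors.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data: each row holds the values of two predictors (x1, x2).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]  # generated as 1 + 2 * x1 + 3 * x2, with no noise

coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

Each coefficient describes the expected change in the outcome for a one-unit increase in that predictor, holding the other predictor constant.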


Stage 3: Logistic regression



As with linear regression, there are different types of models for logistic regression: binary, ordinal and multinomial. We have video tutorials on each type of model. Each tutorial begins with an overview and includes computer exercises using the statistical software Stata.


Binary logistic regression

When the outcome (dependent) variable is binary (smoker vs non-smoker, age 0-17 vs age 18+), you can use binary logistic regression models to estimate the odds of the outcome occurring in relation to one or more predictor (independent) variables. Results are usually reported as odds ratios.
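To show what an odds ratio measures, the Python sketch below computes one from a two-by-two table. The counts and the predictor (whether a parent smokes) are invented for illustration. With a single binary predictor, the coefficient a binary logistic regression reports is the natural log of this odds ratio, so exponentiating the coefficient gives the odds ratio back.

```python
from math import exp, log


def odds(events, non_events):
    """Odds of the outcome: number with the outcome divided by number without."""
    if non_events == 0:
        raise ValueError("odds are undefined when there are no non-events")
    return events / non_events


# Hypothetical counts, for illustration only.
# Parent smokes:         30 smokers, 70 non-smokers
# Parent does not smoke: 10 smokers, 90 non-smokers
odds_parent_smokes = odds(30, 70)
odds_parent_does_not = odds(10, 90)

odds_ratio = odds_parent_smokes / odds_parent_does_not
log_odds_ratio = log(odds_ratio)  # the scale on which logistic coefficients sit

print(f"odds ratio = {odds_ratio:.2f}, log odds ratio = {log_odds_ratio:.2f}")
```

An odds ratio above 1 means the outcome is more likely in the first group, and an odds ratio below 1 means it is less likely.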

To give you an overview of the key skills, we recommend you watch our three-part video tutorial. The resource introduces the principles of the method, uses empirical examples to explain how the method is used and includes a workshop exercise, which shows you how to put this knowledge into practice.

Watch our tutorial on binary logistic regression

Ordinal logistic regression

Ordinal logistic regression can be used in situations where the outcome variable has at least three categories that are ordered. Our three-part tutorial provides an introduction to the method, an overview of multivariate ordinal logistic regression, examples and an exercise.

Watch our tutorial on ordinal logistic regression

Multinomial logistic regression

This method can be used in situations where the outcome (dependent) variable has three or more categories that have no natural order. Our two-part tutorial begins with an overview of the method and when it should be used, using an example with one explanatory variable. It then discusses multinomial logistic regression models that involve more than one explanatory variable.

Watch our tutorial on multinomial logistic regression

Stage 4: Multilevel modelling



Multilevel modelling is used extensively in social science research. The method extends regression approaches to explore the influences of clusters in data, such as pupils within schools in cities. It enables us to recognise that individual experiences may be shaped by context. For example, children’s educational outcomes may be more similar if they go to the same school.
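One way to see why clustering matters is to ask how much of the variation in an outcome lies between clusters rather than within them. The Python sketch below estimates this share, the intraclass correlation, using the one-way analysis of variance estimator. It is not a multilevel model, it assumes every cluster has the same number of members, and the school scores are made up for illustration.

```python
def intraclass_correlation(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups)."""
    k = len(groups)          # number of clusters, e.g. schools
    n = len(groups[0])       # members per cluster, e.g. pupils
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 clusters of equal size (2 or more)")
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Made-up test scores for three pupils in each of three schools.
schools = [
    [60, 62, 64],
    [70, 72, 74],
    [80, 82, 84],
]

icc = intraclass_correlation(schools)
print(f"intraclass correlation = {icc:.3f}")
```

A value close to 1 means pupils in the same school have very similar scores, which is the situation in which ignoring the clustering is most misleading.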

Our three-part video tutorial offers a general introduction to multilevel modelling and outlines the key features of random intercept and random coefficient models.

Watch our tutorial on multilevel modelling

For an informal, introductory discussion about multilevel modelling, listen to our podcast on the topic. NCRM’s centre partner, the Centre for Multilevel Modelling, has some useful background information, including basic definitions of key terms and software suggestions. Visit their website.


Stage 5: Advanced regression methods



Beyond linear and logistic regression methods, there are more options to consider as you develop your research. For example, you may be working with data on infrequent events and find that you need to use Poisson regression. This is a form of regression analysis used to model count data, such as the number of times an event occurs during a particular time period.
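As a small illustration of count data, the Python sketch below estimates the average event rate from a set of made-up counts and uses the Poisson distribution to give the probability of seeing 0, 1, 2 and more events in a period. This is the simplest case, with no predictors. A Poisson regression extends it by letting the log of the rate depend on predictor variables.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of observing exactly k events when the mean count is rate."""
    return rate ** k * exp(-rate) / factorial(k)


# Made-up data: number of events recorded in each of eight time periods.
counts = [0, 1, 0, 2, 1, 0, 3, 1]

# With no predictors, the maximum likelihood estimate of the rate is the mean.
rate_hat = sum(counts) / len(counts)

for k in range(5):
    print(f"P({k} events) = {poisson_pmf(k, rate_hat):.3f}")
```

The probabilities fall away quickly as the count rises, which is typical of data on infrequent events.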

Watch our tutorial on Poisson regression models

Another method you may wish to explore is Bayesian analysis, an approach to statistical modelling that treats all the unknowns in the analysis as random variables. We have a three-part tutorial that introduces the key concepts of linear regression models estimated using a Bayesian approach.

Watch our tutorial on Bayesian regression

Latent class analysis has attracted growing interest in recent years, and we provide an introduction to this method in our three-video tutorial. Latent class analysis is a statistical method within the family of mixture models, which assume that a population is made up of sub-populations, or a mix of individuals.

Watch our tutorial on latent class analysis

We also suggest viewing our tutorials Latent Variable Models for Social Research and Introduction to Latent Transition Analysis.

Finally, if you’d like to try something new, you may like to explore structural equation modelling. This is not a single technique, but a framework that integrates several approaches, including regression modelling. It is particularly useful for addressing complex concepts that are difficult to measure.

Watch our tutorial on structural equation modelling