Introduction to Data Wrangling using R and tidyverse (online)

Date:

26/04/2023 - 27/04/2023

Organised by:

Nottingham Trent University

Presenter:

Mark Andrews, Associate Professor

Level:

Advanced (specialised prior knowledge)

Contact:

Kelly Smith, Commercial Manager
kelly.smith@ntu.ac.uk
0115 8484083


Venue: Online

Description:

Duration: 2-day online course

Course Module: Non-accredited

On this two-day course, you will gain a comprehensive practical introduction to data wrangling using R. In particular, we focus on tools provided by R's `tidyverse`, including `dplyr`, `tidyr`, `purrr`, etc.

Data wrangling is the art of taking raw, messy data and formatting and cleaning it so that data analysis, visualization, and similar tasks can be performed on it.

Done poorly, it can be time-consuming, laborious, and error-prone.

Fortunately, the tools provided by R's `tidyverse` allow us to do data wrangling in a fast, efficient, and high-level manner, which can have dramatic consequences for the ease and speed with which we analyse data.

This course is aimed at anyone who is involved in real-world data analysis, where the raw data is messy and complex.

Data analysis of this kind is practised widely in academic scientific research and throughout the public and private sectors.

You can sign up for this course at any point during the live period, up to 3 working days before the course starts.

The course will cover these key topics: 

  • reading data into R using tools such as `readr` and `readxl`
  • wrangling with the powerful `dplyr` R package, focusing on filtering observations, selecting and modifying variables, and other major data manipulation operations 
  • summarising data in `dplyr` using descriptive statistics 
  • merging and joining independent data frames
  • pivoting and reshaping data using the `tidyr` R package 

During the course you’ll: 

  • gain a comprehensive practical introduction to data wrangling using R and its complementary tools and interrelated packages, such as tidyverse, dplyr, tidyr, and purrr
  • discover how to read data of different types into R, and cover in detail the dplyr tools such as select, filter, and mutate
  • learn how to use the pipe operator (%>%) to create data wrangling pipelines that take raw messy data at one end and return cleaned tidy data at the other
  • discover how to perform descriptive or summary statistics on data using dplyr's summarise and group_by functionalities 
  • learn how to combine data frames, including concatenating all data files in a folder and using SQL-style operations to merge information in different data frames
  • develop an understanding of how to "pivot" data from a "wide" to a "long" format and back using tidyr's pivot_longer and pivot_wider

What will I gain?   

By the end of the course, you’ll be able to read messy and unstructured data into R and apply the principles of data wrangling to convert these datasets into optimally structured formats. 

These data wrangling techniques will help you carry out data analysis tasks in a fast, efficient, and robust manner, working at a high level.

On completion of at least 80% of the course, you’ll receive a certificate of attendance. 

Where you'll learn: The course is delivered through interactive online workshops via Zoom.

It will be practical, hands-on, and workshop based.

There will be some brief lecture-style presentations throughout, i.e., using slides or blackboard, to introduce and explain key concepts and theories.

Throughout the course, we will use real-world data sets and coding examples.

Tutor Profile: Mark Andrews is an Associate Professor at Nottingham Trent University whose research and teaching are focused on statistical methodology in research in the social and biological sciences.

He is the author of a 2021 textbook on data science using R that is aimed at scientific researchers, and has a forthcoming textbook on statistics and data science that is aimed at undergraduates in science courses.

His background is in computational cognitive science and mathematical psychology.  

Any questions? Contact Kelly Smith, Commercial Manager, School of Social Sciences, at kelly.smith@ntu.ac.uk

Other available online CPD courses in this series include:

Introduction to Statistics using R and RStudio

Introduction to Data Visualization with R using ggplot  

Introduction to Generalized Linear Models in R 

Introduction to Multilevel (hierarchical, or mixed effects) Models in R 

Introduction to Bayesian Data Analysis with R 

Cost:

£360.00, fee includes VAT

Website and registration:

Region:

East Midlands

Keywords:

Frameworks for Research and Research Designs, Data Collection, Data Quality and Data Management, Quantitative Data Handling and Data Analysis, ICT and Software, Research Skills, Communication and Dissemination, RStats, `tidyverse`, `dplyr`, `tidyr`, `purrr`, SQL

Related publications and presentations:

Frameworks for Research and Research Designs
Data Collection
Data Quality and Data Management
Quantitative Data Handling and Data Analysis
ICT and Software
Research Skills, Communication and Dissemination
