Web-scraping with Python and Introduction to text data with Python

Date:

24/04/2025 - 25/04/2025

Organised by:

University of Exeter (an NCRM Centre Partner)

Presenter:

Mariam Cook

Level:

Intermediate (some prior knowledge)

Contact:

Hannah Grant
Department Manager, SPSPA
H.G.Grant@exeter.ac.uk

Map:

View in Google Maps (EX4 4PE)

Venue:

Clayden Computer Lab
Clayden Building
University of Exeter (Streatham campus)
Streatham Rise
Exeter

Description:

Technological advancements have not only driven the digitisation of society and the emergence of novel socio-political issues, but have also resulted in significant developments in algorithms, computational power, and increasingly large datasets.

This practical, face-to-face course will be delivered over two days. It will provide you with the technical programming skills and understanding of data science techniques that you will need to research pre-existing and novel socio-political and economic issues, along with the kind of transferable skills that are currently in demand in the job market.

Text data surrounds us and comes in different shapes and sizes, e.g. newspaper articles, tweets, product reviews and song lyrics. While it might seem at first glance that this information can hardly be summarised and compared, certain computational techniques make it possible to extract meaningful information from text data. This course provides the foundations for you to understand, execute and communicate text data analysis in Python, a widely recognised platform for data analysis. It builds on introductory Python skills and requires either prior introductory experience with Python or participation in Introduction to Python for Data Analysis on 22nd and 23rd April 2025.

This course covers:

Web scraping with Python

Introduction to Google Colab (students need a functioning Gmail/Google account they can log into)
pandas DataFrames and uploading external data to Colab
How to scrape a web page and extract text with Beautiful Soup
How to analyse and visualise text content using the Seaborn library
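
As a rough illustration of the scrape-and-extract step listed above, the sketch below pulls paragraph text out of a small inline HTML string. It is not taken from the course materials: the course uses Beautiful Soup, whereas this version relies only on Python's standard library so that it runs without a network connection or extra installs. The `ParagraphExtractor` class, the `extract_paragraphs` function and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags (hypothetical helper, not from the course)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when a paragraph opens
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        # On </p>, join the collected pieces and normalise whitespace
        if tag == "p" and self._in_paragraph:
            text = " ".join("".join(self._buffer).split())
            self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the paragraph
        if self._in_paragraph:
            self._buffer.append(data)


# Made-up page content, used instead of a live web request
SAMPLE_HTML = """
<html><body>
  <h1>Sample page</h1>
  <p>Text data comes in <b>many</b> shapes.</p>
  <p>Reviews, lyrics and articles are all text.</p>
</body></html>
"""


def extract_paragraphs(html):
    """Return the text of every <p> element in the given HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


paragraphs = extract_paragraphs(SAMPLE_HTML)
for paragraph in paragraphs:
    print(paragraph)
```

With Beautiful Soup installed, roughly the same result can be obtained with `[p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p")]`; on a real page the HTML string would come from an HTTP request rather than an inline sample.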

Introduction to Text Data with Python

Text preprocessing
Bag-of-words modelling and the count vectorizer
Lexicon-based sentiment analysis using spaCy
Comparative visualisation
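
To give a feel for the preprocessing, bag-of-words and lexicon-based scoring topics listed above, here is a minimal sketch in plain Python. It is an assumption-laden stand-in rather than the course's own code: it does not use spaCy or a library count vectorizer, and the two sample sentences, the four-word `LEXICON` and the function names are invented for illustration.

```python
import re
from collections import Counter

# Made-up example documents
DOCUMENTS = [
    "The film was great, really great.",
    "The plot was dull and the ending was bad.",
]

# Tiny invented sentiment lexicon: +1 for positive words, -1 for negative
LEXICON = {"great": 1, "good": 1, "dull": -1, "bad": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and a document-by-word count matrix."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


def lexicon_score(text, lexicon):
    """Sum the lexicon values of all tokens; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


vocabulary, matrix = bag_of_words(DOCUMENTS)
scores = [lexicon_score(doc, LEXICON) for doc in DOCUMENTS]

print(vocabulary)
for row, score in zip(matrix, scores):
    print(row, score)
```

Each row of `matrix` counts how often each vocabulary word appears in one document, which is the same kind of table a library count vectorizer such as scikit-learn's `CountVectorizer` produces (typically in sparse form). Real sentiment lexicons are far larger and usually handle negation and intensity, which this sketch ignores.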

By the end of this course:

Participants will be able to use Google Colab for collaborative data science projects.
Participants will have improved their Python skills and be able to import and evaluate text data.

Presenter:

Mariam Cook's research looks at the application of Advanced Quantitative Methods to support scalable democratic participation and policy making that incorporates wellbeing and sustainability outcome tracking and goal setting. She has applied Natural Language Processing (NLP) in various projects over the last ten years.

Computer workshops:

Students need a functioning Gmail/Google account they can log into. Students can use the computers in the lab or bring their own laptops.

Pre-requisites:

Basic Python or completion of Introduction to Python for Data Analysis.

Cost:

The fee per teaching day is £60 for students; £150 for staff working for academic institutions, Research Councils and other recognised research institutions, registered charity organisations and the public sector; and £350 for all other participants.

In the event of cancellation by the delegate, a full refund of the course fee is available up to two weeks prior to the course. NO refunds are available after this date.

If it is no longer possible to run a course due to circumstances beyond its control, NCRM reserves the right to cancel the course at its sole discretion at any time prior to the event. In this event, every effort will be made to reschedule the course. If this is not possible or the new date is inconvenient, a full refund of the course fee will be given. NCRM shall not be liable for any costs, losses or expenses that may be incurred as a result of its cancellation of a course, including but not limited to any travel or accommodation costs. The University of Southampton's Online Store T&Cs also continue to apply.

Website and registration:

Region:

South West

Keywords:

Secondary Analysis, Corpus Analysis, Python, Natural Language Processing, Text data
