Introduction to Health Data Science
Foundational data-science training for public-health students (IPH)
By Arun Mitra in Teaching, Health Data Science, R
January 1, 2025
Background
Public-health students increasingly need data-science skills to make sense of routine and outbreak data, but many begin with no programming background. This foundational teaching module, developed for students at the Institute of Public Health (IPH), introduces health data science from first principles with an emphasis on reproducible, applied analysis.
Approach
The module is taught in R using the tidyverse and combines structured exercises with collaborative group activities. It covers data import and cleaning, exploratory data analysis (EDA), visualisation with ggplot2, and mapping with sf. A worked COVID-19 deaths dataset for Kerala, covering deaths across the state's 14 districts, runs through the module as a real-world case study that anchors concepts in a familiar public-health context.
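To give a flavour of the cleaning and summarising steps described above, here is a minimal sketch using dplyr. It is not the module's actual teaching code: the object names (`deaths_raw`, `deaths_clean`, `deaths_by_district`) and column names are hypothetical, and the counts are invented placeholders rather than figures from the Kerala dataset.

```r
library(dplyr)

# Toy records for illustration only. The district names are real, but the
# counts are invented placeholders, NOT values from the module's dataset.
deaths_raw <- tibble::tribble(
  ~district,    ~month,    ~deaths,
  "Ernakulam ", "2021-05", 12,
  "ernakulam",  "2021-06", 8,
  "Kollam",     "2021-05", 5,
  "Kollam",     "2021-06", NA,
  "Wayanad",    "2021-05", 3
)

# Cleaning: trim stray whitespace, standardise capitalisation,
# and drop rows where the death count is missing.
deaths_clean <- deaths_raw %>%
  mutate(district = tools::toTitleCase(tolower(trimws(district)))) %>%
  filter(!is.na(deaths))

# Summarising: total deaths and number of reported months per district,
# sorted from highest to lowest total.
deaths_by_district <- deaths_clean %>%
  group_by(district) %>%
  summarise(
    total_deaths = sum(deaths),
    n_months     = n(),
    .groups      = "drop"
  ) %>%
  arrange(desc(total_deaths))

print(deaths_by_district)
```

Dropping rows with missing counts is only one possible choice; in a real analysis it may be better to keep them and report how much data is missing.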
What we found
Participant learning outcomes include the ability to:
- Import, clean, and wrangle health data with the tidyverse.
- Conduct exploratory data analysis and summarise findings.
- Build reproducible analysis workflows in R.
- Apply data-science skills to a real public-health dataset.
Outputs & impact
The module produced teaching code, exercise sets, group activities, output tables, and a worked COVID-19 Kerala analysis. It was delivered as part of training at the Institute of Public Health (IPH). Specific cohorts and dates are to be confirmed.