Introduction to Health Data Science

Foundational data-science training for public-health students (IPH)

By Arun Mitra in Teaching, Health Data Science, R

January 1, 2025

Background

Public-health students increasingly need data-science skills to make sense of routine and outbreak data, but many begin with no programming background. This foundational teaching module, developed for students at the Institute of Public Health (IPH), introduces health data science from first principles with an emphasis on reproducible, applied analysis.

Approach

The module is taught in R using the tidyverse, combining structured exercises with collaborative group activities. It covers data import and cleaning, exploratory data analysis (EDA), visualisation with ggplot2, and mapping with sf. A worked dataset of COVID-19 deaths across the 14 districts of Kerala runs through the module as a real-world case study, anchoring each concept in a familiar public-health context.
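To give a flavour of the import, cleaning, and plotting steps, here is a minimal sketch in the tidyverse style. It is not the module's actual teaching code: the column names (District, Date_Reported, Deaths), the object names, and the handful of records are invented for illustration and are not real counts from the Kerala dataset.

```r
library(dplyr)
library(readr)
library(stringr)
library(ggplot2)

# Made-up illustrative records; NOT taken from the module's real dataset.
raw_csv <- c(
  "District,Date_Reported,Deaths",
  "Ernakulam,2021-05-01,3",
  "kollam,2021-05-01,2",
  "Ernakulam,2021-05-02,NA",
  "Kollam,2021-05-02,4"
)

# Write the toy records to a temporary file so the import step is realistic.
csv_path <- tempfile(fileext = ".csv")
writeLines(raw_csv, csv_path)

# Import with explicit column types, then clean:
# lower-case the column names, standardise district spelling, drop missing counts.
covid_clean <- read_csv(
  csv_path,
  col_types = cols(
    District = col_character(),
    Date_Reported = col_date(format = "%Y-%m-%d"),
    Deaths = col_integer()
  )
) %>%
  rename_with(tolower) %>%
  mutate(district = str_to_title(district)) %>%
  filter(!is.na(deaths))

# Total deaths per district.
district_totals <- covid_clean %>%
  group_by(district) %>%
  summarise(total_deaths = sum(deaths), .groups = "drop")

# A simple horizontal bar chart of the district totals.
p <- ggplot(district_totals,
            aes(x = reorder(district, total_deaths), y = total_deaths)) +
  geom_col() +
  coord_flip() +
  labs(x = "District", y = "Reported deaths (toy data)")
```

Printing p draws the chart. Mapping with sf would additionally need district boundary geometries, which are not assumed here.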

What we found

By the end of the module, participants should be able to:

  • Import, clean, and wrangle health data with the tidyverse.
  • Conduct exploratory data analysis and summarise findings.
  • Build reproducible analysis workflows in R.
  • Apply data-science skills to a real public-health dataset.
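One way to picture the exploratory-summary and reproducible-workflow outcomes together is a small script that wraps the summary step in a function and writes its output table to a file, so the same result can be regenerated from the same input. This is only a sketch under stated assumptions: the function name summarise_by_month, the output file name, and the toy records are hypothetical and do not come from the module's materials.

```r
library(dplyr)
library(tibble)
library(readr)

# Toy records invented for illustration; not real surveillance data.
toy_deaths <- tibble(
  district = c("Thrissur", "Thrissur", "Palakkad", "Palakkad"),
  date = as.Date(c("2021-05-30", "2021-06-02", "2021-05-31", "2021-06-01")),
  deaths = c(2L, 1L, 4L, 3L)
)

# Hypothetical helper: total deaths and number of reports per district and month.
summarise_by_month <- function(data) {
  data %>%
    mutate(month = format(date, "%Y-%m")) %>%
    group_by(district, month) %>%
    summarise(
      total_deaths = sum(deaths),
      n_reports = n(),
      .groups = "drop"
    ) %>%
    arrange(district, month)
}

monthly <- summarise_by_month(toy_deaths)

# Save the output table so the result is produced by code rather than by hand.
out_path <- file.path(tempdir(), "monthly_deaths_toy.csv")
write_csv(monthly, out_path)
```

Keeping the summary logic in a function means the same step can be rerun unchanged when the input data are updated.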

Outputs & impact

The module produced teaching code, exercise sets, group activities, output tables, and a worked COVID-19 Kerala analysis. It was delivered as part of training at the Institute of Public Health (IPH). Specific cohorts and dates are to be confirmed.

Categories: Teaching, Health Data Science, R
Tags: health data science, R, tidyverse, teaching