Clean Thembisa Data: A New App and R Package

Public Health
R
Published: October 1, 2025

The Thembisa model is the leading source of demographic and HIV estimates for South Africa. It provides projections that underpin planning, policy, and research in the fields of epidemiology and public health.

Yet many analysts who work with Thembisa data will be familiar with a recurring challenge. While the model produces rich outputs, the raw data often require cleaning and reshaping before they are ready for analysis or visualisation.

To address this, I have built two complementary tools:

  • Clean Thembisa Data: an interactive Shiny application that makes it easy to filter, search and download Thembisa outputs without writing code.
  • thembisaR: an R package that is used by the Clean Thembisa Data app for reading and processing Thembisa data.

The Shiny App: Clean Thembisa Data

For many users, the quickest way to get started is the web-based app: Clean Thembisa Data

With this tool, you can:

  • Filter and search the Age Specific National and Provincial Outputs (V4.8).
  • Download a subset of the data as an Excel or CSV file.
  • Download the entire dataset as a CSV.

This lowers the barrier for colleagues who do not work directly with R, while still ensuring consistent and reproducible handling of Thembisa outputs.

thembisaR: An R Package for Thembisa Data

For those working directly in R, the thembisaR package — which also powers the Clean Thembisa Data app — provides functions to read and clean Thembisa outputs. By default, the Notes tab is excluded during processing.

A simple example:

# Install from GitHub (needs the remotes package: install.packages("remotes"))
remotes::install_github("KirstinLyon/thembisaR")
library(thembisaR)

# Read and clean Thembisa output for age specific files
all_age_specific_data <- thembisaR::read_sex_age_specific_file("thembisa_output.csv")

# Read and clean Thembisa provincial output for either HIV or TB
all_prov_output <- thembisaR::read_prov_output("thembisa_prov_output.csv")

This makes it straightforward to integrate Thembisa data into existing epidemiological workflows and statistical analyses.
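As a rough illustration of what that integration might look like, here is a minimal base R sketch. It does not call thembisaR; it uses a small toy data frame in place of a cleaned Thembisa table. The column names (year, sex, age, indicator, value) and all the numbers are assumptions made up for this example, so check the actual structure of your data before adapting it.

```r
# Toy stand-in for a cleaned Thembisa table.
# ASSUMPTION: column names and values are hypothetical, for illustration only.
# They are not real estimates and may not match what thembisaR returns.
toy_data <- data.frame(
  year      = c(2023, 2023, 2024, 2024),
  sex       = c("Female", "Male", "Female", "Male"),
  age       = c(15, 15, 15, 15),
  indicator = rep("Example indicator", 4),
  value     = c(10, 12, 11, 13),
  stringsAsFactors = FALSE
)

# Keep a single year, as one might before plotting or modelling
latest <- subset(toy_data, year == 2024)

# Total the indicator across sexes for each year
totals_by_year <- aggregate(value ~ year, data = toy_data, FUN = sum)

print(latest)
print(totals_by_year)
```

Once the data are in a tidy, long format like this, the same filter-then-summarise pattern carries over to whichever modelling or plotting packages a team already uses.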

Why This Matters

Reliable, timely use of Thembisa outputs is critical for planning and monitoring in South Africa’s HIV response. By lowering the data-preparation burden, I hope these tools can:

  • Free up time for deeper analysis.
  • Improve consistency across teams.
  • Support both technical and non-technical users.

Getting Started

  • Try the app: Clean Thembisa Data
  • Install the package: thembisaR

Both tools are open-source and freely available. Contributions, feedback, and feature requests are welcome.

Closing Note

Working with Thembisa data should be about analysis and insight, not wrestling with file formats. Whether you prefer a point-and-click interface or reproducible R scripts, Clean Thembisa Data and thembisaR are here to make your work easier.