ILRI PhD Graduate Fellowship: Genomic Data Modelling for Public Health Response Using Wastewater Genomic and Clinical Data
The position
Advances in wastewater-based epidemiology (WBE) and metagenomic sequencing are transforming how population-level pathogen dynamics and antimicrobial resistance (AMR) are monitored. This project builds on a large-scale longitudinal dataset generated from 30 urban wastewater sampling sites, capturing high-resolution microbial and resistome profiles over time.
While such data provide rich biological insight, their greatest value lies in informing timely public health action. Integrating genomic signals from wastewater with clinical surveillance data offers a unique opportunity to develop predictive, genomics-informed models that can detect emerging threats, anticipate outbreaks, and guide intervention strategies.
This PhD will focus on developing data-driven modelling frameworks that translate complex metagenomic and clinical datasets into actionable intelligence for public health response systems.
Terms of reference
- Curate, preprocess, and harmonize large-scale metagenomic (pathogen and resistome) and clinical datasets.
- Develop genomics-informed predictive models that link pathogen abundance and AMR gene dynamics to public health indicators such as disease incidence and prevalence.
- Apply and compare AI/ML and statistical modelling approaches (e.g., time-series models, long short-term memory (LSTM) networks, Bayesian hierarchical models, random forests).
- Identify genomic signatures and covariates (environmental, socioeconomic, wastewater-derived, epidemiological factors) that provide early signals of outbreaks or AMR shifts.
- Model spatiotemporal dynamics of pathogen transmission using integrated environmental and clinical data.
- Evaluate the public health utility of models, including sensitivity, timeliness, and interpretability for decision-making.
- Translate modelling outputs into operational insights, including thresholds, alerts, and risk indicators usable by public health agencies.
- Collaborate with epidemiologists and public health stakeholders to ensure policy relevance and usability of model outputs.
- Contribute to the development of reproducible analytical pipelines for genomic data modelling.
- Publish findings in peer-reviewed journals and contribute to policy briefs and technical guidance.
Minimum requirements for the ideal candidate
- Master’s degree in Bioinformatics, Computational Biology, Data Science, Genomics, or a related field.
- Proficiency in Python and/or R for data manipulation and statistical analysis, including libraries such as pandas, scikit-learn, PyTorch, or TensorFlow.
- Experience with model evaluation techniques, including cross-validation, performance metrics (RMSE, AUC, precision/recall), and statistical testing.
- Proficiency with workflow managers and job schedulers (e.g., Nextflow, Snakemake, WDL, Slurm).
- Experience with containerization systems such as Docker and Apptainer/Singularity.
- Proficiency in version control and collaborative development using Git and GitHub.
- Experience with genomic or metagenomic data analysis.
- Demonstrated experience with machine learning or statistical modelling.
- Understanding of infectious disease dynamics or epidemiological modelling is an advantage.
Method of application
If you are interested and qualified, kindly submit your application via the link provided below.
Deadline: 03/06/2026