MIT 808: Big Data Science Capstone Project
Welcome to the University of Pretoria MIT808: Big Data Science Capstone Project online exhibition
2026 Exhibition
The 2026 Capstone Exhibition showcases 21 impactful student projects that tackle urgent and diverse challenges using cutting-edge Data Science techniques. This year, 42 students worked in interdisciplinary teams with real-world partners to create solutions spanning climate policy, air quality forecasting, population estimation, parliamentary intelligence, cancer genomics, African NLP governance, and African language AI.
You can also explore the organisations and researchers we collaborated with via the Partners tab.
2026 Project Overviews
African Climate NLP
Five teams applied NLP and LLMs to African climate governance, analysing policy documents across UNFCCC submissions, South African parliamentary proceedings, and SADC-region policy frameworks. Topics include corpus-based thematic classification, contextual bias in LLM-generated recommendations, multilingual discourse asymmetries, and identification of colonial framing.
UAV & Population Estimation
Three teams used drone and aerial imagery combined with deep learning (U-Net, DeepLabV3) and Bayesian/Gaussian Process regression to estimate population in the Melusi informal settlement in Atteridgeville. The approach offers a scalable, cost-effective alternative to traditional census methods.
Air Quality Forecasting
Three teams developed machine learning pipelines to predict ground-level ozone exceedances on the South African Highveld, ranging from same-day alerts (Random Forest, XGBoost) to 3-hour lead-time forecasting in Secunda and 24-hour Highveld-wide predictions with Streamlit-based alert systems.
Parliamentary Intelligence
Three teams built AI-powered tools for monitoring South African parliamentary activity at scale, supporting investigative journalists with topic clustering, intent classification, abstractive summarisation, and RAG-assisted search across PMG data.
African NLP Governance
Three teams audited copyright compliance and licensing risk across 249 African NLP datasets using rule-based scoring, machine learning classifiers, and unsupervised clustering to reveal systemic governance gaps. Interactive Streamlit dashboards provide real-time risk auditing.
Prostate Cancer Decision Support
Three teams built clinical dashboards for the South African Prostate Cancer Study (SAPCS), covering risk stratification, molecular driver visualisation, and genomic integrity profiling using whole-genome sequencing data.
African Language AI
One team evaluated open-access LLMs for Sepedi translation to power a rabies awareness chatbot, addressing late-stage reporting in communities with limited access to health education in their home language.
Archive: Previous Cohorts
| Year | Projects | Link |
|---|---|---|
| 2025 | 9 projects, 18 students | View 2025 |
| 2024 | - | View 2024 |
| 2023 | - | View 2023 |
| 2022 | - | View 2022 |
| 2021 | - | View 2021 |
| 2020 | - | View 2020 |
MIT 808 Information
MIT808 is taught at the University of Pretoria as part of the Masters in IT (Big Data Science) programme. Students carry out a Data Science Capstone project that integrates the theoretical work from their first year of study. The module is taught by Dr Olaperi Okuboyejo, Dr Abiodun Modupe, and Prof Vukosi Marivate.
More information:
- MIT808 Public Website
- MIT Big Data Science Programme
- Mailing List for Updates
Feedback and Contact
We'd love to hear from you!
dsfsi.info@up.ac.za
Organisers
Thapelo Sindane
Web Development Assistant
University of Pretoria
Neo Mokono
Web Development Assistant
University of Pretoria
Fiskani Banda
Web Development Assistant
University of Pretoria
Richard Lastrucci
Web Development Assistant
University of Pretoria
Keabetswe Madumo
Web Development Assistant
University of Pretoria