Forecasting Ground-Level Ozone Exceedances Using Same-Day and Previous-Day Air Quality Measurements
Clerence Mathonsi, Muofhe Masikhwa
Partner: ggm
Year: 2026
Abstract:
This study evaluates the feasibility of forecasting ground-level ozone (O3) exceedances on the South African Highveld using only routinely collected air quality monitoring data from the previous day and the morning hours of the forecast day. A harmonised record of 648,388 hourly observations from 37 stations across five regions was analysed. Forty-three engineered features (previous-day and morning pollutant and meteorological measurements, calendar and seasonal indicators, and episode persistence) were evaluated at multiple cutoff hours, where the cutoff hour is the latest hour of the forecast day from which measurements are used. Five model families (persistence, ridge regression, logistic regression, random forest, and XGBoost) were trained with TimeSeriesSplit cross-validation and tested on 2024-2025 data. Regression performance improved with later cutoffs (ridge: R2 = 0.71 at cutoff hour 12). Classification performance was strong: the random forest at cutoff hour 12 achieved accuracy = 0.95, precision = 0.85, recall = 0.81, F1 = 0.83, and AUC = 0.97, while XGBoost maximised recall (0.89 at cutoff hour 12), the priority metric for health alerts. SHAP analysis shows a shift from previous-day O3 dominance at early cutoffs to morning O3 dominance by cutoff hour 12. The study demonstrates that operationally useful exceedance forecasts can be produced without atmospheric model output.
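
The cutoff-hour idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `build_features`, the feature names, the exceedance definition (daily maximum hourly O3 above a threshold), the threshold value of 60, and the synthetic data are all assumptions made for illustration. The sketch shows only how features can be restricted to the previous day plus the hours before the cutoff, so that no information from after the cutoff leaks into the forecast.

```python
from datetime import date, timedelta
from statistics import mean


def build_features(hourly, day, cutoff_hour, threshold):
    """Build one (features, label) pair for `day`.

    hourly: dict mapping (date, hour) -> O3 concentration (hypothetical layout).
    Features use all of the previous day plus hours 0..cutoff_hour-1 of `day`.
    The label (assumed definition) is 1 if the daily maximum exceeds `threshold`.
    Returns None when the previous day, the morning, or the day itself has no data.
    """
    prev = day - timedelta(days=1)
    prev_vals = [hourly[(prev, h)] for h in range(24) if (prev, h) in hourly]
    morning_vals = [hourly[(day, h)] for h in range(cutoff_hour) if (day, h) in hourly]
    day_vals = [hourly[(day, h)] for h in range(24) if (day, h) in hourly]
    if not prev_vals or not morning_vals or not day_vals:
        return None

    features = {
        "prev_day_max_o3": max(prev_vals),
        "prev_day_exceeded": int(max(prev_vals) > threshold),  # episode persistence
        "morning_mean_o3": mean(morning_vals),
        "morning_last_o3": morning_vals[-1],  # latest reading before the cutoff
        "month": day.month,  # simple calendar/seasonal feature
    }
    label = int(max(day_vals) > threshold)
    return features, label


# Synthetic two-day record (illustrative values only, not study data).
d1, d2 = date(2024, 1, 1), date(2024, 1, 2)
hourly = {(d1, h): 30 for h in range(24)}
hourly[(d1, 14)] = 70  # previous-day afternoon peak
hourly.update({(d2, h): 40 if h < 12 else 50 for h in range(24)})
hourly[(d2, 15)] = 65  # forecast-day peak, after the cutoff

THRESHOLD = 60  # hypothetical exceedance threshold
result = build_features(hourly, d2, cutoff_hour=12, threshold=THRESHOLD)
features, label = result
print(features, label)
```

In this sketch the forecast-day peak at hour 15 affects only the label, never the features, which is the property a later cutoff trades against lead time: more morning information becomes available, but the alert is issued later.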