Time Series Project, School Project

Heat-Pollution Index for Pollen Allergy Risk

A multi-city statistical time series study testing whether a composite Heat-Pollution Index (HPI = T × (O3 + NO2)/2) carries information that raw pollen counts miss — validated across 4 French cities and corroborated by Ameli antihistamine reimbursements.

Business Context

Pollen allergies affect 30% of French adults and 40% of Europeans, yet current alert systems (RNSA, European bulletins) rely on a single metric: airborne pollen concentration in grains/m³. The 2016 Melbourne thunderstorm-asthma event (3,300 ER visits and 10 deaths within 30 hours, at equal pollen load) showed that environmental context, here heat and pollution, can dramatically modulate allergic risk.

Strategic Problem

Does pollen concentration alone suffice to characterise allergic risk, or does a composite index combining heat and pollution carry additional, measurable information consistent with observed health data?

Data Sources

Four sources merged on a date × city key: Open-Meteo Air Quality (6 pollen species plus O3, NO2 and PM2.5, backed by CAMS/Copernicus); Open-Meteo Historical Weather (ERA5-Land reanalysis, 1980–2024); Open Medic / Ameli (R06A antihistamine reimbursements, monthly × region, 2022–2024 plus a 2014 anchor); and INSEE population data. The merged dataset holds 1,096 daily observations across 4 cities (Paris, Marseille, Bordeaux, Strasbourg) and 192 monthly Ameli points.

Methodology

Built HPI_c,t = T*_c,t × (O3* + NO2*)/2 for city c and day t, where * marks city-by-city min-max normalisation. The index is deliberately multiplicative: it is high only when heat AND pollution are simultaneously high (reciprocal amplification, not additive). Modelled a chain of 5 progressively richer specifications: Naive → ARIMA → +Fourier → ARIMAX → +HPI. The modelling choices were justified by Ljung-Box tests (residual autocorrelation), ADF + KPSS (stationarity), AICc (small-sample model selection), a log(1+x) transform (skewness up to 19.2), and Spearman lag tests. Compared three model families: M1 weather-only baseline, M2 HPI-only, M3 weather + HPI. Data split: train on 2022 → validate on 2023 (hyperparameter tuning) → test on 2024, one-shot. Validated with expanding-window CV, residual bootstrap (B=500), skill score, and Friedman + Nemenyi non-parametric rank tests across all 12 city × species datasets.
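The index construction can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the row layout, the field names (`temp`, `o3`, `no2`), the helper names, and the toy values are all assumptions, and the min-max bounds are taken over whatever rows are passed in (the project may well fit them on the training year only).

```python
from collections import defaultdict


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def add_hpi(rows):
    """Return copies of `rows` with an 'hpi' field added.

    Each row is a dict with hypothetical keys: city, date, temp, o3, no2.
    Normalisation is done separately for each city, then
    HPI = T* x (O3* + NO2*) / 2.
    """
    by_city = defaultdict(list)
    for row in rows:
        by_city[row["city"]].append(row)

    out = []
    for city_rows in by_city.values():
        t_norm = min_max([r["temp"] for r in city_rows])
        o3_norm = min_max([r["o3"] for r in city_rows])
        no2_norm = min_max([r["no2"] for r in city_rows])
        for r, t, o, n in zip(city_rows, t_norm, o3_norm, no2_norm):
            out.append({**r, "hpi": t * (o + n) / 2})
    return out


# Toy values for illustration only (not project data).
rows = [
    {"city": "Paris", "date": "2022-06-01", "temp": 10.0, "o3": 40.0, "no2": 10.0},
    {"city": "Paris", "date": "2022-06-02", "temp": 20.0, "o3": 80.0, "no2": 30.0},
    {"city": "Paris", "date": "2022-06-03", "temp": 30.0, "o3": 120.0, "no2": 20.0},
]
result = add_hpi(rows)
hpi_values = [r["hpi"] for r in result]
```

Because the terms are multiplied, the coldest day scores zero whatever its pollution level, which is the "heat AND pollution" behaviour the multiplicative form is meant to enforce.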

Key Results

H1 validated: the top decile of |residuals| from the pollen-only baseline concentrates on high-HPI days in all 4 cities (98–99% of MSE falls in the active season).

H2 partially validated: M1 still wins in 8/12 cases, but the Friedman test (χ²=11.49, p=0.0215) rejects equivalence between models; HPI adds complementary information when combined with weather (M3).

H3 validated: gains concentrate in chronically polluted cities (Paris ΔR²=+0.045, Strasbourg +0.053), not in Marseille despite its highest mean HPI. The chronic NO2 share, not Mediterranean heat, drives the signal.

H4 directional: the same inter-city hierarchy holds in the monthly R06A regression (n=144) and is preserved in the 2014 temporal anchor (+18.8% baseline over a decade).

Headline: +30% R06A consumption when pollen AND HPI co-occur versus baseline, while 'high HPI alone' triggers nothing (3.93M ≈ 3.94M). The amplification is interactive, not additive.
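For readers unfamiliar with the Friedman test cited in these results, the statistic can be sketched as follows: models are ranked within each dataset, and the test asks whether the rank sums differ more than chance would allow. This is a hedged, stdlib-only illustration; the function names and the error values are hypothetical, and no tie correction is applied to the statistic, unlike what a statistics library may do.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(errors):
    """errors[d][m] is the error of model m on dataset d.

    Returns (chi2, mean_ranks) using the classical formula
    chi2 = 12 / (n k (k + 1)) * sum(R_m^2) - 3 n (k + 1),
    where R_m is the rank sum of model m over n datasets and k models.
    """
    n = len(errors)
    k = len(errors[0])
    rank_sums = [0.0] * k
    for row in errors:
        for m, rank in enumerate(average_ranks(row)):
            rank_sums[m] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [s / n for s in rank_sums]


# Hypothetical errors: 4 datasets (rows) x 3 models (columns), lower is better.
toy_errors = [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.1, 0.4, 0.6],
    [0.3, 0.5, 0.9],
]
chi2, mean_ranks = friedman_statistic(toy_errors)
```

In this toy case the first model is best on every dataset, so the statistic reaches its maximum of n(k − 1) = 8. A significant result only says that the models are not all equivalent; a post-hoc test such as Nemenyi is then needed to say which pairs differ.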

Business Impact

RNSA bulletins built on raw pollen concentration structurally underestimate allergic risk in chronically polluted cities, precisely where it matters most. A multi-criteria alert system integrating the heat-pollution context is not a refinement but a public-health necessity. Both ingredients are trending upward (temperature +1.4 to +1.9°C since 1980, O3 rising everywhere via photochemical feedback). Because HPI is a product, a relative increase of α in its heat term and β in its pollution term makes HPI grow by α + β + αβ, which is super-additive.
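The super-additive growth rate follows directly from the multiplicative form of the index. The short derivation below is a sketch under one stated assumption: α and β are read as relative increases of the normalised heat term T and of the averaged pollution term P = (O3 + NO2)/2 respectively; the source does not define them explicitly.

```latex
\begin{aligned}
\mathrm{HPI}_0 &= T \cdot P, \qquad P = \tfrac{1}{2}\,(O_3 + NO_2) \\
\mathrm{HPI}_1 &= (1+\alpha)\,T \cdot (1+\beta)\,P
               = (1 + \alpha + \beta + \alpha\beta)\,\mathrm{HPI}_0 \\
\frac{\mathrm{HPI}_1 - \mathrm{HPI}_0}{\mathrm{HPI}_0} &= \alpha + \beta + \alpha\beta
\end{aligned}
```

The cross term αβ is what an additive index would lack: with, say, α = β = 0.10, the index rises by 0.21, not 0.20.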

Contributors

Anna Spira, Sacha Nardoux, Enzo Natali, Keira Chang