Development of an AI Air Quality Monitoring System
Air quality monitoring is more than placing sensors. It means integrating data from different sources, interpolating it spatially, forecasting, and delivering the result to residents in an understandable format. ML handles the tasks that deterministic models do poorly: spatial interpolation, source detection, and short-term forecasting.
System Architecture
Data → Processing → Storage → Analytics → Visualization
Data:
├── Government stations (Roshydromet, FBU TsLM)
├── IoT sensors (own/partner)
├── Satellite (Sentinel-5P, MODIS)
├── Mobile stations (cars, bikes)
└── NWP forecasts (Roshydromet API, Open-Meteo)
Processing:
├── LCS calibration (low-cost sensors)
├── Quality control (QA/QC)
├── Spatial interpolation
└── Air quality forecast
Storage:
└── TimescaleDB (temporal) + PostGIS (spatial)
Analytics:
├── AQI calculation
├── Trend analysis
├── Source attribution
└── Health impact estimation
Visualization:
└── Web portal + mobile app
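The LCS calibration step in the pipeline above can, in its simplest form, be a linear fit of low-cost sensor readings against a co-located reference station. The sketch below assumes paired readings and a linear relationship; the function names and sample values are invented for illustration, and real calibration usually also corrects for humidity and temperature.

```python
from statistics import mean

def fit_calibration(raw, reference):
    """Least-squares fit of reference = slope * raw + intercept."""
    mean_raw, mean_ref = mean(raw), mean(reference)
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    sxy = sum((x - mean_raw) * (y - mean_ref) for x, y in zip(raw, reference))
    slope = sxy / sxx
    return slope, mean_ref - slope * mean_raw

def calibrate(raw_value, slope, intercept):
    """Apply the fitted correction; a concentration cannot be negative."""
    return max(0.0, slope * raw_value + intercept)

# Hypothetical paired PM2.5 readings in µg/m³: low-cost sensor vs reference station
raw = [10.0, 20.0, 30.0, 40.0]
reference = [7.0, 12.0, 17.0, 22.0]

slope, intercept = fit_calibration(raw, reference)
corrected = calibrate(50.0, slope, intercept)
```

The fitted coefficients would be stored per sensor and refreshed periodically, since low-cost sensors drift over time.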
Spatial Interpolation
There are always fewer stations than needed, so an air quality map at 100 m resolution requires interpolation:
Standard interpolation (Kriging): works well when pollution is spatially homogeneous, but poorly near local sources (a plant, a road).
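To see why purely distance-based interpolation misses local sources, a deterministic baseline is easy to sketch. The example below uses inverse distance weighting (IDW), which is simpler than kriging and is not kriging itself, but shares the assumption that pollution varies smoothly between stations. The coordinates and readings are invented for illustration.

```python
import math

def idw_interpolate(stations, x, y, power=2.0):
    """Estimate a value at (x, y) as a distance-weighted mean of station values.

    stations: list of (x_m, y_m, value) tuples in projected metres.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two hypothetical stations 1 km apart, PM2.5 in µg/m³
stations = [(0.0, 0.0, 10.0), (1000.0, 0.0, 30.0)]
midpoint = idw_interpolate(stations, 500.0, 0.0)
```

The midpoint estimate is simply the average of the two stations: a busy road halfway between them would be invisible to this method, which is the gap the covariate-based model is meant to close.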
ML interpolation:
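import pandas as pd
from xgboost import XGBRegressor

# Spatial features, expected as columns of spatial_covariates
# (their derivation is elided in the source):
# distance_to_highway, distance_to_industry, ndvi (vegetation), building_density
feature_cols = ['distance_to_highway', 'distance_to_industry', 'ndvi', 'building_density']

def spatial_air_quality_model(station_readings, spatial_covariates):
    """
    Train on station_readings.
    Predict for the entire city grid, 100×100 m.
    """
    X = pd.merge(station_readings, spatial_covariates, on=['lat', 'lon'])
    # Fit on the covariates only, so the target pm25 does not leak into the features
    model = XGBRegressor().fit(X[feature_cols], X['pm25'])
    return model

# Prediction for the entire city grid
# (create_city_grid and city_boundary are defined elsewhere)
model = spatial_air_quality_model(station_readings, spatial_covariates)
grid = create_city_grid(city_boundary, resolution=100)
grid['predicted_pm25'] = model.predict(grid[feature_cols])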
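The `create_city_grid` helper is not shown in the source. One way such a grid could be generated is sketched below, assuming a rectangular bounding box instead of a real city boundary and a flat-earth approximation that is adequate at city scale. The name `make_grid`, its parameters, and the coordinates are hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude

def make_grid(lat_min, lat_max, lon_min, lon_max, resolution_m=100.0):
    """Return (lat, lon) cell centres covering the bounding box."""
    mid_lat = math.radians((lat_min + lat_max) / 2)
    # A degree of longitude shrinks with the cosine of latitude
    meters_per_degree_lon = METERS_PER_DEGREE_LAT * math.cos(mid_lat)
    lat_step = resolution_m / METERS_PER_DEGREE_LAT
    lon_step = resolution_m / meters_per_degree_lon
    n_rows = max(1, round((lat_max - lat_min) / lat_step))
    n_cols = max(1, round((lon_max - lon_min) / lon_step))
    return [
        (lat_min + (i + 0.5) * lat_step, lon_min + (j + 0.5) * lon_step)
        for i in range(n_rows)
        for j in range(n_cols)
    ]

# A box of roughly 1 km × 1 km, gridded at 100 m
cells = make_grid(55.75, 55.759, 37.60, 37.616)
```

Each cell centre would then be joined with its spatial covariates before being passed to the model. A production version would clip the grid to the city polygon and work in a projected coordinate system.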
Deep learning for spatial mapping: a U-Net takes multispectral satellite imagery plus station readings and produces a PM2.5 map at 30-100 m resolution. It is trained on station data and satellite imagery acquired at the same time.
Air Quality Forecasting
Main factors:
- Meteorology: wind (speed and direction determine transport), atmospheric stability (mixing height), precipitation (PM washout)
- Sources: industrial emissions, transport, heating
- Photochemistry: formation of O3 and secondary particles (PM2.5), which depends on temperature and solar radiation
LSTM + Weather Attention model:
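from torch import nn

class AirQualityForecastModel(nn.Module):
    def __init__(self, n_pollutants, n_weather_vars, forecast_hours):
        super().__init__()
        self.pollutant_encoder = nn.LSTM(n_pollutants, 64, batch_first=True)
        self.weather_encoder = nn.LSTM(n_weather_vars, 64, batch_first=True)
        # CrossAttention is not defined in the source; nn.MultiheadAttention is one stand-in
        self.cross_attention = nn.MultiheadAttention(64, num_heads=4, batch_first=True)
        # 128 is presumably the pollutant encoding (64) plus the attended weather context (64)
        self.decoder = nn.Linear(128, n_pollutants * forecast_hours)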
Horizon: 24/48/72 hours. Achievable MAPE: < 15% for 48-hour PM2.5 forecast.
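For reference, the MAPE figure quoted above is the mean of the absolute forecast errors expressed as a percentage of the observed values. A minimal sketch with made-up numbers; skipping zero observations is an assumption of this example, not something the source specifies.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent. Zero observations are skipped."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score")
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in pairs) / len(pairs)

# Hypothetical hourly PM2.5 values in µg/m³
observed = [20.0, 40.0, 50.0]
predicted = [22.0, 36.0, 55.0]
error = mape(observed, predicted)
```

Because the error is divided by the observed value, MAPE grows quickly on clean days with low concentrations, so it is worth reporting alongside an absolute metric such as MAE.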
Air Quality Index (AKI/AQI)
AKI calculation by PMR:
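def calculate_aki(concentrations: dict) -> float:
    """
    AKI = Σ (C_i / PDK_i_ss) over all pollutants i, where PDK_i_ss is the
    mean daily maximum allowable concentration of pollutant i.
    AKI < 5 corresponds to standard air quality.
    """
    # PDK_MEAN_DAILY maps pollutant name -> mean daily limit (defined elsewhere),
    # in the same units as the concentrations.
    # Plain sum of ratios: no per-pollutant weighting is applied here.
    aki = 0.0
    for pollutant, conc in concentrations.items():
        pdk = PDK_MEAN_DAILY[pollutant]
        aki += conc / pdk
    return aki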
Color coding:
- Green: AKI < 5 (normal)
- Yellow: 5-7 (slight pollution)
- Orange: 7-14 (moderate)
- Red: > 14 (high, dangerous to health)
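The colour bands above translate directly into a lookup. In the sketch below, the handling of values that fall exactly on a boundary is an assumption (5 counts as yellow, 7 and 14 as orange), and the concentrations and limits in the example are placeholders, not regulatory values.

```python
def aki_category(aki):
    """Map an AKI value to its colour band."""
    if aki < 5:
        return "green"   # normal
    if aki < 7:
        return "yellow"  # slight pollution
    if aki <= 14:
        return "orange"  # moderate
    return "red"         # high, dangerous to health

# Placeholder concentrations and mean daily limits in mg/m³, for illustration only
concentrations = {"pm25": 0.070, "no2": 0.080}
limits = {"pm25": 0.035, "no2": 0.040}

aki = sum(concentrations[p] / limits[p] for p in concentrations)
category = aki_category(aki)
```

Here each pollutant is at twice its limit, giving an index of about 4, which still falls in the green band.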
Mobile App for Residents
Functions:
- Current AQI at geolocation point
- City air quality map
- AQI forecast for 24/48 hours
- Recommendations: whether it is safe to walk or exercise
- Alerts when thresholds are exceeded
Personalized recommendations:
- People with asthma or allergies: stricter notification threshold
- Cyclists: optimal time/route considering AQI
- Parents with children: playground quality index
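A stricter notification threshold for sensitive groups can be modelled as a per-profile lookup. The profile names and threshold values below are hypothetical; real values would come from health guidance, not from this sketch.

```python
# Hypothetical index values at which each profile receives an alert
ALERT_THRESHOLDS = {"default": 7.0, "asthma": 5.0, "child": 5.0}

def should_alert(index_value, profile="default"):
    """Return True when the index reaches the alert threshold for the profile."""
    threshold = ALERT_THRESHOLDS.get(profile, ALERT_THRESHOLDS["default"])
    return index_value >= threshold

alert_asthma = should_alert(6.0, "asthma")
alert_default = should_alert(6.0)
```

With the same reading, a user with the asthma profile is notified while a default user is not.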
Source Attribution
Positive Matrix Factorization (PMF): decomposes the chemical composition of PM2.5 into contributions from sources: industry, transport, residential heating, and natural sources (sea salt, dust).
from scipy.optimize import nnls  # solves min ||Ax - b|| subject to x >= 0
# G = F × C (observations = sources × contributions)
# PMF minimizes the weighted sum of squared residuals
# subject to non-negativity of F and C
EPA PMF 5.0 is the official US EPA receptor modeling tool.
Result: "30% of PM2.5 in this city comes from metallurgical emissions, 40% from transport, 20% from residential heating". This is the basis for regulatory decisions.
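Using the G = F × C notation above, the quantity PMF minimizes and the percentage shares in the result can be written out directly. This sketch only evaluates a given factorization and does not solve for F and C (that is what a tool like EPA PMF does). The matrix layout (species × samples for G) and all numbers are assumptions made for illustration.

```python
def pmf_objective(G, F, C, sigma):
    """Weighted sum of squared residuals Q for the factorization G ≈ F × C.

    G: species × samples observations; F: species × sources profiles;
    C: sources × samples contributions; sigma: per-measurement uncertainties.
    """
    q = 0.0
    for i, row in enumerate(G):
        for j, observed in enumerate(row):
            modelled = sum(F[i][k] * C[k][j] for k in range(len(C)))
            q += ((observed - modelled) / sigma[i][j]) ** 2
    return q

def source_shares(F, C):
    """Percentage of the total modelled mass attributed to each source."""
    n_species = len(F)
    mass = [
        sum(F[i][k] for i in range(n_species)) * sum(C[k])
        for k in range(len(C))
    ]
    total = sum(mass)
    return [100.0 * m / total for m in mass]

# Two species, two sources, two samples (invented numbers)
F = [[1.0, 0.0], [0.0, 1.0]]
C = [[3.0, 3.0], [1.0, 1.0]]
G = [[3.0, 3.5], [1.0, 1.0]]
sigma = [[0.5, 0.5], [0.5, 0.5]]

q = pmf_objective(G, F, C, sigma)
shares = source_shares(F, C)
```

Dividing each residual by its uncertainty is what distinguishes PMF from plain non-negative matrix factorization: noisy measurements pull less on the solution.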
Timeline: a basic IoT network with AQI calculation, a map, and a mobile app takes 8-10 weeks. A system with ML forecasting, source attribution, and a regulatory API takes 4-5 months.







