AI-based carbon footprint calculation system
Scope 3 emissions account for 70–90% of a typical company's carbon footprint. Calculating them accurately without automation is effectively impossible: Category 1 (purchased goods and services) requires data from hundreds of suppliers, while Category 11 (use of sold products) requires an understanding of customer consumption patterns. Manual calculations performed annually carry an error of 35–50%; the ML pipeline calculates continuously with an error of 12–20%.
Architecture of the calculation engine
Hierarchy of calculation methods
The GHG Protocol allows three methods for Scope 3 Category 1:
| Method | Accuracy | Data requirements |
|---|---|---|
| Spend-based | ±40% | Spend + EEIO factors |
| Average-data | ±25% | Purchase weight/volume + emission intensity |
| Supplier-specific | ±10% | PCF data from the supplier |
The machine learning task: automatically select the best available method for each procurement line and compute its emission factor. A LightGBM classifier chooses the method based on the data available: if the supplier has provided a PCF, it uses supplier-specific; if the purchase weight is known, average-data; otherwise it falls back to the spend-based method with EXIOBASE 3.8 EEIO tables.
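A minimal sketch of the fallback logic the classifier learns to approximate. All field names and factor values here are illustrative placeholders, not real EXIOBASE or PCF figures:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcurementLine:
    spend_eur: float
    quantity: float = 0.0
    weight_kg: Optional[float] = None                  # known for some purchase lines
    supplier_pcf_kg_per_unit: Optional[float] = None   # product carbon footprint, if supplied

def select_method(line: ProcurementLine) -> str:
    """Pick the most accurate GHG Protocol method the available data supports."""
    if line.supplier_pcf_kg_per_unit is not None:
        return "supplier-specific"   # ±10%
    if line.weight_kg is not None:
        return "average-data"        # ±25%
    return "spend-based"             # ±40%

def emissions_kg(line: ProcurementLine,
                 eeio_kg_per_eur: float = 0.45,     # illustrative EEIO factor
                 intensity_kg_per_kg: float = 2.1,  # illustrative material intensity
                 ) -> float:
    """Compute emissions with the best available method for this line."""
    method = select_method(line)
    if method == "supplier-specific":
        return line.quantity * line.supplier_pcf_kg_per_unit
    if method == "average-data":
        return line.weight_kg * intensity_kg_per_kg
    return line.spend_eur * eeio_kg_per_eur
```

In production the LightGBM classifier replaces the hard rules, because real procurement data is noisy: a weight field may be present but unreliable, and the model learns when to distrust it.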
Parsing of invoices and documents
80% of activity data arrives as PDF invoices. Document AI pipeline:
- LayoutLMv3 (Microsoft) — a multimodal model for structured extraction from documents
- Extracted fields: supplier_name, line_item_description, quantity, unit, unit_price, total
- NER + classification by HS code (Harmonized System) → emission factor lookup
- Extraction accuracy: 93% on a test dataset of invoices from 8 industries
Deployment: Azure Form Recognizer or self-hosted TorchServe. Processing 10,000 documents per day on 2×A10G GPUs, latency of 1.8 seconds per document.
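The last step of the pipeline, mapping an extracted HS code to an emission factor, can be sketched as a longest-prefix lookup. The codes and factor values below are illustrative placeholders, not real database entries:

```python
# Illustrative HS-prefix → emission factor table (kg CO2e per kg of product).
HS_FACTORS = {
    "72": 1.85,    # iron and steel
    "7208": 2.10,  # flat-rolled steel (more specific prefix wins)
    "39": 2.50,    # plastics
    "48": 0.95,    # paper and paperboard
}

def lookup_factor(hs_code: str, default: float = 1.0) -> float:
    """Longest-prefix match: try the full code, then progressively shorter prefixes."""
    for length in range(len(hs_code), 0, -1):
        factor = HS_FACTORS.get(hs_code[:length])
        if factor is not None:
            return factor
    return default
```

Longest-prefix matching lets a sparse factor table still cover every invoice line: a 6-digit code falls back to its 4- or 2-digit chapter when no specific entry exists.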
Calculation of Scope 1 and Scope 2
Scope 1: Direct Emissions
Sources: fuel combustion (boilers, generators, corporate vehicles), industrial processes (welding, chemical reactions), refrigerant leaks (F-gases). SCADA/EMS integration: fuel consumption from meters → multiplication by IPCC AR5/AR6 emission factors.
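The meter-to-emissions multiplication itself is straightforward; the factors below are illustrative stand-ins, not official IPCC AR6 values:

```python
# kg CO2e per unit of fuel; illustrative, not official IPCC AR6 numbers.
FUEL_FACTORS = {
    "natural_gas_m3": 1.89,
    "diesel_l": 2.68,
    "petrol_l": 2.31,
}

def scope1_emissions_kg(meter_readings: dict) -> float:
    """Sum each fuel's metered consumption multiplied by its emission factor."""
    return sum(FUEL_FACTORS[fuel] * qty for fuel, qty in meter_readings.items())
```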
ML anomaly detection: if boiler gas consumption on a weekend exceeds 150% of the mean weekend consumption for the previous year, an alert is triggered. The detector is an LSTM autoencoder on hourly data, trained on two years of normal readings.
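The threshold rule alone (before the LSTM autoencoder) can be sketched with the standard library:

```python
from statistics import mean

def weekend_gas_anomaly(current_m3: float,
                        last_year_weekend_m3: list,
                        threshold: float = 1.5) -> bool:
    """Alert if weekend gas consumption exceeds 150% of last year's weekend mean."""
    baseline = mean(last_year_weekend_m3)
    return current_m3 > threshold * baseline
```

The autoencoder then covers what a fixed threshold misses, such as gradual drift, by flagging hours whose reconstruction error is unusually high.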
Scope 2: Purchased Energy
Location-based method: kWh consumption × regional emission factor (IEA, Ember, AIB for Europe). Market-based method: contractual instruments (Guarantees of Origin, RECs, Power Purchase Agreements) are applied at their contractual, typically zero, emission factors; the remaining consumption uses the residual mix factor.
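A sketch of the dual reporting, assuming instruments cover part of consumption at a zero factor and the rest falls to the residual mix (factor values are illustrative):

```python
def scope2_location_kg(kwh: float, grid_factor_kg_per_kwh: float) -> float:
    """Location-based: all consumption at the regional grid-average factor."""
    return kwh * grid_factor_kg_per_kwh

def scope2_market_kg(kwh: float, covered_kwh: float,
                     residual_factor_kg_per_kwh: float) -> float:
    """Market-based: GO/REC/PPA-covered kWh count as zero; the rest uses the residual mix."""
    uncovered = max(kwh - covered_kwh, 0.0)
    return uncovered * residual_factor_kg_per_kwh
```

Both numbers are reported under the GHG Protocol Scope 2 dual-reporting requirement; they diverge exactly by the effect of the contractual instruments.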
Automation: integration with energy supply companies' personal accounts (API or web scraping) for monthly consumption data updates without manual input.
Forecasting and scenario analysis
Net-zero pathway modeling
The company's SBTi (Science Based Targets initiative) goal: reduce Scope 1+2 emissions by 46% by 2030. ML component: time-series forecasting of baseline emissions (Temporal Fusion Transformer) plus scenario analysis:
- Business as usual
- Renewables transition (solar/wind PPA)
- Fleet electrification (EV conversion schedule)
- Supplier engagement (top 20 emitters → PCF data requirement)
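The target translates into a constant annual reduction rate that each scenario's forecast can be compared against; a small sketch:

```python
def required_annual_rate(target_cut: float, base_year: int, target_year: int) -> float:
    """Constant yearly reduction rate r such that (1 - r) ** years == 1 - target_cut."""
    years = target_year - base_year
    return 1.0 - (1.0 - target_cut) ** (1.0 / years)

def pathway(base_emissions_t: float, rate: float, years: int) -> list:
    """Emissions trajectory under a constant annual reduction rate."""
    return [base_emissions_t * (1.0 - rate) ** y for y in range(years + 1)]
```

For a 46% cut over 2020–2030 this works out to roughly 6% per year, the reference line against which the TFT forecast for each scenario is plotted.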
For each scenario: NPV of decarbonization investments vs. cost of carbon (EU ETS price + regulatory risk).
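The comparison reduces to two discounted sums; a sketch with illustrative parameters:

```python
def npv(cashflows: list, rate: float) -> float:
    """Discounted cash flows; cashflows[0] is year 0 (typically negative capex)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))

def avoided_carbon_value(tonnes_per_year: float, carbon_price: float,
                         years: int, rate: float) -> float:
    """NPV of avoided carbon costs (EU ETS price plus a regulatory risk premium)."""
    return npv([0.0] + [tonnes_per_year * carbon_price] * years, rate)
```

A decarbonization investment clears the hurdle when its NPV exceeds the avoided carbon value of doing nothing under the same discount rate.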
Internal carbon pricing
The shadow carbon price ($50–150/tCO2e) is applied to investment decisions. The ML module automatically calculates the carbon cost for capex projects from ERP data (equipment → lifecycle emissions according to the Ecoinvent database).
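Applying the shadow price to a capex decision is a one-line adjustment; the price and emissions figures here are illustrative:

```python
def carbon_adjusted_capex(capex_usd: float, lifecycle_tco2e: float,
                          shadow_price_usd: float = 100.0) -> float:
    """Add the internally priced cost of lifecycle emissions to the project cost."""
    return capex_usd + lifecycle_tco2e * shadow_price_usd
```

Two equipment options with similar purchase prices can then diverge sharply once their Ecoinvent lifecycle emissions are priced in.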
Integration with carbon markets
Carbon credit verification: offset quality check using the Gold Standard, VCS (Verra) database. The ML classifier assesses double-counting and permanence risks of forestry projects (satellite imagery + forest degradation using NDVI time series).
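The degradation signal in the permanence check can be reduced to a trend test on the NDVI series; a stdlib-only sketch with an illustrative slope threshold:

```python
def ndvi_trend(ndvi: list) -> float:
    """Least-squares slope of NDVI per time step (closed form, no numpy)."""
    n = len(ndvi)
    x_mean = (n - 1) / 2.0
    y_mean = sum(ndvi) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ndvi))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def degradation_risk(ndvi: list, slope_threshold: float = -0.01) -> bool:
    """Flag a forestry project whose canopy greenness is trending down."""
    return ndvi_trend(ndvi) < slope_threshold
```

In the real classifier this slope is one feature among several (cloud-masked composites, seasonality-adjusted baselines) rather than a standalone decision rule.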
Automatic registry accounting: API integration with Xpansiv CBL, Gold Standard Registry for tracking retire/cancel transactions with credits.
Stack
Storage: Snowflake with dbt transformations for the ESG model. Computation: Python (pandas, pyCO2SYS for marine emissions). ML: scikit-learn, LightGBM, PyTorch for time-series. Document AI: LayoutLMv3, Hugging Face Transformers. Orchestration: Apache Airflow.
Development time: 3–6 months for the basic calculation engine with automatic data import. Full Scope 1-2-3 with document AI and scenario analysis: 6–10 months.