Development of an AI system for predictive equipment maintenance
An enterprise-class predictive maintenance system covers the entire lifecycle, from connecting sensors through an IoT gateway to CMMS integration and ROI calculation. It is not a single ML model but a production platform that serves thousands of assets under 24/7 reliability requirements.
Platform architecture
System levels:
Level 0: Edge (on the equipment)
    Modules: vibration sensor, temperature sensor, current meter
    Protocol: Modbus RTU / OPC-UA
    Edge gateway: Raspberry Pi / Industrial PC
Level 1: Fog (shop-floor level)
    OPC-UA Server → MQTT broker → Edge computing node
    Local storage and preprocessing
Level 2: Cloud (enterprise level)
    Kafka → TimescaleDB / InfluxDB
    ML Training Pipeline (Airflow + MLflow)
    Inference Service (FastAPI)
Level 3: Business
    CMMS / ERP integration
    KPI Dashboard (Grafana / Tableau)
    Mobile app for technicians
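The preprocessing step at the fog level typically reduces raw waveforms to a handful of condition indicators before anything is sent upstream. The sketch below illustrates that idea with the standard library only. The function name `window_features`, the payload fields, the asset ID, and the synthetic signal are all assumptions made for this example, not part of the platform described here.

```python
import json
import math


def window_features(samples):
    """Condense one window of raw vibration samples into summary indicators."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(x * x for x in centered) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    # Pearson kurtosis: about 3 for Gaussian noise, higher for impulsive signals
    kurtosis = (sum(x ** 4 for x in centered) / n) / variance ** 2 if variance > 0 else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": kurtosis,
    }


# Synthetic example: a pure 50 Hz sine sampled at 1 kHz for one second
signal = [math.sin(2 * math.pi * 50 * i / 1000) for i in range(1000)]
features = window_features(signal)

# Hypothetical payload an edge node might publish to the MQTT broker
payload = json.dumps({"asset_id": "motor-001", "features": features})
print(payload)
```

Sending a few floats per window instead of the raw waveform keeps broker and Kafka traffic small; the trade-off is that any feature not computed at the edge cannot be recovered later unless the raw data is also stored locally.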
Asset Registry:
from dataclasses import dataclass
from datetime import datetime

# AssetType, SensorConfig, WorkOrder and FailureMode are assumed to be defined elsewhere
@dataclass
class Asset:
    asset_id: str
    name: str
    type: AssetType  # motor, pump, compressor, conveyor, gearbox
    manufacturer: str
    model: str
    install_date: datetime
    rated_power_kw: float
    location: dict  # plant, line, cell
    criticality: int  # 1-5 (5 = most critical)
    sensors: list[SensorConfig]
    maintenance_history: list[WorkOrder]
    failure_modes: list[FailureMode]  # from the FMEA document
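The `Asset` dataclass refers to several types that are not shown. A minimal sketch of what they might look like follows. `AssetType` takes its values from the comment on `Asset.type`; the `SensorConfig` fields, the trimmed `Asset` class, and the sample registry entries are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetType(Enum):
    # Values taken from the comment on Asset.type
    MOTOR = "motor"
    PUMP = "pump"
    COMPRESSOR = "compressor"
    CONVEYOR = "conveyor"
    GEARBOX = "gearbox"


@dataclass
class SensorConfig:
    # Hypothetical fields; a real config depends on the gateway and protocol
    sensor_id: str
    kind: str  # e.g. "vibration", "temperature", "current"
    sampling_rate_hz: float


@dataclass
class Asset:
    # Trimmed version of the registry entry, enough for the lookup below
    asset_id: str
    type: AssetType
    location: dict
    criticality: int
    sensors: list = field(default_factory=list)


registry = [
    Asset("pump-7", AssetType.PUMP, {"plant": "P1", "line": "L2", "cell": "C4"}, 3),
    Asset(
        "compressor-1",
        AssetType.COMPRESSOR,
        {"plant": "P1", "line": "L1", "cell": "C1"},
        5,
        sensors=[SensorConfig("vib-01", "vibration", 25600.0)],
    ),
]

# Typical registry query: the most critical assets in one plant
critical = [a.asset_id for a in registry if a.location["plant"] == "P1" and a.criticality >= 4]
print(critical)
```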
FMEA-driven Failure Modes
Failure Mode and Effects Analysis:
# For each equipment type: a list of expected failures and their signatures
failure_modes_motor = [
    FailureMode(
        name='bearing_outer_race_defect',
        detection_method='vibration_envelope_bpfo',
        leading_indicators=['kurtosis > 3', 'bpfo_amplitude_rise'],
        typical_development_days=30,
        severity=4
    ),
    FailureMode(
        name='stator_winding_degradation',
        detection_method='motor_current_signature_mcsa',
        leading_indicators=['current_imbalance > 5%', 'sideband_frequencies'],
        typical_development_days=60,
        severity=5
    ),
    FailureMode(
        name='misalignment',
        detection_method='vibration_1x_2x',
        leading_indicators=['high_1x_radial', '2x_axial_component'],
        typical_development_days=14,
        severity=3
    )
]
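The `vibration_envelope_bpfo` method looks for energy at the ball pass frequency of the outer race (BPFO) in the envelope spectrum. For a stationary outer ring the usual kinematic formula is BPFO = (n / 2) · f_shaft · (1 − (d / D) · cos φ), where n is the number of rolling elements, d the ball diameter, D the pitch diameter, and φ the contact angle. The sketch below evaluates it; the bearing geometry and shaft speed are illustrative values assumed for the example, and `bpfo_hz` is a hypothetical helper name.

```python
import math


def bpfo_hz(shaft_hz, n_balls, ball_diameter, pitch_diameter, contact_angle_deg=0.0):
    """Ball pass frequency of the outer race, assuming a stationary outer ring."""
    ratio = ball_diameter / pitch_diameter
    return (n_balls / 2.0) * shaft_hz * (1.0 - ratio * math.cos(math.radians(contact_angle_deg)))


# Illustrative geometry (assumed): 9 balls, 7.94 mm ball, 39.04 mm pitch diameter
shaft_hz = 1797 / 60.0  # shaft speed in Hz for 1797 rpm
f_bpfo = bpfo_hz(shaft_hz, n_balls=9, ball_diameter=7.94, pitch_diameter=39.04)
print(f"BPFO = {f_bpfo:.1f} Hz ({f_bpfo / shaft_hz:.3f} x shaft speed)")
```

Because BPFO scales with shaft speed, an indicator such as `bpfo_amplitude_rise` needs the current speed for each measurement window; on variable-speed drives a fixed frequency band would miss the peak.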
Multi-Asset Health Index
Hierarchical Health:
def calculate_plant_health(plant_id, asset_registry, health_scores):
    """
    Hierarchy: sensor → asset → line → shop → plant
    Health Index = criticality-weighted average of asset health
    """
    plant_assets = [a for a in asset_registry if a.location['plant'] == plant_id]

    weighted_health = 0
    total_weight = 0
    for asset in plant_assets:
        # Assets without a score yet are treated as fully healthy
        asset_health = health_scores.get(asset.asset_id, 1.0)
        weight = asset.criticality
        weighted_health += asset_health * weight
        total_weight += weight

    return weighted_health / total_weight if total_weight > 0 else 1.0

def health_to_color(health_index):
    if health_index >= 0.8:
        return 'green'
    elif health_index >= 0.6:
        return 'yellow'
    elif health_index >= 0.4:
        return 'orange'
    else:
        return 'red'
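A small worked example shows how criticality weighting shifts the plant-level index. The sketch re-implements the same weighted mean in compact form so it can run on its own; the asset IDs, criticality values, and health scores are made up for illustration.

```python
from types import SimpleNamespace


def plant_health(assets, health_scores):
    """Criticality-weighted mean; assets with no score default to healthy (1.0)."""
    total_weight = sum(a.criticality for a in assets)
    if total_weight == 0:
        return 1.0
    weighted = sum(health_scores.get(a.asset_id, 1.0) * a.criticality for a in assets)
    return weighted / total_weight


assets = [
    SimpleNamespace(asset_id="compressor-1", criticality=5),
    SimpleNamespace(asset_id="pump-7", criticality=3),
    SimpleNamespace(asset_id="conveyor-2", criticality=1),
]
scores = {"compressor-1": 0.40, "pump-7": 0.90}  # conveyor-2 has no score yet

index = plant_health(assets, scores)
# (0.40 * 5 + 0.90 * 3 + 1.0 * 1) / 9 = 5.7 / 9
print(round(index, 3))
```

The unweighted mean of the same three values would be about 0.77; weighting by criticality pulls the index down to about 0.63, which falls in the 'yellow' band. Note that defaulting missing scores to 1.0 can hide unmonitored assets, so a production system might track coverage separately.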
Ensemble Health Model
Fusion of multiple indicators:
class AssetHealthEnsemble:
    def __init__(self, failure_modes, weights=None):
        self.failure_modes = failure_modes
        # load_model is assumed to return a fitted classifier for the failure mode
        self.models = {fm.name: load_model(fm) for fm in failure_modes}
        # Weights default to severity
        self.weights = weights or {
            fm.name: fm.severity for fm in failure_modes
        }

    def compute_health(self, sensor_data):
        """
        Each failure mode has its own ML model.
        Final Health Index = severity-weighted average of per-mode health.
        """
        fm_scores = {}
        for fm_name, model in self.models.items():
            features = extract_features_for_fm(sensor_data, fm_name)
            failure_prob = model.predict_proba([features])[0][1]
            fm_scores[fm_name] = 1.0 - failure_prob  # health = 1 - failure_prob

        # Weighted average, capped by the weakest link
        total_weight = sum(self.weights[name] for name in fm_scores)
        weighted_health = sum(
            score * self.weights[name]
            for name, score in fm_scores.items()
        ) / total_weight

        # Aggressive pessimization: a critical defect pulls down the overall health
        min_score = min(fm_scores.values())
        if min_score < 0.3:
            weighted_health = min(weighted_health, min_score * 1.5)

        return weighted_health, fm_scores
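To see what the weakest-link cap does in practice, the fusion step can be isolated from the models. The sketch below reproduces only the arithmetic; `fuse_health` and its parameter names are hypothetical, and the per-mode scores are invented for the example.

```python
def fuse_health(fm_scores, weights, floor_threshold=0.3, floor_factor=1.5):
    """Severity-weighted mean, capped when any single failure mode is critical."""
    total = sum(weights[name] for name in fm_scores)
    weighted = sum(score * weights[name] for name, score in fm_scores.items()) / total
    worst = min(fm_scores.values())
    if worst < floor_threshold:
        weighted = min(weighted, worst * floor_factor)
    return weighted


weights = {
    "bearing_outer_race_defect": 4,
    "stator_winding_degradation": 5,
    "misalignment": 3,
}

# One failing mode among otherwise healthy ones
critical_case = fuse_health(
    {"bearing_outer_race_defect": 0.20, "stator_winding_degradation": 0.95, "misalignment": 0.90},
    weights,
)
# Same asset with the bearing score just above the threshold
borderline_case = fuse_health(
    {"bearing_outer_race_defect": 0.35, "stator_winding_degradation": 0.95, "misalignment": 0.90},
    weights,
)
print(round(critical_case, 3), round(borderline_case, 3))
```

In the first case the plain weighted mean would be 8.25 / 12 ≈ 0.69, but the cap lowers it to 0.2 × 1.5 = 0.3. In the second case no cap applies and the result is 8.85 / 12 ≈ 0.74. The rule is therefore discontinuous at the threshold, which may cause the index to jump between readings; a smoother penalty is one possible refinement.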
Maintenance Optimization
Optimal Maintenance Timing:
from scipy.optimize import minimize_scalar

def optimal_maintenance_time(rul_distribution, maintenance_cost, failure_cost, holding_cost_per_day):
    """
    Find the maintenance time that minimizes expected cost.
    Too early = unnecessary maintenance spending
    Too late = risk of failure

    rul_distribution: a frozen scipy.stats distribution of remaining useful life, in days
    """
    def expected_cost(t_maintenance):
        # Probability of failure before the maintenance date
        p_failure_before_maintenance = rul_distribution.cdf(t_maintenance)

        # Expected costs
        cost_if_maintain = maintenance_cost + t_maintenance * holding_cost_per_day
        cost_if_fail = failure_cost * p_failure_before_maintenance

        return cost_if_maintain * (1 - p_failure_before_maintenance) + cost_if_fail

    # Search window: 1 to 180 days from now
    result = minimize_scalar(expected_cost, bounds=(1, 180), method='bounded')
    return result.x  # optimal number of days until maintenance
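In the expected-cost expression above, both the maintenance term and the failure probability grow with time, so whenever a failure costs more than planned maintenance the minimum tends to sit at the lower bound of the search window. A common alternative is the classic age-replacement policy, which minimizes expected cost per day of operation rather than cost per cycle. The sketch below is a standard-library illustration of that swapped-in technique, not the method used in this platform; the Weibull parameters and cost figures are assumptions.

```python
import math


def weibull_cdf(t, shape, scale):
    """Probability that failure occurs before time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def cost_rate(t, shape, scale, maintenance_cost, failure_cost, steps=2000):
    """Expected cost per day if maintenance is planned at age t."""
    p_fail = weibull_cdf(t, shape, scale)
    # Expected cycle length E[min(T, t)] = integral of the survival function over [0, t],
    # approximated with the trapezoidal rule
    dt = t / steps
    survival = [1.0 - weibull_cdf(i * dt, shape, scale) for i in range(steps + 1)]
    expected_days = dt * (sum(survival) - 0.5 * (survival[0] + survival[-1]))
    expected_cost = maintenance_cost * (1.0 - p_fail) + failure_cost * p_fail
    return expected_cost / expected_days


# Assumed example: wear-out failures (shape > 1), characteristic life of 100 days
params = dict(shape=3.0, scale=100.0, maintenance_cost=2_000, failure_cost=50_000)

# Simple grid search over whole days within the same 1-180 day window
best_day = min(range(1, 181), key=lambda t: cost_rate(t, **params))
print(best_day, round(cost_rate(best_day, **params), 1))
```

Dividing by the expected cycle length is what penalizes maintaining too early: very short cycles spread the fixed maintenance cost over few operating days, so the optimum moves into the interior of the window.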
Work Order Generation:
from datetime import datetime, timedelta

def auto_create_work_order(asset, health_score, rul_days, failure_mode, cmms_client):
    """
    Automatically create a Work Order when thresholds are reached
    """
    priority = determine_priority(health_score, rul_days, asset.criticality)

    # labor_time_db and spare_parts_db are lookup tables keyed by failure mode
    wo = WorkOrder(
        asset_id=asset.asset_id,
        description=f"PdM Alert: {failure_mode} detected. Health={health_score:.1%}, RUL={rul_days:.0f}d",
        priority=priority,
        type='predictive_maintenance',
        estimated_labor_hours=labor_time_db[failure_mode],
        required_parts=spare_parts_db[failure_mode],
        # Schedule at 70% of the predicted RUL, but no sooner than tomorrow
        scheduled_date=datetime.now() + timedelta(days=max(1, rul_days * 0.7))
    )

    return cmms_client.create_work_order(wo)
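The `determine_priority` helper is not shown. One possible shape for it is sketched below. The priority labels and every threshold are assumptions chosen for illustration (the health cut-offs simply reuse the 0.4 and 0.6 bands from `health_to_color`); a real rule set would come from the maintenance team.

```python
PRIORITIES = ["low", "medium", "high", "emergency"]


def determine_priority(health_score, rul_days, criticality):
    """Map health (0-1), remaining useful life in days, and criticality (1-5) to a label.

    All thresholds are illustrative assumptions, not values from a standard.
    """
    if health_score < 0.4 or rul_days <= 7:
        level = 2
    elif health_score < 0.6 or rul_days <= 30:
        level = 1
    else:
        level = 0
    # Escalate one step for the most critical assets, but only if there is a finding
    if criticality >= 4 and level > 0:
        level += 1
    return PRIORITIES[level]


print(determine_priority(0.35, 5, 5))    # degraded, short RUL, critical asset
print(determine_priority(0.35, 5, 2))    # same condition, less critical asset
print(determine_priority(0.90, 120, 5))  # healthy critical asset
```

Keeping the rule as a pure function of three inputs makes it easy to unit-test and to review with planners before it starts generating work orders automatically.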
KPIs and ROI
Key Metrics:
def calculate_pdm_kpis(period_data):
    # period_data is assumed to be a pandas DataFrame with one row per reporting interval
    return {
        'overall_equipment_effectiveness': calculate_oee(period_data),  # OEE
        'unplanned_downtime_hours': period_data['unplanned_stops'].sum(),
        'mtbf': calculate_mtbf(period_data),  # Mean Time Between Failures
        'mttr': calculate_mttr(period_data),  # Mean Time To Repair
        'maintenance_cost': period_data['maintenance_cost'].sum(),  # total, used by calculate_roi
        'maintenance_cost_per_unit': period_data['maintenance_cost'].sum() / period_data['production'].sum(),
        # Share of raised alerts that turned out to be false, aggregated over the period
        'false_positive_rate': period_data['false_alerts'].sum() / period_data['total_alerts'].sum(),
        'detection_lead_time_days': period_data['advance_warning_days'].mean()
    }
def calculate_roi(baseline_kpis, pdm_kpis, implementation_cost, cost_per_downtime_hour):
    # Both KPI dicts must cover one year and include the total 'maintenance_cost'
    downtime_reduction = baseline_kpis['unplanned_downtime_hours'] - pdm_kpis['unplanned_downtime_hours']
    downtime_value = downtime_reduction * cost_per_downtime_hour

    maintenance_savings = baseline_kpis['maintenance_cost'] - pdm_kpis['maintenance_cost']

    total_benefit = downtime_value + maintenance_savings
    roi_percent = (total_benefit - implementation_cost) / implementation_cost * 100
    # No payback if the system produces no net annual benefit
    payback_months = implementation_cost / (total_benefit / 12) if total_benefit > 0 else float('inf')

    return {'roi_percent': roi_percent, 'payback_months': payback_months}
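A worked example makes the ROI arithmetic concrete. The sketch below applies the same formulas to plain numbers; `roi_summary` is a hypothetical helper and every figure is an assumption chosen for illustration. As in the formula above, the benefit is treated as an annual amount, so the result is a first-year ROI.

```python
def roi_summary(downtime_hours_saved, cost_per_downtime_hour, maintenance_savings, implementation_cost):
    """Return (annual benefit, first-year ROI in percent, payback period in months)."""
    total_benefit = downtime_hours_saved * cost_per_downtime_hour + maintenance_savings
    roi_percent = (total_benefit - implementation_cost) / implementation_cost * 100
    payback_months = implementation_cost / (total_benefit / 12) if total_benefit > 0 else float("inf")
    return total_benefit, roi_percent, payback_months


# Assumed figures: 120 h less unplanned downtime per year at 5,000 per hour,
# 150,000 saved on maintenance, 500,000 spent on implementation
benefit, roi, payback = roi_summary(120, 5_000, 150_000, 500_000)
print(f"benefit={benefit:,.0f}  ROI={roi:.0f}%  payback={payback:.1f} months")
```

With these numbers the annual benefit is 600,000 + 150,000 = 750,000, giving a first-year ROI of 50% and a payback period of 8 months. The result is most sensitive to `cost_per_downtime_hour`, which is often the hardest input to agree on.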
Timeframe: IoT gateway connection, asset registry, and a basic health score dashboard take 5-6 weeks. The multi-asset ensemble model, FMEA-driven failure modes, CMMS automation, maintenance time optimization, and ROI tracking take 5-6 months. A full enterprise platform with thousands of assets and a mobile app for technicians takes 8-10 months.







