Anomaly Detection: Autoencoders, Isolation Forest, PyOD
Server monitoring shows CPU at 85%, memory at 91% — normal at peak hours, or the start of an attack? A classifier won't help: anomalies are by definition rare, diverse, and unlabeled. Supervised learning requires anomaly examples in training — it doesn't work for unknown unknowns.
Why Anomaly Detection Is Hard
The main problem — no labels plus extreme imbalance. Fraudulent transactions are 0.01–0.1% of the total; manufacturing defects, 0.5–3%. Even a naive "all normal" classifier gives 99.9% accuracy with terrible anomaly metrics.
Second — defining "normal." Is a 3am login normal? Depends on the user's history and timezone. Is bearing vibration of 2.3 mm/s normal? Depends on the machine's operating mode and age. Context is critical.
Third — evaluation without ground truth. No standard test set; AUC-ROC works only when at least some examples are labeled. Fully unlabeled — only domain-expert validation and indirect metrics.
Methods and Tools
Isolation Forest — the standard baseline for tabular data. Idea: anomalies are isolated faster by random splits. Works well at contamination 0.01–0.1, robust to feature scale, requires no normalization. scikit-learn: IsolationForest.
Common mistake: leaving contamination='auto' (the scikit-learn default) blindly. With 'auto' the decision offset is fixed at -0.5 (as in the original paper), which rarely matches the real anomaly rate. Better: estimate the expected anomaly percentage from domain knowledge and set it explicitly.
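A minimal sketch on synthetic data — here the ~1% anomaly rate is assumed known from the domain:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 990 "normal" points near the origin, 10 obvious outliers far away
X = np.vstack([rng.normal(0, 1, (990, 4)), rng.normal(8, 1, (10, 4))])

# contamination set explicitly from domain knowledge (~1% here),
# instead of the 'auto' default with its fixed -0.5 offset
iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = iso.fit_predict(X)       # -1 = anomaly, +1 = normal
scores = -iso.score_samples(X)    # higher = more anomalous
```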
PyOD (Python Outlier Detection) — a library with 40+ algorithms under a unified API, including OCSVM, LOF, COPOD, ECOD, DeepSVDD, and AutoEncoder. Lets you compare methods quickly on the same data.
Autoencoders — the main method for unstructured data (time series, images, logs). Idea: train the network to reconstruct normal data; anomalies produce high reconstruction error. Anomaly threshold — the 95th or 99th percentile of error on a clean validation set.
Practical problem: autoencoders generalize to patterns seen only rarely in training. If the training set contains anomalies, the model may learn to reconstruct them well too, erasing the error signal. Solution: careful cleaning of the training data, or a VAE for better-controlled generalization.
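A minimal sketch of the threshold logic, using scikit-learn's MLPRegressor with a linear bottleneck as a PCA-like stand-in for a real autoencoder, on synthetic low-rank "normal" data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
W = rng.normal(0, 1, (3, 8))   # "normal" data lies near a 3-D subspace of 8-D

def make(n):
    return rng.normal(0, 1, (n, 3)) @ W + 0.05 * rng.normal(0, 1, (n, 8))

X_train, X_val = make(2000), make(500)                    # both assumed clean
X_test = np.vstack([make(95), rng.normal(0, 2, (5, 8))])  # last 5 = anomalies

# linear autoencoder: 4-unit bottleneck, identity activation
ae = MLPRegressor(hidden_layer_sizes=(4,), activation="identity",
                  max_iter=1000, random_state=0)
ae.fit(X_train, X_train)

def recon_error(X):
    return ((ae.predict(X) - X) ** 2).mean(axis=1)

# threshold = 99th percentile of error on the clean validation set
threshold = np.percentile(recon_error(X_val), 99)
flags = recon_error(X_test) > threshold
```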
LSTM-AE for time series — an LSTM autoencoder captures temporal dependencies better than a standard AE. Especially effective for multivariate series (10+ sensors). PyTorch, trained with MSELoss on sliding windows.
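A compact PyTorch sketch of the pattern (toy sizes — the window, stride, and hidden size here are illustrative, not production values):

```python
import torch
import torch.nn as nn

class LSTMAE(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, window, features)
        _, (h, _) = self.encoder(x)
        # repeat the final hidden state across the window and decode
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        dec, _ = self.decoder(z)
        return self.out(dec)

torch.manual_seed(0)
series = torch.randn(500, 6)                    # toy 6-sensor series
# sliding windows of length 20, stride 5 -> (97, 20, 6)
windows = series.unfold(0, 20, 5).permute(0, 2, 1).contiguous()

model = LSTMAE(n_features=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(5):                              # toy training loop
    opt.zero_grad()
    loss = loss_fn(model(windows), windows)
    loss.backward()
    opt.step()

with torch.no_grad():                           # per-window error = anomaly score
    errors = ((model(windows) - windows) ** 2).mean(dim=(1, 2))
```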
Case Study: Industrial Time Series Anomaly Detection
Task: vibration sensors on 12 pumps at a chemical plant, 6 sensors per pump, sampled at 100 Hz. Goal: predict failure 4–24 hours ahead.
Architecture:
Raw data → feature extraction (RMS, kurtosis, crest factor, FFT amplitudes at resonance frequencies) → rolling normalization (24h window) → LSTM-AE → reconstruction error → threshold logic + alerting.
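The feature-extraction step for one sensor window might be sketched like this (the 10–20 Hz resonance band is a made-up example; real bands come from the machine's specs):

```python
import numpy as np

def vibration_features(window, fs=100, band=(10, 20)):
    """One sensor window -> condensed features; `band` is a hypothetical
    resonance range in Hz."""
    rms = np.sqrt(np.mean(window ** 2))
    c = window - window.mean()
    kurt = np.mean(c ** 4) / (np.mean(c ** 2) ** 2 + 1e-12)  # non-excess kurtosis
    crest = np.max(np.abs(window)) / (rms + 1e-12)
    amp = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(window.size, d=1 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return np.array([rms, kurt, crest, amp[in_band].max()])

# 60 s window at 100 Hz = 6000 points; 15 Hz tone + noise as a toy signal
t = np.arange(6000) / 100
window = np.sin(2 * np.pi * 15 * t) + 0.1 * np.random.default_rng(0).normal(size=6000)
feats = vibration_features(window)
```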
LSTM window of 60 seconds (6000 points at 100 Hz). Too small — misses slow patterns; too large — loses sensitivity to quick changes.
Anomaly threshold: adaptive, not fixed. threshold = mean(errors_last_7d) + 3 * std(errors_last_7d). As the normal state drifts (planned wear), the threshold adapts, avoiding false positives.
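A sketch of the adaptive threshold on hourly reconstruction errors (the slow drift stands in for planned wear):

```python
import numpy as np

def adaptive_threshold(errors, window=168):
    """Rolling mean + 3*std over the trailing window (hourly errors, 7 days)."""
    th = np.full(errors.shape, np.inf)          # no alerts until enough history
    for t in range(window, len(errors)):
        hist = errors[t - window:t]
        th[t] = hist.mean() + 3 * hist.std()
    return th

rng = np.random.default_rng(3)
# slow upward drift in "normal" error (planned wear) + one genuine spike
errors = 1.0 + 0.002 * np.arange(400) + 0.05 * rng.normal(size=400)
errors[350] += 1.0
alerts = np.where(errors > adaptive_threshold(errors))[0]
```

Because the threshold tracks the trailing week, the steady drift never fires an alert — only the spike does.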
Result of the 6-month pilot: detected 4 of 5 real pre-fault states (recall 0.8) with 2 false alarms over 6 months (precision 0.67). Before: 3 unplanned stops at $40k each.
Fraud Detection: Financial Data Specifics
Transactions have properties that complicate detection:
- Concept drift: fraud patterns change faster than normal ones. A model from half a year ago is obsolete.
- Adversarial adaptation: sophisticated fraudsters adapt to detection — they make transactions look normal.
- Temporal dependency: a series of normal transactions followed by one unusual transfer is a sequence anomaly, not a single-point one.
Practical stack: LightGBM with SMOTE oversampling for the supervised part (known fraud cases) + Isolation Forest for the unsupervised part (new patterns). Combine the signals in an ensemble; make the final decision via thresholds tuned to an acceptable FPR (0.1–1% of transactions routed to manual review).
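A sketch of the combination on synthetic data — GradientBoostingClassifier stands in for LightGBM, naive duplication of positives stands in for SMOTE, and the 0.7/0.3 blend weights are arbitrary:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (3000, 6))
y = np.zeros(3000, dtype=int)
fraud_idx = rng.choice(3000, 30, replace=False)   # 1% labeled fraud
y[fraud_idx] = 1
X[fraud_idx] += 3                                 # fraud shifted in feature space

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# supervised signal, with crude oversampling of the rare positives
pos = np.where(y_tr == 1)[0]
idx = np.concatenate([np.arange(len(y_tr)), np.repeat(pos, 20)])
gb = GradientBoostingClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
p_sup = gb.predict_proba(X_te)[:, 1]

# unsupervised signal for patterns the labels miss
iso = IsolationForest(random_state=0).fit(X_tr)
p_unsup = -iso.score_samples(X_te)
p_unsup = (p_unsup - p_unsup.min()) / (np.ptp(p_unsup) + 1e-12)

# blend, then pick the threshold from the review budget (~1% of traffic)
score = 0.7 * p_sup + 0.3 * p_unsup
review = score > np.quantile(score, 0.99)
```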
Evaluation Without Ground Truth
When there is no ground truth, use:
- Synthetic anomaly injection: add artificial anomalies (spikes, level shifts, point outliers) and check whether the model detects them
- Expert validation: sample top-K model anomalies → domain review → precision estimate
- Business metrics: fewer missed incidents / false alarms after deployment
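Injection can be sketched with a simple rolling z-score detector standing in for the model under evaluation:

```python
import numpy as np

rng = np.random.default_rng(5)
series = rng.normal(0, 1, 1000)

# inject known anomaly types at known positions
series[100] += 8.0       # positive spike
series[500] -= 8.0       # negative spike
series[800:820] += 6.0   # level shift

def rolling_zscore(x, w=50):
    z = np.zeros_like(x)
    for t in range(w, len(x)):
        hist = x[t - w:t]
        z[t] = (x[t] - hist.mean()) / (hist.std() + 1e-12)
    return z

# recall on the injected positions estimates detector quality without labels
detected = np.where(np.abs(rolling_zscore(series)) > 4)[0]
```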
Workflow
Start by understanding "normal" for the specific context — from domain experts, not just data. Then EDA, an Isolation Forest baseline, quick validation on known incidents, and only if needed — more complex methods.
Special attention to monitoring the model itself: distribution shift in input features, drift in anomaly scores, reaction to system changes.
Timelines: a baseline system with one method — 2–4 weeks. Production with adaptive thresholds, alerting, and monitoring — 2–5 months.