Order Book Data ML Pipeline Development
Order book data is the richest source of market structure information. The full order stack reveals expected supply and demand that OHLCV data cannot show. However, the volume and structure of this data require a specialized pipeline.
Order book levels:
- Level 1 (Top of Book): best bid and best ask with their volumes. Smallest data volume, highest relevance.
- Level 2 (Full Depth): all price levels of the stack with their volumes. Binance provides up to 5000 levels of depth; updates arrive via a WebSocket diff stream.
- Level 3 (Full Order Feed): every individual order with its ID. Maximum detail, but not available on all exchanges.
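To make the Level 2 diff stream concrete, here is a minimal sketch of an in-memory book that applies incremental updates. The class name `LocalBook` and its methods are hypothetical, and the sketch assumes the common convention that a quantity of zero removes a price level. Prices are plain floats for brevity; a production collector would more likely use decimals or integer ticks.

```python
from typing import Dict, Iterable, Tuple

Update = Tuple[float, float]  # (price, quantity); assumed update shape


class LocalBook:
    """Hypothetical in-memory Level 2 book keyed by price."""

    def __init__(self) -> None:
        self.bids: Dict[float, float] = {}
        self.asks: Dict[float, float] = {}

    def apply_diff(self, bids: Iterable[Update], asks: Iterable[Update]) -> None:
        """Apply one diff message: set new quantities, drop levels with quantity 0."""
        for side, updates in ((self.bids, bids), (self.asks, asks)):
            for price, qty in updates:
                if qty == 0:
                    side.pop(price, None)  # level removed from the book
                else:
                    side[price] = qty      # level added or replaced

    def top_of_book(self) -> Tuple[Update, Update]:
        """Return Level 1 data: (best bid, volume) and (best ask, volume)."""
        best_bid = max(self.bids)
        best_ask = min(self.asks)
        return (best_bid, self.bids[best_bid]), (best_ask, self.asks[best_ask])
```

Level 1 data falls out of the Level 2 book for free: `top_of_book()` simply reads the highest bid and the lowest ask.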
Order Book Imbalance (OBI) is one of the most researched features for short-term forecasting:
OBI = (bid_volume - ask_volume) / (bid_volume + ask_volume)
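Here bid_volume and ask_volume are the total volumes on the bid and ask sides over the chosen depth. OBI ranges from -1 to 1: positive values indicate buying pressure, negative values indicate selling pressure.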
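The formula translates directly into code. The function below is a minimal sketch; returning 0.0 for an empty book is an assumption made here to avoid division by zero, not something the formula itself prescribes.

```python
def order_book_imbalance(bid_volume: float, ask_volume: float) -> float:
    """OBI = (bid_volume - ask_volume) / (bid_volume + ask_volume)."""
    total = bid_volume + ask_volume
    if total == 0:
        return 0.0  # assumption: treat an empty book as balanced
    return (bid_volume - ask_volume) / total
```

For example, 30 units on the bid side against 10 on the ask side gives an OBI of 0.5, which reads as buying pressure.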
Feature engineering from the order book: OBI at different depth levels, OBI moving average, OBI change, spread dynamics, depth stability, weighted mid price, depth asymmetry.
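A few of these features can be sketched in plain Python. All function names are hypothetical. Each side of the book is assumed to be a list of (price, volume) pairs sorted best-first. The weighted mid price uses one common definition (bid price weighted by ask volume and vice versa); other weightings exist.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (price, volume), best level first (assumed layout)


def obi_at_depth(bids: List[Level], asks: List[Level], depth: int) -> float:
    """OBI computed over the top `depth` levels of each side."""
    bid_vol = sum(v for _, v in bids[:depth])
    ask_vol = sum(v for _, v in asks[:depth])
    total = bid_vol + ask_vol
    return (bid_vol - ask_vol) / total if total else 0.0


def spread(bids: List[Level], asks: List[Level]) -> float:
    """Best ask price minus best bid price."""
    return asks[0][0] - bids[0][0]


def weighted_mid(bids: List[Level], asks: List[Level]) -> float:
    """Mid price pulled toward the side with less resting volume."""
    (bid_px, bid_vol), (ask_px, ask_vol) = bids[0], asks[0]
    return (bid_px * ask_vol + ask_px * bid_vol) / (bid_vol + ask_vol)


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing mean; the first values use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def changes(values: List[float]) -> List[float]:
    """First differences, e.g. OBI change between consecutive snapshots."""
    return [b - a for a, b in zip(values, values[1:])]
```

Computing `obi_at_depth` for several depths (for example 1, 5, and 20 levels) yields a small family of features from the same snapshot, and `moving_average` and `changes` can then be applied to each resulting series.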
Storage: ClickHouse suits order book data thanks to its high write speed, efficient columnar storage, and fast aggregations. Level 2 snapshots taken every 100 ms produce about 69M records per day.
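The daily record count can be checked with simple arithmetic. A 100 ms interval gives 864,000 snapshots per day. The text does not state how many rows each snapshot writes, so the value of 80 rows below is purely an assumption, chosen because it reproduces the figure of about 69M.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def records_per_day(interval_ms: int, rows_per_snapshot: int) -> int:
    """Rows written per day for a fixed snapshot interval."""
    snapshots = MS_PER_DAY // interval_ms
    return snapshots * rows_per_snapshot


snapshots_per_day = records_per_day(100, 1)   # one row per snapshot
assumed_total = records_per_day(100, 80)      # 80 rows per snapshot is an assumption
print(snapshots_per_day, assumed_total)
```

The same helper makes it easy to see how the volume scales when the snapshot interval or the stored depth changes.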
Short-term price prediction: predict the mid-price change over the next N order book updates (about 1 second) from OBI and depth features, using LightGBM or XGBoost as the model.
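The prediction target can be built as follows. This is a minimal sketch with hypothetical function names: the label for update i is the mid-price N updates later minus the current mid-price. The last N updates have no label and are dropped.

```python
from typing import List


def mid_price(best_bid: float, best_ask: float) -> float:
    """Simple mid price between the best bid and best ask."""
    return (best_bid + best_ask) / 2


def future_mid_change(mids: List[float], horizon: int) -> List[float]:
    """Label i = mids[i + horizon] - mids[i]; the tail without a future value is dropped."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of updates")
    return [mids[i + horizon] - mids[i] for i in range(len(mids) - horizon)]
```

Feature rows computed at update i are then paired with label i, and the first `len(mids) - horizon` rows form the training set for a gradient boosting model such as LightGBM or XGBoost.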
Development covers the complete order book ML pipeline: a WebSocket collector with incremental updates, ClickHouse storage, feature engineering from OBI and depth data, training of the short-term prediction model, and real-time inference.
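A collector with incremental updates has to line up buffered diff events with a REST snapshot and detect gaps. The sketch below assumes Binance-style fields, where `U` and `u` are the first and final update IDs of an event and the snapshot carries `lastUpdateId`. The function name is hypothetical, and the exact synchronization rules should be checked against the exchange documentation.

```python
from typing import Dict, List


def select_applicable_events(last_update_id: int, events: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Return the buffered events to apply on top of a snapshot, in order.

    Events fully covered by the snapshot are skipped. If an event starts
    after the next expected update ID, updates were lost and the book
    must be re-synchronized from a fresh snapshot.
    """
    applicable = []
    prev_final = last_update_id
    for ev in events:
        if ev["u"] <= prev_final:
            continue  # stale: already contained in the snapshot
        if ev["U"] > prev_final + 1:
            raise RuntimeError(
                f"gap in update IDs: expected {prev_final + 1}, got {ev['U']}"
            )
        applicable.append(ev)
        prev_final = ev["u"]
    return applicable
```

On a `RuntimeError` the collector would typically discard the local book, fetch a new snapshot, and replay the buffered events.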







