Development of AI-based Order Book DOM Analysis Model for Trading
The order book (Depth of Market, DOM) is a snapshot of the current market state: all resting limit orders to buy (bids) and to sell (asks), with their prices and volumes. Analyzing it makes it possible to gauge short-term supply and demand pressure in the milliseconds before a trading signal is executed.
Order Book Data Structure
L2 Order Book snapshot:
Price | Bid Volume | Ask Volume
---------|-----------|----------
100.05 | 0 | 5000
100.04 | 0 | 3000
100.03 | 0 | 1500 ← Best Ask
100.02 | 2000 | 0 ← Best Bid
100.01 | 3500 | 0
100.00 | 8000 | 0
99.99 | 2500 | 0
L3 Order Book: individual orders with IDs, needed for microstructure analysis. Available only on some venues (for example, CME futures through its market-by-order feed); many crypto exchanges, Binance included, publish only price-aggregated L2 data.
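As a minimal sketch, the L2 snapshot from the table above could be held in memory as a plain dict of (price, volume) pairs. The layout (best level first on each side) and the helper name `best_quotes` are illustrative assumptions, not an exchange format.

```python
# L2 snapshot from the table above as (price, volume) pairs.
# Assumed layout: each side is sorted best-first
# (bids descending by price, asks ascending by price).
book = {
    "bids": [(100.02, 2000), (100.01, 3500), (100.00, 8000), (99.99, 2500)],
    "asks": [(100.03, 1500), (100.04, 3000), (100.05, 5000)],
}


def best_quotes(book):
    """Return (best_bid_price, best_ask_price) from a best-first sorted book."""
    return book["bids"][0][0], book["asks"][0][0]


best_bid, best_ask = best_quotes(book)
print(best_bid, best_ask)
```

Keeping both sides sorted best-first means "top N levels" is just a slice, which is what the feature functions in the next section rely on.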
Feature Engineering from Order Book
Basic metrics:
- bid_ask_spread: Best Ask - Best Bid (absolute and relative)
- mid_price: (Best Bid + Best Ask) / 2
- imbalance: (TotalBidVolume - TotalAskVolume) / (TotalBidVolume + TotalAskVolume)
- weighted_mid_price: volume-weighted mid price
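The basic metrics listed above can be sketched in one function. The book layout (lists of (price, volume), best level first) and the function name `basic_metrics` are assumptions. The text does not define the weighted mid price, so the sketch uses one common definition: each side's best price weighted by the opposite side's best-level volume.

```python
def basic_metrics(book, levels=5):
    """Basic order book metrics from a non-empty, best-first sorted book."""
    bids = book["bids"][:levels]
    asks = book["asks"][:levels]
    (bid_px, bid_vol), (ask_px, ask_vol) = bids[0], asks[0]

    spread = ask_px - bid_px
    mid = (bid_px + ask_px) / 2

    total_bid = sum(v for _, v in bids)
    total_ask = sum(v for _, v in asks)
    imbalance = (total_bid - total_ask) / (total_bid + total_ask)

    # Assumed definition: the best bid is weighted by ask volume and the
    # best ask by bid volume, so a heavy bid pulls the value toward the ask.
    weighted_mid = (bid_px * ask_vol + ask_px * bid_vol) / (bid_vol + ask_vol)

    return {
        "bid_ask_spread": spread,
        "relative_spread": spread / mid,
        "mid_price": mid,
        "imbalance": imbalance,
        "weighted_mid_price": weighted_mid,
    }


example_book = {
    "bids": [(100.02, 2000), (100.01, 3500), (100.00, 8000), (99.99, 2500)],
    "asks": [(100.03, 1500), (100.04, 3000), (100.05, 5000)],
}
metrics = basic_metrics(example_book)
print(metrics)
```

With more volume resting on the bid than on the ask at the top level, the weighted mid lands above the plain mid, which is the intended reading of buyer pressure.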
Order Book Imbalance (OBI):
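def order_book_imbalance(book, levels=5):
    # book['bids'] / book['asks']: lists of (price, volume), best level first
    bid_vol = sum(vol for _, vol in book['bids'][:levels])
    ask_vol = sum(vol for _, vol in book['asks'][:levels])
    total = bid_vol + ask_vol
    # Both sides empty: return a neutral value instead of dividing by zero
    return (bid_vol - ask_vol) / total if total else 0.0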
OBI > 0 → buyer pressure → expected upward move. This is one of the strongest short-term predictors (horizon 1-10 seconds).
Iceberg detection: large hidden orders are worked as a series of small visible orders at one price. Signs: the level is replenished quickly after each execution, and displayed volume stays roughly constant despite trades.
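The two signs named above can be turned into a rough heuristic. This is only a sketch under stated assumptions: the event format (`("trade", qty)` and `("book", displayed_qty)` for a single price level), the function name `detect_iceberg`, and both thresholds are hypothetical and would need tuning on real data.

```python
def detect_iceberg(events, min_refills=3, min_exec_ratio=2.0):
    """Flag a price level as a likely iceberg.

    events: chronological tuples for ONE price level, either
      ("trade", qty)  - volume executed at the level
      ("book", qty)   - displayed volume after a book update
    """
    refills = 0
    executed = 0.0
    traded_since_update = 0.0
    last_displayed = None
    peak_displayed = 0.0

    for kind, qty in events:
        if kind == "trade":
            executed += qty
            traded_since_update += qty
        elif kind == "book":
            # Refill: trades hit the level, yet displayed size did not shrink.
            if (last_displayed is not None
                    and traded_since_update > 0
                    and qty >= last_displayed):
                refills += 1
            last_displayed = qty
            peak_displayed = max(peak_displayed, qty)
            traded_since_update = 0.0

    # Executed volume far above anything ever displayed suggests hidden size.
    exec_ratio = executed / peak_displayed if peak_displayed > 0 else 0.0
    return refills >= min_refills and exec_ratio >= min_exec_ratio


iceberg_like = [("book", 100), ("trade", 100), ("book", 100),
                ("trade", 100), ("book", 100), ("trade", 100), ("book", 100)]
ordinary = [("book", 500), ("trade", 100), ("book", 400),
            ("trade", 200), ("book", 200)]
print(detect_iceberg(iceberg_like), detect_iceberg(ordinary))
```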
Market depth curves:
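def depth_imbalance_at_level(book, price_distance):
    # price_distance is a fraction of the mid price, e.g. 0.001 = 0.1%
    mid = (book['bids'][0][0] + book['asks'][0][0]) / 2
    max_dist = mid * price_distance
    bid_vol = sum(vol for p, vol in book['bids'] if (mid - p) <= max_dist)
    ask_vol = sum(vol for p, vol in book['asks'] if (p - mid) <= max_dist)
    total = bid_vol + ask_vol
    return (bid_vol - ask_vol) / total if total else 0.0
# Features: imbalance at 0.1%, 0.3%, 0.5%, 1.0% from mid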
Sequence Models for Order Book
Order Book snapshot at each moment = matrix. Temporal sequence of snapshots = 3D tensor.
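To make the shapes concrete, here is a small sketch that stacks snapshots into a `[T × levels × 2]` array with NumPy. The book layout (lists of (price, volume), best level first), the zero-padding of missing levels, and the function names are assumptions for illustration. Real pipelines often add price columns and normalization as well.

```python
import numpy as np


def snapshot_matrix(book, levels=10):
    """[levels x 2] volume matrix: column 0 = bids, column 1 = asks.

    Levels missing from the book are left as zeros.
    """
    m = np.zeros((levels, 2), dtype=np.float32)
    for i, (_, vol) in enumerate(book["bids"][:levels]):
        m[i, 0] = vol
    for i, (_, vol) in enumerate(book["asks"][:levels]):
        m[i, 1] = vol
    return m


def snapshot_tensor(books, levels=10):
    """Stack T consecutive snapshots into a [T x levels x 2] tensor."""
    return np.stack([snapshot_matrix(b, levels) for b in books])


book_t0 = {"bids": [(100.02, 2000), (100.01, 3500)], "asks": [(100.03, 1500)]}
book_t1 = {"bids": [(100.02, 1800), (100.01, 3500)], "asks": [(100.03, 1700)]}
tensor = snapshot_tensor([book_t0, book_t1], levels=5)
print(tensor.shape)
```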
CNN for spatial patterns:
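import torch.nn as nn

class BookCNNLSTM(nn.Module):
    # Input x: [batch, T, levels, 2 (bid/ask)]; output: class logits
    # hidden and n_classes are illustrative choices, not prescribed values
    def __init__(self, levels=10, hidden=64, n_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(  # spatial patterns within one snapshot
            nn.Conv2d(1, 32, kernel_size=(3, 2)), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=(3, 1)), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * (levels - 4), hidden, batch_first=True)  # temporal
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        b, t, levels, sides = x.shape
        feats = self.cnn(x.reshape(b * t, 1, levels, sides)).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # predict from the last time step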
DeepLOB (Deep Learning for Limit Order Books): an architecture from the academic literature (Zhang et al., 2019) that combines CNN layers, Inception modules and an LSTM. It is trained to predict mid-price direction over the next 1-10 trading events. AUC 0.65-0.75 on historical LOBSTER data (Nasdaq).
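Training a direction classifier needs labels. One possible scheme, sketched below, compares the mean of the next `horizon` mid prices with the current mid price and applies a threshold to get down/flat/up classes. The function name, the `horizon` and `threshold` values, and the -1/0/+1 encoding are illustrative assumptions, not settings taken from the paper.

```python
def direction_labels(mid_prices, horizon=10, threshold=1e-4):
    """Label each time step as +1 (up), 0 (flat) or -1 (down).

    The label compares the mean of the next `horizon` mid prices with the
    current mid price; relative changes inside +/- threshold count as flat.
    The last `horizon` steps get no label because their future is unknown.
    """
    labels = []
    for t in range(len(mid_prices) - horizon):
        future_mean = sum(mid_prices[t + 1 : t + 1 + horizon]) / horizon
        change = (future_mean - mid_prices[t]) / mid_prices[t]
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


mids = [100, 101, 102, 102, 102, 101, 100]
labels = direction_labels(mids, horizon=2, threshold=0.005)
print(labels)
```

Averaging over the horizon smooths out single-tick noise. The threshold controls class balance, so it is usually chosen per instrument.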
Microstructure Signals
Trade vs. Quote flow:
- Toxic order flow: large aggressive orders removing liquidity
- Passive order flow: market makers adding liquidity
- Order classification: Lee-Ready algorithm, tick rule
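Of the classification methods listed, the tick rule is the simplest to sketch: a trade above the previous trade price is treated as buyer-initiated, one below as seller-initiated, and a trade at the same price inherits the previous classification. The function name and the convention of returning 0 for trades that cannot be classified yet are assumptions for illustration.

```python
def tick_rule(prices):
    """Classify trades by the tick rule.

    Returns one sign per trade: +1 buyer-initiated, -1 seller-initiated,
    0 when no earlier price change exists to classify against.
    """
    signs = []
    last_sign = 0
    prev_price = None
    for price in prices:
        if prev_price is not None:
            if price > prev_price:
                last_sign = 1       # uptick
            elif price < prev_price:
                last_sign = -1      # downtick
            # Equal price (zero tick): keep the previous classification.
        signs.append(last_sign)
        prev_price = price
    return signs


trade_prices = [100.00, 100.01, 100.01, 100.00, 100.00, 100.02]
signs = tick_rule(trade_prices)
print(signs)
```

The Lee-Ready algorithm additionally compares each trade price with the prevailing quote midpoint and falls back to the tick rule only for trades at the midpoint, so it needs quote data as well as trades.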
Volume imbalance: difference between buyer-initiated and seller-initiated volumes in last N trades. Strong short-term movement predictor.
Trade arrival rate: intensity of trade flow — increases before significant movement.
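Both trade-flow signals can be tracked with rolling windows. The sketch below assumes trades arrive already classified (+1 buyer-initiated, -1 seller-initiated) with timestamps in seconds. The class name `TradeFlow` and the window sizes are hypothetical. The imbalance is normalized to [-1, 1], a small extension of the raw difference described above.

```python
from collections import deque


class TradeFlow:
    """Rolling volume imbalance (last N trades) and trade arrival rate."""

    def __init__(self, n_trades=100, window_s=1.0):
        self.signed_volumes = deque(maxlen=n_trades)  # last N signed volumes
        self.timestamps = deque()                     # trades inside window_s
        self.window_s = window_s

    def on_trade(self, ts, volume, side):
        """side: +1 buyer-initiated, -1 seller-initiated; ts in seconds."""
        self.signed_volumes.append(side * volume)
        self.timestamps.append(ts)
        # Drop trades that fell out of the time window.
        while self.timestamps and ts - self.timestamps[0] > self.window_s:
            self.timestamps.popleft()

    def volume_imbalance(self):
        buy = sum(v for v in self.signed_volumes if v > 0)
        sell = -sum(v for v in self.signed_volumes if v < 0)
        total = buy + sell
        return (buy - sell) / total if total else 0.0

    def arrival_rate(self):
        """Trades per second over the time window."""
        return len(self.timestamps) / self.window_s


flow = TradeFlow(n_trades=3, window_s=1.0)
flow.on_trade(0.0, 10, +1)
flow.on_trade(0.5, 30, -1)
flow.on_trade(1.2, 20, +1)
flow.on_trade(1.4, 40, +1)
print(flow.volume_imbalance(), flow.arrival_rate())
```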
Practical Limitations
Latency requirements: HFT order book analysis needs microsecond-level latency. For algorithmic trading with a 1-60 second horizon, latency under 10 ms is sufficient.
Hardware:
- FPGA for true HFT (sub-microsecond)
- Kernel bypass networking: DPDK, OpenOnload
- Co-location in exchange datacenter
Data:
- Binance: full L2 book via WebSocket
- CME: FIX/MDP3 protocol, co-location mandatory for freshness
- Crypto aggregated: Tardis.dev (historical L2 data), CoinGecko, Kaiko
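Whatever the source, feed messages have to be normalized into one internal book format before features are computed. The sketch below assumes a JSON payload with `bids` and `asks` arrays of `[price, quantity]` strings, which is how Binance partial depth messages are commonly laid out. The exact fields should be checked against the exchange documentation, and the function name is hypothetical.

```python
import json


def normalize_depth_message(raw):
    """Convert a JSON depth message into the internal book format.

    Assumed payload shape: {"bids": [["price", "qty"], ...],
                            "asks": [["price", "qty"], ...]}
    Output: lists of (price, volume) floats, best level first,
    with zero-quantity levels dropped.
    """
    data = json.loads(raw)
    bids = [(float(p), float(q)) for p, q in data["bids"] if float(q) > 0]
    asks = [(float(p), float(q)) for p, q in data["asks"] if float(q) > 0]
    return {
        "bids": sorted(bids, reverse=True),  # highest bid first
        "asks": sorted(asks),                # lowest ask first
    }


raw_message = (
    '{"lastUpdateId": 160, '
    '"bids": [["100.01", "3.5"], ["100.02", "2.0"], ["100.00", "0"]], '
    '"asks": [["100.04", "3.0"], ["100.03", "1.5"]]}'
)
normalized = normalize_depth_message(raw_message)
print(normalized)
```

Parsing prices as floats keeps the sketch short. A production normalizer would more likely use integer ticks or `decimal.Decimal` to avoid rounding issues.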
Production System for DOM Analysis
Exchange Feed → FIX/WebSocket → Normalizer → Feature Calculator → ML Model (ONNX) → Signal Generator → Order Management System
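The stages after normalization can be wired together as plain functions. In this sketch the ONNX model is replaced by a stub that maps order book imbalance to a pseudo-probability, so the example runs without a model file. The function names, the 0.6 threshold and the `send_order` callback standing in for the OMS are all hypothetical.

```python
def compute_features(book, levels=5):
    """Feature Calculator: [order book imbalance, absolute spread]."""
    bid_vol = sum(v for _, v in book["bids"][:levels])
    ask_vol = sum(v for _, v in book["asks"][:levels])
    total = bid_vol + ask_vol
    obi = (bid_vol - ask_vol) / total if total else 0.0
    spread = book["asks"][0][0] - book["bids"][0][0]
    return [obi, spread]


def stub_model(features):
    """Stand-in for the exported ML model: returns a pseudo P(up) in [0, 1]."""
    obi = features[0]
    return 0.5 + 0.5 * obi


def generate_signal(p_up, threshold=0.6):
    """Signal Generator: act only when the model is confident enough."""
    if p_up >= threshold:
        return "BUY"
    if p_up <= 1 - threshold:
        return "SELL"
    return "HOLD"


def on_snapshot(book, model=stub_model, send_order=print):
    """Run one normalized snapshot through the pipeline."""
    signal = generate_signal(model(compute_features(book)))
    if signal != "HOLD":
        send_order(signal)  # hand-off to the Order Management System
    return signal


bid_heavy_book = {
    "bids": [(100.02, 2000), (100.01, 3500), (100.00, 8000), (99.99, 2500)],
    "asks": [(100.03, 1500), (100.04, 3000), (100.05, 5000)],
}
sent_orders = []
result = on_snapshot(bid_heavy_book, send_order=sent_orders.append)
print(result, sent_orders)
```

Passing the model and the order sink in as arguments keeps each stage replaceable, for example swapping the stub for a real inference session in production and a recorder in backtests.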
Production monitoring:
- Feature drift: Order Book statistics change at different times of day
- Model drift: accuracy on last 1000 predictions
- Regime alerts: abnormally high spread or thin book
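Two of the checks above, rolling accuracy and regime alerts, fit in a small monitor class. This is a sketch: the class name, the alert labels and every threshold (`min_accuracy`, `max_spread`, `min_depth`) are hypothetical values that would be calibrated per instrument.

```python
from collections import deque


class ModelMonitor:
    """Rolling accuracy over the last `window` predictions plus regime alerts."""

    def __init__(self, window=1000, min_accuracy=0.52,
                 max_spread=0.05, min_depth=1000):
        self.hits = deque(maxlen=window)
        self.min_accuracy = min_accuracy
        self.max_spread = max_spread
        self.min_depth = min_depth

    def record(self, predicted, actual):
        """Store whether a prediction matched the realized outcome."""
        self.hits.append(predicted == actual)

    def accuracy(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def alerts(self, spread, top_depth):
        """Return a list of alert names for the current state."""
        out = []
        # Judge accuracy only once the window is full, to avoid noisy alerts.
        if len(self.hits) == self.hits.maxlen and self.accuracy() < self.min_accuracy:
            out.append("model_drift")
        if spread > self.max_spread:
            out.append("wide_spread")
        if top_depth < self.min_depth:
            out.append("thin_book")
        return out


monitor = ModelMonitor(window=4)
for predicted, actual in [(1, 1), (1, -1), (-1, 1), (0, 1)]:
    monitor.record(predicted, actual)
current_alerts = monitor.alerts(spread=0.10, top_depth=5000)
print(monitor.accuracy(), current_alerts)
```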
Timeline: Feature engineering + baseline model (OBI + regression) — 2-3 weeks. DeepLOB with real market L2 data and backtesting — 8-12 weeks. Production OMS integration — another 4-6 weeks.







