Order Book Snapshots Storage System Development
Order book data is the most informative exchange data to store, and also the most complex. Binance serves the BTC/USDT book at up to 5000 levels per side, the book updates several times per second, and recording it produces hundreds of megabytes per hour. Storing it naively makes volume grow out of control; a well-designed system balances data completeness against practical storage and query limits.
Types of Order Book Data
Before designing storage, understand what data is actually needed:
Full snapshots: the complete book at a single moment. They are large (around 10 KB each) but allow exact recovery of market state. At 1 snapshot/sec across 100 symbols, that adds up to ~100 GB per day.
Depth snapshots: only the first N levels (usually 5, 10, or 20). Sufficient for most strategies, and 50–250 times smaller than full snapshots.
Order book diffs: only the changes (levels added, removed, or updated). Minimal volume, but they need a full snapshot as a starting point to recover book state.
Mid-price and spread: aggregated metrics derived from the book. Tiny volume, suitable for long-term analysis.
In practice, systems store a combination: full snapshots once per minute for recovery, plus diffs that capture every update between snapshots.
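The volume figures above are easy to sanity-check with a back-of-the-envelope calculator. The two functions below are hypothetical helpers, and their inputs (about 10 KB per full snapshot, about 200 bytes per diff, 5 diffs per second) are the rough figures this article uses, not measurements.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(symbols: int, snapshots_per_sec: float, bytes_per_snapshot: int) -> float:
    """Raw bytes per day when storing only periodic full snapshots (no compression)."""
    return symbols * snapshots_per_sec * SECONDS_PER_DAY * bytes_per_snapshot


def hybrid_volume_bytes(symbols: int, snapshot_interval_s: int, bytes_per_snapshot: int,
                        diffs_per_sec: float, bytes_per_diff: int) -> float:
    """Raw bytes per day for one full snapshot per interval plus every diff in between."""
    snapshots_per_day = SECONDS_PER_DAY / snapshot_interval_s
    diffs_per_day = diffs_per_sec * SECONDS_PER_DAY
    return symbols * (snapshots_per_day * bytes_per_snapshot + diffs_per_day * bytes_per_diff)


full = daily_volume_bytes(symbols=100, snapshots_per_sec=1, bytes_per_snapshot=10_000)
hybrid = hybrid_volume_bytes(symbols=100, snapshot_interval_s=60, bytes_per_snapshot=10_000,
                             diffs_per_sec=5, bytes_per_diff=200)
print(f"full: {full / 1e9:.1f} GB/day, hybrid: {hybrid / 1e9:.1f} GB/day")
```

With 100 symbols, one 10 KB snapshot per second comes to 86.4 GB/day, in line with the ~100 GB estimate, while one snapshot per minute plus 5 diffs/sec comes to about 10.1 GB/day. Both are raw sizes before any compression.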
Storage Format: Delta Encoding
Differential storage is critical for reducing volume. Instead of the full book, store only the changes relative to the previous state.
```
Snapshot @ T=0:
  bids: [(43250.0, 1.5), (43249.5, 2.0), (43249.0, 0.8)]
  asks: [(43251.0, 1.2), (43251.5, 3.0), (43252.0, 0.5)]

Diff @ T=1 (only changes):
  bids_updated: [(43250.0, 2.1)]   # volume changed
  bids_removed: [(43249.5, 0)]     # level disappeared
  bids_added:   [(43248.5, 1.0)]   # new level
  asks_updated: []
  asks_removed: []
  asks_added:   [(43251.75, 0.3)]
```
Full snapshot: ~10 KB. Diff: ~200 bytes. At 5 updates/sec with one snapshot per minute, that is 300 diffs + 1 snapshot ≈ 60 KB + 10 KB = ~70 KB/min, instead of ~3 MB/min for a full snapshot on every update.
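The three lists in the example (updated, removed, added) can collapse into a single change list per side if a quantity of 0 marks a removed level, the same convention the `orderbook_diffs` table uses. The sketch below assumes each book side is held as a `dict` mapping price to quantity; `diff_side` and `apply_side` are hypothetical helper names, and the test data is the example above.

```python
from decimal import Decimal

Side = dict[Decimal, Decimal]  # price -> quantity for one side of the book


def diff_side(old: Side, new: Side) -> list[tuple[Decimal, Decimal]]:
    """Changes that turn `old` into `new`; a quantity of 0 marks a removed level."""
    # Added or updated levels: present in `new` with a different (or no) old quantity
    changes = [(price, qty) for price, qty in new.items() if old.get(price) != qty]
    # Removed levels: present in `old` but gone from `new`
    changes += [(price, Decimal(0)) for price in old if price not in new]
    return sorted(changes)  # sorted by price for a deterministic encoding


def apply_side(side: Side, changes: list[tuple[Decimal, Decimal]]) -> Side:
    """Return a new side with the changes applied (qty 0 deletes the level)."""
    result = dict(side)
    for price, qty in changes:
        if qty == 0:
            result.pop(price, None)
        else:
            result[price] = qty
    return result
```

Applying `diff_side(old, new)` to `old` with `apply_side` reproduces `new` exactly, which is the property replay depends on.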
Database Schema
One workable layout in ClickHouse stores each side of the book as an array of (price, quantity) tuples:
```sql
-- Full snapshots (once per minute)
CREATE TABLE orderbook_snapshots (
    exchange      LowCardinality(String),
    symbol        LowCardinality(String),
    snapshot_time DateTime64(3, 'UTC'),
    depth         UInt16,
    bids          Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- [(price, qty)]
    asks          Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(snapshot_time))
ORDER BY (exchange, symbol, snapshot_time);

-- Deltas between snapshots
CREATE TABLE orderbook_diffs (
    exchange        LowCardinality(String),
    symbol          LowCardinality(String),
    diff_time       DateTime64(3, 'UTC'),
    first_update_id UInt64,
    last_update_id  UInt64,
    bids_changes    Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- qty=0 means deletion
    asks_changes    Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(diff_time))
ORDER BY (exchange, symbol, diff_time);
```
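Before insertion, an in-memory book has to be shaped into the column layout of `orderbook_snapshots`: bids sorted best (highest) first, asks sorted best (lowest) first, truncated to the chosen depth. This is also how the depth snapshots described earlier would be cut. `snapshot_row` is a hypothetical helper; it assumes each side is a `dict` of price to quantity and that timestamps arrive as Unix milliseconds, and it returns a plain dict that would then be handed to whichever ClickHouse client the pipeline uses.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snapshot_row(exchange: str, symbol: str, ts_ms: int,
                 bids: dict[Decimal, Decimal], asks: dict[Decimal, Decimal],
                 depth: int) -> dict:
    """Build one orderbook_snapshots row keeping the best `depth` levels per side."""
    top_bids = sorted(bids.items(), reverse=True)[:depth]  # highest price first
    top_asks = sorted(asks.items())[:depth]                # lowest price first
    return {
        "exchange": exchange,
        "symbol": symbol,
        # Exact millisecond arithmetic, matching DateTime64(3, 'UTC')
        "snapshot_time": _EPOCH + timedelta(milliseconds=ts_ms),
        "depth": depth,  # requested depth; a thin book may have fewer levels
        "bids": top_bids,
        "asks": top_asks,
    }
```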
Book State Reconstruction
The key operation is reconstructing the book at an arbitrary timestamp:
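```python
from __future__ import annotations  # lets the OrderBook annotation refer to a class defined elsewhere


class OrderBookReplay:
    # `storage` (wrapping the two ClickHouse tables) and `OrderBook`
    # (in-memory book state) are assumed to be defined elsewhere.
    def __init__(self, storage) -> None:
        self.storage = storage

    async def reconstruct_at(self, exchange: str, symbol: str, target_ts: int) -> OrderBook:
        # 1. Find the last full snapshot taken before target_ts
        snapshot = await self.storage.get_last_snapshot_before(
            exchange, symbol, target_ts
        )
        # 2. Load all diffs recorded between the snapshot and target_ts
        diffs = await self.storage.get_diffs(
            exchange, symbol,
            from_ts=snapshot.timestamp,
            to_ts=target_ts,
        )
        # 3. Apply diffs in update-ID order (timestamps alone can tie or arrive out of order)
        book = OrderBook.from_snapshot(snapshot)
        for diff in sorted(diffs, key=lambda d: d.first_update_id):
            book.apply_diff(diff)
        return book
```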
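Critical: apply diffs strictly in sequence and validate them by update ID. Each diff carries first_update_id and last_update_id, and the next diff's first_update_id must equal the previous diff's last_update_id + 1 (the convention of Binance's spot diff stream; other exchanges number updates differently). A break in the sequence means data is missing. Diffs whose last_update_id is at or below the snapshot's own update ID are already reflected in the snapshot and must be skipped, so that ID should be stored with the snapshot.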
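A minimal, self-contained sketch of the `OrderBook.from_snapshot` / `apply_diff` pieces that replay relies on, with the update-ID checks built in. It assumes Binance-spot-style numbering: the first diff applied after a snapshot with ID `S` must satisfy `first_update_id <= S + 1 <= last_update_id`, and after that each diff must follow on from the previous one. `Snapshot`, `Diff`, `OrderBook`, and `SequenceGapError` are illustrative names, not a library API.

```python
from dataclasses import dataclass, field
from decimal import Decimal

Level = tuple[Decimal, Decimal]  # (price, qty); qty == 0 deletes the level


@dataclass
class Snapshot:
    last_update_id: int
    bids: list[Level]
    asks: list[Level]


@dataclass
class Diff:
    first_update_id: int
    last_update_id: int
    bids_changes: list[Level] = field(default_factory=list)
    asks_changes: list[Level] = field(default_factory=list)


class SequenceGapError(Exception):
    """Raised when diffs are missing between the current state and the next diff."""


@dataclass
class OrderBook:
    last_update_id: int
    bids: dict[Decimal, Decimal]
    asks: dict[Decimal, Decimal]

    @classmethod
    def from_snapshot(cls, snap: Snapshot) -> "OrderBook":
        return cls(snap.last_update_id, dict(snap.bids), dict(snap.asks))

    def apply_diff(self, diff: Diff) -> bool:
        """Apply one diff; return False if the current state already contains it."""
        if diff.last_update_id <= self.last_update_id:
            return False  # stale: already reflected in the snapshot or an earlier diff
        if diff.first_update_id > self.last_update_id + 1:
            raise SequenceGapError(
                f"expected update {self.last_update_id + 1}, got {diff.first_update_id}"
            )
        # Quantities are absolute, so re-applying an overlapping update is harmless
        for side, changes in ((self.bids, diff.bids_changes), (self.asks, diff.asks_changes)):
            for price, qty in changes:
                if qty == 0:
                    side.pop(price, None)
                else:
                    side[price] = qty
        self.last_update_id = diff.last_update_id
        return True

    def best_bid(self) -> Decimal:
        return max(self.bids)

    def best_ask(self) -> Decimal:
        return min(self.asks)
```

On a `SequenceGapError`, the replay cannot continue exactly and has to restart from the next full snapshot.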
Compression and Optimization
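An order book contains many similar numbers, because prices cluster around the mid. Three techniques reduce volume, before and inside ClickHouse:
Delta encoding for prices: store each level's price as an offset from the best bid/ask instead of as an absolute price. Expressing the offset as an integer number of ticks keeps it lossless; basis points are only exact if prices happen to fall on a basis-point grid.
Binary serialization: use Protocol Buffers or MessagePack instead of JSON for transport and buffering, which typically makes payloads 3–5x smaller.
ClickHouse compression: a ZSTD codec on the price and quantity columns usually compresses better than the default LZ4, at the cost of some extra CPU.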
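Price offsets can be stored as small integers in ticks from the best price. The sketch below assumes the tick size is known per symbol and that every price sits on the tick grid, and it rejects any price that does not. `encode_offsets` and `decode_offsets` are hypothetical names.

```python
from decimal import Decimal


def encode_offsets(prices: list[Decimal], best: Decimal, tick: Decimal) -> list[int]:
    """Encode prices as integer tick offsets from the best price (lossless on the tick grid)."""
    offsets = []
    for price in prices:
        steps = (price - best) / tick
        if steps != steps.to_integral_value():
            raise ValueError(f"{price} is not on the {tick} tick grid")
        offsets.append(int(steps))  # negative for bids below the best bid
    return offsets


def decode_offsets(offsets: list[int], best: Decimal, tick: Decimal) -> list[Decimal]:
    """Invert encode_offsets: recover absolute prices from tick offsets."""
    return [best + offset * tick for offset in offsets]
```

Small integers such as `[0, -1, -2, -3]` compress far better than repeated five-digit prices, and the decoding is exact because it uses `Decimal` arithmetic rather than floats.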
Streaming Ingestion
The ingestion pipeline handles snapshots and diffs in parallel: it buffers incoming diffs, writes them to ClickHouse in batches, and periodically saves a full snapshot.
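The buffering policy can be kept separate from the I/O, which makes it easy to test. Below is a minimal synchronous sketch of that policy only: flush a batch when it reaches a size limit or its oldest diff reaches an age limit, and signal when a full snapshot is due. `DiffBatcher` and its thresholds are assumptions for illustration; the concurrency itself (for example separate asyncio tasks for the diff writer and the snapshot writer) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiffBatcher:
    """Decides when to flush buffered diffs and when to take a full snapshot.

    Time is passed in explicitly (in seconds) so the policy is deterministic.
    The age limit is only checked when a new diff arrives; a real pipeline
    would also flush on a timer so a quiet market does not strand a batch.
    """
    max_batch: int = 500            # flush when this many diffs are buffered
    max_age_s: float = 1.0          # ...or when the oldest buffered diff is this old
    snapshot_every_s: float = 60.0  # full snapshot cadence
    _buffer: list = field(default_factory=list, init=False)
    _first_ts: float | None = field(default=None, init=False)
    _last_snapshot_ts: float | None = field(default=None, init=False)

    def add(self, diff, now: float) -> list | None:
        """Buffer one diff; return the batch to write if a flush is due, else None."""
        if not self._buffer:
            self._first_ts = now
        self._buffer.append(diff)
        if len(self._buffer) >= self.max_batch or now - self._first_ts >= self.max_age_s:
            batch, self._buffer = self._buffer, []
            return batch
        return None

    def snapshot_due(self, now: float) -> bool:
        """Return True (and record the time) if a full snapshot should be taken now."""
        if self._last_snapshot_ts is None or now - self._last_snapshot_ts >= self.snapshot_every_s:
            self._last_snapshot_ts = now
            return True
        return False
```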
Monitoring
Critical: track gaps in diff sequences. A validation job compares each diff's last_update_id with the next diff's first_update_id and raises an alert whenever they are not consecutive. A gap makes exact reconstruction impossible from the last snapshot before it until the next full snapshot.
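The check itself is a single pass over `(first_update_id, last_update_id)` pairs for one exchange and symbol, sorted by `first_update_id`. The sketch assumes consecutive diffs must satisfy `next.first_update_id == prev.last_update_id + 1`; it also reports overlaps and duplicates, which usually point to a writer bug rather than lost data. `find_breaks` and `SequenceBreak` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class SequenceBreak(NamedTuple):
    prev_last_update_id: int   # last_update_id of the diff before the break
    next_first_update_id: int  # first_update_id of the diff after the break


def find_breaks(id_ranges: Iterable[tuple[int, int]]) -> list[SequenceBreak]:
    """Return every point where consecutive diffs are not contiguous.

    A break with next_first_update_id > prev_last_update_id + 1 is a gap
    (missing diffs); one with a smaller value is an overlap or duplicate.
    """
    breaks: list[SequenceBreak] = []
    prev_last = None
    for first_id, last_id in id_ranges:
        if prev_last is not None and first_id != prev_last + 1:
            breaks.append(SequenceBreak(prev_last, first_id))
        prev_last = last_id
    return breaks
```

In production the input would come from the `orderbook_diffs` rows for each (exchange, symbol), and any non-empty result would feed the alerting system.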







