Order book snapshots storage system development

Order book data is the most informative exchange data and also the hardest to store. A BTC/USDT order book on Binance can be requested up to 5000 levels deep per side, updates several times per second, and generates hundreds of megabytes of data per hour. A naive storage approach leads to runaway volume growth; a well-designed system balances data completeness against practical storage limits.

Types of Order Book Data

Before designing storage, understand what data is actually needed:

Full snapshots: a complete slice of the order book at a moment in time. Large (on the order of 10 KB per snapshot), but they allow exact recovery of market state. At 1 snapshot/sec across 100 symbols this comes close to 100 GB per day (8.64 million snapshots at ~10 KB each).

Depth snapshots: the first N levels per side (usually 5, 10, or 20). Sufficient for most strategies, and they require roughly 50–250 times less space, depending on N and on how deep the full book is.

Order book diffs: only the changes (level additions, deletions, and updates). Minimal volume, but they require a full snapshot as a starting point for state recovery.

Mid-price and spread: aggregates derived from the top of the book. Negligible volume, suitable for long-term analysis.

In practice, systems combine these types: full snapshots once per minute as recovery points, plus diffs that provide second-level resolution between snapshots.
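
To make the relationship between these data types concrete, here is a minimal sketch that derives a depth snapshot and the mid-price/spread pair from a full snapshot. The helper names `top_n` and `mid_and_spread` are illustrative, not part of any exchange API, and the book is assumed to be a plain list of (price, quantity) pairs per side.

```python
from decimal import Decimal

Level = tuple[Decimal, Decimal]  # (price, quantity)


def top_n(bids: list[Level], asks: list[Level], n: int) -> tuple[list[Level], list[Level]]:
    """Return the n best levels per side: highest bids first, lowest asks first."""
    best_bids = sorted(bids, key=lambda lvl: lvl[0], reverse=True)[:n]
    best_asks = sorted(asks, key=lambda lvl: lvl[0])[:n]
    return best_bids, best_asks


def mid_and_spread(bids: list[Level], asks: list[Level]) -> tuple[Decimal, Decimal]:
    """Return (mid-price, spread) computed from the best bid and best ask."""
    best_bid = max(price for price, _ in bids)
    best_ask = min(price for price, _ in asks)
    return (best_bid + best_ask) / 2, best_ask - best_bid


# Hypothetical three-level book used only for illustration
bids = [(Decimal("43250.0"), Decimal("1.5")), (Decimal("43249.5"), Decimal("2.0")),
        (Decimal("43249.0"), Decimal("0.8"))]
asks = [(Decimal("43251.0"), Decimal("1.2")), (Decimal("43251.5"), Decimal("3.0")),
        (Decimal("43252.0"), Decimal("0.5"))]

depth_bids, depth_asks = top_n(bids, asks, 2)
mid, spread = mid_and_spread(bids, asks)
```

Each derived form discards information the previous one kept, which is why the choice of what to store has to follow from what the strategies actually consume.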

Storage Format: Delta Encoding

Differential storage is the main lever for reducing volume: instead of storing the full book on every update, store only the changes relative to the previous state.

Snapshot @ T=0:
  bids: [(43250.0, 1.5), (43249.5, 2.0), (43249.0, 0.8)]
  asks: [(43251.0, 1.2), (43251.5, 3.0), (43252.0, 0.5)]

Diff @ T=1 (only changes):
  bids_updated: [(43250.0, 2.1)]    # volume changed
  bids_removed: [(43249.5, 0)]      # level disappeared
  bids_added:   [(43248.5, 1.0)]    # new level
  asks_updated: []
  asks_removed: []
  asks_added:   [(43251.75, 0.3)]

Full snapshot: ~10 KB. Diff: ~200 bytes. At 5 updates/sec with one snapshot per minute, that is 300 diffs (~60 KB) plus 1 snapshot (~10 KB), about 70 KB/min instead of the ~3 MB/min needed to store 300 full snapshots.
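
A diff like the one above can be computed by comparing two consecutive states of one side of the book. The sketch below is one possible way to do it, assuming each side is held as a price-to-quantity dictionary; the function name `diff_side` is illustrative. It emits a single list of changes in which a quantity of 0 marks a removed level, rather than separate added/updated/removed lists.

```python
from decimal import Decimal

Side = dict[Decimal, Decimal]  # price -> quantity for one side of the book


def diff_side(prev: Side, curr: Side) -> list[tuple[Decimal, Decimal]]:
    """Return the levels that changed between prev and curr, sorted by price."""
    changes = []
    # New levels and levels whose quantity changed
    for price, qty in curr.items():
        if prev.get(price) != qty:
            changes.append((price, qty))
    # Levels that disappeared are reported with quantity 0
    for price in prev:
        if price not in curr:
            changes.append((price, Decimal("0")))
    return sorted(changes)


prev_bids = {Decimal("43250.0"): Decimal("1.5"), Decimal("43249.5"): Decimal("2.0"),
             Decimal("43249.0"): Decimal("0.8")}
curr_bids = {Decimal("43250.0"): Decimal("2.1"), Decimal("43249.0"): Decimal("0.8"),
             Decimal("43248.5"): Decimal("1.0")}

bid_changes = diff_side(prev_bids, curr_bids)
```

Unchanged levels (43249.0 here) never appear in the output, which is where the size reduction comes from.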

Database Schema

Use ClickHouse with custom serialization:

-- Full snapshots (once per minute)
CREATE TABLE orderbook_snapshots (
    exchange     LowCardinality(String),
    symbol       LowCardinality(String),
    snapshot_time DateTime64(3, 'UTC'),
    depth        UInt16,
    bids         Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- [(price, qty)]
    asks         Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(snapshot_time))
ORDER BY (exchange, symbol, snapshot_time);

-- Deltas between snapshots
CREATE TABLE orderbook_diffs (
    exchange      LowCardinality(String),
    symbol        LowCardinality(String),
    diff_time     DateTime64(3, 'UTC'),
    first_update_id UInt64,
    last_update_id  UInt64,
    bids_changes  Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- qty=0 means deletion
    asks_changes  Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(diff_time))
ORDER BY (exchange, symbol, diff_time);
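
Before a diff can be written to `orderbook_diffs`, it has to be converted into a row whose values follow the column order of the table. The sketch below shows one way to prepare such a row in plain Python; `diff_to_row` and `DIFF_COLUMNS` are hypothetical names, the input is assumed to carry prices and quantities as strings, and the actual insert call is left out because it depends on the ClickHouse client in use.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Column order of the orderbook_diffs table
DIFF_COLUMNS = ("exchange", "symbol", "diff_time", "first_update_id",
                "last_update_id", "bids_changes", "asks_changes")

EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def diff_to_row(exchange: str, symbol: str, event_ms: int, first_id: int,
                last_id: int, bids: list, asks: list) -> tuple:
    """Build one row for orderbook_diffs from raw (price, qty) string pairs."""
    # Integer millisecond arithmetic avoids float rounding in the timestamp
    diff_time = EPOCH + timedelta(milliseconds=event_ms)

    def to_levels(raw):
        # Decimal keeps the exact textual value, matching Decimal(24,8) columns
        return [(Decimal(price), Decimal(qty)) for price, qty in raw]

    return (exchange, symbol, diff_time, first_id, last_id,
            to_levels(bids), to_levels(asks))


row = diff_to_row("binance", "BTCUSDT", 1700000000123, 101, 105,
                  bids=[("43250.0", "2.1"), ("43249.5", "0")],
                  asks=[("43251.75", "0.3")])
```

Parsing into `Decimal` rather than `float` keeps prices such as 43251.75 exact all the way into the Decimal columns.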

Book State Reconstruction

The key operation is reconstructing the book at an arbitrary timestamp:

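from __future__ import annotations  # OrderBook is defined elsewhere in the project


class OrderBookReplay:
    def __init__(self, storage):
        # storage: async data-access object exposing the two query methods used below
        self.storage = storage

    async def reconstruct_at(self, exchange: str, symbol: str, target_ts: int) -> OrderBook:
        # 1. Find the last full snapshot taken before target_ts
        snapshot = await self.storage.get_last_snapshot_before(
            exchange, symbol, target_ts
        )
        # Assumes the storage layer returns None when no snapshot exists
        if snapshot is None:
            raise LookupError(f"no snapshot for {exchange}:{symbol} before {target_ts}")

        # 2. Load all diffs recorded between the snapshot and target_ts
        diffs = await self.storage.get_diffs(
            exchange, symbol,
            from_ts=snapshot.timestamp,
            to_ts=target_ts
        )

        # 3. Apply diffs in order on top of the snapshot state
        book = OrderBook.from_snapshot(snapshot)
        for diff in diffs:
            book.apply_diff(diff)

        return book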

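Critical: apply diffs in sequence order and validate them via update IDs. Each diff carries a first_update_id and a last_update_id (firstUpdateId and lastUpdateId in exchange terms), and the next diff must have first_update_id = previous last_update_id + 1. A gap in the sequence means data is missing, and any state reconstructed past the gap is unreliable.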
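
The replay code relies on an `OrderBook` type with an `apply_diff` method. A minimal sketch of what that type might look like is shown below, with the sequence check built in. The `Diff` dataclass and `SequenceGapError` are assumptions made for illustration, and the strict check assumes the snapshot's update ID lines up exactly with the first diff, which a real feed may not guarantee.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Diff:
    first_update_id: int
    last_update_id: int
    bids_changes: list[tuple[Decimal, Decimal]]
    asks_changes: list[tuple[Decimal, Decimal]]


class SequenceGapError(Exception):
    """Raised when a diff does not directly follow the current book state."""


@dataclass
class OrderBook:
    last_update_id: int
    bids: dict[Decimal, Decimal] = field(default_factory=dict)
    asks: dict[Decimal, Decimal] = field(default_factory=dict)

    def apply_diff(self, diff: Diff) -> None:
        # Reject the diff if it does not continue the sequence exactly
        expected = self.last_update_id + 1
        if diff.first_update_id != expected:
            raise SequenceGapError(
                f"expected first_update_id {expected}, got {diff.first_update_id}"
            )
        self._apply_side(self.bids, diff.bids_changes)
        self._apply_side(self.asks, diff.asks_changes)
        self.last_update_id = diff.last_update_id

    @staticmethod
    def _apply_side(side: dict, changes: list) -> None:
        for price, qty in changes:
            if qty == 0:
                side.pop(price, None)  # quantity 0 means the level was removed
            else:
                side[price] = qty      # insert a new level or overwrite quantity


book = OrderBook(
    last_update_id=100,
    bids={Decimal("43250.0"): Decimal("1.5"), Decimal("43249.5"): Decimal("2.0"),
          Decimal("43249.0"): Decimal("0.8")},
)
book.apply_diff(Diff(101, 105,
                     bids_changes=[(Decimal("43250.0"), Decimal("2.1")),
                                   (Decimal("43249.5"), Decimal("0")),
                                   (Decimal("43248.5"), Decimal("1.0"))],
                     asks_changes=[(Decimal("43251.75"), Decimal("0.3"))]))
```

Raising on a gap, instead of silently continuing, keeps a corrupted book from being returned as if it were valid.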

Compression and Optimization

An order book contains many similar numbers, because prices cluster close together. Before writing to ClickHouse, apply:

Delta encoding for prices: store each price as its difference from the best bid/ask (for example in basis points) instead of as an absolute price.

Binary serialization: use Protocol Buffers or MessagePack instead of JSON, which typically gives a 3–5x size reduction.

ClickHouse compression: the ZSTD codec usually achieves a higher compression ratio on Float/Decimal columns than the default LZ4, at the cost of more CPU time.
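
The first two techniques can be illustrated together with the standard library alone. The sketch below stores each price as an integer offset from the best price and packs the result with `struct`, which stands in here for Protocol Buffers or MessagePack. It measures offsets in ticks rather than basis points so that the round trip is exact. The byte layout, the function names, and the `tick` and `lot` parameters are all assumptions; prices and quantities are assumed to be exact multiples of `tick` and `lot`, and offsets are assumed to fit in 16 bits.

```python
import json
import struct
from decimal import Decimal

HEADER = "<QH"  # best price in ticks (uint64), number of levels (uint16)
LEVEL = "<HI"   # offset from best in ticks (uint16), quantity in lots (uint32)


def encode_side(levels, tick: Decimal, lot: Decimal) -> bytes:
    """Pack levels (best level first) as a header plus fixed-size records."""
    best_ticks = int(levels[0][0] / tick)
    out = struct.pack(HEADER, best_ticks, len(levels))
    for price, qty in levels:
        offset = abs(int(price / tick) - best_ticks)
        out += struct.pack(LEVEL, offset, int(qty / lot))
    return out


def decode_side(blob: bytes, tick: Decimal, lot: Decimal, is_bid: bool):
    """Inverse of encode_side; bids step down from the best price, asks step up."""
    best_ticks, count = struct.unpack_from(HEADER, blob, 0)
    sign = -1 if is_bid else 1
    pos = struct.calcsize(HEADER)
    levels = []
    for _ in range(count):
        offset, lots = struct.unpack_from(LEVEL, blob, pos)
        pos += struct.calcsize(LEVEL)
        levels.append(((best_ticks + sign * offset) * tick, lots * lot))
    return levels


tick, lot = Decimal("0.5"), Decimal("0.1")
bids = [(Decimal("43250.0"), Decimal("1.5")), (Decimal("43249.5"), Decimal("2.0")),
        (Decimal("43249.0"), Decimal("0.8"))]

blob = encode_side(bids, tick, lot)
as_json = json.dumps([[str(p), str(q)] for p, q in bids])
```

With this layout each level costs 6 bytes regardless of how many digits the price has, and the small, repetitive offsets are also friendlier to a general-purpose codec applied afterwards.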

Streaming Ingestion

The ingestion pipeline handles snapshots and diffs in parallel: incoming diffs are buffered in memory and written in batches, while full snapshots are saved periodically.
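
The buffering part of such a pipeline might look like the sketch below. `DiffBuffer` is a hypothetical class, not part of any library: it collects rows and hands them to a caller-supplied `flush` callback once a row-count or age threshold is reached. Periodic snapshot saving and the actual database write are outside its scope.

```python
import time
from typing import Callable


class DiffBuffer:
    """Collect diff rows in memory and flush them in batches."""

    def __init__(self, flush: Callable[[list], None], max_rows: int = 1000,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._flush = flush          # called with each completed batch
        self._max_rows = max_rows    # flush when this many rows are buffered
        self._max_age_s = max_age_s  # or when the oldest buffered row is this old
        self._clock = clock          # injectable so tests can control time
        self._rows: list = []
        self._oldest = 0.0

    def add(self, row) -> None:
        if not self._rows:
            self._oldest = self._clock()  # age is measured from the first row
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered rows to the callback; does nothing when empty."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


batches: list = []
now = [0.0]  # fake clock value for the demonstration
buf = DiffBuffer(batches.append, max_rows=3, max_age_s=10.0, clock=lambda: now[0])
for row in ("a", "b", "c"):
    buf.add(row)      # third row reaches max_rows and triggers a flush
buf.add("d")
now[0] = 11.0
buf.add("e")          # oldest buffered row is now 11 s old, triggers a flush
```

Note that the age check only runs when a row arrives, so a production pipeline would also call `flush()` from a timer and on shutdown to avoid holding the last rows indefinitely.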

Monitoring

Critical: track gaps in diff sequences. A validation job compares the last_update_id of each diff with the first_update_id of the next and alerts on any gap, because a gap makes order book reconstruction impossible from that point until the next full snapshot.
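
A gap check of this kind can be sketched as a scan over the stored ID ranges. The function name `find_gaps` is illustrative, and the input is assumed to be the (first_update_id, last_update_id) pairs for one exchange and symbol, already ordered by diff time.

```python
def find_gaps(id_ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the missing update-ID ranges (inclusive) between consecutive diffs."""
    gaps = []
    for (_, prev_last), (next_first, _) in zip(id_ranges, id_ranges[1:]):
        # A contiguous sequence has next_first == prev_last + 1
        if next_first > prev_last + 1:
            gaps.append((prev_last + 1, next_first - 1))
    return gaps


stored = [(1, 5), (6, 9), (12, 15), (16, 16)]
missing = find_gaps(stored)  # update IDs 10 and 11 were never stored
```

Returning the missing ranges, rather than a boolean, gives the alert enough detail to decide whether the affected window can be backfilled. Overlapping ranges are ignored here, since they indicate duplicates and not lost data.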