Order Book Snapshots Storage System Development

Order book data is the most informative exchange data and also the hardest to store. On Binance, a depth request for BTC/USDT can return up to 5000 levels per side, the book updates several times per second, and the raw stream adds up to hundreds of megabytes per hour. Storing every state in full makes volume grow out of control; a practical system balances data completeness against storage and query cost.

Types of Order Book Data

Before designing storage, understand what data is actually needed:

Full snapshots: the complete order book at a moment in time. Large (several KB per snapshot, more for deep books), but they allow exact recovery of market state. At 1 snapshot/sec across 100 symbols and about 10 KB each, that is 8.64 million snapshots and roughly 86 GB per day before compression, on the order of 100 GB.

Depth snapshots: the first N levels per side (usually 5, 10, or 20). Sufficient for most strategies, and they require roughly 50–250 times less space than full-depth snapshots.

Order book diffs: only the changes (level additions, deletions, updates). Minimal volume, but a full snapshot is required as the starting point for state recovery.

Mid-price and spread: aggregated values derived from the best bid and ask. Negligible volume, suitable for long-term analysis.

In practice, systems store combinations: full snapshots once per minute for recovery, diffs for second-level resolution between snapshots.
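
To make the relationship between these data types concrete, here is a minimal sketch that derives a depth snapshot and the mid-price/spread pair from a full book. The names `depth_snapshot`, `top_of_book`, and `TopOfBook` are illustrative and not part of any exchange API; levels are assumed to be (price, quantity) pairs of `Decimal`.

```python
from decimal import Decimal
from typing import NamedTuple

Level = tuple[Decimal, Decimal]  # (price, quantity)


class TopOfBook(NamedTuple):
    mid_price: Decimal
    spread: Decimal


def depth_snapshot(bids: list[Level], asks: list[Level], n: int) -> tuple[list[Level], list[Level]]:
    """Keep the n best levels per side: highest bids, lowest asks."""
    best_bids = sorted(bids, key=lambda lvl: lvl[0], reverse=True)[:n]
    best_asks = sorted(asks, key=lambda lvl: lvl[0])[:n]
    return best_bids, best_asks


def top_of_book(bids: list[Level], asks: list[Level]) -> TopOfBook:
    """Derive mid-price and spread from the best bid and best ask."""
    best_bid = max(price for price, _ in bids)
    best_ask = min(price for price, _ in asks)
    return TopOfBook(mid_price=(best_bid + best_ask) / 2, spread=best_ask - best_bid)


bids = [(Decimal("43250.0"), Decimal("1.5")), (Decimal("43249.5"), Decimal("2.0")), (Decimal("43249.0"), Decimal("0.8"))]
asks = [(Decimal("43251.0"), Decimal("1.2")), (Decimal("43251.5"), Decimal("3.0")), (Decimal("43252.0"), Decimal("0.5"))]
print(depth_snapshot(bids, asks, 2))
print(top_of_book(bids, asks))
```

Both derived forms can be computed at query time from stored full snapshots, so storing them separately is a trade of disk space for read speed.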

Storage Format: Delta Encoding

Differential storage is the main lever for reducing volume: store only the changes relative to the previous state rather than the full book.

Snapshot @ T=0:
  bids: [(43250.0, 1.5), (43249.5, 2.0), (43249.0, 0.8)]
  asks: [(43251.0, 1.2), (43251.5, 3.0), (43252.0, 0.5)]

Diff @ T=1 (only changes):
  bids_updated: [(43250.0, 2.1)]    # volume changed
  bids_removed: [(43249.5, 0)]      # level disappeared
  bids_added:   [(43248.5, 1.0)]    # new level
  asks_updated: []
  asks_removed: []
  asks_added:   [(43251.75, 0.3)]
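
A diff like the one above can be produced by comparing two consecutive states of one side of the book. The following is a sketch under the assumption that each side is held as a price-to-quantity mapping; `compute_side_diff` is a hypothetical helper name.

```python
from decimal import Decimal

Side = dict[Decimal, Decimal]  # price -> quantity


def compute_side_diff(prev: Side, curr: Side) -> dict[str, list[tuple[Decimal, Decimal]]]:
    """Return the levels that changed, disappeared, or appeared between prev and curr."""
    updated = [(p, q) for p, q in curr.items() if p in prev and prev[p] != q]
    removed = [(p, Decimal(0)) for p in prev if p not in curr]  # quantity 0 marks deletion
    added = [(p, q) for p, q in curr.items() if p not in prev]
    return {"updated": sorted(updated), "removed": sorted(removed), "added": sorted(added)}


bids_t0 = {Decimal("43250.0"): Decimal("1.5"), Decimal("43249.5"): Decimal("2.0"), Decimal("43249.0"): Decimal("0.8")}
bids_t1 = {Decimal("43250.0"): Decimal("2.1"), Decimal("43249.0"): Decimal("0.8"), Decimal("43248.5"): Decimal("1.0")}
print(compute_side_diff(bids_t0, bids_t1))
```

Because a removed level is encoded as quantity 0, the three lists can also be concatenated into a single list of changes per side without losing information.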

Full snapshot: ~10 KB. Diff: ~200 bytes. At 5 updates/sec with a snapshot every minute, that is 300 diffs (~60 KB) plus 1 snapshot (~10 KB), or ~70 KB/min instead of the ~3 MB/min needed to store a full snapshot on every update.
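
The arithmetic can be checked with a few lines. This sketch takes the approximate sizes quoted above as inputs (10 KB is treated as 10,000 bytes); the function names are illustrative. The diffs alone account for 60 KB per minute and the periodic snapshot adds another 10 KB.

```python
def hybrid_bytes(snapshot_bytes: int, diff_bytes: int, updates_per_sec: int, interval_sec: int) -> int:
    """One full snapshot per interval plus one diff per update."""
    return snapshot_bytes + diff_bytes * updates_per_sec * interval_sec


def full_bytes(snapshot_bytes: int, updates_per_sec: int, interval_sec: int) -> int:
    """A full snapshot stored on every update."""
    return snapshot_bytes * updates_per_sec * interval_sec


hybrid = hybrid_bytes(10_000, 200, 5, 60)
full = full_bytes(10_000, 5, 60)
print(hybrid)                   # 70000 bytes, about 70 KB per minute
print(full)                     # 3000000 bytes, about 3 MB per minute
print(round(full / hybrid, 1))  # 42.9, the approximate reduction factor
```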

Database Schema

Use ClickHouse, storing each side of the book as an array of (price, quantity) tuples:

-- Full snapshots (once per minute)
CREATE TABLE orderbook_snapshots (
    exchange     LowCardinality(String),
    symbol       LowCardinality(String),
    snapshot_time DateTime64(3, 'UTC'),
    depth        UInt16,
    bids         Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- [(price, qty)]
    asks         Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(snapshot_time))
ORDER BY (exchange, symbol, snapshot_time);

-- Deltas between snapshots
CREATE TABLE orderbook_diffs (
    exchange      LowCardinality(String),
    symbol        LowCardinality(String),
    diff_time     DateTime64(3, 'UTC'),
    first_update_id UInt64,
    last_update_id  UInt64,
    bids_changes  Array(Tuple(Decimal(24,8), Decimal(24,8))),  -- qty=0 means deletion
    asks_changes  Array(Tuple(Decimal(24,8), Decimal(24,8)))
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(diff_time))
ORDER BY (exchange, symbol, diff_time);
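
Against these two tables, the reads needed to rebuild a book could look like the sketch below. It only builds SQL text and a parameter dictionary; the `{name:Type}` placeholders are ClickHouse server-side query parameters, and how they are bound depends on the client library in use. The function names are hypothetical, and timestamps are assumed to be Unix epoch milliseconds.

```python
SNAPSHOT_SQL = """
SELECT snapshot_time, bids, asks
FROM orderbook_snapshots
WHERE exchange = {exchange:String}
  AND symbol = {symbol:String}
  AND snapshot_time <= fromUnixTimestamp64Milli({target_ms:Int64})
ORDER BY snapshot_time DESC
LIMIT 1
"""

DIFFS_SQL = """
SELECT diff_time, first_update_id, last_update_id, bids_changes, asks_changes
FROM orderbook_diffs
WHERE exchange = {exchange:String}
  AND symbol = {symbol:String}
  AND diff_time > fromUnixTimestamp64Milli({from_ms:Int64})
  AND diff_time <= fromUnixTimestamp64Milli({target_ms:Int64})
ORDER BY diff_time, first_update_id
"""


def snapshot_query(exchange: str, symbol: str, target_ms: int) -> tuple[str, dict]:
    """SQL and parameters for the latest snapshot at or before target_ms."""
    return SNAPSHOT_SQL, {"exchange": exchange, "symbol": symbol, "target_ms": target_ms}


def diffs_query(exchange: str, symbol: str, from_ms: int, target_ms: int) -> tuple[str, dict]:
    """SQL and parameters for all diffs after the snapshot, up to target_ms."""
    return DIFFS_SQL, {"exchange": exchange, "symbol": symbol, "from_ms": from_ms, "target_ms": target_ms}
```

Selecting diffs by time is only approximate at the snapshot boundary. If the snapshot table also recorded the update ID the snapshot corresponds to (a column the schema above does not have), the boundary could be chosen exactly by update ID instead.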

Book State Reconstruction

The key operation is reconstructing the book at an arbitrary timestamp:

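# OrderBook and the storage object are assumed to be defined elsewhere.
class OrderBookReplay:
    def __init__(self, storage):
        # storage exposes the two async queries used below
        self.storage = storage

    async def reconstruct_at(self, exchange: str, symbol: str, target_ts: int) -> "OrderBook":
        # 1. Find the last snapshot before target_ts
        snapshot = await self.storage.get_last_snapshot_before(
            exchange, symbol, target_ts
        )
        if snapshot is None:
            # Without a starting snapshot the diffs cannot be applied to anything
            raise LookupError(f"no snapshot for {exchange}:{symbol} before {target_ts}")

        # 2. Load all diffs from the snapshot to target_ts, in sequence order
        diffs = await self.storage.get_diffs(
            exchange, symbol,
            from_ts=snapshot.timestamp,
            to_ts=target_ts
        )

        # 3. Apply diffs sequentially on top of the snapshot state
        book = OrderBook.from_snapshot(snapshot)
        for diff in diffs:
            book.apply_diff(diff)

        return book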
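
The replay relies on an `OrderBook` type that is not shown. A minimal sketch of what it might look like follows, assuming each side is a price-to-quantity mapping and that a quantity of 0 removes a level, as in the diffs table. The `Snapshot` and `Diff` dataclasses are hypothetical stand-ins for rows read from storage.

```python
from dataclasses import dataclass, field
from decimal import Decimal

Level = tuple[Decimal, Decimal]  # (price, quantity)


@dataclass
class Snapshot:
    bids: list[Level]
    asks: list[Level]


@dataclass
class Diff:
    bids_changes: list[Level]
    asks_changes: list[Level]


@dataclass
class OrderBook:
    bids: dict[Decimal, Decimal] = field(default_factory=dict)
    asks: dict[Decimal, Decimal] = field(default_factory=dict)

    @classmethod
    def from_snapshot(cls, snapshot: Snapshot) -> "OrderBook":
        return cls(bids=dict(snapshot.bids), asks=dict(snapshot.asks))

    def apply_diff(self, diff: Diff) -> None:
        self._apply_side(self.bids, diff.bids_changes)
        self._apply_side(self.asks, diff.asks_changes)

    @staticmethod
    def _apply_side(side: dict[Decimal, Decimal], changes: list[Level]) -> None:
        for price, qty in changes:
            if qty == 0:
                side.pop(price, None)  # deletion; ignore levels that are already absent
            else:
                side[price] = qty      # insert a new level or overwrite its quantity


D = Decimal
book = OrderBook.from_snapshot(Snapshot(
    bids=[(D("43250.0"), D("1.5")), (D("43249.5"), D("2.0")), (D("43249.0"), D("0.8"))],
    asks=[(D("43251.0"), D("1.2")), (D("43251.5"), D("3.0")), (D("43252.0"), D("0.5"))],
))
book.apply_diff(Diff(
    bids_changes=[(D("43250.0"), D("2.1")), (D("43249.5"), D("0")), (D("43248.5"), D("1.0"))],
    asks_changes=[(D("43251.75"), D("0.3"))],
))
print(sorted(book.bids.items(), reverse=True))
```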

Critical: apply diffs in sequence order and validate them via update IDs. Each diff carries a first_update_id and a last_update_id, and the next diff's first_update_id must equal the previous diff's last_update_id + 1. A gap in the sequence means data is missing.
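
The continuity rule can be checked before any diff is applied. The sketch below assumes diffs arrive as (first_update_id, last_update_id) pairs already sorted in sequence order; `find_gaps` and `Gap` are illustrative names. Note that some venues number updates differently, so the exact rule should be taken from the exchange's documentation.

```python
from typing import Iterable, NamedTuple


class Gap(NamedTuple):
    after_update_id: int  # last_update_id of the diff before the break
    next_update_id: int   # first_update_id of the diff after the break


def find_gaps(id_ranges: Iterable[tuple[int, int]]) -> list[Gap]:
    """Report every place where first_update_id != previous last_update_id + 1."""
    gaps: list[Gap] = []
    prev_last = None
    for first_id, last_id in id_ranges:
        if prev_last is not None and first_id != prev_last + 1:
            gaps.append(Gap(prev_last, first_id))
        prev_last = last_id
    return gaps


print(find_gaps([(1, 5), (6, 9), (12, 15)]))  # one break: IDs 10 and 11 are missing
```

The same comparison also flags overlapping or duplicated diffs, since their first_update_id is lower than expected.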

Compression and Optimization

An order book contains many similar numbers, because prices cluster near the best bid and ask. Before writing to ClickHouse, consider:

Delta encoding for prices: store price differences from the best bid/ask in basis points instead of absolute prices.

Binary serialization: use Protocol Buffers or MessagePack instead of JSON, typically for a 3–5x size reduction.

ClickHouse compression: the ZSTD codec usually achieves a higher compression ratio on Float/Decimal columns than the default LZ4, at the cost of more CPU.
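
As an illustration of the first two ideas, the sketch below encodes each level as an integer offset from the best price and packs the result into a compact binary layout. Two substitutions are made deliberately: offsets are counted in ticks rather than basis points, because integer ticks round-trip without loss, and the standard-library `struct` module stands in for Protocol Buffers or MessagePack. The tick size, quantity step, and all function names are assumptions for the example.

```python
import json
import struct
from decimal import Decimal

Level = tuple[Decimal, Decimal]


def encode_side(levels: list[Level], best_price: Decimal, tick: Decimal, qty_step: Decimal) -> list[tuple[int, int]]:
    """Convert (price, qty) pairs to (tick offset from best price, qty in steps)."""
    encoded = []
    for price, qty in levels:
        offset = (price - best_price) / tick
        steps = qty / qty_step
        if offset != offset.to_integral_value() or steps != steps.to_integral_value():
            raise ValueError("price or quantity is not aligned to the tick or step size")
        encoded.append((int(offset), int(steps)))
    return encoded


def decode_side(encoded: list[tuple[int, int]], best_price: Decimal, tick: Decimal, qty_step: Decimal) -> list[Level]:
    """Inverse of encode_side."""
    return [(best_price + offset * tick, steps * qty_step) for offset, steps in encoded]


def pack_side(encoded: list[tuple[int, int]]) -> bytes:
    """Little-endian layout: uint16 level count, then int32 offset + uint64 steps per level."""
    return struct.pack("<H", len(encoded)) + b"".join(struct.pack("<iQ", o, s) for o, s in encoded)


bids = [(Decimal("43250.0"), Decimal("1.5")), (Decimal("43249.5"), Decimal("2.0")), (Decimal("43249.0"), Decimal("0.8"))]
best, tick, step = Decimal("43250.0"), Decimal("0.5"), Decimal("0.1")

encoded = encode_side(bids, best, tick, step)
packed = pack_side(encoded)
as_json = json.dumps([[str(p), str(q)] for p, q in bids])
print(encoded)                     # [(0, 15), (-1, 20), (-2, 8)]
print(len(packed), len(as_json))   # packed size in bytes vs JSON size in characters
```

Small integers near zero also tend to compress better in a general-purpose codec than long decimal strings, so the two steps reinforce each other.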

Streaming Ingestion

The ingestion pipeline handles snapshots and diffs in parallel: incoming diffs are buffered and written in batches, while full snapshots are saved on a periodic schedule.
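
A minimal sketch of the two building blocks follows: a buffer that flushes diffs by row count or age, and a trigger that fires once per snapshot interval. The class names, thresholds, and the injected `flush` callback are assumptions; a real pipeline would call its database client inside `flush` and would typically run this inside an async consumer.

```python
import time
from typing import Any, Callable


class DiffBuffer:
    """Collects rows and hands them to `flush` as one batch."""

    def __init__(self, flush: Callable[[list], None], max_rows: int = 1000,
                 max_age_sec: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._flush, self._max_rows, self._max_age, self._clock = flush, max_rows, max_age_sec, clock
        self._rows: list = []
        self._oldest = 0.0

    def add(self, row: Any) -> None:
        if not self._rows:
            self._oldest = self._clock()  # age is measured from the first buffered row
        self._rows.append(row)
        if len(self._rows) >= self._max_rows or self._clock() - self._oldest >= self._max_age:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


class PeriodicTrigger:
    """Returns True from due() at most once per interval, e.g. to save a full snapshot."""

    def __init__(self, interval_sec: float, clock: Callable[[], float] = time.monotonic):
        self._interval, self._clock = interval_sec, clock
        self._next = clock() + interval_sec

    def due(self) -> bool:
        now = self._clock()
        if now >= self._next:
            self._next = now + self._interval
            return True
        return False
```

Batching matters because ClickHouse handles a few large inserts far better than many single-row inserts. The age check runs only when a row arrives, so a quiet symbol needs an external timer (or a call to `flush()` on shutdown) to drain its last rows.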

Monitoring

Critical: track gaps in diff sequences. Validation compares the last_update_id of each diff with the first_update_id of the next and alerts on any gap, because a gap makes it impossible to reconstruct the book from that point until the next full snapshot.