Tick Data Storage System Development
Tick data — every individual trade on an exchange: time, price, volume, side (maker/taker). Unlike OHLCV candles, tick data allows recovery of complete market activity picture at any moment. This requires significantly more storage space and specialized infrastructure.
Data Volumes
To understand the scale: Binance on BTC/USDT generates approximately 50,000–200,000 trades per day. Across all pairs on all exchanges — hundreds of millions of records daily. A year of data — tens of billions of rows. Standard PostgreSQL can't handle this without specialized solutions.
Storage Technologies
ClickHouse — columnar DBMS from Yandex, optimized for analytical queries on large volumes. Compresses time series 5–20 times better than PostgreSQL, performs aggregations on billions of rows in seconds.
CREATE TABLE trades (
exchange LowCardinality(String),
symbol LowCardinality(String),
trade_id String,
timestamp DateTime64(3, 'UTC'),
price Decimal(24, 8),
quantity Decimal(24, 8),
side LowCardinality(String),
is_maker Bool
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(timestamp))
ORDER BY (exchange, symbol, timestamp)
SETTINGS index_granularity = 8192;
LowCardinality for strings with few unique values (exchange, symbol, side) — automatic dictionary encoding provides significant space savings.
TimescaleDB — if you already use PostgreSQL and volumes are moderate (< 1 billion rows). Supports hypertables with automatic time-based partitioning, compression policies.
Arctic (Python library, on MongoDB) — specialized solution for financial time series. Supports tick-data, OHLCV, arbitrary DataFrames with versioning.
ClickHouse Schema
For maximum write performance use buffer table:
-- Buffer: accumulates data in memory, flushes every 10 sec or 1M rows
CREATE TABLE trades_buffer AS trades
ENGINE = Buffer(currentDatabase(), 'trades', 16, 10, 100, 10000, 1000000, 10000000, 100000000);
-- Write to buffer, read from main table
INSERT INTO trades_buffer VALUES (...);
SELECT * FROM trades WHERE ...;
Tick Data Queries
Aggregate tick data into OHLCV on-the-fly:
SELECT
toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS candle_time,
argMin(price, timestamp) AS open,
max(price) AS high,
min(price) AS low,
argMax(price, timestamp) AS close,
sum(quantity) AS volume,
count() AS trade_count
FROM trades
WHERE exchange = 'binance'
AND symbol = 'BTC/USDT'
AND timestamp BETWEEN '2024-01-01' AND '2024-01-02'
GROUP BY candle_time
ORDER BY candle_time;
On ClickHouse, this query on 50M rows executes in 1–3 seconds.
Ingestion Pipeline
import asyncio
from collections import deque
class TickDataIngester:
BATCH_SIZE = 10000
FLUSH_INTERVAL = 5.0 # seconds
def __init__(self, clickhouse_client):
self.buffer = deque()
self.client = clickhouse_client
async def on_trade(self, trade: NormalizedTrade):
self.buffer.append(trade)
if len(self.buffer) >= self.BATCH_SIZE:
await self.flush()
async def flush(self):
if not self.buffer:
return
batch = [self.buffer.popleft() for _ in range(min(self.BATCH_SIZE, len(self.buffer)))]
await self.client.insert('trades_buffer', batch)
async def flush_loop(self):
while True:
await asyncio.sleep(self.FLUSH_INTERVAL)
await self.flush()
Compression and Retention
ClickHouse compresses data automatically. Additionally enable cold storage via TTL:
ALTER TABLE trades
MODIFY TTL timestamp + INTERVAL 1 YEAR TO DISK 'cold_storage';
Data older than one year automatically moves to cheaper storage (S3-compatible via ClickHouse object storage).
Backfill Historical Data
For populating historical data use public exchange APIs. Binance, for example, provides trade history through /api/v3/aggTrades with pagination by fromId. Parallel backfill across time ranges with rate limiting allows loading years of data in several hours.







