Tick data storage system development

We design and develop full-cycle blockchain solutions: from smart contract architecture to launching DeFi protocols, NFT marketplaces and crypto exchanges. Security audits, tokenomics, integration with existing infrastructure.
Showing 1 of 1 servicesAll 1306 services
Tick data storage system development
Medium
~3-5 business days
FAQ
Blockchain Development Services
Blockchain Development Stages
Latest works
  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1170
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1092
  • image_logo-advance_0.png
    B2B Advance company logo design
    563
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    830
  • image_logo-aider_0.jpg
    AIDER company logo development
    763
  • image_crm_chasseurs_493_0.webp
    CRM development for Chasseurs
    878

Tick Data Storage System Development

Tick data — every individual trade on an exchange: time, price, volume, side (maker/taker). Unlike OHLCV candles, tick data allows recovery of complete market activity picture at any moment. This requires significantly more storage space and specialized infrastructure.

Data Volumes

To understand the scale: Binance on BTC/USDT generates approximately 50,000–200,000 trades per day. Across all pairs on all exchanges — hundreds of millions of records daily. A year of data — tens of billions of rows. Standard PostgreSQL can't handle this without specialized solutions.

Storage Technologies

ClickHouse — columnar DBMS from Yandex, optimized for analytical queries on large volumes. Compresses time series 5–20 times better than PostgreSQL, performs aggregations on billions of rows in seconds.

CREATE TABLE trades (
    exchange    LowCardinality(String),
    symbol      LowCardinality(String),
    trade_id    String,
    timestamp   DateTime64(3, 'UTC'),
    price       Decimal(24, 8),
    quantity    Decimal(24, 8),
    side        LowCardinality(String),
    is_maker    Bool
)
ENGINE = MergeTree()
PARTITION BY (exchange, toYYYYMM(timestamp))
ORDER BY (exchange, symbol, timestamp)
SETTINGS index_granularity = 8192;

LowCardinality for strings with few unique values (exchange, symbol, side) — automatic dictionary encoding provides significant space savings.

TimescaleDB — if you already use PostgreSQL and volumes are moderate (< 1 billion rows). Supports hypertables with automatic time-based partitioning, compression policies.

Arctic (Python library, on MongoDB) — specialized solution for financial time series. Supports tick-data, OHLCV, arbitrary DataFrames with versioning.

ClickHouse Schema

For maximum write performance use buffer table:

-- Buffer: accumulates data in memory, flushes every 10 sec or 1M rows
CREATE TABLE trades_buffer AS trades
ENGINE = Buffer(currentDatabase(), 'trades', 16, 10, 100, 10000, 1000000, 10000000, 100000000);

-- Write to buffer, read from main table
INSERT INTO trades_buffer VALUES (...);
SELECT * FROM trades WHERE ...;

Tick Data Queries

Aggregate tick data into OHLCV on-the-fly:

SELECT
    toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS candle_time,
    argMin(price, timestamp) AS open,
    max(price) AS high,
    min(price) AS low,
    argMax(price, timestamp) AS close,
    sum(quantity) AS volume,
    count() AS trade_count
FROM trades
WHERE exchange = 'binance'
  AND symbol = 'BTC/USDT'
  AND timestamp BETWEEN '2024-01-01' AND '2024-01-02'
GROUP BY candle_time
ORDER BY candle_time;

On ClickHouse, this query on 50M rows executes in 1–3 seconds.

Ingestion Pipeline

import asyncio
from collections import deque

class TickDataIngester:
    BATCH_SIZE = 10000
    FLUSH_INTERVAL = 5.0  # seconds

    def __init__(self, clickhouse_client):
        self.buffer = deque()
        self.client = clickhouse_client

    async def on_trade(self, trade: NormalizedTrade):
        self.buffer.append(trade)
        if len(self.buffer) >= self.BATCH_SIZE:
            await self.flush()

    async def flush(self):
        if not self.buffer:
            return
        batch = [self.buffer.popleft() for _ in range(min(self.BATCH_SIZE, len(self.buffer)))]
        await self.client.insert('trades_buffer', batch)

    async def flush_loop(self):
        while True:
            await asyncio.sleep(self.FLUSH_INTERVAL)
            await self.flush()

Compression and Retention

ClickHouse compresses data automatically. Additionally enable cold storage via TTL:

ALTER TABLE trades
MODIFY TTL timestamp + INTERVAL 1 YEAR TO DISK 'cold_storage';

Data older than one year automatically moves to cheaper storage (S3-compatible via ClickHouse object storage).

Backfill Historical Data

For populating historical data use public exchange APIs. Binance, for example, provides trade history through /api/v3/aggTrades with pagination by fromId. Parallel backfill across time ranges with rate limiting allows loading years of data in several hours.