Tick Data ML Pipeline Development
Tick data records each individual trade: price, volume, side (buy/sell), timestamp. This is the most granular level of market data, and it carries information that is lost when trades are aggregated into OHLCV candles.
Tick data collection: a WebSocket connection feeds a trade-stream buffer, which is flushed to the database (PostgreSQL or ClickHouse) in batch inserts.
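The buffering step can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names Trade and TradeBuffer, the field layout, and the size/age thresholds are all assumptions, and the flush callback stands in for whatever database client performs the real batch insert.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Trade:
    """One tick. Field names are illustrative, not a fixed schema."""
    ts_ms: int     # exchange timestamp in milliseconds
    price: float
    qty: float
    is_buy: bool   # True if the aggressor (taker) was the buyer


class TradeBuffer:
    """Collects trades and hands them to `flush` in batches.

    A batch is emitted when it reaches `max_rows` or when `max_age_s`
    seconds have passed since the previous flush, whichever comes first.
    """

    def __init__(
        self,
        flush: Callable[[List[Trade]], None],
        max_rows: int = 5000,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[Trade] = []
        self._last_flush = clock()

    def add(self, trade: Trade) -> None:
        self._rows.append(trade)
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is buffered (if anything) and reset the timer."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)
        self._last_flush = self._clock()
```

The age check only runs when a new trade arrives, so a quiet market can leave rows buffered; a production collector would also call flush() from a timer and on shutdown.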
Storage: ClickHouse for tick data. Batched inserts can reach 500K+ rows/second (depending on hardware and schema), and aggregations over large tick tables are fast.
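One possible table layout and an aggregation query are sketched below as Python string constants, so they can be passed to any ClickHouse client. The table name, column names, partitioning, and sort key are assumptions chosen for illustration; the right choices depend on the symbols and query patterns involved.

```python
# Hypothetical schema: names, partitioning and ordering are assumptions.
CREATE_TRADES_TABLE = """
CREATE TABLE IF NOT EXISTS trades (
    ts      DateTime64(3),            -- trade time, millisecond precision
    symbol  LowCardinality(String),
    price   Float64,
    qty     Float64,
    is_buy  UInt8                     -- 1 if the aggressor was the buyer
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (symbol, ts)
"""


def ohlcv_query(minutes: int) -> str:
    """Build a time-bar OHLCV query over the hypothetical `trades` table."""
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return f"""
SELECT
    symbol,
    toStartOfInterval(ts, INTERVAL {minutes} MINUTE) AS bar_start,
    argMin(price, ts) AS open_price,   -- price at the earliest timestamp
    max(price)        AS high_price,
    min(price)        AS low_price,
    argMax(price, ts) AS close_price,  -- price at the latest timestamp
    sum(qty)          AS volume
FROM trades
GROUP BY symbol, bar_start
ORDER BY symbol, bar_start
"""
```

When several trades share the same millisecond, argMin and argMax pick one of them arbitrarily, so a trade ID column would be needed for a strictly deterministic open and close.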
Aggregation: custom OHLCV bars at any timeframe, plus volume bars (a bar closes after N units of traded volume), dollar bars (after N USD of traded value), and imbalance bars (after the buy/sell imbalance reaches N).
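The three bar types differ only in what is accumulated before a bar closes, which the sketch below makes explicit. It is a simplified illustration under stated assumptions: the function name build_bars and the tuple layout are hypothetical, a trade is never split across two bars, and the imbalance variant uses a fixed threshold on signed volume rather than an adaptive expected-imbalance threshold.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (price, quantity, is_buy) ; layout is an assumption for this sketch
TradeTuple = Tuple[float, float, bool]


def build_bars(
    trades: Iterable[TradeTuple], threshold: float, kind: str = "volume"
) -> List[Dict[str, float]]:
    """Group trades into bars that close once a running measure hits `threshold`.

    kind="volume":    measure is traded quantity
    kind="dollar":    measure is price * quantity
    kind="imbalance": measure is signed quantity (+buy, -sell); the bar
                      closes when its absolute value reaches the threshold
    """
    if kind not in ("volume", "dollar", "imbalance"):
        raise ValueError(f"unknown bar kind: {kind}")

    bars: List[Dict[str, float]] = []
    bar: Optional[Dict[str, float]] = None
    acc = 0.0
    for price, qty, is_buy in trades:
        if bar is None:
            bar = {"open": price, "high": price, "low": price,
                   "close": price, "volume": 0.0}
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += qty

        if kind == "volume":
            acc += qty
        elif kind == "dollar":
            acc += price * qty
        else:
            acc += qty if is_buy else -qty

        if abs(acc) >= threshold:
            bars.append(bar)
            bar, acc = None, 0.0
    # A trailing, unfinished bar is intentionally dropped.
    return bars
```

Because whole trades are assigned to a single bar, a bar's volume can overshoot the threshold; splitting the final trade would give exact bar sizes at the cost of more bookkeeping.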
Feature engineering from ticks: buy/sell pressure, trade frequency, average trade size, large trade ratio, VWAP deviation, trade size distribution analysis.
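Several of these features can be computed from a single window of ticks, as in the sketch below. The function name tick_features, the tuple layout, and the exact definitions (for example, buy/sell pressure as normalized signed volume, and large trade ratio as a share of volume rather than of trade count) are assumptions; other definitions are equally reasonable.

```python
from typing import Dict, Sequence, Tuple

# (ts_ms, price, quantity, is_buy) ; layout is an assumption for this sketch
Tick = Tuple[int, float, float, bool]


def tick_features(trades: Sequence[Tick], large_qty: float) -> Dict[str, float]:
    """Compute window-level features from a non-empty, time-ordered tick list."""
    if not trades:
        raise ValueError("need at least one trade")

    total_qty = sum(q for _, _, q, _ in trades)
    buy_qty = sum(q for _, _, q, is_buy in trades if is_buy)
    sell_qty = total_qty - buy_qty
    large_vol = sum(q for _, _, q, _ in trades if q >= large_qty)
    vwap = sum(p * q for _, p, q, _ in trades) / total_qty

    span_s = (trades[-1][0] - trades[0][0]) / 1000.0
    last_price = trades[-1][1]

    return {
        # +1 means all volume was buyer-initiated, -1 all seller-initiated
        "buy_pressure": (buy_qty - sell_qty) / total_qty,
        # trades per second over the window (0.0 if the window has no span)
        "trade_frequency": len(trades) / span_s if span_s > 0 else 0.0,
        "avg_trade_size": total_qty / len(trades),
        # share of volume coming from trades of at least `large_qty`
        "large_trade_ratio": large_vol / total_qty,
        # relative distance of the last price from the window VWAP
        "vwap_deviation": last_price / vwap - 1.0,
    }
```

For trade size distribution analysis, the same window can additionally be summarized with quantiles or a histogram of the quantities.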
Trade size analysis: large trades can signal institutional activity. Compare the price impact of large vs. small trades.
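One simple way to make that comparison is to measure, for each trade, how far the price moves in the trade's direction over the next few trades, and then average separately for large and small trades. The sketch below does this; the function name, the size cutoff, and the choice of a trade-count horizon are assumptions, and this is only one of several possible impact measures.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (price, quantity, is_buy) ; layout is an assumption for this sketch
TradeTuple = Tuple[float, float, bool]


def mean_impact_by_size(
    trades: Sequence[TradeTuple], large_qty: float, horizon: int = 1
) -> Dict[str, Optional[float]]:
    """Average signed return `horizon` trades ahead, split by trade size.

    Impact is positive when the price moved in the direction of the trade
    (up after a buy, down after a sell). Trades without a full horizon
    ahead of them are skipped.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")

    large: List[float] = []
    small: List[float] = []
    for i in range(len(trades) - horizon):
        price, qty, is_buy = trades[i]
        future_price = trades[i + horizon][0]
        sign = 1.0 if is_buy else -1.0
        impact = sign * (future_price - price) / price
        (large if qty >= large_qty else small).append(impact)

    def mean(xs: List[float]) -> Optional[float]:
        return sum(xs) / len(xs) if xs else None

    return {"large": mean(large), "small": mean(small)}
```

A bucket with no trades returns None rather than 0.0, so an empty group is not mistaken for zero impact.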
Realtime pipeline: Binance WebSocket → asyncio consumer → buffer → ClickHouse batch insert → Redis sorted set → feature calculator → ML inference.
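The shape of this pipeline can be sketched with the standard library alone. In the sketch below, the WebSocket is replaced by a simulated async generator, the ClickHouse insert by a plain callback, and ML inference by a placeholder buy-share signal; the Redis sorted set stage is omitted. All function names are hypothetical. The message fields used by parse_trade ("T" trade time, "p" price, "q" quantity, "m" buyer-is-maker) follow the Binance trade stream payload.

```python
import asyncio
import json
from typing import AsyncIterator, Callable, Dict, List, Optional, Tuple


def parse_trade(raw: str) -> Dict:
    """Convert one Binance-style trade message into a flat dict."""
    msg = json.loads(raw)
    return {
        "ts_ms": int(msg["T"]),
        "price": float(msg["p"]),
        "qty": float(msg["q"]),
        # "m" is True when the buyer was the maker, i.e. the seller was the taker
        "is_buy": not msg["m"],
    }


async def consumer(source: AsyncIterator[str], queue: asyncio.Queue) -> None:
    """Read raw messages, parse them, and push them downstream."""
    async for raw in source:
        await queue.put(parse_trade(raw))
    await queue.put(None)  # sentinel: no more trades


async def writer(
    queue: asyncio.Queue,
    sink: Callable[[List[Dict]], None],
    on_batch: Callable[[List[Dict]], None],
    batch_size: int,
) -> None:
    """Batch trades, store each batch, then run the feature/signal step."""
    batch: List[Dict] = []
    while True:
        trade: Optional[Dict] = await queue.get()
        if trade is None:
            break
        batch.append(trade)
        if len(batch) >= batch_size:
            sink(batch)
            on_batch(batch)
            batch = []
    if batch:  # flush the remainder on shutdown
        sink(batch)
        on_batch(batch)


async def fake_stream(n: int) -> AsyncIterator[str]:
    """Simulated trade stream standing in for the WebSocket connection."""
    for i in range(n):
        yield json.dumps({"e": "trade", "T": 1_700_000_000_000 + i,
                          "p": str(100 + i), "q": "0.5", "m": i % 2 == 0})
        await asyncio.sleep(0)  # give other tasks a chance to run


async def run_pipeline(
    n_trades: int = 5, batch_size: int = 2
) -> Tuple[List[List[Dict]], List[float]]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)
    stored: List[List[Dict]] = []   # stands in for the database
    signals: List[float] = []       # stands in for model outputs

    def signal(batch: List[Dict]) -> None:
        total = sum(t["qty"] for t in batch)
        buys = sum(t["qty"] for t in batch if t["is_buy"])
        signals.append(buys / total)  # placeholder for ML inference

    await asyncio.gather(
        consumer(fake_stream(n_trades), queue),
        writer(queue, stored.append, signal, batch_size),
    )
    return stored, signals
```

Running asyncio.run(run_pipeline()) drives the simulated stream through both stages. The bounded queue applies backpressure: if the writer falls behind, the consumer blocks on put() instead of growing memory without limit.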
Latency: under 10 ms from tick receipt to output signal for the Python asyncio pipeline.
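A latency budget like this is easiest to check by timing the per-tick handler and looking at tail percentiles rather than the mean. The sketch below is one way to do that; the function names are hypothetical, the handler is whatever callable processes a single tick, and network and database time are not included in what it measures.

```python
import math
import time
from typing import Any, Callable, Dict, Iterable, List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Nearest-rank percentile of an ascending list, with q in (0, 100]."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = max(1, math.ceil(q / 100.0 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def measure_latency_ms(
    handler: Callable[[Any], Any], ticks: Iterable[Any]
) -> Dict[str, float]:
    """Time `handler` on each tick and summarize latency in milliseconds."""
    samples: List[float] = []
    for tick in ticks:
        start = time.perf_counter_ns()
        handler(tick)
        samples.append((time.perf_counter_ns() - start) / 1e6)
    samples.sort()
    return {
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": samples[-1],
    }
```

Comparing the p99 and max values against the 10 ms figure shows how often garbage collection pauses or slow feature code push individual ticks over budget.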
Develop a complete tick-data pipeline: WebSocket collector, ClickHouse storage, custom bar types, feature engineering, realtime ML inference.







