Multi-Exchange Data Aggregation System Development
Trading systems working simultaneously on multiple exchanges face a fundamental problem: each exchange has its own WebSocket API, its own data format, its own request frequency limitations, and its own behavioral quirks. An aggregator transforms this zoo into a single normalized stream.
Aggregator Architecture
The system is built on the fan-in principle: multiple data sources are collected into a single normalized stream.
Exchange Connectors — a separate module for each exchange. Responsible for WebSocket connection setup, subscribing to needed channels, handling reconnects and errors, parsing exchange raw format into normalized format.
Normalization Layer — transforms exchange-specific formats into a unified schema. Binance calls the best bid field b, Kraken — b too, but with different semantics. OKX uses nanoseconds for timestamp, Bitfinex — milliseconds.
Distribution Layer — publishes normalized events to the bus (Redis Streams, Kafka) for downstream consumers.
Normalized Format
Universal ticker event schema:
{
"exchange": "binance",
"symbol": "BTC/USDT",
"timestamp": 1704067200000,
"received_at": 1704067200045,
"bid": 43250.50,
"ask": 43251.00,
"last": 43250.75,
"volume_24h": 28450.123,
"open_24h": 42800.00
}
The received_at field — time of data receipt by the aggregator, different from exchange timestamp. The difference between them — network latency to the exchange, a useful monitoring metric.
Working with Rate Limits
Each exchange limits the number of requests. WebSocket connections are usually unlimited by message count, but there are limits on the number of subscriptions per connection (Binance: 1024 streams per connection) and the rate of sending subscription commands.
A proper connector manages the subscription queue considering these limits:
class ExchangeConnector:
MAX_SUBSCRIPTIONS_PER_CONN = 1000
SUBSCRIPTION_RATE_LIMIT = 10 # per second
async def subscribe_symbols(self, symbols: list[str]):
# Split into chunks by connection size
for chunk in chunks(symbols, self.MAX_SUBSCRIPTIONS_PER_CONN):
conn = await self.create_connection()
# Rate-limit subscriptions
async with self.rate_limiter:
await conn.subscribe(chunk)
Reconnection and Data Gaps
WebSocket connections break. Exchanges sometimes send "ping" and expect "pong" within a strictly defined time (Binance: 10 minutes without pong = disconnect). A proper connector:
- Automatically responds to ping-frames
- Tracks time of last message (heartbeat check)
- On disconnection — exponential backoff reconnect with jitter
- On recovery — resubscribes to all symbols
- Publishes
GAP_DETECTEDevent with time range of missing data
Downstream consumers must handle GAP events correctly, especially if using rolling aggregations.
Latency and Time Synchronization
When comparing prices across exchanges, time synchronization is critical. Server system time must be synchronized via NTP with 1–5ms precision. Most cloud providers provide accurate NTP, but this should be verified.
Exchanges have different network latencies — from 1ms (co-location) to 50–100ms for regular servers. For arbitrage strategies, accounting for this delay is important.
Data Quality Monitoring
| Metric | Description |
|---|---|
| Message rate | Messages per second per exchange/symbol |
| Latency (p50/p99) | Delay from exchange to aggregator |
| Gap rate | Number of data breaks per hour |
| Reconnect count | Reconnection frequency |
| Stale data alerts | Symbols without updates > X seconds |
Prometheus + Grafana — standard stack for this monitoring.
Libraries and Ready Solutions
CCXT Pro — WebSocket extension of CCXT with support for 50+ exchanges. A good starting point for prototyping, but production often requires custom connectors due to performance and specific requirements.
cryptofeed (Python) — specialized library for cryptocurrency feeds with support for 30+ exchanges, data normalization, and backends for Kafka, Redis, RabbitMQ, PostgreSQL.
For high-performance systems (< 1ms latency), write connectors from scratch in Rust or Go.







