Market data replication system development

Data replication reliably distributes a market data feed from primary sources (exchanges) to multiple consumers in different geographical locations or isolated environments. It is more than simple copying: the data stream must stay consistent, fault tolerant, and scalable.

Why Replication is Needed

Trading systems often have multiple components operating in different environments:

  • Production trading: servers close to the exchanges (co-located or otherwise low-latency)
  • Research/backtesting: data centers with large storage volumes
  • Risk management: protected networks with restricted access
  • Analytics dashboards: accessible to a broad audience

Each environment should receive identical data without overloading the source.

Replication Topologies

Hub-and-Spoke: one primary aggregator collects data from exchanges, and multiple replica nodes subscribe to it. It is simple to implement, but the hub is a single point of failure.

Chain Replication: data passes along a chain (exchange → primary → secondary → tertiary). Each node feeds only one downstream node, so load on the primary stays low, but latency accumulates with every hop.

Pub-Sub via Kafka: the primary writes to Kafka, and any number of consumer groups read independently. This is the most flexible option for production.
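The latency trade-off between the first two topologies can be made concrete with a little arithmetic. The sketch below assumes a uniform, hypothetical per-hop latency of 2 ms; the number is illustrative, not a measurement.

```python
# Hypothetical per-hop latency in milliseconds (illustrative, not measured).
HOP_MS = 2.0


def hub_and_spoke_latency(replicas: int, hop_ms: float = HOP_MS) -> list:
    """Every replica is one hop from the hub, so all replicas see the same latency."""
    return [hop_ms] * replicas


def chain_latency(replicas: int, hop_ms: float = HOP_MS) -> list:
    """The k-th node in the chain is k hops from the source, so latency accumulates."""
    return [hop_ms * (k + 1) for k in range(replicas)]


print(hub_and_spoke_latency(3))  # [2.0, 2.0, 2.0]
print(chain_latency(3))          # [2.0, 4.0, 6.0]
```

Under these assumptions the last node in a three-node chain waits three times as long as any hub-and-spoke replica, which is why chains are rarely used for latency-sensitive consumers.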

Kafka as Replication Backbone

Kafka is well suited to market data replication:

  • Durability: data is stored on disk, and consumers can replay any period still within the retention window
  • Scalability: horizontal scaling through partitioning
  • Consumer Groups: different consumer groups read the same topic independently
  • Exactly-once processing: available through transactions; idempotent producers alone only prevent duplicates caused by producer retries

An example topic layout:

Topic: market.trades.binance.BTCUSDT
  Partition 0: trades (all, ordered by time)

Topic: market.orderbook.binance.BTCUSDT
  Partition 0: snapshots + diffs (ordered by update_id)

Topic: market.candles.binance.BTCUSDT.1m
  Partition 0: 1-minute OHLCV (ordered by candle time)

Topic naming: market.{data_type}.{exchange}.{symbol}, with an .{interval} suffix for candle topics. The fixed field order makes it easy to filter for the data you need.
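A naming convention like this is easy to encode in a small helper. The sketch below follows the topic names in the listing above, including their market. prefix; the function names topic_name and topic_pattern are hypothetical and not part of any Kafka client.

```python
import re
from typing import Optional


def topic_name(data_type: str, exchange: str, symbol: str,
               interval: Optional[str] = None) -> str:
    """Build a topic name such as market.trades.binance.BTCUSDT."""
    parts = ["market", data_type, exchange, symbol]
    if interval is not None:  # only candle topics carry an interval
        parts.append(interval)
    return ".".join(parts)


def topic_pattern(data_type: Optional[str] = None,
                  exchange: Optional[str] = None,
                  symbol: Optional[str] = None) -> "re.Pattern":
    """Regex matching topics for the given fields; fields left as None match anything."""
    def field(value: Optional[str]) -> str:
        return r"[^.]+" if value is None else re.escape(value)

    return re.compile(
        rf"^market\.{field(data_type)}\.{field(exchange)}\.{field(symbol)}(\.[^.]+)?$"
    )


topics = [
    topic_name("trades", "binance", "BTCUSDT"),
    topic_name("orderbook", "binance", "BTCUSDT"),
    topic_name("candles", "binance", "BTCUSDT", "1m"),
]
binance_trades = topic_pattern(data_type="trades", exchange="binance")
print([t for t in topics if binance_trades.match(t)])
# ['market.trades.binance.BTCUSDT']
```

Most Kafka clients also support pattern-based subscription, so a regex of this shape can usually be passed to the consumer rather than applied by hand.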

Delivery Guarantees

Market data systems typically use at-least-once delivery: it is better to receive duplicate messages than to lose data. Consumers must therefore be idempotent, deduplicating by trade_id or update_id.
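One way to make a consumer idempotent is to remember recently seen IDs and skip repeats. The following is a minimal in-memory sketch: the Deduplicator class, the bounded window size, and the dictionary message shape are assumptions for illustration, and a production consumer would more likely persist this state.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recent message IDs and reports whether an ID is new."""

    def __init__(self, max_ids: int = 100_000) -> None:
        self._seen = OrderedDict()  # insertion-ordered set of IDs
        self._max_ids = max_ids

    def is_new(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


def process(trades, dedup, sink):
    """Append each trade to sink once, even if it is delivered several times."""
    for trade in trades:
        if dedup.is_new(trade["trade_id"]):
            sink.append(trade)


delivered = [
    {"trade_id": "t1", "price": 100.0},
    {"trade_id": "t2", "price": 101.0},
    {"trade_id": "t1", "price": 100.0},  # redelivered duplicate
]
stored = []
process(delivered, Deduplicator(), stored)
print([t["trade_id"] for t in stored])  # ['t1', 't2']
```

The bounded window keeps memory constant, at the cost of missing a duplicate that arrives after more than max_ids newer messages.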

Critical components (risk management, position accounting) can use exactly-once semantics via Kafka transactions: idempotent producers with acks=all on the write side, and consumers that read with isolation.level=read_committed.
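The settings involved can be sketched as plain configuration dictionaries. The keys below are standard Kafka client properties in the form used by librdkafka-based clients; the broker address, transactional.id, and group.id values are placeholders invented for this example.

```python
# Producer side: idempotence plus a transactional.id enables transactions.
PRODUCER_CONFIG = {
    "bootstrap.servers": "kafka-1:9092",         # placeholder address
    "enable.idempotence": True,                  # no duplicates from producer retries
    "acks": "all",                               # wait for all in-sync replicas
    "transactional.id": "risk-engine-writer-1",  # placeholder, must be stable per instance
}

# Consumer side: only read messages from committed transactions.
CONSUMER_CONFIG = {
    "bootstrap.servers": "kafka-1:9092",         # placeholder address
    "group.id": "risk-engine",                   # placeholder group
    "isolation.level": "read_committed",
    "enable.auto.commit": False,                 # offsets are committed inside the transaction
}

# With the confluent-kafka Python client (an assumption, not required by the
# text), the consume-transform-produce loop would roughly be:
#   producer.init_transactions()
#   producer.begin_transaction()
#   producer.produce(...)
#   producer.send_offsets_to_transaction(offsets, consumer.consumer_group_metadata())
#   producer.commit_transaction()
print(PRODUCER_CONFIG["acks"], CONSUMER_CONFIG["isolation.level"])
```

Committing the consumer offsets inside the producer's transaction is what ties reading and writing into one atomic step.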

Cross-Datacenter Replication

Kafka MirrorMaker 2 replicates topics between clusters. For example, a cluster in the EU can receive replicas of all market.* topics with a small delay (typically 50–200 ms for transatlantic replication).
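One detail worth planning for: with MirrorMaker 2's default replication policy, a replicated topic is renamed on the target cluster by prefixing the source cluster alias. The helper below only mirrors that naming rule; the alias "us" is a hypothetical cluster name.

```python
def remote_topic(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name of a replicated topic on the target cluster under the default policy."""
    return f"{source_alias}{separator}{topic}"


print(remote_topic("us", "market.trades.binance.BTCUSDT"))
# us.market.trades.binance.BTCUSDT
```

Consumers on the replica cluster need to subscribe to the prefixed names, unless the deployment is configured with a policy that keeps topic names unchanged.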

Retention and Compression Management

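Market data accumulates rapidly. Retention is best set per topic, since tick data, candles, and order books need different policies (broker-wide defaults use the log.* prefix, for example log.retention.hours):

# Tick data: keep 7 days, then delete
retention.ms=604800000

# Daily OHLCV: no time limit, cap the size instead (10 GB per partition)
retention.ms=-1
retention.bytes=10737418240

# Order book: log compaction keeps only the latest record per key,
# so records must be keyed by price level
cleanup.policy=compact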

Kafka-level compression: compression.type=zstd typically shrinks market data by 40–70% at modest CPU cost. The actual ratio depends on the message format and batch size.
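Retention and compression together determine how much disk a topic needs. The estimate below is a back-of-the-envelope sketch; the message rate, message size, compression saving, and replication factor are hypothetical inputs, not figures from a real feed.

```python
def retained_bytes(msgs_per_sec: float, avg_msg_bytes: float, retention_hours: float,
                   compression_saving: float = 0.0, replication_factor: int = 1) -> float:
    """Approximate disk usage once the retention window is full."""
    raw = msgs_per_sec * avg_msg_bytes * retention_hours * 3600
    return raw * (1.0 - compression_saving) * replication_factor


# 5,000 msg/s of 200-byte ticks, kept for 7 days (168 hours),
# with a 50% compression saving and replication factor 3.
size = retained_bytes(5000, 200, 168, compression_saving=0.5, replication_factor=3)
print(f"{size / 1e9:.1f} GB")  # 907.2 GB
```

The estimate ignores index files and segment rounding, so it is a lower bound rather than a capacity plan.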

Replication Monitoring

Key metrics:

Metric                       Shows
Consumer lag                 How far a consumer is behind the producer
Replication latency          Delay between primary and replica clusters
Producer send rate           Publishing speed (messages/sec)
Bytes in/out rate            Throughput
Under-replicated partitions  Partitions with fewer in-sync replicas than configured

The same lag calls for different responses depending on the consumer: consumer lag above N minutes is a critical alert for a trading bot, but only a warning for an analytics system.
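The alerting rule can be expressed as a small function. This is a sketch under stated assumptions: the consumer kinds, the severity labels, and the 120-second threshold in the example are hypothetical, since the text leaves N open.

```python
# Severity raised when lag exceeds the threshold, per kind of consumer.
SEVERITY_BY_CONSUMER = {
    "trading_bot": "critical",
    "analytics": "warning",
}


def lag_alert(consumer_kind: str, lag_seconds: float, threshold_seconds: float) -> str:
    """Return 'ok' while lag is within the threshold, else the consumer's severity."""
    if lag_seconds <= threshold_seconds:
        return "ok"
    # Unknown consumer kinds default to a warning rather than being ignored.
    return SEVERITY_BY_CONSUMER.get(consumer_kind, "warning")


print(lag_alert("trading_bot", 300, threshold_seconds=120))  # critical
print(lag_alert("analytics", 300, threshold_seconds=120))    # warning
print(lag_alert("trading_bot", 30, threshold_seconds=120))   # ok
```

Kafka reports lag in offsets, so a time-based lag like this one is usually derived by comparing the timestamp of the last consumed record with the current time.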

Schema Registry and Format Compatibility

As schemas evolve (fields are added, types change), existing consumers must not break. Confluent Schema Registry with Avro supports schema evolution with compatibility checks: each new schema version is validated against earlier ones before it is registered, so backward-compatible changes such as adding optional fields with defaults are accepted and incompatible ones are rejected.
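To illustrate what a backward-compatible change looks like, the sketch below defines two versions of a hypothetical Avro trade schema as Python dictionaries and applies one simplified rule: every field added in the new version must have a default. The schema and its field names (other than trade_id) are assumptions, and the check covers only this one rule, not the full Avro resolution logic that Schema Registry applies.

```python
TRADE_V1 = {
    "type": "record",
    "name": "Trade",
    "fields": [
        {"name": "trade_id", "type": "string"},
        {"name": "price", "type": "double"},
        {"name": "quantity", "type": "double"},
    ],
}

# Version 2 adds an optional field: a union with null first and a null default.
TRADE_V2 = {
    "type": "record",
    "name": "Trade",
    "fields": TRADE_V1["fields"] + [
        {"name": "is_buyer_maker", "type": ["null", "boolean"], "default": None},
    ],
}


def added_fields_without_default(old: dict, new: dict) -> list:
    """Fields added in `new` that lack a default, which would break reading old data."""
    old_names = {f["name"] for f in old["fields"]}
    return [
        f["name"]
        for f in new["fields"]
        if f["name"] not in old_names and "default" not in f
    ]


print(added_fields_without_default(TRADE_V1, TRADE_V2))  # []

# A required field without a default is flagged by the same check.
broken = {**TRADE_V1, "fields": TRADE_V1["fields"] + [{"name": "venue", "type": "string"}]}
print(added_fields_without_default(TRADE_V1, broken))  # ['venue']
```

With the default in place, a consumer using version 2 can still read records written with version 1, filling the missing field with null.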