Market data replication system development

We design and develop full-cycle blockchain solutions: from smart contract architecture to launching DeFi protocols, NFT marketplaces and crypto exchanges. Security audits, tokenomics, integration with existing infrastructure.

8+Years of workmore info 900+Completed projectsmore info 100+In house employeesmore info 19+Partnersmore info

Offered services

Showing 1 of 1 servicesAll 1306 services

Complex

~1-2 weeks

FAQ

Blockchain Development Services

Discuss your blockchain project

Free consultation — we will show how blockchain can solve your challenge

Get a quote

We will estimate the budget and timeline for your blockchain project

Blockchain Development Stages

Latest works

Development of a web application for FEEDME
1170
Development of an online store for the company FURNORO
1092
B2B Advance company logo design
563
Development of a web application for Enviok
830
AIDER company logo development
763
CRM development for Chasseurs
878

Show more works

Market Data Replication System Development

Data replication solves the task of reliable distribution of market feed from primary sources (exchanges) to multiple consumers in different geographical locations or isolated environments. This is not simple copying — it's ensuring consistency, fault tolerance, and scalability of the data stream.

Why Replication is Needed

Trading systems often have multiple components operating in different environments:

Production trading — on servers close to exchanges (co-location or low latency)
Research/backtesting — in data centers with large storage volumes
Risk management — in protected networks with restricted access
Analytics dashboards — accessible to broad audience

Each environment should receive identical data without overloading the source.

Replication Topologies

Hub-and-Spoke — one primary aggregator collects data from exchanges, multiple replica nodes subscribe to it. Simple implementation, single point of failure on hub.

Chain Replication — data passes through a chain: exchange → primary → secondary → tertiary. Low load on primary source, high cumulative latency.

Pub-Sub via Kafka — primary writes to Kafka, any number of consumer groups read independently. Most flexible option for production.

Kafka as Replication Backbone

Kafka is ideal for market data replication:

Durability — data stored on disk, consumers can replay any historical period
Scalability — horizontal scaling through partitioning
Consumer Groups — different consumers read the same topic independently
Exactly-once delivery — with proper idempotent producer configuration

Topic: market.trades.binance.BTCUSDT
  Partition 0: trades (all, ordered by time)

Topic: market.orderbook.binance.BTCUSDT
  Partition 0: snapshots + diffs (ordered by update_id)

Topic: market.candles.binance.BTCUSDT.1m
  Partition 0: 1-minute OHLCV (ordered by candle time)

Topic naming: {data_type}.{exchange}.{symbol}.{interval} — allows easy filtering of needed data.

Delivery Guarantees

Market data systems typically use at-least-once delivery: better to get duplicate messages than lose data. Consumers are idempotent: deduplication by trade_id or update_id.

For critical components (risk management, position accounting) — exactly-once via Kafka Transactions with idempotent producers and acks: all configuration.

Cross-Datacenter Replication

Kafka MirrorMaker 2 replicates topics between clusters, enabling EU clusters to receive replicas of all market.* topics with small delay (typically 50–200ms for transatlantic replication).

Retention and Compression Management

Market data accumulates rapidly. Kafka retention policies:

# For tick-data: 7 days, then delete
log.retention.hours=168

# For daily OHLCV: infinite (use size limit)
log.retention.bytes=10737418240  # 10 GB per partition

# Log compaction for order book: keep only latest state per price level
log.cleanup.policy=compact

Kafka-level compression: compression.type=zstd compresses market data by 40–70% without significant CPU overhead.

Replication Monitoring

Key metrics:

Metric	Shows
Consumer lag	Consumer delay behind producer
Replication latency	Delay between primary and replica clusters
Producer send rate	Publishing speed (messages/sec)
Bytes in/out rate	Throughput
Under-replicated partitions	Partitions with insufficient replication

Consumer lag > N minutes for trading bot — critical alert. For analytics system — warning.

Schema Registry and Format Compatibility

With evolving schemas (adding fields, changing types), preventing consumer breakage is important. Confluent Schema Registry + Avro provide schema evolution with compatibility checks, supporting backward compatible changes like optional fields with defaults.