Historical Order Data Storage Development
Order history is not just an archive. It's the primary data source for execution efficiency analysis, backtesting, commission calculation, tax reporting, and trading strategy audit. Properly designed storage should provide high write speed in real-time and analytical queries across multiple dimensions on years of data.
Order Data Structure
An order in a trading system is not simply a record "bought 1 BTC at 50000". The complete model includes several types of events:
Order Lifecycle Events:
-
ORDER_CREATED— initial creation with parameters -
ORDER_ACCEPTED— exchange confirmed receipt -
ORDER_PARTIALLY_FILLED— partial execution -
ORDER_FILLED— complete execution -
ORDER_CANCELLED— cancellation (with reason) -
ORDER_REJECTED— exchange rejection (with error code) -
ORDER_EXPIRED— timeout expired
Storing these events separately (event sourcing) provides complete reproducibility: you can always recover the state of any order at any point in time.
Database Schema
For order time series, TimescaleDB (PostgreSQL extension) or ClickHouse are optimal.
TimescaleDB — a good choice if you already use PostgreSQL. Automatically partitions tables by time (hypertables), supports continuous aggregations, and compression policies.
CREATE TABLE order_events (
event_id UUID DEFAULT gen_random_uuid(),
event_time TIMESTAMPTZ NOT NULL,
order_id UUID NOT NULL,
exchange VARCHAR(32) NOT NULL,
symbol VARCHAR(32) NOT NULL,
event_type VARCHAR(32) NOT NULL,
side VARCHAR(8),
order_type VARCHAR(16),
price NUMERIC(24, 8),
quantity NUMERIC(24, 8),
filled_qty NUMERIC(24, 8),
avg_fill_price NUMERIC(24, 8),
commission NUMERIC(24, 8),
commission_asset VARCHAR(16),
client_order_id VARCHAR(64),
strategy_id VARCHAR(64),
metadata JSONB
);
SELECT create_hypertable('order_events', 'event_time',
chunk_time_interval => INTERVAL '1 day');
Indexes for typical query patterns:
CREATE INDEX ON order_events (order_id, event_time DESC);
CREATE INDEX ON order_events (exchange, symbol, event_time DESC);
CREATE INDEX ON order_events (strategy_id, event_time DESC);
CREATE INDEX ON order_events USING GIN (metadata);
Query Patterns and Optimization
Order state recovery — a frequent operation. Instead of recomputing from events each time, maintain a materialized orders table with current state:
CREATE TABLE orders (
order_id UUID PRIMARY KEY,
exchange VARCHAR(32) NOT NULL,
symbol VARCHAR(32) NOT NULL,
status VARCHAR(32) NOT NULL,
side VARCHAR(8) NOT NULL,
order_type VARCHAR(16) NOT NULL,
price NUMERIC(24, 8),
quantity NUMERIC(24, 8),
filled_qty NUMERIC(24, 8) DEFAULT 0,
avg_fill_price NUMERIC(24, 8),
total_commission NUMERIC(24, 8) DEFAULT 0,
strategy_id VARCHAR(64),
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
);
Table updates occur on each new event through a trigger or application-side logic.
Analytical queries — typically aggregations by strategy, instrument, period:
-- P&L by strategies for period
SELECT
strategy_id,
symbol,
SUM(CASE WHEN side = 'BUY' THEN -filled_qty * avg_fill_price ELSE filled_qty * avg_fill_price END) as realized_pnl,
SUM(total_commission) as total_fees,
COUNT(*) as order_count
FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
AND status = 'FILLED'
GROUP BY strategy_id, symbol
ORDER BY realized_pnl DESC;
TimescaleDB continuous aggregates allow pre-computing these aggregations and updating them incrementally:
CREATE MATERIALIZED VIEW orders_daily_summary
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', created_at) AS day,
exchange,
symbol,
strategy_id,
COUNT(*) AS order_count,
SUM(filled_qty * avg_fill_price) AS volume,
SUM(total_commission) AS fees
FROM orders
WHERE status = 'FILLED'
GROUP BY 1, 2, 3, 4;
Storing Fills Separately
For detailed execution quality analysis, it's critical to store individual fills (partial executions) separately from orders:
CREATE TABLE order_fills (
fill_id UUID DEFAULT gen_random_uuid(),
fill_time TIMESTAMPTZ NOT NULL,
order_id UUID NOT NULL REFERENCES orders(order_id),
exchange VARCHAR(32) NOT NULL,
symbol VARCHAR(32) NOT NULL,
price NUMERIC(24, 8) NOT NULL,
quantity NUMERIC(24, 8) NOT NULL,
commission NUMERIC(24, 8),
commission_asset VARCHAR(16),
is_maker BOOLEAN,
trade_id VARCHAR(64)
);
SELECT create_hypertable('order_fills', 'fill_time',
chunk_time_interval => INTERVAL '1 day');
This allows calculating execution VWAP, comparing with mid-price at execution time (market impact), and analyzing maker/taker ratio by strategy.
Retention Policies and Archiving
Hot data (last 30 days) is stored without compression for maximum write and read speed. Older data is compressed:
ALTER TABLE order_events SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'exchange, symbol',
timescaledb.compress_orderby = 'event_time DESC'
);
SELECT add_compression_policy('order_events',
INTERVAL '30 days');
TimescaleDB compression provides 10–20x size reduction for time series with repetitive values. In practice, an order_events table of 100GB compresses to 5–10GB.
Data older than 2 years can be exported to Parquet files on S3 using pg_parquet or custom ETL, while maintaining historical analysis capability through Athena or ClickHouse.
Ingestion Pipeline
High-frequency writes require batching. Instead of INSERT per event:
COPY for bulk inserts — 10–50x faster than individual INSERTs. Accumulate events in memory (100ms or 1000 events, whichever comes first) and write with a single COPY.
Unlogged tables for intermediate buffer — WAL is not written, write speed is much higher. After aggregation, data is moved to the main logged table.
Connection pooling through PgBouncer — trading systems open many short connections. PgBouncer in transaction mode allows serving thousands of clients through a small pool of actual PostgreSQL connections.
Monitoring and Alerts
Key metrics to monitor storage:
| Metric | Normal | Alert |
|---|---|---|
| Write latency (p99) | < 10ms | > 50ms |
| Query latency (p99) | < 100ms | > 500ms |
| Replication lag | < 1s | > 10s |
| Disk usage growth | Predictable | Anomalous growth |
| Failed inserts | 0 | Any |
Order loss is always a critical incident. The system must have a reconciliation mechanism: periodically compare local history with exchange data via REST API and fill gaps.
Replication and Fault Tolerance
Production storage operates in PostgreSQL streaming replication mode: primary for writes, replica for analytical queries. If primary fails — failover through Patroni or PgBouncer with automatic switching. RPO (Recovery Point Objective) with proper synchronous_commit configuration — zero.







