Development of AI-based Temporal Fusion Transformer Model for Markets
The Temporal Fusion Transformer (TFT) is an architecture developed by researchers at the University of Oxford and Google Cloud AI (Lim et al.) specifically for time series forecasting with heterogeneous input data. Unlike a vanilla Transformer, TFT explicitly processes three types of variables: static (unchanging over time), known future (known in advance) and past observed (available only up to the forecast origin).
What Makes TFT Special for Finance
Three categories of input variables:
| Type | Examples for market | Processing |
|---|---|---|
| Static covariates | Ticker, sector, market cap | Static embeddings |
| Known future | Earnings dates, FOMC dates, holidays | Future encoder |
| Past observed | Returns, volume, VIX, RSI | Past encoder |
This is fundamentally important: if the model knows that a Federal Reserve meeting is five days away, it should account for that when forecasting right now. TFT does this explicitly.
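As a concrete illustration, a known-future covariate such as "days until the next FOMC meeting" can be computed ahead of time from the published meeting calendar. A minimal stdlib sketch (the dates below are illustrative, not an actual FOMC schedule):

```python
from datetime import date

# Illustrative meeting calendar (not the real FOMC schedule)
fomc_dates = [date(2024, 1, 31), date(2024, 3, 20), date(2024, 5, 1)]

def days_to_next_fomc(d: date) -> int:
    """Days until the next scheduled meeting on or after date d."""
    return min((f - d).days for f in fomc_dates if f >= d)

print(days_to_next_fomc(date(2024, 1, 26)))  # 5: five days before the Jan 31 meeting
```

Because the calendar is published in advance, this column can be filled for future dates as well, which is exactly what makes it a valid `time_varying_known` feature.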
Variable Selection Network (VSN): Learnable weights for each input variable. Allows automatic filtering of irrelevant features and provides interpretability — which variables are actually important for the forecast.
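A minimal numpy sketch of the selection step (simplified: in the actual VSN the logits are themselves produced by a GRN from the flattened variable embeddings; here they are given):

```python
import numpy as np

def variable_selection(var_embeddings, logits):
    """Softmax over per-variable logits yields interpretable selection
    weights; the block output is the weighted sum of variable embeddings."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                  # weights sum to 1
    combined = (var_embeddings * w[:, None]).sum(axis=0)
    return w, combined

# three variables, embedding dim 4; equal logits give uniform weights
emb = np.ones((3, 4))
weights, out = variable_selection(emb, np.zeros(3))
```

The learned weights are what the interpretability tooling later reports as "variable importance".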
Gated Residual Network (GRN): Nonlinear processing with a gating mechanism that controls how much of the nonlinear transformation is applied (gate ≈ 0: near pass-through, gate ≈ 1: full nonlinear transform).
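A compact PyTorch sketch of the GRN (simplified relative to the paper: the original also supports an optional context input and differing input/output dimensions):

```python
import torch
import torch.nn as nn

class GatedResidualNetwork(nn.Module):
    """GRN sketch: x -> ELU(W1 x) -> GLU(W2 .) -> LayerNorm(x + .).
    The GLU's sigmoid gate scales the nonlinear branch; when the gate
    saturates near 0 the block degrades to a residual pass-through."""
    def __init__(self, d_model: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_model)
        self.fc2 = nn.Linear(d_model, 2 * d_model)  # doubled width for the GLU gate
        self.glu = nn.GLU(dim=-1)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = torch.nn.functional.elu(self.fc1(x))
        h = self.glu(self.fc2(h))   # half of fc2's output gates the other half
        return self.norm(x + h)     # residual connection + layer norm

grn = GatedResidualNetwork(64)
y = grn(torch.randn(32, 64))        # (batch, d_model) -> same shape
```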
Complete TFT Architecture
Static covariates → Static Covariate Encoders
                         ↓ (context vectors condition the blocks below)
Past observed   → LSTM encoder ──┐
                                 ├→ Multi-head Attention → GRN → Quantile Output
Known future    → LSTM decoder ──┘
Inside attention: temporal self-attention, where each forecast step can "look" at relevant history.
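The mechanism can be illustrated with a single-head numpy sketch (simplified relative to TFT's interpretable multi-head variant, which shares values across heads): a causal mask ensures each position attends only to itself and earlier steps.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal (look-back) mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    T = scores.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # positions after t
    scores = np.where(future, -np.inf, scores)          # mask out the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(6, 8))
out = causal_self_attention(q, k, v)
```

The first time step can only attend to itself, so its output equals its own value vector; later steps mix in whichever history positions score highest.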
Implementation for Market Data
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.metrics import QuantileLoss

# prepare_market_dataframe is a user-defined loader that returns a long-format
# DataFrame with date, ticker, time_idx, feature and target columns
data = prepare_market_dataframe(
    tickers=['AAPL', 'MSFT', ...],  # 100+ instruments
    start='2015-01-01'
)
training = TimeSeriesDataSet(
    data[data.date < '2022-01-01'],
    time_idx="time_idx",
    target="forward_5d_return",
    group_ids=["ticker"],
    max_encoder_length=126,   # ~6 months of trading days
    max_prediction_length=5,  # 5-day forecast horizon
    static_categoricals=["sector", "country"],
    static_reals=["log_market_cap", "beta"],
    # only covariates genuinely known in advance belong here
    time_varying_known_reals=["days_to_earnings", "fomc_flag"],
    time_varying_unknown_reals=[
        "return", "volume_ratio", "rsi", "atr_normalized",
        "momentum_12_1", "short_interest_ratio",
        "vix",  # VIX is observed, not known ahead of time
    ],
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.001,
    hidden_size=160,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=64,
    output_size=5,  # must match the number of quantiles below
    loss=QuantileLoss(quantiles=[0.1, 0.25, 0.5, 0.75, 0.9])
)
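QuantileLoss optimizes the pinball (quantile) loss for each requested quantile; a numpy sketch of the per-quantile term:

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Pinball loss: under-prediction is penalized by q and over-prediction
    by (1 - q); its minimizer is the q-th conditional quantile."""
    e = np.asarray(y) - np.asarray(y_hat)
    return np.mean(np.maximum(q * e, (q - 1) * e))

y = np.array([1.0, 2.0, 3.0])
median_fit = pinball_loss(y, np.array([2.0, 2.0, 2.0]), 0.5)  # 1/3
```

At q = 0.5 the loss reduces to half the mean absolute error, which is why the p50 output behaves like a median forecast.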
Training and Hyperparameters
Key hyperparameters:
- hidden_size: 64-256 (main model capacity)
- attention_head_size: 1-4
- max_encoder_length: 60-252 trading days (one quarter to one year)
- dropout: 0.05-0.3
Learning rate finding:
# trainer is a pytorch_lightning.Trainer; note that in Lightning >= 2.0 the
# tuner API moved to pytorch_lightning.tuner.Tuner(trainer).lr_find(...)
res = trainer.tuner.lr_find(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    max_lr=0.1
)
optimal_lr = res.suggestion()
Early stopping:
from pytorch_lightning.callbacks import EarlyStopping

early_stop_callback = EarlyStopping(
    monitor="val_loss", patience=10, mode="min"
)
Quantile Forecasts and Applications
TFT natively outputs quantile forecasts (p10, p25, p50, p75, p90). This is valuable for:
Risk-based position sizing:
point_forecast = forecasts['p50']
uncertainty = forecasts['p90'] - forecasts['p10']  # interquantile spread
# scale exposure up when expected return is large relative to uncertainty
position_size = base_size * point_forecast / uncertainty
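A hedged sketch of this sizing rule as a function (the leverage cap and the non-negativity guard are my additions for safety, not part of any standard rule):

```python
def quantile_position_size(base_size, p10, p50, p90, max_leverage=1.0):
    """Scale the position by forecast 'confidence': expected return (p50)
    divided by the p10-p90 interquantile spread, capped at max_leverage."""
    uncertainty = p90 - p10
    if uncertainty <= 0 or p50 <= 0:
        return 0.0                      # no long position without positive edge
    raw = base_size * p50 / uncertainty
    return min(raw, base_size * max_leverage)

size = quantile_position_size(1.0, -0.01, 0.02, 0.03)  # 0.02 / 0.04 -> 0.5
```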
Asymmetric return profiles: if p90 − p50 is much larger than p50 − p10, the forecast distribution is right-skewed and the upside potential exceeds the downside risk.
Interpretability: Variable Importance
# in recent pytorch_forecasting versions predict(..., return_x=True) returns a
# named tuple rather than a plain tuple; adjust the unpacking to your version
raw_predictions, x = tft.predict(val_dataloader, mode="raw", return_x=True)
interpretation = tft.interpret_output(raw_predictions, reduction="sum")
fig = tft.plot_interpretation(interpretation)
Example result: variable importance shows that momentum_12_1 (0.22), vix (0.18) and days_to_earnings (0.15) are the main predictors, while short_interest_ratio (0.04) is insignificant.
Attention pattern visualization: the model pays the most attention to points 5 and 20 trading days before the forecast origin, which corresponds to weekly and monthly momentum effects.
Benchmark Against Other Methods
On the M5 competition dataset (Walmart demand forecasting, 2020):
- TFT: RMSSE 0.1127 (top-10%)
- LightGBM: 0.1152
- DeepAR: 0.1189
- Prophet: 0.1402
TFT's advantage is especially pronounced when known future covariates and static features are available.
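For reference, RMSSE scales the forecast's MSE by the in-sample MSE of the naive one-step forecast; a minimal per-series implementation:

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    """Root Mean Squared Scaled Error (the M5 accuracy metric, per series):
    forecast MSE divided by the training MSE of the naive lag-1 forecast."""
    naive_mse = np.mean(np.diff(y_train) ** 2)
    errors = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(errors ** 2) / naive_mse)

score = rmsse([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [4.0, 7.0])  # 1.0
```

A score below 1.0 means the model beats the naive forecast on that series; the competition leaderboard used a weighted average (WRMSSE) across series.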
Timeline: a TFT baseline for 50+ instruments takes roughly 4-5 weeks; a full system with an earnings calendar, macro covariates and portfolio construction takes 3-4 months.