Temporal Fusion Transformer AI Model for Markets

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business settings.

Development of AI-based Temporal Fusion Transformer Model for Markets

The Temporal Fusion Transformer (TFT) is an architecture developed by researchers at the University of Oxford and Google specifically for time series forecasting with heterogeneous input data. Unlike a vanilla Transformer, TFT explicitly processes three types of variables: static (unchanging over time), known-future (known in advance), and past-observed (available only up to the forecast origin).

What Makes TFT Special for Finance

Three categories of input variables:

Type                Examples for markets                    Processing
Static covariates   Ticker, sector, market cap              Static embeddings
Known future        Earnings dates, FOMC dates, holidays    Future encoder
Past observed       Returns, volume, VIX, RSI               Past encoder

This is fundamentally important: if the model knows that a Federal Reserve meeting will occur in 5 days, it should account for that when forecasting right now. TFT does this explicitly.
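Known-future covariates of this kind can be derived from an event calendar ahead of time. A minimal pandas sketch, with illustrative dates and column names (not a real data feed):

```python
import pandas as pd

# Illustrative only: derive known-future covariates from an event calendar.
dates = pd.date_range("2024-01-02", periods=10, freq="B")   # business days
fomc_dates = pd.to_datetime(["2024-01-10"])                 # toy FOMC calendar

df = pd.DataFrame({"date": dates})

def days_to_next(d, events):
    """Days until the next scheduled event; -1 once the calendar runs out."""
    future = [e for e in events if e >= d]
    return (min(future) - d).days if future else -1

df["days_to_fomc"] = df["date"].apply(lambda d: days_to_next(d, fomc_dates))
df["fomc_flag"] = (df["days_to_fomc"] == 0).astype(int)
```

Because these values are known for future dates as well, they can be fed to the model for the forecast horizon itself, not just for the history window.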

Variable Selection Network (VSN): Learnable weights for each input variable. Allows automatic filtering of irrelevant features and provides interpretability — which variables are actually important for the forecast.

Gated Residual Network (GRN): Nonlinear processing with a gate mechanism controlling how much nonlinear transformation is applied (gate = 0: pass-through, gate = 1: full nonlinear).
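The gating idea can be sketched in a few lines of PyTorch. This is a simplified version relative to the paper's GRN (single hidden size, no context vector); layer shapes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Simplified GRN: nonlinear branch + GLU gate + residual + LayerNorm."""
    def __init__(self, d: int):
        super().__init__()
        self.fc1 = nn.Linear(d, d)
        self.fc2 = nn.Linear(d, d)
        self.glu = nn.Linear(d, 2 * d)   # half values, half sigmoid gates
        self.norm = nn.LayerNorm(d)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        eta = self.fc2(F.elu(self.fc1(a)))            # nonlinear transform
        value, gate = self.glu(eta).chunk(2, dim=-1)
        # sigmoid(gate) near 0: residual pass-through; near 1: full update
        return self.norm(a + torch.sigmoid(gate) * value)

out = GatedResidualNetwork(16)(torch.randn(4, 16))
```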

Complete TFT Architecture

Static covariates → Static Covariate Encoders
                         ↓
Past observed → LSTM encoder ─────────────┐
                                           ├→ Multi-head Attention → GRN → Quantile Output
Known future → LSTM decoder ──────────────┘

Inside attention: temporal self-attention, where each forecast step can "look" at relevant history.
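The temporal masking behind this can be sketched with toy attention weights. Lengths are shortened for illustration, and this uses a plain softmax rather than the model's interpretable multi-head attention:

```python
import numpy as np

rng = np.random.default_rng(0)
enc_len, dec_len = 6, 3                  # toy history / forecast lengths
total = enc_len + dec_len
scores = rng.normal(size=(total, total))

# Causal mask: a step may attend to itself and the past, never the future.
future = np.triu(np.ones((total, total), dtype=bool), k=1)
scores[future] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
# Each forecast step (rows enc_len..total-1) sees all of encoder history.
```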

Implementation for Market Data

from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.metrics import QuantileLoss

data = prepare_market_dataframe(    # custom data-loading helper (not shown)
    tickers=['AAPL', 'MSFT', ...],  # 100+ instruments
    start='2015-01-01'
)

training = TimeSeriesDataSet(
    data[data.date < '2022-01-01'],
    time_idx="time_idx",
    target="forward_5d_return",
    group_ids=["ticker"],
    max_encoder_length=126,     # 6 months history
    max_prediction_length=5,    # 5 days forecast
    static_categoricals=["sector", "country"],
    static_reals=["log_market_cap", "beta"],
    time_varying_known_reals=["days_to_earnings", "fomc_flag", "vix"],
    time_varying_unknown_reals=[
        "return", "volume_ratio", "rsi", "atr_normalized",
        "momentum_12_1", "short_interest_ratio"
    ],
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.001,
    hidden_size=160,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=64,
    loss=QuantileLoss(quantiles=[0.1, 0.25, 0.5, 0.75, 0.9])
)
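The QuantileLoss above minimises the pinball loss per quantile. A minimal numeric sketch of why that recovers conditional quantiles (numbers are illustrative):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss averaged over observations."""
    err = y_true - y_pred
    return float(np.mean(np.maximum(q * err, (q - 1) * err)))

y = np.array([1.0, 2.0, 3.0])

# Under-predicting by 1 is penalised 9x more at q=0.9 than at q=0.1,
# which pushes the p90 output upward toward the 90th percentile.
loss_hi = pinball_loss(y, y - 1.0, 0.9)   # 0.9
loss_lo = pinball_loss(y, y - 1.0, 0.1)   # 0.1
```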

Training and Hyperparameters

Key hyperparameters:

  • hidden_size: 64-256 (main model capacity)
  • attention_head_size: 1-4
  • max_encoder_length: 60-252 trading days (one quarter to one year)
  • dropout: 0.05-0.3

Learning rate finding:

# pytorch-lightning < 2.0 API; in 2.x, use Tuner(trainer).lr_find(...)
res = trainer.tuner.lr_find(
    tft, train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    max_lr=0.1
)
optimal_lr = res.suggestion()

Early stopping:

from pytorch_lightning.callbacks import EarlyStopping
early_stop_callback = EarlyStopping(
    monitor="val_loss", patience=10, mode="min"
)

Quantile Forecasts and Applications

TFT natively outputs quantile forecasts (p10, p25, p50, p75, p90). This is valuable for:

Risk-based position sizing:

point_forecast = forecasts['p50']
uncertainty = forecasts['p90'] - forecasts['p10']
position_size = base_size * point_forecast / uncertainty  # shrink when uncertain

Asymmetric return profiles: if p90 − p50 ≫ p50 − p10, the distribution is right-skewed and upside potential exceeds downside risk.
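A worked toy example of both applications, with made-up quantile values for one instrument's 5-day-ahead return:

```python
# Toy quantile forecasts (made up for illustration).
q = {"p10": -0.01, "p25": 0.0, "p50": 0.01, "p75": 0.03, "p90": 0.06}

uncertainty = q["p90"] - q["p10"]          # width of the 80% interval
base_size = 1.0
# Wide interval relative to the point forecast -> small bet (about 0.14 here).
position_size = base_size * q["p50"] / uncertainty

# Asymmetry check: upside tail longer than downside tail -> right skew.
right_skewed = (q["p90"] - q["p50"]) > (q["p50"] - q["p10"])
```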

Interpretability: Variable Importance

raw_predictions, x = tft.predict(val_dataloader, mode="raw", return_x=True)
interpretation = tft.interpret_output(raw_predictions, reduction="sum")
fig = tft.plot_interpretation(interpretation)

Example result: variable importance shows that momentum_12_1 (0.22), vix (0.18), and days_to_earnings (0.15) are the main predictors, while short_interest_ratio (0.04) contributes little.

Attention pattern visualization: the model attends most strongly to points 5 and 20 trading days before the forecast origin, consistent with weekly and monthly momentum effects.

Benchmark Against Other Methods

On the M5 competition dataset (Walmart demand forecasting, 2020):

  • TFT: RMSSE 0.1127 (top-10%)
  • LightGBM: 0.1152
  • DeepAR: 0.1189
  • Prophet: 0.1402

TFT advantage is especially pronounced with known future covariates and static features.

Timeline: a TFT baseline for 50+ instruments takes 4-5 weeks. A full system with an earnings calendar, macro covariates, and portfolio construction takes 3-4 months.