Crypto Bot Monitoring Setup
A trading or arbitrage bot is a stateful process that runs 24/7 with real money at stake. Without monitoring, the first failure turns into a loss you discover the next morning. A good monitoring system catches the problem before it becomes expensive.
What We Monitor
Process health: the bot is alive and its heartbeat arrives regularly. This is the most basic level: a /healthz endpoint checked every minute.
On-chain state: wallet balances, no transaction pending for more than 10 minutes (a stuck nonce blocks every later transaction from the same wallet), gas price within the allowed range.
Exchange state: the exchange API responds, the WebSocket is connected, and open orders are current (no orders older than N hours that are not being filled).
P&L metrics: realized and unrealized P&L, daily drawdown. Alert if drawdown > X% of capital.
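The on-chain and exchange checks above mostly reduce to age comparisons. A minimal sketch, assuming the bot already tracks its pending transactions and open orders in memory; the `PendingTx` and `OpenOrder` shapes, the function names, and treating "not being filled" as "nothing filled yet" are all assumptions for illustration:

```typescript
// Hypothetical in-memory records; a real bot would fill these from its own
// transaction tracker and from the exchange's open-orders endpoint.
interface PendingTx {
  hash: string;
  nonce: number;
  sentAtMs: number; // when the transaction was broadcast
}

interface OpenOrder {
  id: string;
  createdAtMs: number;
  filledQty: number;
}

const STUCK_TX_MS = 10 * 60 * 1000; // 10 minutes, as in the checklist

// Age of the oldest pending transaction in seconds (0 if none).
// This is the kind of value a bot_pending_tx_age_seconds gauge would export.
function oldestPendingAgeSeconds(txs: PendingTx[], nowMs: number): number {
  if (txs.length === 0) return 0;
  const oldest = Math.min(...txs.map((t) => t.sentAtMs));
  return (nowMs - oldest) / 1000;
}

// Transactions pending longer than the threshold, lowest nonce first:
// the lowest stuck nonce is the one blocking all later ones.
function stuckTransactions(txs: PendingTx[], nowMs: number): PendingTx[] {
  return txs
    .filter((t) => nowMs - t.sentAtMs > STUCK_TX_MS)
    .sort((a, b) => a.nonce - b.nonce);
}

// Orders older than maxAgeHours on which nothing has been filled yet (assumed definition).
function staleOrders(orders: OpenOrder[], nowMs: number, maxAgeHours: number): OpenOrder[] {
  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
  return orders.filter((o) => nowMs - o.createdAtMs > maxAgeMs && o.filledQty === 0);
}
```

Returning the lowest stuck nonce first matters because replacing that one transaction (same nonce, higher gas) is usually what unblocks the rest of the queue.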
Monitoring Stack
Prometheus + Grafana is the standard stack for production bots. The bot exports its metrics with prom-client:
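import { createServer } from "node:http";
import { Counter, Gauge, Registry } from "prom-client";
const register = new Registry();
const tradesTotal = new Counter({
  name: "bot_trades_total",
  help: "Total number of trades executed",
  labelNames: ["side", "symbol"],
  registers: [register],
});
const walletBalance = new Gauge({
  name: "bot_wallet_balance_usd",
  help: "Current wallet balance in USD",
  registers: [register],
});
const lastHeartbeat = new Gauge({
  name: "bot_last_heartbeat_timestamp",
  help: "Unix timestamp of last successful loop iteration",
  registers: [register],
});
// The alerting rules below also expect these two gauges.
const dailyDrawdownPct = new Gauge({ name: "bot_daily_drawdown_pct", help: "Daily drawdown as a percentage of capital", registers: [register] });
const pendingTxAge = new Gauge({ name: "bot_pending_tx_age_seconds", help: "Age of the oldest pending transaction in seconds", registers: [register] });
// Serve the registry at /metrics so Prometheus can scrape it (port 9464 is an arbitrary choice).
createServer(async (req, res) => {
  if (req.url !== "/metrics") {
    res.statusCode = 404;
    res.end();
    return;
  }
  res.setHeader("Content-Type", register.contentType);
  res.end(await register.metrics());
}).listen(9464);
// In main bot loop, after each successful iteration:
lastHeartbeat.setToCurrentTime();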
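The /healthz endpoint from the checklist can be driven by the same "last successful loop" idea as the heartbeat gauge. A minimal sketch using only node:http; `markLoopIteration`, `startHealthServer`, the 2-minute threshold and port 8080 are assumptions, not part of the original setup:

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";

// Hypothetical module-level state, updated by the main loop after every successful iteration.
let lastLoopAtMs = Date.now();

function markLoopIteration(nowMs: number = Date.now()): void {
  lastLoopAtMs = nowMs;
}

// Healthy means the loop completed an iteration within the last maxAgeMs.
function isHealthy(lastMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - lastMs <= maxAgeMs;
}

const MAX_LOOP_AGE_MS = 120_000; // assumed threshold: 2 minutes

function healthzHandler(req: IncomingMessage, res: ServerResponse): void {
  if (req.url !== "/healthz") {
    res.statusCode = 404;
    res.end();
    return;
  }
  const now = Date.now();
  const ok = isHealthy(lastLoopAtMs, now, MAX_LOOP_AGE_MS);
  // 503 tells the external checker the process is up but its loop has stalled.
  res.statusCode = ok ? 200 : 503;
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ ok, lastLoopAgeMs: now - lastLoopAtMs }));
}

// Not started automatically; call once at bot startup.
function startHealthServer(port = 8080) {
  return createServer(healthzHandler).listen(port);
}
```

In the main loop, `markLoopIteration()` would be called right next to `lastHeartbeat.setToCurrentTime()`, so the Prometheus heartbeat and the /healthz check share one notion of liveness.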
Grafana dashboard with key panels: balances over time, trades per hour, P&L over time, exchange request latency.
Prometheus evaluates the alerting rules, and Alertmanager routes the fired alerts to notification channels:
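# alerting-rules.yml
groups:
  - name: bot
    rules:
      - alert: BotHeartbeatMissed
        expr: time() - bot_last_heartbeat_timestamp > 300
        for: 2m
        annotations:
          summary: "Bot heartbeat missed for 5+ minutes"
      # If the process dies, Prometheus can no longer scrape it and the heartbeat
      # series goes stale, so the rule above returns nothing instead of firing.
      - alert: BotMetricsAbsent
        expr: absent(bot_last_heartbeat_timestamp)
        for: 2m
        annotations:
          summary: "Bot metrics missing: process down or not being scraped"
      - alert: DrawdownExceeded
        expr: bot_daily_drawdown_pct > 5
        annotations:
          summary: "Daily drawdown exceeded 5%"
      - alert: PendingTransactionStuck
        expr: bot_pending_tx_age_seconds > 600
        annotations:
          summary: "Transaction stuck in pending for 10+ minutes"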
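The DrawdownExceeded rule needs the bot to export `bot_daily_drawdown_pct`. One way to compute it, sketched under the assumption that "daily drawdown" means the percentage drop from the highest equity seen since 00:00 UTC (some setups measure from the day's opening equity instead); `DailyDrawdownTracker` is a hypothetical name:

```typescript
// Tracks intraday peak equity and reports the current drop from that peak, in percent.
// Assumption: the trading "day" resets at 00:00 UTC.
class DailyDrawdownTracker {
  private day = "";
  private peak = 0;

  // Feed the latest total equity (realized + unrealized, in USD); returns drawdown in %.
  update(equityUsd: number, now: Date = new Date()): number {
    const today = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    if (today !== this.day) {
      // New day: the first observation becomes the starting peak.
      this.day = today;
      this.peak = equityUsd;
    }
    this.peak = Math.max(this.peak, equityUsd);
    if (this.peak <= 0) return 0;
    return ((this.peak - equityUsd) / this.peak) * 100;
  }
}
```

In the main loop, the returned value would be set on the `bot_daily_drawdown_pct` gauge each iteration. Measuring from the intraday peak is stricter than measuring from the opening balance, because it also flags a bot that gives back its intraday gains.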
Telegram Notifications
For immediate alerts, the Telegram Bot API is the simplest option:
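// Assumed environment variable names; adjust to your deployment.
const BOT_TOKEN = process.env.TELEGRAM_BOT_TOKEN!;
const ALERT_CHAT_ID = process.env.TELEGRAM_ALERT_CHAT_ID!;
// HTML parse mode only needs &, < and > escaped; legacy Markdown rejects messages
// containing unbalanced "_" or "*" (e.g. in error text), and the alert is lost.
const escapeHtml = (s: string) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
async function sendAlert(message: string, level: "info" | "warn" | "critical") {
  const emoji = { info: "ℹ️", warn: "⚠️", critical: "🚨" }[level];
  const res = await fetch(`https://api.telegram.org/bot${BOT_TOKEN}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      chat_id: ALERT_CHAT_ID,
      text: `${emoji} <b>Bot Alert</b>\n\n${escapeHtml(message)}`,
      parse_mode: "HTML",
    }),
  });
  // fetch() does not reject on HTTP errors, so surface them explicitly.
  if (!res.ok) {
    throw new Error(`Telegram sendMessage failed: ${res.status} ${await res.text()}`);
  }
}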
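A flapping condition, such as an exchange API that fails every few seconds, can flood the chat and bury the alert that matters. A minimal cooldown sketch keyed by a caller-chosen alert key; `AlertThrottle` and the 10-minute default are assumptions:

```typescript
// Suppresses repeats of the same alert key within a cooldown window and reports
// how many were suppressed once the alert is allowed through again.
class AlertThrottle {
  private readonly cooldownMs: number;
  private lastSent = new Map<string, number>();
  private suppressed = new Map<string, number>();

  constructor(cooldownMs = 10 * 60 * 1000) {
    this.cooldownMs = cooldownMs;
  }

  // Returns null if the alert should be dropped, otherwise the text to send.
  filter(key: string, message: string, nowMs: number = Date.now()): string | null {
    const last = this.lastSent.get(key);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      this.suppressed.set(key, (this.suppressed.get(key) ?? 0) + 1);
      return null;
    }
    this.lastSent.set(key, nowMs);
    const dropped = this.suppressed.get(key) ?? 0;
    this.suppressed.delete(key);
    return dropped > 0 ? `${message}\n(${dropped} similar alerts suppressed)` : message;
  }
}
```

The returned text would be passed to `sendAlert` only when it is not `null`; critical alerts such as a drawdown breach may be better sent without throttling.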
Logging
Logs should be structured JSON with mandatory fields: timestamp, level, event, data. Aggregate them in Loki (with Grafana) or Datadog:
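import pino from "pino";
// ISO timestamps and string level labels are easier to query than pino's default epoch-ms "time" and numeric "level".
const logger = pino({ level: "info", timestamp: pino.stdTimeFunctions.isoTime, formatters: { level: (label) => ({ level: label }) } });
logger.info({ event: "trade_executed", symbol: "ETHUSDT", side: "buy", amount: 0.5, price: 3400 });
// In the catch block of a failed exchange call (`err` is the caught value):
logger.error({ event: "api_error", exchange: "binance", error: err instanceof Error ? err.message : String(err), retryIn: 5000 });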
Dead Man's Switch
Critical bots need an external watchdog: if the bot does not check in for N minutes, it is restarted automatically or an SMS goes out. The simplest implementation uses healthchecks.io or Cronitor: the bot sends a GET request every 5 minutes, and the service raises an alert when a ping is missing.
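A minimal sketch of the ping side, assuming a healthchecks.io-style check URL (`https://hc-ping.com/<uuid>`), read here from a hypothetical `HC_PING_URL` environment variable. The ping is meant to be sent from the main loop after a successful iteration rather than from an independent timer, so a loop that hangs also stops pinging. `DeadMansSwitch` is a hypothetical name, and the fetch function is injectable so the logic can be tested without network access:

```typescript
// Minimal shape of the fetch() call we need; lets tests inject a fake.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ ok: boolean }>;

class DeadMansSwitch {
  private lastAttemptMs = -Infinity;
  private readonly url: string;
  private readonly intervalMs: number;
  private readonly fetchFn: FetchLike;

  constructor(url: string, intervalMs = 5 * 60 * 1000, fetchFn: FetchLike = fetch) {
    this.url = url;
    this.intervalMs = intervalMs;
    this.fetchFn = fetchFn;
  }

  // Call after each successful loop iteration. Pings at most once per interval and
  // returns true only when a ping was sent and answered with a 2xx status.
  async tick(nowMs: number = Date.now()): Promise<boolean> {
    if (nowMs - this.lastAttemptMs < this.intervalMs) return false;
    // Record the attempt even if it fails, so an unreachable service is not retried
    // on every iteration; a grace period configured on the service can absorb a miss.
    this.lastAttemptMs = nowMs;
    try {
      // A short timeout keeps a slow monitoring endpoint from stalling the trading loop.
      const res = await this.fetchFn(this.url, { signal: AbortSignal.timeout(10_000) });
      return res.ok;
    } catch {
      // A missed ping is exactly what the external service alerts on; never crash the bot here.
      return false;
    }
  }
}
```

At startup the bot would create one instance, for example `new DeadMansSwitch(process.env.HC_PING_URL!)`, and `await` its `tick()` at the end of each successful iteration.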
Setting up basic monitoring with Prometheus, Grafana, and Telegram alerts takes about one working day, including alerting rules tailored to the particular bot.







