Setting up Bitrix24 On-Premise clustering

When a single server is no longer sufficient — which typically happens at 150–200 concurrent users or when the database grows beyond 50 GB — horizontal scaling becomes necessary. Bitrix24 Enterprise On-Premise supports clustering out of the box, but a real-world setup requires understanding the architecture and making several non-trivial decisions.

Bitrix24 Cluster Architecture

A typical production cluster consists of the following components:

[Load Balancer: nginx/HAProxy]
         |
    ┌────┴────┐
  [Web 1]  [Web 2]     ← Application servers (PHP/nginx)
    └────┬────┘
         |
    [Shared Storage: NFS/GlusterFS]  ← Shared file storage
         |
    ┌────┴────┐
  [DB Master] → [DB Replica]         ← MySQL/MariaDB replication
         |
    [Redis Sentinel/Cluster]         ← Cache and sessions

Without shared file storage the cluster cannot function: if a user uploaded a file to Web 1 and the next request is served by Web 2, the file appears to be missing. NFS is the simplest option; GlusterFS provides fault tolerance.
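For the NFS option, the setup amounts to exporting the upload directory from a storage server and mounting it on every web node. A minimal sketch (hostnames and paths here are illustrative, not Bitrix defaults):

```
# /etc/exports on the storage server
/srv/bitrix-upload  web1.internal(rw,sync,no_root_squash)  web2.internal(rw,sync,no_root_squash)

# /etc/fstab entry on each web node — mounted over the Bitrix upload directory
storage.internal:/srv/bitrix-upload  /var/www/bitrix/upload  nfs  rw,hard,noatime  0 0
```

The `hard` mount option makes web nodes block and retry rather than return I/O errors if the NFS server is briefly unreachable, which is usually the safer behavior for user uploads.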

Load Balancer Configuration

nginx as a load balancer with sticky sessions (requests from the same user are routed to the same backend):

upstream bitrix_backends {
    ip_hash;  # sticky sessions based on client IP
    server web1.internal:80 weight=1;
    server web2.internal:80 weight=1;
    keepalive 32;
}

server {
    listen 443 ssl;
    server_name portal.company.com;
    # ssl_certificate / ssl_certificate_key directives omitted for brevity

    location / {
        proxy_pass http://bitrix_backends;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # ditto: don't forward "Connection: close"
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}

ip_hash is the simplest sticky-session mechanism, but it has two drawbacks: when a client's IP changes (common for mobile users) the session breaks, and all users behind a single corporate NAT land on the same backend, skewing the load. A more reliable option is cookie-based stickiness via the nginx sticky module or HAProxy with cookie persistence.
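The HAProxy variant looks roughly like this (a sketch; backend and cookie names are illustrative):

```
# HAProxy backend with cookie-based stickiness
backend bitrix_backends
    balance roundrobin
    # insert a SRVID cookie on the first response; subsequent requests
    # carrying it are routed to the same server
    cookie SRVID insert indirect nocache
    server web1 web1.internal:80 check cookie w1
    server web2 web2.internal:80 check cookie w2
```

Unlike ip_hash, this survives client IP changes and spreads NAT'ed users evenly, at the cost of requiring cookies to be enabled.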

MySQL Replication Configuration

Master-Slave replication for read queries:

-- On the master: create a replication user
CREATE USER 'replicator'@'db-replica' IDENTIFIED BY 'strong_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'db-replica';

# In master my.cnf
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = bitrix24

# In replica my.cnf
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = 1
Bitrix24 must be explicitly configured to route read queries to the replica. Set this in /bitrix/.settings.php:

'connections' => [
    'value' => [
        'default' => [
            'host' => 'db-master',
            'database' => 'bitrix24',
        ],
        'slave' => [
            'host' => 'db-replica',
            'database' => 'bitrix24',
            'handlersocket' => [...],
        ],
    ],
],

Redis Cluster for Sessions and Cache

User sessions must be stored in a shared Redis instance, not on each web node's local disk:

// /bitrix/.settings.php — Redis configuration
'cache' => [
    'value' => [
        'type' => [
            'class_name' => '\\Bitrix\\Main\\Data\\CacheEngineRedis',
            'extension'  => 'redis',
        ],
        'redis' => [
            'host' => 'redis-sentinel',
            'port' => 26379,
        ],
    ],
],
'session' => [
    'value' => [
        'mode' => 'redis',
        'redis' => [
            'host' => 'redis-sentinel',
            'port' => 26379,
        ],
    ],
],

Use Redis Sentinel instead of a standalone Redis instance to enable automatic failover when the master goes down. Note that Sentinel does not proxy data commands: the client must either speak the Sentinel discovery protocol to find the current master, or connect through a proxy (e.g. HAProxy health-checking the master role) sitting in front of the Redis nodes.
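A minimal Sentinel configuration might look like this (the master name, IP, and timeouts are illustrative):

```
# /etc/redis/sentinel.conf — run on at least three hosts for a reliable quorum
port 26379
# monitor the master at 10.0.0.10:6379; 2 Sentinels must agree it is down
sentinel monitor bitrix-master 10.0.0.10 6379 2
sentinel down-after-milliseconds bitrix-master 5000
sentinel failover-timeout bitrix-master 60000
sentinel parallel-syncs bitrix-master 1
```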

Cluster Monitoring

| Metric                 | Tool                         | Alert threshold |
|------------------------|------------------------------|-----------------|
| MySQL replication lag  | Prometheus + mysqld_exporter | > 30 seconds    |
| RAM usage on web nodes | node_exporter + Grafana      | > 85%           |
| PHP-FPM queue          | php-fpm status page          | backlog > 10    |
| NFS disk latency       | iostat                       | await > 20 ms   |
| Redis hit rate         | redis-exporter               | < 80%           |

A cluster without monitoring is a cluster that will fail on a Friday evening — and you'll hear about it from users rather than from an alerting system.
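In production these thresholds live in Prometheus alerting rules, but the logic of the table above can be sketched as a simple check function (metric names and sample values here are assumptions for illustration):

```python
# Sketch: evaluate cluster metrics against the alert thresholds in the table.
# Metric names are illustrative, not actual exporter metric names.

THRESHOLDS = {
    "mysql_replication_lag_seconds": lambda v: v > 30,
    "web_ram_usage_percent":         lambda v: v > 85,
    "php_fpm_backlog":               lambda v: v > 10,
    "nfs_await_ms":                  lambda v: v > 20,
    "redis_hit_rate_percent":        lambda v: v < 80,  # too *low* is the problem
}

def firing_alerts(metrics: dict) -> list:
    """Return the names of metrics that breach their threshold."""
    return [name for name, breached in THRESHOLDS.items()
            if name in metrics and breached(metrics[name])]

if __name__ == "__main__":
    sample = {
        "mysql_replication_lag_seconds": 45,  # lagging replica
        "web_ram_usage_percent": 70,
        "redis_hit_rate_percent": 92,
    }
    print(firing_alerts(sample))  # only the replication lag fires
```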

Planned Maintenance Without Downtime

To update the cluster with zero downtime: remove nodes from the load balancer one at a time, update them, verify, and return them to the pool. Database updates are more complex and require a maintenance window (typically outside business hours). Document the procedure in a runbook and rehearse it on a staging environment.
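The per-node part of that runbook can be sketched as the following command sequence (hostnames, paths, and the update command are assumptions for illustration; nginx open source has no dynamic upstream API, so draining is done by editing the config and reloading):

```shell
# 1. Drain web1: mark it "down" in the upstream block and reload nginx
sed -i 's/server web1.internal:80 weight=1;/server web1.internal:80 down;/' \
    /etc/nginx/conf.d/bitrix.conf
nginx -t && nginx -s reload

# 2. Wait for active connections to web1 to drain, then update it
ssh web1.internal 'yum update -y && systemctl restart php-fpm nginx'

# 3. Smoke-test web1 directly, bypassing the balancer
curl -fsS -H 'Host: portal.company.com' http://web1.internal/ > /dev/null

# 4. Return web1 to the pool and repeat for web2
sed -i 's/server web1.internal:80 down;/server web1.internal:80 weight=1;/' \
    /etc/nginx/conf.d/bitrix.conf
nginx -t && nginx -s reload
```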