Developing a retry system for Bitrix24 integrations




Integrations fail. An external API returns 503, the network hiccups, a banking service goes offline for maintenance. The question is not whether an integration will fail but what happens after it does. A retry system provides automatic recovery: if an operation fails now, it is tried again in a minute, an hour, a day. If it still fails after N attempts, a human is notified.

Principles That Cannot Be Violated

Idempotency. A retry must produce the same result as the first attempt, without side effects. If an operation creates a payment order in the bank, a repeated call must not create a second one. To achieve this, send an idempotency_key (a unique UUID per operation). An external system that supports idempotency keys recognizes a repeated request with the same key and ignores the duplicate.
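Below is a minimal sketch of the idempotency-key pattern in plain PHP. FakeBankApi is a hypothetical stand-in for the external system. Real APIs differ in how they accept the key, often as an HTTP header or a request field, so check the provider's documentation. The sketch shows the core rule: the key is generated once, stored in the job payload, and reused unchanged on every retry.

```php
<?php
declare(strict_types=1);

// Generates an RFC 4122 version-4 UUID from cryptographically secure random bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical stand-in for an external API that honors idempotency keys:
// a repeated key returns the stored result instead of creating a new order.
final class FakeBankApi
{
    /** @var array<string, string> idempotency key => payment order ID */
    private array $orders = [];
    public int $created = 0;

    public function createPayment(string $idempotencyKey, int $amount): string
    {
        if (isset($this->orders[$idempotencyKey])) {
            return $this->orders[$idempotencyKey]; // duplicate: original result
        }
        $this->created++;
        return $this->orders[$idempotencyKey] = "order-{$this->created}";
    }
}

// The key is generated once, when the job is created, and stored in its payload.
$payload = ['deal_id' => 1234, 'amount' => 50000, 'idempotency_key' => 'pay-' . uuidV4()];

$bank = new FakeBankApi();
$first = $bank->createPayment($payload['idempotency_key'], $payload['amount']);
// The response was lost (timeout), so the worker retries with the same payload.
$retry = $bank->createPayment($payload['idempotency_key'], $payload['amount']);
```

Both calls return the same order ID, and only one order is created.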

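Exponential backoff. The first attempt runs immediately, the second after 1 minute, the third after 4 minutes, the fourth after 16 minutes. Each delay is four times the previous one. Growing intervals give an overloaded service room to recover, and they prevent a storm of retry requests the moment it comes back.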
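The schedule above can be written as a function of the number of failed attempts. This is a sketch, and the name backoffSeconds and its parameters are illustrative. The base of 60 seconds and the factor of 4 come from the 1 / 4 / 16 minute schedule.

```php
<?php
declare(strict_types=1);

// Delay before the next attempt, given how many attempts have already failed:
// 0 failures -> run now, then 60 s * 4^(failures - 1) = 1, 4, 16, 64 ... minutes.
function backoffSeconds(int $failedAttempts, int $baseSeconds = 60, int $factor = 4): int
{
    if ($failedAttempts < 1) {
        return 0; // the first attempt runs immediately
    }
    return $baseSeconds * $factor ** ($failedAttempts - 1);
}

$schedule = array_map(fn (int $n): int => backoffSeconds($n), [0, 1, 2, 3]);
// $schedule === [0, 60, 240, 960], i.e. now, 1 min, 4 min, 16 min
```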

Jitter. Add a random component of ±20% to the delay. If a thousand operations fail at the same moment and all retry with identical delays, they hit the service together again and cause another storm. Jitter spreads that peak over time.
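Here is a sketch of ±20% jitter on top of a base delay, simulating the thousand-operation scenario. withJitter is a hypothetical helper. random_int() returns a uniformly distributed integer from an inclusive range.

```php
<?php
declare(strict_types=1);

// Spreads a delay uniformly within ±$ratio of its base value.
function withJitter(int $delaySeconds, float $ratio = 0.2): int
{
    $spread = (int) round($delaySeconds * $ratio);
    return $delaySeconds + random_int(-$spread, $spread);
}

// 1000 operations failed at the same moment and all wait the same 4-minute base delay.
$delays = [];
for ($i = 0; $i < 1000; $i++) {
    $delays[] = withJitter(240);
}
printf(
    "earliest retry: %d s, latest: %d s, distinct send times: %d\n",
    min($delays),
    max($delays),
    count(array_unique($delays))
);
```

Without jitter, all retries would fire at second 240. With it, they land anywhere between 192 and 288 seconds.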

Maximum retry attempts. After N attempts (usually 5–10), the operation is marked as permanently failed. From then on, it requires manual intervention.
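Combining the retry limit with the backoff schedule shows how long a failing job keeps retrying before it is given up. The optional $capSeconds parameter is an assumption that goes beyond the text. It sets a ceiling on any single delay, which keeps the schedule practical at the upper end of the 5–10 range. The function name is hypothetical.

```php
<?php
declare(strict_types=1);

// Time from the first attempt to the last one if every attempt fails, using the
// 1 / 4 / 16 ... minute schedule. $capSeconds (optional, an assumption not in the
// text) limits any single delay. Jitter is ignored.
function secondsUntilGiveUp(int $maxAttempts, int $baseSeconds = 60, int $factor = 4, ?int $capSeconds = null): int
{
    $total = 0;
    // N attempts means N - 1 waits between them.
    for ($failed = 1; $failed < $maxAttempts; $failed++) {
        $delay = $baseSeconds * $factor ** ($failed - 1);
        $total += $capSeconds === null ? $delay : min($delay, $capSeconds);
    }
    return $total;
}

printf("5 attempts: %.1f h\n", secondsUntilGiveUp(5) / 3600);
printf("10 attempts: %.1f days\n", secondsUntilGiveUp(10) / 86400);
printf("10 attempts, 6 h cap: %.1f h\n", secondsUntilGiveUp(10, capSeconds: 21600) / 3600);
```

With 5 attempts, the last one happens 5,100 seconds (85 minutes) after the first. With 10 attempts and no cap, the final retry would come about 60 days later. A 6-hour cap brings that down to roughly 30 hours.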

Queue Architecture with Retry

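Cloud Bitrix24 gives no server access, so custom PHP code such as agents cannot run on the portal. Retry is implemented in an external service (a separate PHP or Node.js server) with a Redis-based queue or RabbitMQ. That service talks to Bitrix24 through the REST API.

For on-premise Bitrix24, retry can run inside the platform:

Bitrix agents (\CAgent::AddAgent), for simple scenarios with few operations
A queue based on an infoblock or HL-block (highload block)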
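On on-premise installations, a Bitrix agent is a PHP call string that the platform runs on a schedule. This minimal sketch assumes a hypothetical processRetryBatch() that wraps the worker logic described under Worker Implementation. The module ID my.integration is also a placeholder. The registration call is left as a comment because it only runs inside Bitrix.

```php
<?php
declare(strict_types=1);

// Hypothetical batch processor: claims due jobs and runs them (see Worker Implementation).
function processRetryBatch(int $limit): int
{
    return 0; // stub: number of jobs processed
}

// A Bitrix agent is a PHP call stored as a string. After each run, the string the
// function returns becomes the agent's next call; returning '' removes the agent.
function RetryQueueAgent(): string
{
    processRetryBatch(10);
    return 'RetryQueueAgent();'; // keep the agent scheduled
}

// One-time registration on an on-premise portal (e.g. in a module installer).
// 'my.integration' is a hypothetical module ID; 60 is the interval in seconds.
// \CAgent::AddAgent('RetryQueueAgent();', 'my.integration', 'N', 60);
```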

Structure of a task in the queue:

{
  "id": "uuid-v4",
  "type": "bank_payment_create",
  "payload": {
    "deal_id": 1234,
    "amount": 50000,
    "idempotency_key": "pay-uuid-v4"
  },
  "attempts": 2,
  "max_attempts": 5,
  "next_run_at": "2025-03-13T15:30:00Z",
  "status": "pending",
  "last_error": "Connection timeout"
}

Tasks live in an integration_jobs table in PostgreSQL or MySQL. A composite index on (status, next_run_at) lets the worker quickly find tasks that are due for execution.
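To make the fields concrete, here is a hypothetical in-memory model of one task row, mirroring the JSON task structure above. The method names match those the worker calls (markRunning, markSuccess, scheduleRetry, markFailed). Their bodies are assumptions of this sketch, as are the extra statuses running, success and failed and the rule that markRunning() increments attempts. It needs PHP 8.1+ for readonly properties.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory model of one integration_jobs row (fields as in the JSON).
final class Job
{
    public string $status = 'pending';
    public int $attempts = 0;
    public ?string $last_error = null;

    public function __construct(
        public readonly string $id,
        public readonly string $type,
        public array $payload,
        public readonly int $max_attempts,
        public DateTimeImmutable $next_run_at,
    ) {}

    public function markRunning(): void
    {
        $this->status = 'running';
        $this->attempts++; // counts this attempt, successful or not
    }

    public function markSuccess(): void
    {
        $this->status = 'success';
        $this->last_error = null;
    }

    // Back to pending, due again after $delaySeconds.
    public function scheduleRetry(int $delaySeconds, string $error, ?DateTimeImmutable $now = null): void
    {
        $now ??= new DateTimeImmutable('now', new DateTimeZone('UTC'));
        $this->status = 'pending';
        $this->last_error = $error;
        $this->next_run_at = $now->modify("+{$delaySeconds} seconds");
    }

    // Permanent failure: the job goes to the DLQ.
    public function markFailed(string $error): void
    {
        $this->status = 'failed';
        $this->last_error = $error;
    }
}
```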

Worker Implementation

The worker is a separate process, launched by cron every minute (or run as a daemon under Supervisor). Its algorithm:

// Claim a batch of due tasks (getPending uses SELECT ... FOR UPDATE SKIP LOCKED)
$jobs = JobRepository::getPending(limit: 10);

foreach ($jobs as $job) {
    try {
        $job->markRunning(); // status = running, attempts += 1
        $handler = HandlerFactory::create($job->type);
        $handler->execute($job->payload);
        $job->markSuccess();
    } catch (RetryableException $e) {
        if ($job->attempts >= $job->max_attempts) {
            // Retry limit reached: move to the DLQ and notify
            $job->markFailed($e->getMessage());
            $this->notify($job);
            continue;
        }
        // Temporary error: schedule a retry after 1, 4, 16, ... minutes
        $delay = $this->calcBackoff($job->attempts); // 60 * 4^(attempts - 1) seconds
        $spread = (int) ($delay * 0.2);
        $delay += random_int(-$spread, $spread); // ±20% jitter
        $job->scheduleRetry($delay, $e->getMessage());
    } catch (FatalException $e) {
        // Business error: retrying won't help, fail immediately and notify
        $job->markFailed($e->getMessage());
        $this->notify($job);
    }
}

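Row locking with FOR UPDATE is mandatory when several workers run in parallel. Without it, two workers can select the same task and execute it twice. SKIP LOCKED makes each worker skip rows another worker has already locked instead of waiting for them, so parallel workers take different tasks without blocking each other.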
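Below is a sketch of how JobRepository::getPending() could claim tasks on PostgreSQL. MySQL 8.0+ accepts the same FOR UPDATE SKIP LOCKED clause. Unlike the call in the worker, this hypothetical version takes the PDO connection as a parameter, and the column names follow the task JSON. Claimed rows are switched to running inside the same transaction. Once the locks are released, the other workers' queries no longer match them.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of JobRepository::getPending() via PDO.
final class JobRepository
{
    public const CLAIM_SQL = <<<SQL
        SELECT id, type, payload, attempts, max_attempts
        FROM integration_jobs
        WHERE status = 'pending' AND next_run_at <= NOW()
        ORDER BY next_run_at
        LIMIT :limit
        FOR UPDATE SKIP LOCKED
        SQL;

    public static function getPending(PDO $pdo, int $limit): array
    {
        $pdo->beginTransaction();
        try {
            $select = $pdo->prepare(self::CLAIM_SQL);
            $select->bindValue(':limit', $limit, PDO::PARAM_INT);
            $select->execute();
            $jobs = $select->fetchAll(PDO::FETCH_ASSOC);

            if ($jobs !== []) {
                // Mark claimed rows as running while they are still locked.
                $ids = array_column($jobs, 'id');
                $marks = implode(',', array_fill(0, count($ids), '?'));
                $pdo->prepare("UPDATE integration_jobs SET status = 'running' WHERE id IN ($marks)")
                    ->execute($ids);
            }
            $pdo->commit();
            return $jobs;
        } catch (Throwable $e) {
            $pdo->rollBack();
            throw $e;
        }
    }
}
```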

Exception Classification

Classify errors correctly into "retry" and "don't retry":

Error Type	Exception Class	Retry
HTTP 429 (Too Many Requests / rate limit)	`RetryableException`	Yes, with a long delay (honor the Retry-After header if the API sends one)
HTTP 502 / 503 (Bad Gateway / Service Unavailable)	`RetryableException`	Yes
Network timeout	`RetryableException`	Yes
HTTP 401 (Unauthorized)	Special case: refresh the token, then retry	Yes, once
HTTP 400 (Bad Request)	`FatalException`	No
HTTP 422 (Unprocessable Entity / validation error)	`FatalException`	No
Duplicate operation (idempotency hit)	Treated as success	Not needed
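The table can be turned into a single mapping function. This is a sketch, and classifyHttpStatus and TokenExpiredException are hypothetical names. Treating other 5xx codes as retryable and other 4xx codes as fatal is an assumption beyond the table. Network timeouts produce no HTTP status, so the HTTP client wrapper would throw RetryableException for them directly. How an API reports an idempotency hit varies; here it is assumed to answer 2xx with the original result.

```php
<?php
declare(strict_types=1);

class RetryableException extends RuntimeException {}
class FatalException extends RuntimeException {}
// Hypothetical: signals "refresh the OAuth token, then retry once".
class TokenExpiredException extends RuntimeException {}

/**
 * Maps an HTTP status from the external API to the exception the worker expects.
 * Returns null for 2xx (success, assumed to include an idempotency hit).
 */
function classifyHttpStatus(int $status, string $body = ''): ?RuntimeException
{
    return match (true) {
        $status >= 200 && $status < 300 => null,
        $status === 429, $status === 502, $status === 503 => new RetryableException("HTTP $status", $status),
        $status === 401 => new TokenExpiredException('HTTP 401', $status),
        $status === 400, $status === 422 => new FatalException("HTTP $status: $body", $status),
        $status >= 500 => new RetryableException("HTTP $status", $status), // assumption
        default => new FatalException("HTTP $status: $body", $status),       // assumption
    };
}
```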

Dead Letter Queue

Tasks that exhaust their retry limit move to a Dead Letter Queue (DLQ), a separate table or queue. The DLQ is not a trash bin but a list of items that need attention. The DLQ interface should support:

View failed tasks with complete attempt history
Manual retry after fixing the error cause
Edit payload (if data needs correction before retry)
Batch retry of task groups
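Manual and batch retry from the DLQ can be sketched on plain arrays shaped like the task JSON. The function names and the history field are hypothetical. A requeue resets attempts and status but appends to the attempt history instead of erasing it. The idempotency_key stays in the payload unless it is explicitly patched.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical DLQ "manual retry": returns a copy of a failed job reset for a
 * fresh round of attempts. $payloadPatch holds corrected fields; null = no edit.
 */
function requeueFromDlq(array $job, ?array $payloadPatch, DateTimeImmutable $now): array
{
    if ($job['status'] !== 'failed') {
        throw new InvalidArgumentException("Job {$job['id']} is not in the DLQ");
    }
    // Keep the previous round in the history instead of overwriting it.
    $job['history'][] = [
        'attempts'    => $job['attempts'],
        'last_error'  => $job['last_error'],
        'requeued_at' => $now->format(DATE_ATOM),
    ];
    if ($payloadPatch !== null) {
        $job['payload'] = array_replace($job['payload'], $payloadPatch);
    }
    $job['attempts']    = 0;
    $job['status']      = 'pending';
    $job['last_error']  = null;
    $job['next_run_at'] = $now->format(DATE_ATOM);
    return $job;
}

// Batch retry: requeue every failed job of one type, leave the rest untouched.
function requeueBatch(array $jobs, string $type, DateTimeImmutable $now): array
{
    return array_map(
        fn (array $job): array => ($job['status'] === 'failed' && $job['type'] === $type)
            ? requeueFromDlq($job, null, $now)
            : $job,
        $jobs
    );
}
```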

Bitrix24 Integration

When a task fails permanently or exceeds the retry limit, notify the responsible person in Bitrix24:

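\Bitrix\Main\Loader::includeModule('im');

\CIMNotify::Add([
    'TO_USER_ID' => $responsibleUserId,
    'FROM_USER_ID' => 0, // system notification, no sender
    'NOTIFY_TYPE' => IM_NOTIFY_SYSTEM,
    'NOTIFY_MODULE' => 'main',
    'NOTIFY_MESSAGE' => "Integration: operation #{$job->id} failed after {$job->attempts} attempts. " .
                        "Error: {$job->last_error}. Manual intervention required.",
]);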

If the notification is sent from an external service (the only option for cloud Bitrix24), use the REST method im.notify.system.add instead.
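For an external service, here is a sketch of the REST call through an incoming webhook. The webhook URL is a placeholder. The sketch passes only the two basic parameters of im.notify.system.add, USER_ID and MESSAGE. Building the request is kept separate from sending it, and the sender needs the curl extension.

```php
<?php
declare(strict_types=1);

// Builds a call to im.notify.system.add through an incoming webhook.
function buildSystemNotifyRequest(string $webhookBase, int $userId, string $message): array
{
    return [
        'url'  => rtrim($webhookBase, '/') . '/im.notify.system.add.json',
        'body' => http_build_query(['USER_ID' => $userId, 'MESSAGE' => $message]),
    ];
}

// Sends the request with ext-curl and returns the decoded JSON response.
function sendSystemNotify(array $request): array
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $request['body'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('Notification failed: ' . curl_error($ch));
    }
    return json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
}

$request = buildSystemNotifyRequest(
    'https://example.bitrix24.com/rest/1/webhook-code', // hypothetical webhook
    42,
    'Integration: operation #123 failed after 5 attempts. Manual intervention required.'
);
```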

Queue Monitoring

Metric	What It Shows
`pending_jobs_count`	Current backlog: tasks waiting to be executed
`failed_jobs_count`	Accumulated error debt: tasks that failed permanently
`avg_retry_count`	Average number of attempts before success
`p99_execution_time`	Worker performance: 99th-percentile task execution time
`dlq_size_delta`	Change in DLQ size over a period (growth or shrinkage)
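These metrics can be computed from job rows, for example by a scheduled job that feeds a dashboard. This sketch uses hypothetical inputs. duration_ms is an assumed column, the DLQ sizes come from two snapshots, and p99 uses the nearest-rank method.

```php
<?php
declare(strict_types=1);

/**
 * Computes the monitoring metrics from rows shaped like
 * ['status' => string, 'attempts' => int, 'duration_ms' => int].
 */
function queueMetrics(array $jobs, int $dlqSizeBefore, int $dlqSizeNow): array
{
    $byStatus = fn (string $s): array => array_filter($jobs, fn (array $j): bool => $j['status'] === $s);
    $succeeded = $byStatus('success');

    $durations = array_column($jobs, 'duration_ms');
    sort($durations);
    $n = count($durations);
    // Nearest-rank percentile: rank = ceil(0.99 * n), computed in integers.
    $p99 = $n === 0 ? 0 : $durations[intdiv(99 * $n + 99, 100) - 1];

    return [
        'pending_jobs_count' => count($byStatus('pending')),
        'failed_jobs_count'  => count($byStatus('failed')),
        'avg_retry_count'    => $succeeded === []
            ? 0.0
            : array_sum(array_column($succeeded, 'attempts')) / count($succeeded),
        'p99_execution_time' => $p99,
        'dlq_size_delta'     => $dlqSizeNow - $dlqSizeBefore,
    ];
}

// Sample data: 100 jobs with durations of 1..100 ms.
$jobs = [];
for ($i = 1; $i <= 100; $i++) {
    $jobs[] = [
        'status'      => $i <= 90 ? 'success' : ($i <= 98 ? 'pending' : 'failed'),
        'attempts'    => $i <= 45 ? 1 : 3,
        'duration_ms' => $i,
    ];
}
$metrics = queueMetrics($jobs, dlqSizeBefore: 4, dlqSizeNow: 7);
```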

Development Stages

Stage	Content	Timeline
Design	Data schema, error classification, backoff strategy	2–3 days
Task table and repository	CRUD, locks, indexes	2–3 days
Worker	Core logic, exception handling	3–5 days
DLQ and interface	Viewing, manual retry	3–5 days
Notifications	Bitrix24 IM integration	1–2 days
Monitoring	Metrics, dashboard	2–3 days

A retry system is a mandatory component of any production integration. Without it, every external service failure turns into lost operations and manual recovery work.
