Implementing Custom A/B Testing Platform on Website
Off-the-shelf tools (Optimizely, VWO, Google Optimize) cost thousands of dollars per month, inject third-party JS into the critical rendering path, give limited access to raw data, and don't integrate with internal analytics. A custom platform solves all of these problems at the cost of 2–3 weeks of development.
Architecture
┌─────────────────────────────────────────────────────────┐
│                         Web App                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │  Assignment  │  │   Tracking   │  │   Admin UI   │   │
│  │   Service    │  │   (events)   │  │  (results)   │   │
│  └──────┬───────┘  └──────┬───────┘  └──────────────┘   │
└─────────┼─────────────────┼─────────────────────────────┘
          ↓                 ↓
   ┌──────────────┐  ┌──────────────┐
   │ Experiments  │  │ Event Store  │
   │      DB      │  │ (ClickHouse) │
   │ (PostgreSQL) │  │              │
   └──────────────┘  └──────────────┘
Database: Experiment Schema
CREATE TABLE experiments (
    id SERIAL PRIMARY KEY,
    slug VARCHAR(100) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(20) DEFAULT 'draft', -- draft, running, paused, completed
    traffic SMALLINT DEFAULT 100, -- % of traffic participating in the experiment
    start_at TIMESTAMPTZ,
    end_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE experiment_variants (
    id SERIAL PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiments(id),
    slug VARCHAR(100) NOT NULL, -- 'control', 'treatment_a', 'treatment_b'
    name VARCHAR(255),
    weight SMALLINT DEFAULT 50, -- % of traffic within the experiment
    config JSONB DEFAULT '{}', -- custom variant parameters
    UNIQUE(experiment_id, slug)
);

CREATE TABLE user_assignments (
    user_id BIGINT NOT NULL,
    experiment_id INTEGER REFERENCES experiments(id),
    variant_id INTEGER REFERENCES experiment_variants(id),
    assigned_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (user_id, experiment_id)
);

-- Composite index for fast per-experiment, per-variant lookups at large volumes
CREATE INDEX ON user_assignments (experiment_id, variant_id);
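The variant-selection logic later treats weights as percentage points of a 0–99 bucket, so the weights within one experiment must sum to exactly 100. A small pre-launch check (an illustrative helper, not part of the schema) can enforce this:

```python
def validate_variant_weights(variants: list[dict]) -> None:
    """Raise ValueError unless variant weights are positive and sum to 100."""
    total = sum(v["weight"] for v in variants)
    if total != 100:
        raise ValueError(f"Variant weights must sum to 100, got {total}")
    if any(v["weight"] <= 0 for v in variants):
        raise ValueError("Each variant weight must be a positive integer")
```

Running this in the Admin UI when an experiment transitions to 'running' prevents silent traffic loss or overlap.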
Assignment Service: Deterministic Distribution
Key requirement: a user must always get the same variant of a given experiment. The solution is hash-based assignment: hash(user_id + experiment_slug) % 100.
class ExperimentAssignmentService
{
    private array $experimentsCache = [];

    public function getVariant(int $userId, string $experimentSlug): ?string
    {
        $experiment = $this->getActiveExperiment($experimentSlug);
        if (!$experiment) {
            return null;
        }

        // Check existing assignment
        $existing = $this->assignmentRepo->find($userId, $experiment['id']);
        if ($existing) {
            return $existing['variant_slug'];
        }

        // Check if user falls into experiment traffic
        $trafficBucket = $this->hashToBucket($userId, $experimentSlug . '_traffic');
        if ($trafficBucket >= $experiment['traffic']) {
            return null; // user not in experiment
        }

        // Select variant
        $variantBucket = $this->hashToBucket($userId, $experimentSlug);
        $variant = $this->selectVariant($experiment['variants'], $variantBucket);

        // Store assignment
        $this->assignmentRepo->assign($userId, $experiment['id'], $variant['id']);

        // Track assignment event
        $this->eventTracker->track($userId, 'experiment.assigned', [
            'experiment' => $experimentSlug,
            'variant' => $variant['slug'],
        ]);

        return $variant['slug'];
    }

    private function hashToBucket(int $userId, string $salt): int
    {
        // crc32 is sufficient for bucketing; MurmurHash3 (via a PHP extension)
        // gives better uniformity if available
        $hash = crc32($userId . '_' . $salt);
        return abs($hash) % 100;
    }

    private function selectVariant(array $variants, int $bucket): array
    {
        // Variants with weights [50, 30, 20] → thresholds [50, 80, 100]
        $cumulative = 0;
        foreach ($variants as $variant) {
            $cumulative += $variant['weight'];
            if ($bucket < $cumulative) {
                return $variant;
            }
        }
        return end($variants);
    }
}
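The bucketing scheme is easy to sanity-check outside PHP. A Python sketch mirroring the same crc32-based scheme (function names are illustrative):

```python
import zlib


def hash_to_bucket(user_id: int, salt: str) -> int:
    # Same scheme as the PHP service: crc32 of "userid_salt", mapped to 0-99.
    # zlib.crc32 and PHP's crc32() agree for identical byte strings.
    return zlib.crc32(f"{user_id}_{salt}".encode()) % 100


def select_variant(weights: dict[str, int], bucket: int) -> str:
    # Weights like {"control": 50, "treatment_a": 30, "treatment_b": 20}
    # become cumulative thresholds [50, 80, 100]
    cumulative = 0
    for slug, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return slug
    return slug  # fallback: last variant


# Determinism: repeated calls for the same user yield the same bucket
assert hash_to_bucket(42, "checkout-redesign") == hash_to_bucket(42, "checkout-redesign")
```

Unit tests for determinism (from the timeline below) reduce to exactly this kind of assertion, plus a distribution check that buckets are roughly uniform over many user ids.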
Event Tracking
Record every significant user action together with the user's active experiment context:
class ExperimentEventTracker
{
    public function track(int $userId, string $event, array $properties = []): void
    {
        // Add active experiments context
        $activeVariants = $this->assignmentRepo->getUserVariants($userId);

        $payload = [
            'event' => $event,
            'user_id' => $userId,
            'session_id' => session_id(),
            'occurred_at' => now()->toIso8601String(),
            'experiments' => $activeVariants, // ['checkout-button-color' => 'blue', ...]
            'properties' => $properties,
        ];

        // Queue for async write to ClickHouse
        $this->queue->push(new TrackExperimentEvent($payload));
    }
}
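Single-row inserts are an antipattern for ClickHouse's MergeTree engines, so the queue worker should write in batches. A minimal buffering sketch in Python, with a pluggable flush callback standing in for the real ClickHouse client (all names are illustrative):

```python
import time
from typing import Callable


class EventBuffer:
    """Accumulates events and flushes them in batches, by size or by age."""

    def __init__(self, flush_fn: Callable[[list[dict]], None],
                 max_size: int = 1000, max_age_sec: float = 5.0):
        self.flush_fn = flush_fn          # e.g. a ClickHouse batch INSERT
        self.max_size = max_size
        self.max_age_sec = max_age_sec
        self._buffer: list[dict] = []
        self._first_at: float | None = None

    def push(self, event: dict) -> None:
        if self._first_at is None:
            self._first_at = time.monotonic()
        self._buffer.append(event)
        if (len(self._buffer) >= self.max_size
                or time.monotonic() - self._first_at >= self.max_age_sec):
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.flush_fn(self._buffer)
            self._buffer = []
            self._first_at = None
```

The age limit bounds how stale the dashboard can get; the size limit keeps each MergeTree part reasonably large.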
ClickHouse table for events:
CREATE TABLE experiment_events (
    event_date Date DEFAULT toDate(occurred_at),
    occurred_at DateTime64(3, 'UTC'),
    user_id UInt64,
    session_id String,
    event LowCardinality(String),
    experiment LowCardinality(String),
    variant LowCardinality(String),
    properties String -- JSON
) ENGINE = MergeTree()
PARTITION BY (event_date, experiment)
ORDER BY (experiment, variant, user_id, occurred_at)
TTL event_date + INTERVAL 90 DAY;
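Computing results starts by reducing raw events to unique users and unique converters per variant. The same reduction the analytics query performs, sketched in plain Python over an in-memory event list (illustrative):

```python
from collections import defaultdict


def variant_funnel(events: list[dict], goal_event: str) -> dict[str, tuple[int, int]]:
    """Return {variant: (unique_users, unique_converters)} from raw events."""
    users: dict[str, set] = defaultdict(set)
    converters: dict[str, set] = defaultdict(set)
    for e in events:
        users[e["variant"]].add(e["user_id"])
        if e["event"] == goal_event:
            converters[e["variant"]].add(e["user_id"])
    return {v: (len(users[v]), len(converters[v])) for v in users}
```

Counting unique users rather than raw events is what makes the proportions below valid: a user who purchases twice is still one conversion.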
Computing Results: Z-Test
import numpy as np
from scipy import stats
from dataclasses import dataclass


@dataclass
class VariantStats:
    name: str
    users: int
    conversions: int

    @property
    def conversion_rate(self) -> float:
        return self.conversions / self.users if self.users > 0 else 0.0


def calculate_significance(control: VariantStats, treatment: VariantStats) -> dict:
    """Two-tailed z-test for proportions."""
    p1 = control.conversion_rate
    p2 = treatment.conversion_rate
    n1 = control.users
    n2 = treatment.users

    # Pooled proportion
    p_pool = (control.conversions + treatment.conversions) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
    if se == 0:
        return {"error": "Insufficient data"}

    z_score = (p2 - p1) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    # Confidence interval for the difference
    diff = p2 - p1
    se_diff = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci_lower = diff - 1.96 * se_diff
    ci_upper = diff + 1.96 * se_diff

    return {
        "control_rate": round(p1 * 100, 2),
        "treatment_rate": round(p2 * 100, 2),
        "relative_lift": round((p2 - p1) / p1 * 100, 2) if p1 > 0 else None,
        "z_score": round(z_score, 4),
        "p_value": round(p_value, 6),
        "significant": p_value < 0.05,
        "confidence_95": [round(ci_lower * 100, 2), round(ci_upper * 100, 2)],
        "required_sample_size": calculate_required_sample(p1) if p1 > 0 else None,
    }


def calculate_required_sample(baseline_rate: float, mde: float = 0.05,
                              power: float = 0.8, alpha: float = 0.05) -> int:
    """Minimum sample size per variant to detect relative effect mde at given power."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    p2 = baseline_rate * (1 + mde)
    p_bar = (baseline_rate + p2) / 2
    n = (z_alpha * np.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * np.sqrt(baseline_rate * (1 - baseline_rate) + p2 * (1 - p2)))**2 \
        / (p2 - baseline_rate)**2
    return int(np.ceil(n))
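A worked example with made-up numbers shows why the test matters: 10,000 users per arm with 5.0% vs 5.6% conversion looks like a solid 12% relative lift, yet the z-test says it could plausibly be noise:

```python
import numpy as np
from scipy import stats

n1, c1 = 10_000, 500   # control: users, conversions (5.0%)
n2, c2 = 10_000, 560   # treatment: users, conversions (5.6%)
p1, p2 = c1 / n1, c2 / n2

p_pool = (c1 + c2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
# z ≈ 1.89, p ≈ 0.06: not significant at alpha = 0.05,
# despite a 12% relative lift that looks impressive on a dashboard
```

This is exactly the case where required_sample_size tells the product team to keep the experiment running rather than ship early.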
Feature Flags Integration
A/B testing and feature flags are adjacent concepts: an experiment variant can carry a feature configuration:
// Variant 'treatment_a' has config: {"checkout_steps": 1, "show_trust_badges": true}
$variant = $experimentService->getVariant($userId, 'checkout-redesign');
$config = $experimentService->getVariantConfig('checkout-redesign', $variant);
$checkoutSteps = $config['checkout_steps'] ?? 3; // default for control group
$showTrustBadges = $config['show_trust_badges'] ?? false;
Protection from SRM (Sample Ratio Mismatch)
If the ratio of users across groups differs significantly from the configured weights, the results are unreliable:
-- Check SRM for the 'checkout-redesign' experiment
SELECT
    v.slug,
    COUNT(*) AS assigned_users,
    v.weight AS expected_weight,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS actual_pct
FROM user_assignments ua
JOIN experiment_variants v ON v.id = ua.variant_id
JOIN experiments e ON e.id = ua.experiment_id
WHERE e.slug = 'checkout-redesign'
GROUP BY v.slug, v.weight;

-- Then run a chi-square goodness-of-fit test against the configured weights:
-- if p < 0.01, assume SRM and treat the results as questionable
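The chi-square check itself is a few lines with SciPy (the counts below are illustrative):

```python
from scipy.stats import chisquare


def srm_check(observed: list[int], expected_weights: list[int],
              alpha: float = 0.01) -> bool:
    """Return True if assignment counts are consistent with expected weights."""
    total = sum(observed)
    expected = [total * w / 100 for w in expected_weights]
    _, p_value = chisquare(observed, f_exp=expected)
    # p < alpha means the split deviates more than chance allows: SRM
    return bool(p_value >= alpha)


# 50/50 split with small sampling noise: no SRM
srm_check([5012, 4988], [50, 50])   # → True
# 50/50 split that is clearly skewed: SRM flagged
srm_check([5600, 4400], [50, 50])   # → False
```

Common SRM causes worth checking first: bot filtering applied to only one variant, redirects that drop users, and assignment code paths that differ between variants.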
Timeline
Days 1–2: database schema, the Assignment Service with hash-based distribution, unit tests for determinism.
Days 3–4: event tracking, a worker writing to ClickHouse, integration with existing authentication.
Days 5–6: results computation (z-test, confidence intervals), Admin UI for launching and monitoring experiments.
Day 7: SRM checks, documentation for the product team, pilot of the first experiment.