Setting up A/B testing in a mobile app
An A/B test is a controlled experiment in which one user segment sees variant A, another sees variant B, and we measure which variant performs better on the target metric. In practice, most A/B tests in mobile apps are either set up incorrectly (no statistical significance, the experiment is stopped too early) or never run at all, because there is no infrastructure for them.
Tool selection
| Tool | Best for | Drawback |
|---|---|---|
| Firebase A/B Testing | Simple UI/text/parameters | Limited targeting flexibility |
| Amplitude Experiment | Product hypotheses with retention analysis | Paid, requires Amplitude Analytics |
| Statsig | Full cycle: flags, experiments, analysis | More involved setup |
| Growthbook | Open-source, self-hosted | Infrastructure costs |
For most mobile projects, Firebase A/B Testing is a reasonable starting point: it integrates through Remote Config and needs no additional SDKs.
Firebase A/B Testing: setup
Firebase A/B Testing is built on top of Remote Config. First, define the parameter in code:
import FirebaseRemoteConfig

// Read the value from Remote Config and apply it to the UI
let remoteConfig = RemoteConfig.remoteConfig()
let settings = RemoteConfigSettings()
settings.minimumFetchInterval = 0 // 0 only in debug; keep the default (12 h) in production
remoteConfig.configSettings = settings

remoteConfig.fetchAndActivate { status, error in
    let ctaText = remoteConfig.configValue(forKey: "checkout_cta_text").stringValue
    self.checkoutButton.setTitle(ctaText, for: .normal)
}
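Before the first fetch completes, Remote Config serves in-app defaults, so it is worth registering them up front. A minimal sketch, continuing the snippet above (the default simply mirrors the control variant):

// Register in-app defaults so the UI has a sensible value
// before the first fetch from the Firebase backend completes.
remoteConfig.setDefaults([
    "checkout_cta_text": "Proceed to checkout" as NSObject
])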
In Firebase Console → A/B Testing, create an experiment:
- Select `checkout_cta_text` as the target parameter
- Control: "Proceed to checkout"
- Variant A: "Buy now"
- Target metric: `purchase` (conversion event; see the logging sketch below)
- Participant percentage: 50%
- Minimum sample size: Firebase calculates it automatically
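For the `purchase` goal to show up in the experiment results, the app has to actually log that conversion event. A minimal sketch with Firebase Analytics (the currency and value are illustrative):

import FirebaseAnalytics

// Log the conversion event the experiment uses as its target metric.
Analytics.logEvent(AnalyticsEventPurchase, parameters: [
    AnalyticsParameterCurrency: "USD",
    AnalyticsParameterValue: 9.99
])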
Critical A/B test errors
Stopping the test at the first significant result is the most common mistake. If you check the p-value daily and stop the test the first time p < 0.05, the probability of a false positive is far higher than the nominal 5% (the "peeking" problem). Stop the test only when the predetermined sample size has been reached.
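To make "predetermined sample size" concrete, here is a rough sketch of the standard two-proportion sample size formula (z values for a two-sided α = 0.05 and 80% power). Firebase estimates this for you; the function and its names are only illustrative, useful for sanity checks:

import Foundation

// Approximate per-variant sample size for comparing two conversion rates.
// zAlpha = 1.96 corresponds to a two-sided alpha of 0.05, zBeta = 0.84 to 80% power.
func sampleSizePerVariant(baseline p1: Double, expected p2: Double,
                          zAlpha: Double = 1.96, zBeta: Double = 0.84) -> Int {
    let variance = p1 * (1 - p1) + p2 * (1 - p2)
    let effect = p2 - p1
    let n = pow(zAlpha + zBeta, 2) * variance / pow(effect, 2)
    return Int(n.rounded(.up))
}

// Example: detecting an uplift from a 5% to a 6% conversion rate
// needs on the order of 8,000 users per variant.
let usersPerVariant = sampleSizePerVariant(baseline: 0.05, expected: 0.06)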
One test, one metric. You cannot optimize conversion rate and session length at the same time in a single test. If both improve, great, but there must be exactly one target metric.
Novelty effect. A new design produces a spike in clicks during the first week simply because it is new. The minimum duration for behavioral tests is 2 weeks; for retention tests, 4 weeks.
Statsig for complex experiments
When you need more flexible segmentation (for example, testing only on Moscow users with more than 3 sessions):
// iOS Statsig SDK
import Statsig

Statsig.start(sdkKey: "client-xxx") { _ in
    let experiment = Statsig.getExperiment("checkout_flow_v2")
    let variant = experiment.getValue(forKey: "flow_type", defaultValue: "standard")
    if variant == "simplified" {
        self.showSimplifiedCheckout()
    } else {
        self.showStandardCheckout()
    }
}
// Android
val experiment = Statsig.getExperiment("checkout_flow_v2")
val flowType = experiment.getString("flow_type", "standard")
Statsig supports stratified sampling: users are distributed uniformly across strata (platform, country, subscription plan). Without stratification, random assignment can produce cohorts with different composition, which distorts the results.
Exposure logging
For correct analysis, it is important to log the fact that a variant was shown, not just the conversion:
import FirebaseAnalytics

Analytics.logEvent("experiment_exposure", parameters: [
    "experiment_id": "checkout_cta_v2",
    "variant": variantName,
    "user_id": userId
])
This lets you analyze conversion only among users who actually saw the experiment, not among all assigned participants.
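One common refinement is to deduplicate exposure events so that repeated screen views do not inflate the count. The `ExposureLogger` class below is a hypothetical helper, not part of any SDK:

import FirebaseAnalytics

// Hypothetical helper: logs each experiment exposure at most once per app session.
final class ExposureLogger {
    private var logged = Set<String>()

    func logOnce(experimentId: String, variant: String, userId: String) {
        guard !logged.contains(experimentId) else { return }
        logged.insert(experimentId)
        Analytics.logEvent("experiment_exposure", parameters: [
            "experiment_id": experimentId,
            "variant": variant,
            "user_id": userId
        ])
    }
}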
What's included in the work
- Tool selection for your task and stack (Firebase / Statsig / Amplitude Experiment)
- SDK integration and Remote Config / Feature Flags setup
- A/B layer implementation in code with correct variant handling
- Target metric and conversion event setup
- Sample size and test duration configuration
- Exposure logging for analysis
Timeline
A single A/B test on Firebase Remote Config: 1–2 days. Infrastructure for regular A/B testing (Statsig / Growthbook): 3–5 days. Cost is estimated individually.