Implementing AI Image Generation (Kandinsky) in a Mobile App
Kandinsky is a Russian text-to-image model from Sber AI (Kandinsky 3.1 at the time of writing). Its main practical advantage for products serving Russian-speaking audiences is native understanding of Russian prompts without translation: "Sunset over birch forest" written in Russian works on Kandinsky the same way English prompts work on Western models, with no quality loss from a translation step.
Available integration methods
Fusionbrain API (api.fusionbrain.ai) — official API from Kandinsky developers. Free tier, REST, relatively stable. Most integrations use this.
Replicate — Kandinsky 2.2 and 3 are available as community models. Stable API, but the hosted version may lag behind the latest release.
HuggingFace Inference API — kandinsky-community/kandinsky-3. Sufficient for prototypes.
For production: the Fusionbrain API behind a custom backend proxy, so the API key and secret never ship inside the mobile client.
Fusionbrain API: protocol specifics
The API uses a two-stage model: first create a generation task, then poll its status.
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.util.Base64
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.withContext
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

class KandinskyService(private val apiKey: String, private val secretKey: String) {

    // One OkHttpClient reused for all calls (creating a client per request wastes resources)
    private val client = OkHttpClient()

    private fun Request.Builder.withAuth() = this
        .header("X-Key", "Key $apiKey")
        .header("X-Secret", "Secret $secretKey")

    // Step 1: get model ID
    suspend fun getModelId(): String = withContext(Dispatchers.IO) {
        val request = Request.Builder()
            .url("https://api-key.fusionbrain.ai/key/api/v1/models")
            .withAuth()
            .get()
            .build()
        val body = client.newCall(request).execute().use { it.body!!.string() }
        val models = JSONArray(body)
        (0 until models.length())
            .map { models.getJSONObject(it) }
            .first { it.getString("name") == "Kandinsky" }
            .get("id").toString()
    }

    // Step 2: create generation task
    suspend fun createTask(modelId: String, prompt: String, width: Int = 1024, height: Int = 1024): String =
        withContext(Dispatchers.IO) {
            val params = JSONObject().apply {
                put("type", "GENERATE")
                put("numImages", 1)
                put("width", width)
                put("height", height)
                put("generateParams", JSONObject().apply {
                    put("query", prompt)
                })
            }
            // Multipart request: model_id as a plain field, params as a JSON part
            val requestBody = MultipartBody.Builder()
                .setType(MultipartBody.FORM)
                .addFormDataPart("model_id", modelId)
                .addFormDataPart(
                    "params",
                    "params.json",
                    params.toString().toRequestBody("application/json".toMediaType())
                )
                .build()
            val request = Request.Builder()
                .url("https://api-key.fusionbrain.ai/key/api/v1/text2image/run")
                .withAuth()
                .post(requestBody)
                .build()
            val body = client.newCall(request).execute().use { it.body!!.string() }
            JSONObject(body).getString("uuid")
        }

    // Step 3: poll until DONE, FAIL, or timeout (~90 s at 3 s intervals)
    suspend fun pollResult(taskUuid: String): Bitmap? {
        repeat(30) {
            delay(3000)
            val json = withContext(Dispatchers.IO) {
                val request = Request.Builder()
                    .url("https://api-key.fusionbrain.ai/key/api/v1/text2image/status/$taskUuid")
                    .withAuth()
                    .get()
                    .build()
                JSONObject(client.newCall(request).execute().use { it.body!!.string() })
            }
            when (json.getString("status")) {
                "DONE" -> {
                    val base64 = json.getJSONArray("images").getString(0)
                    val bytes = Base64.decode(base64, Base64.DEFAULT)
                    return BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
                }
                "FAIL" -> return null
            }
        }
        return null
    }
}
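Tying the three steps together, a typical call chain might look like this; generateImage is a hypothetical wrapper name, not part of the Fusionbrain API:

```kotlin
import android.graphics.Bitmap

// Hypothetical end-to-end wrapper: resolve the model, create a task, poll for the result.
suspend fun KandinskyService.generateImage(prompt: String): Bitmap? {
    val modelId = getModelId()                 // step 1
    val taskUuid = createTask(modelId, prompt) // step 2
    return pollResult(taskUuid)                // step 3; null on timeout or FAIL
}
```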
The response arrives as a base64 string in the images field, not as a URL. Decode it to a Bitmap / UIImage directly on the client, and save to internal storage if history is needed.
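If generation history is needed, the decoded Bitmap can be persisted to internal storage; a minimal sketch (the filename scheme here is an illustrative assumption):

```kotlin
import android.content.Context
import android.graphics.Bitmap
import java.io.File

// Saves a generated image into the app's internal files dir and returns the file.
// Naming the file after the task UUID is a convention chosen for this example.
fun saveToHistory(context: Context, bitmap: Bitmap, taskUuid: String): File {
    val file = File(context.filesDir, "kandinsky_$taskUuid.png")
    file.outputStream().use { out ->
        bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
    }
    return file
}
```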
Generation parameters
Kandinsky supports:
- width/height: 256 to 1024, in multiples of 64. Optimal: 768x768 or 1024x1024.
- style: DEFAULT, ANIME, PORTRAIT, NATURE, REALISTIC (availability depends on model version).
- negativePromptDecoder: the negative prompt, a list of what should not be present.
val params = JSONObject().apply {
    put("type", "GENERATE")
    put("numImages", 1)
    put("width", 768)
    put("height", 1024)
    put("style", "PORTRAIT")
    put("generateParams", JSONObject().apply {
        put("query", "portrait of young woman in Russian traditional costume, detailed, realism")
    })
    put("negativePromptDecoder", "blurry, artifacts, deformation, text, watermark")
}
Russian prompt vs English
Kandinsky understands Russian without quality degradation. In practice, though, English prompts yield more precise results for technical subjects (architecture, mechanisms): the model was trained on a mixed corpus, and technical terms are better represented in English. For artistic, landscape, and portrait scenarios, Russian works excellently.
For maximum quality, include the prompt in both languages (if the UI allows); Kandinsky processes both.
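If the UI collects both versions, combining them can be as simple as the sketch below; the comma separator is an assumption, since Kandinsky accepts free-form text:

```kotlin
// Combines Russian and English prompt variants into one query string,
// skipping whichever is missing or blank.
fun combinePrompts(ru: String?, en: String?): String =
    listOfNotNull(ru?.takeIf { it.isNotBlank() }, en?.takeIf { it.isNotBlank() })
        .joinToString(", ")
```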
Integration via Replicate (alternative)
let replicateBody: [String: Any] = [
    "version": "ai-forever/kandinsky-3:...",
    "input": [
        "prompt": prompt,
        "negative_prompt": negativePrompt,
        "num_steps": 50,
        "guidance_scale": 4.0,
        "scheduler": "DDPMScheduler",
        "width": 1024,
        "height": 1024
    ]
]
Replicate provides more predictable response time (8–20 sec) than Fusionbrain during peak hours.
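Replicate follows the same create-then-poll pattern as Fusionbrain: POST to /v1/predictions returns a prediction id, which is then polled until the status becomes succeeded. A hedged Kotlin sketch of the polling side (error handling omitted; retry count and interval are illustrative choices):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
import okhttp3.Request
import org.json.JSONObject

// Polls a Replicate prediction until it finishes; returns the first output URL.
// Note: unlike Fusionbrain, Replicate returns image URLs, not base64 payloads.
suspend fun pollReplicate(client: OkHttpClient, token: String, predictionId: String): String? {
    repeat(30) {
        delay(2000)
        val json = withContext(Dispatchers.IO) {
            val request = Request.Builder()
                .url("https://api.replicate.com/v1/predictions/$predictionId")
                .header("Authorization", "Token $token")
                .get()
                .build()
            JSONObject(client.newCall(request).execute().use { it.body!!.string() })
        }
        when (json.getString("status")) {
            "succeeded" -> return json.getJSONArray("output").getString(0)
            "failed", "canceled" -> return null
        }
    }
    return null
}
```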
Common mistakes
A FAIL status without explanation from Fusionbrain usually means the prompt violates the content policy or is too short (under 3 words). The minimum prompt for stable results is 5–10 words of description.
Decoding base64 on the main thread blocks the UI. Always decode on a background thread: DispatchQueue.global().async on iOS or Dispatchers.Default on Android.
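On Android, the decode step can be pushed off the main thread like this (a sketch assuming the caller is already inside a coroutine):

```kotlin
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.util.Base64
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Decodes a base64 image payload on a CPU-bound dispatcher; safe to call from UI code.
suspend fun decodeBase64Image(base64: String): Bitmap? = withContext(Dispatchers.Default) {
    val bytes = Base64.decode(base64, Base64.DEFAULT)
    BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
}
```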
Timeline
Basic Fusionbrain API integration with UI — 3–4 days. Styles, generation history, gallery saving, content policy error handling — 8–12 days.