YandexGPT Integration into Mobile App

NOVASOLUTIONS.TECHNOLOGY is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


YandexGPT Integration in Mobile Applications

YandexGPT is ideal where Russian-language semantics matter: customer support, form autofill, content generation. OpenAI models excel at English; YandexGPT gives an advantage specifically for Russian-speaking audiences, especially with regional queries and domain-specific terminology.

The integration task looks simple at first: POST to https://llm.api.cloud.yandex.net/foundationModels/v1/completion, pass an IAM token and the text. In practice, it's a chain of non-trivial decisions.

Authorization: IAM Token vs API Key

The first failure point is authorization. An IAM token lives 12 hours; an API key is permanent but less secure. You can't store a service credential directly in the mobile app: it will be extracted from the APK in ten minutes with apktool. The correct scheme: the mobile client authenticates with your backend, and the backend holds the IAM token (refreshing it before expiry) and proxies requests to Yandex Cloud.

// iOS: request through your own proxy, never directly to Yandex Cloud
struct YGPTRequest: Encodable {
    let prompt: String
    let maxTokens: Int
    let temperature: Double
}

struct CompletionResponse: Decodable {
    let text: String
}

func sendToYandexGPT(prompt: String) async throws -> String {
    let url = URL(string: "https://api.yourapp.com/ai/complete")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // authToken is your app's own session token, not a Yandex credential
    request.setValue("Bearer \(authToken)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONEncoder().encode(YGPTRequest(
        prompt: prompt,
        maxTokens: 500,
        temperature: 0.7
    ))
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(CompletionResponse.self, from: data).text
}

On Android, the scheme is the same: Retrofit with an OkHttp interceptor that injects the token.
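On the backend, IAM token refresh can be sketched roughly as follows (Swift for consistency with the client code; the endpoint and the yandexPassportOauthToken field follow the Yandex Cloud IAM API for user accounts, while service accounts exchange a signed JWT instead, so verify the flow against the current IAM docs):

```swift
import Foundation

// Backend sketch: fetch an IAM token and cache it, refreshing well before
// the 12-hour lifetime runs out.
actor IAMTokenStore {
    private var token: String?
    private var expiry = Date.distantPast

    func currentToken() async throws -> String {
        // Reuse the cached token with an hour of safety margin
        if let token, Date() < expiry.addingTimeInterval(-3600) {
            return token
        }
        var request = URLRequest(
            url: URL(string: "https://iam.api.cloud.yandex.net/iam/v1/tokens")!)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        // oauthToken is assumed to be configured in the service's secrets
        request.httpBody = try JSONEncoder().encode(
            ["yandexPassportOauthToken": oauthToken])
        let (data, _) = try await URLSession.shared.data(for: request)
        struct Reply: Decodable { let iamToken: String }
        let reply = try JSONDecoder().decode(Reply.self, from: data)
        token = reply.iamToken
        expiry = Date().addingTimeInterval(12 * 3600)
        return reply.iamToken
    }
}
```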

Streaming Generation (stream: true)

The stream: true mode in the Yandex Foundation Models API returns the response in chunks, like ChatGPT does. For mobile UX this matters: users see the text as it is generated instead of waiting 3–5 seconds for the complete answer.

On iOS, handle Server-Sent Events via URLSessionDataDelegate:

final class StreamingDelegate: NSObject, URLSessionDataDelegate {
    let onChunk: (String) -> Void
    private var buffer = Data()

    init(onChunk: @escaping (String) -> Void) {
        self.onChunk = onChunk
    }

    func urlSession(_ session: URLSession,
                    dataTask: URLSessionDataTask,
                    didReceive data: Data) {
        buffer.append(data)
        // SSE parsing: look for "data:" lines
        guard let text = String(data: buffer, encoding: .utf8) else { return }
        // Process only complete lines; keep the unfinished tail in the
        // buffer so a chunk split across packets isn't lost or duplicated
        var lines = text.components(separatedBy: "\n")
        let tail = lines.removeLast()
        buffer = Data(tail.utf8)
        for line in lines where line.hasPrefix("data: ") {
            let json = String(line.dropFirst(6))
            if let chunk = parseYGPTChunk(json) {  // your JSON-decoding helper
                DispatchQueue.main.async { self.onChunk(chunk) }
            }
        }
    }
}

On Android, use OkHttp with EventSource (the okhttp-sse library) or parse lines manually with a BufferedReader.

Models and Parameters

Yandex provides several options: yandexgpt-lite — fast and cheap, yandexgpt — full version, yandexgpt-32k — long contexts. For most mobile scenarios (chat hints, autofill), yandexgpt-lite suffices and is noticeably faster.

Model           Context      Response speed   Use case
yandexgpt-lite  8k tokens    ~1–2 s           Suggestions, summaries
yandexgpt       8k tokens    ~3–5 s           Complex tasks
yandexgpt-32k   32k tokens   ~8–15 s          Long documents

The temperature parameter ranges from 0 to 1: 0.2–0.4 gives deterministic answers (an FAQ bot), 0.7–0.9 gives more creative text.
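For reference, the body the proxy assembles for the completion endpoint might look like this in Swift; the field names (modelUri, completionOptions, messages) follow the Foundation Models API, but check them against the current Yandex Cloud documentation, and note that folderId and userPrompt here are placeholders:

```swift
// Sketch of the request body sent by the backend proxy to the
// foundationModels/v1/completion endpoint.
struct YandexCompletionBody: Encodable {
    struct Options: Encodable {
        let stream: Bool
        let temperature: Double
        let maxTokens: String
    }
    struct Message: Encodable {
        let role: String   // "system", "user" or "assistant"
        let text: String
    }
    let modelUri: String   // "gpt://<folder_id>/yandexgpt-lite"
    let completionOptions: Options
    let messages: [Message]
}

let body = YandexCompletionBody(
    modelUri: "gpt://\(folderId)/yandexgpt-lite",  // folderId: your Yandex Cloud folder
    completionOptions: .init(stream: false, temperature: 0.3, maxTokens: "500"),
    messages: [
        .init(role: "system", text: "Answer briefly, in Russian."),
        .init(role: "user", text: userPrompt)      // userPrompt: defined elsewhere
    ]
)
```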

Caching and Limits

Yandex Cloud charges per token. Cache repeated queries on the mobile client; this is a typical pattern for FAQ or onboarding screens. A simple in-memory LRU cache of 100 entries cuts costs across repeated sessions.
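A minimal sketch of such a cache in Swift (illustrative; in practice the key should include the model and temperature, not just the prompt text):

```swift
// In-memory LRU cache for prompt → answer pairs.
final class LRUCache {
    private let capacity: Int
    private var store: [String: String] = [:]
    private var order: [String] = []   // most recently used last

    init(capacity: Int = 100) { self.capacity = capacity }

    func value(for key: String) -> String? {
        guard let value = store[key] else { return nil }
        // Mark as most recently used
        order.removeAll { $0 == key }
        order.append(key)
        return value
    }

    func insert(_ value: String, for key: String) {
        // Evict the least recently used entry when full
        if store[key] == nil, store.count >= capacity, let oldest = order.first {
            store[oldest] = nil
            order.removeFirst()
        }
        store[key] = value
        order.removeAll { $0 == key }
        order.append(key)
    }
}
```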

The default rate limit for Yandex Foundation Models is 10 RPS per folder. Under peak load, when many users are active simultaneously, requests need a queue on the backend rather than direct calls from devices.
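One way to stay under the limit is a token bucket in front of the outgoing requests. A sketch in Swift for consistency with the rest of the article; the same idea applies in any backend language:

```swift
import Foundation

// Token bucket refilled at `rps` tokens per second. A request consumes one
// token; when the bucket is empty the caller queues the request and retries.
final class TokenBucket {
    private let rps: Double
    private var tokens: Double
    private var lastRefill = Date()

    init(rps: Double = 10) {
        self.rps = rps
        self.tokens = rps
    }

    /// Returns true if the request may go out now.
    func tryAcquire() -> Bool {
        let now = Date()
        tokens = min(rps, tokens + now.timeIntervalSince(lastRefill) * rps)
        lastRefill = now
        guard tokens >= 1 else { return false }
        tokens -= 1
        return true
    }
}
```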

Implementation Process

1. Audit the scenarios: where exactly the LLM is needed (support, text generation, request classification).
2. Choose the model and mode (synchronous / streaming).
3. Develop a proxy service on the backend with IAM token management.
4. Integrate into the mobile app with UI for streaming generation.
5. Test answer quality on real user queries and tune the system prompt.

Timeline Guidelines

Basic integration through a proxy without streaming takes 2–3 days. A full chat interface with streaming generation, caching, and error handling takes 5–8 days.