Mobile App AI Chatbot Implementation

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets such as Google Play, the App Store, Amazon Appstore, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each may have its own specific features and functionality, tailored to the client's needs and goals.

Complexity: Medium. Typical timeline: ~1–2 weeks.
Latest works:
  • Mobile application for FEEDME
  • Mobile application for XOOMER
  • Mobile application for RHL
  • Mobile application for ZIPPY
  • Mobile application for Affhome
  • Mobile application for the FLAVORS company

AI Chatbot Implementation in Mobile Applications

Integrating GPT-4o or Claude into a mobile chat isn't a matter of "plug in the SDK and done." The real complexity starts after the first working request: managing conversation context, displaying streaming generation without UI jank, handling flaky networks on poor signal, and storing chat history between sessions without data leaks.

Conversation Context Management

All LLMs are stateless: each request to OpenAI, Anthropic, GigaChat, or YandexGPT sends the full conversation history. That means storage and context truncation are your job. With a naive implementation, token cost grows 3–4x after 20 messages, and with a 128k context you can wait 30+ seconds for a response.

A practical solution is a sliding window with summarization:

final class ConversationManager {
    private var messages: [ChatMessage] = []
    private let summaryThreshold = 15
    private let keepRecent = 10
    private let llmClient: LLMClient

    init(llmClient: LLMClient) {
        self.llmClient = llmClient
    }

    func addMessage(_ message: ChatMessage) {
        messages.append(message)
        if messages.count > summaryThreshold {
            Task { await compressSummary() }
        }
    }

    private func compressSummary() async {
        // Summarize the oldest messages with a separate LLM request,
        // then replace them with a single summary message.
        let toCompress = Array(messages.prefix(messages.count - keepRecent))
        guard let summary = try? await llmClient.summarize(messages: toCompress) else { return }
        messages = [ChatMessage(role: .system, content: "Context: \(summary)")]
                 + Array(messages.suffix(keepRecent))
    }
}

The system prompt is handled separately: it must always remain the first message, and context compression must never touch it.
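One way to guarantee this is to keep the system prompt outside the compressible message buffer entirely and prepend it only when building the request. A minimal sketch, assuming same-file access to ConversationManager's storage (buildRequestMessages is an illustrative helper, not part of any SDK):

```swift
extension ConversationManager {
    /// Illustrative: the system prompt lives outside the stored history,
    /// so the summarization step can never swallow or reorder it.
    func buildRequestMessages(systemPrompt: String) -> [ChatMessage] {
        [ChatMessage(role: .system, content: systemPrompt)] + messages
    }
}
```

With this shape, the sliding window only ever sees user and assistant turns, and the system prompt is re-attached verbatim on every request.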

Streaming Generation and UI

Users shouldn't wait for the full response. Streaming via SSE is the standard for all modern LLM APIs. On iOS:

// Update the SwiftUI view through @Published; UI state changes stay on the main actor.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var streamingText = ""
    private let llmClient: LLMClient

    init(llmClient: LLMClient) {
        self.llmClient = llmClient
    }

    func streamResponse(for prompt: String) {
        streamingText = ""
        Task {
            // Append each SSE chunk as it arrives instead of waiting for the full reply.
            for try await chunk in llmClient.stream(prompt: prompt) {
                streamingText += chunk
            }
        }
    }
}

On Android with Compose, hold the text in a StateFlow<String> and read it with collectAsState(). A common mistake is calling notifyDataSetChanged() or recreating the RecyclerView adapter on each chunk, which causes visible flicker. Update only the last message's text, not the entire list.

Offline Mode and Local Models

For basic scenarios (an FAQ bot, data formatting), consider on-device models. Apple's FoundationModels framework (iOS 26+, part of Apple Intelligence) gives access to the local language model without the network. Google ML Kit on Android provides Smart Reply and Entity Extraction offline.
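A minimal on-device sketch using Apple's FoundationModels API as introduced at WWDC 2025 (the instructions text and the error type are illustrative assumptions):

```swift
import FoundationModels

enum LocalChatError: Error { case modelUnavailable }

func localReply(to userText: String) async throws -> String {
    // Check that the on-device model is downloaded and ready before prompting it.
    guard SystemLanguageModel.default.availability == .available else {
        throw LocalChatError.modelUnavailable
    }
    let session = LanguageModelSession(instructions: "Answer FAQ questions briefly.")
    let response = try await session.respond(to: userText)
    return response.content
}
```

Because availability depends on device model, OS version, and whether Apple Intelligence is enabled, always keep a cloud fallback path behind this check.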

For more complex cases, llama.cpp with the Metal backend on iOS or NNAPI on Android runs Llama 3 8B quantized to int4 directly on device. On an iPhone 15 Pro, generation speed is roughly 15 tokens/sec, acceptable for auxiliary functions.

Chat History Storage

Chat history is personal data. Use SQLite/Core Data with encryption via SQLCipher, or iOS Data Protection. Don't store it in UserDefaults: the backing plist sits unencrypted on disk and is included in device backups. On Android, use Room with EncryptedSharedPreferences for the encryption keys.
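For simple file-based storage, iOS Data Protection can be enabled per write. A sketch, assuming ChatMessage conforms to Codable (the file name is illustrative):

```swift
func saveHistory(_ messages: [ChatMessage]) throws {
    let dir = try FileManager.default.url(for: .applicationSupportDirectory,
                                          in: .userDomainMask,
                                          appropriateFor: nil,
                                          create: true)
    let url = dir.appendingPathComponent("chat_history.json")
    let data = try JSONEncoder().encode(messages)
    // .completeFileProtection keeps the file encrypted whenever the device is locked.
    try data.write(to: url, options: [.atomic, .completeFileProtection])
}
```

Note that .completeFileProtection makes the file unreadable while the device is locked, so background tasks that touch history must handle that case.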

Cleanup strategy: auto-delete conversations older than N days, plus explicit deletion on user request; the latter is a GDPR/CCPA requirement.
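The retention pass itself is a one-liner over stored conversations. A sketch, where the Conversation type with an updatedAt timestamp is an illustrative assumption about your storage model:

```swift
struct Conversation {
    let id: UUID
    var updatedAt: Date
}

/// Returns only the conversations updated within the retention window.
func pruneConversations(olderThan days: Int,
                        from conversations: [Conversation]) -> [Conversation] {
    let cutoff = Calendar.current.date(byAdding: .day, value: -days, to: Date())!
    return conversations.filter { $0.updatedAt >= cutoff }
}
```

With Core Data or Room, the same cutoff goes into a delete predicate instead of an in-memory filter, which avoids loading old conversations at all.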

Common Production Issues

Repeating answers. GPT models sometimes get stuck repeating a pattern. Setting presence_penalty: 0.6 and frequency_penalty: 0.3 reduces the probability. If a loop occurs anyway, add client-side detection: if the last three bot messages share more than 60% identical n-grams, reset the context.
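The n-gram check can be as simple as comparing word trigram sets between consecutive bot messages; resetting the context when the overlap exceeds the threshold is left to the caller. A sketch (the 60% threshold comes from the heuristic above; the function name is illustrative):

```swift
/// Fraction of word trigrams shared between two strings, in 0...1.
func trigramOverlap(_ a: String, _ b: String) -> Double {
    func trigrams(_ s: String) -> Set<[String]> {
        let words = s.lowercased().split(separator: " ").map(String.init)
        guard words.count >= 3 else { return [] }
        return Set((0...(words.count - 3)).map { Array(words[$0 ..< $0 + 3]) })
    }
    let (ta, tb) = (trigrams(a), trigrams(b))
    guard !ta.isEmpty, !tb.isEmpty else { return 0 }
    // Normalize by the smaller set so short replies don't mask repetition.
    return Double(ta.intersection(tb).count) / Double(min(ta.count, tb.count))
}
```

Run it over each pair of the last three assistant messages and reset the conversation context if any pair scores above 0.6.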

Timeouts on poor networks. An LLM response can take a long time to generate. URLSession's default request timeout is 60 seconds, too short for long streamed responses. Set timeoutIntervalForResource to 120 seconds and show a "thinking..." progress indicator after 5 seconds of waiting for the first chunk.
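Both timeouts are configured on the session, not per request; the 120-second value is the guideline from above, not a universal constant:

```swift
let config = URLSessionConfiguration.default
// Idle timeout between packets; each arriving SSE chunk resets it.
config.timeoutIntervalForRequest = 60
// Hard cap on the total time allowed for the whole streamed response.
config.timeoutIntervalForResource = 120
let session = URLSession(configuration: config)
```

Because timeoutIntervalForRequest resets on every received chunk, a healthy stream won't trip it; timeoutIntervalForResource is what bounds the worst case.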

Moderation. Run user input through the OpenAI Moderation API before sending it to the model; this is a must for consumer apps. One POST /v1/moderations call costs less than handling an App Store Review complaint.
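A minimal sketch of the moderation call. In production the request should go through your backend proxy so the API key never ships inside the app; the direct call below is for illustration only:

```swift
struct ModerationResponse: Decodable {
    struct Result: Decodable { let flagged: Bool }
    let results: [Result]
}

func isFlagged(_ text: String, apiKey: String) async throws -> Bool {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/moderations")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(["input": text])
    let (data, _) = try await URLSession.shared.data(for: request)
    let decoded = try JSONDecoder().decode(ModerationResponse.self, from: data)
    // Block the message if any moderation category flagged it.
    return decoded.results.contains { $0.flagged }
}
```

If isFlagged returns true, reject the message client-side with a neutral error instead of forwarding it to the LLM.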

Implementation Process

1. Design the architecture: choose the LLM provider, on-device vs. cloud, and the authorization scheme.
2. Develop a backend proxy with rate limiting.
3. Implement a ConversationManager with context management.
4. Build the chat UI with streaming, bubble layout, and a typing indicator.
5. Add encrypted chat history.
6. Test edge cases: network loss during generation, very long responses, parallel requests.

Timeline Guidelines

A simple chat with one LLM provider and no history takes 5–7 days. A full-featured chatbot with history, context compression, offline mode, and moderation takes 3–5 weeks.