Implementing AI-Powered Support Ticket Classification in Mobile Applications
A ticket arrives—the operator manually decides where to route it: billing, tech support, delivery complaints. With 500+ inquiries a day, that's a bottleneck. The goal: train the mobile app to classify tickets on the client side or immediately on submission, without waiting for manual review.
Where classification happens: on-device or on server
A common question: is an on-device model necessary, or is an API call enough? It depends on traffic volume, latency requirements, and whether classification has to work offline.
For most support apps, the flow is: the ticket text goes to the backend, is classified by an LLM or a fine-tuned BERT model, and the result comes back in 300–800 ms. On the mobile side this is a plain URLSession or OkHttp call. No Core ML needed.
If offline support or minimal latency is required, go on-device. On iOS, use Core ML with a compact distilled text model (for example TinyBERT-class, ~10–20 MB after quantization). On Android, use TensorFlow Lite with the GPU or NNAPI delegate.
Building the classifier
Fine-tuned BERT via Hugging Face Inference API
Fastest path to production: take bert-base-multilingual-cased or distilbert-base-multilingual-cased, fine-tune on your historical ticket dataset (minimum 200–300 examples per category), and deploy via Hugging Face Inference Endpoints.
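The mobile client sends a POST request:
// iOS
import Foundation

struct ClassifyRequest: Encodable {
    let inputs: String
}

struct ClassifyResponse: Decodable {
    let label: String
    let score: Float
}

// apiKey is passed in by the caller. In production, prefer proxying through your
// backend so the token never ships inside the app.
func classifyTicket(_ text: String, apiKey: String) async throws -> ClassifyResponse {
    var request = URLRequest(url: URL(string: "https://api-inference.huggingface.co/models/your-model")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(ClassifyRequest(inputs: text))

    let (data, response) = try await URLSession.shared.data(for: request)
    guard let http = response as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
        throw URLError(.badServerResponse)
    }
    // Assumes a flat array of {label, score}; verify the shape your endpoint returns.
    let results = try JSONDecoder().decode([ClassifyResponse].self, from: data)
    guard let best = results.max(by: { $0.score < $1.score }) else {
        throw URLError(.cannotParseResponse)
    }
    return best
}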
On Android, the equivalent call is typically made with Retrofit and kotlinx.serialization.
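A minimal sketch of what the Android side might look like, assuming Retrofit 2 with a kotlinx.serialization converter already registered on the Retrofit instance. The interface name TicketClassifierApi, the relative path, and the flat-list response shape are assumptions that mirror the iOS example, not a confirmed contract of the endpoint.

```kotlin
import kotlinx.serialization.Serializable
import retrofit2.http.Body
import retrofit2.http.Header
import retrofit2.http.POST

@Serializable
data class ClassifyRequest(val inputs: String)

@Serializable
data class ClassifyResponse(val label: String, val score: Float)

// Hypothetical interface; the base URL is assumed to be
// https://api-inference.huggingface.co/ and is set on the Retrofit builder.
interface TicketClassifierApi {
    @POST("models/your-model")
    suspend fun classify(
        @Header("Authorization") bearerToken: String, // "Bearer <token>"
        @Body request: ClassifyRequest
    ): List<ClassifyResponse>
}

// Returns the highest-scoring label, or null if the endpoint returned an empty list.
suspend fun classifyTicket(
    api: TicketClassifierApi,
    token: String,
    text: String
): ClassifyResponse? =
    api.classify("Bearer $token", ClassifyRequest(text)).maxByOrNull { it.score }
```

As with the iOS version, routing the request through your own backend avoids shipping the token inside the app.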
On-device via CoreML (iOS)
If offline operation is critical, export the model to .mlpackage. The input is the tokenized text (token IDs plus an attention mask); the output is a vector of N logits, one per category, which softmax turns into probabilities.
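import CoreML

// Tokenization must use the same WordPiece vocabulary the model was fine-tuned with;
// NLTokenizer only splits text into words and does not produce BERT token IDs.
// inputIds and attentionMask are MLMultiArray values built by that tokenizer.
let model = try TicketClassifier(configuration: MLModelConfiguration())
let prediction = try model.prediction(
    input_ids: inputIds,            // MLMultiArray of token IDs
    attention_mask: attentionMask   // MLMultiArray: 1 for real tokens, 0 for padding
)
let categoryIndex = prediction.logits.argmax() // argmax() is a custom MLMultiArray extension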
Note: NLEmbedding provides ready-made word embeddings without server calls, but for 10+ category classification, accuracy will be lower than a fine-tuned model.
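The step from raw logits to a category is the same on either platform. Here is a small sketch in Kotlin, assuming the model output has already been copied into a FloatArray; the category names are illustrative only, and the Swift argmax extension mentioned above would follow the same logic.

```kotlin
import kotlin.math.exp

// Converts raw logits to probabilities. Subtracting the max keeps exp() from overflowing.
fun softmax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull() ?: error("logits must not be empty")
    val exps = logits.map { exp((it - max).toDouble()) }
    val sum = exps.sum()
    return FloatArray(logits.size) { i -> (exps[i] / sum).toFloat() }
}

// Returns the index of the largest value (the first one on ties).
fun argmax(values: FloatArray): Int {
    require(values.isNotEmpty()) { "values must not be empty" }
    var best = 0
    for (i in values.indices) {
        if (values[i] > values[best]) best = i
    }
    return best
}

fun main() {
    val categories = listOf("billing", "tech_support", "delivery") // illustrative labels
    val probabilities = softmax(floatArrayOf(0.2f, 2.1f, -0.5f))
    val index = argmax(probabilities)
    println("${categories[index]} (p=${probabilities[index]})")
}
```

Keeping the probability alongside the index makes it possible to apply a confidence threshold later.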
Text preprocessing
Before sending to the model:
- Truncate to 512 tokens (BERT limit)—keep the start where the problem usually is
- Normalize Unicode:
text.folding(options: .diacriticInsensitive, locale: .current)
- Remove personal data before sending to server: card numbers, phone numbers via regex on client
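The three steps above could be combined on the Android client roughly as follows. This is a sketch under stated assumptions: the regex patterns are illustrative and will not cover every national card or phone format, and whitespace-separated words stand in for tokens, so the model-side tokenizer still has to enforce the real 512-token limit. The function names are hypothetical.

```kotlin
import java.text.Normalizer

// Illustrative patterns only; real card and phone formats vary by country.
private val CARD_NUMBER = Regex("""\b\d(?:[ -]?\d){12,18}\b""")
private val PHONE_NUMBER = Regex("""(?<!\d)\+?\d(?:[\s().-]{0,2}\d){8,14}(?!\d)""")
private val COMBINING_MARKS = Regex("""\p{Mn}+""")
private val WHITESPACE = Regex("""\s+""")

// Keeps the first maxWords words. WordPiece usually yields more tokens than words,
// so this is only a rough client-side cap.
fun truncateWords(text: String, maxWords: Int = 512): String =
    text.trim().split(WHITESPACE).take(maxWords).joinToString(" ")

fun preprocessTicket(raw: String): String {
    // Fold diacritics: decompose to NFD, then drop the combining marks.
    val folded = Normalizer.normalize(raw, Normalizer.Form.NFD).replace(COMBINING_MARKS, "")
    // Redact card numbers first so the phone pattern never sees them.
    val redacted = folded
        .replace(CARD_NUMBER, "[CARD]")
        .replace(PHONE_NUMBER, "[PHONE]")
    return truncateWords(redacted)
}

fun main() {
    println(preprocessTicket("Café order: card 4111 1111 1111 1111, call +1 (555) 123-4567"))
}
```

Redacting before truncation means personal data beyond the cut-off is never held in the outgoing string at all.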
Integration into ticket submission form
Classification runs not on submit but on the text field's onChange, debounced until the user pauses typing for 1.5–2 seconds. The user sees a suggested category and can correct it manually.
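// Android, Compose
val ticketText by viewModel.ticketText.collectAsState()
val suggestedCategory by viewModel.suggestedCategory.collectAsState()

// ViewModel
// debounce requires @OptIn(FlowPreview::class); mapLatest requires
// @OptIn(ExperimentalCoroutinesApi::class).
private val _ticketText = MutableStateFlow("")
val ticketText: StateFlow<String> = _ticketText.asStateFlow()

// Category is an assumed app-level type returned by classifyUseCase.
private val _suggestedCategory = MutableStateFlow<Category?>(null)
val suggestedCategory: StateFlow<Category?> = _suggestedCategory.asStateFlow()

init {
    _ticketText
        .debounce(1500)                              // wait for a 1.5 s pause in typing
        .filter { it.length > 20 }                   // skip text too short to classify
        .mapLatest { text -> classifyUseCase(text) } // cancels the previous call on new input
        .onEach { _suggestedCategory.value = it }
        .launchIn(viewModelScope)
}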
mapLatest cancels the in-flight request when new input arrives, so network calls don't pile up.
Common mistakes
Too few classes. "Other" shouldn't exceed 15% of real traffic—otherwise everything unclear falls there and the classifier loses value. If "other" > 30%, audit your taxonomy.
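A quick way to check these thresholds against real traffic is to compute the share of "other" from per-category ticket counts. The sketch below assumes you can export such counts; the intermediate WATCH tier between 15% and 30% is an interpretation added for illustration, and all names are hypothetical.

```kotlin
enum class TaxonomyStatus { HEALTHY, WATCH, AUDIT }

// Share of tickets that landed in the catch-all category (0.0 when there is no traffic).
fun otherShare(countsByCategory: Map<String, Int>, otherLabel: String = "other"): Double {
    val total = countsByCategory.values.sum()
    if (total == 0) return 0.0
    return (countsByCategory[otherLabel] ?: 0).toDouble() / total
}

// Up to 15% is the target; above 30% the taxonomy needs an audit.
fun taxonomyStatus(share: Double): TaxonomyStatus = when {
    share > 0.30 -> TaxonomyStatus.AUDIT
    share > 0.15 -> TaxonomyStatus.WATCH
    else -> TaxonomyStatus.HEALTHY
}

fun main() {
    val counts = mapOf("billing" to 500, "tech_support" to 300, "other" to 200)
    println(taxonomyStatus(otherShare(counts)))
}
```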
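Not logging confidence scores. If the score is below 0.6, let the user pick the category manually instead of forcing one. Log the score for every prediction so the threshold can be tuned later. Firebase Crashlytics custom keys work, though Firebase Analytics custom events are a better fit for data that isn't tied to crashes.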
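The threshold rule can be kept in one small function so the UI only has to render the result. This is a sketch: the type names are hypothetical, and 0.6 is the cut-off suggested above, which you would tune on your own data.

```kotlin
data class Prediction(val label: String, val score: Float)

// What the submission form should display for the category field.
sealed class CategoryUi {
    data class Suggested(val label: String) : CategoryUi()
    object ManualSelection : CategoryUi()
}

// Falls back to manual selection when there is no prediction or the score is too low.
fun toCategoryUi(prediction: Prediction?, minScore: Float = 0.6f): CategoryUi =
    if (prediction != null && prediction.score >= minScore) {
        CategoryUi.Suggested(prediction.label)
    } else {
        CategoryUi.ManualSelection
    }

fun main() {
    println(toCategoryUi(Prediction("billing", 0.45f)))
    println(toCategoryUi(Prediction("billing", 0.91f)))
}
```

Treating a missing prediction (network error, text too short) the same way as a low score keeps the form usable when classification is unavailable.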
Model never retrains. Classifiers degrade as the product grows: new ticket types emerge, old categories shift. Set up retraining pipeline at least quarterly from operator corrections.
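One possible shape for the data feeding such a pipeline, assuming each ticket record stores both the predicted category and the category the operator finally assigned. All names here are hypothetical, and the operator's choice is treated as ground truth.

```kotlin
data class TicketRecord(val text: String, val predicted: String, val finalCategory: String)
data class TrainingExample(val text: String, val label: String)

// Fraction of tickets where the operator overrode the model; a rising value signals drift.
fun correctionRate(records: List<TicketRecord>): Double =
    if (records.isEmpty()) 0.0
    else records.count { it.predicted != it.finalCategory }.toDouble() / records.size

// Every reviewed ticket becomes a labeled example, using the operator's final category.
fun toTrainingSet(records: List<TicketRecord>): List<TrainingExample> =
    records.map { TrainingExample(it.text, it.finalCategory) }

fun main() {
    val records = listOf(
        TicketRecord("Charged twice for one order", "billing", "billing"),
        TicketRecord("App crashes on login", "tech_support", "tech_support"),
        TicketRecord("Courier never showed up", "other", "delivery"),
        TicketRecord("Refund not received", "billing", "billing")
    )
    println(correctionRate(records))
    println(toTrainingSet(records).size)
}
```

Tracking the correction rate between scheduled retraining runs gives an early signal if the model degrades faster than the quarterly cadence assumes.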
Process
Audit current ticket taxonomy → collect and label training data → choose architecture (API vs on-device) → train and validate model → integrate into mobile client → A/B test vs manual classification → deploy and monitor.
Timeline estimates
- Integration with a ready-made API (OpenAI, Hugging Face): 3–5 days
- Fine-tuning your own model plus integration: 2–4 weeks
- On-device Core ML/TFLite with model export: add 1 week