Rasa NLP Engine Integration in Mobile Chatbot
Rasa is a self-hosted NLP stack: you control the model, data, and infrastructure. This is why teams choose it for projects with sensitive data or non-standard domains. But "self-hosted" means you'll configure the pipeline yourself — the default config gives mediocre accuracy for Russian.
Rasa NLU Pipeline for Russian
The standard config with WhitespaceTokenizer performs poorly for Russian: Russian morphology is far richer than English, so a whitespace tokenizer treats inflected forms of the same word as unrelated tokens. A working config for a Russian chatbot:
```yaml
language: ru

pipeline:
  - name: SpacyNLP
    model: ru_core_news_md
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
    ambiguity_threshold: 0.1
```
spaCy with the ru_core_news_md model, plus character n-gram features from the second CountVectorsFeaturizer, is the baseline combination that gives reasonable accuracy on short production phrases. DIETClassifier with 150 epochs is usually enough for 20–50 intents.
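The pipeline trains on an NLU dataset of example utterances grouped by intent. A minimal sketch of such a dataset (intent names and phrasings are illustrative, not from a real project):

```yaml
# nlu.yml — a few examples per intent; real projects need 15–30+ per intent
version: "3.1"
nlu:
  - intent: check_order_status
    examples: |
      - где мой заказ?
      - статус заказа
      - когда привезут заказ
  - intent: cancel_order
    examples: |
      - отмените заказ
      - хочу отменить доставку
```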
Rasa Core: Dialog Management
Rasa separates NLU (intent understanding) and Core (dialog management). Core operates on rules (rules.yml) and stories (stories.yml). The most common mistake is trying to describe all scenarios through rules and ending up with a fragile system that breaks on non-standard utterance order.
Rule of thumb: rigid commands (cancel, help, start over) go in rules. Multi-step scenarios with variability go in stories. Rasa Core trains on stories and generalizes to scenarios it hasn't explicitly seen — this is its main advantage over decision trees.
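A sketch of that split, using hypothetical intent and action names (the two blocks live in separate files, rules.yml and stories.yml):

```yaml
# rules.yml — a rigid command that must always work the same way
rules:
  - rule: Cancel always works
    steps:
      - intent: cancel
      - action: utter_cancelled

# stories.yml — a multi-step scenario Core can generalize from
stories:
  - story: book a delivery slot
    steps:
      - intent: book_slot
      - action: booking_form
      - active_loop: booking_form
      - active_loop: null
      - action: utter_confirm_booking
```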
Custom Actions. Dynamic responses (order status, available slots) are implemented via action_server — a separate Python service that Rasa calls over HTTP. The mobile app doesn't interact with it directly:
```python
from rasa_sdk import Action
from rasa_sdk.events import SlotSet


class ActionCheckOrderStatus(Action):
    def name(self) -> str:
        return "action_check_order_status"

    async def run(self, dispatcher, tracker, domain) -> list:
        order_id = tracker.get_slot("order_id")
        status = await order_service.get_status(order_id)  # your backend API client
        dispatcher.utter_message(text=f"Your order #{order_id}: {status}")
        return [SlotSet("order_status", status)]
```
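Rasa discovers the action server through endpoints.yml; the sketch below assumes the rasa-sdk default port and a hostname from a typical Compose setup:

```yaml
# endpoints.yml
action_endpoint:
  url: "http://action-server:5055/webhook"  # 5055 is the rasa-sdk default
```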
Mobile App Integration
Rasa Server exposes a REST channel at /webhooks/rest/webhook. The mobile app sends a POST request:
```json
{
  "sender": "user_device_id_or_session_uuid",
  "message": "message text"
}
```
The response is an array of messages, each of which can be text, image, buttons, or custom payload.
On Android, this is a standard Retrofit call. Important: sender must be a stable session identifier — Rasa stores slots between requests within the same sender. Generating a new ID each time loses dialog context.
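For clarity, the same contract sketched in Python (the endpoint URL and helper names are illustrative; an Android client makes the equivalent Retrofit call):

```python
import json
import urllib.request

RASA_URL = "http://localhost:5005/webhooks/rest/webhook"  # default REST channel path

def build_payload(sender: str, text: str) -> dict:
    """Body expected by the REST channel: stable sender id + message text."""
    return {"sender": sender, "message": text}

def parse_reply(messages: list) -> list:
    """Flatten Rasa's reply array into displayable strings.

    Each element may carry "text", "image", "buttons", or a custom payload;
    this sketch handles only text and images.
    """
    parts = []
    for m in messages:
        if "text" in m:
            parts.append(m["text"])
        if "image" in m:
            parts.append(f"[image] {m['image']}")
    return parts

def send(sender: str, text: str) -> list:
    """POST one user message and return the bot's replies as strings."""
    req = urllib.request.Request(
        RASA_URL,
        data=json.dumps(build_payload(sender, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_reply(json.load(resp))
```

Reusing the same `sender` for every call in a session is what keeps slots alive on the server side.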
For production, don't expose Rasa directly to the internet — put nginx in front with rate limiting and token authentication.
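A minimal nginx sketch of that setup (upstream name, zone size, rate, and the static-token check are all placeholder choices; a real deployment would validate tokens properly):

```nginx
# Rate limit per client IP + crude bearer-token gate in front of Rasa
limit_req_zone $binary_remote_addr zone=chat:10m rate=5r/s;

server {
    listen 443 ssl;

    location /webhooks/rest/webhook {
        limit_req zone=chat burst=10 nodelay;
        if ($http_authorization != "Bearer CHANGE_ME") { return 401; }
        proxy_pass http://rasa:5005;
    }
}
```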
Deployment
Rasa Server and the action server run conveniently under Docker Compose. The model is trained with rasa train and mounted into the container. On a modest VPS, training a 50-intent model takes 5–10 minutes, which is acceptable for CI/CD.
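A Compose sketch of the two services (image tags, paths, and service names are illustrative):

```yaml
# docker-compose.yml
services:
  rasa:
    image: rasa/rasa:latest
    ports:
      - "5005:5005"
    volumes:
      - ./models:/app/models        # trained model mounted in
      - ./endpoints.yml:/app/endpoints.yml
    command: run --enable-api
  action-server:
    image: rasa/rasa-sdk:latest
    volumes:
      - ./actions:/app/actions      # custom action code
```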
Rasa Enterprise (commercial version) adds analytics and A/B testing for dialogs, but the open-source version suffices for most tasks.
Development Process
1. Domain audit, collecting utterance examples for the NLU dataset.
2. Pipeline setup, training the baseline model, evaluating accuracy via rasa test.
3. Developing dialog scenarios (rules + stories), custom actions.
4. Integrating the REST channel with the mobile client, infrastructure setup.
Timeline Estimates
Integrating with a ready-made Rasa server — 3–4 days. Full cycle including training a model from scratch, writing scenarios, and deployment — 2–4 weeks depending on intent count.