Question Answering over Documents Implementation

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps so that AI delivers value in real business settings, not just in the lab.

Implementation of Question Answering over Documents

QA over documents is a system that accepts a natural language question and returns an answer found or synthesized from a document corpus. This is the foundation for corporate search systems, knowledge bases, and automated assistants.

QA System Architectures

Extractive QA: the answer is an exact span from a retrieved document; the model predicts the start and end positions of that span. Ready-made models include deepset/roberta-base-squad2 and sberbank-ai/rubert-base-cased-qa. Advantage: no hallucinations, since the answer is always literal text. Disadvantage: the answer must appear verbatim in the document.
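A minimal extractive-QA sketch using the Hugging Face transformers pipeline with the model mentioned above; the sample question and context are illustrative:

```python
from transformers import pipeline

# Extractive QA: the model predicts a start/end span inside the context,
# so the returned answer is always literal text from the document.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="How much notice is required to terminate the contract?",
    context=(
        "The contract may be terminated by either party "
        "with 30 days written notice."
    ),
)
print(result["answer"], result["score"])
```

Because the model can only select spans, `result["answer"]` is guaranteed to be a substring of the context, which is exactly the no-hallucination property described above.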

RAG (Retrieval-Augmented Generation): the most common production approach. A retriever finds relevant documents, and an LLM synthesizes the answer, which can combine information from multiple sources.

Long-context LLM: Claude 3.5 (200K tokens) or Gemini Pro (1M tokens) can hold the entire document corpus in context. For small knowledge bases (< 500 pages) this is often simpler than RAG.
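The long-context approach amounts to packing the whole labeled corpus into a single prompt. A minimal sketch, assuming a crude character budget as a proxy for the model's token limit; `build_long_context_prompt` and the budget value are illustrative choices, not a fixed API:

```python
# Long-context QA sketch: instead of a retriever, label every document
# and place the whole corpus in one prompt for the model.
def build_long_context_prompt(question: str, docs: dict[str, str],
                              char_budget: int = 600_000) -> str:
    parts = []
    used = 0
    for name, text in docs.items():
        block = f"<doc name='{name}'>\n{text}\n</doc>"
        if used + len(block) > char_budget:  # crude proxy for the token limit
            break
        parts.append(block)
        used += len(block)
    corpus = "\n\n".join(parts)
    return (f"{corpus}\n\nAnswer strictly from the documents above. "
            f"If the answer is not there, say so.\n\n"
            f"Question: {question}")

prompt = build_long_context_prompt(
    "What is the notice period?",
    {"contract.txt": "Termination requires 30 days notice.",
     "faq.txt": "Payments are due monthly."},
)
```

The "answer strictly from the documents" instruction plays the same role as explicit-refusal prompting in a RAG setup.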

RAG Pipeline

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Qdrant.from_existing_collection(
    embedding=embeddings,  # note: the parameter is `embedding`, not `embeddings`
    path=None,
    url="http://localhost:6333",
    collection_name="docs",
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # or "map_reduce" for long documents
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "What is the procedure for contract termination?"})
# result["result"] — answer
# result["source_documents"] — sources

Hallucination Control

Critical for system trust:

  1. Source citation: each fact in the answer links back to a document/paragraph
  2. Faithfulness check: a separate prompt verifies the answer is supported by the context
  3. Explicit refusal: if the information is not in the documents, answer "There is no answer to this question in available documents"
  4. Confidence scoring: estimate confidence via logprobs or a separate chain

Advanced Retrieval

A basic top-K retriever is often insufficient for complex questions:

  • HyDE (Hypothetical Document Embeddings): the LLM first generates a hypothetical answer, then the search runs on that answer's embedding
  • Multi-query: reformulate the question 3–5 ways, then combine the results
  • Parent-child chunks: index small chunks for search, but pass the wider parent context to the LLM
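The merge step of the multi-query technique can be sketched with reciprocal rank fusion (RRF). The ranked lists below are stand-ins for retriever output; in practice an LLM generates the reformulations and a vector store returns the lists:

```python
from collections import defaultdict

# Multi-query fusion sketch: each reformulation yields its own ranked list
# of document IDs; RRF rewards documents that rank well in several lists.
def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge([
    ["doc_a", "doc_b", "doc_c"],   # results for "termination procedure"
    ["doc_b", "doc_d"],            # results for "how to cancel the contract"
    ["doc_b", "doc_a"],            # results for "ending the agreement early"
])
print(merged[0])  # doc_b, since it ranks high in all three lists
```

LangChain ships a ready-made MultiQueryRetriever that handles the reformulation step; the fusion logic above shows why combining lists beats any single query.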

Working with Tables and Structured Data

QA over tables is a separate task. Options:

  • Text2SQL: the LLM generates a SQL query, the system executes it and returns the result
  • Table serialization: table → Markdown/CSV → into the LLM context
  • TAPAS (Google): a specialized model for QA over tables
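The table-serialization option is the simplest to show. A minimal sketch that renders rows as a Markdown table for the LLM context; the field names and data are made up:

```python
# Table serialization sketch: rows -> Markdown table -> LLM context.
def table_to_markdown(rows: list[dict]) -> str:
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

md = table_to_markdown([
    {"contract": "A-101", "status": "active", "notice_days": 30},
    {"contract": "B-207", "status": "terminated", "notice_days": 14},
])
print(md)
```

Markdown works well here because most LLMs parse pipe-delimited tables reliably; for large tables, Text2SQL scales better than serializing everything into context.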

Quality Metrics

  • Exact Match (EM): proportion of questions answered exactly correctly (for extractive QA)
  • F1 score: token overlap between the answer and the reference
  • RAGAS: a specialized library for RAG evaluation measuring faithfulness, answer relevance, context precision, and context recall
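EM and token-level F1 are simple enough to compute directly. A sketch in the spirit of the SQuAD metrics, with a simplified normalization (lowercasing and word tokenization only; the official script also strips articles and punctuation):

```python
import re

def normalize(s: str) -> list[str]:
    # Simplified normalization: lowercase + word tokens.
    return re.findall(r"\w+", s.lower())

def exact_match(pred: str, ref: str) -> bool:
    return normalize(pred) == normalize(ref)

def f1_score(pred: str, ref: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over
    # the multiset intersection of prediction and reference tokens.
    p, r = normalize(pred), normalize(ref)
    common = sum(min(p.count(t), r.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score("30 days notice", "30 days") gives 0.8: recall is perfect but the extra token lowers precision. EM and F1 cover extractive answers; for generative RAG answers, RAGAS-style LLM-judged metrics are the better fit.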