Implementing Question Answering over Documents
QA over documents is a system that accepts a natural language question and returns an answer found in, or synthesized from, a document corpus. It is the foundation of enterprise search, knowledge bases, and automated assistants.
QA System Architectures
Extractive QA: the answer is an exact span of the document. The model predicts a start and end position within the retrieved document. Ready-made models include deepset/roberta-base-squad2 and sberbank-ai/rubert-base-cased-qa. Advantage: no hallucinations. Disadvantage: the answer must appear verbatim in the text.
RAG (Retrieval-Augmented Generation): the most common production approach. A retriever finds relevant documents, and an LLM synthesizes the answer from them; the answer can generalize information across multiple sources.
Long-context LLM: Claude 3.5 (200K tokens) or Gemini 1.5 Pro (1M tokens) — put the entire document corpus in context. For small knowledge bases (< 500 pages) this is simpler than RAG.
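To make the extractive mechanism concrete, here is a dependency-free toy: the model scores every token as a possible answer start and answer end, and the best valid (start ≤ end) pair is returned as a literal span of the document. The logits below are made up for illustration; a real system would get them from a checkpoint such as the deepset/roberta-base-squad2 model named above.

```python
def best_span(tokens, start_logits, end_logits, max_len=8):
    """Pick the (start, end) pair with the highest combined logit score."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(tokens))):
            if s + end_logits[j] > best_score:
                best_score, best = s + end_logits[j], (i, j)
    i, j = best
    return " ".join(tokens[i:j + 1])

tokens = ["notice", "period", "is", "30", "days"]
start = [0.1, 0.0, 0.2, 5.0, 0.3]   # model scores "30" as the likely answer start
end   = [0.0, 0.1, 0.0, 0.5, 4.0]   # ... and "days" as the likely answer end
print(best_span(tokens, start, end))  # → 30 days
```

The key property of extractive QA is visible here: the output is always a substring of the input document, which is why it cannot hallucinate.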
RAG Pipeline
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Qdrant.from_existing_collection(
    embedding=embeddings,
    url="http://localhost:6333",
    collection_name="docs",
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # or "map_reduce" for long documents
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
)
result = qa_chain.invoke({"query": "What is the procedure for contract termination?"})
# result["result"] — answer
# result["source_documents"] — sources
```
Hallucination Control
Critical for system trust:
- Source citation: each fact in the answer → reference to document/paragraph
- Faithfulness check: separate prompt verifies answer is supported by context
- Explicit refusal: if the information is not in the documents, answer "There is no answer to this question in available documents"
- Confidence scoring: estimate answer confidence (via logprobs or a separate chain)
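The faithfulness check and explicit refusal can be combined into one guard. The sketch below uses a crude, dependency-free heuristic (content-word overlap between each answer sentence and the retrieved context); in production this check is usually a separate LLM prompt ("Is this claim supported by the context?"), but the control flow is the same — unsupported claims trigger the refusal instead of reaching the user.

```python
import re

REFUSAL = "There is no answer to this question in available documents"

def content_words(text: str) -> set[str]:
    """Lowercased words of 4+ letters, as a rough proxy for content words."""
    return set(re.findall(r"[a-zA-Z]{4,}", text.lower()))

def guarded_answer(answer: str, context: str, min_overlap: float = 0.5) -> str:
    """Return the answer only if every sentence is grounded in the context."""
    ctx = content_words(context)
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        words = content_words(sentence)
        if words and len(words & ctx) / len(words) < min_overlap:
            return REFUSAL  # unsupported claim -> refuse rather than hallucinate
    return answer

ctx = "Termination requires thirty days written notice from either party."
print(guarded_answer("Termination requires thirty days written notice.", ctx))
print(guarded_answer("The contract renews automatically every year.", ctx))  # refused
```

Lexical overlap is a weak signal (paraphrases fail it, copied hallucinations pass it), which is why production systems replace `guarded_answer`'s heuristic with an LLM-based verifier while keeping the same refuse-by-default structure.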
Advanced Retrieval
Basic top-K retriever is insufficient for complex questions:
- HyDE (Hypothetical Document Embeddings): LLM first generates hypothetical answer, then searches by its embedding
- Multi-query: reformulate question 3–5 ways, combine results
- Parent-child chunks: index small chunks for search, but send the wider parent context to the LLM
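The multi-query technique needs a way to merge several ranked result lists; reciprocal rank fusion (RRF) is a common choice. In this sketch the query reformulations would come from an LLM and the per-query hits from a real retriever — both are hard-coded here to show only the fusion step.

```python
def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Top-3 hits for three hypothetical reformulations of the same question:
hits = [
    ["doc_a", "doc_b", "doc_c"],  # "procedure for contract termination"
    ["doc_b", "doc_a", "doc_d"],  # "how to terminate a contract"
    ["doc_b", "doc_e", "doc_a"],  # "contract cancellation steps"
]
print(rrf_merge(hits))  # → ['doc_b', 'doc_a', 'doc_e', 'doc_c', 'doc_d']
```

Documents that appear near the top of several lists (here `doc_b`) outrank documents that rank first in only one list, which is exactly the robustness multi-query retrieval is after.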
Working with Tables and Structured Data
QA over tables is a separate task. Options:
- Text2SQL: the LLM generates a SQL query, the system executes it and returns the result
- Table serialization: table → Markdown/CSV → into LLM context
- TAPAS (Google): specialized model for QA over tables
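A minimal Text2SQL flow can be sketched with an in-memory SQLite table. In a real system the SQL comes from an LLM prompted with the table schema; here `generate_sql` is a stub standing in for that call, so only the execute-and-answer part runs. The table and question are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id INTEGER, client TEXT, value REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [(1, "Acme", 120000.0), (2, "Globex", 80000.0), (3, "Acme", 50000.0)],
)

def generate_sql(question: str) -> str:
    # Stub for the LLM call; a real prompt would include the CREATE TABLE schema
    # and ask the model to translate the question into SQL.
    return ("SELECT client, SUM(value) FROM contracts "
            "GROUP BY client ORDER BY SUM(value) DESC")

rows = conn.execute(
    generate_sql("Which client has the largest total contract value?")
).fetchall()
print(rows[0])  # → ('Acme', 170000.0)
```

Executing generated SQL against untrusted input needs guardrails in practice (read-only connection, allow-listed tables, query timeouts), since the LLM output is effectively arbitrary code.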
Quality Metrics
- Exact Match (EM): proportion of questions with exactly correct answer (for extractive)
- F1 score: token overlap of answer with reference
- RAGAS: specialized library for RAG evaluation: faithfulness, answer relevance, context precision, context recall
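EM and F1 as defined above are straightforward to compute; this is a SQuAD-style sketch with a simplified normalization step (lowercase, strip basic punctuation):

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip periods/commas, split into tokens."""
    return text.lower().replace(".", "").replace(",", "").split()

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def f1(pred: str, gold: str) -> float:
    """Token-level F1 between prediction and reference."""
    p, g = Counter(normalize(pred)), Counter(normalize(gold))
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

print(exact_match("30 days", "30 days."))         # → True (after normalization)
print(round(f1("within 30 days", "30 days"), 2))  # → 0.8
```

F1 gives partial credit when the prediction contains extra or missing tokens, which is why it is reported alongside the stricter EM. For generative RAG answers these lexical metrics are too brittle, which is where RAGAS-style LLM-based evaluation comes in.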