Parent Document Retriever Implementation for RAG


Parent Document Retriever is an architectural RAG pattern that resolves a fundamental tension: precise retrieval favors small chunks (tighter semantic matching), while quality generation needs wide context (a full section, not three sentences). The solution: index small "child" chunks, but pass their large "parent" documents to the LLM.

Parent Document Retriever Architecture

Indexing:
├── Document (2000 tokens)
│   ├── Child chunk 1 (128 tokens) → embedding → index
│   ├── Child chunk 2 (128 tokens) → embedding → index
│   ├── Child chunk 3 (128 tokens) → embedding → index
│   └── Child chunk 4 (128 tokens) → embedding → index

Retrieval:
├── Query → search by child embeddings
├── Found child_chunk_3 (high relevance)
└── Return parent document (2000 tokens) → to LLM
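The diagram above can be sketched without any framework: index child chunks keyed to a parent id, match on children, return deduplicated parents. A minimal illustration (a naive fixed-size splitter and substring matching stand in for a real text splitter and embedding search; all names here are hypothetical):

```python
def split(text: str, size: int) -> list[str]:
    """Naive fixed-size splitter standing in for a real text splitter."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents: list[str], child_size: int = 128):
    """Map each child chunk to the id of its parent document."""
    parent_store = {}   # parent_id -> full document
    child_index = []    # (child_chunk, parent_id) pairs
    for doc_id, doc in enumerate(documents):
        parent_store[doc_id] = doc
        for chunk in split(doc, child_size):
            child_index.append((chunk, doc_id))
    return parent_store, child_index

def retrieve(query: str, parent_store: dict, child_index: list) -> list[str]:
    """Match children (substring match in place of embedding similarity),
    then return their deduplicated parent documents."""
    hits = [pid for chunk, pid in child_index if query.lower() in chunk.lower()]
    seen, parents = set(), []
    for pid in hits:
        if pid not in seen:
            seen.add(pid)
            parents.append(parent_store[pid])
    return parents
```

The key property is visible in `retrieve`: matching happens on the small chunks, but the caller only ever sees the full parent documents.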

Implementation with LangChain

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Persistent storage for parent documents. LocalFileStore holds raw bytes,
# so wrap it with create_kv_docstore to (de)serialize Document objects
store = create_kv_docstore(LocalFileStore("./parent_docs_store"))

# Splitters: child fine, parent coarse
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
)
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=100,
)

# Qdrant.from_texts infers the vector size from the embedded texts,
# so seed with a placeholder rather than an empty list
vectorstore = Qdrant.from_texts(
    texts=[" "],  # Real chunks are added via the retriever
    embedding=embeddings,
    collection_name="child_chunks",
    url="http://localhost:6333",
)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Indexing (documents: a list of langchain Document objects prepared earlier)
retriever.add_documents(documents, ids=None)

# Query — will return parent documents
relevant_docs = retriever.invoke("procedure for approving purchase")
print(f"Found {len(relevant_docs)} parent documents")
print(f"Size of first: {len(relevant_docs[0].page_content)} chars")

Three-Level Hierarchy

For complex documents, a three-level hierarchy is useful: document → section → paragraph. ParentDocumentRetriever itself takes two splitters, so the source document serves as the implicit top level, while the mid-size chunk is what gets returned:

from langchain.retrievers import ParentDocumentRetriever

# Sub-chunk (indexed) → chunk (returned parent); the source document
# is the implicit top level of the hierarchy
sub_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
chunk_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=sub_splitter,
    parent_splitter=chunk_splitter,
)
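If the top level must also be retrievable (returning the whole section rather than the mid-size chunk), one option is to store a grandparent id in each chunk's metadata and resolve it after retrieval. A hypothetical sketch, using plain dicts in place of Document objects and an illustrative `grandparent_id` metadata key:

```python
def resolve_grandparents(parent_docs: list[dict], grandparent_store: dict) -> list[dict]:
    """Swap retrieved parent chunks for their enclosing sections, deduplicated,
    preserving retrieval order."""
    seen, sections = set(), []
    for doc in parent_docs:
        gid = doc["metadata"]["grandparent_id"]
        if gid not in seen:
            seen.add(gid)
            sections.append(grandparent_store[gid])
    return sections
```

Deduplication matters here: several retrieved chunks often belong to the same section, and the LLM should receive that section only once.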

Practical Comparison of Approaches

Dataset: technical regulations (average document 3500 words, 20–40 sections).

Approach                              Chunk in Index   Context to LLM   Context Recall   Faithfulness
Standard (512 tokens)                 512              512×5 = 2560     0.69             0.81
Standard (256 tokens)                 256              256×5 = 1280     0.74             0.78
Parent Doc (child=200, parent=1500)   200              1500×3 = 4500    0.88             0.91
Parent Doc + Reranker                 200              1500×3 = 4500    0.88             0.94

Parent Document Retriever markedly improves context recall (+0.19 over the 512-token baseline) while keeping faithfulness high: child chunks precisely locate the relevant section, and parent documents supply the full context for generation.

Parent Document Caching

With high QPS, parent documents should be cached in Redis:

import json

import redis
from langchain_core.documents import Document

redis_client = redis.Redis(host="localhost", port=6379)

class CachedParentDocumentRetriever:
    def __init__(self, base_retriever, ttl: int = 3600):
        self.retriever = base_retriever
        self.ttl = ttl

    def invoke(self, query: str) -> list:
        # Retrieve child chunks from the vector store
        child_docs = self.retriever.vectorstore.similarity_search(query, k=5)

        # Load parents through a Redis read-through cache,
        # deduplicating since several children may share one parent
        parent_docs = []
        seen_ids = set()
        for child in child_docs:
            parent_id = child.metadata.get("doc_id")
            if parent_id is None or parent_id in seen_ids:
                continue
            seen_ids.add(parent_id)
            cache_key = f"parent:{parent_id}"

            cached = redis_client.get(cache_key)
            if cached:
                # Rebuild the Document from its cached dict form
                parent_docs.append(Document(**json.loads(cached)))
            else:
                parent = self.retriever.docstore.mget([parent_id])[0]
                if parent:
                    redis_client.setex(cache_key, self.ttl, json.dumps(parent.dict()))
                    parent_docs.append(parent)

        return parent_docs
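For local testing without a Redis instance, the two calls the class relies on (`get` and `setex`) can be stubbed with an in-memory dict. This stand-in is purely an assumption for testing, not part of the production setup:

```python
import time

class InMemoryTTLCache:
    """Drop-in stub for the two redis-py calls used above: get and setex."""

    def __init__(self):
        self._data = {}  # key -> (expires_at, value)

    def setex(self, key: str, ttl: int, value: str) -> None:
        # Store the value alongside its absolute expiry time
        self._data[key] = (time.monotonic() + ttl, value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Lazily evict expired entries on read
            del self._data[key]
            return None
        return value
```

Injecting this object in place of `redis_client` lets the cache hit/miss paths of `CachedParentDocumentRetriever.invoke` be exercised in unit tests.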

Timeline

  • Setting up Parent Document Retriever: 2–3 days
  • Tuning optimal chunk sizes: 2–3 days
  • Testing and evaluation: 2–3 days
  • Total: 1 week