LlamaIndex Integration for RAG and Data Indexing

We design and deploy artificial intelligence systems, from prototypes to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business.

LlamaIndex Integration for RAG Systems

LlamaIndex (formerly GPT Index) is a framework specialized for RAG: document loading, indexing, and querying. Where LangChain aims to be a universal LLM toolkit, LlamaIndex concentrates on data ingestion, advanced retrieval, and query understanding. It stands out with a rich ecosystem of data loaders (150+ sources) and advanced indexing strategies.
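
As an illustration of the loader ecosystem, here is a minimal sketch of ingesting web pages instead of local files. It assumes the separately installed llama-index-readers-web package, and the URL is a placeholder:

from llama_index.readers.web import SimpleWebPageReader

# Fetch pages and convert HTML to plain text before indexing
# (the URL below is a placeholder, not a real source)
web_docs = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/docs/warranty-policy"]
)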

Basic RAG with LlamaIndex

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Global settings configuration
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Load documents
documents = SimpleDirectoryReader("./data", recursive=True).load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is the warranty period for the equipment?")
print(response)
# Access sources
for node in response.source_nodes:
    print(f"Score: {node.score:.3f}, Source: {node.metadata.get('file_name')}")

Vector Store Integration

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext
import qdrant_client

# Connect to Qdrant
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index into Qdrant
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True,
)

# Reload existing index
index = VectorStoreIndex.from_vector_store(vector_store)
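
Qdrant can also serve hybrid (dense + sparse) retrieval, which helps with exact terms such as product codes. A sketch, assuming the optional fastembed dependency is installed:

# Hybrid collection: dense embeddings plus a sparse model for keyword recall
hybrid_store = QdrantVectorStore(
    client=client,
    collection_name="docs_hybrid",
    enable_hybrid=True,  # requires the fastembed package
)
hybrid_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(vector_store=hybrid_store),
)
hybrid_engine = hybrid_index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=3,
    sparse_top_k=10,  # candidates from the sparse (keyword) side before fusion
)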

SubQuestionQueryEngine: Breaking Down Complex Questions

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create tools from different sources
# (financial_index and contracts_index are VectorStoreIndex objects built earlier)
financial_tool = QueryEngineTool.from_defaults(
    query_engine=financial_index.as_query_engine(),
    name="financial_data",
    description="Company financial metrics for 2023–2025",
)

contracts_tool = QueryEngineTool.from_defaults(
    query_engine=contracts_index.as_query_engine(),
    name="contracts",
    description="Contracts with suppliers and clients",
)

# SubQuestion engine automatically breaks down queries into sub-queries
engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[financial_tool, contracts_tool],
    use_async=True,
)

response = engine.query(
    "Compare Q1 2025 revenue with budget and check for overdue payments in contracts"
)
# The engine generates two sub-questions (one per tool) and synthesizes a combined answer

RouterQueryEngine: Index-Based Routing

from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# summary_index and vector_index are prebuilt indexes over the same corpus
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        QueryEngineTool.from_defaults(
            query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
            description="For summary questions about the document as a whole",
        ),
        QueryEngineTool.from_defaults(
            query_engine=vector_index.as_query_engine(),
            description="For searching specific facts and details",
        ),
    ],
)
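
At query time the selector shows the LLM each tool's description and routes the question to a single engine. For example:

# A factual lookup like this should be routed to the vector engine
response = router_engine.query("What does the document say about early contract termination?")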

IngestionPipeline: Advanced Preprocessing

from llama_index.core.ingestion import IngestionPipeline, IngestionCache
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor, QuestionsAnsweredExtractor

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        TitleExtractor(nodes=3),  # Infers a document title from the first nodes and adds it to each chunk's metadata
        QuestionsAnsweredExtractor(questions=5),  # Adds questions each chunk can answer, improving query-to-chunk matching
        OpenAIEmbedding(model="text-embedding-3-small"),
    ],
    vector_store=vector_store,
    cache=IngestionCache(),  # Caches processed documents
)

# arun must be awaited inside an async function; use pipeline.run(...) in synchronous code
nodes = await pipeline.arun(documents=documents, show_progress=True)
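
For the incremental re-indexing used in the case study below, the pipeline can be paired with a document store: inputs are hashed, and unchanged documents are skipped on subsequent runs. A sketch with the default in-memory docstore:

from llama_index.core.storage.docstore import SimpleDocumentStore

# With a docstore attached, the pipeline deduplicates by document hash,
# so re-running it only processes new or modified documents
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ],
    vector_store=vector_store,
    docstore=SimpleDocumentStore(),
)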

Embedding Adapter Fine-Tuning: Adapting Retrieval to the Domain

from llama_index.finetuning import EmbeddingAdapterFinetuneEngine, generate_qa_embedding_pairs

# Generate (question, context) training pairs from document nodes
# (train_nodes is a list of nodes, e.g. produced by the pipeline above)
train_dataset = generate_qa_embedding_pairs(
    nodes=train_nodes,
    llm=OpenAI(model="gpt-4o-mini"),
)

# Fine-tune a small adapter on top of the frozen base embedding model
finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,
    OpenAIEmbedding(),  # base embedding model; only the adapter weights are trained
    model_output_path="embed_adapter",
    epochs=4,
    batch_size=8,
)
finetune_engine.finetune()

# Apply adapter
from llama_index.embeddings.adapter import AdapterEmbeddingModel
adapted_embed = AdapterEmbeddingModel(
    OpenAIEmbedding(),
    adapter_path="embed_adapter",
)
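
To put the adapter to work, the adapted model must embed both the corpus and incoming queries, so set it as the active embedding model and rebuild the index (a minimal sketch):

# Queries and documents must pass through the same adapter
Settings.embed_model = adapted_embed
index = VectorStoreIndex.from_documents(documents)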

Practical Case Study: Corporate Knowledge Base for an Insurance Company

Initial Situation: 15,000 pages of documents (policies, insurance rules, regulatory instructions, internal procedures). Operators spent 8–12 minutes searching for answers to customer questions.

LlamaIndex Architecture:

  • Sources: 4 document types in separate Qdrant indexes
  • RouterQueryEngine: routing by question type
  • SubQuestionQueryEngine: for questions spanning multiple types
  • IngestionPipeline: automatic re-indexing when documents update
  • Metadata filtering: by insurance type, document date, regional regulator (see the sketch after this list)
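
A minimal sketch of the metadata filtering piece; the field name insurance_type and its value are illustrative, not taken from the actual project:

from llama_index.core.vector_stores import MetadataFilters, MetadataFilter, FilterOperator

# Restrict retrieval to chunks tagged with a given insurance type
# (the metadata key and value here are hypothetical)
filters = MetadataFilters(
    filters=[MetadataFilter(key="insurance_type", value="auto", operator=FilterOperator.EQ)]
)
filtered_engine = index.as_query_engine(similarity_top_k=5, filters=filters)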

Results:

  • Average operator response time: 10 min → 1.5 min
  • Answer accuracy (expert evaluation): 91%
  • Incorrect references to outdated policy editions: ~8% → 0.4%
  • Document coverage: 73% of the corpus now surfaces in answers (operators were previously unaware of many documents)

LlamaIndex vs LangChain for RAG

Aspect             | LlamaIndex                   | LangChain
Specialization     | RAG, document QA             | Universal LLM applications
Data loaders       | 150+ native                  | Through community
Advanced retrieval | SubQuestion, Router built-in | Requires customization
Agent capabilities | Available (LlamaAgents)      | More mature (LangGraph)
Ecosystem          | LlamaHub                     | LangChain Hub

Timeline

  • Basic RAG on LlamaIndex: 3–5 days
  • Multi-source RAG with RouterQueryEngine: 1–2 weeks
  • IngestionPipeline with auto-updates: 1 week
  • Embedding fine-tuning for domain: 2–3 weeks