Agentic RAG with Autonomous Information Search

We design and deploy artificial intelligence systems, from prototypes to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business settings, not just in the lab.
Complexity: complex
Timeline: from 1 week to 3 months

Implementing Agentic RAG with Autonomous Search

Agentic RAG is an architecture in which an LLM agent autonomously decides whether a search is needed, how many times to search, which queries to formulate, and whether the retrieved information is sufficient to answer. Unlike standard RAG with a fixed, one-shot retrieval step, the agent iteratively explores the knowledge base until it has gathered enough context.

Standard RAG vs Agentic RAG

Standard RAG:

  1. Query → Retrieval (once) → Generation
  2. No control over context sufficiency
  3. No adaptation of search strategy

Agentic RAG:

  1. Query → Agent analyzes task
  2. Agent formulates search query
  3. Retrieval → Agent evaluates result
  4. If context insufficient → new search with different query
  5. Repeat until sufficient context
  6. Generate answer
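The loop above can be sketched in plain Python before bringing in any framework. The `search`, `is_sufficient`, and `generate` callables here are hypothetical stand-ins for a retriever, an LLM judge, and an LLM answerer:

```python
def agentic_rag(question, search, is_sufficient, generate, max_iters=5):
    """Iterative retrieve-evaluate loop: keep searching until the
    collected context is judged sufficient or the budget runs out."""
    context = []
    query = question
    for _ in range(max_iters):
        context.extend(search(query))               # step 3: retrieval
        verdict = is_sufficient(question, context)  # agent evaluates result
        if verdict["sufficient"]:                   # step 5: enough context
            break
        query = verdict["next_query"]               # step 4: reformulate
    return generate(question, context)              # step 6: final answer
```

The `max_iters` cap matters: without it, an agent that never judges its context sufficient loops forever.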

Implementation with LangGraph

import json
import operator
from typing import Annotated, TypedDict

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    retrieved_docs: list[str]
    search_count: int
    sufficient_context: bool

llm = ChatOpenAI(model="gpt-4o", temperature=0)
# `retriever` is assumed to be configured elsewhere,
# e.g. retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

MAX_SEARCHES = 4

def analyze_and_search(state: AgentState) -> AgentState:
    """Agent decides whether to keep searching and what to search for"""
    query = state["messages"][0].content
    retrieved_so_far = "\n".join(state["retrieved_docs"])

    decision_prompt = f"""You are a research agent. Your task is to find information for the answer.

Question: {query}

Information found so far:
{retrieved_so_far if retrieved_so_far else "Nothing found"}

Number of searches performed: {state["search_count"]}

Decide:
1. Is the found information sufficient for a complete answer? (YES/NO)
2. If NO: formulate the next search query (a specific aspect of the question)

Answer in JSON only: {{"sufficient": true/false, "next_query": "..."}}"""

    response = llm.invoke([HumanMessage(content=decision_prompt)])
    try:
        decision = json.loads(response.content)
    except json.JSONDecodeError:
        # Malformed LLM output: fall back to searching with the original question
        decision = {"sufficient": False, "next_query": query}

    if decision["sufficient"] or state["search_count"] >= MAX_SEARCHES:
        return {**state, "sufficient_context": True}

    # Perform the search and accumulate results
    new_docs = retriever.invoke(decision["next_query"])
    new_texts = [d.page_content for d in new_docs]

    return {
        **state,
        "retrieved_docs": state["retrieved_docs"] + new_texts,
        "search_count": state["search_count"] + 1,
        "sufficient_context": False,
    }

def generate_answer(state: AgentState) -> AgentState:
    """Generates final answer based on collected context"""
    context = "\n\n".join(state["retrieved_docs"])
    question = state["messages"][0].content

    answer = llm.invoke([
        HumanMessage(content=f"Context:\n{context}\n\nQuestion: {question}\n\nProvide a comprehensive answer:")
    ])

    return {**state, "messages": state["messages"] + [answer]}

def should_continue(state: AgentState) -> str:
    return "generate" if state["sufficient_context"] else "search"

# Build graph
graph = StateGraph(AgentState)
graph.add_node("search", analyze_and_search)
graph.add_node("generate", generate_answer)

graph.set_entry_point("search")
graph.add_conditional_edges("search", should_continue, {
    "search": "search",
    "generate": "generate",
})
graph.add_edge("generate", END)

agent = graph.compile()

Adaptive RAG: Routing by Query Complexity

Not all questions require an agentic approach. Adaptive RAG adds a classifier:

from enum import Enum

class RetrievalStrategy(Enum):
    DIRECT_ANSWER = "direct"   # Without search (LLM knows answer)
    SINGLE_SHOT = "single"     # Standard RAG
    ITERATIVE = "iterative"    # Agentic RAG
    GRAPH = "graph"            # Graph RAG

def classify_query(query: str) -> RetrievalStrategy:
    """Classifies the query to select a retrieval strategy"""
    response = llm.invoke(f"""Classify the question by search strategy:
- direct: common knowledge, doesn't require search
- single: one search will provide sufficient context
- iterative: multiple searches from different angles needed
- graph: question about relationships between entities

Question: {query}
Answer (one word only):""")
    label = response.content.strip().lower()
    try:
        return RetrievalStrategy(label)
    except ValueError:
        # Unexpected label from the LLM: fall back to standard RAG
        return RetrievalStrategy.SINGLE_SHOT

def adaptive_rag(query: str):
    strategy = classify_query(query)

    if strategy == RetrievalStrategy.DIRECT_ANSWER:
        return llm.invoke(query).content
    elif strategy == RetrievalStrategy.SINGLE_SHOT:
        return standard_rag(query)  # one-shot RAG pipeline, defined elsewhere
    elif strategy == RetrievalStrategy.ITERATIVE:
        return agent.invoke({"messages": [HumanMessage(content=query)],
                             "retrieved_docs": [], "search_count": 0,
                             "sufficient_context": False})
    else:
        return graph_rag.query(query)  # Graph RAG pipeline, defined elsewhere

Practical Case: Investment Analyst Assistant

Task: answering analytical questions on a corpus of financial reports from 200 companies.

Example questions:

  • "How did company X profitability change over 3 years?" → iterative (3 searches by year)
  • "Which companies in the sector have EBITDA margin above 25%?" → iterative (multiple searches + aggregation)
  • "What is P/E of company X?" → single shot

Results of Agentic vs Single-Shot RAG:

Question type        Single-shot completeness   Agentic completeness   Avg searches
Simple facts         0.91                       0.92                   1.1
Period comparison    0.48                       0.84                   2.3
Cross-company        0.31                       0.76                   3.1
Sector aggregation   0.22                       0.68                   3.8

Agentic RAG dramatically improves completeness on complex queries (0.31 → 0.76 for cross-company questions) at the cost of moderate latency overhead (×2.4 on average).
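The relative gains implied by the table above can be checked directly from the completeness scores:

```python
# (single_shot, agentic) completeness scores from the table above
results = {
    "Simple facts": (0.91, 0.92),
    "Period comparison": (0.48, 0.84),
    "Cross-company": (0.31, 0.76),
    "Sector aggregation": (0.22, 0.68),
}

def relative_gain(single: float, agentic: float) -> int:
    """Percentage improvement of agentic over single-shot completeness."""
    return round((agentic - single) / single * 100)

for qtype, (s, a) in results.items():
    print(f"{qtype}: +{relative_gain(s, a)}%")
```

As expected, the gain grows with question complexity, while simple facts see almost no difference.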

Guardrails: Limiting Iterations

from langgraph.checkpoint.memory import MemorySaver

MAX_ITERATIONS = 5
TIMEOUT_SECONDS = 30

# In the LangGraph configuration
agent = graph.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["search"],  # for human-in-the-loop review
)

# Emergency exit on iteration overflow: each loop traverses two graph steps
config = {"recursion_limit": MAX_ITERATIONS * 2}
result = agent.invoke(initial_state, config=config)
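The `TIMEOUT_SECONDS` constant is not enforced automatically. One generic way to apply a wall-clock budget, independent of LangGraph internals, is a thread-based wrapper (a sketch, not the only option):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

TIMEOUT_SECONDS = 30

def invoke_with_timeout(agent_invoke, state, timeout=TIMEOUT_SECONDS):
    """Run the agent in a worker thread and abandon the result
    if it does not finish within `timeout` seconds."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(agent_invoke, state)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            return {"error": f"agent exceeded {timeout}s budget"}
```

Note that the worker thread itself keeps running in the background until its current LLM or retriever call returns; for hard cancellation you would need process-level isolation.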

Timeline

  • Design agentic architecture: 1 week
  • Implement iterative retrieval: 1–2 weeks
  • Adaptive routing: 1 week
  • Testing and evaluation: 2 weeks
  • Total: 5–6 weeks