LangChain Integration for AI Pipelines
LangChain is a framework for building LLM applications that provides abstractions over models, documents, tools, and memory. Its core concept is the chain: a composition of components into a reproducible processing pipeline. LCEL (LangChain Expression Language) is the declarative interface that gives any combination of components a uniform composition syntax.
LCEL: Foundation of Modern LangChain
LCEL uses the | operator for component composition. Any object implementing Runnable can be connected in a chain:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Simple chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in {domain}."),
    ("human", "{question}"),
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"domain": "financial analysis", "question": "What is EBITDA?"})
# Parallel chain
parallel_chain = RunnableParallel({
    "summary": prompt | llm | StrOutputParser(),
    "keywords": ChatPromptTemplate.from_template("Extract keywords: {question}") | llm | StrOutputParser(),
})
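The `|` syntax works because every Runnable implements `__or__`, which wraps both sides into a sequence. A minimal pure-Python sketch of the idea (an illustration only, not LangChain's actual Runnable class — the `Mini*` names are ours):

```python
# Sketch of LCEL-style composition: each stage wraps a callable,
# and `|` chains stages into a pipeline that invokes left-to-right.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # Compose: the output of self becomes the input of other
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

mini_prompt = MiniRunnable(lambda d: f"Explain {d['question']} to an expert in {d['domain']}")
mini_llm = MiniRunnable(lambda p: p.upper())  # stands in for a model call
mini_parser = MiniRunnable(lambda s: s.strip())

mini_chain = mini_prompt | mini_llm | mini_parser
out = mini_chain.invoke({"domain": "finance", "question": "EBITDA"})
```

The same shape explains why any Runnable — prompt, model, parser, retriever, or plain function wrapped in a Runnable — can slot into any position of a chain.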
Integration with LLM Providers
LangChain supports a unified interface for different providers:
# OpenAI
from langchain_openai import ChatOpenAI
llm_openai = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
# Anthropic
from langchain_anthropic import ChatAnthropic
llm_claude = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Google
from langchain_google_genai import ChatGoogleGenerativeAI
llm_gemini = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
# Local Ollama
from langchain_ollama import ChatOllama
llm_local = ChatOllama(model="llama3.2:3b", temperature=0)
# Hugging Face
from langchain_huggingface import HuggingFaceEndpoint
llm_hf = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.3")
Provider changes don't require modifying chain logic — only swap the llm object.
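One way to make that swap configuration-driven is a small factory keyed by an environment variable. This is a hypothetical sketch (`build_llm` and the mapping are ours, not a LangChain API); imports are deferred so only the selected provider's package needs to be installed:

```python
import os

def build_llm(provider=None):
    """Return a chat model chosen by the LLM_PROVIDER environment
    variable. Imports are lazy: only the selected provider's
    integration package is required at runtime."""
    provider = provider or os.environ.get("LLM_PROVIDER", "openai")
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model="claude-3-5-sonnet-20241022")
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model="llama3.2:3b", temperature=0)
    raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")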
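One way to make that swap configuration-driven is a small factory keyed by an environment variable. This is a hypothetical sketch (`build_llm` and its mapping are ours, not a LangChain API); imports are deferred so only the selected provider's package needs to be installed:

```python
import os

def build_llm(provider=None):
    """Return a chat model chosen by the LLM_PROVIDER environment
    variable. Imports are lazy: only the selected provider's
    integration package is required at runtime."""
    provider = provider or os.environ.get("LLM_PROVIDER", "openai")
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model="claude-3-5-sonnet-20241022")
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model="llama3.2:3b", temperature=0)
    raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
```

Because every branch returns an object with the same chat-model interface, the rest of the chain code is untouched when the variable changes.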
RAG Pipeline with LangChain
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_core.runnables import RunnablePassthrough
# Load and chunk documents
loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(docs)
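The size/overlap trade-off is easier to see with a bare sliding-window version of chunking. This is a deliberate simplification (RecursiveCharacterTextSplitter actually splits on separators like paragraphs and sentences before falling back to characters):

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Naive character-window chunker: each chunk repeats the last
    `chunk_overlap` characters of the previous one, so a sentence cut
    at a boundary still appears whole in at least one chunk."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "A" * 800 + "B" * 800
chunks = sliding_chunks(text, chunk_size=800, chunk_overlap=100)
```

With chunk_size=800 and chunk_overlap=100, consecutive windows advance by 700 characters, so roughly 12% of each chunk is duplicated context — the price paid for boundary safety.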
# Indexing
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = QdrantVectorStore.from_documents(
    chunks,
    embedding=embeddings,
    url="http://localhost:6333",
    collection_name="knowledge_base",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# RAG chain
rag_prompt = ChatPromptTemplate.from_template("""Answer the question based on context.
Context:
{context}
Question: {question}
If the answer is not in the context — state that explicitly.""")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
answer = rag_chain.invoke("What are the conditions for contract termination?")
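The first stage of rag_chain deserves unpacking: a dict of Runnables fans a single input out into named fields, each computed from the same original input, producing exactly the variables the prompt expects. A pure-Python sketch of that mapping step (an illustration of the behavior, not LangChain internals; the document names and helper are hypothetical):

```python
def run_parallel_map(stages, question):
    # Each stage receives the SAME original input; outputs are
    # collected under the dict's keys, ready for prompt formatting.
    return {name: fn(question) for name, fn in stages.items()}

fake_retrieved = [
    "Clause 12: either party may terminate with written notice.",
    "Clause 13: the notice period is 30 days.",
]
stage_input = run_parallel_map(
    {
        "context": lambda q: "\n\n".join(fake_retrieved),  # plays the role of retriever | format_docs
        "question": lambda q: q,                           # plays the role of RunnablePassthrough()
    },
    "What are the conditions for contract termination?",
)
```

The resulting dict is what rag_prompt receives, which is why its template can reference both {context} and {question}.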
Memory Management in Conversations
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import RedisChatMessageHistory
# Chat history in Redis
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    return RedisChatMessageHistory(session_id, url="redis://localhost:6379")
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a technical support assistant."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
chain_with_history = RunnableWithMessageHistory(
    chat_prompt | llm | StrOutputParser(),
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# Usage
config = {"configurable": {"session_id": "user_123"}}
chain_with_history.invoke({"input": "My app won't start"}, config=config)
chain_with_history.invoke({"input": "Error: 'connection refused'"}, config=config)
Structured Output and Validation
from pydantic import BaseModel, Field
from typing import Literal
class TicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "other"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str = Field(description="Brief problem description (1-2 sentences)")
    requires_human: bool
structured_llm = llm.with_structured_output(TicketClassification)
classify_prompt = ChatPromptTemplate.from_messages([
    ("system", "Classify the support ticket."),
    ("human", "{ticket_text}"),
])
classifier_chain = classify_prompt | structured_llm
result: TicketClassification = classifier_chain.invoke({
    "ticket_text": "I can't login, password doesn't work for two days already"
})
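with_structured_output relies on provider-side function calling or JSON mode. For providers without either, a common fallback is prompting for JSON and validating the parsed dict yourself. A stdlib-only sketch of that validation step (in practice Pydantic's model_validate does this for you; the helper name and sample reply here are ours):

```python
import json

CATEGORIES = ("billing", "technical", "account", "other")
PRIORITIES = ("low", "medium", "high", "critical")

def parse_ticket(raw):
    """Parse an LLM's JSON reply and enforce the ticket schema by hand."""
    data = json.loads(raw)
    if data["category"] not in CATEGORIES:
        raise ValueError(f"bad category: {data['category']}")
    if data["priority"] not in PRIORITIES:
        raise ValueError(f"bad priority: {data['priority']}")
    if not isinstance(data["requires_human"], bool):
        raise ValueError("requires_human must be a boolean")
    return data

raw_reply = '{"category": "account", "priority": "high", "summary": "Login fails for two days.", "requires_human": true}'
ticket = parse_ticket(raw_reply)
```

Either way, validation failures surface as exceptions at the chain boundary instead of malformed data propagating downstream.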
LangSmith: Tracing and Debugging
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."
os.environ["LANGCHAIN_PROJECT"] = "production-rag"
# All calls are automatically logged in LangSmith
# Available: input/output data, latency, tokens, cost, traces
LangSmith enables analyzing prompts, finding bottlenecks in chains, and comparing pipeline versions through dataset evaluation.
Practical Case Study: Unifying 5 LLM Integrations
Situation: Product team maintained 5 separate integrations (OpenAI, Claude, corporate YandexGPT, local Llama, Gemini) with duplicated retry logic, prompt formatting, and error handling.
Solution: Refactored to LangChain LCEL with unified interface.
Architecture:
- Configurable provider via the LLM_PROVIDER environment variable
- Common prompt templates in YAML files
- Single error-handling layer via .with_fallbacks()
primary_llm = ChatOpenAI(model="gpt-4o")
fallback_llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
robust_llm = primary_llm.with_fallbacks([fallback_llm])
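Under the hood, with_fallbacks amounts to "try the primary; on exception, retry the same input against the next model." A pure-Python sketch of those semantics (an illustration only — LangChain's real implementation also lets you restrict which exception types trigger the fallback):

```python
def invoke_with_fallbacks(models, prompt):
    """Try each model in order; return the first successful result,
    or re-raise the last failure if every model errors out."""
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as err:  # LangChain allows narrowing the handled types
            last_error = err
    raise last_error

def flaky_primary(prompt):
    raise TimeoutError("primary provider timed out")

def fallback_model(prompt):
    return f"fallback answered: {prompt}"

result = invoke_with_fallbacks([flaky_primary, fallback_model], "What is EBITDA?")
```

This is why the uptime figure below improves without touching chain logic: the failover lives in the model object, not in every call site.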
Results:
- Integration code volume: -67%
- Time to add new provider: 3 days → 4 hours
- Pipeline uptime (via fallback): 99.1% → 99.8%
- LangSmith visibility: incident debugging time reduced from 2h to 20min
When LangChain Is Overkill
LangChain adds a layer of abstraction that is justified only in complex pipelines. For simple one-shot LLM calls, using the provider SDK directly (OpenAI, Anthropic) is simpler and more predictable. LangChain pays off when a pipeline combines multiple components (retriever + LLM + parser), has to support multiple providers, or needs tracing and memory.
Timeline
- Basic LangChain integration + 1 provider: 2–4 days
- RAG pipeline with vector DB: 1–2 weeks
- Conversational agent with memory: 1–2 weeks
- Refactor existing code to LCEL: 1–3 weeks