If you’re choosing between LangChain vs LlamaIndex — or wondering whether to skip frameworks entirely and write plain Python — you’re asking the right question at the right time. Most tutorials skip straight to “here’s how to use LangChain” without addressing the architectural cost you pay later: tight coupling to abstractions that weren’t designed for your use case, debugging nightmares when something breaks three layers deep, and upgrade pain every time the framework shifts its API.
I’ve shipped production systems using all three approaches. Here’s what actually matters when you’re deciding.
What Each Approach Is Actually Optimized For
Before comparing features, get clear on what problem each tool was designed to solve. Using the wrong tool because it has more GitHub stars is how you end up rewriting your core infrastructure six months in.
LangChain: The Everything Framework
LangChain started as a way to chain LLM calls together — hence the name. It’s grown into a sprawling ecosystem covering agents, memory, retrieval, tool calling, output parsing, and more. The selling point is breadth: there are pre-built integrations for hundreds of LLM providers, vector stores, document loaders, and tools.
The practical reality: LangChain is excellent for prototyping. You can wire together a RAG pipeline or a tool-calling agent in under 50 lines. It’s also genuinely useful when you need to swap providers — moving from OpenAI to Anthropic or from Pinecone to Chroma is often a one-line change.
Where it falls down is production debugging. When a chain fails, the error often surfaces in a wrapper class three abstractions away from your actual code. The abstraction layers that make things easy to build also make them hard to inspect. LangChain’s own documentation acknowledges this — their newer LCEL (LangChain Expression Language) was partly built to address composability and streaming issues in the original chain design.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Simple chain with LCEL — clean and readable
model = ChatAnthropic(model="claude-3-haiku-20240307")
prompt = ChatPromptTemplate.from_template("Summarise this in 3 bullets: {text}")
chain = prompt | model | StrOutputParser()

# Runs fine, but if this fails, good luck tracing it
result = chain.invoke({"text": your_document})
```
The `|` pipe syntax looks elegant but can make tracing failures harder. My rule: LangChain is worth it when you need its integrations. If you’re writing your own retrieval logic anyway, you’re paying the abstraction tax for nothing.
LlamaIndex: Built for Retrieval and Knowledge Systems
LlamaIndex (formerly GPT Index) has a narrower, better-defined scope: indexing, retrieval, and querying over your data. If your core product involves RAG — retrieval-augmented generation over documents, databases, or knowledge bases — LlamaIndex has genuinely thought harder about this problem than LangChain has.
The data connectors are more sophisticated. The query engine abstractions map more naturally to real retrieval patterns: hybrid search, recursive retrieval over document hierarchies, sub-question decomposition. If you’re building a product where users ask questions over large document corpora, LlamaIndex’s primitives are a closer fit.
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.anthropic import Anthropic

# Configure the LLM globally
Settings.llm = Anthropic(model="claude-3-sonnet-20240229")

# Load and index documents
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query engine handles chunking, embedding, retrieval, and synthesis
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What are the payment terms?")

# Response includes source nodes — useful for citations
print(response.response)
print([node.metadata for node in response.source_nodes])
```
That source node metadata is worth calling out — LlamaIndex surfaces it cleanly by default. LangChain can do this too but it takes more wiring.
LlamaIndex’s weakness is agents and complex multi-step workflows. It has agent functionality, but it’s not the primary design surface. If your product is more “autonomous agent that sometimes retrieves documents” than “knowledge base with a query interface,” LlamaIndex feels like you’re fighting the framework.
Plain Python: The One You Underestimate
Calling the OpenAI or Anthropic API directly, managing your own context, chunking your own documents, writing your own retry logic — this is genuinely the right choice more often than framework evangelists admit.
The major LLM providers now have clean, well-documented SDKs. Anthropic’s Python SDK is good. OpenAI’s is good. You can build a production RAG pipeline with nothing but `anthropic`, `openai`, `chromadb`, and the standard library — and you’ll understand every line of it.
```python
import anthropic
import chromadb
from chromadb.utils import embedding_functions

client = anthropic.Anthropic()
chroma = chromadb.Client()

# Use OpenAI embeddings via Chroma's helper (or swap for any provider)
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small"
)
collection = chroma.get_or_create_collection("docs", embedding_function=ef)

def retrieve(query: str, n: int = 5) -> list[str]:
    results = collection.query(query_texts=[query], n_results=n)
    return results["documents"][0]  # list of matching chunks

def answer(query: str) -> str:
    context = retrieve(query)
    context_block = "\n\n".join(context)
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context_block}\n\nQuestion: {query}"
        }]
    )
    return response.content[0].text

# ~50 lines, zero framework dependencies, fully debuggable
```
This costs roughly $0.0005 per query at current Haiku pricing for a typical context window. You’re not paying a framework abstraction tax on top of that.
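That per-query figure is simple arithmetic you can sanity-check yourself. A minimal sketch, with per-million-token prices and token counts as illustrative placeholders (verify current numbers on the vendor’s pricing page):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimated cost in dollars for a single request, given per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Illustrative: a 1,000-token context plus a 200-token answer,
# priced at $0.25 / $1.25 per million tokens, comes to $0.0005 per query
cost = estimate_cost(1000, 200, 0.25, 1.25)
```

Instrument this against `response.usage` in production and the estimate stops being a guess.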
The tradeoff is maintenance surface. Every integration you need — new LLM provider, new vector store, streaming support, tool calling — you build yourself. That’s fine for a focused product. It becomes a drag if you’re iterating fast across many providers or building a platform.
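To make that maintenance surface concrete: the retry logic mentioned earlier is small, but it’s yours to write and test. A minimal exponential-backoff sketch using only the standard library — the exception types you retry on depend on your SDK, so the commented usage is an assumption to check against its error hierarchy:

```python
import random
import time

def with_retries(fn, *, attempts=4, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(); on a retryable error, back off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            # 0.5s, 1s, 2s, ... plus jitter so parallel workers don't retry in sync
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage sketch (exception names are assumptions; check your SDK's docs):
# result = with_retries(
#     lambda: client.messages.create(...),
#     retry_on=(anthropic.RateLimitError, anthropic.APIConnectionError),
# )
```

Fifteen lines, but now error handling is a design decision you made, not one buried in a framework default.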
The Real Comparison: Complexity vs Control
Stop thinking about this as a features comparison and start thinking about it as a complexity budget allocation problem.
| Factor | LangChain | LlamaIndex | Plain Python |
|---|---|---|---|
| Time to first prototype | Very fast | Fast (for RAG) | Moderate |
| Debugging in production | Hard | Moderate | Easy |
| Provider portability | Excellent | Good | Manual |
| API stability | Poor (frequent breaks) | Moderate | Stable (vendor SDKs) |
| Community / integrations | Largest | Strong in RAG | DIY |
| Framework lock-in risk | High | Medium | None |
Framework Lock-in Is a Real Risk — Here’s Why
LangChain has broken its API significantly at least twice in the past two years: first moving from `langchain` to split packages (`langchain-core`, `langchain-community`, etc.), then introducing LCEL as the preferred composition pattern over the original chain classes. If you shipped a production system on LangChain 0.0.x, you’ve felt this.
LlamaIndex has been somewhat better but still moved fast in ways that broke existing code during its rebranding and v0.10 restructuring.
This matters because when your LLM product breaks at 2am, you want to be debugging your code, not trying to understand why a framework version bump changed the way memory is flushed. Pin your dependency versions religiously. Use a lockfile. Never upgrade a framework dependency the week of a launch.
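A lockfile protects your deploys, but drift can still creep in on a developer machine or a hand-patched server. One lightweight guard — a sketch, not a prescription, using only the standard library; the package names and versions in the comment are illustrative:

```python
from importlib import metadata

def check_pins(pins: dict[str, str], get_version=metadata.version) -> list[str]:
    """Return mismatch messages; an empty list means every pin holds."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (pinned {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: installed {installed}, pinned {expected}")
    return problems

# Fail fast at startup instead of at 2am (names/versions illustrative):
# assert not check_pins({"langchain-core": "0.2.38"}), "dependency drift detected"
```

Run it once at application startup and a version mismatch becomes a loud boot failure instead of a subtle behavior change.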
When to Use What: Decision Framework
Use LangChain When
- You’re prototyping and need to validate multiple provider combinations fast
- Your product requires many third-party integrations (Slack, Notion, databases, APIs) that LangChain already wraps
- You’re building an agent-heavy system and want to use LangGraph for stateful, cyclical workflows
- Your team already knows it and the switching cost exceeds the abstraction cost
Use LlamaIndex When
- Your core product value is retrieval quality over large, structured document corpora
- You need advanced RAG patterns: hybrid search, reranking, recursive retrieval, multi-document synthesis
- You want built-in observability of retrieval quality (source nodes, scores, metadata)
- You’re building something like a contract review tool, internal knowledge base, or document Q&A product
Use Plain Python When
- Your core logic is well-defined and unlikely to need multi-provider flexibility
- You’re a solo founder or small team who needs to understand and maintain every line
- You’re building on top of a single model provider (e.g., Claude-only, or GPT-4-only)
- You’ve hit a wall trying to customize framework behavior and you’re spending more time fighting the abstraction than building features
- You’re building a long-lived production system where stability beats development speed
A Hybrid Architecture That Actually Works in Production
The most pragmatic approach I’ve seen at the product layer: write your core business logic in plain Python, use LlamaIndex for document ingestion and retrieval if you need it, and only pull in LangChain if you need a specific integration it provides.
Treat frameworks as dependencies, not as foundations. Your architecture shouldn’t be “a LangChain app” — it should be “a Python app that uses LangChain for the Notion connector.” That framing keeps you in control.
```python
# Your service layer — plain Python, vendor SDK only
class DocumentQAService:
    def __init__(self, index, llm_client):
        self.index = index     # LlamaIndex retriever (e.g. index.as_retriever())
        self.llm = llm_client  # Anthropic client directly

    def answer(self, question: str, user_id: str) -> dict:
        # Retrieval handled by LlamaIndex
        retrieved = self.index.retrieve(question)
        context = "\n".join([r.text for r in retrieved])

        # Generation handled directly — no framework magic
        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            system="Answer only from the provided context. Cite sources.",
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }]
        )
        return {
            "answer": response.content[0].text,
            "sources": [r.metadata.get("filename") for r in retrieved],
            "usage": response.usage.output_tokens
        }
```
This pattern gives you LlamaIndex’s retrieval quality without coupling your generation logic to its abstractions. Swap out the index for a plain Chroma call whenever you want — the service layer doesn’t care.
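You can make that seam explicit with a structural type, so the service layer depends on a contract rather than a framework. A sketch using `typing.Protocol` — `Chunk` and `ChromaRetriever` are hypothetical names for adapters you’d write yourself:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Chunk:
    text: str
    metadata: dict

class Retriever(Protocol):
    """The only contract the service layer depends on."""
    def retrieve(self, question: str) -> list[Chunk]: ...

# Hypothetical adapter: any backend satisfying the Protocol plugs in unchanged
class ChromaRetriever:
    def __init__(self, collection, n_results: int = 5):
        self.collection = collection
        self.n_results = n_results

    def retrieve(self, question: str) -> list[Chunk]:
        results = self.collection.query(query_texts=[question], n_results=self.n_results)
        return [Chunk(text=doc, metadata=meta or {})
                for doc, meta in zip(results["documents"][0], results["metadatas"][0])]
```

A LlamaIndex-backed adapter satisfies the same Protocol, so migrating between the two is a one-class change rather than a rewrite.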
Bottom Line: Match the Tool to the Actual Problem
When the LangChain vs LlamaIndex question comes up in your architecture planning, use it as a forcing function to get clear on what your actual core loop is.
Solo founder building fast: Start plain Python. Add LlamaIndex if retrieval quality is your differentiator. Avoid LangChain until you actually need one of its integrations.
Small team with a retrieval-heavy product: LlamaIndex for the indexing layer, plain Python SDK calls for generation. Pin versions, write integration tests, document every assumption.
Larger team needing provider flexibility or complex agent workflows: LangChain with LangGraph is reasonable, but assign someone to own framework upgrades. The integration catalog is genuinely valuable if you’re connecting to many external systems.
Enterprise or long-lived system: Seriously consider plain Python as your foundation. The operational clarity pays dividends over a 2-3 year horizon in ways that early prototype speed doesn’t.
None of these are permanent decisions. Start with what gets you to a working product, build in seams between layers, and migrate when the pain of the current approach exceeds the pain of switching. That’s how production AI systems actually evolve — not by picking the perfect framework upfront, but by keeping your options open.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

