Amit Kothari
Amit Kothari CEO of Tallyfy, AI advisor at Blue Sheen

LangChain vs LlamaIndex vs building it yourself

In brief

AI frameworks promise to simplify development, but they often add more complexity than they remove. LangChain has 90M+ monthly downloads yet introduces major overhead, LlamaIndex excels at data connection, while direct API implementation provides clarity and control. Here is when each approach actually makes sense for your team.

Quick answers

Why does this matter? Frameworks add abstraction layers - LangChain and LlamaIndex introduce major overhead that makes debugging harder and customization more painful than building directly with APIs

What should you do? Simple use cases favor direct implementation - For basic AI applications, direct API calls give you better performance, lower complexity, and clearer code paths than framework abstractions

What is the biggest risk? Frameworks excel at specific problems - LlamaIndex shines for data indexing workflows, LangChain works well for multi-step reasoning with durable state, but neither is a universal solution

Where do most people go wrong? Maintenance burden grows over time - Breaking changes, dependency bloat, and framework evolution create ongoing costs that outweigh initial productivity gains for many teams

The question every team building AI applications hits eventually: LangChain, LlamaIndex, or just call the API directly?

Sounds technical. It isn’t. It’s a question about what kind of problems you want to spend the next six months debugging.

Pick wrong and you’ll spend those months fighting abstraction layers instead of shipping features. Building reliable AI agents requires understanding these tradeoffs early. This pattern plays out constantly. Teams start with a framework because it promises fast movement. Six months later, they’re reading LangChain source code at 11pm trying to understand why their agent keeps producing garbage output.

The stakes are real. Harrison Chase’s LangChain now has 90M+ monthly downloads and runs in production at Uber, JP Morgan, and BlackRock. LlamaIndex has grown into document agents, smart spreadsheet processing, and enterprise document pipelines. These aren’t toys.

But popular isn’t the same as right for your situation.

The abstraction trap

Frameworks sell you on the first 20 minutes. LangChain’s documentation shows a working chatbot in five lines of code. LlamaIndex promises to connect LLMs to your data with minimal setup. Both deliver on that promise, for the simple case.

The crack appears around week three.

Your requirements hit something the framework didn’t anticipate. Now you’re not writing application code. You’re reverse-engineering framework internals to change behavior that should be simple. This analysis of LangChain’s complexity described it plainly: the framework becomes a source of painful friction rather than productivity once requirements get complex. You end up understanding LangChain better than your own application.

Count the abstraction layers in LangChain: LLM calls, prompts, memory, chains, agents. That’s five layers between you and the model. LlamaIndex is narrower in scope, focused on data connection and retrieval. Still has layers. Still has quirks.

Turns out, developers who abandoned frameworks found something that surprised me: their simpler direct implementations outperformed the framework versions in both quality and reliability. Not marginally. Measurably.

The reason is almost embarrassingly simple. Every abstraction layer adds complexity. You debug the framework, not your application. You learn LangChain’s quirks instead of learning how LLMs actually work.

What these frameworks actually solve

I want to be fair here, because frameworks aren’t inherently bad. They solve real problems. Just not always the ones you think you have.

Jerry Liu’s LlamaIndex does one thing well: connecting LLMs to your data. Building a system that searches documents, creates embeddings, and retrieves context for AI responses? LlamaIndex handles this solidly. The high-level API lets you prototype fast. The indexing and retrieval modules are well-built.

They’ve also expanded aggressively. LlamaParse v2 overhauled document parsing with up to 50% cost reduction at comparable accuracy. They’ve added LlamaAgents for one-click document agent deployment, LlamaSheets for messy spreadsheet processing, and enterprise document pipelines.

Where LlamaIndex struggles is anything beyond data-focused workflows. Complex multi-step reasoning with arbitrary logic? You’ll hit walls fast. Fine-grained control over agent behavior? You’ll fight opinionated abstractions the whole way.

LangChain goes the opposite direction. Maximum flexibility through modular components: agents, tools, memory, custom chains. The architecture has matured. LangGraph 1.0 now provides durable state persistence, production-tested at Uber, LinkedIn, and Klarna. Server restarts mid-workflow? It picks up exactly where it left off.

Does that mean LangChain is the automatic choice for complex work? Not quite. The flexibility still comes with real baggage. Dependency bloat is a persistent complaint: installing LangChain pulls in dozens of packages. Performance analysis comparing frameworks to direct API calls found measurably higher latency for simple requests. The overhead isn’t theoretical. For complex workflows, frameworks can actually perform better due to built-in optimizations, so the right call depends heavily on what you’re building.

If you want help shaping the actual implementation, Blue Sheen runs engagements like this.

When you should just build it yourself

Most AI applications don’t need a framework. They need three things: an API client, prompt management, and error handling.

That’s it.

Building without frameworks means you can create functional AI agents in surprisingly little code. No abstractions. No magic. Just direct API calls you fully control and understand.

The benefits compound. You know exactly what every line does. Debugging means reading your code, not framework source. Changes take minutes instead of hours. Your team learns how LLMs actually work instead of learning framework quirks that become irrelevant when you switch tools.

Direct implementation works best when requirements are clear and relatively contained. Need a chatbot with conversation context? Straightforward with the OpenAI API. Want document search? RAG implementations without frameworks use ChromaDB and direct API calls effectively.

The effort difference is smaller than you’d expect. Developers switching from LangChain report their custom implementations took roughly the same development time as properly learning the framework. But ongoing maintenance was dramatically simpler.

Skip the framework if you’re building something straightforward. Use the API directly. Write clean functions. You’ll ship faster and understand more.

Hidden costs that show up after launch

Most teams don’t see the maintenance problem coming. That’s where frameworks really extract their price.

Breaking changes are brutal. LangChain had frequent breaking changes throughout its development as it evolved fast. Code that worked last month breaks after an update. You’re stuck: stay on old versions with security risks, or spend cycles adapting.

LangChain and LangGraph hit 1.0 in October 2025, coinciding with a major Series B led by IVP. They now promise no breaking changes until 2.0. That stability took years to arrive. Early adopters paid for it in constant refactoring.

The reliability numbers should give you pause. Error rates compound across a chain: 95% reliability per step yields only 36% success over 20 steps. Which is nuts, when you think about it. Production demands 99.9%+ reliability, yet even complex agent implementations struggle to hit that bar. Every abstraction layer introduces more places for things to break. Microsoft’s analysis of agentic complexity put it clearly: frameworks need careful consideration for cognitive load, security concerns, latency, and ongoing maintenance.

The observability story does favor frameworks. 89% of teams have implemented observability for their agents. LangSmith provides tracing, evaluation, and cost tracking out of the box. Building from scratch means building or integrating this yourself. Doable with tools like Langfuse, but it’s not free work.

The cancellation rate for agentic projects is striking: a large share are expected to be scrapped over the next few years as unanticipated complexity and cost catch up with them. Adding framework dependencies increases that risk. Direct API implementations integrate more cleanly into existing systems, which matters when you’re trying to unwind a decision that didn’t work out.

“The early versions were fragile, poorly documented, abstractions shifted frequently, and it felt too premature to use in prod.” — Clara Chong, AI engineer building multi-agent features, Towards Data Science

How to actually choose

Decision tree mapping project requirement from simple chatbot to durable agent to recommended framework

Start with complexity assessment. Simple chatbot or single-purpose tool? Build directly. Data-heavy retrieval system? Consider LlamaIndex. Multi-step reasoning with durable state requirements? LangGraph is strong here: LinkedIn, Uber, and Replit run it in production for complex stateful workflows. Quick prototype with role-based agents? CrewAI is built for fast role-based prototypes, though teams often hit walls when requirements outgrow its opinionated design. Anything requiring heavy customization? Build directly.

Will the best framework always win? No. Team skills matter more than most people acknowledge. A team comfortable with abstractions can make frameworks work well. A team that prefers understanding fundamentals will fight them constantly. Small teams moving fast often find direct implementation is actually faster once you account for the learning curve on both sides.

The framework space has also consolidated. Beyond LangChain and LlamaIndex, OpenAI’s Agents SDK takes a minimalist approach with no graphs or state machines, supporting Python and TypeScript. Microsoft merged AutoGen and Semantic Kernel into a unified Agent Framework that reached 1.0 general availability in April 2026 with built-in governance and multi-cloud support.

More options, not fewer decisions.

I probably lean too hard toward direct implementation for teams that need what frameworks provide. But for most mid-size companies starting out: build your first version with direct API calls. You’ll learn what you actually need. If you hit complexity that requires a framework, you’ll recognize it. And you’ll understand LLMs well enough to use the framework effectively instead of being confused by it.

Frameworks promise to handle complexity for you. They introduce their own complexity in the process.

Build what you need. Not what a framework wants you to build.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding. Read Amit's full bio →

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.

Related Posts

View All Posts »
Accessibility overlays do not work, and AI auditing is the opposite

Accessibility overlays do not work, and AI auditing is the opposite

An accessibility overlay is one line of JavaScript that promises ADA compliance while you do nothing. The FTC fined accessiBe a million dollars over that promise. Here is why a widget cannot fix a problem that lives in your code, and how real AI auditing does the reverse by finding the broken line so a person can change it.

Can AI actually do accessibility testing? I ran it on my own product

Can AI actually do accessibility testing? I ran it on my own product

Automated accessibility tools catch maybe a third of WCAG problems. I pointed Claude Code at Tallyfy, my own product, and let it run a real WCAG 2.2 audit with a live screen reader across four codebases. It found bugs that axe-core cannot see, and it showed clearly where the work still needs a person.

How to run a long autonomous Claude Code job without it drifting

How to run a long autonomous Claude Code job without it drifting

The hard part of a big AI job is not the work. It is making the agent run for many sessions without drifting or claiming it is done when it is not. I used an accessibility audit across four codebases as the test. The setup that kept Claude Code on track was a git ledger, atomic parallel claims, and two verification passes.

What a VPAT costs, and why the report is the cheap part

What a VPAT costs, and why the report is the cheap part

A VPAT is the report that states how accessible your product is, measured against WCAG. People ask what it costs and price the document, but the document is the cheap part. The real cost is re-auditing every release, and that is the number an AI agent actually moves. Here is the ADA, WCAG, Section 508 and EN 301 549 stack underneath it.

What axe-core misses, and how AI caught it with a real screen reader

What axe-core misses, and how AI caught it with a real screen reader

Axe-core catches about a third of WCAG failures and skips anything that needs judgment. Here are the thirteen criteria a scanner cannot decide, how an AI agent drives a real VoiceOver session to cover them, and the save button that passed every automated check and was silent to a blind user.

Your AI context layer is only half a brain

Your AI context layer is only half a brain

An AI context layer feeds every model one governed source of company truth, and DataHub and Atlan will sell you that read half today. The half that notices when a person did not get what they wanted, the re-ask nobody logged, is what turns a knowledge store into a brain.

AI advisory services via Blue Sheen.
Contact me Follow 10k+