LangChain Framework Explained: LCEL, LangGraph & RAG
A complete tour of LangChain's five packages - LCEL, LangGraph, retrievers, tools, structured output - assembled into a working RAG application.
Watch (21:39)
Full transcript (from the video)
Almost every developer who has used a language model API has had the same realization. The first call is trivial. You send a request, get a response, and move on. The second call is where it gets hard. Suddenly there is conversation history to remember.
Documents to retrieve from somewhere. Tools to call, with the results fed back into the prompt. Errors to handle. Tokens to stream to the user. And observability to set up, so you know what happened when something goes wrong.
That is the gap LangChain has spent the last few years filling. Today, LangChain is the orchestration layer that sits between your code and the model. It handles all the connective tissue, and it has matured into a family of five packages that snap together cleanly. The rest of this video is about those five packages, in order. Before we look at how to write LangChain code, you need a map of the ecosystem, because the names overlap and the recent stable release shuffled things. There are five packages worth knowing.
LangChain itself is the core. It contains the model wrappers, the prompt templates, the tool primitives, and the new high-level create_agent function. LangGraph is the state machine engine that powers anything stateful, branching, or long-running. LangSmith is the observability platform, with tracing, datasets, and evaluators. LangServe used to be how you deployed a chain as an HTTP API, but it is now in maintenance mode and the official path is the LangGraph Platform.
The fifth piece is the family of provider integration packages, like langchain-openai and langchain-anthropic, which were split out of the core so you only install what you need. Install command on screen. That is your starting line. The first time you read modern LangChain code, you will see the pipe operator everywhere.
That is LCEL, the LangChain Expression Language. The idea is simple. Anything in LangChain that does work, whether it is a prompt template, a model call, a parser, or a plain function, implements a common interface called Runnable. And any two runnables can be composed by piping them together with the same vertical bar that Unix uses.
So a typical chain reads as prompt pipe model pipe parser. Why bother? Because the moment you wrap your code in a runnable, you get four things you would otherwise have to write yourself: synchronous invoke, asynchronous invoke, batched invoke, and streaming. You also get automatic tracing through LangSmith.
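To make the pipe idea concrete, here is a minimal sketch in plain Python. It is not LangChain's Runnable class: `Step`, `fake_model`, and the lambdas are hypothetical stand-ins, and the only point is to show how overloading the vertical bar lets you compose steps and get `invoke` and `batch` from one wrapper.

```python
from typing import Any, Callable, Iterable, List


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        # Run this step on a single input.
        return self.fn(value)

    def batch(self, values: Iterable[Any]) -> List[Any]:
        # Run this step on many inputs, one after another.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: the output of self feeds other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical steps standing in for a prompt, a model, and a parser.
prompt = Step(lambda topic: f"Tell me a fact about {topic}.")
fake_model = Step(lambda text: {"content": text.upper()})
parser = Step(lambda message: message["content"])

chain = prompt | fake_model | parser
result = chain.invoke("otters")
results = chain.batch(["otters", "owls"])
```

Because the composed chain is itself a `Step`, it can be piped into further steps, which is the property that makes the real interface feel uniform.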
The pipe is what makes the whole framework feel coherent, and it is also why a lot of older LangChain code, with its specialized chain classes, is now considered legacy. Once you understand the pipe, the next thing to learn is the small kit of helpers that lets you build any data flow on top of it. There are three you will use constantly. RunnableLambda wraps any Python function into the pipeline, so anything you can write in Python becomes a step in your chain.
RunnableParallel fans out the input to multiple branches at the same time and collects them back into a dictionary, which is how you do summary plus title plus tags in one call. RunnablePassthrough forwards the input unchanged and is usually combined with its assign helper to add new fields to a dictionary as it flows through. Between these three primitives plus the pipe, you can express almost any orchestration without writing a single bespoke chain class. The example on screen runs three branches in parallel and returns all three under a single dictionary. The basic building blocks are models, embeddings, and prompts.
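Before moving on to the building blocks, the fan-out and assign helpers described above can be sketched in plain Python. This is an illustration only, not LangChain's implementation: `parallel` and `assign` are hypothetical names, the branches are toy lambdas rather than model calls, and the branches run one after another here rather than concurrently.

```python
from typing import Any, Callable, Dict


def parallel(**branches: Callable[[Any], Any]) -> Callable[[Any], Dict[str, Any]]:
    """Send the same input to every branch and collect results by name."""
    def run(value: Any) -> Dict[str, Any]:
        return {name: fn(value) for name, fn in branches.items()}
    return run


def assign(**new_fields: Callable[[Dict[str, Any]], Any]):
    """Pass the input dict through unchanged and add computed fields."""
    def run(data: Dict[str, Any]) -> Dict[str, Any]:
        added = {name: fn(data) for name, fn in new_fields.items()}
        return {**data, **added}
    return run


text = "LangChain composes runnables. The pipe joins them."

# Three toy branches: summary, title, and tags from the same input.
analyze = parallel(
    summary=lambda t: t.split(".")[0] + ".",
    title=lambda t: t.split()[0],
    tags=lambda t: sorted({w.strip(".").lower() for w in t.split() if len(w) > 8}),
)
report = analyze(text)

# Add a derived field while keeping everything already in the dict.
enrich = assign(word_count=lambda d: len(d["summary"].split()))
enriched = enrich(report)
```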
Models live in the provider packages and have a consistent interface no matter who built them. ChatOpenAI from langchain-openai, ChatAnthropic from langchain-anthropic: they all expose the same invoke, stream, and batch methods. Embeddings turn text into a numeric vector, which is what makes retrieval possible later. The standard one is OpenAIEmbeddings, but every provider has its own. Prompt templates are how you build the actual messages sent to the model. ChatPromptTemplate.from_messages takes a list of role and content pairs, with placeholders in curly braces for variables. The interesting helper is MessagesPlaceholder, which leaves a slot in the prompt for a list of past messages, which is how you plug in chat history without having to format it yourself. These three together are the smallest useful chain. A common trap with language models is that they return text, but you almost always want structured data.
LangChain has a long history of output parsers for this. StrOutputParser is the trivial one, which just returns the model's text. The historical favorite was PydanticOutputParser, which asks the model to emit JSON matching a Pydantic schema and then parses it. The modern shortcut, and the one you should reach for in new code, is the with_structured_output method on any chat model. You hand it a Pydantic class describing the shape you want, and you get back a real Python object, fully typed and validated.
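What "typed and validated" buys you can be shown without any framework. The sketch below is an assumption-laden stand-in: it uses a standard-library dataclass instead of Pydantic, a hard-coded JSON string instead of a model response, and hypothetical names (`EmailTriage`, `parse_triage`, `ALLOWED_URGENCY`).

```python
import json
from dataclasses import dataclass


@dataclass
class EmailTriage:
    """Hypothetical schema: the shape we want instead of free text."""
    urgency: str
    reason: str


ALLOWED_URGENCY = {"low", "medium", "high"}


def parse_triage(raw: str) -> EmailTriage:
    """Parse JSON text into a typed object and reject unexpected values."""
    data = json.loads(raw)
    triage = EmailTriage(urgency=str(data["urgency"]), reason=str(data["reason"]))
    if triage.urgency not in ALLOWED_URGENCY:
        raise ValueError(f"unexpected urgency: {triage.urgency!r}")
    return triage


# Stand-in for what a model might return when asked for JSON.
fake_model_output = '{"urgency": "high", "reason": "Customer reports a billing outage."}'
triage = parse_triage(fake_model_output)
```

Downstream code can then read `triage.urgency` directly instead of searching a string, and a malformed response fails loudly at the boundary.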
The method handles the prompting and the parsing internally, and validation errors surface as exceptions you can retry on. The example on screen turns an email into a Triage object with urgency and reason fields in three lines. No parsing code required. Tools are how the model reaches outside its own context to do things. In LangChain, you mark any Python function as a tool by adding the @tool decorator above it.
The decorator reads the function signature and docstring and uses them to build a JSON schema that the model will see. Then you call model.bind_tools and pass a list of those tools, and the model can now decide when to call them and what arguments to pass. Tool calls come back as structured tool call objects with the name of the tool and the arguments, and your code dispatches them. This is the same primitive that every modern agent in LangChain stands on, and it is also what create_agent uses internally.
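The two halves of that story, deriving a schema from a function and dispatching a tool call, can be sketched with the standard library's `inspect` module. This is not the @tool decorator's actual implementation: `describe_tool`, `get_order_total`, and the hand-written `tool_call` dictionary are hypothetical, and the type mapping covers only a few primitives.

```python
import inspect
from typing import Any, Callable, Dict

# Minimal mapping from Python annotations to JSON schema type names.
PYTHON_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-schema-like description from a signature and docstring."""
    params = inspect.signature(fn).parameters
    properties = {
        name: {"type": PYTHON_TO_JSON.get(p.annotation, "string")}
        for name, p in params.items()
    }
    # Parameters without a default value are required.
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_order_total(order_id: str, include_tax: bool = True) -> float:
    """Return the total for an order, optionally including tax."""
    subtotal = 100.0  # hypothetical fixed subtotal for the sketch
    return subtotal * 1.2 if include_tax else subtotal


TOOLS = {get_order_total.__name__: get_order_total}
schema = describe_tool(get_order_total)

# Stand-in for a tool call chosen by the model: a name plus arguments.
tool_call = {"name": "get_order_total", "args": {"order_id": "A-17", "include_tax": False}}
result = TOOLS[tool_call["name"]](**tool_call["args"])
```

Looking the tool up by name in a fixed dictionary also acts as a simple allowlist: the model can only trigger functions you registered.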
If you understand the @tool decorator and bind_tools, you understand half of LangChain. Before you can search over a corpus of documents, you have to ingest them. LangChain has dozens of document loaders for this, one per source. PyPDFLoader for PDFs, WebBaseLoader for web pages, UnstructuredFileLoader for almost anything else, plus loaders for Google Drive, Slack, GitHub, and most major SaaS tools. The loader returns a list of Document objects, each with text and metadata.
The next step is splitting those documents into chunks small enough that an embedding model can handle them. The workhorse splitter is RecursiveCharacterTextSplitter, which tries to break on paragraphs, then sentences, then characters, until each chunk is under your size limit. The two knobs that actually matter are chunk_size and chunk_overlap. Smaller chunks mean more precise retrieval, but more rows in your vector store. Overlap helps avoid losing context at chunk boundaries.
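The recursive idea, try the coarsest separator first and only fall back to finer ones for pieces that are still too long, can be sketched in a few lines. This is a simplified illustration, not the library's splitter: `split_text`, `with_overlap`, and the separator list are assumptions, and the overlap here is a naive prefix copied from the previous chunk, so overlapped chunks can exceed the size limit.

```python
from typing import List

# Coarse to fine: paragraphs, sentences, words, then single characters.
SEPARATORS = ["\n\n", ". ", " ", ""]


def split_text(text: str, chunk_size: int, separators: List[str] = SEPARATORS) -> List[str]:
    """Split on the coarsest separator, recursing only into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(split_text(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def with_overlap(chunks: List[str], overlap: int) -> List[str]:
    """Naive overlap: prefix each chunk with the tail of the previous one."""
    out = []
    for i, chunk in enumerate(chunks):
        prefix = chunks[i - 1][-overlap:] if i > 0 and overlap > 0 else ""
        out.append(prefix + chunk)
    return out


sample = "Alpha beta gamma. Delta epsilon zeta.\n\nEta theta iota kappa lambda mu."
chunks = split_text(sample, chunk_size=20)
overlapped = with_overlap(chunks, overlap=5)
```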
Start with 1,000 and 150. Once you have chunks, you push their embeddings into a vector store. The vector store holds three things: the embedding vector itself, the original chunk text, and any metadata you attached, like the source file name or page number. There are many backends.
Chroma for local prototyping. pgvector when you already have Postgres. Pinecone or Weaviate for managed scale. FAISS when you need a portable in-process index. They all expose the same interface.
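To see why a vector store holds those three things together, here is a toy in-memory version. It is a sketch under loud assumptions: `embed` is a bag-of-words counter rather than a real embedding model, and `ToyVectorStore`, `Record`, and the sample documents are hypothetical and unrelated to any backend named above.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would call an embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Record:
    vector: Counter            # the embedding
    text: str                  # the original chunk text
    metadata: Dict[str, str]   # anything attached, such as the source file


@dataclass
class ToyVectorStore:
    records: List[Record] = field(default_factory=list)

    def add(self, text: str, metadata: Dict[str, str]) -> None:
        self.records.append(Record(embed(text), text, metadata))

    def search(self, query: str, k: int = 1) -> List[Record]:
        # Rank every record by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r.vector), reverse=True)
        return ranked[:k]


store = ToyVectorStore()
store.add("Refunds are processed within five business days.", {"source": "refunds.md"})
store.add("Shipping is free on orders over fifty dollars.", {"source": "shipping.md"})
top = store.search("How long do refunds take?", k=1)[0]
```

The search returns the text and metadata, not the vector, because those are what you paste into a prompt and cite back to the user.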
So you can swap one for another with a single line change. The piece that does the actual searching is the retriever, which you get by calling as_retriever on any vector store. The retriever has its own family of decorators that boost recall. MultiQueryRetriever rephrases the question multiple ways. ContextualCompressionRetriever filters out irrelevant chunks.
EnsembleRetriever combines vector search with classic keyword search. Pick the simplest one that works. Now we can put it all together. RAG, retrieval-augmented generation, is the single most asked-about pattern in LangChain interviews, and it is also the pattern that most production systems start from. The full assembly is loader to chunks, chunks to embeddings, embeddings into a vector store, store wrapped as a retriever, retriever results plugged into a prompt that says answer using only the following context, then the model, then a parser.
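The back half of that assembly, retriever into prompt into model into parser, can be sketched end to end with stubs. Everything here is a hypothetical stand-in: `retrieve` ranks by shared words instead of vector similarity, and `fake_model` echoes the retrieved context instead of calling an LLM, so the sketch shows only how the pieces connect.

```python
from typing import Dict, List

# Pretend these chunks came out of a loader and a splitter.
CHUNKS = [
    "The office is closed on public holidays.",
    "Expense reports are due on the fifth of each month.",
]


def retrieve(question: str, k: int = 1) -> List[str]:
    """Toy retriever: rank chunks by shared words (stand-in for vector search)."""
    q_words = set(question.lower().strip("?").split())

    def score(chunk: str) -> int:
        return len(q_words & set(chunk.lower().strip(".").split()))

    return sorted(CHUNKS, key=score, reverse=True)[:k]


def build_prompt(inputs: Dict[str, str]) -> str:
    """Fill context and question into the grounding prompt as separate variables."""
    return (
        "Answer using only the following context.\n"
        f"Context:\n{inputs['context']}\n"
        f"Question: {inputs['question']}"
    )


def fake_model(prompt: str) -> Dict[str, str]:
    """Hypothetical model stub: returns the first context line as its answer."""
    context_line = prompt.split("Context:\n")[1].split("\n")[0]
    return {"content": context_line}


def parse(message: Dict[str, str]) -> str:
    """Equivalent of a string output parser: pull out the text."""
    return message["content"]


def rag_chain(question: str) -> str:
    inputs = {"context": "\n".join(retrieve(question)), "question": question}
    return parse(fake_model(build_prompt(inputs)))


answer = rag_chain("When are expense reports due?")
```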
In LCEL, the whole thing is a few lines. The example on screen uses a parallel runnable to send the retrieved context and the original question into the prompt as separate variables, then pipes through the model, then through a string output parser. That is a working RAG system. Everything else, re-ranking, query rewriting, evaluation, multi-step retrieval, is a refinement of these five steps. For a long time, the way you built an agent in LangChain was a class called AgentExecutor. You handed it a model and a list of tools, and it ran a hidden loop: calling the model, calling the tool the model picked, feeding the result back, and stopping when the model said it was done.
It worked, but the loop was opaque. The state was hidden inside the executor. Branching was impossible. There was no way to pause for human approval, and conversations could not be persisted across processes. AgentExecutor has been replaced: the modern story is LangGraph, which models the agent as an explicit state graph that you can see, debug, persist, and pause.
For most cases you do not write the graph by hand. You call the new create_agent function in the core LangChain package, which builds a sensible default graph for you with tools, memory, and a reasonable stop condition. LangGraph is built on three concepts: state, nodes, and edges. State is just a typed dictionary, usually a TypedDict, that holds everything your agent cares about, including the current question, the conversation history, the running scratchpad, and the tools called so far. A node is a Python function that takes the state, does some work, and returns a partial state update. The graph merges that update into the running state.
Edges connect nodes. Static edges are fixed transitions, like from B to C. Conditional edges, added with add_conditional_edges, run a router function that returns the name of the next node, which is how you do branching. You start by going from the START node to your first real node, and you finish by transitioning to END. Once you compile the graph, the result is a runnable with the same invoke, stream, and batch methods you already know.
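The state, node, and edge vocabulary maps onto a very small loop. The following is a toy interpreter written for illustration, not LangGraph's engine: `run_graph`, `NODES`, `ROUTERS`, and the `draft` and `review` nodes are hypothetical, and every edge is modeled as a router function for brevity.

```python
from typing import Callable, Dict, TypedDict


class State(TypedDict, total=False):
    """Everything the toy agent tracks between steps."""
    question: str
    draft: str
    attempts: int
    answer: str


END = "__end__"  # sentinel name for the terminal transition


def draft(state: State) -> State:
    # A node returns only the fields it changed (a partial update).
    return {"draft": state["question"].upper(), "attempts": state.get("attempts", 0) + 1}


def review(state: State) -> State:
    return {"answer": state["draft"]}


def route_after_draft(state: State) -> str:
    # Conditional edge: loop back to draft until two attempts were made.
    return "review" if state["attempts"] >= 2 else "draft"


NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}
ROUTERS: Dict[str, Callable[[State], str]] = {
    "draft": route_after_draft,
    "review": lambda state: END,  # static edge: review always finishes
}


def run_graph(state: State, start: str = "draft") -> State:
    current = start
    while current != END:
        update = NODES[current](state)
        state = {**state, **update}  # merge the partial update into the state
        current = ROUTERS[current](state)
    return state


final = run_graph({"question": "what is lcel?"})
```

Because the loop and the routing table are explicit data, you can log every transition, which is the visibility the hidden executor loop lacked.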
The killer feature of LangGraph, and the reason it is now the agent default, is the interrupt mechanism. There are two flavors. The compile-time form, called interrupt_before, takes a list of node names. When the graph is about to enter one of those nodes, it pauses, persists its state, and returns control to your application. Your code can then inspect the pending action, show it to a human, and only resume execution by calling invoke again with None.
The runtime form, just called interrupt, lets any node stop in the middle and wait for a value to be supplied. Both rely on the same idea: persist the state, return control, resume on demand. This single mechanism turns LangGraph from a framework into an operations tool. You can require human approval before any tool that touches money, sends an email, or modifies a database, with two lines of code.
LangGraph also gives you persistence almost for free through a concept called the checkpointer. A checkpointer is a small adapter that writes the graph's state after every step, keyed by a thread_id you pass in the run config. Think of the thread ID as a session ID, one per user conversation. The simplest checkpointer is MemorySaver, which is a Python dictionary, perfect for prototypes. For production, swap in PostgresSaver, which writes to a Postgres table.
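Pausing before a node and resuming from a checkpoint rest on the same bookkeeping: save the state and the position, keyed by thread. The sketch below illustrates that idea only. It is not LangGraph's checkpointer: `CHECKPOINTS`, `run`, `STEPS`, and the three nodes are hypothetical, the graph is a fixed linear sequence, and passing no state to resume loosely mirrors the invoke-with-None convention.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy in-memory checkpointer: thread_id -> saved state and next step index.
CHECKPOINTS: Dict[str, dict] = {}


def plan(state: dict) -> dict:
    return {"plan": f"email {state['to']}"}


def send_email(state: dict) -> dict:
    return {"sent": True}  # stand-in for a real side effect


def log(state: dict) -> dict:
    return {"log": f"sent={state['sent']}"}


STEPS: List[Tuple[str, Callable[[dict], dict]]] = [
    ("plan", plan), ("send_email", send_email), ("log", log),
]
INTERRUPT_BEFORE = {"send_email"}  # nodes that need approval first


def run(thread_id: str, state: Optional[dict] = None) -> dict:
    """Start a thread (state given) or resume it (state is None)."""
    if state is None:
        saved = CHECKPOINTS[thread_id]
        state, index, resuming = dict(saved["state"]), saved["next"], True
    else:
        state, index, resuming = dict(state), 0, False
    while index < len(STEPS):
        name, node = STEPS[index]
        if name in INTERRUPT_BEFORE and not resuming:
            # Persist, then hand control back to the caller for approval.
            CHECKPOINTS[thread_id] = {"state": state, "next": index}
            return {**state, "paused_before": name}
        resuming = False
        state = {**state, **node(state)}
        index += 1
        CHECKPOINTS[thread_id] = {"state": state, "next": index}  # save every step
    return state


paused = run("thread-1", {"to": "a@example.com"})
final = run("thread-1")  # resume after a human approved the pending action
```

Swapping the dictionary for a database table is what would let the resume call happen in a different process, an hour later.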
With a checkpointer in place, the same conversation can pause for an hour, resume in a different process, and continue from exactly where it left off. The model also gets the entire history of past states for free, which is what replaced the old ConversationBufferMemory class. Persistent state and resumable conversations are now a first-class feature, not something you bolt on yourself. The single highest-leverage thing you can do after wiring up LangChain is to wire up LangSmith. It takes three environment variables.
LANGSMITH_TRACING set to true, LANGSMITH_API_KEY set to your LangSmith API key, and LANGSMITH_PROJECT set to a project name. After that, every runnable, every LCEL chain, and every LangGraph node automatically traces to the LangSmith dashboard with no code changes. You see the full call tree per run, the latency at every step, the token cost, the inputs and outputs of every model call, and any errors. From there, you get datasets for capturing real production examples, evaluators for grading runs against expected outputs, and a prompt hub for versioning the prompts you use across the team. The free developer tier is enough to feel the value.
If you build anything in LangChain and skip this, you will spend twice as long debugging. A historical detail worth getting right, because it confuses many newcomers, is what happened to LangServe. LangServe was, until recently, the official way to deploy a LangChain chain as an HTTP API with a few lines of FastAPI scaffolding. Today, LangServe is in maintenance mode. The repo accepts bug fixes but not new features, and the LangChain team officially recommends the LangGraph Platform for new deployments.
The platform handles streaming, persistence, double-texting, human-in-the-loop, cron jobs, and webhooks, none of which LangServe ever did well. On the client side, the equivalent of the old RemoteRunnable, which let one Python process call a remote chain, is now RemoteGraph. If you have an existing LangServe app, it still works. But if you are starting fresh, deploy to the LangGraph Platform or wrap your graph in your own FastAPI server. Do not start a new LangServe project.
I want to be very specific about what a small team can actually ship in a week, because most LangChain marketing aims at large companies. Five use cases that fit a small business, with what to wire up. One, customer support email triage. Use the with_structured_output method to classify each email by urgency and category and to draft a reply. Two, internal documentation Q&A.
Use the RAG pattern over your wiki, deployed as a bot in a single Slack channel. Three, lead scoring. One model call per inbound form. Structured output into a Pydantic schema. Log everything to LangSmith. Four, contract clause extraction.
PDF loader plus structured output. No RAG needed, because each contract fits in context. Five, ask the database. The built-in SQL agent introspects your schema, writes queries, and self-corrects when they fail. Each of these is a weekend project, not a quarter.
If you are interviewing for an AI engineering role in 2026, LangChain comes up. The questions cluster into a few groups. The first group is conceptual. Walk me through RAG end to end, naming each piece. Chain versus agent.
When do you use each? What is LCEL and why does it exist? How do you stream tokens to the user? The right answers are short. RAG is loader to splitter to embeddings to store to retriever to prompt to model.
A chain is a fixed pipeline. An agent is a model in a loop choosing tools. LCEL is the pipe operator that gives you streaming and tracing for free. Streaming is the .stream method on any runnable. If you can answer those four cleanly, with concrete class names, you have already cleared the bar at most companies hiring for this work.
The deeper questions are where you separate yourself from someone who only read the tutorials. How would you debug a hallucinating chain? The right answer mentions LangSmith first, then inspecting the retrieved chunks, because the answer is usually not in them, then tightening the prompt to say answer only from context, and finally adding a grounding evaluator.
What does a vector store actually store? The right answer covers three things: the embedding vector itself, the original chunk text it was generated from, and any metadata you attached to it. LangChain versus LlamaIndex? LangChain is broader, with agents and orchestration.
LlamaIndex is retrieval-first, with richer indexing. They interoperate, and many production systems use both. Prompt injection? Treat all retrieved and user content as untrusted. Separate system from user channels. Allowlist the tools the model can call.
Validate every structured output. For high-stakes actions, add a second model that reviews the proposed action before it runs. A balanced view requires saying when not to use LangChain, because the honest answer is often. If you only need a single LLM call, do not use LangChain. Use the provider SDK directly.
The framework adds value when you have orchestration, not when you have a one-shot prompt. If your latency budget is tight, the abstraction layer adds milliseconds, and at scale that can matter. If you need to read every byte of every request going over the wire, for compliance or for debugging, the layers of abstraction make that harder than it needs to be. And if your use case is so simple that the framework's surface area is bigger than your code, the framework is the wrong tool. The LangChain team is the first to say this in their own docs.
Pick LangChain when you need orchestration, not because the marketing said so. If you want to actually be productive in LangChain by next weekend, here is the order I would learn it in. Day one, build a single chain that uses with_structured_output. No RAG, no agents, just prompt pipe model with structured output. You will internalize LCEL and the most useful method on a chat model.
Day two, set the three LangSmith environment variables and run yesterday's chain again. You will see your first trace, and you will never want to debug LangChain blind again. Day three, build a small RAG chain over a folder of your own PDFs. Use Chroma for the vector store. Day four, build an agent with two real tools using create_agent.
Day five, take that same agent, throw it away, and rebuild it as a handwritten LangGraph state machine. By the end of day five, you understand the whole stack. If you remember nothing else from this video, carry these five ideas forward. First, LCEL and the pipe operator are now the heart of the framework. Almost everything else is built on the Runnable interface.
Second, the two API surfaces you will use most are with_structured_output, for getting JSON-like data back, and the @tool decorator, for letting the model call your code. Third, LangGraph has replaced the old AgentExecutor, and the checkpointer makes resumable conversations a first-class feature. Fourth, LangSmith is three environment variables, and it is the highest-leverage thing you can add to any LangChain project. And fifth, LangChain is an orchestration framework. If you only need to make a single LLM call, use the provider SDK directly.
If you need to