LangChain Explained for Engineers: How the Runnable Interface Works
Once the Runnable interface clicks, the rest of LangChain — LCEL, LangGraph, tools, retrievers, structured output — becomes predictable instead of surprising.
Watch (13:05)
Full transcript (from the video)
LangChain is a framework for building applications powered by large language models. It started as a single Python package but has grown into a small ecosystem of related libraries that share the same building blocks. The core idea is that real applications need more than a single chat call. They need prompt templates, structured outputs, retries, vector search, tool calls, and stateful orchestration. LangChain gives you a small set of composable primitives that handle each of those concerns and a uniform interface that lets you wire them together.
Once you learn the primitives, the whole framework starts to feel like one consistent shape applied at different layers. The most important concept in modern LangChain is the Runnable interface. Every chain, model, prompt, retriever, and tool implements the same interface. A runnable is anything that takes an input, does something with it, and returns an output. Crucially, every runnable supports the same four call patterns.
There is invoke for a single synchronous call, batch for handling many inputs at once, stream for getting incremental output as it is generated, and an async counterpart for each of those. Because everything implements the same interface, you can swap a chat model for a retriever or a custom function, and the calling code does not need to change. This uniform shape is what makes the rest of the framework composable. LangChain Expression Language, usually written as LCEL, is the syntax for composing runnables together.
The headline feature is the pipe operator, the same character you use for shell pipes. When you write a chain as prompt | model | parser, you are describing a sequence where the prompt produces messages, the model produces a response, and the parser extracts a structured value. LCEL handles the threading of inputs and outputs for you. Each step receives the previous step's output as its input.
The whole composition is itself a runnable, so you can pipe it into the next thing or invoke it directly. This is why people often describe LangChain code as feeling like Unix pipelines for language model calls. RunnableLambda is the simplest way to bring a regular Python function into a chain. You wrap a function, give the runnable a descriptive name, and now that function participates in the LCEL world. It can be piped into other runnables, batched over many inputs, and traced through the same observability layer as the rest of the chain.
The most common use is for pure transforms between LLM calls, like reshaping a dictionary or extracting a field. Because the function is wrapped rather than rewritten, you can keep your business logic in plain Python while still benefiting from the framework's plumbing. This is a hidden source of leverage in real projects. Two cousins of RunnableLambda extend the composition story. RunnableParallel takes a dictionary mapping names to runnables and runs all of them on the same input, returning a dictionary of results.
This is how you fan out work, for example asking three different prompts about the same document and gathering their answers. RunnableBranch is the conditional version. You give it a list of predicate-runnable pairs plus a default, and the first matching predicate decides which branch handles the input. Together, these let you express any directed graph of language model calls without writing imperative if statements or thread pools. The shape of the chain stays declarative, which keeps tracing meaningful.
Sometimes you want a chain step that yields multiple results over time rather than returning a single value. RunnableGenerator is the wrapper for that pattern. You give it a generator function that takes an iterator of inputs and yields outputs as they become available. When the chain is invoked with stream, those outputs arrive incrementally, so a downstream consumer can react before the full sequence is done. The classic example is processing tokens from a streaming chat model and yielding parsed values as soon as enough text has been seen.
This is the primitive that makes streaming feel like first-class behavior rather than a special case bolted on top. LangChain models the language model as a BaseChatModel. Cloud and local providers each ship their own subclass, but they all share the same invoke, stream, and batch methods. The unit of conversation is a typed message rather than a raw string. SystemMessage carries the persona or instructions that shape how the model behaves.
HumanMessage is the user turn. AIMessage is what the model produced, and it can also carry tool call payloads when the model decides to use a function. Treating the conversation as a list of typed messages rather than concatenated strings is what makes structured features like tool calling and multi-turn conversations work cleanly. A ChatPromptTemplate is how you describe a prompt that has variable parts. You write a list of message templates, each with placeholders for runtime values.
When the template is invoked with a dictionary of variables, it produces a fully formed list of messages ready for the model. The template itself is a runnable, so it pipes into a model the same way any other runnable does. There are companions like MessagesPlaceholder for inserting full message lists and few-shot variants for showing examples. The point is that prompt construction is a typed, traceable step in the chain rather than a string-formatting hack scattered through your code. Plain text output from a model is hard to consume reliably.
The with_structured_output method on a chat model fixes this by binding a Pydantic schema to the call. You define a BaseModel with the fields you want, pass that class to with_structured_output, and the resulting runnable returns parsed instances of your model. Behind the scenes, providers that support native function calling use that path. Providers that do not get a fallback that injects the JSON schema into the prompt and parses the response. Either way, your downstream code receives a typed object with validated fields, so you never have to write a regex over model output again.
Real applications hit transient failures: network blips, rate limits, structured output that does not fit the schema, model providers having a rough hour. Two methods on every runnable handle this without restructuring your code. The with_retry method wraps the runnable with exponential backoff retries, configurable in attempts and base delay. The with_fallbacks method takes a list of backup runnables that get tried in order if the primary raises. The classic pattern is a primary cloud model with an alternate model as the fallback, both wrapped in retry.
Because these helpers return new runnables, the rest of your chain does not change, and tracing still shows you which attempt actually succeeded. Modern applications often need to look up relevant context before generating an answer. The two ingredients are an embedding model and a vector store. An embeddings instance turns a piece of text into a fixed-length vector that captures its meaning. A vector store holds many of those vectors along with the original text and arbitrary metadata, and supports nearest-neighbor lookup.
Together they are the foundation of retrieval-augmented generation, the pattern usually shortened to RAG. You ingest the corpus once, query it at runtime, and inject the top matches into the prompt so the model has fresh, relevant context to work with. LangChain ships connectors for dozens of embedding providers and vector stores, so you can swap implementations without changing your chain. Modern chat models can call functions.
LangChain expresses this as tools. You write a regular Python function, decorate it with @tool, and now the framework knows the function name, argument types, and description automatically from the signature and docstring. You bind a list of tools to the model, and at inference time the model returns either a final answer or a tool call payload naming one of your tools and the arguments it wants. ToolNode is the helper that takes the tool call output and dispatches it back to the matching Python function, capturing the result so the next loop iteration can see it. This is how external capabilities, from web search to database queries to local file reads, get composed into a chain.
LCEL chains are linear or tree-shaped, but real workflows often need branches that depend on accumulated state or human-in-the-loop pauses. LangGraph extends the LangChain world with a typed state machine called StateGraph. You define a TypedDict for the state, register nodes that take the state and return a partial update, and wire them together with edges. Plain edges go from one node to the next unconditionally. Conditional edges call a routing function that inspects the state and picks the next node by name. The whole graph compiles into a runnable, so it slots into LCEL composition like anything else while gaining the ability to handle cycles and complex routing.
StateGraph state would be hard to reason about if every update fully replaced the field. LangGraph solves this with reducers. A field is declared with Annotated, naming both the type and a reducer function. Each time a node returns a partial update for that field, the reducer is called with the current value and the new one to produce the merged result. The most common reducer is add_messages, which appends to a message list rather than overwriting it.
You can also write your own to merge dictionaries, deduplicate lists, or apply any other accumulation strategy. Reducers are why a graph that visits the same node many times can keep building up history without manual juggling. Sometimes the order of tool calls cannot be planned ahead of time. The model has to decide after each step whether to call another tool or to finish. That iterative loop is what an agent is.
The create_agent helper builds one for you. You give it a chat model that supports tool calling, a list of bound tools, and an optional prompt. The returned runnable runs a loop. Each turn, the model receives the conversation so far plus the tool registry, and it either emits a tool call that gets executed and fed back, or it emits a final answer that ends the loop. Internally, this is a small LangGraph state machine, which is why agents and the rest of the framework feel like one consistent thing.
Once you have a real chain, you need to see what it is doing. The observability story is built on callbacks. You subclass BaseCallbackHandler and override the methods for the events you care about. There are families for chains, language models, tools, and retrievers. Each family has start, end, and error callbacks, plus extras like new-token streaming.
When you attach the handler to a chain via its config, every runnable in the composition fires events through it. LangSmith tracing is implemented as one of these handlers, but you can write your own to forward events to your logging system, your event ledger, or anything else. The handler interface is the seam between the chain and everything you want to know about it. The reason this framework feels coherent once you learn it is that almost everything implements the Runnable interface. Models, prompts, retrievers, output parsers, custom functions, tool nodes, even whole compiled graphs all expose invoke, stream, and batch. LCEL is the syntax for piping them together. with_retry and with_fallbacks add reliability.
Callbacks add observability. LangGraph adds stateful orchestration when straight pipelines are not enough. Once those primitives click, building real applications stops feeling like glue code and starts feeling like composition. You pick the right shape.