How Pi Mono Actually Works: The Shared Agent Stack Behind Pi
The architecture under Pi Mono — one harness, many agents — and why that shape keeps showing up in local AI tools.
Watch (16:06)
Full transcript (from the video)
The right way to read pi-mono is as a stack of reusable layers, not as one giant app called Pi. The terminal coding agent gets all the attention as the most visible surface, but underneath that surface the repo is split into a provider abstraction and an agent core. On top of that shared middle sit a terminal product, a browser product, a Slack product, a terminal UI toolkit, and the remote model deployment tooling. That structure matters for two reasons. First, it makes the repo easier to understand, because each package has one narrow job. Second, it shows how to customize Pi for your own work. You do not need to fork the whole thing if all you need is a different tool surface, UI, or model backend. In many cases, you keep the shared middle and swap the surface around it. I want to ground this in the current repo as it exists now, not in an older mental view.
I checked the current pi-mono head, and the package layout is very clear. The root package file defines the workspace, and the root build script shows the build order: first the terminal UI package, then the AI package, then the agent core package, then the coding agent package. Mom and the web UI follow, and pods sits beside that stack as supporting infrastructure. The agent core depends on the provider layer, and the end-user surfaces depend on the shared middle, so the build order is a useful shortcut for telling foundational packages from end-user surfaces. One of the best signals in pi-mono is that the repo teaches contributors how to read it. The root AGENTS file says that if you were not asked about a concrete module, start with the root README, then read the relevant package READMEs in parallel.
That approach sounds simple, but it is actually a design statement. It means the author expects the repo to be navigated from broad architecture toward narrow implementation. The AGENTS file also makes package boundaries part of correctness, not just style. It forbids destructive git commands, pushes contributors toward explicit verification, and treats the READMEs as operational docs. So when you ask why Pi feels coherent as a monorepo, part of the answer is that the workflow rules are aligned with the package structure. The repo guides both humans and agents.

pi-ai is the layer that prevents the rest of the repo from turning into a provider compatibility maze. The README is explicit that this package focuses on models that support tool calling, because the whole point of the stack is agentic workflows, not just text generation. So pi-ai owns the conversation with OpenAI, Anthropic, Google, Bedrock, Copilot, Codex, and compatible endpoints. It normalizes the event stream, usage accounting, and model selection, so the upper layers can simply ask for a model and stream.
Conceptually, this is the adapter layer between raw model vendors and the rest of the product. If you are building your own AI stack, this is one of the most reusable decisions in the repo: put the vendor chaos in one place, then keep the rest of the system pointed at a stable interface. There is a second layer of value in pi-ai beyond raw provider access. The models file builds a registry from generated model data, which lets callers ask for a provider and model pair through one typed lookup path. The register-built-ins file shows another important choice: it lazy-loads provider modules instead of importing every provider at startup, and it forwards provider streams into one shared event stream, which keeps startup and dependency load disciplined. pi-ai also centralizes token and price information, so the rest of the stack can show usage without each surface inventing its own math. In other words, a good provider layer centralizes model discovery and capability differences.
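To make the lazy-loading idea concrete, here is a minimal sketch of a provider registry that defers loading a provider until first use. The names (`registerProvider`, `getProvider`, the `Provider` shape) are illustrative assumptions, not pi-ai's actual API:

```typescript
// Hedged sketch: register factories instead of importing every provider
// module up front; a provider is only constructed on first lookup.
type Provider = { stream(prompt: string): AsyncIterable<string> };

const factories = new Map<string, () => Promise<Provider>>();
const loaded = new Map<string, Provider>();

function registerProvider(name: string, factory: () => Promise<Provider>) {
  factories.set(name, factory); // cheap: no provider code runs yet
}

async function getProvider(name: string): Promise<Provider> {
  let p = loaded.get(name);
  if (!p) {
    const factory = factories.get(name);
    if (!factory) throw new Error(`unknown provider: ${name}`);
    p = await factory(); // the expensive import/setup happens here, once
    loaded.set(name, p);
  }
  return p;
}

// Example: a stub provider registered lazily (hypothetical, for illustration).
registerProvider("stub", async () => ({
  async *stream(prompt: string) {
    yield `echo: ${prompt}`;
  },
}));
```

The payoff of this shape is that adding a tenth provider does not slow down startup for users of the first nine.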
If those pieces are not centralized, complexity leaks upward and every surface gets harder. If pi-ai is the vendor-facing edge, the agent core is the runtime center. Its main agent class holds state and hands turn mechanics to a loop. That loop repeats the same job: it takes the current context and reshapes it, converts it into LLM messages, streams the assistant response, detects tool calls, executes them, appends the results, and decides whether another turn is required. The important AI concept here is that an agent is not just one model completion; it is a controlled loop around a model. This repo makes that explicit. You can see the boundaries between prompt injection, LLM streaming, tool execution, and continuation. That is why the same core powers a terminal tool and can also power a browser UI or a Slack bot.
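The loop shape described above can be sketched in a few lines. Everything here (`runAgent`, the message and tool types) is a hypothetical reduction of the idea, not pi-agent's real interface:

```typescript
// Hedged sketch of the agent loop: call the model, detect tool calls,
// execute them, append results, and repeat until the model stops asking.
type Msg =
  | { role: "user" | "assistant"; text: string }
  | { role: "toolResult"; tool: string; result: string };

type ToolCall = { tool: string; args: string };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Model = (context: Msg[]) => Promise<ModelReply>;
type Tools = Record<string, (args: string) => Promise<string>>;

async function runAgent(model: Model, tools: Tools, context: Msg[]): Promise<Msg[]> {
  for (;;) {
    const reply = await model(context);
    context.push({ role: "assistant", text: reply.text });
    if (reply.toolCalls.length === 0) return context; // no tools: turn is done
    for (const call of reply.toolCalls) {
      const result = await tools[call.tool](call.args);
      context.push({ role: "toolResult", tool: call.tool, result });
    }
    // Loop again: the model now sees the tool results and may continue.
  }
}
```

The point of the sketch is the control flow, not the types: one "agent turn" may be several model completions, and the continuation decision lives in the loop, not in the model.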
This is one of the smartest architectural decisions in the whole repo. Agent core does not make every application jam its own UI messages directly into the LLM protocol. Instead, it keeps an internal agent message type that can carry app-specific state, and it converts to ordinary LLM messages only at the boundary where a provider call happens.
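A minimal sketch of that boundary conversion, with illustrative types; the real agent message type in the repo is richer, and `toLlmMessages` is a name I made up for the idea:

```typescript
// Hedged sketch: the app keeps richer internal message kinds, and only the
// LLM-facing subset is produced at request time.
type LlmMessage = { role: "user" | "assistant" | "toolResult"; content: string };

type AppMessage =
  | { kind: "llm"; msg: LlmMessage }
  | { kind: "ui"; note: string }        // e.g. a banner or status line
  | { kind: "artifact"; path: string }; // app-only state, never sent

function toLlmMessages(history: AppMessage[]): LlmMessage[] {
  // UI and artifact entries stay local; only "llm" entries cross the boundary.
  return history
    .filter((m): m is Extract<AppMessage, { kind: "llm" }> => m.kind === "llm")
    .map((m) => m.msg);
}
```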
The README spells out the difference. LLMs understand user messages, assistant messages, and tool results; an app may also want UI messages, so there is a final conversion step before the model request. If you want to customize Pi for your own stack, this is a major extension point. It lets you add richer local state without forcing the model to consume every internal implementation detail.

The event model is the other reason agent core stays reusable. The loop emits start, turn, message, and tool execution events as it progresses, so a UI can stay responsive while the work is happening. That is the difference between an opaque completion and an observable runtime. The before-tool-call hook runs after argument validation and can block execution. For the after-tool-call side, the README even calls out an important semantic detail: assistant message end acts like a barrier before tool pre-flight begins, so hooks see state that already includes the assistant message that asked for the tool. That kind of detail is easy to miss, but it is exactly what makes a framework usable.
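Here is a hedged sketch of that hook point: a pre-tool-call check that can block execution, with event names and the hook signature invented for illustration rather than taken from pi-agent's actual interface:

```typescript
// Hedged sketch: an observable tool-execution step with an approval hook.
type AgentEvent =
  | { type: "tool_start" | "tool_end" | "tool_blocked"; tool: string };

type PreToolHook = (tool: string, args: unknown) => boolean; // false => block

async function runTool(
  tool: string,
  args: unknown,
  impl: (args: unknown) => Promise<string>,
  hook: PreToolHook,
  emit: (e: AgentEvent) => void,
): Promise<string | undefined> {
  if (!hook(tool, args)) {
    emit({ type: "tool_blocked", tool }); // approval policy said no
    return undefined;
  }
  emit({ type: "tool_start", tool }); // UI can show a spinner here
  const result = await impl(args);
  emit({ type: "tool_end", tool });
  return result;
}
```

The events are what make the runtime observable; the hook is where an approval policy or audit log slots in without touching the loop itself.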
If you are building a safer or more specialized agent, this is where you insert approval policies, auditing, and custom checks. The coding agent package is best understood as a harness. It is not the whole stack, and it does not try to be.
The package README says this plainly: Pi is a minimal terminal coding harness that you extend through prompt templates, skills, extensions, themes, and packages. The coding agent's main entry point mostly parses CLI arguments, loads resources, wires up the model registry and settings, and then hands off to the shared session layer. That is a good sign. It means the terminal product is not hiding deep logic; it is mostly assembly, policy, and user interaction. For your own AI tooling, that is a strong pattern: keep the product shell thin enough that you can replace the terminal with a browser, Slack, or something else without rebuilding the agent runtime. Agent core gives you a loop; the agent session gives you a product. The session layer in the coding agent package is shared across run modes, and it handles the things people actually feel when they use Pi every day: queuing, steering and follow-up messages, saving JSONL sessions, and switching models. This is important because a lot of AI tools collapse product behavior directly into the model loop. Pi keeps them separate, which lets the core stay conceptually clean while the session layer adds persistence and ergonomics on top. If you want to build your own agent product, this is a useful middle layer to emulate. It turns a capable loop into a durable working environment without polluting the loop itself with terminal-specific assumptions. Pi's default tool choice is unusually opinionated: the core coding surface is just read, bash, edit, and write, with optional built-ins like grep and find layered on top.
The coding agent README and the tools source make the philosophy explicit: start with a tiny, legible tool set and extend only when your workflow demands it. That has an AI design benefit, because smaller tool surfaces are easier for a model to use reliably. Instead of baking a giant menu into the default agent, Pi expects teams to add workflow-specific behavior through extensions or packages.
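What "make extra capability intentional" can look like in code: a small core tool set plus an explicit opt-in for anything workflow-specific. The `Tool` shape and `withExtraTools` helper below are illustrative, not pi's real extension API:

```typescript
// Hedged sketch: a tiny default tool surface, extended only deliberately.
type Tool = {
  name: string;
  description: string;
  run: (args: string) => Promise<string>;
};

// The small default surface described above (stub implementations).
const coreTools: Tool[] = [
  { name: "read", description: "read a file", run: async (p) => `read ${p}` },
  { name: "bash", description: "run a command", run: async (c) => `ran ${c}` },
  { name: "edit", description: "edit a file", run: async (p) => `edited ${p}` },
  { name: "write", description: "write a file", run: async (p) => `wrote ${p}` },
];

// Extensions add to the registry explicitly; nothing appears by default.
function withExtraTools(base: Tool[], extras: Tool[]): Map<string, Tool> {
  const registry = new Map(base.map((t) => [t.name, t] as const));
  for (const t of extras) registry.set(t.name, t);
  return registry;
}
```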
So if you are adapting the stack for yourself, the lesson is not that four tools are always enough. The lesson is to make extra capability intentional. This is the most important product idea in Pi: customization is not an afterthought. It is the main surface.
The coding agent README lays out the layers clearly. AGENTS files provide repo instructions. System files replace or append to the system prompt. Prompt templates are reusable markdown.
Extensions are full TypeScript modules that can add tools, UI, key bindings, hooks, or even sub-agent behavior. Themes change the terminal presentation. The philosophy section drives the point home by listing things Pi does not build in by default, like sub-agents, permission pop-ups, or plan mode. That is not missing functionality by accident.
It is a deliberate bet that your workflow should own those choices. If you want to customize Pi, this is where you should spend your energy.

Many terminal AI tools bury their UI code inside the app and treat it as glue. Pi turns terminal UX into a package. The TUI package's test file shows why: it handles component rendering and differential updates, which makes it a real UI framework, not a helper script. The coding agent benefits from that separation because interactive behavior like overlays, selectors, and editors can improve without tangling with the agent runtime. More importantly, it makes the repo's layering honest: the agent loop is not terminal-specific, while the terminal package is. If you are building your own AI tool and care about terminal ergonomics, this is one of the more interesting packages in the repo, because it shows how much product quality actually comes from the shell around the model.

The web package proves that the agent loop is not terminal-specific. The @mariozechner/pi-web-ui package exports browser components like a chat panel and an agent interface, but those components still expect an agent instance underneath; the package layers browser concerns on top. The chat panel file is a nice concrete example. It creates an agent interface, wires up an artifacts panel, reconstructs prior artifact state from messages, and uses IndexedDB-backed storage for settings and sessions. So the web product is a UI around the same middle layer, with browser-native additions like sandboxed artifacts and local storage. That is exactly the kind of reuse you want if you are designing your own stack: keep the intelligence loop stable, then build different user experiences around it.

Mom is where the repo stops looking like a developer toy and starts looking like an operational substrate. The package README describes it as a Slack bot that can execute bash, read and write files, manage working memory, and even create workflows.
The important architectural idea is that every channel gets a dedicated workspace with a log.jsonl, a context.json, JSONL memory files, attachments, and channel-local tools. That lets the bot preserve context over time without pretending the whole world fits in one prompt window. It still uses a model, tools, context compaction, and policy; the surface just happens to be Slack, and the persistence model happens to be files. If you want to adapt Pi to your own organization, this package is a strong clue that the stack is designed for durable workflows, not only ad hoc chat sessions.

Pods is the package that handles inference placement. It is not the reasoning loop and it is not the main user interface. It is the infrastructure layer that sets up GPU pods, starts model servers, and exposes OpenAI-compatible endpoints. The pods CLI makes the intended flow obvious: configure a pod, start a known model, and then point an agent or any compatible client at the resulting endpoint. The coding agent or web UI does not need a special new mental model for remote inference; it still talks to a familiar endpoint.
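Because pods exposes an OpenAI-compatible endpoint, a client only needs a base URL and a model name. A hedged sketch of constructing such a request (the host and model name are made up; only the `/v1/chat/completions` path follows the OpenAI-compatible convention):

```typescript
// Hedged sketch: building a chat-completions request against an
// OpenAI-compatible endpoint, such as one a pods deployment might expose.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}

// Usage (hypothetical host): pass the result to fetch(req.url, req.init).
const req = buildChatRequest("http://my-pod:8000", "my-local-model", [
  { role: "user", content: "hello" },
]);
```

This is the whole point of the compatibility choice: the agent layer does not change at all when inference moves from a vendor API to your own GPU pod.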
Pods is the answer when the thing you want to customize is where models run, how GPU memory is managed, or how remote deployments are exposed. It is not the answer when your real problem is agent behavior or product workflow. This is the practical takeaway for people who want to build their own AI workflow on top of Pi's ideas. First, decide which layer is actually wrong for your use case. A lot of teams fork everything when what they really need is one new surface or one new policy hook. If the difference is provider access, stay low and work in pi-ai. If the difference is how messages, tools, or context are handled, work in agent core. If the difference is that you want Slack, a browser, a CI bot, or a custom terminal, change the shell and keep the middle.
And if the difference is mostly team process, use AGENTS files, system files, prompt templates, skills, extensions, or packages. The whole repo is set up to reward that kind of layering. You get more reuse, less drift from upstream, and a clearer story about where each kind of change belongs. If you remember one diagram from this repo, let it be this one: endpoints feed into pi-ai, and pi-ai feeds agent core.
Agent core powers the user-facing products. The coding agent relies on pi-tui for terminal UX. Pods sits to the side and changes where models run. That is the repo. It is not mystical and it is not one giant monolith. It is a provider layer, a runtime layer, and a set of surfaces around that runtime. That is why Pi is interesting beyond the terminal app. You can take the ideas apart and reuse them.
You can keep the middle and build your own surface, or you can keep the surfaces and swap the model plumbing underneath.
Once you see those boundaries, the repo stops feeling like a brand and starts feeling like a set of composable engineering choices.