How Pi Mono Actually Works in Your App
Walking through the Pi Mono shared agent stack from inside a real app, not as an abstract architecture diagram.
Watch (18:12)
Full transcript (from the video)
Most people look at Pi and see a terminal coding agent, because that is the visible product, but the repo underneath tells a different story. Pi Mono is published as a set of npm packages that you can install independently. The harness that makes Pi feel like one coherent product is actually three layers stacked on top of each other: a provider layer at the bottom, an agent loop in the middle, and a thin product shell on top. Those layers are designed to come apart. You do not need to fork the whole monorepo to use them. You do not need to read thousands of lines of source code. You install the packages, wire the layers together, and build your own surface around the same agent runtime that powers the terminal product. That is what this video walks through, step by step, from an empty project to a working agent harness inside your own app.

The harness has three layers, and the starting point is three packages from the Pi Mono repo. pi-ai is the provider layer. pi-agent is the agent loop. pi-coding-agent is the reference product shell. In practice, you may not even need the coding agent package as a runtime dependency. You mainly need it as a reference for how the assembly works. pi-ai and pi-agent are the two load-bearing packages. The coding agent shows you the pattern for wiring them together. So the workflow is straightforward: install all three, read the coding agent source to understand the assembly, and then build your own shell on top of the first two. The versions stay in lockstep across the monorepo, so you will not hit version-mismatch problems as long as you pin to the same release.

pi-ai is the simplest layer to understand, and it is the right place to start. You ask for a model by provider name and model name. You get back a typed model object. Then you call stream with that model, a system prompt, a list of messages, and a list of tools.
The return value is an async event stream and that is the entire interface.
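In TypeScript terms, that surface looks roughly like the sketch below. These are stand-in types: `getModel`, `stream`, and `StreamEvent` illustrate the shape described above and are not guaranteed to match the real pi-ai exports.

```typescript
// Sketch of the two-call provider surface. Names are illustrative,
// not the actual pi-ai API.
type Usage = { inputTokens: number; outputTokens: number; cost: number };

type StreamEvent =
  | { type: "text"; delta: string }                    // streamed assistant text
  | { type: "toolCall"; name: string; args: unknown }  // model requested a tool
  | { type: "done"; usage: Usage };                    // final usage accounting

interface Model {
  provider: string;
  id: string;
}

// Ask for a model by provider name and model name; get a typed object back.
function getModel(provider: string, id: string): Model {
  return { provider, id };
}

// One streaming interface, regardless of which vendor sits underneath.
// A real implementation would call the vendor SDK and normalize its chunks
// into StreamEvent; this stub emits a canned response.
async function* stream(
  model: Model,
  systemPrompt: string,
  messages: { role: "user" | "assistant"; content: string }[],
  tools: { name: string }[],
): AsyncGenerator<StreamEvent> {
  yield { type: "text", delta: `hello from ${model.id}` };
  yield { type: "done", usage: { inputTokens: 1, outputTokens: 2, cost: 0 } };
}
```

The point is the shape: one function to resolve a model, one async event stream for every provider.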
Underneath, pi-ai handles authentication. It normalizes the streaming format, so your code never needs to branch on different vendors. It handles talking to Anthropic, OpenAI, Google, or even a local endpoint. It also tracks token usage and cost in one place. All of that complexity lives behind two functions.
Everything above the provider layer speaks one protocol, regardless of which vendor sits underneath. This is why pi-ai earns its place as a real dependency instead of something you write yourself.
Without it, every model backend brings a different SDK import, a different authentication pattern, a different streaming format, and different usage accounting. You end up with branching logic everywhere. pi-ai collapses all of that into one import and one streaming interface. Provider switching becomes a configuration change instead of a code change. That matters more than it sounds, because the moment you need a second provider, the branching complexity starts to compound. Centralizing the provider logic early keeps the agent loop and the product shell clean. If you skip this step and talk to model SDKs directly, you will eventually rebuild the same normalization that pi-ai already provides.

The agent loop is where Pi becomes more than a chat wrapper. You create an agent instance with a model, a system prompt, and a set of tools. Then you call run with a list of messages. What happens underneath is not a single model completion. It is a controlled loop. The agent sends the messages to the model. The model responds, and that response might include tool calls. The agent executes those tool calls, appends the results to the conversation, and sends the updated context back to the model for another turn.
This loop repeats until the model responds with a final message that does not request any tools. That is the fundamental difference between a chatbot and an agent. A chatbot makes one completion and stops. An agent runs a loop until the task is actually done.
pi-agent makes that loop explicit, observable, and hookable from the outside. This diagram shows the turn cycle as a flow, and it is worth studying carefully. A user message enters the system. The agent transforms the context, which might mean pruning old messages or injecting fresh system state. It converts the internal agent messages into plain model messages and streams the response through the provider layer. Then it parses that response, looking for tool calls. If it finds tool calls, it executes them, appends the results, and loops back to the transform step for another turn. If there are no tool calls, the response is final, and the loop exits. This is the single most important diagram to internalize when building your own harness. Every feature you add is a hook into one of these steps. Context shaping lives at the transform step. Tool gating lives at the execution step. Once you see the loop clearly, you know exactly where your code belongs.

This is one of the smartest architectural decisions in the whole stack. The agent loop does not force your app to jam all of its state into raw model messages. Internally, it works with a richer type called an agent message. Agent messages can carry metadata, UI state, attachments, or anything else your application needs to track. The conversion to plain model messages only happens at the boundary where the provider call is made. That boundary has two steps: transform context reshapes the agent messages by pruning old turns, injecting fresh state, or reordering information; convert to model format then strips the agent messages down to what the model actually needs to see. This separation is why Pi can power a terminal, a browser, and a Slackbot from the same loop. Each surface carries different metadata in its agent messages, but the model always sees a clean and focused context.

Tools in Pi follow a simple declarative shape. Each tool has a name, a description that the model reads, a typed parameter schema, and an execute function.
The execute function receives validated parameters and returns a result. The loop presents the tool to the model, parses the tool call when the model asks for it, validates the arguments against the schema, calls your execute function, and feeds the result back into the next turn. You do not need to write any of that plumbing yourself.
You define what the tool does and what it accepts, and the loop takes care of the rest. This is where your application's unique capabilities enter the system. A coding agent has file and terminal tools. A customer support agent might have ticket and knowledge-base tools. A data pipeline agent might have query and transform tools. The loop stays the same across all of these. The tools are what make each agent different.

The default tool surface in Pi is just four tools: read, bash, edit, and write. That is not a limitation. It is a deliberate philosophy. Smaller tool surfaces are easier for models to use correctly and easier for humans to audit. The coding agent documentation makes this explicit by saying to start with a tiny, legible tool set and extend only when your workflow demands it. When you build your own harness, resist the urge to register 20 tools on the first day. Start with the minimum set your agent actually needs.
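Concretely, a declarative tool in the shape described above might look like this sketch. The `ToolDef` interface and `dispatch` helper are illustrative, not the actual pi-agent types, and a real tool would use a typed schema library rather than a hand-written validator.

```typescript
// Sketch of the declarative tool shape. The loop, not your code, handles
// presenting the tool to the model and validating arguments before execute runs.
interface ToolDef<P> {
  name: string;
  description: string;            // read by the model when choosing tools
  validate: (raw: unknown) => P;  // throws on bad arguments
  execute: (params: P) => string; // receives only validated parameters
}

const readFileTool: ToolDef<{ path: string }> = {
  name: "read",
  description: "Read a UTF-8 text file and return its contents.",
  validate: (raw) => {
    const p = raw as { path?: unknown };
    if (typeof p?.path !== "string") throw new Error("read: 'path' must be a string");
    return { path: p.path };
  },
  execute: ({ path }) => `(pretend contents of ${path})`,
};

// What the loop does when the model emits a tool call:
function dispatch(tool: ToolDef<any>, rawArgs: unknown): string {
  return tool.execute(tool.validate(rawArgs));
}
```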
You can always add more capabilities later as real needs emerge. A bloated tool surface from day one makes the model worse at choosing the right tool, makes your approval logic more complex, and makes debugging harder. Four tools that the model uses well will always beat 40 tools that it uses poorly.

The agent loop is not a black box. It emits events at every meaningful step, and those events are what make the system observable. The message update event fires as assistant text streams in, so you can pipe tokens to any user interface in real time. The tool execution start event fires before a tool actually runs, so you can show the user what the agent is about to do. The tool execution end event fires after the result comes back, so you can log it, audit it, or display it. These events are what turn an agent from an opaque function call into an observable runtime. They are also what let different surfaces feel responsive while sharing the same loop underneath. A terminal can print streaming text. A browser can update a component. A Slackbot can post messages. The event handlers are what make each surface feel alive.
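A minimal sketch of that event surface, with event names modeled on the ones mentioned above (the exact names and payloads in pi-agent may differ):

```typescript
// Illustrative event types for the three events described in the transcript.
type AgentEvent =
  | { type: "message_update"; delta: string }
  | { type: "tool_execution_start"; tool: string; args: unknown }
  | { type: "tool_execution_end"; tool: string; result: string };

type Listener = (e: AgentEvent) => void;

// A tiny emitter standing in for the loop's event stream.
class AgentEvents {
  private listeners: Listener[] = [];
  on(fn: Listener) { this.listeners.push(fn); }
  emit(e: AgentEvent) { for (const fn of this.listeners) fn(e); }
}

// Two "surfaces" subscribing to the same loop: one renders streaming text,
// one announces tool activity. A real surface would print or update UI;
// here we collect lines so the behavior is visible.
const events = new AgentEvents();
const printed: string[] = [];
events.on((e) => { if (e.type === "message_update") printed.push(e.delta); });
events.on((e) => { if (e.type === "tool_execution_start") printed.push(`[tool: ${e.tool}]`); });

events.emit({ type: "tool_execution_start", tool: "bash", args: { cmd: "ls" } });
events.emit({ type: "message_update", delta: "Listing files" });
```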
Events tell you what happened. Hooks let you change what happens next. The before tool call hook runs after the model asks for a tool and the arguments pass validation, but before the tool actually executes. You can inspect the tool name and the arguments, prompt the user for approval, or block the call entirely. The after tool call hook runs after the tool returns its result, but before that result is emitted to the rest of the system. You can inspect the output, transform it, or redact sensitive content. This is where your application's safety policy lives in practice. A coding harness might gate destructive file operations behind user approval. A customer-facing agent might redact sensitive data from tool output. A compliance workflow might log every single tool call to an audit trail. The hooks are the control plane of the agent, and they separate what the model wants to do from what your application actually allows.

This is where all three layers come together into a working product. Your product shell is the outermost layer. It owns the user interaction loop, the session life cycle, and the presentation logic. You create a model from the provider layer. You create an agent with that model, a system prompt, and your tools. Then you run your own interaction loop on top: get input from the user, pass it to the agent, render the response, repeat until the session ends. The important thing to notice is how little of this code is Pi-specific.
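Put together, the wiring looks something like the sketch below. The implementations are stand-ins so the example is self-contained; in a real harness `getModel` would come from pi-ai, `createAgent` from pi-agent, and the option names here are assumptions, not the documented API.

```typescript
// Stand-in three-layer assembly: provider, agent loop with a hook, product shell.
type Hook = (tool: string, args: Record<string, unknown>) => "allow" | "block";

interface Agent {
  run(input: string): string;
}

// Stand-in for the provider layer: provider name + model name in, model out.
function getModel(provider: string, id: string) {
  return { provider, id };
}

// Stand-in for the agent loop: consults the before-tool-call hook, then
// produces a reply. The real loop would stream and iterate over tool calls.
function createAgent(opts: {
  model: { id: string };
  systemPrompt: string;
  beforeToolCall: Hook;
}): Agent {
  return {
    run(input: string): string {
      const decision = opts.beforeToolCall("bash", { cmd: input });
      return decision === "block" ? "tool call blocked" : `ran: ${input}`;
    },
  };
}

// The product shell: everything from here down is your code.
const model = getModel("anthropic", "some-model");
const agent = createAgent({
  model,
  systemPrompt: "You are a careful coding agent.",
  // Safety policy lives in the hook: gate destructive commands.
  beforeToolCall: (tool, args) =>
    tool === "bash" && String(args.cmd).includes("rm -rf") ? "block" : "allow",
});

const outputs = ["ls", "rm -rf /"].map((line) => agent.run(line));
```

Swap the array of inputs for stdin lines, HTTP requests, or Slack webhook payloads and the rest of the code does not change.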
The call to run the agent is the only part that touches the harness. Everything else is your code. That is why this architecture works for any surface you can imagine. A terminal reads from standard input. A web server reads from HTTP requests. A Slackbot reads from webhook events. The harness does not care where the messages come from. It processes them and returns responses.

If you want richer session behavior, the coding agent source is the best reference. The agent session class in that package adds persistence by saving conversations as JSONL files that can be resumed later. It adds context compaction by summarizing old turns when the context window fills up. It supports model switching, so users can change providers in the middle of a conversation without restarting. It supports session branching, so you can fork a conversation and explore alternatives without losing the original thread. It builds the system prompt dynamically from agents files. It supports export to HTML and markdown for review. And it handles queued messages, so users can type follow-up instructions while the agent is still working. You do not need all of these features on the first day. But when you need any of them, the coding agent source shows you exactly how each one is built on top of the same agent loop you already have.

One of the best patterns in Pi is how it separates agent behavior from agent code. Instead of hard-coding every instruction inside a system prompt string, the coding agent loads agents files from the project root and from parent directories. These files tell the agent what the codebase expects. They name which paths are safe to edit, which paths to avoid, and what verification commands to run before finishing. System files extend or replace the default system prompt with project-specific context. Together, these two file types let project maintainers shape how the agent behaves in their codebase. When you build your own harness, adopt this pattern early.
It means the people who know the project best can guide the agent without needing to understand your harness internals. It also means that behavior changes are version-controlled, reviewable, and scoped to the right part of the codebase.

This is a minimal working harness in about 30 lines of code. It imports the provider layer and the agent loop. It imports four tools.
It loads agents files from the current working directory for project-specific instructions. It creates a model, creates an agent with that model and those tools, and adds a hook that logs bash commands before they run. Then it reads lines from standard input and passes each one to the agent. This is a real and functional coding agent. It can read files, run shell commands, and make edits. It follows project instructions from agents files. It logs what it does.
It is not as polished as the full Pi terminal product, but it is structurally the same. Every feature you add from this point forward is an incremental addition to this working core: session persistence, context compaction, richer presentation, model switching. You are not starting over for any of them.

These are the mistakes that cost people the most time when they try to use Pi in their own projects. The most common mistake is forking the entire repo instead of installing the packages. Forking means you own the maintenance burden of everything in the monorepo. Install the packages and let upstream handle the rest. Another common mistake is copying the coding agent source directly into your app. Use it as a reference for how the assembly works, but write your own shell. A related mistake is registering every tool you can think of on the first day. Start with four and add more only when the agent actually needs them. Do not skip the provider layer and talk to model SDKs directly, because you will end up rebuilding the same normalization logic that pi-ai already handles. Use agents and system files for project-specific instructions instead of one giant prompt string. And remember that the agent is a loop, not a single call. Design your interface and your error handling around the fact that one user message might trigger multiple tool calls across multiple turns.

Once your basic harness is working, there are clear paths to grow it without rearchitecting anything. RPC mode lets you expose the agent as a service, which means other applications can send it tasks over the network without going through a terminal. Custom extensions let you add tools, commands, widgets, and policy hooks as standalone TypeScript modules that plug into the agent at runtime. Skills and packages let you bundle reusable capabilities and distribute them through npm. Pi pods lets you deploy models on remote GPUs, so your agent can use open-weight models without needing local hardware.
Each of these is an incremental addition to the same harness you already built.
None of them require you to change the core architecture. That is the payoff of building on a layered system from the start. You grow the surface without ever rebuilding the foundation underneath.
This is the decision framework that saves the most time when you are debugging or extending your harness.
When something is not working the way you want, the first question to ask is which layer the problem actually lives in. If the problem involves model access, authentication or cost tracking, you are working in the provider layer.
If the problem involves how the agent reasons, which tools it picks, or how it handles multi-turn tasks, you are working in the agent loop. If the problem involves how the user interacts with the agent, how sessions are stored, or how output is displayed, you are working in the product shell. And if the problem is about team-specific process, like which files to avoid or which commands to run for verification, you are working in agents files and extensions. Most wasted time comes from making changes in the wrong layer. Identify the layer first, then go deep in that layer instead of scattering changes across all three.

If you remember one diagram from this video, make it this one. pi-ai is the provider layer, and it feeds the agent loop. The agent loop feeds your product shell. Your product shell can be anything you need it to be: a terminal app, a web app, a Slackbot, or an RPC service. Pi pods sits to the side and changes where model endpoints come from.
So you can swap between local and remote inference without touching the rest of the stack. That is the whole system.
Three layers with clear boundaries and your application is the outermost layer.
You are not building inside Pi. You are building on top of it. The provider layer normalizes model access. The agent loop runs the reasoning cycle. Your product shell owns everything the user touches. Once you see those boundaries clearly, the harness stops being someone else's product and starts being your own.