
How Local AI Actually Remembers: Inside The Agent Log

Every local coding agent writes a structured log. Resume, replay, and audit are all downstream of that one file.

18 min read

Watch (23:00)




Full transcript (from the video)

Most people think of logs as debug output. Something you tail when the program breaks. Something you grep after a deploy. Something that exists for humans. That mental model is wrong for local AI. Every modern local agent, from ComfyUI running on your workstation to Claude Code in your terminal to Cursor in your editor, writes structured records of what it did. Those records are not an afterthought. They are the raw material that the next turn of the model reads to make a smarter decision. If you learn the shape of a great agent log, you can read any of these tools like they are speaking the same language. You can resume sessions, replay workflows, and build your own agents that get better over time instead of forgetting everything between runs. This video walks through the actual log formats used by five real local AI systems, points out the patterns that make their logs AI-readable, and shows you how to write the same kind of log in your own agent. By the end, you will see logs as a memory layer, not a debug layer.

The first thing to get straight is that an agent log and a debug log are doing different jobs. A debug log is written for a human who is trying to figure out why something broke. It is free-form, often prose, often noisy, and the value of any one line depends entirely on the operator who is reading it. An agent log is written for another program to consume. Usually, that program is the next turn of the same model, trying to figure out what already happened so it can decide what to do next. Every field in an agent log has to be parsable.

Every entry has to be anchored in time, in turn number, and in the task it belongs to. The test for whether you have an agent log or a debug log is simple. Can a fresh model read your log file and reconstruct the session state well enough to continue the work? If yes, you have an agent log. If not, you have a debug log wearing an agent log's clothes.

That distinction shapes every design choice that follows. ComfyUI is a great place to start. Most people use it as a node-graph tool and never look at what it writes to disk. Every time you queue a prompt, ComfyUI runs the graph and records the result. That record is called a history entry. The entry includes the prompt ID. It includes the full graph that produced the run. It lists every node that executed. It captures the outputs each node produced, and it ends with a status block. The status block says whether the run succeeded. The entry is self-contained: the graph is captured alongside the outputs, so the log is enough to rebuild the session from scratch. Hand the entry to any workflow runner and it can reproduce the exact image that came out of that run.
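To make that concrete, here is a hedged sketch of the shape of a history entry, shown as a Python dict. The field names are approximate, recalled from the server's /history response, and may differ between ComfyUI versions.

```python
# Hedged sketch of a ComfyUI history entry, keyed by prompt ID
# (field names approximate; check your version's /history response).
history_entry = {
    "7f3c9c2e-prompt-id": {
        # The full graph that produced the run, keyed by node ID.
        "prompt": {
            "3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20}},
            "9": {"class_type": "SaveImage", "inputs": {"images": ["8", 0]}},
        },
        # What each executed node produced.
        "outputs": {
            "9": {"images": [{"filename": "ComfyUI_00001_.png",
                              "subfolder": "", "type": "output"}]},
        },
        # Whether the run succeeded.
        "status": {"status_str": "success", "completed": True},
    }
}
```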

That is an agent-grade log hiding inside an image tool. Once you see it, you notice something else. Much of ComfyUI's flexibility is downstream of this one fact: every run is logged in a shape another program can consume. This diagram shows why ComfyUI's log design matters.

When you queue a prompt, the server runs the graph, captures each node's output, and writes a full entry into history.

That entry is not just a record. It is the single source of truth that the rest of the UI draws from. Loading a previous workflow, reusing a seed, dragging a prior image back into the canvas to remix it: all of those flows read from the same history log. The log is the backing store for the product experience, not just a diagnostic artifact. Notice how the replay arrow loops back into the execution step. That is the shape you want in any local agent. The system produces a structured record of what it did, and that same record is what lets the next run build on the work instead of starting from scratch. When logs feed back into execution, your tool becomes iterative by default. When they do not, every run lives in isolation and the user has to remember everything themselves.

Claude Code takes this idea further. It makes the log the core of the product. Every session writes a line-delimited JSONL file that lives under the projects folder inside the hidden .claude directory in your home directory. One record is appended for each event as it happens. User messages land in the same file. Assistant turns land there too. So do tool invocations and tool results. Nothing ever rewrites an earlier line, so the file can grow safely while the agent is still running, and any reader can tail it without fear of corruption. Each entry carries a timestamp. Each entry carries a UUID. Each entry carries a type field. The assistant turn captures tool-use blocks with the actual arguments, and the tool-result blocks carry the actual outputs. So the transcript becomes a complete recording at the protocol level. Another process can read it. Another process can parse it. Another process can act on it. It is not a stream of prose written for a human.
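Here is a hedged sketch of what one of those transcript lines looks like. The keys are illustrative, based on the JSONL transcripts Claude Code writes, and the exact schema varies by version.

```python
# Hedged sketch of one line from a Claude Code session transcript
# (~/.claude/projects/<project>/<session-id>.jsonl). Keys illustrative.
import json

assistant_turn = {
    "type": "assistant",
    "uuid": "0b1f-entry-uuid",
    "sessionId": "f41a-session-uuid",
    "timestamp": "2025-01-15T10:32:07.120Z",
    "message": {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I'll read the config first."},
            {"type": "tool_use", "id": "toolu_01", "name": "Read",
             "input": {"file_path": "/repo/config.toml"}},
        ],
    },
}
print(json.dumps(assistant_turn))  # one event, one line, appended forever
```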

That is why every Claude Code feature that feels like memory is really just reading from this file. The --resume and --continue flags are not magic. They are what happens when a log is written correctly the first time.

When you run claude --resume, the CLI lists recent session files and shows a summary. When you run claude --continue, it reads the most recent transcript file, rehydrates the full conversation history, and hands it back to the model as the prior context for the next turn. From the model's perspective, nothing was ever lost. The previous session simply continues. This is only possible because the transcript was written in a shape the product can consume later. If the log had been prose, or if tool calls had been flattened into strings, or if timestamps had been missing, resume would not work. The feature is the log and the log is the feature. This is the cleanest example in the local AI space of a log that is not just a record but the actual memory substrate of the tool. It works because nothing stands between the agent and its own past.

Cursor takes a different approach. It is a fork of VS Code, so it inherits the VS Code extension log architecture. Every session gets a timestamped folder under the Cursor config directory. Inside that folder, each extension writes its own log file. The Cursor agent extension has a log. The autocomplete extension has a log. The renderer process has a log. The main process has a log too. When something goes wrong, you can trace exactly which layer produced the behavior. The agent log itself is surprisingly structured. You can see turn boundaries. You can see tool calls. You can see tool results. You can see the total time each turn took. It is not JSONL. It is not quite an agent-grade log in the Claude Code sense, but it is detailed enough.

You can reconstruct what the agent tried. You can see what it succeeded at.

You can see where it stalled. When people debug weird Cursor behavior, this is the file they should open first. It is also why the Cursor team can diagnose user reports so quickly. The trail is already there. One Cursor feature that surprises new users is the checkpoint system. You are editing with the agent.

You do not like the last patch and you hit restore checkpoint. Suddenly the file is back to where it was two turns ago. That feature exists because every agent edit is logged with the before and after content plus a checkpoint identifier tied to the turn. Restore is just reading a prior checkpoint from the log and reapplying the inverse patch.

Without an agent-readable edit log, restore would have to rely on git, which would force commits into the user's history and make the feature awkward.

With an agent-readable edit log, restore can be as cheap as scrolling back through a journal. This is a good general pattern. If you are building anything where a user might want to undo an agent action, start by logging the action in a replayable shape; the undo feature writes itself afterwards.
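Here is a minimal sketch of that pattern. This is not Cursor's actual code, just the idea: log each edit with before and after content plus a checkpoint number, and restore by replaying the "before" side backwards.

```python
# Minimal sketch of checkpoint-style undo (not Cursor's implementation):
# every edit is logged with before/after content, so restore is just
# reading the log backwards and re-applying the "before" side.
import json
from pathlib import Path

LOG = Path("edits.jsonl")

def log_edit(path: str, before: str, after: str, checkpoint: int) -> None:
    entry = {"type": "file_edit", "checkpoint": checkpoint,
             "path": path, "before": before, "after": after}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def restore(checkpoint: int) -> None:
    edits = [json.loads(line) for line in LOG.open()]
    # Undo newest-first so overlapping edits unwind in the right order.
    for e in reversed([e for e in edits if e["checkpoint"] >= checkpoint]):
        Path(e["path"]).write_text(e["before"])
```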

Logging is not just observation. It is the substrate for any product behavior that needs to travel back in time.

Codex writes a file called rollout.jsonl. It lives in a dated session folder under the user's home directory. Every turn produces a rich cluster of typed entries. A typical turn begins with session metadata at the top. Then the user message arrives. Then the assistant message. Reasoning summaries follow whenever the model produces extended thought. After that come the tool calls with their arguments, then the tool results with their outputs, then any patches represented as unified diffs, and finally a turn-done marker. This is the most faithful machine-readable recording of an agent session in any mainstream local coding tool today. Patches are included as part of the log, so a fresh process can replay the session end to end and arrive at the same file contents without needing to reach into git state.

Reasoning is captured as its own entry type, so you can audit why the agent made a choice. The design lesson is simple: separating entry types by kind is what makes the log queryable later. Typed logs pay you back every single time you read them.
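A hedged sketch of one turn's cluster of entries, in that spirit. The entry names are illustrative, not the exact Codex schema.

```python
# Hedged sketch of one turn in a Codex-style rollout.jsonl
# (entry names illustrative, not the exact Codex schema).
turn = [
    {"type": "user_message", "turn": 4, "text": "rename foo to bar"},
    {"type": "reasoning", "turn": 4, "summary": "Find usages, then patch."},
    {"type": "tool_call", "turn": 4, "call_id": "c1",
     "name": "shell", "args": {"command": ["grep", "-rn", "foo", "src/"]}},
    {"type": "tool_result", "turn": 4, "call_id": "c1",
     "output": "src/lib.rs:12: fn foo()"},
    {"type": "patch", "turn": 4,
     "diff": "--- a/src/lib.rs\n+++ b/src/lib.rs\n@@ -12 +12 @@\n-fn foo()\n+fn bar()"},
    {"type": "turn_done", "turn": 4},
]
```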

The pi coding agent is built on top of the pi agent loop. It writes sessions as JSONL files you can open with any text tool. Each line is a typed entry. Turns are keyed by role and turn number. Tool calls carry the full argument payload. Tool results carry the duration and a structured output block.

Agent messages are richer than plain model messages. They can carry reasoning notes. They can carry UI state. They can carry metadata that the shell needs later. Each line is a single JSON object, so you can grep, jq, and slice the session file with standard Unix tools. You can ask how many bash tool calls happened in this session. You can ask how long each one took. You can ask which ones failed. You can build dashboards on top of the log without ever modifying the agent. That kind of introspection is the direct payoff of line-delimited JSON with typed entries.

The log becomes a queryable database of what the agent did, not just a tape recording you replay once and throw away. Across ComfyUI, Claude Code, Cursor, Codex, and the pi coding agent, the log formats differ in surface detail but share the same essential backbone: four fields. The first is a timestamp in ISO format, so that the ordering of events is never ambiguous to a reader. The second is a turn identifier, so that every event produced by a single user intent can be grouped, counted, and retrieved together as a unit. The third is a type tag, so that a reader can filter by what kind of event it is before trying to interpret the payload inside. The fourth is a set of stable identifiers, including a session ID for the whole run, a trace ID for a causal chain across turns, and tool-call IDs that let you join a tool request with its eventual result. If your log carries those four fields for every entry, you have a replayable agent log that any downstream process can consume.
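As a concrete illustration, every entry would carry at least the following. The field names here are one reasonable choice, not a standard.

```python
# The four-field backbone every entry carries (names illustrative).
entry = {
    "ts": "2025-01-15T10:32:07.120Z",  # ISO timestamp: unambiguous ordering
    "turn": 4,                          # groups events under one user intent
    "type": "tool_result",              # filter before parsing the payload
    "session_id": "0194c2e8-session",   # stable IDs: session, trace, call
    "trace_id": "t-88",
    "call_id": "c1",
    "payload": {"output": "src/lib.rs:12: fn foo()"},
}
```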

If a log is missing any of them, you will run into cases where a future model cannot reconstruct the past, or where your own tooling cannot correlate a tool call with its outcome later on. None of these fields is truly optional, because together they form the minimum viable shape of a useful agent log.

Tool calls are the load-bearing records in an agent log, because they are what the agent actually did to the world outside the model itself. The tool call record should capture everything that the world change depended on, including the tool name, the full arguments with any large payloads like file contents or command strings, the start time and duration of the call, a clear success flag, and a stable tool-use identifier that can later be matched with the corresponding tool result. When you inspect the logs of Claude Code, Codex, and pi side by side, you will notice that tool call records are consistently the most detailed entries in each log, and that is not an accident. Tool calls are the part of the session that cannot be regenerated from the prompt alone. The model's words can be re-inferred if you have the inputs, but tool effects are tied to a specific moment in time. Losing a tool call in the log means losing a piece of the history that can never be reconstructed from anything else. So if you optimize one record type for structure in your own logger, make it this one.
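In practice, that means a call/result pair along these lines. This is a sketch; the field names are illustrative.

```python
# Hedged sketch of a tool call and its matching result. The stable
# call_id is what lets a reader join the request to its outcome.
tool_call = {
    "ts": "2025-01-15T10:32:06.940Z", "turn": 4, "type": "tool_call",
    "call_id": "c1", "name": "bash",
    "args": {"command": "pytest -x tests/"},  # full args, not a summary
}
tool_result = {
    "ts": "2025-01-15T10:32:09.102Z", "turn": 4, "type": "tool_result",
    "call_id": "c1", "success": False, "duration_ms": 2162,
    "output": "FAILED tests/test_io.py::test_roundtrip",
}
```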

Here is the real payoff of writing an agent log in a replayable shape. The next time the agent wakes up, whether after a crash, a resume, or the simple end of one turn and the start of another, it reads the log back to decide what to do next. It loads the session file, slices the recent tool results to understand what the environment currently looks like, and injects that information into the next prompt, so the model has ground truth to reason over.

The model produces a plan. The plan becomes a tool call. The tool call is appended to the same log, and the loop continues. This is the core architecture of a local agent that gets smarter over the course of a session instead of forgetting between turns. The log is both memory and history. Without it, every turn is a fresh start. With it, every turn begins informed by everything that came before. That is why the shape of your log quietly controls the shape of your agent's intelligence.
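A minimal sketch of that loop, assuming the JSONL entry shape used throughout this post:

```python
# Minimal sketch of the read-back loop (assumes the JSONL entry
# shape sketched earlier; your field names may differ).
import json
from pathlib import Path

def load_session(path: Path) -> list[dict]:
    """Rehydrate every event the agent has logged this session."""
    return [json.loads(line) for line in path.open()]

def build_context(events: list[dict], recent: int = 5) -> str:
    """Slice the most recent tool results: ground truth about the world."""
    results = [e for e in events if e["type"] == "tool_result"]
    return "\n".join(json.dumps(r) for r in results[-recent:])

def next_prompt(path: Path, user_message: str) -> str:
    # Inject logged ground truth into the next prompt. The model's plan
    # becomes a tool call, the call is appended to the same log, and
    # the loop continues on the next turn.
    context = build_context(load_session(path))
    return f"{context}\n\nUser: {user_message}"
```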

Errors are the part of the log that humans most want to read and that machines most struggle with when logs are unstructured. The fix is to make errors structured from day one. ComfyUI illustrates the pattern well. When a node fails, the log captures the node ID, the prompt ID, the exception class, the message, the traceback, and a snapshot of the inputs that were fed into the failing step. Every one of those fields is something the next turn of the agent can reason about. Did this failure come from a specific input? Did it happen only on this prompt ID? Is this a new class of error or one we have seen before? Contrast that with an unstructured stack trace. A stack trace can be read by a human, but a model has to reparse it every time, and the parse is fragile. Structured error entries let the model filter, aggregate, and compare across runs. When an error is logged in the same shape as a success, the agent can learn from both instead of just the wins.

This is the pattern that quietly breaks most homegrown agents. Someone adds console.log statements to trace what the agent is doing, and for a while that feels like logging. The output is readable in a terminal. The developer can see what the agent tried. The problem is that when a downstream process tries to consume that log, it has to parse free-form strings, and small formatting changes break everything. An agent reading its own console output is a fragile system. The alternative is almost as simple to write but infinitely more robust. Every meaningful event becomes a structured entry with a known schema. The tool call, the tool result, and the turn completion each have their own shape and their own required fields.
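As one concrete instance of a typed event, a structured error entry in the ComfyUI style might look like this. The field names are approximate.

```python
# Hedged sketch of a structured error entry, ComfyUI-style (field
# names approximate). Same shape as a success entry, so the agent
# can filter, aggregate, and compare failures the same way.
error_entry = {
    "type": "execution_error",
    "prompt_id": "7f3c9c2e-prompt-id",
    "node_id": "3",
    "exception_type": "RuntimeError",
    "exception_message": "CUDA out of memory",
    "traceback": ['File "execution.py", line 151, in execute'],
    "current_inputs": {"seed": 42, "steps": 20},
}
```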

Writing the log this way costs maybe 10 extra lines the first time you set it up. It pays back every time you want to query, replay, resume, audit, or analyze what the agent did. If your agent only prints text, it is a demo. If your agent writes typed events, it is a system.

This is a minimal logger for your own agent, and it fits on a single screen while giving you everything you need to be on the same footing as Claude Code and Codex. A session identifier is generated once per run, and every event merges that session identifier with an ISO timestamp and a typed payload before appending a single JSON line to a file on disk. Four named helpers capture the four events that matter most, covering a user turn, a tool call, a tool result, and a turn completion, which together are enough to replay a session, resume after a crash, audit the agent's behavior, and grep the log with standard commands. Notice what this logger deliberately does not do. It does not filter by log level. It does not colorize output, and it does not support structured templates or fancy formatting of any kind. Those features exist for human debug logs, and they actively get in the way when the consumer is another program. Keep the logger boring and keep the entries typed, because the whole value of this pattern is that the schema is stable and the payload is fully self-describing on its own.
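Here is one way to write it, a sketch in Python under the assumptions just described. The ~/.myagent path and helper names are my own, not from any particular tool.

```python
# A minimal agent logger, as described above: one session ID per run,
# one typed JSON line per event, append-only. (A sketch; the
# ~/.myagent path and names are hypothetical.)
import json, time, uuid
from pathlib import Path

SESSION_ID = str(uuid.uuid4())
LOG_PATH = Path.home() / ".myagent" / "sessions" / f"{SESSION_ID}.jsonl"
LOG_PATH.parent.mkdir(parents=True, exist_ok=True)

def log_event(type_: str, **payload) -> None:
    entry = {"session_id": SESSION_ID,
             "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
             "type": type_, **payload}
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

# The four events that matter most.
def log_user_turn(turn: int, text: str):
    log_event("user_turn", turn=turn, text=text)

def log_tool_call(turn: int, call_id: str, name: str, args: dict):
    log_event("tool_call", turn=turn, call_id=call_id, name=name, args=args)

def log_tool_result(turn: int, call_id: str, success: bool,
                    duration_ms: int, output: str):
    log_event("tool_result", turn=turn, call_id=call_id,
              success=success, duration_ms=duration_ms, output=output)

def log_turn_done(turn: int):
    log_event("turn_done", turn=turn)
```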

Where you put the log matters almost as much as the format itself. Every tool we have looked at chooses a path inside the user home directory, and the specific paths follow a pattern you will recognize quickly. ComfyUI stores its runs either under the install folder or under whatever output directory the user configured. Claude Code tucks its transcripts into a hidden .claude folder in home, Codex keeps its rollouts in a parallel .codex folder, and the pi coding agent uses its own pi folder alongside them. Cursor is the outlier here because it follows the platform-appropriate config path that VS Code-style applications expect. The convention across all of them is remarkably consistent. One JSONL file per session, named after the session identifier, so each session stays self-contained. Date-sharding the folders by year, month, and day keeps the file system from choking on ten thousand files in a single directory. And picking a sortable identifier such as UUID version 7 means sessions sort naturally by time without any extra indexing on your part.
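Sketched concretely, again with a hypothetical ~/.myagent path:

```python
# Hedged sketch of the date-sharded layout (path names hypothetical):
#   ~/.myagent/sessions/2025/01/15/<session-id>.jsonl
import uuid
from datetime import datetime, timezone
from pathlib import Path

now = datetime.now(timezone.utc)
session_dir = Path.home() / ".myagent" / "sessions" / f"{now:%Y/%m/%d}"
# UUIDv7 IDs are time-ordered, so filenames sort by creation time;
# use a uuid7 library if your Python version lacks one.
session_file = session_dir / f"{uuid.uuid4()}.jsonl"
```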

On the topic of rotation, the right default is to keep logs forever, because they are small and because the cost of losing a session a user wanted to recall later is surprisingly high. If a single session grows past some size cap, split it into numbered parts rather than truncating anything in place, and never let an auto-rotate or auto-delete timer run quietly in the background. When redaction is needed, offer a deliberate command the user can run on purpose rather than having a silent reaper prune the log behind their back.

Once your log is line-delimited JSON with typed entries, every question you might want to ask about the agent's behavior becomes a one-liner in jq. A single jq invocation can count every tool call in a session, list every bash command the agent ever issued, or surface the slowest tool result. With a small extension, jq can even pipe the last user message straight back into a new agent invocation, turning the log itself into a generator for follow-up prompts. Notice that none of this requires anything special from the agent, because there is no dashboard, no database, and no custom parser involved anywhere in this pipeline. The format is standard JSON, and the tools are ones that every engineer already has installed. This is the real test of whether a log is truly AI-readable. Can your agent read it and reason over it without any custom code?

If the answer is yes, you have a log that closes the loop cleanly. If the answer is no, you have a format that will always need a translator in between. The best part is that these same jq pipelines can run inside a tool call, which means the agent can query its own past directly as part of deciding what to do next.
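For example, against the entry shape sketched earlier, the same questions look like this in Python, with illustrative jq equivalents in the comments:

```python
# The same questions as jq one-liners, answered in Python against the
# entry shape sketched earlier (field names illustrative).
import json
from pathlib import Path

events = [json.loads(line) for line in Path("session.jsonl").open()]

# jq -s '[.[] | select(.type=="tool_call")] | length' session.jsonl
n_calls = sum(e["type"] == "tool_call" for e in events)

# jq -r 'select(.type=="tool_call" and .name=="bash") | .args.command'
bash_cmds = [e["args"]["command"] for e in events
             if e["type"] == "tool_call" and e.get("name") == "bash"]

# jq -s '[.[] | select(.type=="tool_result")] | max_by(.duration_ms)'
slowest = max((e for e in events if e["type"] == "tool_result"),
              key=lambda e: e["duration_ms"])
```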

These are the mistakes that quietly undermine most homegrown agent logs. Treating logs as something you bolt on at the end shapes the format around whatever code got written first, instead of around the questions you will need to ask about it later. Writing free-form prose into a log file is the next trap, because you will reparse your own output for years and every small format change will break whichever parser you rely on. Auto-rotating or deleting on a timer is another common mistake, and users quietly lose session history that actually mattered to them. Splitting a single session across multiple files with inconsistent formats turns replay into a research project every time you attempt it. Burying tool arguments inside a string kills your ability to filter reliably.

And mixing human-facing progress messages into the same stream as machine events guarantees that every downstream parser will trip on the shape drift. The corrective moves are simple to describe.

Design the log before you write the first tool call. Commit to typed JSONL from day one. Keep sessions in one append-only file. Store tool arguments as real structured objects, and send human progress messages to stderr so the log file itself stays clean. Do these things and the rest of the agent architecture gets a lot easier to build on top of.

If you remember one diagram from this video, make it this one. A local agent has a user on one side, a set of tools that reach into the world on the other, and a session log sitting at the center of everything. The agent writes to the log every time it thinks, calls a tool, or receives a result. The log also records world effects like file writes and command outputs, so the environment is captured alongside the agent's own actions. And then there is the arrow that closes the loop: the log feeds back into the agent on the next turn. That is what turns a chat interface into a system with memory.

Resume, audit, replay, and continuous improvement: the entire feature list of every local AI tool you have ever admired comes downstream of a log that was designed to be read, not just written. Build your logger first. Treat it as the memory layer of your agent.

Then build the rest of the agent on top of it.