
Codex Explained for Engineers: How the Coding Agent Actually Works

A plain-English walkthrough of the Codex loop: instructions in, tools and reads, edits out, approvals around it.

8 min read




Full transcript (from the video)

This video is for engineers who have seen Codex demos or use the CLI a bit but still do not have a clean mental model for what the product actually is.

My main claim is simple. Codex is not just a model wrapped in a terminal skin.

It is a layered coding runtime. There is a local execution loop for reading the repo, editing files, and running commands. There is an instruction system through AGENTS.md and skills. There is a governed capability layer through tools and MCP. There are approval and sandbox controls that decide what the agent can do and when it has to stop. Then there is a cloud worker path for longer-running tasks, plus review surfaces in the CLI, the app, the IDE, and GitHub. Once you separate those layers, Codex becomes much easier to reason about and much easier to trust.

The first correction is to stop thinking about Codex as one UI with one behavior mode. The docs show several surfaces, but they all point back to the same underlying runtime model. The CLI is the direct local loop. The IDE extension and the Codex app give you a richer coding surface and easier review. Cloud tasks offload bigger jobs into a configured environment. GitHub turns review and follow-up tasks into remote agent work attached to pull requests. These are not identical modes with different buttons.

They exist because local iteration, remote task execution, and code review have different latency budgets, risk profiles, and artifact needs. The useful engineering question is not only what model Codex is using. The useful question is which surface owns this task, what context it has, what it can execute, and what review artifact it will return.

The CLI features page is the cleanest place to study Codex as an agent instead of a chatbot. The terminal client is explicitly described as something that can read your repository, make edits, and run commands while you iterate together. The interface also keeps the important control points visible. You can watch Codex explain its plan before making a change, approve or reject steps inline, and inspect syntax-highlighted diffs and command output in the same loop.

There is also a separate review path. The /review command launches a dedicated reviewer that reads a selected diff and returns prioritized findings without touching your working tree. That is important because it shows Codex separating authoring from review: one loop edits, another loop critiques.
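In the terminal, that loop plus the separate review path looks roughly like this. This is a hedged sketch rather than captured output: `codex` and `/review` are the documented entry points, while the prompts and annotations are illustrative:

```text
$ codex                 # start the interactive agent in your repo
> add retry logic to the HTTP client
  # Codex explains its plan, edits files, runs commands;
  # you approve or reject steps inline and inspect the diffs
> /review
  # a dedicated reviewer reads the selected diff and returns
  # prioritized findings without touching your working tree
```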

That split is already much closer to an engineering runtime than to a single prompt-response interaction.

The AGENTS.md docs make the instruction model unusually explicit. Codex does not rely only on one giant hidden system prompt. It loads structured local guidance. At project scope, it starts at the repository root and walks down to the current working directory, checking each directory for AGENTS.md-style instruction files and including at most one file per level. The merge order goes from general to specific, so a subdirectory can narrow or override behavior for the code that lives there.

That gives you a scoped instruction hierarchy instead of one flat blob. This matters because it explains why Codex can feel repo-aware in a durable way: the repo itself participates in the runtime.
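Concretely, the discovery rule means a layout like the following behaves predictably. Paths and file contents here are hypothetical:

```text
repo/
├── AGENTS.md            # root scope: "run `make test` before declaring a task done"
└── services/
    └── billing/
        ├── AGENTS.md    # narrower scope: "never edit the ledger schema;
        │                # run `make test-billing` instead of the full suite"
        └── src/
```

Working under repo/services/billing/, Codex merges the root file first and the billing file second, so the more specific guidance wins where the two overlap.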

Those files name commands, boundaries, and completion rules that the agent can load progressively as it moves through the project.

Skills are the next important layer. They show how Codex avoids bloating every session with every workflow. The docs define a skill as a directory with a SKILL.md file and optional scripts, references, and other assets. The key detail is progressive disclosure: Codex starts with the metadata for each skill, then loads the full instructions only if it decides that skill applies to the current task.

That is a very practical architecture choice. Durable repo guidance can stay small in AGENTS.md.

Heavy specialized procedures like migration playbooks, release steps, or framework-specific workflows become named modules that only enter context when they matter. For engineers building their own agent tooling, this is one of the cleanest patterns in the Codex docs: keep the always-on instruction stack small, and load the expensive workflow only when the task actually needs it.
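As a sketch, a hypothetical migration-playbook skill might live at a path like .codex/skills/db-migration/SKILL.md. The metadata-plus-instructions shape follows the docs; the path, frontmatter fields, and steps below are illustrative assumptions:

```markdown
---
name: db-migration
description: Use when a task adds or changes a database migration.
---

1. Generate the migration with the project's migration CLI, never by hand.
2. Run scripts/check_schema.sh before and after applying it.
3. Note the rollback step in the PR description.
```

Only the name and description ride along in every session; the numbered instructions and any bundled scripts load once Codex decides the skill applies.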

The MCP docs make another useful distinction. MCP is not the instruction stack; it is the capability extension layer. Codex supports local stdio servers and remote streamable HTTP servers, with bearer token and OAuth support for remote connections. Configuration lives alongside the rest of the Codex config, and the docs expose concrete controls like tool allow lists and deny lists per server. That tells you OpenAI is treating MCP as governed runtime infrastructure, not as a magical prompt trick.
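A rough sketch of what that registration looks like in config.toml. The [mcp_servers] tables are the documented pattern; the server names and package are made up, and the exact keys for remote auth and tool gating should be checked against the MCP page rather than trusted from this sketch:

```toml
# Local stdio server: Codex launches the process and speaks MCP over stdin/stdout.
[mcp_servers.docs-search]
command = "npx"
args = ["-y", "some-docs-mcp-server"]   # hypothetical package name

# Remote streamable HTTP server; the docs describe bearer-token and OAuth
# support for these, plus per-server tool allow/deny lists -- see the MCP
# docs for the exact key names.
[mcp_servers.issue-tracker]
url = "https://mcp.example.com/mcp"
```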

Repo instructions say how the agent should behave in this codebase. MCP says what external capabilities exist. The runtime decides how to connect, authenticate, expose, and call them. Keeping those layers separate matters a lot: if behavior is wrong, inspect AGENTS.md and skills; if the tool surface is wrong, inspect MCP registration and tool gating.

This is one of the most important engineering ideas in the Codex docs: sandboxing and approvals are not the same thing. The sandbox decides what the process can technically touch. The approval policy decides when Codex must pause before crossing a boundary. The sandboxing page lays out the common modes clearly: read-only, workspace-write, and danger-full-access. The approvals and security docs also explain the default recommendation for version-controlled folders: auto means workspace-write plus on-request approvals. The IDE features page reinforces the same split. In normal agent mode, Codex can work inside the working directory automatically, but it still needs approval to go outside that boundary or access the network. That separation is what keeps the tool productive without pretending that all risk can be solved by one vague permission toggle.
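In config.toml, those really are two separate dials. A hedged sketch, assuming the mode names from the sandboxing and approvals pages:

```toml
# What the process can technically touch:
#   "read-only" | "workspace-write" | "danger-full-access"
sandbox_mode = "workspace-write"

# When Codex must stop and ask before crossing a boundary:
#   e.g. "untrusted", "on-request", "never" -- see the approvals docs
approval_policy = "on-request"
```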
The cloud environments docs make the remote worker model concrete. When you submit a cloud task, Codex creates a container, checks out your repo at the selected branch or commit, runs your setup script, applies your internet access settings, and then lets the agent work through terminal commands in a loop. A few details here matter a lot. Setup scripts run with internet access, but the agent phase itself has internet access off by default unless you explicitly allow limited or unrestricted access. Environment variables persist through the task, while secrets are only available to setup scripts and are removed before the agent phase begins. The same page also says that if your repo includes AGENTS.md, the agent uses it to find project-specific lint and test commands. So cloud tasks are not just remote prompting; they are reproducible, configured worker environments attached to the same repo-aware runtime.

Codex gets easier to understand when you compare the IDE and GitHub surfaces side by side. In the IDE features docs, agent mode is the normal local path: Codex can read files, make edits, and run commands in the working directory automatically, while chat mode is there when you want to plan before making changes. The same page also exposes a stronger full-access mode if you want network and broader command power without approval. On the GitHub side, the integration docs show that Codex is not only a local assistant. A pull request comment with @codex review triggers a standard GitHub code review, and other @codex comments start a cloud task using the pull request as context. That is a good clue about the product architecture: the same runtime can be driven from local conversation, remote review, or PR-scoped task delegation.

This is the cleanest way to describe review in Codex. It is not just another text generation turn; it is a separate loop with a different job. In the CLI, /review launches a dedicated reviewer that reads the diff and returns prioritized findings. In the app, the review pane reflects the state of the whole git repository, not only what Codex changed, and review comments can appear inline. The same pane also lets you stage, unstage, or revert at the diff, file, or hunk level. Then GitHub adds another review surface, where @codex review or automatic reviews post comments directly on pull requests.
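Driving that from GitHub is just PR comments. The review trigger is from the integration docs; the follow-up task wording is an invented example:

```text
@codex review
    -> posts a standard GitHub code review on the pull request

@codex fix the race condition flagged in worker.go
    -> starts a cloud task that uses the pull request as context
```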

The GitHub docs also say review behavior can follow AGENTS.md review guidelines. So Codex is not only writing code; it is also exposing a review runtime that tries to fit the normal engineering workflow instead of replacing it.

The multi-agent docs are worth including because they show where Codex is headed, but also where the boundaries still are. OpenAI describes multi-agent workflows as experimental.

Codex can spawn specialized agents in parallel, then collect their results into one response. The docs call out codebase exploration and multi-step feature plans as good use cases. They also say you can define your own set of agents with different model configurations and instructions. Architecturally, that means the runtime is not limited to one transcript and one model role forever. But the product is careful here, which is good: multi-agent execution has to be explicitly enabled.

That tells you OpenAI understands that fan-out raises the complexity bar.
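If you do enable it, the docs describe naming your own agents with their own models and instructions. The sketch below is purely illustrative: every key name is an assumption standing in for whatever schema the multi-agent docs actually specify.

```toml
# HYPOTHETICAL shape only -- these keys are assumptions, not Codex's real schema.
# The point: multi-agent is opt-in, and each named agent carries its own
# model configuration and instructions.
[agents.explorer]
model = "smaller-faster-model"
instructions = "Map the modules this change touches. Read only; never edit."

[agents.implementer]
model = "stronger-model"
instructions = "Implement the plan step by step and run the tests."
```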

Parallel workers can help, but they also multiply context management, review, and coordination problems if the task boundaries are weak.

The clean mental model is now pretty simple. Codex is a coding agent runtime with several product surfaces around it. Instructions come from AGENTS.md and skills instead of one opaque prompt. Capabilities come from the built-in tool runtime and MCP instead of free-form imagination.

Approvals and sandboxing decide when the agent can keep moving and when the human must intervene. Execution can happen locally or in a configured cloud worker.

Review surfaces then turn the outcome back into normal software artifacts like diffs, logs, tests, and PR comments.

That is why Codex feels more durable than a generic terminal chatbot when it works well. It is not only generating code; it is operating a control plane around code work. For engineers, the practical takeaway is to ask which Codex surface should own the task, what boundaries it has, and what artifact it will return for review.