
How AI Coding CLIs Actually Work: Codex, Claude Code, Gemini CLI

Side-by-side teardown of three terminal coding agents — the loop, the tools, and the parts they each get right.

8 min read

Watch (10:51)



Full transcript (from the video)

This is an architecture walkthrough, not a feature tour or a ranking video. The question is what these tools actually are under the hood. If you strip away the branding, serious AI coding CLIs are converging on the same shape: a terminal user interface on top of an agent orchestrator, an instruction stack, a context builder, a policy engine, a tool runtime, a session store, and sometimes a sub-agent layer. I'm grounding this in the official docs for Codex, Claude Code, and Gemini CLI, current as of March 11, 2026, then pulling out the shared design patterns that matter if you want to use one well or build one yourself.

The terminal already owns the real developer workflow. That is why it became the natural surface for AI coding agents: it can inspect files, run tests, and show commands in a way developers already understand and can review.

This is the architecture I want you to keep in your head for the rest of the video. The visible product is a streaming terminal interface, but the real system sits behind it. There's an instruction loader that assembles project rules and task guidance. There's a context builder that decides which files, diffs, summaries, and tool results belong in the next model turn. There's a policy engine that decides what the agent is allowed to do and when a human must approve it. There's a tool runtime for shell, file edits, git, browser-like tools, and MCP servers. And there's a state layer so the session can persist, compact, recover, and resume.

Many people picture these tools as prompt in, answer out. That is not how they work once they become useful. A single user request expands into a loop. First, the CLI resolves instructions and configuration. Then it builds a working context from current files, diffs, recent transcript state, and sometimes explicit plan state. Then the model chooses either to answer or to request a tool. If a tool is requested, that request goes through approval rules and sandbox rules before execution. The tool output then becomes fresh context for the next model step. What the user experiences as one turn is really an orchestration pipeline with repeated decision points.

One of the clearest architecture patterns across these tools is that they stop pretending one giant hidden system prompt is enough. Codex documents progressive instruction loading through AGENTS.md files and skills. Claude Code centers project memory in CLAUDE.md and adds command and hook surfaces.
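To make the turn loop concrete, here is a minimal sketch in Python. Every name here (`run_turn`, `ToolCall`, the `policy` callback) is hypothetical, not any product's real API; it only shows the shape of the loop: the model either answers or requests a tool, and tool output feeds the next model step.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def run_turn(user_request, model, tools, policy, max_steps=10):
    """One user 'turn', expanded into the repeated agent loop."""
    context = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        action = model(context)           # answer text, or a ToolCall
        if not isinstance(action, ToolCall):
            return action                 # plain answer: the turn is done
        if not policy(action):            # approval / sandbox gate
            context.append({"role": "tool", "content": "denied: " + action.name})
            continue
        output = tools[action.name](**action.args)
        context.append({"role": "tool", "content": output})  # fresh context
    return "step budget exhausted"
```

The key point of the sketch is that the model never executes anything itself; it only proposes actions, and the orchestrator decides whether and how they run.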

Gemini CLI uses GEMINI.md plus plan and policy features. The important point is architectural, not cosmetic. Good tools separate durable project instructions from ephemeral task instructions; that makes behavior inspectable, composable, and easier to debug when the agent does something surprising.

Context assembly is where weak AI CLIs usually fall apart. The model does not need the whole repo. It needs the right subset of the repo plus the relevant diffs, the current task plan, and enough recent history to stay coherent. That means context assembly behaves more like a build step than a chat append. Claude Code explicitly documents memory and automatic compaction. Gemini CLI documents checkpoints and plan mode.

Codex emphasizes instruction layering, tasks, and multi-agent execution.

Different products expose it differently, but the shared reality is that a durable CLI needs a context pipeline, not just a big context window.
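Here is a rough sketch of what "context assembly as a build step" can mean: fit instructions and task-relevant files into a budget first, then fill remaining space with the newest history and compact the rest. The budgeting rule and the `[earlier turns compacted]` stub are illustrative assumptions, not how any of these products actually measure or summarize.

```python
def build_context(instructions, history, files, budget=1000, size=len):
    """Assemble one model turn under a size budget (newest history wins)."""
    parts = [instructions]
    remaining = budget - size(instructions)
    for f in files:                   # task-relevant files go in first
        if size(f) <= remaining:
            parts.append(f)
            remaining -= size(f)
    kept = []
    for msg in reversed(history):     # walk history newest-first
        if size(msg) > remaining:
            kept.append("[earlier turns compacted]")  # stand-in for a summary
            break
        kept.append(msg)
        remaining -= size(msg)
    return parts + list(reversed(kept))
```

Real pipelines summarize rather than drop, and count tokens rather than characters, but the structure is the same: a deterministic build over sources, not an ever-growing append log.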

This is one of the biggest mindset shifts: in a serious AI CLI, tools are not decorative functions bolted onto a chatbot. They are the execution runtime. The model proposes an action like search, edit, run, or open.

The runtime decides how that action is encoded, executed, streamed, and captured. Anthropic is very explicit that Claude Code is built for composable terminal workflows. Codex treats MCP servers and tool calls as first-class extension points. Gemini CLI documents tools, shell mode, and policy mediation.
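A tool runtime in this sense can be sketched as a small mediation layer: a registry the model's proposed actions are routed through, where failures become captured results instead of crashes. The class and method names are invented for illustration; real runtimes add streaming, sandboxing, and approval hooks around this core.

```python
import subprocess

class ToolRuntime:
    """Mediates model-proposed actions: lookup, execute, capture result."""
    def __init__(self):
        self.tools = {}

    def register(self, name, fn):
        self.tools[name] = fn

    def execute(self, name, **args):
        if name not in self.tools:
            return {"ok": False, "output": f"unknown tool: {name}"}
        try:
            return {"ok": True, "output": self.tools[name](**args)}
        except Exception as e:                    # capture failures as data,
            return {"ok": False, "output": str(e)}  # not as crashes

runtime = ToolRuntime()
runtime.register("echo", lambda text: text)
runtime.register("shell", lambda cmd: subprocess.run(
    cmd, shell=True, capture_output=True, text=True).stdout)
```

The design choice that matters is the uniform result envelope: whatever the tool does, the model always gets back a well-formed observation it can reason about on the next step.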

Once you see tools as a runtime, the rest of the architecture starts making sense. Operators often blur policy, approvals, and sandboxing together, but the good tools do not.

Codex has explicit approval modes and separate sandboxing guidance. Claude Code documents permission modes, settings, and hooks. Gemini CLI has trusted folders, a policy engine, and sandbox mode. Those are not the same concern. A policy system answers which actions are even allowed. An approval system answers whether a human has to confirm a specific action. And a sandbox answers whether the command runs on the host, in a restricted area, or in a more isolated environment. If you're building one of these tools, separating those layers is not optional. It is the core safety architecture.

The jump from helper to agent happens when the tool gains plan state. Once a task spans multiple steps, the system needs to track progress, blockers, and delegation instead of relying on one chat transcript. That is why plan mode, tasks, checkpoints, and workflow surfaces keep appearing in these tools. Once planning exists, the next scale-up move is sub-agents. The primary agent can delegate narrow work like repo mapping, testing, or docs checks to specialized workers, but fan-out only helps when the delegated task has a clean boundary and a clear return artifact.

Codex makes the orchestration layers unusually explicit: instructions load progressively through AGENTS.md files, and skills package reusable workflows instead of relying on one giant hidden prompt. MCP, approval modes, sandboxing, multi-agent support, and cloud tasks all point in the same direction: a durable task-execution architecture instead of one local chat loop.

Anthropic's Claude Code leans hard into the Unix idea that good tools should compose cleanly with the rest of the developer environment, and that shows up in the docs in several places. Project guidance lives in CLAUDE.md.

Hooks give operators a way to run custom logic when certain events happen.
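As a generic illustration of the hook idea, here is a tiny event bus where operators register callbacks on lifecycle events such as "before a tool runs". This is deliberately not Claude Code's actual hook configuration format; the event names and `HookBus` class are invented to show the mechanism.

```python
class HookBus:
    """Fire operator-registered callbacks on agent lifecycle events."""
    def __init__(self):
        self.hooks = {}            # event name -> list of callbacks

    def on(self, event, fn):
        self.hooks.setdefault(event, []).append(fn)

    def fire(self, event, payload):
        # Run every hook for the event; collect results for the orchestrator.
        return [fn(payload) for fn in self.hooks.get(event, [])]

bus = HookBus()
bus.on("pre_tool_use", lambda p: f"lint check before {p['tool']}")
bus.on("post_tool_use", lambda p: f"log {p['tool']} result")
```

The point is architectural: hooks give the operator deterministic control points around the agent without editing prompts.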

Permission modes bound how much the tool can do without confirmation. Sub-agents and MCP widen the execution surface without forcing every pattern into a single prompt. The architectural feel is less like one monolithic product shell and more like a disciplined terminal runtime that can be customized around real engineering workflows.

Gemini CLI is useful to study because the docs expose the architecture as a system instead of only a product tour. The documentation explicitly separates a CLI package from a core package. It documents repo-local GEMINI.md guidance, plan mode, sub-agents, trusted folders, checkpointing, sandbox mode, and a policy engine. That is a very architecture-heavy framing, and it tells you Google understands the terminal agent as more than a chat interface. The important lesson is that the moment you support long-running tasks and tools, you need a real control plane around policy, resumability, and context management.
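The policy / approval / sandbox separation described earlier can be sketched as three distinct questions asked in order. The allowlist rule and environment choice below are simplified assumptions for illustration, not any product's actual rules.

```python
def gate(command, *, allowlist, auto_approve, ask_user):
    """Three separate safety layers, answered in order."""
    # Layer 1: policy - is this action ever allowed at all?
    if command.split()[0] not in allowlist:
        return ("deny", None)
    # Layer 2: approval - allowed, but does a human have to confirm it?
    if command not in auto_approve and not ask_user(command):
        return ("deny", None)
    # Layer 3: sandbox - where does the approved command actually run?
    env = "host" if command in auto_approve else "sandbox"
    return ("allow", env)
```

Keeping the three answers separate is what makes the behavior explainable: a denial can be traced to a policy rule or a human decision, and an allowed command still carries an explicit execution environment.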

This is where the products stop looking identical. Codex is unusually strong on task offload, instruction layering, and reusable skills. Claude Code feels strongest when you want terminal-native composability, hooks, and workflow customization around a very capable core agent. Gemini CLI is the most explicit about policy and state formalization in the docs, with trusted folders, checkpoints, and a policy engine. But notice what does not differ much: all three still need repo-local instructions, tool mediation, approval logic, an extension bus, and some answer to sub-agent delegation. The product variations sit on top of the same base architecture.

If you were building one today, this is the minimal architecture I would recommend. Keep the terminal UI thin. Put almost all real logic in an orchestrator. Give that orchestrator a dedicated instruction loader, context builder, policy engine, model adapter, tool runtime, and state store. Add MCP so the runtime can grow without rewiring the core. Only add sub-agents after the single-agent loop is dependable; fan-out multiplies the need for plan state and review state. The big mistake is building the UI first and smearing orchestration across it. The UI should be the shell. The real product is the control plane behind it. Good demos can hide this, but good products cannot.
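That minimal split can be sketched as a skeleton: an orchestrator that owns the components, and a UI loop that only reads input and renders output. The component names mirror the concepts above; none of this is a real API, and the method bodies are stubs showing where responsibilities live.

```python
class Orchestrator:
    """Owns all real logic; the terminal UI is just a shell around it."""
    def __init__(self, instructions, context_builder, policy, model, tools, store):
        self.instructions = instructions      # instruction loader output
        self.build_context = context_builder  # context pipeline
        self.policy = policy                  # policy engine
        self.model = model                    # model adapter
        self.tools = tools                    # tool runtime
        self.store = store                    # session / state store

    def handle(self, request):
        ctx = self.build_context(self.instructions, self.store, request)
        reply = self.model(ctx)
        self.store.append((request, reply))   # durable session state
        return reply

def ui_loop(orchestrator, read_input, render):
    # The UI stays thin: read, delegate, render. No orchestration here.
    render(orchestrator.handle(read_input()))
```

Because every component is injected, each one can be swapped or tested in isolation, which is exactly what smearing orchestration across the UI makes impossible.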

If your AI CLI does not have explicit objects for sessions, tasks, tool calls, approvals, and artifacts, the system will become impossible to reason about.

Once usage grows, you will not know what state is durable, what output should be reviewable, what task is still running, or how to recover after interruption.

That is why checkpointing, task models, and explicit artifact paths keep appearing in the better tools. They are not bureaucratic add-ons. They are the data model that makes the agent loop debuggable and safe enough to trust with meaningful work.

This is where most homegrown agents fail. They keep too much hidden state in prompts. They cannot explain why a tool call was allowed. They let long transcripts bloat until the agent loses the plot. They edit files without recoverable artifacts or clean review surfaces. Or they add sub-agents before the task model is real, which turns parallelism into chaos. That is why the mature products keep investing in instruction files, policy engines, approval modes, checkpoints, and explicit task abstractions. They are not slowing the system down. They are the price of turning a demo into a reliable tool.

If you remember one thing, remember this: the real product is not the terminal theme or the streaming text. It is reliable delegation. Can the system understand the repo, choose the right tools, operate within clear policy, recover after interruption, and show its work in a way the human can trust? That is the architecture story underneath Codex, Claude Code, and Gemini CLI. Different companies emphasize different pieces, but they are all moving toward the same destination: an AI CLI as a policy-governed execution engine sitting directly on top of the developer workflow.