I Tested Claude Code Tools — Here's What Surprised Me
The lesser-known Claude Code tools, what they really do, and which ones changed how I move through a repo.
Overview
The lesser-known Claude Code tools, what they really do, and which ones changed how I move through a repo.
Full transcript (from the video)
People often talk about Claude Code as if the model is the whole product. It is not.
The runtime makes the model useful by exposing tools. In this repo, the Bash tool is the bridge: it lets the agent inspect files, run ripgrep, and verify a patch. The runtime also decides what counts as read-only, when sandboxing stays on, and when output gets truncated or persisted.

The first thing to understand is that Claude Code does not start from a blank slate. The repo has a central tool registry in src, and that registry is the source of truth for what the model can even attempt. The Bash tool is in that list, but so are file reads, sub-agents, web fetch, and MCP resources. That means the product is opinionated: the model is not meant to solve everything through the shell. It can read files directly, search, and so on. So when people ask what makes an AI coding agent work, the real answer is not just the model. It is the typed capability surface assembled by the runtime.

When you open the Bash tool itself, the first useful detail is how explicit the interface is. The model has to send a command string, but it can also provide a timeout and a run-in-background flag. Even the description field is disciplined: the prompt tells the model not to fill it with vague words like "complex" or "risky." The prompt layer is also stronger than people expect. In prompt.ts, the tool injects concrete instructions about when to prefer Bash over other operations. So the shell bridge is not only a function call; it is a typed interface plus system-prompt guidance that shapes how the agent packages its intent before any command runs.

One subtle but important design choice is that the Bash tool only claims to be safe for concurrent use when a command is read-only. The implementation literally routes that decision through read-only validation logic, which tells you how the team thinks about shell: search and inspection commands like ripgrep and related read paths are a different class from writes. The UI even classifies search, read, and list commands separately so they can be summarized and collapsed in a clean display.
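The registry-plus-typed-interface idea can be sketched in a few lines of TypeScript. To be clear, `registerTool`, `ToolSpec`, and `lookupTool` are hypothetical names for illustration, not the repo's actual API; the point is only that every capability is declared up front with a typed input shape, and a read-only flag travels with it.

```typescript
// Hypothetical sketch of a central tool registry: every capability the
// model can attempt is declared here, with a typed input shape.
type ToolInput = Record<string, unknown>;

interface ToolSpec<I extends ToolInput> {
  name: string;
  readOnly: boolean;                 // read-only tools may run concurrently
  validate(input: ToolInput): input is I;
  run(input: I): Promise<string>;
}

const registry = new Map<string, ToolSpec<any>>();

function registerTool<I extends ToolInput>(spec: ToolSpec<I>): void {
  registry.set(spec.name, spec);
}

// A Bash-like tool: a command string plus optional timeout and
// run-in-background flag, mirroring the explicit interface described above.
interface BashInput extends ToolInput {
  command: string;
  timeoutMs?: number;
  runInBackground?: boolean;
}

registerTool<BashInput>({
  name: "Bash",
  readOnly: false,
  validate: (input): input is BashInput => typeof input.command === "string",
  run: async (input) => `ran: ${input.command}`,
});

// The registry is the source of truth: an unknown tool name is rejected
// before the model's request goes anywhere near execution.
function lookupTool(name: string): ToolSpec<any> | undefined {
  return registry.get(name);
}
```

The useful property is that the model's request is checked against a closed, typed surface before anything executes, rather than being trusted as free-form intent.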
But the bigger point is architectural. The runtime wants to parallelize evidence collection; it does not want to normalize parallel mutation against the working tree. That separation is one reason the agent can move quickly: it treats the shell as two different things, a verification surface and a potentially destructive write surface.

On the defensive side, this is one of the most important pieces in the repo. The Bash tool does not just forward a string; it feeds into a permission system, and it has a dedicated prepare-permission-matcher path that parses compound commands, breaks them into subcommands, and then checks whether every subcommand matches a rule. That means something like `ls && git push` cannot slip through a broad Bash wildcard rule just because the risky part is chained behind a harmless one. The permission docs reinforce the same idea with concrete examples. There is even logic to extract stable prefixes like `git commit` or `npm run` when that is possible. The principle is simple: the system does not trust the model's stated intent. It trusts a structural matcher over the actual command, and when parsing becomes too complex, it fails safe.

The mature part of this codebase is not that it blocks obviously bad commands. The mature part is how much attention it pays to shell edge cases. Path validation does not stop at the base command: it extracts positional targets, expands them, and checks whether a removal touches a critical location. There is a long comment explaining why support for the double-dash end-of-options marker matters, because a path that starts with a dash can trick naive validation. The allow-list logic is equally revealing: fd- and xz-style flags, deprecated flags with optional attached arguments. Command execution routed through sed gets its own validation path too. This is the difference between a demo and a shipping tool. Another important point is that sandboxing is not omnipresent.
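The structural matching idea can be illustrated with a deliberately simplified sketch. Real shell parsing is far more involved than splitting on connectors, and the function names here (`splitCompound`, `stablePrefix`, `checkCommand`) are invented for this example; what carries over is the shape of the logic: every subcommand must match a rule, and anything too complex to model forces a prompt.

```typescript
// Hypothetical sketch of structural permission matching for shell commands.
type Verdict = "allow" | "ask";

// Split "ls -la && git push" into ["ls -la", "git push"].
function splitCompound(command: string): string[] {
  return command
    .split(/&&|\|\||;/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Extract a stable prefix: "git commit -m hi" -> "git commit",
// "npm run build" -> "npm run", otherwise just the first word.
function stablePrefix(subcommand: string): string {
  const words = subcommand.split(/\s+/);
  const twoWordCommands = new Set(["git", "npm", "cargo"]);
  return twoWordCommands.has(words[0]) && words.length > 1
    ? `${words[0]} ${words[1]}`
    : words[0];
}

function checkCommand(command: string, allowedPrefixes: Set<string>): Verdict {
  // Fail safe: constructs this toy parser does not model (subshells,
  // substitution, redirection) force a prompt instead of a guess.
  if (/[$`()<>]/.test(command)) return "ask";
  const subcommands = splitCompound(command);
  if (subcommands.length === 0) return "ask";
  // Every subcommand must match; one risky part taints the whole line.
  return subcommands.every((sub) => allowedPrefixes.has(stablePrefix(sub)))
    ? "allow"
    : "ask";
}
```

So even if `ls` alone is allowed, `ls && git push` prompts, because `git push` matches no rule; the matcher judges the command's structure, not the model's description of it.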
The Bash tool runs commands in the sandbox when one exists and when policy has not explicitly allowed the command to run outside it. There is support for excluding certain commands from sandboxing, but the source code is explicit that this is a user convenience, not the real security boundary. The real boundary is still the permission system, and that is the right framing: a convenience switch should never be mistaken for a security control. There is also mode-specific logic. In accept-edits mode, a short list of filesystem commands (mv, cp, and sed) can be auto-approved, which tells you the product is trying to tune friction around known flows without pretending all shell is safe. The result is a layered control plane, not one giant yes-or-no switch.

The runtime behavior here is more sophisticated than most people expect. The Bash tool does not just block on a child process and wait. It uses an async execution path with a two-second threshold before showing progress, and background-task plumbing through a local shell task. If the model explicitly asks to run in the background, the command becomes a managed task and returns a task id. There is also a second layer for assistant mode: when the main agent has been waiting too long, the code can auto-background the command after a 15-second budget. The system treats long shell work as something the main reasoning loop does not have to stare at until it finishes. This detail is easy to overlook.
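The timing policy can be reduced to a small pure function. The two-second progress threshold and 15-second auto-background budget are the numbers from the walkthrough; the phase names and the `TaskHandle` shape are invented for illustration.

```typescript
// Sketch of the timing policy described above, reduced to a pure function.
type ShellPhase = "silent" | "show-progress" | "auto-background";

function phaseFor(
  elapsedMs: number,
  explicitBackground: boolean,
  progressThresholdMs = 2_000,
  backgroundBudgetMs = 15_000,
): ShellPhase {
  // If the model asked for background up front, the command becomes a
  // managed task immediately and the loop never waits on it.
  if (explicitBackground) return "auto-background";
  if (elapsedMs >= backgroundBudgetMs) return "auto-background";
  if (elapsedMs >= progressThresholdMs) return "show-progress";
  return "silent";
}

// A backgrounded command is represented as a task handle the agent can
// poll later, instead of output the loop must block on.
interface TaskHandle {
  taskId: string;
  startedAtMs: number;
}

let nextTask = 0;
function toBackgroundTask(startedAtMs: number): TaskHandle {
  nextTask += 1;
  return { taskId: `shell-task-${nextTask}`, startedAtMs };
}
```

The design choice worth copying is that "how long has this run" is policy, not an accident of where the code happens to block: short commands stay quiet, medium ones show progress, and long ones stop holding the reasoning loop hostage.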
But it is one of the reasons the agent can use the shell productively. Raw exit codes are too crude. ripgrep returning 1 usually means no match, not a terminal disaster. diff returning 1 often means the files differ, which is exactly the information you wanted. The repo encodes those semantics directly in command-semantics.ts: diff means the files differ, test means the condition is false. That sounds small until you think about the alternative. Without semantic interpretation, the model would have to learn per-command exit conventions, and it would overreact to perfectly normal states. This is one more example of the runtime carrying part of the intelligence. The model does not just need the shell output; it needs the right framing around that output so it can tell the difference between evidence and error.

Another reason the Bash tool feels usable is the return path. The tool accumulates output, trims it, and can even detect when the command emitted image data. Large outputs do not just explode into the conversation. If the tool sees an output file path and the content is large enough, it persists the full result to disk, keeps a preview, and returns a pointer-style message instead of forcing the agent to reason over everything inline. That matters a lot in big repos, where ripgrep or generated output can get noisy fast. The tool also annotates sandbox failures in the return value to explain what happened instead of treating a block as a mystery. This is the shape of a serious tool surface.
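Both ideas, exit-code semantics and the large-output policy, fit in a short sketch. The exit-code cases (rg, diff, test) come from the transcript; the `interpretExit` and `summarizeOutput` names and the 30,000-character cutoff are invented for illustration, not the repo's real values.

```typescript
// Sketch of per-command exit-code semantics.
type Outcome = "success" | "benign" | "failure";

function interpretExit(command: string, exitCode: number): Outcome {
  if (exitCode === 0) return "success";
  const base = command.trim().split(/\s+/)[0];
  if (exitCode === 1) {
    if (base === "rg" || base === "grep") return "benign"; // no match found
    if (base === "diff") return "benign";                  // files differ
    if (base === "test" || base === "[") return "benign";  // condition false
  }
  return "failure";
}

// Sketch of the large-output policy: keep a preview inline and point at a
// result persisted elsewhere instead of flooding the conversation. The
// 30,000-character cutoff is an invented illustration.
function summarizeOutput(output: string, maxInlineChars = 30_000) {
  if (output.length <= maxInlineChars) {
    return { inline: output, persisted: false };
  }
  return {
    inline: output.slice(0, 1_000) + "\n[... full output persisted to disk]",
    persisted: true,
  };
}
```

With this framing, `rg` exiting 1 reads as "no matches, keep going," while `rg` exiting 2 still reads as a real error, which is exactly the distinction a bare exit code hides.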
Execution plus interpretation plus control. This is the practical takeaway for anyone building or evaluating an agent: the model cannot know for certain what a shell command might do. The useful part is that the runtime gives it a disciplined way to gather evidence, make a move, and verify the result.

In this repo, the Bash tool is the most concrete example. It bridges from reasoning into the actual machine, but it does so behind permission checks, path validation, and sandboxing. That is why the tool is worth studying in detail. It shows what it takes to convert vague agent ambition into work you can rely on. If you copy only the marketing layer, you get a chatbot that sounds confident near a terminal. If you copy the control layer, you get something much closer to a system you can actually trust in a real repository.

The closing lesson is broader than Claude Code. If you want an AI coding agent to be genuinely useful, design the tool layer as if it is half the product, because in practice it is. Put capabilities behind a registry. Make read-only paths easy to parallelize. Handle shell parsing and path validation carefully. Give long-running commands a task model instead of forcing the loop to wait. Interpret results semantically. Persist large outputs instead of drowning the context. The Bash tool is a strong case study because it gathers all of those concerns into one bridge between reasoning and the machine. The big idea for this video is simple. Agents become useful when runtimes provide control surfaces. The model provides reasoning. The runtime decides whether that reasoning can safely act.