AI in the Terminal: Claude Code, Codex CLI, and When to Use Which
- MacSmithAI

The series so far has been about adding MCP servers to graphical AI surfaces — Claude Desktop, Raycast, VSCode's Copilot pane. Useful, but all of them are chat windows wearing different outfits. You type something, an AI types back, tools run in the middle.
Terminal-based AI is a different interaction model. You're not in a chat — you're in a shell that also happens to have an agent in it. The agent reads files, runs commands, edits code, and uses MCP servers, but the shell is the primary surface. This fits how a lot of engineering work actually happens, and it changes which tasks feel natural to delegate.
Two tools dominate this space: Anthropic's Claude Code and OpenAI's Codex CLI. They've converged on similar feature sets but made meaningfully different architectural choices. This post covers the setup for both, the MCP story for each, and — most usefully — when each is actually the right tool for the job.
Claude Code: setup
Install via npm:
npm install -g @anthropic-ai/claude-code
claude
First run opens a browser for authentication. You sign in with your Claude account (Pro, Max, or API credits all work), and Claude Code remembers the auth locally. From then on, claude inside any directory drops you into an interactive session with that directory as the working context.
MCP setup uses the same config file format you already know. Project-scoped config lives at .mcp.json in the repo root; global config at ~/.claude/.mcp.json. Both use the mcpServers root key (matching Claude Desktop, not VSCode).
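As a sketch, a project-scoped .mcp.json for the Intune server from earlier posts might look like this (the paths and env values are illustrative placeholders; substitute your own):

```json
{
  "mcpServers": {
    "intune": {
      "command": "node",
      "args": ["/Users/yourname/mcp-servers/intune-mcp/dist/index.js"],
      "env": {
        "AZURE_TENANT_ID": "00000000-0000-0000-0000-000000000000"
      }
    }
  }
}
```

Committing this file to the repo means every collaborator who trusts the project gets the same server wired up automatically.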
The easier way to add servers is the CLI command:
# Project-scoped — the server is only available in this repo
claude mcp add --scope project intune \
-- node /Users/yourname/mcp-servers/intune-mcp/dist/index.js
# Global — available everywhere
claude mcp add --scope global filesystem \
-- npx -y @modelcontextprotocol/server-filesystem /Users/yourname/Projects
Everything after -- is the command to launch the server. Inside a session, /mcp lists configured servers and lets you enable or disable them for the current session — genuinely useful when you want to cut context usage.
Codex CLI: setup
Install via npm as well, though the npm package is actually a wrapper that downloads a Rust binary:
npm install -g @openai/codex
codex
First run prompts you to authenticate — either a ChatGPT Plus/Pro subscription or an OpenAI API key. macOS and Linux are fully supported; Windows is still experimental, so use WSL2 if you're on Windows.
MCP config lives in ~/.codex/config.toml — note the TOML format, not JSON. This is the biggest gotcha when copy-pasting configs between tools. The equivalent Intune server entry:
[mcp_servers.intune]
command = "node"
args = ["/Users/yourname/mcp-servers/intune-mcp/dist/index.js"]
[mcp_servers.intune.env]
AZURE_TENANT_ID = "00000000-0000-0000-0000-000000000000"
AZURE_CLIENT_ID = "11111111-1111-1111-1111-111111111111"
AZURE_CLIENT_SECRET = "your-secret-value-here"
Or use the CLI:
codex mcp add intune \
--env AZURE_TENANT_ID=... \
--env AZURE_CLIENT_ID=... \
--env AZURE_CLIENT_SECRET=... \
-- node /Users/yourname/mcp-servers/intune-mcp/dist/index.js
Codex also supports streamable HTTP MCP servers natively, and handles OAuth flows for servers that need it — a meaningful advantage for remote MCP servers that Claude Code handles less cleanly today. Adding an HTTP server is as simple as codex mcp add some-server --url https://example.com/mcp.
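For a remote server, the config entry is short. This is an assumed shape for the TOML (the url key in particular); verify it against what codex mcp add --url actually writes to your config:

```toml
# Assumed shape for a streamable HTTP server entry — check the entry
# that `codex mcp add --url` generates in ~/.codex/config.toml
[mcp_servers.some-server]
url = "https://example.com/mcp"
```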
Project-scoped configs at .codex/config.toml work but only for "trusted projects" — you explicitly mark a repo as trusted before Codex will load config from it. That friction is deliberate. It means you can clone a random repo with a malicious .codex/config.toml and nothing automatic happens.
The sandboxing difference that actually matters
Here's where the two tools meaningfully diverge, and it's worth understanding before you pick defaults.
Codex CLI uses OS-level sandboxing. On macOS it uses Apple's Seatbelt framework — the same mechanism that isolates App Store apps. On Linux it uses Landlock and seccomp. When Codex runs a shell command, the kernel enforces the boundaries. A command that tries to read a file outside the allowed scope gets denied by the OS itself, not by Codex deciding to deny it.
Claude Code uses application-layer permissions. It prompts you before running commands and has a configurable permissions system in .claude/settings.json, but the enforcement happens in the Claude Code process. If Claude Code has a bug in its permission checks, or you approve something you shouldn't have, the damage isn't bounded by the kernel.
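A minimal sketch of what that settings file can look like (the matcher strings here are illustrative; check the Claude Code settings documentation for the exact rule syntax your version supports):

```json
{
  "permissions": {
    "allow": ["Bash(npm run test:*)"],
    "deny": ["Read(./.env)", "Bash(curl:*)"]
  }
}
```

The point is that these rules are evaluated by the Claude Code process itself, which is exactly the application-layer enforcement the paragraph above describes.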
Neither approach is strictly better. OS-level sandboxing is more robust against the agent misbehaving, but it also makes legitimate operations fail in ways that are hard to debug ("why can't you see this file?" — because Seatbelt said no). Application-layer permissions are more flexible and the error messages are clearer, but they're only as strong as the code enforcing them.
If you're running agents in a CI pipeline, on shared infrastructure, or in any context where a runaway agent could hurt something, Codex's sandboxing is genuinely valuable. For day-to-day interactive use on a personal machine where you're reading what the agent does before approving it, Claude Code's model is fine and less fiddly.
MCP tool context management
One practical issue that hits hard when you're running multiple MCP servers: tool definitions eat context. A handful of chatty servers can consume 10-20% of your context window before the first user message, which translates directly into reduced capacity for actual work.
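The arithmetic is worth making concrete. All the numbers below are assumptions (five servers, eight tools each, roughly 300 tokens per tool schema, a 200k-token window); measure your own setup rather than trusting them:

```shell
# Back-of-envelope context overhead from MCP tool schemas.
# Every number here is an assumption — substitute your own counts.
servers=5
tools_per_server=8
tokens_per_schema=300
window=200000
overhead=$((servers * tools_per_server * tokens_per_schema))
echo "${overhead} tokens (~$((overhead * 100 / window))% of the window) spent before the first user message"
```

With chattier servers (say ten servers at ten tools each), the same math lands squarely in the 10-20% range mentioned above.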
Claude Code has a feature called Tool Search that lazy-loads MCP tool definitions — rather than dumping every tool's schema into context on session start, tools get loaded on demand when the model decides it needs one. Enable it by setting ENABLE_TOOL_SEARCH=auto:5 in your environment (the 5 is the percentage threshold at which tools get auto-deferred). The context savings can be dramatic for heavy MCP setups.
The other lever is the /mcp slash command inside a Claude Code session. If you're working on something that doesn't need the Intune server, disable it for the session. Tool definitions you're not using are just expensive dead weight.
Codex handles this slightly differently with namespaced MCP registration and parallel-call opt-in flags, plus a /mcp equivalent in the TUI. The net effect is similar — turn off what you don't need.
When to reach for which tool
After a few months of using both regularly, here's how I actually split the work:
Claude Code wins for:
Multi-step refactoring and architectural work. The Sonnet 4.6 and Opus 4.7 models are noticeably stronger at holding an architectural plan across dozens of file edits without drifting. This is the single biggest reason I default to Claude Code for serious feature work.
Tasks where reasoning matters more than speed. "Figure out why this test is flaky" or "trace why this request is slow" are exactly the tasks where the reasoning models earn their premium.
Long-lived interactive sessions. The /compact and /clear commands plus the rules system in .claude/rules/ make it easier to sustain a coherent session across hours of work.
Path-specific rules. Dropping a file into .claude/rules/ that only activates when Claude touches matching files is a genuinely clever design. You can specify "always use pytest-style asserts in tests/" without burning that instruction in context for non-test work.
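For illustration, a rules file along these lines captures that example (the frontmatter key name is an assumption; the Claude Code docs define the actual matcher syntax):

```markdown
---
paths: "tests/**"
---
Use pytest-style bare asserts in tests. Do not introduce
unittest-style assertEqual calls.
```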
Codex CLI wins for:
Fast iteration on small changes. GPT-5.3-Codex-Spark hits 1000+ tokens/sec, which makes short back-and-forth exchanges feel almost synchronous. For "rename this variable everywhere" or "add a docstring to these three functions," that speed is the feature.
CI and automation pipelines. codex exec for non-interactive runs plus the official GitHub Action make it the more infrastructure-ready choice. Claude Code can work in CI too, but Codex was built for it.
Anything touching the broader OpenAI ecosystem. The Agents SDK treats Codex as a first-class MCP server, so building a multi-agent system with Codex as a subcontractor is more natural.
High-risk agent operations. The OS-level sandboxing is real protection. If I were building a system where an agent reviews untrusted PRs, Codex's kernel-enforced boundaries would be the right foundation.
When to use both:
The pattern I've landed on: Claude Code for interactive work where I'm collaborating with the agent, Codex for automation where the agent works on its own. The mental model is "pair programmer" vs "background worker," and each tool does one of those jobs better than the other.
You can point both at the same MCP servers. Nothing prevents you from using the Intune server from either CLI — the server doesn't care which client is talking to it. That's the whole point of a protocol.
The cost conversation
Both tools burn tokens fast. A serious agent session — the kind where it's reading dozens of files, making many tool calls, iterating on a solution — can run $5-20 in model costs. If you're on a Max or ChatGPT Pro subscription this is included up to some limit; if you're paying API costs directly, it's real money.
Practical controls:
Match the model to the task. Claude Haiku 4.5 for exploration and quick queries, Sonnet 4.6 for most work, Opus 4.7 when the task genuinely needs the reasoning. On the OpenAI side, GPT-5.3-Codex-Spark for fast iteration, GPT-5.4 for harder problems.
Always set dollar caps in CI. Any claude or codex exec invocation in an automated pipeline should have a spending limit. A runaway agent in a loop is the expensive failure mode.
Use CLI tools directly when a single command works. If the task is "create a GitHub issue," gh issue create is faster and cheaper than asking an MCP server to do it. Don't use MCP where a bash command would suffice.
Monitor usage. Tools like ccusage and the built-in /cost command in Claude Code give you visibility into where the budget is going.
Zooming out
We've now been through five posts covering the same conceptual pattern — MCP servers behind different interaction surfaces. Claude Desktop for long-form conversation, Raycast for reflex lookups, VSCode for code-adjacent work, the filesystem for file operations, and now the terminal for agentic execution.
The thing that kept surprising me as I actually used all of these is how little the MCP layer matters compared to picking the right surface. The Intune MCP server isn't more powerful in VSCode than in Raycast — it's literally the same tools. What changes is the cognitive mode the surface encourages. You ask different questions in a terminal than in a chat window, even when the underlying capability is identical. Building a good AI workflow is mostly about matching surface to task, and MCP is the plumbing that makes that matching cheap.
The series is probably done here for now — five posts is enough to cover the core patterns, and I'd rather stop before it becomes a checklist of every possible MCP integration. If there's a specific tool or workflow you want me to dig into next, that's where I'd go with the next post.
Previous posts in this series: setting up an Intune MCP server on Mac, driving it from Raycast, wiring it into VSCode's Copilot agent mode, and adding a filesystem MCP server (with the security caveats that come with it).


