Shut up and commit: how Caveman breaks AI agents of rambling

"Why use many token when few token do trick." The line sounds like a meme, but it is the actual guiding principle of an open-source project that has been doing the rounds in the agent community: Caveman.

What It Is

Caveman is a skill for coding agents, primarily Claude Code, Gemini CLI, Cursor, Windsurf, Codex, and Cline. When active, the agent switches to a deliberately reduced style: no filler, no pleasantries, no meta-commentary. Code, URLs, and technical terminology stay untouched. The authors report an average 65 % reduction in output tokens across ten diverse coding tasks.

Four intensity levels:

Lite - filler gone, grammar intact.
Full - default: fragments, almost no articles.
Ultra - telegraph style.
文言文 (Wenyan) - classical Chinese as extreme compression.
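
As described, Caveman changes how the agent writes rather than filtering text afterwards. Still, a toy post-processor makes the difference between Lite and Full concrete. Everything in this sketch is a hypothetical illustration: the filler patterns and the `compress` function are assumptions, not Caveman's rules. Ultra and Wenyan are left out, and unlike the real skill it does not protect code spans.

```python
import re

# Hypothetical filler patterns for illustration; not Caveman's actual rules.
FILLER_SENTENCES = re.compile(
    r"\b(great question|i'd be happy to help|hope this helps)\b[!.]?\s*", re.IGNORECASE
)
FILLER_WORDS = re.compile(r"\b(basically|just|really|actually)\s+", re.IGNORECASE)
ARTICLES = re.compile(r"\b(a|an|the)\s+", re.IGNORECASE)


def compress(text: str, level: str = "full") -> str:
    """Toy approximation of the Lite and Full levels (Ultra/Wenyan not modeled)."""
    if level not in ("lite", "full"):
        raise ValueError(f"unsupported level: {level}")
    # Lite: drop pleasantries and filler words, keep grammar intact.
    text = FILLER_SENTENCES.sub("", text)
    text = FILLER_WORDS.sub("", text)
    if level == "full":
        # Full: additionally drop articles, producing fragments.
        text = ARTICLES.sub("", text)
    # Collapse any double spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", text).strip()
```

On "Great question! The bug is just in the parser.", Lite yields "The bug is in the parser." and Full yields "bug is in parser.".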

Alongside the levels come sub-skills such as /caveman-commit (terse conventional commit messages), /caveman-review (single-line PR comments), and /caveman-compress (compresses memory files, about 46 % input savings).
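
What counts as a "terse conventional commit" can be checked mechanically, for example before accepting an agent's output. A minimal sketch, assuming the Conventional Commits header shape `type(scope)!: description`, Angular-style type names, and a 72-character header limit (a common convention, not something Caveman specifies); `is_terse_commit` is a hypothetical helper, not part of Caveman.

```python
import re

# Angular-style type names; the Conventional Commits spec itself only
# mandates "feat" and "fix", the rest are a widespread convention.
TYPES = "feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert"
# type, optional (scope), optional "!" for breaking changes, ": ", description
HEADER = re.compile(rf"^({TYPES})(\([\w\-./]+\))?!?: \S.*$")


def is_terse_commit(message: str, max_len: int = 72) -> bool:
    """Accept only a single conventional-commit header line (hypothetical check)."""
    lines = message.strip().splitlines()
    if len(lines) != 1:
        return False  # any body or footer means the message is not a one-liner
    header = lines[0]
    return len(header) <= max_len and HEADER.match(header) is not None
```

`"fix(parser): handle empty input"` passes; the same header followed by a prose body, or a free-form "Fixed a bug in the parser", does not.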

Why This Is More Than a Cost Trick

At first glance Caveman looks like a cost tool: fewer tokens, smaller bill, done. But a more serious observation sits underneath:

Constraint forces precision. Research on "brief responses" suggests that restrictive output settings can measurably improve the quality of certain answers: models have less room to hallucinate when they cannot ramble.
Reading speed goes up. Agent responses are not prose for customers; they are a work log for developers. Three bullets read faster than three paragraphs.
Tokens become a design resource. The framing set up by Caveman and its siblings Cavekit, Cavemem, and Caveman Code treats tokens the way one treats memory or CPU: a scarce budget that deserves active management.
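
Treating tokens like memory or CPU can be made literal with a budget object that an agent loop charges before emitting output. The `TokenBudget` class below is a hypothetical sketch, not an API from Caveman or its siblings; the counts it receives would come from whatever tokenizer your model uses.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Hypothetical per-session budget that treats output tokens like memory."""

    limit: int
    used: int = 0
    history: list[tuple[str, int]] = field(default_factory=list)

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, label: str, tokens: int) -> None:
        """Record a spend, refusing it if it would exceed the budget."""
        if tokens < 0:
            raise ValueError("token count must be non-negative")
        if tokens > self.remaining:
            raise RuntimeError(
                f"{label}: needs {tokens} tokens, only {self.remaining} left"
            )
        self.used += tokens
        self.history.append((label, tokens))
```

Refusing the spend, rather than silently truncating, forces the caller to decide what to cut, which is the "active management" the framing asks for.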

Where This Hits in Our Projects

Three spots where we already use Caveman-style patterns:

Commit messages and PR reviews. An agent that produces forty lines of prose about "what this commit does and why it is a good idea" is worse than one that returns a single conventional-commit line. /caveman-commit does exactly that with a single command.
Memory files. In long agent sessions, context grows: notes, lessons, project rules. An uncompressed 5,000-token memory file adds its full cost to every request. /caveman-compress often halves that without losing relevant information.
Status reports. For recurring agent runs (CI checks, monitoring, nightly migrations), "all green" beats a three-page report on the correctness of every intermediate step. Caveman mode by default, verbose mode only on request.
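
The "all green by default, verbose on request" pattern for recurring runs fits in one function. `CheckResult` and `status_report` are hypothetical names for illustration; a real CI integration would populate them from its own check output.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def status_report(results: list[CheckResult], verbose: bool = False) -> str:
    """Terse by default; per-check detail only when explicitly requested."""
    if verbose:
        lines = []
        for r in results:
            line = f"{'PASS' if r.passed else 'FAIL'} {r.name}"
            lines.append(f"{line}: {r.detail}" if r.detail else line)
        return "\n".join(lines)
    failures = [r for r in results if not r.passed]
    if not failures:
        return "all green"
    # Name only what failed; passing checks need no mention.
    return "FAIL: " + ", ".join(r.name for r in failures)
```

A passing nightly run then costs two words of output, and the failing case still names exactly what to look at.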

What to Take Away Even Without Installing Caveman

The interesting part of Caveman is not the skill itself but the posture behind it. Three principles that fit any agent strategy:

Output length is a design decision. A line like "Be terse, no explanations, results only" in the system prompt often delivers 80 % of the effect.
Separate the work log from the result. Let the agent think at length internally (chain-of-thought, tool calls, reasoning) but keep the visible output compressed.
Measure what you save. The Caveman authors report savings between 22 and 87 %, depending on the task. Without measurement, any compression claim is gut feel.
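
Measuring savings needs only a baseline output, a compressed output, and a token count. This sketch assumes a crude whitespace word count as a stand-in for a real tokenizer, so absolute numbers will differ from model-specific token counts; `approx_tokens`, `savings_percent`, and `summarize` are hypothetical helpers.

```python
def approx_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words. Real tokenizers count differently."""
    return len(text.split())


def savings_percent(baseline: str, compressed: str) -> float:
    """Relative reduction in (approximate) tokens, in percent."""
    base = approx_tokens(baseline)
    if base == 0:
        raise ValueError("baseline is empty")
    return 100.0 * (base - approx_tokens(compressed)) / base


def summarize(pairs: list[tuple[str, str]]) -> tuple[float, float, float]:
    """Min, mean, and max savings over (baseline, compressed) pairs, one per task."""
    values = [savings_percent(b, c) for b, c in pairs]
    return min(values), sum(values) / len(values), max(values)
```

Collecting one verbose/terse pair per task and reporting the range, not just the mean, mirrors how the authors present their 22 to 87 % spread.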

Caveat

Caveman is not for customer emails, not for explanatory blog posts, not for documentation read by third parties. It is a tool for agent-to-developer communication - where time and tokens cost more than politeness. Everywhere else, keep writing full sentences.

Sources

Caveman on GitHub - skill, intensity levels, benchmarks, and install guide
getcaveman.dev - product site covering the full ecosystem: Caveman, Cavekit, Cavemem, and Caveman Code