In a quiet corner of GitHub better known for weekend experiments than paradigm shifts, Drona Reddy, a data analyst at Amazon US, has published a single markdown file that promises to cut Claude’s output token usage by more than half, not by changing code, but by reshaping the model’s behavior.
The file, called CLAUDE.md and available under an MIT license, outlines a set of structured instructions that claim to reduce Claude’s output verbosity by about 63% without any code modifications.
These instructions impose strict behavioral constraints on the model, including limits on output length, emphasis on token efficiency and accuracy, controls on speculation, rules for typography, and a zero‑tolerance policy on sycophantic responses. They also simplify code generation and define clear override policies, effectively training the model to respond more concisely and deliberately.
Reducing output tokens
The rationale is straightforward: eliminate what Reddy describes as Claude’s “frivolous” habits, stripping out everything that isn’t strictly necessary. That means no automatic pleasantries like “Sure!” or “Great question!”, no boilerplate sign-offs such as “I hope this helps,” no restating the prompt, and no unsolicited suggestions or over-engineered abstractions.
It also curbs stylistic quirks like em dashes, smart quotes, and other Unicode characters that can break parsers, while preventing the model from reflexively agreeing with flawed assumptions.
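The article does not reproduce the file itself, but based on the constraints described above, a CLAUDE.md of this kind might look roughly like the following. This is a hypothetical sketch; the rule names and wording are illustrative, not Reddy’s actual text:

```markdown
# Response rules

- Answer directly. No greetings, pleasantries, or sign-offs.
- Do not restate the prompt or summarize what you are about to do.
- No unsolicited suggestions, alternatives, or follow-up questions.
- Use plain ASCII punctuation: no em dashes, smart quotes, or decorative Unicode.
- Do not agree with flawed assumptions; flag the flaw in one sentence.
- Code: minimal working solution only. No speculative abstractions.
- These rules override default style unless the user explicitly requests detail.
```

Because Claude reads such a file as part of every request’s context, each rule trades a small, fixed input cost for a recurring reduction in output length.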
At scale, that kind of austerity, according to Reddy, could translate into meaningful savings, turning small stylistic trims into outsized efficiency gains.
The data analyst also outlined three distinct use cases where the markdown file could be most effective. First, high-volume automation pipelines, such as resume bots, agent loops, and code generation, where verbosity compounds across repeated calls.
Second, repeated structured tasks, where Claude’s default expansiveness can add up over hundreds of interactions. Third, team environments that require consistent, parseable output formats across sessions, where tighter control over responses improves reliability and downstream usability.
In his own simulations on Claude Sonnet, Reddy said the file could save close to 9,600 tokens a day at 100 prompts a day, translating to roughly $0.86 in monthly savings. At 1,000 prompts a day, the savings rise to about 96,000 tokens, or $8.64 a month, while across three projects combined, he estimates reductions of nearly 288,000 tokens, equivalent to around $25.92 monthly.
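Reddy’s figures are internally consistent if one assumes roughly 96 output tokens saved per prompt, a 30-day month, and a rate of about $3 per million tokens. Those assumptions are inferred from the numbers, not stated in the article:

```python
# Reproduce the savings arithmetic implied by the reported figures.
# Assumptions (inferred, not stated in the source): 96 output tokens
# saved per prompt, a 30-day month, and ~$3 per million tokens.
TOKENS_SAVED_PER_PROMPT = 96
PRICE_PER_MILLION = 3.00
DAYS = 30

def monthly_savings(prompts_per_day: int) -> tuple[int, float]:
    """Return (tokens saved per day, dollars saved per month)."""
    daily_tokens = prompts_per_day * TOKENS_SAVED_PER_PROMPT
    dollars = daily_tokens * DAYS * PRICE_PER_MILLION / 1_000_000
    return daily_tokens, round(dollars, 2)

print(monthly_savings(100))   # 9,600 tokens/day, about $0.86/month
print(monthly_savings(1000))  # 96,000 tokens/day, about $8.64/month
print(monthly_savings(3000))  # three such projects combined: about $25.92/month
```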
However, the data analyst also warned that the file may be ineffective, even counterproductive, in certain use cases, such as single one-off queries, fixing deep failures, or exploratory work where feedback is required, as the file itself consumes input tokens on every message.
“The CLAUDE.md file itself consumes input tokens on every message. The savings come from reduced output tokens. The net is only positive when output volume is high enough to offset the persistent input cost. At low usage it costs more than it saves,” Reddy wrote in the repository’s documentation.
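The break-even logic Reddy describes can be sketched in a few lines. The prices and file size below are illustrative assumptions, not measured values from the repository:

```python
# Break-even sketch of the trade-off Reddy describes: the instruction file
# adds input tokens to every message, while savings accrue on output tokens.
# All figures below are illustrative assumptions, not measured values.
INPUT_PRICE = 3.00 / 1_000_000    # assumed dollars per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # assumed dollars per output token
FILE_TOKENS = 800                 # assumed size of the CLAUDE.md file

def net_savings_per_message(output_tokens_saved: int) -> float:
    """Positive when trimmed output outweighs the file's fixed input overhead."""
    return output_tokens_saved * OUTPUT_PRICE - FILE_TOKENS * INPUT_PRICE

# Under these assumptions the file pays for itself only when it trims
# more than 160 output tokens per message (800 * 3 / 15), which is why
# terse one-off queries can end up costing more, not less.
```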
Modest enterprise gains
Analysts do see enterprises and their CIOs benefitting from the markdown file, at least to a degree, especially as they struggle to balance spiraling inference bills with the push to move agentic and other AI pilots into production.
“A 63% token reduction can meaningfully lower inference costs and latency for enterprises running high-volume Claude workloads,” said Charlie Dai, principal analyst at Forrester.
The gains, however, may be more operational than transformative.
“For CIOs, this method offers some operational benefits as it improves output consistency, improves latency, and enforces basic token discipline, which can help in scaling automation,” said Pareekh Jain, principal analyst at Pareekh Consulting.
However, Jain pointed out that though this is a “useful tactical optimization”, it does not fundamentally change enterprise AI economics.
“In enterprise settings, the tactic is likely to translate into more modest savings because output tokens are only a portion of total usage as input context, retrieval, and agent orchestration typically dominate costs,” Jain said. “As a result, most enterprises would likely see single-digit savings rather than the headline number,” he added.

The markdown file is designed to be model-agnostic and should work across large language models that can follow structured instructions, though Reddy noted he has not tested its effectiveness on local models such as those running on llama.cpp or Mistral.
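Jain’s caution about single-digit savings follows from simple proportions. Assuming output tokens account for, say, 15% of a blended enterprise bill (an illustrative figure, not one from the article), a 63% output reduction trims the total by under 10%:

```python
# Back-of-envelope for why a 63% output-token cut can yield only
# single-digit total savings when output is a minority of blended spend.
OUTPUT_SHARE_OF_SPEND = 0.15  # assumed share; varies widely by workload
OUTPUT_REDUCTION = 0.63       # the reduction Reddy reports

total_reduction = OUTPUT_SHARE_OF_SPEND * OUTPUT_REDUCTION
print(f"Total bill shrinks by about {total_reduction:.0%}")  # about 9%
```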