skim sits between Claude Code (or any LLM tool) and the API. It strips token waste in real time, injects prompt caching automatically, and shows you live when your context window is filling up, all without changing a single line of code.
Claude Code Pro users run into a hidden problem: the context fills up, the model quietly starts forgetting earlier work, and response quality drops, all with no signal. skim shows you the live context fill percentage after every call and automatically strips waste before it counts toward the window. It also caches your system prompt, so from the second call onward it is billed at the much lower cached-read rate instead of the full input price.
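To make the two mechanisms above concrete, here is a minimal sketch of what a proxy in this position could do. It is not skim's actual code. The function names and the 200,000-token window are assumptions for illustration. The `cache_control` marker and the `usage` field names follow Anthropic's Messages API.

```python
import copy

# Assumed context window size in tokens; the real limit depends on the model.
CONTEXT_WINDOW_TOKENS = 200_000


def add_system_cache_control(body: dict) -> dict:
    """Return a copy of a Messages API request body with the system prompt marked cacheable.

    Hypothetical helper: illustrates the idea, not skim's implementation.
    """
    out = copy.deepcopy(body)  # never mutate the caller's request
    system = out.get("system")
    if not system:
        return out  # no system prompt, nothing to cache
    if isinstance(system, str):
        # Promote a plain-string system prompt to the block form that accepts cache_control.
        system = [{"type": "text", "text": system}]
    # A cache breakpoint on the last block covers the whole system prompt.
    system[-1].setdefault("cache_control", {"type": "ephemeral"})
    out["system"] = system
    return out


def context_fill_percent(usage: dict, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Estimate how full the context window is from a response's usage counters."""
    keys = (
        "input_tokens",
        "cache_creation_input_tokens",
        "cache_read_input_tokens",
        "output_tokens",
    )
    # Counters may be missing or null, so treat both as zero.
    total = sum(usage.get(key) or 0 for key in keys)
    return round(100.0 * total / window, 1)
```

For example, a response reporting 1,000 uncached input tokens, 49,000 cached-read tokens, and 10,000 output tokens would show a 30.0% fill under the assumed window.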
Everything included
Open source. Self-hostable. No account required.
$ pip install skim-llm
$ export ANTHROPIC_BASE_URL=http://localhost:7474