DeepSeek-native AI coding agent for your terminal. Engineered around prefix-cache stability — leave it running.
Updated Jun 14, 2026 · Go
The Multi-Agent Reasoning framework creates an interactive chatbot where AI agents collaborate via structured reasoning and Swarm Integration for optimal answers. Simulating a team that discusses, debates, and refines responses, it enables complex problem-solving and precise results. Now with Prompt Caching to reduce latency and costs.
Context intelligence layer for LLM agents: persistent memory with write-time dedup, sensitivity tagging, conflict detection, and hierarchical decay. ~12ms. No LLM calls. MIT.
Anthropic Claude API wrapper for Go
Independent research on Claude Code internals, Claude Agent SDK, and related tooling.
Automatic prompt caching for Claude Code. Cuts token costs by up to 90% on repeated file reads, bug fix sessions, and long coding conversations - zero config.
Drop-in prompt-caching fixes for the LLM agent harness you use. Point your AI coding agent at this repo and it ships the patches.
🚀 Autocache: Intelligent Anthropic API Cache Proxy. Automatically injects cache-control fields into Claude API requests to reduce costs by up to 90% and latency by up to 85%. Works as a transparent drop-in replacement for popular AI platforms like n8n, Flowise, Make.com, LangChain, and LlamaIndex, with no code changes required.
Build production ready apps for GPT using Node.js & TypeScript
Anthropic API reverse proxy with prompt-cache injection and request body transforms
A self-hosted Claude agent that actually remembers — verbatim memory palace at zero retrieval cost, Discord + web UI, and 90%-cheaper repeat calls via prompt caching.
Global, unlimited persistent memory for Claude Code agents. Context-activated hints injected automatically via hooks using scatter-gather map-reduce.
Interact with Anthropic and Anthropic-compatible chat completion APIs in a simple and elegant way. Supports vision, prompt caching, and more.
Tools, libraries, papers, and patterns for reducing the cost of running large language models in production.
Agentic-AI framework without the headaches
Run Claude Code with OpenRouter models while keeping tools, file editing, bash, MCP, VS Code support, smart caching, model switching, and safe rollback updates.
See what Claude Code and Codex actually send to the API — and what each part costs.
Agent Skill that caches LLM image descriptions as XMP metadata inside image files, reducing token usage by ~92% on repeated reads. Works with 30+ compatible agents.
A curated list of strategies, tools, papers, and resources for reducing LLM token costs and improving efficiency in production.
Open-source context runtime and governance layer for coding agents. MCP server + Python SDK (Anthropic, LangChain, OpenAI Agents, Google ADK) with shared procedures, failure rescue, loop detection, cost tracking, and cross-vendor model routing across Claude Code, Codex, Copilot, opencode, and any MCP host.