
Explore code-execution-with-MCP pattern for token-efficient tool responses #716

@czlonkowski

Description

Background

Anthropic published "Code execution with MCP: Building more efficient agents", which describes a pattern where MCP servers expose a sandboxed code-execution surface and the model writes scripts that filter or transform tool output before any of it reaches the context window. The headline result is a large reduction in context consumption on tool-heavy workflows.
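To make the pattern concrete, here is a minimal TypeScript sketch. `fakeGetNodeFull` is a hypothetical stand-in for a verbose tool response (loosely modelled on get_node at full detail; the schema shape is invented for illustration), and `modelScript` plays the part of the script the model would write: it runs next to the data and hands back only the required field names instead of the whole payload.

```typescript
// Hypothetical shape of a verbose tool response; invented for illustration.
interface NodeProperty {
  name: string;
  type: string;
  required: boolean;
  description: string;
}

interface NodeSchema {
  nodeType: string;
  properties: NodeProperty[];
}

// Stand-in for a large tool result (think get_node at full detail). In the
// code-execution pattern this goes to the sandboxed script, not to the model.
function fakeGetNodeFull(nodeType: string): NodeSchema {
  const properties: NodeProperty[] = [];
  for (let i = 0; i < 200; i++) {
    properties.push({
      name: `option${i}`,
      type: i % 2 === 0 ? "string" : "boolean",
      required: i < 3,
      description: "Long help text. ".repeat(20),
    });
  }
  return { nodeType, properties };
}

// The script the model would write: filter next to the data and return only
// what the next reasoning step needs.
function modelScript(): { nodeType: string; required: string[] } {
  const schema = fakeGetNodeFull("n8n-nodes-base.httpRequest");
  return {
    nodeType: schema.nodeType,
    required: schema.properties.filter((p) => p.required).map((p) => p.name),
  };
}

const full = JSON.stringify(fakeGetNodeFull("n8n-nodes-base.httpRequest"));
const filtered = JSON.stringify(modelScript());
console.log(`full: ${full.length} chars, filtered: ${filtered.length} chars`);
```

Only `filtered` would ever enter the context window; the full schema stays inside the sandbox.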

Raised in #473 by @skyhi69 and revisited in #482 by @ArtemSultanov-PG. Both users are hitting context limits while building n8n workflows and explicitly asked whether n8n-mcp could adopt the pattern.

What we have today (for context)

Already built:

  • get_node detail levels (minimal / standard / full): ~200 / ~1-2k / ~3-8k tokens respectively
  • n8n_update_partial_workflow diff format: 18 operations, ~80-90% token savings on update turns
  • The companion n8n-skills pack: skills load on demand instead of living in the system prompt
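A rough illustration of where the diff-format saving comes from, assuming a crude heuristic of ~4 characters of JSON per token. The workflow layout and the `updateNode` operation shape below are simplified and hypothetical; they are not the exact n8n_update_partial_workflow schema.

```typescript
// Crude heuristic (assumption): ~4 characters of JSON per token.
const estimateTokens = (value: unknown): number =>
  Math.ceil(JSON.stringify(value).length / 4);

interface WorkflowNode {
  name: string;
  type: string;
  parameters: Record<string, unknown>;
}

// Illustrative 30-node workflow with a simplified field layout.
const workflow: { name: string; nodes: WorkflowNode[] } = {
  name: "demo",
  nodes: Array.from({ length: 30 }, (_, i) => ({
    name: `Node ${i}`,
    type: "n8n-nodes-base.set",
    parameters: { note: "x".repeat(300), index: i },
  })),
};

// Full-replacement update: resend every node even though one field changed.
const fullUpdate = {
  ...workflow,
  nodes: workflow.nodes.map((n) =>
    n.name === "Node 7" ? { ...n, parameters: { ...n.parameters, index: 700 } } : n,
  ),
};

// Diff-style update: describe only the change (hypothetical operation shape).
const diffUpdate = {
  operations: [
    { type: "updateNode", nodeName: "Node 7", updates: { "parameters.index": 700 } },
  ],
};

const full = estimateTokens(fullUpdate);
const diff = estimateTokens(diffUpdate);
console.log({ fullUpdateTokens: full, diffUpdateTokens: diff });
```

The toy ratio is much larger than the ~80-90% quoted above, because a real update turn also carries the model's reasoning, tool descriptions and validation output; the sketch only shows why sending a diff is cheaper than resending the workflow.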

These cover the easy wins. Code-execution-with-MCP is the next tier, and it's a meaningful build, not a tweak.

What this issue is for

A scoping spike, not an implementation commitment. I want to answer:

  1. What would it cost to bolt a code-execution surface onto n8n-mcp? A safe sandbox (Deno permissions, isolated-vm, or a separate Docker container) is the hard part. What are the tradeoffs?
  2. Which tool responses are the actual context hogs? Profile real sessions. get_node at full detail is the obvious suspect, but there may be others. Without data we'd be optimizing the wrong thing.
  3. What would the user-facing API look like? Probably a new tool like execute_script({ code, inputs }) that runs against a snapshot of the database. Sketch the contract.
  4. How does this interact with the skills pack? Skills already encode reusable patterns, so could the code-execution surface serve as the runtime for skills instead of being a parallel system?
  5. Security model. This is the showstopper if we get it wrong. What is the threat model? The model is already trusted to call destructive tools, so the surface-area concern is less about authorization and more about resource abuse and sandbox escape.
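One possible shape for the contract in questions 3 and 5, as a sketch only. The `execute_script` request/response types, the convention that the script assigns its answer to a global `result`, and the `LIMITS` values are all assumptions. `node:vm` is used purely so the example runs; Node's own documentation says it is not a security mechanism, so a real build would swap in one of the candidates from question 1 (isolated-vm, Deno permissions, or a separate container).

```typescript
import * as vm from "node:vm";

// Hypothetical request/response contract for an execute_script tool.
interface ExecuteScriptRequest {
  code: string; // script body; it must assign its answer to the global `result`
  inputs?: Record<string, unknown>; // JSON-serialisable values exposed as `inputs`
}

interface ExecuteScriptResponse {
  ok: boolean;
  output?: string; // JSON of `result`, cut to LIMITS.maxOutputChars
  truncated?: boolean;
  error?: string;
}

// Illustrative limits; real values would come out of profiling.
const LIMITS = { timeoutMs: 500, maxOutputChars: 2_000 };

// Extract a message from errors of either realm (errors thrown inside the
// vm context are not instances of the host's Error class).
function errorMessage(err: unknown): string {
  if (typeof err === "object" && err !== null && "message" in err) {
    return String((err as { message: unknown }).message);
  }
  return String(err);
}

// NOTE: node:vm is NOT a security boundary; it only stands in for a real sandbox.
function executeScript(req: ExecuteScriptRequest): ExecuteScriptResponse {
  // Deep-copy inputs via JSON so the script cannot mutate the caller's objects.
  const context = vm.createContext({
    inputs: JSON.parse(JSON.stringify(req.inputs ?? {})),
    result: undefined,
  });
  try {
    // The timeout guards against runaway loops (resource abuse).
    vm.runInContext(req.code, context, { timeout: LIMITS.timeoutMs });
  } catch (err) {
    return { ok: false, error: errorMessage(err) };
  }
  const json: string = JSON.stringify(context.result) ?? "null";
  const truncated = json.length > LIMITS.maxOutputChars;
  return { ok: true, output: json.slice(0, LIMITS.maxOutputChars), truncated };
}

// Usage: the model filters data inside the sandbox and returns only names.
const res = executeScript({
  code: "result = inputs.props.filter((p) => p.required).map((p) => p.name);",
  inputs: {
    props: [
      { name: "url", required: true },
      { name: "timeout", required: false },
    ],
  },
});
console.log(res); // { ok: true, output: '["url"]', truncated: false }
```

Capping output size inside the tool keeps a careless script from reintroducing the context bloat the pattern is meant to remove, while the timeout addresses the resource-abuse concern from question 5. Sandbox escape is exactly what this stand-in does not handle.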

Acceptance criteria for the spike

  • Short design doc in docs/ covering the five questions above
  • At least one prototype direction tried end-to-end (even a throwaway), so we have real numbers on the cost vs. the win
  • Decision: build, defer, or skip (with reasoning)
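For question 2 and the "real numbers" criterion, a minimal profiling sketch: aggregate response sizes per tool from a captured session and rank them. The `ToolCallRecord` log format is hypothetical, character counts are only a proxy for tokens, and the sample values (including the placeholder tool name `some_other_tool`) are made up for illustration, not measurements.

```typescript
// Hypothetical log record: one entry per tool call captured from a session.
interface ToolCallRecord {
  tool: string;
  responseChars: number;
}

interface ToolProfile {
  tool: string;
  calls: number;
  totalChars: number;
  share: number; // fraction of all response characters in the session
}

// Aggregate per tool and rank by total response size, largest first.
function profileSession(records: ToolCallRecord[]): ToolProfile[] {
  const byTool = new Map<string, { calls: number; totalChars: number }>();
  for (const r of records) {
    const entry = byTool.get(r.tool) ?? { calls: 0, totalChars: 0 };
    entry.calls += 1;
    entry.totalChars += r.responseChars;
    byTool.set(r.tool, entry);
  }
  const grandTotal = records.reduce((sum, r) => sum + r.responseChars, 0);
  return Array.from(byTool.entries())
    .map(([tool, e]) => ({
      tool,
      calls: e.calls,
      totalChars: e.totalChars,
      share: grandTotal === 0 ? 0 : e.totalChars / grandTotal,
    }))
    .sort((a, b) => b.totalChars - a.totalChars);
}

// Made-up sample data for illustration only.
const sample: ToolCallRecord[] = [
  { tool: "get_node", responseChars: 24_000 },
  { tool: "get_node", responseChars: 18_000 },
  { tool: "some_other_tool", responseChars: 3_000 },
  { tool: "n8n_update_partial_workflow", responseChars: 1_000 },
];

for (const p of profileSession(sample)) {
  const pct = (p.share * 100).toFixed(1);
  console.log(`${p.tool}: ${p.calls} calls, ${p.totalChars} chars, ${pct}%`);
}
```

Ranking by total size rather than per-call size matters: a moderately large response called many times can outweigh a single huge one.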

Out of scope for the spike

  • Any production implementation
  • Backwards-compat shims or flag plumbing
  • Documentation for end users (we don't know yet whether we're shipping)

References

Conceived by Romuald Członkowski: https://www.aiadvisors.pl/en

Metadata

  • Assignees: no one assigned
  • Labels: enhancement (New feature or request)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests