Explore code-execution-with-MCP pattern for token-efficient tool responses #716

## Background

Anthropic published [Code execution with MCP: Building more efficient agents](https://www.anthropic.com/engineering/code-execution-with-mcp) describing a pattern where MCP servers expose a sandboxed code-execution surface and the model writes scripts that filter / transform tool output **before** any of it ever reaches the context window. The headline result is large reductions in context consumption on tool-heavy workflows.
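To make the idea concrete, here is a minimal sketch of the kind of script a model might write under this pattern. The `NodeDetail` shape, the field names, and `summarizeRequired` are hypothetical and only stand in for a verbose tool response; they are not the actual n8n-mcp schema.

```typescript
// Hypothetical shape of a large tool response; field names are illustrative only.
interface NodeProperty {
  name: string;
  type: string;
  required: boolean;
  description: string;
}

interface NodeDetail {
  nodeType: string;
  properties: NodeProperty[];
}

// The kind of script a model might write: keep only what the next step needs.
function summarizeRequired(detail: NodeDetail): { nodeType: string; required: string[] } {
  return {
    nodeType: detail.nodeType,
    required: detail.properties.filter((p) => p.required).map((p) => p.name),
  };
}

// Stand-in for a verbose response that would otherwise enter the context window.
const fullDetail: NodeDetail = {
  nodeType: "example.httpRequest",
  properties: [
    { name: "url", type: "string", required: true, description: "x".repeat(500) },
    { name: "method", type: "options", required: true, description: "x".repeat(500) },
    { name: "timeout", type: "number", required: false, description: "x".repeat(500) },
  ],
};

// Only this small summary would be returned to the model.
const summary = summarizeRequired(fullDetail);
console.log(JSON.stringify(summary));
console.log(JSON.stringify(fullDetail).length, "->", JSON.stringify(summary).length);
```

The full response stays inside the sandbox and only the summary crosses into the context window, which is where the savings would come from.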

Raised in #473 by @skyhi69 and revisited in #482 by @ArtemSultanov-PG. Both users are hitting context limits while building n8n workflows and explicitly asked whether n8n-mcp could adopt the pattern.

## What we have today (for context)

Already built:

- `get_node` detail levels (minimal / standard / full) — ~200 / ~1-2k / ~3-8k tokens
- `n8n_update_partial_workflow` diff format — 18 operations, ~80-90% token saving on update turns
- The companion [`n8n-skills`](https://github.com/czlonkowski/n8n-skills) pack (skills load on demand instead of sitting in the system prompt)

These cover the easy wins. Code-execution-with-MCP is the next tier and it's a meaningful build, not a tweak.

## What this issue is for

A scoping spike, not an implementation commitment. I want to answer:

1. **What would it cost to bolt a code-execution surface onto n8n-mcp?** A safe sandbox (Deno permissions, isolated-vm, or a separate Docker container) is the hard part. What are the tradeoffs?
2. **Which tool responses are the actual context hogs?** Profile real sessions. `get_node` at `full` is the obvious suspect, but there may be others. Without data we'd be optimizing the wrong thing.
3. **What would the user-facing API look like?** Probably a new tool like `execute_script({ code, inputs })` that runs against a snapshot of the database. Sketch the contract.
4. **How does this interact with the skills pack?** Skills already encode reusable patterns — could the code-execution surface be the runtime for skills, instead of a parallel system?
5. **Security model.** This is the showstopper if we get it wrong. What's the threat model? The model is already trusted to call destructive tools, so the concern is less about authorization and more about resource abuse and sandbox escape.
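As a starting point for question 3, the contract might look something like the sketch below. Everything here is an assumption for discussion: `ExecuteScriptRequest`, `ScriptLimits`, `SandboxBackend`, and `executeScript` are hypothetical names, and nothing like them exists in n8n-mcp today. The sandbox choice from question 1 is deliberately hidden behind an interface, and the resource limits from question 5 are passed through explicitly.

```typescript
// All names below are hypothetical; this is a discussion sketch, not an existing API.
interface ExecuteScriptRequest {
  code: string;                      // script source written by the model
  inputs?: Record<string, unknown>;  // JSON-serializable values exposed to the script
}

interface ScriptLimits {
  timeoutMs: number;       // wall-clock budget, to be enforced by the backend
  maxOutputChars: number;  // cap on what is returned to the context window
}

type ExecuteScriptResult =
  | { ok: true; output: string; truncated: boolean }
  | { ok: false; error: string };

// The sandbox choice (Deno, isolated-vm, Docker) stays behind this interface.
// Synchronous for brevity; a real backend would almost certainly be async.
interface SandboxBackend {
  run(code: string, inputs: Record<string, unknown>, limits: ScriptLimits): unknown;
}

function executeScript(
  req: ExecuteScriptRequest,
  backend: SandboxBackend,
  limits: ScriptLimits,
): ExecuteScriptResult {
  try {
    const value = backend.run(req.code, req.inputs ?? {}, limits);
    // JSON.stringify yields undefined for undefined or function values; normalize to "null".
    const json = JSON.stringify(value) ?? "null";
    const truncated = json.length > limits.maxOutputChars;
    const output = truncated ? json.slice(0, limits.maxOutputChars) : json;
    return { ok: true, output, truncated };
  } catch (err) {
    // Backend failures and serialization errors are reported as data, not thrown.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Fake backend for illustration only: it ignores `code` and executes nothing.
const echoBackend: SandboxBackend = {
  run: (_code, inputs) => ({ received: Object.keys(inputs) }),
};

const result = executeScript(
  { code: "/* model-written script */", inputs: { nodeType: "example" } },
  echoBackend,
  { timeoutMs: 1000, maxOutputChars: 2000 },
);
console.log(result);
```

Capping the output size at the tool boundary matters as much as the sandbox itself: a script that returns an unfiltered dump would defeat the purpose of the pattern.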

## Acceptance criteria for the spike

- [ ] Short design doc in `docs/` covering the five questions above
- [ ] At least one prototype direction tried end-to-end (even a throwaway), so we have real numbers on the cost vs. the win
- [ ] Decision: build, defer, or skip (with reasoning)
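For the "real numbers" part, and for question 2 above, a first pass at profiling could be as simple as the sketch below. It assumes session logs can be reduced to a list of `(tool, responseText)` records, which is a hypothetical format, and it uses a crude characters-divided-by-four token estimate that should be replaced with a real tokenizer before drawing conclusions. The sample data is synthetic, not a measurement.

```typescript
// Hypothetical record of one tool call captured from a session log.
interface ToolCallRecord {
  tool: string;
  responseText: string;
}

interface ToolUsage {
  tool: string;
  calls: number;
  estimatedTokens: number;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Aggregate estimated tokens per tool and sort the heaviest consumers first.
function rankByTokens(records: ToolCallRecord[]): ToolUsage[] {
  const totals = new Map<string, ToolUsage>();
  for (const r of records) {
    const entry = totals.get(r.tool) ?? { tool: r.tool, calls: 0, estimatedTokens: 0 };
    entry.calls += 1;
    entry.estimatedTokens += estimateTokens(r.responseText);
    totals.set(r.tool, entry);
  }
  return Array.from(totals.values()).sort((a, b) => b.estimatedTokens - a.estimatedTokens);
}

// Synthetic data for illustration only.
const sample: ToolCallRecord[] = [
  { tool: "get_node", responseText: "x".repeat(20000) },
  { tool: "n8n_update_partial_workflow", responseText: "x".repeat(800) },
  { tool: "get_node", responseText: "x".repeat(4000) },
];

const ranking = rankByTokens(sample);
console.log(ranking);
```

Running something like this over a handful of real sessions would show whether `get_node` at `full` really dominates or whether another tool deserves attention first.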

## Out of scope for the spike

- Any production implementation
- Backwards-compat shims or flag plumbing
- Documentation for end users (we don't know yet whether we're shipping)

## References

- Anthropic post: https://www.anthropic.com/engineering/code-execution-with-mcp
- Discussions: #473, #482

Conceived by Romuald Członkowski — https://www.aiadvisors.pl/en
