Internal strategy doc. Synthesized from a multi-phase research + brainstorm run (4 parallel web-research agents → strategist → skeptic gate → synthesis), verified against live sources June 2026. Companion docs:
COMPETITORS.md (cross-ecosystem), PI-ECOSYSTEM.md (Pi-internal).
Claude Code's dynamic workflows are JavaScript closures — you execute them, you
observe results, but the workflow definition itself is opaque to deterministic
tooling. pi-taskflow's declarative JSON DSL is structured data. A DAG expressed
as structured data can be statically analyzed before any token is spent,
deterministically replayed without re-execution, memoized across runs by hash
lookup rather than LLM reasoning, and compiled to multiple artifacts (Mermaid,
OTel span templates, CI YAML) from a single source of truth. Claude bans Date.now()
to control non-determinism; pi-taskflow can embrace non-determinism and capture it
for replay and forensics. This wedge — declarative structure over imperative script
— is the foundation for every move below.
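As a small illustration of "compiled to multiple artifacts", the sketch below turns a phase list into Mermaid flowchart text without executing anything. The `Phase` shape, the `dependsOn` field, and `toMermaid` are hypothetical names chosen for this example; the real pi-taskflow schema may differ.

```typescript
// Hypothetical minimal phase shape; not the real pi-taskflow schema.
interface Phase {
  id: string;
  dependsOn?: string[];
}

// Emit a Mermaid flowchart: one node line per phase, one edge line per
// declared dependency. The workflow itself is never run.
function toMermaid(phases: Phase[]): string {
  const lines: string[] = ["flowchart TD"];
  for (const phase of phases) {
    lines.push(`  ${phase.id}`);
    for (const dep of phase.dependsOn ?? []) {
      lines.push(`  ${dep} --> ${phase.id}`);
    }
  }
  return lines.join("\n");
}

const demoFlow: Phase[] = [
  { id: "plan" },
  { id: "implement", dependsOn: ["plan"] },
  { id: "review", dependsOn: ["implement"] },
];

const mermaidText = toMermaid(demoFlow);
console.log(mermaidText);
```

The same traversal could feed other targets (a verification report, a span template), which is the point of keeping the definition as data.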
pi-taskflow is the first agent orchestrator whose workflow structure is verifiable by deterministic algorithms — not by running the workflow and hoping.
| Stage | What happens | Tokens |
|---|---|---|
| Compile-time | Dead-end phases, gate exhaustion, flow-ref integrity, concurrency topology warnings, trivial guard contradictions — caught by graph algorithms on the DAG | 0 |
| Pre-execution | Graph-position cache key per phase; cross-run memoization index consulted; matched phases reused instantly | 0 (cache hit) |
| Execution | Declarative criteria (schema conformance, path containment, structural invariants) evaluated before the LLM gate agent runs; the LLM handles only the qualitative residue | gate only |
| Post-execution | Event-sourced trace replays the run deterministically; change a gate threshold / budget and replay against cached data | 0 |
No framework does all four. LangGraph has checkpointing but no static verification and no cross-run memoization. Temporal has event-sourced replay but workflows are imperative code you can't statically analyze. Claude's JS scripts are structurally opaque.
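The post-execution row of the stage table (change a budget and replay against cached data) can be pictured as a pure function over recorded events. This is a minimal sketch: the `PhaseCompleted` event shape and `replayWithBudget` are assumptions for illustration, not the actual trace format.

```typescript
// Hypothetical recorded event; the real trace format is not specified here.
interface PhaseCompleted {
  phaseId: string;
  tokensUsed: number;
}

interface ReplayResult {
  completed: string[];
  haltedAt: string | null; // phase that would have exceeded the budget
  totalTokens: number;
}

// Walk the recorded events in order and apply a (possibly different) token
// budget. Only recorded usage is read; no phase is re-executed.
function replayWithBudget(trace: PhaseCompleted[], tokenBudget: number): ReplayResult {
  const completed: string[] = [];
  let totalTokens = 0;
  for (const event of trace) {
    if (totalTokens + event.tokensUsed > tokenBudget) {
      return { completed, haltedAt: event.phaseId, totalTokens };
    }
    totalTokens += event.tokensUsed;
    completed.push(event.phaseId);
  }
  return { completed, haltedAt: null, totalTokens };
}

const recordedTrace: PhaseCompleted[] = [
  { phaseId: "plan", tokensUsed: 1200 },
  { phaseId: "implement", tokensUsed: 5400 },
  { phaseId: "review", tokensUsed: 900 },
];

const generous = replayWithBudget(recordedTrace, 10000);
const tight = replayWithBudget(recordedTrace, 6000);
console.log(generous.haltedAt, tight.haltedAt);
```

With the 10,000-token budget the whole trace fits; with 6,000 the replay halts at `implement`, which answers the "what if the cap had been lower" question without spending tokens.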
The qualifier matters. "Structurally verifiable" = we can prove DAG integrity, reference soundness, and gate completeness — not that the LLM won't hallucinate. The tagline is Structurally Verifiable, never unqualified "provable".
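To make "DAG integrity" and "reference soundness" concrete, here is a sketch of three zero-token checks: dangling references, dead-end phases, and cycles (via Kahn's algorithm). It is not the body of `validateTaskflow()`. The `FlowPhase` shape, the `final` marker, and the definition of a dead end as "a non-final phase nobody depends on" are assumptions made for this example, and phase ids are assumed unique.

```typescript
// Hypothetical phase shape for illustration; not the real pi-taskflow schema.
interface FlowPhase {
  id: string;
  dependsOn?: string[];
  final?: boolean; // assumed marker: this phase's output is the workflow result
}

interface Finding {
  kind: "dangling-ref" | "dead-end" | "cycle";
  phaseId: string;
  detail: string;
}

function verifyStructure(phases: FlowPhase[]): Finding[] {
  const findings: Finding[] = [];
  const ids = new Set<string>(phases.map((p) => p.id));
  const validDeps = new Map<string, string[]>(); // id -> deduplicated, existing deps
  const dependentCount = new Map<string, number>();
  for (const p of phases) dependentCount.set(p.id, 0);

  // Pass 1: reference soundness. Every dependency must name a known phase.
  for (const p of phases) {
    const seen = new Set<string>();
    const deps: string[] = [];
    for (const dep of p.dependsOn ?? []) {
      if (!ids.has(dep)) {
        findings.push({ kind: "dangling-ref", phaseId: p.id, detail: `depends on unknown phase "${dep}"` });
      } else if (!seen.has(dep)) {
        seen.add(dep);
        deps.push(dep);
        dependentCount.set(dep, (dependentCount.get(dep) ?? 0) + 1);
      }
    }
    validDeps.set(p.id, deps);
  }

  // Pass 2: dead ends. A non-final phase with no dependents is never consumed.
  for (const p of phases) {
    if (!p.final && dependentCount.get(p.id) === 0) {
      findings.push({ kind: "dead-end", phaseId: p.id, detail: "output is never consumed" });
    }
  }

  // Pass 3: Kahn's algorithm. Phases that never become ready sit in, or
  // downstream of, a dependency cycle.
  const remaining = new Map<string, number>();
  for (const p of phases) remaining.set(p.id, (validDeps.get(p.id) ?? []).length);
  const ready = phases.filter((p) => remaining.get(p.id) === 0).map((p) => p.id);
  while (ready.length > 0) {
    const done = ready.pop() as string;
    for (const p of phases) {
      if ((validDeps.get(p.id) ?? []).indexOf(done) !== -1) {
        const left = (remaining.get(p.id) ?? 0) - 1;
        remaining.set(p.id, left);
        if (left === 0) ready.push(p.id);
      }
    }
  }
  for (const p of phases) {
    if ((remaining.get(p.id) ?? 0) > 0) {
      findings.push({ kind: "cycle", phaseId: p.id, detail: "in or downstream of a dependency cycle" });
    }
  }
  return findings;
}

const draftFlow: FlowPhase[] = [
  { id: "plan" },
  { id: "implement", dependsOn: ["plan"] },
  { id: "lint", dependsOn: ["implement"] },
  { id: "review", dependsOn: ["implement", "desgin"], final: true },
];
const findings = verifyStructure(draftFlow);
for (const f of findings) console.log(`${f.kind}: ${f.phaseId} (${f.detail})`);
```

On `draftFlow` this reports the misspelled `desgin` reference on `review` and flags `lint` as a dead end. Nothing here says anything about whether an LLM phase produces good output, which is exactly the boundary the qualifier draws.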
| # | Idea | Attacks | Why pi-taskflow wins | Effort | Surpasses Claude? |
|---|---|---|---|---|---|
| 1 | Graph-position caching (key = phaseId(upstreamKeys):inputHash) | map fan-out cache collisions, best-of-N cache pollution | DAG position is explicit & computable at runtime; lives inside existing hashInput/cachedPhase | S | Y |
| 2 | Static structural verification (dead-ends, gate exhaustion, flow refs, concurrency warnings, trivial contradictions) | 41.8% of multi-agent failures are spec/coordination errors (MAST); Claude has zero static checks | validateTaskflow() already does cycle detection + ref checks; the rest is graph-algorithmic on existing output | S | Y |
| 3 | Cross-run memoization (global cache index keyed on phase input hash) | Claude/LangGraph don't share state across sessions | file-based store is inherently shareable & inspectable; needs #1 | S | Y |
| 4 | Declarative eval gates + onBlock: "retry" (retry upstream on fail, not halt) | 21.3% of failures are in verification/termination (MAST) | machine-checkable criteria run before the LLM gate; onBlock:retry is genuinely new control flow | M | Y |
| 5 | Deterministic replay from append-only event trace | Agent Reproducibility Paradox; Claude resume is session-scoped only | PhaseState already captures inputHash/output/usage/model; upgrade to JSONL event trace, replay against recorded responses | L | Partial (Temporal replays workflow code; we replay agent decisions) |
| 6 | OpenTelemetry GenAI export (optional peerDependency; no-op when absent) | observability gaps; Claude has no external tracing | already collect timing/tokens/status/agent/model per phase; custom taskflow.* span attributes | S | Y |
| 7 | Multi-target DSL compilation (Mermaid + verification report + OTel template now; CNCF/GH-Actions later) | workflows trapped in framework-specific code | JSON DSL compiles to many artifacts from one source; source hash enables drift detection | M | Partial |
| 8 | Best-of-N with late binding (spawn N, take best K); rescoped from speculative pruning | brute-force parallel blows up cost | runtime owns scheduling; graph-position keys keep pruned branches out of cache | XL→M | Partial |
| 9 | Model routing / cost optimization (cheap phases → cheap models) | per-phase cost is known; nobody auto-routes | runtime already tracks usage + enforces caps; add a route hint | S | n/a |
| 10 | Workflow template library (4–6 battle-tested .tf.json) | patterns re-implemented per project | dogfoods the flow sub-workflow type; reduces adoption friction | S | n/a |
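
Idea #1's key format, phaseId(upstreamKeys):inputHash, can be sketched as a memoized recursion over the DAG. This is an illustration under assumptions, not the existing `hashInput`/`cachedPhase` code: `KeyedPhase`, `positionKeys`, and the FNV-1a helper are hypothetical, the flow is assumed to be acyclic (already validated), and a real cache would more likely use a cryptographic hash.

```typescript
// Hypothetical phase shape; `input` stands for the resolved input text.
interface KeyedPhase {
  id: string;
  dependsOn?: string[];
  input: string;
}

// 32-bit FNV-1a over UTF-16 code units, used only to keep the sketch
// dependency-free. Not collision-resistant enough for a production cache.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Compute phaseId(upstreamKeys):inputHash for every phase.
// Assumes an acyclic flow; a cycle would recurse without terminating.
function positionKeys(phases: KeyedPhase[]): Map<string, string> {
  const byId = new Map<string, KeyedPhase>();
  for (const p of phases) byId.set(p.id, p);
  const keys = new Map<string, string>();

  const keyOf = (id: string): string => {
    const cached = keys.get(id);
    if (cached !== undefined) return cached;
    const phase = byId.get(id);
    if (phase === undefined) throw new Error(`unknown phase "${id}"`);
    // Sort upstream keys so the result does not depend on declaration order.
    const upstream = (phase.dependsOn ?? []).map(keyOf).sort();
    const key = `${phase.id}(${upstream.join(",")}):${fnv1a(phase.input)}`;
    keys.set(id, key);
    return key;
  };

  for (const p of phases) keyOf(p.id);
  return keys;
}

function buildFlow(planInput: string): KeyedPhase[] {
  return [
    { id: "plan", input: planInput },
    { id: "review", dependsOn: ["plan"], input: "Review the plan above." },
  ];
}

const runA = positionKeys(buildFlow("Plan the auth refactor."));
const runB = positionKeys(buildFlow("Plan the billing refactor."));
const runA2 = positionKeys(buildFlow("Plan the auth refactor."));
// Same position and same inputs: the review key is reproduced exactly.
console.log(runA.get("review") === runA2.get("review"));
// Same review input but a different upstream plan: the review key should differ.
console.log(runA.get("review") === runB.get("review"));
```

Because each key embeds its upstream keys, a change anywhere upstream invalidates every downstream entry even when a phase's own input hash is unchanged. Keys grow with depth in this literal form; hashing the composite string would bound their length.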
| Gap | pi-taskflow approach | Effort |
|---|---|---|
| Loop-until-done | new loop phase: "until": "{steps.X.output.done}==true" + maxIterations + convergence detection; ✅ shipped | M |
| Tournament | new tournament phase: N variants compete, a judge sub-phase picks best/aggregate; ✅ shipped | M |
| Worktree isolation | "cwd": "temp"/"dedicated"/"worktree" per phase; runtime creates & destroys an isolated dir (or a git worktree on a throwaway branch); ✅ shipped | M |
| Security quarantine | per-phase "tools": {"allow":[...], "deny":[...]} (depends on pi core tool-restriction API) | S (if pi supports) |
| Saga/compensation | compensate phase triggered on upstream failure, reverse order | L (defer) |
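
The loop-until-done row combines three stop conditions: the `until` predicate, `maxIterations`, and convergence detection. The sketch below shows how they might interact. It is not the shipped implementation: `runLoop` is a hypothetical name, the body is synchronous for brevity where a real phase would be asynchronous, and "convergence" is assumed to mean two consecutive iterations producing identical serialized output.

```typescript
type StopReason = "until" | "converged" | "maxIterations";

interface LoopOutcome<T> {
  output: T | undefined;
  iterations: number;
  reason: StopReason;
}

// Run `body` until the predicate holds, the output stops changing,
// or the iteration cap is reached, whichever comes first.
function runLoop<T>(
  body: (iteration: number, previous: T | undefined) => T,
  until: (output: T) => boolean,
  maxIterations: number,
): LoopOutcome<T> {
  let previous: T | undefined = undefined;
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations++;
    const output = body(iterations, previous);
    if (until(output)) return { output, iterations, reason: "until" };
    // Assumed convergence rule: identical serialized output twice in a row.
    if (previous !== undefined && JSON.stringify(previous) === JSON.stringify(output)) {
      return { output, iterations, reason: "converged" };
    }
    previous = output;
  }
  return { output: previous, iterations, reason: "maxIterations" };
}

interface FixPass {
  done: boolean;
  remaining: number;
}

// Finishes on the third pass, mirroring "{steps.X.output.done}==true".
const finished = runLoop<FixPass>(
  (i) => ({ done: i >= 3, remaining: Math.max(0, 3 - i) }),
  (out) => out.done,
  10,
);

// Never finishes and never changes: stopped by convergence detection.
const stuck = runLoop<FixPass>(() => ({ done: false, remaining: 2 }), (out) => out.done, 10);

// Never finishes but keeps changing: stopped by the iteration cap.
const capped = runLoop<FixPass>((i) => ({ done: false, remaining: 100 - i }), (out) => out.done, 4);

console.log(finished.reason, stuck.reason, capped.reason);
```

Checking `until` before convergence means a loop that reaches its goal is never misreported as stalled; the cap remains the backstop for loops that neither finish nor settle.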
- H1 (Verifiable Foundation, ~4 wks): graph-position caching → static verification → loop-until-done → cross-run memoization → OTel export → model routing. Outcome: the only orchestrator with static DAG verification + cross-run memoization + OTel.
- H2 (Quality & Portability, ~4 wks): declarative eval gates (onBlock:retry) → tournament → worktree → Mermaid+verification compilation → template library.
- H3 (Research Frontier, ~6 wks): deterministic replay → best-of-N late binding → quarantine → saga (deferred).
- Zero-dep vs OTel/JSON-Schema tension → resolve via optional peerDependencies (zero-deps at rest, opt-in at runtime). Don't hand-roll OTLP.
- Claude still wins: IDE integration, serverless execution, single-.js simplicity, Opus model quality.
- LangGraph still wins: node-level checkpoint + time-travel (we're phase-level only).
- Temporal still wins: event-sourced durability + exactly-once at scale (we're a local orchestrator).
- Biggest threat: if Claude ships loops + tournaments + static analysis first, the "structured DAG" narrative erodes. The wedge is only defensible if we ship H1 fast — their imperative model makes static analysis harder, which is our time window.
Every capability claim is grounded against existing code; nothing is invented. Update as the landscape moves.