Skip to content

feat(attribution): per-source LLM request breakdown on JSON-RPC + /status (PR B, #419)#421

Merged
xinhuagu merged 3 commits into
mainfrom
feat/llm-request-attribution-protocol-419b
Apr 17, 2026
Merged

feat(attribution): per-source LLM request breakdown on JSON-RPC + /status (PR B, #419)#421
xinhuagu merged 3 commits into
mainfrom
feat/llm-request-attribution-protocol-419b

Conversation

@xinhuagu

@xinhuagu xinhuagu commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

PR B of #419 (epic #418). Takes the attribution data PR A (merged in #420) plumbed through the core agent types and makes it user-visible:

  1. Daemon: every JSON-RPC `agent.prompt` result's `usage` node now carries an additive `llmRequestsBySource` field — a map of lowercase source name to count, alongside the existing scalar `llmRequests`.
  2. CLI: `ContextMonitor` accumulates the per-source counts across turns.
  3. `/status`: renders an inline breakdown using short labels.

Before / After

Before (PR A only):
```
LLM requests: 8
```

After:
```
LLM requests: 8 (main=5 planner=2 cont=1)
```

When talking to an older daemon (or on a fresh session with no requests yet), the parenthetical is omitted — just the scalar.

Backward compatibility

Protocol change is strictly additive:

  • Old CLI + new daemon: old CLI ignores the new `llmRequestsBySource` field (Jackson default), keeps reading `llmRequests`. Unchanged UX.
  • New CLI + old daemon: new CLI sees no map field, monitor stays empty, `/status` renders scalar-only.
  • Same-version pair: new breakdown visible.

Invariant locked in tests: `sum(llmRequestsBySource.values()) == llmRequests` on every payload.

Behavioral fix caught along the way

PR A's review surfaced that the compaction summary LLM call was uncounted in `llmRequestCount`. Its sibling — the `LLMTaskPlanner` request on planned flows — had the same bug: the scalar reported only stepResults' counts, not the planner's one-shot. This PR threads `plannerAttribution` through the planner call site and merges it into the plan's serialized usage, so both the scalar and the per-source map match real API cost.

Resumed plans (`plan.resume` flow) don't re-run the planner — their plan was already generated and checkpointed — so that path reports `PlanExecutionResult.requestAttribution()` directly.

Wire format

```json
{
"usage": {
"inputTokens": 1234,
"outputTokens": 567,
"totalTokens": 1801,
"llmRequests": 8,
"llmRequestsBySource": {
"main_turn": 5,
"planner": 2,
"continuation": 1
}
}
}
```

Keys match the lowercase enum names. Short display names (`main`, `cont`, `compact`) live only in the CLI's display map; the wire keeps the full name so external telemetry can correlate exactly with the daemon's `RequestSource` enum.

Scope boundaries

  • Not in this PR: per-turn breakdown on the individual turn summary line and structured export to `runtime-latest.json` for offline analysis — that's PR C, the last slice.
  • Not in this PR: any policy change based on attribution data. Pure visibility.

Test plan

  • `./gradlew build` — passes (53 tasks, BUILD SUCCESSFUL)
  • `ContextMonitorTest`:
    • `recordLlmRequestsBySourceAccumulatesPerSource`
    • `recordLlmRequestsBySourceIgnoresNullEmptyAndInvalidValues`
    • `totalLlmRequestsBySourceReturnsUnmodifiableSnapshot`
  • `TerminalReplTest`:
    • `status_renderBreakdownOnlyWhenDaemonSuppliedPerSourceMap` — old-daemon compat path
    • `status_rendersInlineBreakdownFromPerSourceMap` — end-to-end /status render
    • `formatLlmRequestsBreakdown_*` — pure formatter unit tests (ordering, empty/zero, unknown-source fallthrough)

Refs #418, #419

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added per-source tracking of LLM requests to monitor request distribution across different system components.
    • Enhanced status display to show LLM request counts broken down by source, providing improved visibility into usage patterns across the system.

… (PR B, #419)

Second slice of #419 (epic #418). Takes the attribution data PR A plumbed
through the core agent types and:

1. Serializes it onto the JSON-RPC agent.prompt result. Every usage node
   now carries an additive `llmRequestsBySource` field (map of lowercase
   source name to count) alongside the existing scalar `llmRequests`.
   Invariant: `sum(values) == llmRequests`.

2. Lets the CLI accumulate per-source totals in ContextMonitor, fed from
   that new field.

3. Renders an inline breakdown on `/status`:
     LLM requests: 8 (main=5 planner=2 cont=1)
   Short labels via a local display map; empty suffix when the daemon
   didn't send the field (old-daemon compatibility) or all counts are
   zero.

Protocol change is strictly additive — old CLIs ignore the new field
and keep reading the scalar; new CLIs on old daemons get an empty
breakdown and render the scalar alone.

Notable behavioral fix caught along the way: PR A pointed out the
MessageCompactor summary call was uncounted in llmRequestCount. Its
sibling, the LLMTaskPlanner request on planned flows, had the same
issue: `llmRequests` reflected only stepResults' counts, not the
planner's one-shot. This PR threads plannerAttribution through that
call site and merges it into the plan's serialized usage, so both the
scalar and the per-source map match real API cost.

Resumed plans (`plan.resume` flow) don't re-run the planner — their
plan was already generated and checkpointed — so that path reports
PlanExecutionResult.requestAttribution() directly.

Tests:

- ContextMonitorTest:
  * recordLlmRequestsBySourceAccumulatesPerSource — two turns'
    breakdowns sum per source.
  * recordLlmRequestsBySourceIgnoresNullEmptyAndInvalidValues — null,
    empty, blank keys, zero/negative values don't corrupt state.
  * totalLlmRequestsBySourceReturnsUnmodifiableSnapshot.

- TerminalReplTest:
  * status_renderBreakdownOnlyWhenDaemonSuppliedPerSourceMap — no
    per-source map → scalar-only line, no trailing `()`.
  * status_rendersInlineBreakdownFromPerSourceMap — end-to-end: feed
    monitor a map, /status renders `main=5 planner=2 cont=1`.
  * formatLlmRequestsBreakdown_roundTripsKnownSources — ordered
    formatter output with stable key ordering.
  * formatLlmRequestsBreakdown_emptyOrZeroReturnsEmpty — all-zero
    collapses, null/empty return "".
  * formatLlmRequestsBreakdown_unknownSourcePassesThroughAsRawKey —
    forward-compat: daemon adding a new source doesn't silently drop.

Refs #418, #419

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@xinhuagu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 40 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 40 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55be5352-1c1c-4511-82de-7f567218a150

📥 Commits

Reviewing files that changed from the base of the PR and between 9e9b053 and dd217f0.

📒 Files selected for processing (5)
  • aceclaw-cli/src/main/java/dev/aceclaw/cli/ContextMonitor.java
  • aceclaw-cli/src/main/java/dev/aceclaw/cli/TerminalRepl.java
  • aceclaw-cli/src/test/java/dev/aceclaw/cli/ContextMonitorTest.java
  • aceclaw-cli/src/test/java/dev/aceclaw/cli/TerminalReplTest.java
  • aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java
📝 Walkthrough

Walkthrough

This PR adds per-source LLM request tracking across the CLI and daemon. ContextMonitor now records and exposes aggregated request counts by source using recordLlmRequestsBySource() and totalLlmRequestsBySource(). TerminalRepl parses per-source usage data and displays formatted breakdowns in /status output. StreamingAgentHandler emits per-source request counts via the new usage.llmRequestsBySource field derived from RequestAttribution tracking.

Changes

Cohort / File(s) Summary
Per-source request tracking
aceclaw-cli/src/main/java/dev/aceclaw/cli/ContextMonitor.java, aceclaw-cli/src/test/java/dev/aceclaw/cli/ContextMonitorTest.java
Added synchronized recordLlmRequestsBySource(Map<String, Integer>) and totalLlmRequestsBySource() methods to track session-scoped LLM requests by source in a LinkedHashMap, with tests verifying aggregation, immutability, and handling of null/invalid inputs.
CLI display and parsing
aceclaw-cli/src/main/java/dev/aceclaw/cli/TerminalRepl.java, aceclaw-cli/src/test/java/dev/aceclaw/cli/TerminalReplTest.java
Added JSON parsing helper parseLlmRequestsBySource() and formatting helper formatLlmRequestsBreakdown() to render per-source request counts as inline parenthetical suffix in /status output; wired parsing into task completion accounting paths with tests covering display formatting and edge cases.
Daemon attribution emission
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java
Added writeLlmRequestsBySource() helper to populate usage.llmRequestsBySource from RequestAttribution by-source data; changed planned/resumed plan usage computation to derive scalar llmRequests from merged attribution totals and emit per-source breakdown; added planner attribution tracking before plan invocation.

Sequence Diagram

sequenceDiagram
    participant Agent as StreamingAgentHandler<br/>(Daemon)
    participant Planner as Planner
    participant Attribution as RequestAttribution
    participant CLI as TerminalRepl<br/>(CLI)
    participant Monitor as ContextMonitor<br/>(Session)

    Agent->>Attribution: Create plannerAttribution<br/>for planner's LLM call
    Agent->>Planner: plan(prompt, toolDefs,<br/>plannerAttribution)
    Planner->>Planner: Make LLM requests<br/>track in attribution
    Planner-->>Agent: Return plan with<br/>attribution data
    
    Agent->>Agent: Merge plan attribution<br/>with step results
    Agent->>Agent: Emit usage with<br/>llmRequestsBySource<br/>from merged attribution
    Agent-->>CLI: Send usage.llmRequestsBySource
    
    CLI->>CLI: parseLlmRequestsBySource()<br/>extract per-source map
    CLI->>Monitor: recordLlmRequestsBySource(map)
    Monitor->>Monitor: Aggregate into<br/>LinkedHashMap<String,Long>
    
    CLI->>Monitor: /status command
    Monitor-->>CLI: totalLlmRequestsBySource()
    CLI->>CLI: formatLlmRequestsBreakdown()<br/>render inline breakdown
    CLI-->>Agent: Display "(main=X, planner=Y)"
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested labels

breaking-change


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (2 errors, 1 warning)

Check name Status Explanation Resolution
Block Major Correctness And Security Risks ❌ Error PR introduces unresolved case sensitivity bug in recordLlmRequestsBySource() violating documented lowercase invariant and terminal injection vulnerability in formatLlmRequestsBreakdown() allowing untrusted daemon keys through without sanitization. Normalize source keys via source.trim().toLowerCase(Locale.ROOT) and sanitize unknown keys using existing sanitizeTerminalText() method; add tests validating fixes.
Require Test Coverage For New Logic ❌ Error StreamingAgentHandler.java has significant business logic changes (+45/-5 lines) including new writeLlmRequestsBySource() method and RequestAttribution tracking, but no corresponding test file changes are mentioned in the PR summary. Add comprehensive unit tests for StreamingAgentHandler covering the new writeLlmRequestsBySource() method, RequestAttribution planner integration, and modified usage computation logic.
Docstring Coverage ⚠️ Warning Docstring coverage is 48.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding per-source LLM request breakdown exposed to JSON-RPC responses and the /status CLI output, with reference to the related issue.
No Api Breaking Changes Without Version Bump ✅ Passed PR introduces only additive public API changes (two new methods in ContextMonitor) without removing, renaming, or modifying signatures of existing public methods, maintaining full backward compatibility.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/llm-request-attribution-protocol-419b

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Self-review cleanups on PR B (#421):

- ContextMonitor: added proper imports for Collections, LinkedHashMap,
  and Map instead of fully qualifying them inline. The file already
  imports ArrayDeque / Locale, so the new types fit the existing style.

- StreamingAgentHandler: dropped `java.util.Locale.ROOT` -> `Locale.ROOT`
  at line 608; Locale was already imported. (The existing ObjectNode
  FQCN at 1977 is intentional — the file's style is to not import
  ObjectNode, so I left my matching usage that way.)

No behavior change; compiles clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e9b0535ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +3099 to +3101
out.printf(" %sLLM requests:%s %d%s%n", MUTED, RESET,
contextMonitor.totalLlmRequests(),
formatLlmRequestsBreakdown(contextMonitor.totalLlmRequestsBySource()));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide breakdown when per-source map is only partially populated

/status always appends formatLlmRequestsBreakdown(totalLlmRequestsBySource()) whenever the map is non-empty, but totalLlmRequests is accumulated even for turns that had no llmRequestsBySource payload. In a mixed session (e.g., some turns from an older daemon, then reconnect to a newer daemon), this prints a partial breakdown next to the full scalar total, which misreports attribution (for example LLM requests: 8 (main=3) where 5 requests are unaccounted). Consider suppressing the suffix unless the per-source sum matches the scalar total, or adding an explicit unknown bucket.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java (1)

668-674: ⚠️ Potential issue | 🟠 Major

Accumulate RequestAttribution in adaptive continuation to preserve per-source attribution.

runTurnWithAdaptiveContinuation accumulates totalLlmRequests (line 1082) but does not accumulate RequestAttribution. When the merged turn is created at lines 1143–1145, it omits the requestAttribution parameter, causing the per-source breakdown from individual segments to be lost. This means line 673's call to writeLlmRequestsBySource() will use an empty or unset attribution, and the /status endpoint will not report per-source request counts for adaptive continuation.

Accumulate RequestAttribution alongside the scalar count, and pass the merged attribution when constructing the final turn:

var totalRequestAttribution = RequestAttribution.builder();
// Inside the loop:
totalRequestAttribution.merge(turn.requestAttribution());
// When creating merged turn:
var mergedAttribution = totalRequestAttribution.build();
var mergedTurn = new dev.aceclaw.core.agent.Turn(
        mergedMessages, lastStopReason, usage, lastCompaction, maxIterationsReached,
        budgetExhausted, budgetExhaustionReason, totalLlmRequests, mergedAttribution);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java`
around lines 668 - 674, runTurnWithAdaptiveContinuation currently accumulates
only totalLlmRequests but loses per-source RequestAttribution when creating the
merged Turn; declare a RequestAttribution.Builder (e.g., totalRequestAttribution
= RequestAttribution.builder()) before the loop, call
totalRequestAttribution.merge(turn.requestAttribution()) inside the loop where
totalLlmRequests is accumulated, then build the merged attribution
(totalRequestAttribution.build()) and pass that as the requestAttribution
argument when constructing the merged dev.aceclaw.core.agent.Turn (instead of
omitting it) so writeLlmRequestsBySource(...) sees the correct per-source
counts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@aceclaw-cli/src/main/java/dev/aceclaw/cli/ContextMonitor.java`:
- Around line 180-185: recordLlmRequestsBySource currently merges the incoming
map keys as-is which breaks the lowercase key contract; update
recordLlmRequestsBySource in ContextMonitor to normalize each source before
using it (e.g., trim() and toLowerCase(Locale.ROOT)) and then perform the
null/blank and count checks and merge into totalLlmRequestsBySource using the
normalizedKey; ensure you reference the same Long::sum merge and keep the
synchronized method semantics.

In `@aceclaw-cli/src/main/java/dev/aceclaw/cli/TerminalRepl.java`:
- Around line 1233-1243: The method formatLlmRequestsBreakdown currently prints
map keys directly (via LLM_REQUEST_SOURCE_DISPLAY.getOrDefault(...)), which can
include control/ANSI characters; update formatLlmRequestsBreakdown to sanitize
the resolved label before appending it to parts—trim it, replace or remove
ANSI/control sequences (e.g., strip characters matching control/escape patterns
like \u001B and other \p{Cntrl} chars), and fall back to a safe placeholder
(e.g., "unknown") if the sanitized label is empty; keep all other behavior the
same so counts remain unchanged.

---

Outside diff comments:
In `@aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java`:
- Around line 668-674: runTurnWithAdaptiveContinuation currently accumulates
only totalLlmRequests but loses per-source RequestAttribution when creating the
merged Turn; declare a RequestAttribution.Builder (e.g., totalRequestAttribution
= RequestAttribution.builder()) before the loop, call
totalRequestAttribution.merge(turn.requestAttribution()) inside the loop where
totalLlmRequests is accumulated, then build the merged attribution
(totalRequestAttribution.build()) and pass that as the requestAttribution
argument when constructing the merged dev.aceclaw.core.agent.Turn (instead of
omitting it) so writeLlmRequestsBySource(...) sees the correct per-source
counts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bee45d1c-d951-432a-a995-d611ac511891

📥 Commits

Reviewing files that changed from the base of the PR and between 082dbc2 and 9e9b053.

📒 Files selected for processing (5)
  • aceclaw-cli/src/main/java/dev/aceclaw/cli/ContextMonitor.java
  • aceclaw-cli/src/main/java/dev/aceclaw/cli/TerminalRepl.java
  • aceclaw-cli/src/test/java/dev/aceclaw/cli/ContextMonitorTest.java
  • aceclaw-cli/src/test/java/dev/aceclaw/cli/TerminalReplTest.java
  • aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java

Comment thread aceclaw-cli/src/main/java/dev/aceclaw/cli/ContextMonitor.java
Comment thread aceclaw-cli/src/main/java/dev/aceclaw/cli/TerminalRepl.java
Addresses CodeRabbit's three findings on #421:

1. MAJOR (outside-diff): runTurnWithAdaptiveContinuation accumulated the
   scalar llmRequestCount across segments but not the per-source
   RequestAttribution. When the merged Turn was built, its
   requestAttribution defaulted to empty — so for exactly the turns
   that ran long enough to split into segments, /status lost the
   breakdown and showed only the scalar total. Fixed by carrying a
   RequestAttribution.Builder through the segment loop, merging each
   turn.requestAttribution() as we go, and passing the build() result
   to the merged Turn's canonical constructor.

2. INLINE: ContextMonitor.recordLlmRequestsBySource now normalizes
   incoming keys via trim() + toLowerCase(Locale.ROOT) before merging
   into the running total. Defense-in-depth against a future daemon
   shipping mixed-case keys — the current daemon always sends lowercase
   so the normalization is a no-op today.

3. INLINE: formatLlmRequestsBreakdown now routes labels through
   sanitizeTerminalText, stripping ANSI / control chars from unknown
   source keys that pass through from the wire. Known labels are
   already clean but go through the same path for consistency. Empty
   sanitized labels fall back to "unknown" instead of "=N".

Tests:
- ContextMonitorTest.recordLlmRequestsBySourceNormalizesKeyCase —
  "MAIN_TURN" + "main_turn" + "Main_Turn" collapse to one bucket.
- TerminalReplTest.formatLlmRequestsBreakdown_sanitizesControlCharsInUnknownKeys —
  ANSI escapes are stripped; all-escape key collapses to "unknown".

No new daemon-side test for the adaptive-continuation attribution
preservation: exercising that path requires provoking a multi-segment
turn which needs the segmented-loop test infra that doesn't currently
exist. The fix is a three-line mechanical change (declare builder,
merge per iteration, pass to canonical ctor) verifiable by inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xinhuagu xinhuagu merged commit cc38c52 into main Apr 17, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant