Skip to content

feat: baseline collector auto-reads runtime metrics#315

Merged
xinhuagu merged 1 commit into
mainfrom
feature/issue-308
Mar 19, 2026
Merged

feat: baseline collector auto-reads runtime metrics#315
xinhuagu merged 1 commit into
mainfrom
feature/issue-308

Conversation

@xinhuagu

@xinhuagu xinhuagu commented Mar 19, 2026

Copy link
Copy Markdown
Owner

Summary

Baseline collector now auto-reads runtime-latest.json for core metrics instead of requiring --metric overrides. Each metric output includes source field (runtime-latest.json, manual_override, or empty for pending).

Closes #308

Test plan

  • ./gradlew build passes
  • Script syntax verified

🤖 Generated with Claude Code

Greptile Summary

This PR adds automatic reading of runtime-latest.json to the baseline collector script, eliminating the need for --metric overrides for standard metrics. The priority chain (manual override → runtime file → pending) is correctly implemented and the new source field in JSON output cleanly identifies where each metric value came from.

  • The logic in read_runtime_metric is correct: it checks jq availability, guards on file existence, and only emits a value when status == "measured".
  • INJECTION_AUDIT_PATH (line 138) is defined but never referenced anywhere in the script — it should either be used or removed.
  • Both jq calls in read_runtime_metric interpolate $key directly into the filter string; using jq --arg is the idiomatic form that avoids any future quoting fragility.
  • runtime_val is not declared local inside metric_json, allowing it to leak into the global shell scope (same pre-existing issue exists for override).

Confidence Score: 4/5

  • Safe to merge; changes are additive and well-structured with only minor style issues.
  • The core logic is sound — the priority chain, jq guards, and status checks are all correct. The three issues found are all style/best-practice level (dead variable, interpolation style, missing local), none of which cause a runtime bug in the current codebase.
  • No files require special attention beyond the three style notes in scripts/collect-continuous-learning-baseline.sh.

Important Files Changed

Filename Overview
scripts/collect-continuous-learning-baseline.sh Adds read_runtime_metric to auto-populate metrics from runtime-latest.json with correct priority order (manual override > runtime file > pending). Three minor style issues: INJECTION_AUDIT_PATH is declared but never used, jq filters use bash string interpolation instead of --arg, and runtime_val is not declared local.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 138

Comment:
**Unused variable — dead code**

`INJECTION_AUDIT_PATH` is declared here but is never referenced anywhere else in the script. Either this was meant to be used in this PR (and was forgotten), or it should be removed to keep the script clean.

```suggestion
RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 144-146

Comment:
**Prefer `jq --arg` over bash string interpolation in filters**

Both `jq` calls interpolate `$key` directly into the filter string. While the keys are all statically defined alphanumeric+underscore strings (so there's no real injection risk today), using `--arg` is the idiomatic, safer form that avoids any quoting fragility if new keys are ever added.

```suggestion
    status="$(jq -r --arg k "$key" '.metrics[$k].status // ""' "$RUNTIME_METRICS_PATH" 2>/dev/null)"
    if [[ "$status" == "measured" ]]; then
      jq -r --arg k "$key" '.metrics[$k].value // "null"' "$RUNTIME_METRICS_PATH" 2>/dev/null
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 167-170

Comment:
**`runtime_val` not declared `local`**

`runtime_val` leaks into the global shell scope because it lacks a `local` declaration, mirroring the pre-existing behaviour of `override`. While this doesn't cause a functional bug today (each loop iteration overwrites it), declaring it `local` inside `metric_json` would make the scoping explicit and consistent.

```suggestion
  elif local runtime_val; runtime_val="$(read_runtime_metric "$key")"; then
```

Or, more portably, add `local runtime_val` alongside the other local declarations at the top of the function and keep the `elif` line unchanged.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "feat: baseline colle..."

Greptile also left 3 inline comments on this PR.

Baseline collector now reads exported runtime metrics (runtime-latest.json)
before falling back to manual --metric overrides or pending_instrumentation.

Priority: manual override > runtime-latest.json > pending
Each metric now includes 'source' field for provenance tracking.

Closes #308

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@xinhuagu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 12 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d0547d9f-54fb-454d-b74d-b440127b02eb

📥 Commits

Reviewing files that changed from the base of the PR and between 0d9cf5f and 9ed0da9.

📒 Files selected for processing (1)
  • scripts/collect-continuous-learning-baseline.sh
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/issue-308
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@xinhuagu xinhuagu merged commit b9a20e4 into main Mar 19, 2026
4 checks passed
@xinhuagu xinhuagu deleted the feature/issue-308 branch March 19, 2026 22:28
}

RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
INJECTION_AUDIT_PATH="$PROJECT_ROOT/.aceclaw/memory/injection-audit.jsonl"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unused variable — dead code

INJECTION_AUDIT_PATH is declared here but is never referenced anywhere else in the script. Either this was meant to be used in this PR (and was forgotten), or it should be removed to keep the script clean.

Suggested change
INJECTION_AUDIT_PATH="$PROJECT_ROOT/.aceclaw/memory/injection-audit.jsonl"
RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 138

Comment:
**Unused variable — dead code**

`INJECTION_AUDIT_PATH` is declared here but is never referenced anywhere else in the script. Either this was meant to be used in this PR (and was forgotten), or it should be removed to keep the script clean.

```suggestion
RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +144 to +146
status="$(jq -r ".metrics.\"$key\".status // \"\"" "$RUNTIME_METRICS_PATH" 2>/dev/null)"
if [[ "$status" == "measured" ]]; then
jq -r ".metrics.\"$key\".value // \"null\"" "$RUNTIME_METRICS_PATH" 2>/dev/null

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Prefer jq --arg over bash string interpolation in filters

Both jq calls interpolate $key directly into the filter string. While the keys are all statically defined alphanumeric+underscore strings (so there's no real injection risk today), using --arg is the idiomatic, safer form that avoids any quoting fragility if new keys are ever added.

Suggested change
status="$(jq -r ".metrics.\"$key\".status // \"\"" "$RUNTIME_METRICS_PATH" 2>/dev/null)"
if [[ "$status" == "measured" ]]; then
jq -r ".metrics.\"$key\".value // \"null\"" "$RUNTIME_METRICS_PATH" 2>/dev/null
status="$(jq -r --arg k "$key" '.metrics[$k].status // ""' "$RUNTIME_METRICS_PATH" 2>/dev/null)"
if [[ "$status" == "measured" ]]; then
jq -r --arg k "$key" '.metrics[$k].value // "null"' "$RUNTIME_METRICS_PATH" 2>/dev/null
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 144-146

Comment:
**Prefer `jq --arg` over bash string interpolation in filters**

Both `jq` calls interpolate `$key` directly into the filter string. While the keys are all statically defined alphanumeric+underscore strings (so there's no real injection risk today), using `--arg` is the idiomatic, safer form that avoids any quoting fragility if new keys are ever added.

```suggestion
    status="$(jq -r --arg k "$key" '.metrics[$k].status // ""' "$RUNTIME_METRICS_PATH" 2>/dev/null)"
    if [[ "$status" == "measured" ]]; then
      jq -r --arg k "$key" '.metrics[$k].value // "null"' "$RUNTIME_METRICS_PATH" 2>/dev/null
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +167 to +170
elif runtime_val="$(read_runtime_metric "$key")"; then
val="$runtime_val"
status="measured"
source="runtime-latest.json"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 runtime_val not declared local

runtime_val leaks into the global shell scope because it lacks a local declaration, mirroring the pre-existing behaviour of override. While this doesn't cause a functional bug today (each loop iteration overwrites it), declaring it local inside metric_json would make the scoping explicit and consistent.

Suggested change
elif runtime_val="$(read_runtime_metric "$key")"; then
val="$runtime_val"
status="measured"
source="runtime-latest.json"
elif local runtime_val; runtime_val="$(read_runtime_metric "$key")"; then

Or, more portably, add local runtime_val alongside the other local declarations at the top of the function and keep the elif line unchanged.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/collect-continuous-learning-baseline.sh
Line: 167-170

Comment:
**`runtime_val` not declared `local`**

`runtime_val` leaks into the global shell scope because it lacks a `local` declaration, mirroring the pre-existing behaviour of `override`. While this doesn't cause a functional bug today (each loop iteration overwrites it), declaring it `local` inside `metric_json` would make the scoping explicit and consistent.

```suggestion
  elif local runtime_val; runtime_val="$(read_runtime_metric "$key")"; then
```

Or, more portably, add `local runtime_val` alongside the other local declarations at the top of the function and keep the `elif` line unchanged.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

xinhuagu added a commit that referenced this pull request Mar 19, 2026
PR #313:
- Restore replaySuiteMinPerCategory default to 3 (was lowered to 1)
- Redistribute sample cases: workflow_reuse 10→5, others 5→6-7 each
- Fix CI-short: tools-short-gate moved from workflow_reuse to adversarial

PR #314:
- Fix lifecycle metric key mismatch: extract promotion_precision and
  false_learning_rate (was promotion_rate/demotion_rate — safety gate
  was silently disabled)
- Clean up stale sampleSizes key after first_try rename
- Fix NaN handling: use explicit putNull instead of ambiguous put overload

PR #315:
- Remove dead INJECTION_AUDIT_PATH variable
- Use jq --arg instead of string interpolation in filters
- Add local declarations for override and runtime_val

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xinhuagu added a commit that referenced this pull request Mar 19, 2026
* fix: address Greptile review findings from PRs 313-315

PR #313:
- Restore replaySuiteMinPerCategory default to 3 (was lowered to 1)
- Redistribute sample cases: workflow_reuse 10→5, others 5→6-7 each
- Fix CI-short: tools-short-gate moved from workflow_reuse to adversarial

PR #314:
- Fix lifecycle metric key mismatch: extract promotion_precision and
  false_learning_rate (was promotion_rate/demotion_rate — safety gate
  was silently disabled)
- Clean up stale sampleSizes key after first_try rename
- Fix NaN handling: use explicit putNull instead of ambiguous put overload

PR #315:
- Remove dead INJECTION_AUDIT_PATH variable
- Use jq --arg instead of string interpolation in filters
- Add local declarations for override and runtime_val

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add promotion_precision and false_learning_rate to replay report

CodeRabbit finding: BenchmarkScorecardCli extracts promotion_precision
and false_learning_rate, but generate-replay-report.sh only produces
promotion_rate and demotion_rate. Safety metrics were always missing.

Added both as pending_instrumentation in report output — they need
real data from candidate lifecycle to become measured, but the keys
now exist so BenchmarkScorecard can detect them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add BenchmarkScorecardCli integration tests

4 tests covering:
- Full report parsing and scorecard JSON output
- promotion_precision and false_learning_rate extraction and evaluation
- pending_instrumentation metrics show INSUFFICIENT_DATA
- Missing replay report produces all INSUFFICIENT_DATA (no false failures)

Addresses CodeRabbit review requirement for integration test coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(benchmark): replace manual baseline metric overrides with exported runtime and lifecycle inputs

1 participant