feat: baseline collector auto-reads runtime metrics #315
Baseline collector now reads exported runtime metrics (`runtime-latest.json`) when no manual `--metric` override is given, and falls back to `pending_instrumentation` only if neither source has a value.

Priority: manual override > `runtime-latest.json` > pending

Each metric now includes a `source` field for provenance tracking.

Closes #308

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
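The priority chain described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `resolve_metric`, `lookup_override`, `lookup_runtime_metric`, and the sample metric keys are hypothetical, both lookups are stubbed with fixed values instead of parsing `--metric` arguments or `runtime-latest.json`, and treating a manual override as `measured` is an assumption. The `source` labels follow the PR summary.

```shell
#!/usr/bin/env bash
# Sketch of the priority chain: manual override > runtime-latest.json > pending.
# All function names and sample metric keys here are hypothetical.

# Stand-in for values supplied through --metric overrides.
lookup_override() {
  case "$1" in
    replay_success_rate) echo "0.91" ;;
    *) return 1 ;;
  esac
}

# Stand-in for a lookup in runtime-latest.json; succeeds only for measured metrics.
lookup_runtime_metric() {
  case "$1" in
    replay_success_rate) echo "0.80" ;;
    first_try_success_rate) echo "0.75" ;;
    *) return 1 ;;
  esac
}

# Prints "value|status|source" for one metric key.
resolve_metric() {
  local key="$1" val status source
  if val="$(lookup_override "$key")"; then
    status="measured"; source="manual_override"        # assumed status for overrides
  elif val="$(lookup_runtime_metric "$key")"; then
    status="measured"; source="runtime-latest.json"
  else
    val="null"; status="pending_instrumentation"; source=""
  fi
  printf '%s|%s|%s\n' "$val" "$status" "$source"
}

resolve_metric replay_success_rate      # override wins over the runtime value
resolve_metric first_try_success_rate   # only the runtime file has a value
resolve_metric token_cost               # neither source has a value
```

Declaring the variables with `local` on one line and assigning them in the `if`/`elif` conditions keeps the exit status of each lookup visible to the conditional.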
}

RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
INJECTION_AUDIT_PATH="$PROJECT_ROOT/.aceclaw/memory/injection-audit.jsonl"
Unused variable: dead code

`INJECTION_AUDIT_PATH` is declared here but is never referenced anywhere else in the script. Either this was meant to be used in this PR (and was forgotten), or it should be removed to keep the script clean.
Suggested change:
- INJECTION_AUDIT_PATH="$PROJECT_ROOT/.aceclaw/memory/injection-audit.jsonl"
+ RUNTIME_METRICS_PATH="$PROJECT_ROOT/.aceclaw/metrics/continuous-learning/runtime-latest.json"
status="$(jq -r ".metrics.\"$key\".status // \"\"" "$RUNTIME_METRICS_PATH" 2>/dev/null)"
if [[ "$status" == "measured" ]]; then
  jq -r ".metrics.\"$key\".value // \"null\"" "$RUNTIME_METRICS_PATH" 2>/dev/null
Prefer `jq --arg` over bash string interpolation in filters

Both `jq` calls interpolate `$key` directly into the filter string. While the keys are all statically defined alphanumeric+underscore strings (so there's no real injection risk today), using `--arg` is the idiomatic, safer form that avoids any quoting fragility if new keys are ever added.
Suggested change:
- status="$(jq -r ".metrics.\"$key\".status // \"\"" "$RUNTIME_METRICS_PATH" 2>/dev/null)"
- if [[ "$status" == "measured" ]]; then
- jq -r ".metrics.\"$key\".value // \"null\"" "$RUNTIME_METRICS_PATH" 2>/dev/null
+ status="$(jq -r --arg k "$key" '.metrics[$k].status // ""' "$RUNTIME_METRICS_PATH" 2>/dev/null)"
+ if [[ "$status" == "measured" ]]; then
+ jq -r --arg k "$key" '.metrics[$k].value // "null"' "$RUNTIME_METRICS_PATH" 2>/dev/null
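To make the quoting fragility concrete, here is a small sketch that runs both forms against a throwaway file. The file contents, the key names, and the two helper functions are invented for illustration; only the two `jq` filters mirror the review snippets. It assumes `jq` is installed and skips the demonstration otherwise.

```shell
#!/usr/bin/env bash
# Compare interpolating a key into a jq filter with passing it via --arg.
# The file contents and key names are made up for illustration.

metrics_file="$(mktemp)"
trap 'rm -f "$metrics_file"' EXIT
cat > "$metrics_file" <<'EOF'
{"metrics": {"plain_key": {"status": "measured", "value": 0.5},
             "odd\"key":  {"status": "measured", "value": 0.7}}}
EOF

# Interpolated form: the key becomes part of the jq program text,
# so a quote inside the key changes how the filter is parsed.
status_interpolated() {
  jq -r ".metrics.\"$1\".status // \"\"" "$metrics_file" 2>/dev/null
}

# --arg form: the key is passed as data and is never parsed as jq syntax.
status_with_arg() {
  jq -r --arg k "$1" '.metrics[$k].status // ""' "$metrics_file" 2>/dev/null
}

if command -v jq >/dev/null 2>&1; then
  echo "plain key, interpolated: [$(status_interpolated 'plain_key')]"
  echo "plain key, --arg:        [$(status_with_arg 'plain_key')]"
  echo "odd key, interpolated:   [$(status_interpolated 'odd"key')]"
  echo "odd key, --arg:          [$(status_with_arg 'odd"key')]"
else
  echo "jq not found; skipping demonstration"
fi
```

With the plain key both forms agree. With the key containing a double quote, the interpolated filter no longer compiles and, because stderr is discarded, the failure shows up only as an empty status, which the caller would read as "not measured".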
elif runtime_val="$(read_runtime_metric "$key")"; then
  val="$runtime_val"
  status="measured"
  source="runtime-latest.json"
`runtime_val` not declared `local`

`runtime_val` leaks into the global shell scope because it lacks a `local` declaration, mirroring the pre-existing behaviour of `override`. While this doesn't cause a functional bug today (each loop iteration overwrites it), declaring it `local` inside `metric_json` would make the scoping explicit and consistent.
Suggested change:
- elif runtime_val="$(read_runtime_metric "$key")"; then
- val="$runtime_val"
- status="measured"
- source="runtime-latest.json"
+ elif local runtime_val; runtime_val="$(read_runtime_metric "$key")"; then
Or, more portably, add `local runtime_val` alongside the other local declarations at the top of the function and keep the `elif` line unchanged.
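The scoping point can be checked with a short standalone sketch. All function and variable names below are illustrative and do not come from the script. It also shows why the declaration is best kept separate from the assignment: `local var="$(cmd)"` reports the exit status of `local`, not of `cmd`, which would defeat an `elif var="$(...)"` style check.

```shell
#!/usr/bin/env bash
# Show how an undeclared variable assigned inside a function leaks to the caller,
# and how combining 'local' with a command substitution hides the exit status.
# Function and variable names are illustrative only.

leaky() {
  leaked_val="set inside leaky"        # no 'local': becomes a global
}

scoped() {
  local scoped_val                     # declared first, assigned separately
  scoped_val="set inside scoped"
  : "$scoped_val"                      # use the value so the example is complete
}

masked_status() {
  local v="$(false)"                   # exit status here is that of 'local'
  echo "$?"
}

kept_status() {
  local v
  v="$(false)"                         # exit status here is that of 'false'
  echo "$?"
}

leaky
scoped
echo "leaked_val=${leaked_val:-<unset>}"
echo "scoped_val=${scoped_val:-<unset>}"
echo "status with combined local: $(masked_status)"
echo "status with separate local: $(kept_status)"
```

After both calls, `leaked_val` is visible at the top level while `scoped_val` is not, and only the separate declaration preserves the failing exit status.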
* fix: address Greptile review findings from PRs 313-315

  PR #313:
  - Restore replaySuiteMinPerCategory default to 3 (was lowered to 1)
  - Redistribute sample cases: workflow_reuse 10→5, others 5→6-7 each
  - Fix CI-short: tools-short-gate moved from workflow_reuse to adversarial

  PR #314:
  - Fix lifecycle metric key mismatch: extract promotion_precision and false_learning_rate (was promotion_rate/demotion_rate; the safety gate was silently disabled)
  - Clean up stale sampleSizes key after first_try rename
  - Fix NaN handling: use explicit putNull instead of ambiguous put overload

  PR #315:
  - Remove dead INJECTION_AUDIT_PATH variable
  - Use jq --arg instead of string interpolation in filters
  - Add local declarations for override and runtime_val

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add promotion_precision and false_learning_rate to replay report

  CodeRabbit finding: BenchmarkScorecardCli extracts promotion_precision and false_learning_rate, but generate-replay-report.sh only produces promotion_rate and demotion_rate. Safety metrics were always missing.

  Added both as pending_instrumentation in report output. They need real data from candidate lifecycle to become measured, but the keys now exist so BenchmarkScorecard can detect them.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add BenchmarkScorecardCli integration tests

  4 tests covering:
  - Full report parsing and scorecard JSON output
  - promotion_precision and false_learning_rate extraction and evaluation
  - pending_instrumentation metrics show INSUFFICIENT_DATA
  - Missing replay report produces all INSUFFICIENT_DATA (no false failures)

  Addresses CodeRabbit review requirement for integration test coverage.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

Baseline collector now auto-reads `runtime-latest.json` for core metrics instead of requiring `--metric` overrides. Each metric output includes a `source` field (`runtime-latest.json`, `manual_override`, or empty for pending).

Closes #308

Test plan

- `./gradlew build` passes

🤖 Generated with Claude Code
Greptile Summary

This PR adds automatic reading of `runtime-latest.json` to the baseline collector script, eliminating the need for `--metric` overrides for standard metrics. The priority chain (manual override → runtime file → pending) is correctly implemented, and the new `source` field in the JSON output identifies where each metric value came from.

- `read_runtime_metric` is correct: it checks `jq` availability, guards on file existence, and only emits a value when `status == "measured"`.
- `INJECTION_AUDIT_PATH` (line 138) is defined but never referenced anywhere in the script; it should either be used or removed.
- `jq` calls in `read_runtime_metric` interpolate `$key` directly into the filter string; using `jq --arg` is the idiomatic form that avoids any future quoting fragility.
- `runtime_val` is not declared `local` inside `metric_json`, allowing it to leak into the global shell scope (the same pre-existing issue exists for `override`).

Confidence Score: 4/5

`jq` guards, and status checks are all correct. The three issues found are all style/best-practice level (dead variable, interpolation style, missing `local`), none of which cause a runtime bug in the current codebase.

`scripts/collect-continuous-learning-baseline.sh`.

Important Files Changed

`read_runtime_metric` to auto-populate metrics from `runtime-latest.json` with correct priority order (manual override > runtime file > pending). Three minor style issues: `INJECTION_AUDIT_PATH` is declared but never used, `jq` filters use bash string interpolation instead of `--arg`, and `runtime_val` is not declared `local`.
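The behaviour the summary attributes to `read_runtime_metric` (check for `jq`, guard on the file, emit a value only when the status is `measured`) could look roughly like the sketch below. It is a reconstruction from the review snippets, not the script's actual function body: the JSON layout is inferred from the `jq` filters quoted in the review, the metric keys are placeholders, and `RUNTIME_METRICS_PATH` points at a temporary file rather than the real `.aceclaw` path.

```shell
#!/usr/bin/env bash
# Sketch of a reader that emits a value only for metrics marked "measured".
# The JSON layout (.metrics.<key>.status / .value) is inferred from the review
# snippets; the keys and the temp file are placeholders for illustration.

RUNTIME_METRICS_PATH="$(mktemp)"
trap 'rm -f "$RUNTIME_METRICS_PATH"' EXIT
cat > "$RUNTIME_METRICS_PATH" <<'EOF'
{"metrics": {"metric_a": {"status": "measured", "value": 0.42},
             "metric_b": {"status": "pending_instrumentation", "value": null}}}
EOF

read_runtime_metric() {
  local key="$1" status
  command -v jq >/dev/null 2>&1 || return 1      # jq is not installed
  [[ -f "$RUNTIME_METRICS_PATH" ]] || return 1   # no exported metrics file
  status="$(jq -r --arg k "$key" '.metrics[$k].status // ""' "$RUNTIME_METRICS_PATH" 2>/dev/null)"
  [[ "$status" == "measured" ]] || return 1      # missing key or not yet measured
  # Last command: its exit status becomes the function's return value.
  jq -r --arg k "$key" '.metrics[$k].value // "null"' "$RUNTIME_METRICS_PATH" 2>/dev/null
}

for key in metric_a metric_b metric_c; do
  if value="$(read_runtime_metric "$key")"; then
    echo "$key: $value (source: runtime-latest.json)"
  else
    echo "$key: pending"
  fi
done
```

Returning non-zero for every "no value" case lets the caller use the function directly in an `if`/`elif` condition and fall through to the pending branch.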
Last reviewed commit: "feat: baseline colle..."