fix: review findings from PRs 312-315 (3 P1 + 5 P2) #316
P1: Add benchmarkScorecard to ops doc dependency chain diagram
P1: Update replay-prompts-pr-sample.json to canonical categories
P1: BenchmarkScorecardCli: catch JSON parse errors, exit code 2
P2: Baseline collector: validate JSON before reading, skip null values,
set source='none' for pending metrics, add corruption warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    if ! jq empty "$RUNTIME_METRICS_PATH" 2>/dev/null; then
      echo "WARNING: runtime-latest.json exists but contains invalid JSON" >&2
      return 1
    fi
Corruption warning emitted once per metric key
read_runtime_metric is called from inside metric_json, which is invoked for every entry in metric_keys (currently 21 keys). If runtime-latest.json is corrupt, the WARNING: runtime-latest.json exists but contains invalid JSON message will be printed 21 times — once per metric — making the output very noisy.
Consider validating the file a single time before the metric-collection loop using a module-level flag variable, then skipping the per-call jq empty check once the file is already known to be invalid. This would ensure the warning is printed exactly once regardless of how many metrics are queried.
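A minimal sketch of the validate-once approach suggested above. Only `RUNTIME_METRICS_PATH`, `read_runtime_metric`, and the warning text come from the script under review; `validate_runtime_metrics`, `RUNTIME_METRICS_VALID`, and the assumption that the file is a flat JSON object are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: validate_runtime_metrics and RUNTIME_METRICS_VALID are
# hypothetical names, not taken from the actual script.

RUNTIME_METRICS_PATH="${RUNTIME_METRICS_PATH:-runtime-latest.json}"

# Module-level flag: 1 = file exists and parses as JSON, 0 = missing or corrupt.
RUNTIME_METRICS_VALID=0

# Call this once in the parent shell, before the metric-collection loop.
# Setting the flag inside read_runtime_metric would not work if that function
# is invoked through $(...), because command substitution runs in a subshell
# and the assignment would be lost.
validate_runtime_metrics() {
  RUNTIME_METRICS_VALID=0
  if [ ! -f "$RUNTIME_METRICS_PATH" ]; then
    return 0   # no file: nothing to warn about, metrics stay unavailable
  fi
  if jq empty "$RUNTIME_METRICS_PATH" 2>/dev/null; then
    RUNTIME_METRICS_VALID=1
  else
    echo "WARNING: runtime-latest.json exists but contains invalid JSON" >&2
  fi
  return 0
}

# Per-metric reader: consults the flag instead of re-running `jq empty`.
read_runtime_metric() {
  local key="$1" value
  [ "$RUNTIME_METRICS_VALID" -eq 1 ] || return 1
  # Assumes a flat object keyed by metric name; -e turns null into a failure.
  value="$(jq -er --arg k "$key" '.[$k]' "$RUNTIME_METRICS_PATH" 2>/dev/null)" || return 1
  printf '%s\n' "$value"
}
```

With this shape, the script would call `validate_runtime_metrics` once before iterating over `metric_keys`, so a corrupt file yields a single warning however many metrics are read.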
Path: scripts/collect-continuous-learning-baseline.sh
Line: 143-146
Summary
Fixes from code review of PRs #312-#315:
- P1: Add benchmarkScorecard to preMergeCheck dependency diagram
- P1: replay-prompts-pr-sample.json: old categories → canonical taxonomy
- P1: BenchmarkScorecardCli: wrap JSON parsing in try-catch, proper exit code 2
- P2: Baseline collector: validate JSON before reading, skip null values, source="none" for pending, add corruption warning

Test plan
- ./gradlew build passes

🤖 Generated with Claude Code
Greptile Summary
This PR addresses 3 P1 and 5 P2 findings from the previous batch of PRs (#312–315), covering a missing CI task in documentation, stale taxonomy values in a sample fixture, defensive error handling in the Java CLI, and several robustness improvements to the baseline-collection shell script.
Changes:
- docs/continuous-learning-operations.md: Adds benchmarkScorecard as a sibling of replayQualityGate under preMergeCheck in both the prose description and the ASCII dependency tree, correctly reflecting the Gradle task wiring.
- docs/reports/samples/replay-prompts-pr-sample.json: Migrates all 15 test-case categories from the old informal taxonomy (core, tools, permissions, timeouts, regression) to the canonical taxonomy (workflow_reuse, user_correction, error_recovery, adversarial). All IDs, prompts, and other metadata are preserved.
- BenchmarkScorecardCli.java: Wraps both JSON-parsing blocks in try-catch. Replay-report parse failures are treated as fatal (exit code 2), while runtime-metrics failures are treated as non-fatal (a warning is printed and the scorecard proceeds without them). The differentiation is well-reasoned and consistent with the class-level Javadoc.
- collect-continuous-learning-baseline.sh: Adds a jq empty JSON-integrity check before reading runtime metrics, fixes read_runtime_metric so it correctly returns 1 (instead of the literal string "null") when a value is absent, and initialises the source field to "none" for pending metrics. One P2 issue: because read_runtime_metric is called once per metric key (~21 times), a corrupt runtime-latest.json will cause the corruption warning to be printed 21 times.

Confidence Score: 4/5
A corrupt runtime-latest.json causes the same warning to be emitted 21 times. No functional correctness issues were found.

Important Files Changed
- docs/continuous-learning-operations.md: Adds benchmarkScorecard as a sibling dependency of replayQualityGate under preMergeCheck in both the prose bullet list and the ASCII dependency tree. Documentation matches the intended Gradle task wiring.
- collect-continuous-learning-baseline.sh: Adds a JSON-integrity check via jq empty, fixes the null-value guard so read_runtime_metric returns 1 instead of emitting the literal string "null", and initialises source to "none" for pending metrics. One issue: the corruption warning fires once per metric key (~21×) when the file is invalid.
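The null-value guard described above can be illustrated with a short sketch. Both function names and the flat-object file layout are assumptions for illustration; the actual `read_runtime_metric` may be implemented differently.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and assume a flat JSON object.

# Unguarded read: for a missing or null key, `jq -r` prints the literal
# string "null" and still exits 0, so the caller cannot tell it from a value.
read_metric_naive() {
  local file="$1" key="$2"
  jq -r --arg k "$key" '.[$k]' "$file"
}

# Guarded read: -e (--exit-status) makes jq exit non-zero when the last
# output is null or false, so an absent value becomes a return code of 1.
# Caveat: a metric whose real value is `false` is also treated as absent.
read_metric_guarded() {
  local file="$1" key="$2" value
  value="$(jq -er --arg k "$key" '.[$k]' "$file" 2>/dev/null)" || return 1
  printf '%s\n' "$value"
}
```

A caller can then branch on the exit status, for example marking the metric as pending with source "none" when the guarded read fails, rather than comparing the output against the string "null".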
Last reviewed commit: "fix: review findings..."