Summary
Add an optional auto mode to dev.sh so that after finishing a feature branch I can run one command and get the right level of benchmark validation for the current diff.
Motivation
Right now the workflow is clear but manual:
- normal branches should run a local benchmark check
- self-learning / replay / benchmark / rollout branches should run the local benchmark check plus baseline export
This is easy to forget and adds friction when dogfooding AceClaw while developing AceClaw.
Proposal
Extend dev.sh with an automatic mode that inspects the current branch diff against origin/main and chooses the right benchmark depth.
Suggested behavior:
./dev.sh
- keep current behavior: rebuild, restart daemon if needed, launch CLI
./dev.sh --check
- explicitly run the local benchmark check before launching
- command:
./gradlew preMergeCheck -PreplayGateStrict=false
./dev.sh --baseline
- explicitly run the local benchmark check plus baseline export before launching
- commands:
./gradlew preMergeCheck -PreplayGateStrict=false
./scripts/export-injection-audit-summary.sh
./scripts/collect-continuous-learning-baseline.sh --output .aceclaw/metrics/continuous-learning/baseline.json
./dev.sh --auto
- inspect git diff --name-only origin/main...HEAD
- if the diff only touches normal feature areas, run --check
- if the diff touches learning / replay / benchmark / rollout sensitive areas, run --baseline
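A minimal sketch of how the modes could map to the commands listed above. The names run_mode, run_step, and DRY_RUN are invented for illustration and are not part of the existing dev.sh; the commands themselves are copied from this proposal.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run_step() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: run the benchmark steps for one mode.
# Accepts: none, check, baseline. Returns 2 for anything else.
run_mode() {
  local mode="$1"
  case "$mode" in
    none|check|baseline)
      # Print the chosen mode before doing any work.
      echo "dev.sh: benchmark mode = ${mode}"
      ;;
    *)
      echo "dev.sh: unknown mode: ${mode}" >&2
      return 2
      ;;
  esac

  # Both check and baseline start with the local benchmark check.
  if [[ "$mode" != "none" ]]; then
    run_step ./gradlew preMergeCheck -PreplayGateStrict=false
  fi

  # baseline additionally exports the audit summary and the baseline file.
  if [[ "$mode" == "baseline" ]]; then
    run_step ./scripts/export-injection-audit-summary.sh
    run_step ./scripts/collect-continuous-learning-baseline.sh \
      --output .aceclaw/metrics/continuous-learning/baseline.json
  fi
}
```

Running `DRY_RUN=1 run_mode baseline` prints the mode line followed by the three commands without executing them, which may be handy when reviewing the classification. When the function is called as a plain statement, `set -e` stops the script at the first failing step.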
Suggested auto classification
Run check for general product changes such as:
- aceclaw-cli/
- general daemon or core changes that are not learning / replay / rollout related
- docs-only or UX/output fixes
Upgrade to baseline when changed files touch benchmark or self-learning sensitive areas such as:
- aceclaw-memory/
- scripts/generate-replay-report.sh
- scripts/replay-quality-gate.sh
- scripts/collect-continuous-learning-baseline.sh
- scripts/export-injection-audit-summary.sh
- runtime metrics / scorecard / rollout / candidate lifecycle code in daemon
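The classification above could be sketched as a path filter over the diff output. The function names classify_paths and auto_mode are hypothetical. Only the paths named explicitly in this issue are matched; the daemon areas (runtime metrics, scorecard, rollout, candidate lifecycle) have no concrete paths listed here, so they are left as a placeholder comment rather than guessed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read changed file paths on stdin (one per line) and
# print "baseline" if any path is benchmark/self-learning sensitive,
# otherwise print "check".
classify_paths() {
  local path="" mode="check"
  while IFS= read -r path || [[ -n "$path" ]]; do
    case "$path" in
      aceclaw-memory/*|\
      scripts/generate-replay-report.sh|\
      scripts/replay-quality-gate.sh|\
      scripts/collect-continuous-learning-baseline.sh|\
      scripts/export-injection-audit-summary.sh)
        mode="baseline"
        ;;
      # Assumption: sensitive daemon paths would be added here once they
      # are decided; this issue does not name them.
    esac
  done
  printf '%s\n' "$mode"
}

# Hypothetical helper: classify the current branch against origin/main.
auto_mode() {
  git diff --name-only origin/main...HEAD | classify_paths
}
```

Keeping the classifier separate from the git call means it can be exercised with hand-written path lists, for example `printf '%s\n' aceclaw-memory/Foo.java | classify_paths`. An empty diff falls through to check in this sketch; whether that is the desired default is an open choice.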
Acceptance Criteria
- dev.sh supports --check, --baseline, and --auto
- default ./dev.sh behavior remains fast and unchanged
- --auto determines mode from git diff --name-only origin/main...HEAD
- normal feature branches run local check only
- learning / benchmark sensitive branches run local check plus baseline export
- chosen mode is printed clearly before execution
- benchmark steps fail the script if they fail
- docs are updated to explain when to use each mode
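One way the flag handling behind these criteria might look, as a sketch only. parse_mode is a hypothetical name, and the sketch assumes dev.sh takes no other arguments today, which this issue does not confirm. With no flags it yields none, so the default path skips all benchmark steps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map dev.sh arguments to a mode name.
# Prints one of: none, check, baseline, auto. If several mode flags are
# given, the last one wins. Unknown options return 2.
parse_mode() {
  local mode="none" arg
  for arg in "$@"; do
    case "$arg" in
      --check)    mode="check" ;;
      --baseline) mode="baseline" ;;
      --auto)     mode="auto" ;;
      *)
        echo "dev.sh: unknown option: ${arg}" >&2
        return 2
        ;;
    esac
  done
  printf '%s\n' "$mode"
}
```

In a real script, an auto result would then be resolved to check or baseline from the diff before any benchmark step runs.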
Notes
The goal is not to make dev.sh a full CI replacement. The goal is to make the local "finish branch, dogfood it, and see whether it looks healthy" workflow cheaper and more consistent.