feat(dev): auto-select benchmark mode in dev.sh based on changed files #330

Description

@xinhuagu

Summary

Add an optional auto mode to dev.sh so that, after finishing a feature branch, I can run one command and get the right level of benchmark validation for the current diff.

Motivation

Right now the workflow is clear but manual:

  • normal branches should run a local benchmark check
  • self-learning / replay / benchmark / rollout branches should run the local benchmark check plus baseline export

This is easy to forget and adds friction when dogfooding AceClaw while developing AceClaw.

Proposal

Extend dev.sh with an automatic mode that inspects the current branch diff against origin/main and chooses the right benchmark depth.

Suggested behavior:

  • ./dev.sh
    • keep current behavior: rebuild, restart daemon if needed, launch CLI
  • ./dev.sh --check
    • explicitly run the local benchmark check before launching
    • command: ./gradlew preMergeCheck -PreplayGateStrict=false
  • ./dev.sh --baseline
    • explicitly run the local benchmark check plus baseline export before launching
    • commands:
      • ./gradlew preMergeCheck -PreplayGateStrict=false
      • ./scripts/export-injection-audit-summary.sh
      • ./scripts/collect-continuous-learning-baseline.sh --output .aceclaw/metrics/continuous-learning/baseline.json
  • ./dev.sh --auto
    • inspect git diff --name-only origin/main...HEAD
    • if the diff only touches normal feature areas, run --check
    • if the diff touches learning / replay / benchmark / rollout sensitive areas, run --baseline
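
A minimal sketch of how the flags above could map to modes, assuming dev.sh stays a POSIX-style shell script. The function names parse_mode, run_check, and run_baseline are hypothetical; only the gradlew and scripts/ commands come from this proposal.

```shell
# Hypothetical helpers for dev.sh; names are illustrative, not existing code.

# Map the first CLI argument to a mode name.
parse_mode() {
  case "${1:-}" in
    "")         echo "default" ;;   # rebuild, restart daemon if needed, launch CLI
    --check)    echo "check" ;;
    --baseline) echo "baseline" ;;
    --auto)     echo "auto" ;;
    *)          echo "unknown option: $1" >&2; return 2 ;;
  esac
}

# Local benchmark check (command taken from the proposal).
run_check() {
  ./gradlew preMergeCheck -PreplayGateStrict=false
}

# Local check plus baseline export (commands taken from the proposal).
# The && chain stops at the first failing step.
run_baseline() {
  run_check &&
    ./scripts/export-injection-audit-summary.sh &&
    ./scripts/collect-continuous-learning-baseline.sh \
      --output .aceclaw/metrics/continuous-learning/baseline.json
}
```

With this shape, --auto would only need to resolve to either check or baseline and then call the same two functions.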

Suggested auto classification

Run check for general product changes such as:

  • aceclaw-cli/
  • general daemon or core changes that are not learning / replay / rollout related
  • docs-only or UX/output fixes

Upgrade to baseline when changed files touch benchmark or self-learning sensitive areas such as:

  • aceclaw-memory/
  • scripts/generate-replay-report.sh
  • scripts/replay-quality-gate.sh
  • scripts/collect-continuous-learning-baseline.sh
  • scripts/export-injection-audit-summary.sh
  • runtime metrics / scorecard / rollout / candidate lifecycle code in daemon
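
The classification above could be sketched roughly like this. classify_paths is a hypothetical name, and the pattern list only covers the paths named explicitly in this issue. The daemon areas (runtime metrics, scorecard, rollout, candidate lifecycle) are left out because their paths are not specified here and would need to be added by whoever implements this.

```shell
# Hypothetical classifier: reads changed paths on stdin (one per line)
# and prints "baseline" if any path is sensitive, otherwise "check".
classify_paths() {
  mode="check"
  while IFS= read -r path; do
    case "$path" in
      aceclaw-memory/*|\
      scripts/generate-replay-report.sh|\
      scripts/replay-quality-gate.sh|\
      scripts/collect-continuous-learning-baseline.sh|\
      scripts/export-injection-audit-summary.sh)
        mode="baseline" ;;
      # Assumption: daemon-side sensitive paths would be added here.
    esac
  done
  echo "$mode"
}
```

Intended usage would be something like `git diff --name-only origin/main...HEAD | classify_paths`. An empty diff falls through to check, which seems like the safer cheap default.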

Acceptance Criteria

  • dev.sh supports --check, --baseline, and --auto
  • default ./dev.sh behavior remains fast and unchanged
  • --auto determines mode from git diff --name-only origin/main...HEAD
  • normal feature branches run local check only
  • learning / benchmark sensitive branches run local check plus baseline export
  • chosen mode is printed clearly before execution
  • benchmark steps fail the script if they fail
  • docs are updated to explain when to use each mode
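
For the "print the chosen mode" and "fail the script if a step fails" criteria, one possible shape is a small runner like the one below. run_steps and the message format are assumptions for illustration, not existing dev.sh code.

```shell
# Hypothetical runner: first argument is the mode name, the rest are
# command strings. Prints the mode, runs each step in order, and returns
# non-zero at the first failing step so the caller can abort.
run_steps() {
  echo "==> dev.sh benchmark mode: $1"
  shift
  for step in "$@"; do
    echo "--> $step"
    sh -c "$step" || { echo "step failed: $step" >&2; return 1; }
  done
}
```

A caller could then do `run_steps check "./gradlew preMergeCheck -PreplayGateStrict=false" || exit 1` before launching the CLI.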

Notes

The goal is not to make dev.sh a full CI replacement. The goal is to make the local "finish branch, dogfood it, and see whether it looks healthy" workflow cheaper and more consistent.

Metadata

Labels

documentation, enhancement