Skip to content

chore: bump planner threshold 3→4 (follow-up to #467)#468

Merged
xinhuagu merged 1 commit into
mainfrom
chore/planner-threshold-4
May 1, 2026
Merged

chore: bump planner threshold 3→4 (follow-up to #467)#468
xinhuagu merged 1 commit into
mainfrom
chore/planner-threshold-4

Conversation

@xinhuagu

@xinhuagu xinhuagu commented May 1, 2026

Copy link
Copy Markdown
Owner

Summary

Calibration follow-up to #467. Initial drop to 3 was too noisy — every single `refactor X` / `extract Y` triggered a planner LLM call before any actual work (the REFACTORING regex matches `extract` too, so single-line extractions also got caught). 4 is the middle ground:

Prompt Score T=3 (was) T=4 (now)
"refactor X" 3 ✓ plan
"extract method" 3 ✓ plan
"do A and then B" 3 ✓ plan
"refactor X then add tests" 5
"investigate then refactor Y" 5
"rename across files" 3+2=5

Users who want planning on a borderline single-signal prompt use `/plan ` (also from #467) — that's the escape hatch that makes this trade-off OK.

Test plan

  • Updated `refactoring_singleSignalDoesNotPlan_atDefaultThreshold` (flipped assertion: `shouldPlan() == false`)
  • New `refactoring_plusSecondSignal_plans`: "refactor X and add tests" → score ≥ 4 → `shouldPlan() == true`
  • `./gradlew :aceclaw-core:test :aceclaw-daemon:test :aceclaw-cli:test` — all green

🤖 Generated with Claude Code

Initial drop to 3 in #467 went too far the other way: every single
"refactor X" / "extract Y" hit threshold 3 and triggered a planner
LLM call before any actual work, even on trivial prompts (REFACTORING
regex matches "extract" too, so single-line extractions also got
caught). 4 is the calibrated middle ground:

  - Threshold 5: needed two explicit signals → planner never fired
  - Threshold 3: any +3 signal alone triggered → too noisy
  - Threshold 4: single-signal +3 stays plain ReAct, +3 with ANY
    second signal (long description, second action, multiple files,
    testing, …) triggers planning

Users who want planning on a borderline single-signal prompt now use
the /plan slash command (also from #467) — which is exactly the
escape hatch that makes this trade-off acceptable.

Updated test: refactoring_singleSignalDoesNotPlan_atDefaultThreshold
(was refactoring_highScore — flipped the shouldPlan assertion to
false). New test refactoring_plusSecondSignal_plans pins the
"+3 with anything else triggers" rule so future threshold tweaks
have to update an explicit assertion rather than silently drift.

All test suites green: aceclaw-core, aceclaw-cli, aceclaw-daemon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 1, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@xinhuagu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 10 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e6d0cc49-618a-4ee8-bd84-0673cdb743cc

📥 Commits

Reviewing files that changed from the base of the PR and between a5482e6 and 70e5132.

📒 Files selected for processing (3)
  • aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java
  • aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java
  • aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/planner-threshold-4

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 40 minutes and 10 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@xinhuagu xinhuagu merged commit 747a38a into main May 1, 2026
7 of 8 checks passed
@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Bump planner threshold 3→4 to reduce false-positive planning triggers

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Bump planner complexity threshold from 3 to 4
• Single +3 signals no longer trigger planning automatically
• Adding any second signal to +3 prompt enables planning
• Updated tests to pin threshold behavior and prevent silent regressions
Diagram
flowchart LR
  A["Single +3 signal<br/>e.g. refactor X"] -->|"Threshold 3"| B["Plan triggered<br/>too noisy"]
  A -->|"Threshold 4"| C["Plain ReAct<br/>no LLM call"]
  D["Single +3 signal<br/>+ second signal<br/>e.g. refactor + tests"] -->|"Threshold 4"| E["Plan triggered<br/>appropriate"]
  F["/plan command"] -->|"Force planning"| E
Loading

Grey Divider

File Changes

1. aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java ⚙️ Configuration changes +8/-5

Increase default complexity threshold to 4

• Bumped DEFAULT_THRESHOLD constant from 3 to 4
• Expanded javadoc explaining threshold calibration rationale
• Documented trade-off: threshold 3 too noisy, threshold 5 too strict
• Clarified that single +3 signals stay as ReAct, second signals trigger planning

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java


2. aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java 🧪 Tests +23/-7

Update tests to pin threshold 4 behavior

• Renamed refactoring_highScore() to refactoring_singleSignalDoesNotPlan_atDefaultThreshold()
• Flipped assertion from shouldPlan() == true to shouldPlan() == false
• Added new test refactoring_plusSecondSignal_plans() to verify second signal triggers planning
• Added detailed comments explaining threshold behavior and escape hatch via /plan command

aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java


3. aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java ⚙️ Configuration changes +21/-11

Update config threshold and calibration documentation

• Bumped DEFAULT_PLANNER_THRESHOLD constant from 3 to 4
• Expanded javadoc with detailed threshold calibration history and rationale
• Documented why threshold 5 was too strict and threshold 3 was too noisy
• Explained middle-ground approach and /plan command escape hatch

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java


Grey Divider

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented May 1, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)

Grey Divider


Action required

1. PlannerThreshold Javadoc outdated 🐞 Bug ⚙ Maintainability
Description
AceClawConfig.plannerThreshold() Javadoc still claims the default is 5 even though this PR changes
DEFAULT_PLANNER_THRESHOLD to 4, so public-facing documentation becomes incorrect and can cause wrong
tuning expectations.
Code

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[R78-99]

+     * Default complexity score for triggering the planner. Bumped
+     * from 5 → 4 (initially landed at 3, dialled back after review).
+     *
+     * <p>Threshold 5 required two explicit signals — most everyday
+     * agentic prompts hit at most one, so the planner essentially
+     * never fired. Threshold 3 went too far the other way: every
+     * single "refactor X" / "extract Y" (REFACTORING regex matches
+     * "extract" too) triggered a planner LLM call before any actual
+     * work, even on trivial prompts.
+     *
+     * <p>Threshold 4 is the middle ground: single-signal +3 prompts
+     * ("refactor X" alone) stay as plain ReAct, but adding ANY
+     * second signal (a long description, a second action, multiple
+     * files, testing, …) flips on planning. Users who explicitly
+     * want the planner on a borderline prompt can use
+     * {@code /plan <prompt>} as the escape hatch — that bypasses
+     * this heuristic entirely.
+     *
+     * <p>See {@link ComplexityEstimator} for the score table.
+     */
+    private static final int DEFAULT_PLANNER_THRESHOLD = 4;
    private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;
Evidence
This PR updates the actual default threshold constant to 4, but the accessor documentation in the
same class still states the default is 5, which is now false.

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`AceClawConfig.DEFAULT_PLANNER_THRESHOLD` is now 4, but the Javadoc for `plannerThreshold()` still says the default is 5.

### Issue Context
This mismatch is user-facing (API docs) and can lead to incorrect assumptions when configuring or debugging planner behavior.

### Fix Focus Areas
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]

### Suggested change
Update the `plannerThreshold()` Javadoc to state the correct default (4), ideally referencing the constant (e.g., `Defaults to {@value #DEFAULT_PLANNER_THRESHOLD}.`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Handler default threshold inconsistent 🐞 Bug ☼ Reliability
Description
StreamingAgentHandler still defaults plannerThreshold to 5, so if setPlannerConfig() isn’t invoked
(e.g., tests/alternate wiring), planner triggering behavior diverges from the new default
threshold=4.
Code

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[R78-99]

+     * Default complexity score for triggering the planner. Bumped
+     * from 5 → 4 (initially landed at 3, dialled back after review).
+     *
+     * <p>Threshold 5 required two explicit signals — most everyday
+     * agentic prompts hit at most one, so the planner essentially
+     * never fired. Threshold 3 went too far the other way: every
+     * single "refactor X" / "extract Y" (REFACTORING regex matches
+     * "extract" too) triggered a planner LLM call before any actual
+     * work, even on trivial prompts.
+     *
+     * <p>Threshold 4 is the middle ground: single-signal +3 prompts
+     * ("refactor X" alone) stay as plain ReAct, but adding ANY
+     * second signal (a long description, a second action, multiple
+     * files, testing, …) flips on planning. Users who explicitly
+     * want the planner on a borderline prompt can use
+     * {@code /plan <prompt>} as the escape hatch — that bypasses
+     * this heuristic entirely.
+     *
+     * <p>See {@link ComplexityEstimator} for the score table.
+     */
+    private static final int DEFAULT_PLANNER_THRESHOLD = 4;
    private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;
Evidence
The system default threshold is now 4 (AceClawConfig + core estimator), but StreamingAgentHandler’s
internal default remains 5. Multiple daemon tests construct a StreamingAgentHandler without calling
setPlannerConfig, meaning they retain the internal default and can silently differ from production
wiring via AceClawDaemon.

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]
aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java[1741-1743]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawDaemon.java[542-563]
aceclaw-daemon/src/test/java/dev/aceclaw/daemon/AntiPatternGateOverrideRpcTest.java[20-31]
aceclaw-daemon/src/test/java/dev/aceclaw/daemon/DaemonIntegrationTest.java[109-116]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`StreamingAgentHandler` has an internal default `plannerThreshold = 5`, which no longer matches the system default threshold (now 4). Any code path (especially tests) that constructs a handler but does not call `setPlannerConfig()` will use the stale default.

### Issue Context
In production, `AceClawDaemon` calls `agentHandler.setPlannerConfig(config.plannerEnabled(), config.plannerThreshold())`, but several tests instantiate `StreamingAgentHandler` without calling `setPlannerConfig()`.

### Fix Focus Areas
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java[1741-1743]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawDaemon.java[542-563]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]

### Suggested change
Set `StreamingAgentHandler`’s default `plannerThreshold` to 4 (to match config default), or refactor so the handler cannot be used without explicitly setting planner config (constructor param / required setter call).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

3. Stale threshold test comment 🐞 Bug ⚙ Maintainability
Description
ComplexityEstimatorTest still says the threshold is “now 3”, but this PR changes the default
threshold to 4, making the test misleading for future maintainers.
Code

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[R18-27]

+     * {@code AceClawConfig.DEFAULT_PLANNER_THRESHOLD} — keep the
+     * two in sync. Settled at 4: too low (3) made every single
+     * "refactor X" / "extract Y" trigger a planner LLM call even on
+     * trivial prompts; too high (5) required two explicit signals
+     * which most prompts didn't hit. At 4, a single +3 signal alone
+     * stays as plain ReAct, but adding ANY second signal flips on
+     * planning.
     */
-    private static final int DEFAULT_THRESHOLD = 3;
+    private static final int DEFAULT_THRESHOLD = 4;
Evidence
The estimator’s default threshold is now 4, but the test file retains an outdated comment that
references 3.

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]
aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java[12-18]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A comment in `ComplexityEstimatorTest` states the threshold is “now 3”, but the default threshold was bumped to 4.

### Issue Context
This doesn’t affect runtime behavior, but it will confuse readers and can lead to incorrect future edits.

### Fix Focus Areas
- aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java[12-18]
- aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]

### Suggested change
Edit the comment to say “now 4” or remove the hardcoded number and refer to the default threshold more generally.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

Qodo Logo

@xinhuagu xinhuagu deleted the chore/planner-threshold-4 branch May 1, 2026 23:33
Comment on lines +78 to 99
* Default complexity score for triggering the planner. Bumped
* from 5 → 4 (initially landed at 3, dialled back after review).
*
* <p>Threshold 5 required two explicit signals — most everyday
* agentic prompts hit at most one, so the planner essentially
* never fired. Threshold 3 went too far the other way: every
* single "refactor X" / "extract Y" (REFACTORING regex matches
* "extract" too) triggered a planner LLM call before any actual
* work, even on trivial prompts.
*
* <p>Threshold 4 is the middle ground: single-signal +3 prompts
* ("refactor X" alone) stay as plain ReAct, but adding ANY
* second signal (a long description, a second action, multiple
* files, testing, …) flips on planning. Users who explicitly
* want the planner on a borderline prompt can use
* {@code /plan <prompt>} as the escape hatch — that bypasses
* this heuristic entirely.
*
* <p>See {@link ComplexityEstimator} for the score table.
*/
private static final int DEFAULT_PLANNER_THRESHOLD = 4;
private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Plannerthreshold javadoc outdated 🐞 Bug ⚙ Maintainability

AceClawConfig.plannerThreshold() Javadoc still claims the default is 5 even though this PR changes
DEFAULT_PLANNER_THRESHOLD to 4, so public-facing documentation becomes incorrect and can cause wrong
tuning expectations.
Agent Prompt
### Issue description
`AceClawConfig.DEFAULT_PLANNER_THRESHOLD` is now 4, but the Javadoc for `plannerThreshold()` still says the default is 5.

### Issue Context
This mismatch is user-facing (API docs) and can lead to incorrect assumptions when configuring or debugging planner behavior.

### Fix Focus Areas
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]

### Suggested change
Update the `plannerThreshold()` Javadoc to state the correct default (4), ideally referencing the constant (e.g., `Defaults to {@value #DEFAULT_PLANNER_THRESHOLD}.`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant