chore: bump planner threshold 3→4 (follow-up to #467) by xinhuagu · Pull Request #468 · xinhuagu/AceClaw

xinhuagu · 2026-05-01T23:30:44Z

Summary

Calibration follow-up to #467. Initial drop to 3 was too noisy — every single `refactor X` / `extract Y` triggered a planner LLM call before any actual work (the REFACTORING regex matches `extract` too, so single-line extractions also got caught). 4 is the middle ground:

Prompt	Score	T=3 (was)	T=4 (now)
"refactor X"	3	✓ plan	❌
"extract method"	3	✓ plan	❌
"do A and then B"	3	✓ plan	❌
"refactor X then add tests"	5	✓	✓
"investigate then refactor Y"	5	✓	✓
"rename across files"	3+2=5	✓	✓

Users who want planning on a borderline single-signal prompt use `/plan ` (also from #467) — that's the escape hatch that makes this trade-off OK.

Test plan

Updated `refactoring_singleSignalDoesNotPlan_atDefaultThreshold` (flipped assertion: `shouldPlan() == false`)
New `refactoring_plusSecondSignal_plans`: "refactor X and add tests" → score ≥ 4 → `shouldPlan() == true`
`./gradlew :aceclaw-core:test :aceclaw-daemon:test :aceclaw-cli:test` — all green

🤖 Generated with Claude Code

Initial drop to 3 in #467 went too far the other way: every single "refactor X" / "extract Y" hit threshold 3 and triggered a planner LLM call before any actual work, even on trivial prompts (REFACTORING regex matches "extract" too, so single-line extractions also got caught). 4 is the calibrated middle ground: - Threshold 5: needed two explicit signals → planner never fired - Threshold 3: any +3 signal alone triggered → too noisy - Threshold 4: single-signal +3 stays plain ReAct, +3 with ANY second signal (long description, second action, multiple files, testing, …) triggers planning Users who want planning on a borderline single-signal prompt now use the /plan slash command (also from #467) — which is exactly the escape hatch that makes this trade-off acceptable. Updated test: refactoring_singleSignalDoesNotPlan_atDefaultThreshold (was refactoring_highScore — flipped the shouldPlan assertion to false). New test refactoring_plusSecondSignal_plans pins the "+3 with anything else triggers" rule so future threshold tweaks have to update an explicit assertion rather than silently drift. All test suites green: aceclaw-core, aceclaw-cli, aceclaw-daemon. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-01T23:30:49Z

Warning

Rate limit exceeded

@xinhuagu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 10 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e6d0cc49-618a-4ee8-bd84-0673cdb743cc

📥 Commits

Reviewing files that changed from the base of the PR and between a5482e6 and 70e5132.

📒 Files selected for processing (3)

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java
aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/planner-threshold-4

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 40 minutes and 10 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

qodo-code-review · 2026-05-01T23:30:57Z

Review Summary by Qodo

Bump planner threshold 3→4 to reduce false-positive planning triggers

✨ Enhancement

Walkthroughs

Description

• Bump planner complexity threshold from 3 to 4
• Single +3 signals no longer trigger planning automatically
• Adding any second signal to +3 prompt enables planning
• Updated tests to pin threshold behavior and prevent silent regressions

Diagram

flowchart LR
  A["Single +3 signal<br/>e.g. refactor X"] -->|"Threshold 3"| B["Plan triggered<br/>too noisy"]
  A -->|"Threshold 4"| C["Plain ReAct<br/>no LLM call"]
  D["Single +3 signal<br/>+ second signal<br/>e.g. refactor + tests"] -->|"Threshold 4"| E["Plan triggered<br/>appropriate"]
  F["/plan command"] -->|"Force planning"| E

File Changes

1. aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java ⚙️ Configuration changes +8/-5

Increase default complexity threshold to 4

• Bumped DEFAULT_THRESHOLD constant from 3 to 4
• Expanded javadoc explaining threshold calibration rationale
• Documented trade-off: threshold 3 too noisy, threshold 5 too strict
• Clarified that single +3 signals stay as ReAct, second signals trigger planning

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java

2. aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java 🧪 Tests +23/-7

Update tests to pin threshold 4 behavior

• Renamed refactoring_highScore() to refactoring_singleSignalDoesNotPlan_atDefaultThreshold()
• Flipped assertion from shouldPlan() == true to shouldPlan() == false
• Added new test refactoring_plusSecondSignal_plans() to verify second signal triggers planning
• Added detailed comments explaining threshold behavior and escape hatch via /plan command

aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java

3. aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java ⚙️ Configuration changes +21/-11

Update config threshold and calibration documentation

• Bumped DEFAULT_PLANNER_THRESHOLD constant from 3 to 4
• Expanded javadoc with detailed threshold calibration history and rationale
• Documented why threshold 5 was too strict and threshold 3 was too noisy
• Explained middle-ground approach and /plan command escape hatch

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

qodo-code-review · 2026-05-01T23:30:59Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)

1. PlannerThreshold Javadoc outdated 🐞 Bug ⚙ Maintainability

Description

AceClawConfig.plannerThreshold() Javadoc still claims the default is 5 even though this PR changes
DEFAULT_PLANNER_THRESHOLD to 4, so public-facing documentation becomes incorrect and can cause wrong
tuning expectations.

Code

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[R78-99]

+     * Default complexity score for triggering the planner. Bumped
+     * from 5 → 4 (initially landed at 3, dialled back after review).
+     *
+     * <p>Threshold 5 required two explicit signals — most everyday
+     * agentic prompts hit at most one, so the planner essentially
+     * never fired. Threshold 3 went too far the other way: every
+     * single "refactor X" / "extract Y" (REFACTORING regex matches
+     * "extract" too) triggered a planner LLM call before any actual
+     * work, even on trivial prompts.
+     *
+     * <p>Threshold 4 is the middle ground: single-signal +3 prompts
+     * ("refactor X" alone) stay as plain ReAct, but adding ANY
+     * second signal (a long description, a second action, multiple
+     * files, testing, …) flips on planning. Users who explicitly
+     * want the planner on a borderline prompt can use
+     * {@code /plan <prompt>} as the escape hatch — that bypasses
+     * this heuristic entirely.
+     *
+     * <p>See {@link ComplexityEstimator} for the score table.
+     */
+    private static final int DEFAULT_PLANNER_THRESHOLD = 4;
    private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;

Evidence
This PR updates the actual default threshold constant to 4, but the accessor documentation in the
same class still states the default is 5, which is now false.
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`AceClawConfig.DEFAULT_PLANNER_THRESHOLD` is now 4, but the Javadoc for `plannerThreshold()` still says the default is 5.

### Issue Context
This mismatch is user-facing (API docs) and can lead to incorrect assumptions when configuring or debugging planner behavior.

### Fix Focus Areas
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]

### Suggested change
Update the `plannerThreshold()` Javadoc to state the correct default (4), ideally referencing the constant (e.g., `Defaults to {@value #DEFAULT_PLANNER_THRESHOLD}.`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Handler default threshold inconsistent 🐞 Bug ☼ Reliability

Description

StreamingAgentHandler still defaults plannerThreshold to 5, so if setPlannerConfig() isn’t invoked
(e.g., tests/alternate wiring), planner triggering behavior diverges from the new default
threshold=4.

Code

aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[R78-99]

+     * Default complexity score for triggering the planner. Bumped
+     * from 5 → 4 (initially landed at 3, dialled back after review).
+     *
+     * <p>Threshold 5 required two explicit signals — most everyday
+     * agentic prompts hit at most one, so the planner essentially
+     * never fired. Threshold 3 went too far the other way: every
+     * single "refactor X" / "extract Y" (REFACTORING regex matches
+     * "extract" too) triggered a planner LLM call before any actual
+     * work, even on trivial prompts.
+     *
+     * <p>Threshold 4 is the middle ground: single-signal +3 prompts
+     * ("refactor X" alone) stay as plain ReAct, but adding ANY
+     * second signal (a long description, a second action, multiple
+     * files, testing, …) flips on planning. Users who explicitly
+     * want the planner on a borderline prompt can use
+     * {@code /plan <prompt>} as the escape hatch — that bypasses
+     * this heuristic entirely.
+     *
+     * <p>See {@link ComplexityEstimator} for the score table.
+     */
+    private static final int DEFAULT_PLANNER_THRESHOLD = 4;
    private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;

Evidence
The system default threshold is now 4 (AceClawConfig + core estimator), but StreamingAgentHandler’s
internal default remains 5. Multiple daemon tests construct a StreamingAgentHandler without calling
setPlannerConfig, meaning they retain the internal default and can silently differ from production
wiring via AceClawDaemon.
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]
aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java[1741-1743]
aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawDaemon.java[542-563]
aceclaw-daemon/src/test/java/dev/aceclaw/daemon/AntiPatternGateOverrideRpcTest.java[20-31]
aceclaw-daemon/src/test/java/dev/aceclaw/daemon/DaemonIntegrationTest.java[109-116]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`StreamingAgentHandler` has an internal default `plannerThreshold = 5`, which no longer matches the system default threshold (now 4). Any code path (especially tests) that constructs a handler but does not call `setPlannerConfig()` will use the stale default.

### Issue Context
In production, `AceClawDaemon` calls `agentHandler.setPlannerConfig(config.plannerEnabled(), config.plannerThreshold())`, but several tests instantiate `StreamingAgentHandler` without calling `setPlannerConfig()`.

### Fix Focus Areas
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/StreamingAgentHandler.java[1741-1743]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawDaemon.java[542-563]
- aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99]

### Suggested change
Set `StreamingAgentHandler`’s default `plannerThreshold` to 4 (to match config default), or refactor so the handler cannot be used without explicitly setting planner config (constructor param / required setter call).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Stale threshold test comment 🐞 Bug ⚙ Maintainability

Description

ComplexityEstimatorTest still says the threshold is “now 3”, but this PR changes the default
threshold to 4, making the test misleading for future maintainers.

Code

aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[R18-27]

+     * {@code AceClawConfig.DEFAULT_PLANNER_THRESHOLD} — keep the
+     * two in sync. Settled at 4: too low (3) made every single
+     * "refactor X" / "extract Y" trigger a planner LLM call even on
+     * trivial prompts; too high (5) required two explicit signals
+     * which most prompts didn't hit. At 4, a single +3 signal alone
+     * stays as plain ReAct, but adding ANY second signal flips on
+     * planning.
     */
-    private static final int DEFAULT_THRESHOLD = 3;
+    private static final int DEFAULT_THRESHOLD = 4;

Evidence
The estimator’s default threshold is now 4, but the test file retains an outdated comment that
references 3.
aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]
aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java[12-18]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A comment in `ComplexityEstimatorTest` states the threshold is “now 3”, but the default threshold was bumped to 4.

### Issue Context
This doesn’t affect runtime behavior, but it will confuse readers and can lead to incorrect future edits.

### Fix Focus Areas
- aceclaw-core/src/test/java/dev/aceclaw/core/planner/ComplexityEstimatorTest.java[12-18]
- aceclaw-core/src/main/java/dev/aceclaw/core/planner/ComplexityEstimator.java[16-27]

### Suggested change
Edit the comment to say “now 4” or remove the hardcoded number and refer to the default threshold more generally.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

qodo-code-review · 2026-05-01T23:33:31Z

+     * Default complexity score for triggering the planner. Bumped
+     * from 5 → 4 (initially landed at 3, dialled back after review).
+     *
+     * <p>Threshold 5 required two explicit signals — most everyday
+     * agentic prompts hit at most one, so the planner essentially
+     * never fired. Threshold 3 went too far the other way: every
+     * single "refactor X" / "extract Y" (REFACTORING regex matches
+     * "extract" too) triggered a planner LLM call before any actual
+     * work, even on trivial prompts.
+     *
+     * <p>Threshold 4 is the middle ground: single-signal +3 prompts
+     * ("refactor X" alone) stay as plain ReAct, but adding ANY
+     * second signal (a long description, a second action, multiple
+     * files, testing, …) flips on planning. Users who explicitly
+     * want the planner on a borderline prompt can use
+     * {@code /plan <prompt>} as the escape hatch — that bypasses
+     * this heuristic entirely.
+     *
+     * <p>See {@link ComplexityEstimator} for the score table.
+     */
+    private static final int DEFAULT_PLANNER_THRESHOLD = 4;
    private static final boolean DEFAULT_ADAPTIVE_REPLAN_ENABLED = true;


1. Plannerthreshold javadoc outdated 🐞 Bug ⚙ Maintainability

AceClawConfig.plannerThreshold() Javadoc still claims the default is 5 even though this PR changes DEFAULT_PLANNER_THRESHOLD to 4, so public-facing documentation becomes incorrect and can cause wrong tuning expectations.

Agent Prompt

### Issue description `AceClawConfig.DEFAULT_PLANNER_THRESHOLD` is now 4, but the Javadoc for `plannerThreshold()` still says the default is 5. ### Issue Context This mismatch is user-facing (API docs) and can lead to incorrect assumptions when configuring or debugging planner behavior. ### Fix Focus Areas - aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[1022-1036] - aceclaw-daemon/src/main/java/dev/aceclaw/daemon/AceClawConfig.java[77-99] ### Suggested change Update the `plannerThreshold()` Javadoc to state the correct default (4), ideally referencing the constant (e.g., `Defaults to {@value #DEFAULT_PLANNER_THRESHOLD}.`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

greptile-apps Bot reviewed May 1, 2026

View reviewed changes

xinhuagu merged commit 747a38a into main May 1, 2026
7 of 8 checks passed

xinhuagu deleted the chore/planner-threshold-4 branch May 1, 2026 23:33

qodo-code-review Bot reviewed May 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: bump planner threshold 3→4 (follow-up to #467)#468

chore: bump planner threshold 3→4 (follow-up to #467)#468
xinhuagu merged 1 commit into
mainfrom
chore/planner-threshold-4

xinhuagu commented May 1, 2026

Uh oh!

coderabbitai Bot commented May 1, 2026

Rate limit exceeded

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Uh oh!

qodo-code-review Bot commented May 1, 2026

Uh oh!

qodo-code-review Bot commented May 1, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xinhuagu commented May 1, 2026

Summary

Test plan

Uh oh!

coderabbitai Bot commented May 1, 2026

Rate limit exceeded

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

qodo-code-review Bot commented May 1, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qodo-code-review Bot commented May 1, 2026 •

edited

Loading