Background
count_available_for_run() in crates/gossip-coordination-etcd/src/backend.rs currently performs a full prefix-scan of every shard in a run (scan_run_shards), re-decoding each shard record into a fresh Vec/HashMap/ByteSlab on every call. This function is invoked after every successful acquire_and_restore_into and renew operation, making those steady-state hot-path operations O(n) in shard count with significant per-call allocation churn.
This violates the project's allocation policy: HOT paths (per-shard/per-claim/per-tick steady-state loops) must remain allocation-silent where practical.
Intended Fix
Replace the full scan with incrementally maintained per-run CapacityHint counters that are updated whenever a shard's status or lease changes (acquire, renew, terminal transitions, etc.). The cached hint should be read directly from coordinator state in acquire/renew instead of calling count_available_for_run.
Design considerations:
- Counters must remain consistent across CAS retries (only committed on txn success).
- Lease revocations are best-effort and must not corrupt counts: rely on etcd-side TTL expiry detection on the next read instead of decrementing on revoke.
- Future terminal transitions (complete, park, split) must update the same counters.
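The considerations above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual design: only the name CapacityHint comes from this issue, while ShardState, RunHints, commit_transition, the u64 run id, and the three-counter layout are hypothetical. The etcd transaction is modeled as a closure returning Ok(true) when the CAS committed, and TTL-expiry detection on read is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical shard states; the real status type in the crate may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShardState {
    Available,
    Leased,
    Terminal,
}

/// Assumed layout for the per-run hint: one counter per state.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct CapacityHint {
    pub available: u64,
    pub leased: u64,
    pub terminal: u64,
}

impl CapacityHint {
    fn slot(&mut self, state: ShardState) -> &mut u64 {
        match state {
            ShardState::Available => &mut self.available,
            ShardState::Leased => &mut self.leased,
            ShardState::Terminal => &mut self.terminal,
        }
    }

    /// Move one shard from `from` to `to`. A same-state transition
    /// (for example a renew) leaves every counter untouched.
    pub fn apply(&mut self, from: ShardState, to: ShardState) {
        if from == to {
            return;
        }
        let src = self.slot(from);
        *src = src.saturating_sub(1);
        *self.slot(to) += 1;
    }
}

/// Hypothetical coordinator-side cache of hints, keyed by run id.
#[derive(Default)]
pub struct RunHints {
    hints: HashMap<u64, CapacityHint>,
}

impl RunHints {
    /// Seed a run's counters once (e.g. from a single full scan at startup).
    pub fn seed(&mut self, run_id: u64, hint: CapacityHint) {
        self.hints.insert(run_id, hint);
    }

    /// Run `txn` (a stand-in for the etcd CAS) and update the counters only
    /// if it reports a commit. A lost CAS or an error leaves them unchanged,
    /// so retries cannot double-count.
    pub fn commit_transition<E>(
        &mut self,
        run_id: u64,
        from: ShardState,
        to: ShardState,
        txn: impl FnOnce() -> Result<bool, E>,
    ) -> Result<bool, E> {
        let committed = txn()?;
        if committed {
            if let Some(hint) = self.hints.get_mut(&run_id) {
                hint.apply(from, to);
            }
        }
        Ok(committed)
    }

    /// Hot-path read: a map lookup with no scan and no allocation.
    pub fn available(&self, run_id: u64) -> Option<u64> {
        self.hints.get(&run_id).map(|h| h.available)
    }
}
```

In this sketch, acquire and renew would call available(run_id) instead of count_available_for_run, and any future terminal transition (complete, park, split) would go through the same commit_transition path so that all counter updates share one code path.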
References
Requested by: @ahrav