perf: replace O(n) full-run scan in count_available_for_run with incremental CapacityHint counters #138

@coderabbitai

Background

count_available_for_run() in crates/gossip-coordination-etcd/src/backend.rs currently performs a full prefix scan of every shard in a run (scan_run_shards), re-decoding each shard record into a fresh Vec/HashMap/ByteSlab on every call. Because it is invoked after every successful acquire_and_restore_into and renew operation, those steady-state hot-path operations are O(n) in the run's shard count and incur significant allocation churn on every call.

This violates the project's allocation policy: HOT paths (per-shard/per-claim/per-tick steady-state loops) must remain allocation-silent where practical.

Intended Fix

Replace the full scan with incrementally maintained per-run CapacityHint counters that are updated whenever a shard's status or lease changes (acquire, renew, terminal transitions, etc.). The cached hint should be read directly from coordinator state in acquire/renew instead of calling count_available_for_run.
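As a rough illustration of the incremental approach, the sketch below keeps one counter per status and adjusts it on each transition, so reading the hint is O(1) and allocation-free. The field layout of CapacityHint is not shown in this issue, so `RunCounters`, `ShardStatus`, and their members are hypothetical stand-ins, not the crate's actual types.

```rust
/// Hypothetical shard status; the real enum in the crate may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShardStatus {
    Available,
    Leased,
    Done,
}

/// Hypothetical per-run counters standing in for the cached CapacityHint.
/// Plain integers only, so updates and reads never allocate.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct RunCounters {
    pub available: u64,
    pub leased: u64,
    pub terminal: u64,
}

impl RunCounters {
    /// Maps a status to the counter that tracks it.
    fn slot(&mut self, status: ShardStatus) -> &mut u64 {
        match status {
            ShardStatus::Available => &mut self.available,
            ShardStatus::Leased => &mut self.leased,
            ShardStatus::Done => &mut self.terminal,
        }
    }

    /// Records a newly created shard in its initial status.
    pub fn on_insert(&mut self, status: ShardStatus) {
        *self.slot(status) += 1;
    }

    /// Moves one shard from `from` to `to`. Saturating subtraction keeps a
    /// missed update from underflowing the counter.
    pub fn on_transition(&mut self, from: ShardStatus, to: ShardStatus) {
        if from == to {
            return; // e.g. a renew that keeps the shard leased
        }
        let old = self.slot(from);
        *old = old.saturating_sub(1);
        *self.slot(to) += 1;
    }
}
```

With counters like these, acquire and renew would read `available` directly instead of rescanning the run; the one-time cost moves to whichever code path first populates the counters.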

Design considerations:

  • Counters must remain consistent across CAS retries (only committed on txn success).
  • Lease revocations are best-effort and must not corrupt counts: rely on etcd-side TTL expiry detection on the next read instead of decrementing on revoke.
  • Future terminal transitions (complete, park, split) must update the same counters.
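One way to satisfy the first consideration is to compute a delta per attempt and fold it into the cached hint only when the transaction reports success. The sketch below assumes this shape; `HintDelta`, `CachedHint`, and `with_cas_retry` are hypothetical names, and the closure stands in for a real etcd transaction. TTL expiry detection on read is not modeled here.

```rust
/// Hypothetical signed change produced by one successful transaction.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct HintDelta {
    pub available: i64,
    pub leased: i64,
}

/// Hypothetical cached per-run hint held in coordinator state.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct CachedHint {
    pub available: u64,
    pub leased: u64,
}

impl CachedHint {
    /// Applies a committed delta, saturating at the u64 bounds.
    pub fn commit(&mut self, delta: HintDelta) {
        self.available = self.available.saturating_add_signed(delta.available);
        self.leased = self.leased.saturating_add_signed(delta.leased);
    }
}

/// Runs `attempt` up to `max_retries + 1` times. `attempt` returns
/// `Some(delta)` when its transaction committed and `None` on a CAS
/// conflict. The hint is touched only after a commit, so failed attempts
/// leave it unchanged. Returns whether any attempt committed.
pub fn with_cas_retry<F>(hint: &mut CachedHint, max_retries: usize, mut attempt: F) -> bool
where
    F: FnMut() -> Option<HintDelta>,
{
    for _ in 0..=max_retries {
        if let Some(delta) = attempt() {
            hint.commit(delta);
            return true;
        }
    }
    false
}
```

Keeping the delta as a return value, rather than mutating the hint inside the attempt, means a retried acquire cannot be counted twice. Terminal transitions such as complete, park, and split could return their own deltas through the same path.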

References

Requested by: @ahrav
