fix(v3rpc): make lease keep-alive stream processing synchronous to prevent goroutine leak#21960
HarshalPatel1972 wants to merge 1 commit into
…event goroutine leak (etcd-io#16942) Signed-off-by: Harshal Patel <hp842484@gmail.com>
Description
Fixes #16942
This PR fixes a goroutine leak in the gRPC lease keep-alive stream handler. The leak occurs when a client disconnects unexpectedly, during a network partition, or when the stream context is canceled in rapid succession.
Technical Root Cause
Previously, `LeaseKeepAlive` decoupled the RPC lifecycle from the actual processing by running `ls.leaseKeepAlive(stream)` in a separate background goroutine (`go func()`), while the parent handler blocked in a `select` on `<-stream.Context().Done()`.
When a network drop or context cancellation unblocked the parent `select`, the parent handler returned immediately, signaling to gRPC that the stream was dead. However, the background goroutine remained blocked on `stream.Recv()`. Because the stream context was already torn down, `stream.Recv()` could never return a clean `io.EOF`, so the blocked goroutines accumulated silently and heap usage grew under unstable network conditions.
Resolution Strategy
- Changed `LeaseKeepAlive` to execute the loop handler synchronously and return its result: `return ls.leaseKeepAlive(stream)`.
- Because `stream.Recv()` is context-aware, removing the background goroutine ties the processing loop directly to the lifetime of the stream, so the loop exits on its own when the stream is canceled.
Testing & Reproduction steps
- Added `TestLeaseKeepAlive_GoroutineLeakOnDisconnect` in `server/etcdserver/api/v3rpc/lease_test.go`.
- Verified with `runtime.NumGoroutine()` that the background reader exits promptly after a disconnect without leaving goroutines behind.