Description
After Redis Replication pods restart (rolling update, eviction, OOM kill), sentinel replicas mymaster continues to return the old pod IPs with flags s_down,slave,disconnected. These stale entries are never cleaned up and accumulate over successive restarts.
This breaks any automation that relies on Sentinel for replica discovery (e.g., backup scripts using sentinel replicas to find a healthy replica to back up from).
Steps to Reproduce
Deploy RedisReplication with Sentinel enabled (3 replicas)
Note the current replica pod IPs
Trigger a rolling restart (kubectl rollout restart statefulset redis-replication)
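After all pods are running with new IPs, query Sentinel:
kubectl exec redis-replication-s-0 -n redis-replication -- redis-cli -p 26379 sentinel replicas mymaster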
ip
100.127.134.110 <-- old pod IP, no longer exists
flags
s_down,slave,disconnected
Meanwhile the actual replica at 100.127.186.202 may or may not appear depending on timing.
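Until this is fixed, automation that consumes this output (such as the backup scripts mentioned above) can filter on the flags field. The following is a minimal Python sketch, not part of the operator. It assumes the raw, non-TTY redis-cli output is a flat sequence of field/value lines and that each replica record starts with a name field. The helper names and port 6379 in the sample are hypothetical.

```python
from typing import Dict, List

# Flags that mark a replica entry as unusable for discovery.
UNHEALTHY_FLAGS = {"s_down", "o_down", "disconnected"}

# Hypothetical sample shaped like raw `sentinel replicas mymaster` output.
SAMPLE = """\
name
100.127.134.110:6379
ip
100.127.134.110
port
6379
flags
s_down,slave,disconnected
name
100.127.186.202:6379
ip
100.127.186.202
port
6379
flags
slave
"""


def parse_replicas(raw: str) -> List[Dict[str, str]]:
    """Group alternating field/value lines into one dict per replica."""
    lines = [ln.strip() for ln in raw.strip("\n").splitlines()]
    replicas: List[Dict[str, str]] = []
    for key, value in zip(lines[0::2], lines[1::2]):
        # Assumption: a "name" field opens each new replica record.
        if key == "name" or not replicas:
            replicas.append({})
        replicas[-1][key] = value
    return replicas


def healthy_replicas(replicas: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep entries flagged as replicas with no down/disconnected flag."""
    healthy = []
    for replica in replicas:
        flags = set(replica.get("flags", "").split(","))
        if "slave" in flags and not flags & UNHEALTHY_FLAGS:
            healthy.append(replica)
    return healthy


print([r["ip"] for r in healthy_replicas(parse_replicas(SAMPLE))])
# prints ['100.127.186.202']
```

This only hides the stale entries from the consumer. Sentinel still keeps them, so it is a client-side stopgap rather than a fix.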
Root Cause
This is a downstream effect of #1791 — the operator uses ephemeral pod IPs for REPLICAOF commands. Sentinel learns replica addresses from the master's INFO replication output, which reports pod IPs. When pods restart with new IPs:
Old IP becomes unreachable → Sentinel marks it s_down,disconnected
Sentinel never removes the stale entry (by design — it waits indefinitely for the replica to return)
New pod gets a new IP → appears as a separate replica entry
Over time, the sentinel replicas list grows with stale entries from every restart
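The accumulation described in the steps above can be illustrated with a toy model. This is not Sentinel's implementation, only a sketch of the bookkeeping under the assumption that every restart gives each replica a fresh IP and that old entries are never dropped. The function name and the 10.0.x.y addresses are made up for illustration.

```python
from typing import Dict


def simulate_restarts(replica_count: int, restarts: int) -> Dict[str, str]:
    """Return a toy replica table (ip -> flags) after N rolling restarts."""
    table: Dict[str, str] = {}
    live = [f"10.0.0.{i}" for i in range(1, replica_count + 1)]
    for ip in live:
        table[ip] = "slave"

    for rnd in range(1, restarts + 1):
        # Old IPs become unreachable; the entries stay in the table.
        for ip in live:
            table[ip] = "s_down,slave,disconnected"
        # Restarted pods come back with new IPs and appear as new entries.
        live = [f"10.0.{rnd}.{i}" for i in range(1, replica_count + 1)]
        for ip in live:
            table[ip] = "slave"
    return table


table = simulate_restarts(replica_count=2, restarts=3)
stale = [ip for ip, flags in table.items() if "s_down" in flags]
print(len(table), len(stale))  # prints 8 6
```

With two replicas and three restarts the table holds eight entries, of which only two are reachable, which matches the growth pattern reported here.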
Expected Behavior
sentinel replicas mymaster should only return currently reachable replicas. After a pod restart, the old entry should be cleaned up automatically.
Possible Fixes
Periodic sentinel reset: The operator could periodically run SENTINEL RESET mymaster to force Sentinel to re-discover replicas, clearing stale entries. This is a workaround, not a fix.
Reconcile loop cleanup: During reconciliation, the operator could compare Sentinel's replica list against actual pod IPs and remove stale entries via SENTINEL RESET.
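The comparison step in the reconcile loop cleanup could look roughly like the sketch below. It is written in Python for brevity and is not operator code. The function names are hypothetical, and how the operator would obtain the two IP lists is left out.

```python
from typing import Iterable, Set


def stale_replica_ips(sentinel_ips: Iterable[str], pod_ips: Iterable[str]) -> Set[str]:
    """Return IPs that Sentinel still lists but no running pod owns."""
    return set(sentinel_ips) - set(pod_ips)


def needs_sentinel_reset(sentinel_ips: Iterable[str], pod_ips: Iterable[str]) -> bool:
    """True when at least one Sentinel replica entry has no matching pod."""
    return bool(stale_replica_ips(sentinel_ips, pod_ips))


# Example using the addresses from this report.
sentinel_view = ["100.127.134.110", "100.127.186.202"]
running_pods = ["100.127.186.202"]
print(stale_replica_ips(sentinel_view, running_pods))
# prints {'100.127.134.110'}
```

If the check reports stale entries, the operator would then send SENTINEL RESET mymaster to each Sentinel. The Redis Sentinel documentation recommends doing this one instance after the other with a pause of at least 30 seconds between them, because a reset also discards the other state Sentinel holds for that master.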
Use headless DNS for REPLICAOF (fixes the root cause, same as #1791, which reports that replicas use pod IPs for REPLICAOF instead of headless DNS and that replication breaks after a master pod restart): If replicas use stable DNS names (redis-replication-1.redis-replication-headless.svc), Sentinel tracks DNS names instead of IPs. Pod restarts don't change the DNS name, so no stale entries accumulate.
Related Issues
Environment