Sentinel accumulates stale replica entries after pod restarts — sentinel replicas returns disconnected IPs #1793

@rmalisetti

Description

After Redis Replication pods restart (rolling update, eviction, OOM kill), sentinel replicas mymaster continues to return the old pod IPs with flags s_down,slave,disconnected. These stale entries are never cleaned up and accumulate over successive restarts.

This breaks any automation that relies on Sentinel for replica discovery (e.g., backup scripts using sentinel replicas to find a healthy replica to back up from).

Steps to Reproduce

  1. Deploy RedisReplication with Sentinel enabled (3 replicas)
  2. Note the current replica pod IPs
  3. Trigger a rolling restart (kubectl rollout restart statefulset redis-replication)
  4. After all pods are running with new IPs, query Sentinel:
kubectl exec redis-replication-s-0 -n redis-replication -- redis-cli -p 26379 sentinel replicas mymaster
  5. Output includes stale entries like:
ip
100.127.134.110        <-- old pod IP, no longer exists
flags
s_down,slave,disconnected

Meanwhile the actual replica at 100.127.186.202 may or may not appear depending on timing.
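
For automation that has to cope with this today, one option is to filter the Sentinel output by flags before picking a replica. The following is a minimal sketch, assuming the non-interactive redis-cli output is a flat list of alternating field names and values with each replica entry starting at a name field and no empty values. The helper names, the sample text, and port 6379 are illustrative assumptions, not part of the operator.

```python
# Sketch: parse flat `redis-cli sentinel replicas <master>` output and keep
# only replicas that Sentinel does not flag as down or disconnected.
# Assumption: output alternates "field" / "value" lines, and each replica
# entry begins with a "name" field.

UNHEALTHY_FLAGS = {"s_down", "o_down", "disconnected"}


def parse_sentinel_replicas(raw):
    """Return one dict per replica from flat key/value output."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    replicas = []
    current = {}
    for key, value in zip(lines[0::2], lines[1::2]):
        # A new "name" field marks the start of the next replica entry.
        if key == "name" and current:
            replicas.append(current)
            current = {}
        current[key] = value
    if current:
        replicas.append(current)
    return replicas


def healthy_replicas(replicas):
    """Drop entries carrying any flag in UNHEALTHY_FLAGS."""
    result = []
    for replica in replicas:
        flags = set(replica.get("flags", "").split(","))
        if not flags & UNHEALTHY_FLAGS:
            result.append(replica)
    return result


# Illustrative sample built from the IPs quoted in this issue.
SAMPLE_OUTPUT = """\
name
100.127.134.110:6379
ip
100.127.134.110
port
6379
flags
s_down,slave,disconnected
name
100.127.186.202:6379
ip
100.127.186.202
port
6379
flags
slave
"""

if __name__ == "__main__":
    parsed = parse_sentinel_replicas(SAMPLE_OUTPUT)
    print([r["ip"] for r in healthy_replicas(parsed)])
```

This only hides the stale entries from the caller; they remain in Sentinel's state.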

Root Cause

This is a downstream effect of #1791 — the operator uses ephemeral pod IPs for REPLICAOF commands. Sentinel learns replica addresses from the master's INFO replication output, which reports pod IPs. When pods restart with new IPs:

  1. Old IP becomes unreachable → Sentinel marks it s_down,disconnected
  2. Sentinel never removes the stale entry (by design — it waits indefinitely for the replica to return)
  3. New pod gets a new IP → appears as a separate replica entry
  4. Over time, the sentinel replicas list grows with stale entries from every restart
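
The accumulation described in the steps above can be illustrated with a toy model. This is not Sentinel's implementation; it only assumes that Sentinel keys replicas by ip:port, adds whatever the master reports, and never deletes an entry. The class name, IPs, and port are made up for the illustration.

```python
# Toy model of the replica table Sentinel keeps for one master.
# Assumption: entries are keyed by "ip:port" and are never removed.


class ToySentinelView:
    def __init__(self):
        self.replicas = {}  # "ip:port" -> flags string

    def refresh(self, ips_reported_by_master, port=6379):
        """Apply one round of the master's INFO replication output."""
        reported = {f"{ip}:{port}" for ip in ips_reported_by_master}
        # Replicas the master currently reports are added or marked healthy.
        for addr in reported:
            self.replicas[addr] = "slave"
        # Known replicas that are no longer reported stay in the table,
        # flagged as down, instead of being dropped.
        for addr in self.replicas:
            if addr not in reported:
                self.replicas[addr] = "s_down,slave,disconnected"

    def stale(self):
        return sorted(a for a, f in self.replicas.items() if "s_down" in f)


if __name__ == "__main__":
    view = ToySentinelView()
    view.refresh(["10.0.0.1", "10.0.0.2"])  # initial pod IPs
    view.refresh(["10.0.1.1", "10.0.1.2"])  # after first rolling restart
    view.refresh(["10.0.2.1", "10.0.2.2"])  # after second rolling restart
    print(len(view.replicas), "entries,", len(view.stale()), "stale")
```

With two replicas and two rolling restarts, the model ends up with six entries, four of them stale, which matches the growth pattern reported here.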

Expected Behavior

sentinel replicas mymaster should only return currently reachable replicas. After a pod restart, the old entry should be cleaned up automatically.

Possible Fixes

  1. Use headless DNS for REPLICAOF (fixes the root cause; same fix as #1791, where replicas use pod IPs for REPLICAOF instead of headless DNS and replication breaks after a master pod restart): If replicas use stable DNS names (redis-replication-1.redis-replication-headless.svc), Sentinel can track DNS names instead of IPs. Pod restarts don't change the DNS name, so no stale entries accumulate. This likely also requires Sentinel's hostname support (resolve-hostnames and announce-hostnames) to be enabled.

  2. Periodic sentinel reset: The operator could periodically run SENTINEL RESET mymaster to force Sentinel to re-discover replicas, clearing stale entries. This is a workaround, not a fix.

  3. Reconcile loop cleanup: During reconciliation, the operator could compare Sentinel's replica list against actual pod IPs and remove stale entries via SENTINEL RESET.
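
The decision logic for option 3 could look roughly like the sketch below. It is written in Python for brevity rather than in the operator's own language, and every name in it is hypothetical. The reset_master callable stands in for whatever mechanism would send SENTINEL RESET mymaster to each Sentinel pod, and the inputs are assumed to be already fetched from Sentinel and from the Kubernetes API.

```python
# Sketch of a reconcile step: reset Sentinel only when it tracks replica IPs
# that no longer belong to a running pod. All names here are hypothetical.


def find_stale_replicas(sentinel_replicas, live_pod_ips):
    """Return sorted replica IPs known to Sentinel but absent from live pods."""
    live = set(live_pod_ips)
    return sorted({r["ip"] for r in sentinel_replicas if r["ip"] not in live})


def reconcile_sentinel(sentinel_replicas, live_pod_ips, reset_master,
                       master_name="mymaster"):
    """Call reset_master(master_name) once if any stale entry exists.

    reset_master is injected so the caller decides how SENTINEL RESET is
    delivered (for example, to every Sentinel pod in turn).
    """
    stale = find_stale_replicas(sentinel_replicas, live_pod_ips)
    if stale:
        reset_master(master_name)
    return stale


if __name__ == "__main__":
    sentinel_view = [
        {"ip": "100.127.134.110", "flags": "s_down,slave,disconnected"},
        {"ip": "100.127.186.202", "flags": "slave"},
    ]
    calls = []
    stale = reconcile_sentinel(sentinel_view, ["100.127.186.202"], calls.append)
    print(stale, calls)
```

Comparing against pod IPs first avoids resetting on every reconcile. That matters because SENTINEL RESET clears all discovered state for the master, including its healthy replicas and the other Sentinels, which then have to be rediscovered.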

Environment

  • redis-operator: latest (main branch)
  • Redis: 8.0
  • Kubernetes: 1.32
  • Sentinel enabled with 3 sentinel pods
