Deploy and Operate a Redis Cluster in Kubernetes
Why Redis Cluster?
Redis is an open-source, in-memory data structure store that can be used as a distributed key-value database, cache, and message broker. We use Redis primarily for caching and for lightweight messaging between distributed components via its pub/sub channels.
To run reliably in production, Redis needs a high-availability (HA) setup. There are two typical HA configurations:
1. Redis Sentinel
The Sentinel configuration has one master node and multiple slave nodes replicating data from the master. The master node handles write traffic, while all nodes can serve read traffic. A new master is elected if the original master goes down. If your application's caching memory requirements exceed a single node's memory, or the workload is write-heavy enough to need multiple nodes to sustain write performance, Redis Cluster is where you should be looking.
2. Redis Cluster
Redis Cluster spreads data across a given number of Redis instances. Data is partitioned by key; each partition has a master node and a configured number of slave nodes holding replicas of that partition's data. Below is the high-level architecture diagram of a 3-master, 3-slave Redis cluster.

In this article, we will walk through how to provision and operate a Redis cluster in Kubernetes, as Redis Cluster is the desired configuration for our use case.
Why not GCP MemoryStore/Redis?
We run our software stack in GCP, so the GCP-managed Redis would seem a natural choice. However, the GCP MemoryStore/Redis option has some significant limitations:
- Only versions up to 5.0 are available
- Only the active/standby HA setup is available, no clustering option.
The latest Redis 6.0 release brings a significant performance improvement through multi-threaded I/O, and we prefer the Redis Cluster configuration in production. Therefore, we chose to provision and maintain our own Redis cluster in Kubernetes. In our use case the cluster uses a small amount of memory (up to 6GB altogether) and is light on CPU, so we ended up deploying a version 6.0.x Redis cluster into our existing K8S cluster without incurring additional hardware cost.
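For reference, Redis 6.0's threaded I/O is opt-in and is controlled through redis.conf; the thread count below is only an illustrative value, not a recommendation for your workload:
# redis.conf (Redis 6.0+): enable threaded I/O
# the thread count is an example; tune it to the available cores
io-threads 4
io-threads-do-reads yes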
Deploy Redis Cluster in Kubernetes
Assuming you have the Helm client installed on your dev box and your K8S client correctly configured to point to the target K8S cluster, you can run the following commands to deploy a Redis cluster into the K8S redis namespace using the Bitnami helm chart with the vanilla configuration.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install -n redis staging bitnami/redis-cluster
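Before moving on, a quick sanity check from the command line (assuming the redis namespace and staging release above) should show the cluster pods and services coming up:
kubectl get pods,svc -n redis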
After the deployment, you should see all the Redis cluster components in the GKE console, as in the following screenshot. We will cover the customization of the helm chart values.yaml later.

Remember that the helm chart deployment will generate a random password for the Redis cluster. You can retrieve the password through the command line:
export REDIS_PASSWORD=$(kubectl get secret --namespace redis staging-redis-cluster -o jsonpath="{.data.redis-password}" | base64 --decode)
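To verify connectivity with that password, one option is a throwaway client pod inside the cluster; the image tag and the service hostname below are assumptions based on the chart defaults:
kubectl run -n redis redis-client --rm -it --env REDIS_PASSWORD=$REDIS_PASSWORD --image docker.io/bitnami/redis-cluster:6.0 -- bash
# inside the pod: -c enables cluster mode so redis-cli follows MOVED redirects
redis-cli -c -h staging-redis-cluster -a "$REDIS_PASSWORD" cluster info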
Customize values.yaml
Why customization? Because the default Redis cluster helm chart configurations might not be optimal for your use case.
Make a local copy of values.yaml from https://github.com/bitnami/charts/blob/master/bitnami/redis-cluster/values.yaml. You can modify the content of values.yaml and apply the config changes to the Redis cluster by running:
helm upgrade -n redis -f values.yaml staging bitnami/redis-cluster
There are a lot of configurations that can be customized in values.yaml. Below is a simple example of increasing the default number of nodes in the Redis cluster from 3 to 6.
## Redis Cluster settings
cluster:
  init: true
  nodes: 6
  replicas: 1
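Other settings can be tuned the same way. For example, here is a sketch of capping per-node resources; the exact keys should be double-checked against the chart's values.yaml, and the numbers below are purely illustrative:
## Redis node resources (illustrative values)
redis:
  resources:
    requests:
      cpu: 100m
      memory: 1Gi
    limits:
      memory: 1Gi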
Access Redis Cluster through Redisinsight
Although some people are perfectly happy and productive using the redis-cli command line util to interact with the Redis cluster, I've found it more intuitive and productive to use a web UI for the same tasks. There are a few open source web UIs available, but we have opted for redisinsight, developed by RedisLabs. The web UI can be deployed into K8S as a Deployment. Below is a slightly modified version of what's provided in the RedisLabs official documentation. The main difference is that a PVC (persistent volume claim) has been added, so that the configurations won't be lost due to restarts:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redisinsight-pv-claim
  labels:
    app: redisinsight
  namespace: redis
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redisinsight
  namespace: redis
  labels:
    app: redisinsight
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redisinsight
  template:
    metadata:
      labels:
        app: redisinsight
    spec:
      containers:
        - name: redisinsight
          image: redislabs/redisinsight:1.9.0
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: db
              mountPath: /db
          ports:
            - containerPort: 8001
              protocol: TCP
      volumes:
        - name: db
          persistentVolumeClaim:
            claimName: redisinsight-pv-claim
Save the above YAML as redisinsight.yaml and deploy it into K8S by running:
kubectl apply -f redisinsight.yaml
After the completion of the deployment, run the port forwarding:
kubectl port-forward deployment/redisinsight -n redis 8001
Then you can access the redisinsight web UI by opening http://localhost:8001 in your web browser. Click the Connect to a Redis Database button in the UI, and the following popup window will show up:

The Host IP is the Redis cluster service's IP, available in the K8S console. Port is the default Redis port, 6379. The Username default value is default. Name can be any name of your choice. The Password needs to be retrieved from the Kubernetes secret through the kubectl command line as described in the previous section. After clicking the ADD REDIS DATABASE button, it will prompt you to choose the Redis cluster members to use as seeds to connect to the cluster; you can choose all of them or any one of them. Once the connection configuration is done, you should be able to access a nicely featured, fully functional web UI to view and manage the Redis cluster you've just installed.
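If you'd rather not dig through the console for the Host value, the service's cluster IP (and the password, as before) can be pulled with kubectl; the service name assumes the staging release from earlier:
kubectl get svc -n redis staging-redis-cluster -o jsonpath='{.spec.clusterIP}'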

As you can see in the above screenshot, there are 3 master and 3 slave nodes in the Redis cluster we have just provisioned. It also shows how many keys are in each partition and how much memory is being used.
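The same topology can also be inspected from the command line, for example from a client pod inside the cluster (host and auth follow the earlier redis-cli example):
redis-cli -c -h staging-redis-cluster -a "$REDIS_PASSWORD" cluster nodes
redis-cli -c -h staging-redis-cluster -a "$REDIS_PASSWORD" cluster info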
Automate Cluster Upgrade using CircleCI
Since we have committed the values.yaml for the Redis cluster to a GitHub repository, we would like to automate the cluster upgrade through a CI/CD tool to avoid error-prone manual operations upon changes, including:
- A new Redis image containing bug fixes or feature enhancements
- New cluster configurations
We are using CircleCI, as we have been using it to automate interactions with K8S clusters from the very beginning. Both image name changes and other configuration updates result in a modification to the values.yaml file being merged into the master branch, which triggers CircleCI to initiate a new deployment workflow:

In the above screenshot, we see a new workflow with Approval steps for the staging and production environments to upgrade the corresponding Redis clusters.
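Below is a rough sketch of what such a workflow can look like; the job names, executor image, release names, and helm invocations are assumptions rather than our actual pipeline, and kubeconfig/credential setup is omitted:
# .circleci/config.yml (sketch)
version: 2.1
jobs:
  upgrade-staging-redis:
    docker:
      - image: cimg/base:stable   # assumes kubectl and helm get installed/configured in real steps
    steps:
      - checkout
      - run: helm upgrade -n redis -f values.yaml staging bitnami/redis-cluster
  upgrade-production-redis:
    docker:
      - image: cimg/base:stable
    steps:
      - checkout
      - run: helm upgrade -n redis -f values.yaml production bitnami/redis-cluster
workflows:
  redis-cluster-upgrade:
    jobs:
      - approve-staging:
          type: approval
          filters:
            branches:
              only: master
      - upgrade-staging-redis:
          requires: [approve-staging]
      - approve-production:
          type: approval
          requires: [upgrade-staging-redis]
      - upgrade-production-redis:
          requires: [approve-production]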
Enable DataDog Monitoring
You will need to have the DataDog DaemonSet agent installed in the target K8S cluster and the Redis Integration enabled in the DataDog console. Add the following pod annotations specific to DataDog monitoring to values.yaml and redeploy through the helm upgrade command as mentioned above:
podAnnotations:
  ad.datadoghq.com/redis.check_names: '["redisdb"]'
  ad.datadoghq.com/redis.init_configs: '[{}]'
  ad.datadoghq.com/redis.instances: '[{"host":"%%host%%", "port":"6379", "password":"%%env_REDIS_PASSWORD%%"}]'
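After the upgrade, you can spot-check that the annotations landed on the Redis pods; the pod name below assumes the staging release naming:
kubectl get pod -n redis staging-redis-cluster-0 -o jsonpath='{.metadata.annotations}'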
Phantom Node IP upon Restart Issue
A few times after the Redis cluster was restarted for maintenance, the cluster got into a weird state. The redisinsight UI had trouble connecting to the cluster; it seemed some node IP in the cluster was not accessible at all, causing random connection issues for the Redis client applications. It looks like this is a known problem, described in issue 4645. The stale IP is the old IP of one of the Redis nodes from before the restart. A local patch to the Redis node startup command in redis-statefulset.yaml in the helm chart solved the issue:
args:
  # ......
  local_pod_ip=$(hostname -i)
  sed -i -e '/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,4\}:/'${local_pod_ip}':/g' /bitnami/redis/data/nodes.conf
  # Actual command to start the Redis service
  /opt/bitnami/scripts/redis-cluster/entrypoint.sh /opt/bitnami/scripts/redis-cluster/run.sh
The sed command was added to replace the IP in nodes.conf with the up-to-date local pod IP, making sure the obsolete old IP is wiped out consistently. After this change, the issue did not come back.
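To confirm the fix after a restart, the file can be inspected directly in a node's pod (the pod name assumes the staging release naming); the IP on the myself line should match the pod's current IP:
kubectl exec -n redis staging-redis-cluster-0 -- cat /bitnami/redis/data/nodes.conf
kubectl get pod -n redis staging-redis-cluster-0 -o jsonpath='{.status.podIP}'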
Redis Client Connection Issue upon Cluster Restart
The Redis client applications use the Redis cluster's service IP to connect to the cluster. The client application is a Java application that uses the Redisson Java client library to talk to the cluster. The client library uses the service IP as the seed to discover the IPs of all the nodes in the Redis cluster. After initialization completes, the client uses individual node IPs instead of the fronting service IP for further communication. If the Redis cluster is restarted for an upgrade or other maintenance operations, the Redis nodes in the cluster come back up on new IPs even though the service IP remains the same. The client applications end up holding connections to the old IPs and running into Redis connection errors.
To recover from this type of issue, we added logic in the client applications to do the following when encountering Redis connection errors (a rough sketch follows the list):
- Reinitialize the Redis cluster connection through the service IP
- Retry the failed Redis operations with a specified retry count and interval
- Emit the Redis connection errors as custom metrics to DataDog
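Below is a minimal sketch of that recovery pattern, assuming the Redisson client; the service hostname, environment variable, retry parameters, and the metrics helper are placeholders, not our production code:
// Minimal sketch (not production code): reconnect-and-retry wrapper around Redisson calls.
import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.client.RedisException;
import org.redisson.config.Config;

import java.util.function.Supplier;

public class ResilientRedisClusterClient {

    private volatile RedissonClient client = newClient();

    // Build a client seeded with the K8S service address; Redisson discovers node IPs from it.
    private static RedissonClient newClient() {
        Config config = new Config();
        config.useClusterServers()
              .addNodeAddress("redis://staging-redis-cluster.redis.svc.cluster.local:6379")
              .setPassword(System.getenv("REDIS_PASSWORD"));
        return Redisson.create(config);
    }

    // Run a Redis operation; on a connection error, reinitialize the client through the
    // service address, emit a metric, and retry up to maxAttempts times (assumed >= 1).
    public <T> T withRetry(Supplier<T> operation, int maxAttempts, long retryIntervalMillis) {
        RedisException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RedisException e) {
                lastError = e;
                emitConnectionErrorMetric(e);   // e.g. bump a custom DataDog/StatsD counter
                client.shutdown();
                client = newClient();           // reinitialize via the (stable) service IP
                sleepQuietly(retryIntervalMillis);
            }
        }
        throw lastError;
    }

    public RedissonClient client() {
        return client;
    }

    private void emitConnectionErrorMetric(RedisException e) {
        // placeholder: wire this to your metrics client
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
A call site would wrap each Redis operation in the helper, for example withRetry(() -> client().getBucket("some-key").get(), 3, 500).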
After these changes, we are able to see when these errors happen and whether they are lasting issues. So far, they have all been transient, and the system quickly recovered from them.
Have you seen any other issues with Redis Cluster running in a K8S environment? I would love to hear about them and learn from you.