Deploy and Operate a Redis Cluster in Kubernetes

Why Redis Cluster?

Redis is an open-source, in-memory data structure store. It can be used as a distributed key-value database, cache, and message broker. We use Redis primarily for caching and lightweight messaging between distributed components via its pub/sub channels.

To run reliably in production, Redis needs a high-availability (HA) setup. There are typically two HA configurations:

  1. Redis Sentinel

The Sentinel configuration has one master node and multiple slave nodes replicating data from the master. The master node handles write traffic, and all nodes can serve read traffic. A new master is elected if the original master goes down. If your application's caching memory requirements exceed a single machine's memory, or your workload is write-heavy and needs multiple nodes to maintain write performance, Redis Cluster is where you should be looking.

  2. Redis Cluster

Redis Cluster is configured to spread data across a given number of Redis instances. Data is partitioned by key, and each partition has a master node and a configured number of slave nodes holding replicated copies of the partition's data. Below is the high-level architecture diagram of a 3-master, 3-slave Redis cluster.

In this article, we will walk through how to provision and operate a Redis cluster in Kubernetes, as that is the desired configuration for our use case.

Why not GCP MemoryStore/Redis?

We run our software stack in GCP, so GCP-managed Redis would be the natural choice. But the GCP MemoryStore/Redis option has some significant limitations:

  • Only versions up to 5.0 are available
  • Only the active/standby HA setup is available; there is no clustering option

The latest 6.0 Redis release has significant performance improvements thanks to multi-threaded IO, and we would prefer to use the Redis Cluster configuration in production. Therefore we chose to provision and maintain our own Redis cluster in Kubernetes. In our use case the cluster uses a small amount of memory (up to 6GB altogether) and is light on CPU, so we ended up deploying a version 6.0.x Redis cluster into our existing K8S cluster without incurring additional hardware cost.

Deploy Redis Cluster in Kubernetes

Assume you have the Helm client installed on your dev box and your K8S client correctly configured to point to the target K8S cluster. You can run the following commands to deploy a Redis cluster into the redis namespace using the Bitnami helm chart with the vanilla configuration.
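A minimal sketch of those commands, assuming the release is named redis-cluster (any release name works):

    # Register the Bitnami chart repository
    helm repo add bitnami https://charts.bitnami.com/bitnami
    # Create the target namespace and install the chart with default values
    kubectl create namespace redis
    helm install redis-cluster bitnami/redis-cluster --namespace redis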

After the deployment, you should see all the Redis cluster components in the GKE console, as in the screenshot below. We will cover the customization of the helm chart values.yaml later.

Remember that the helm chart deployment generates a random password for the Redis cluster. You can retrieve the password from the command line:
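A sketch, assuming the release name redis-cluster from the install step (the Bitnami chart stores the password in a Kubernetes secret under the key redis-password):

    export REDIS_PASSWORD=$(kubectl get secret --namespace redis redis-cluster \
      -o jsonpath="{.data.redis-password}" | base64 --decode)
    echo $REDIS_PASSWORD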

Customize values.yaml

Why customize? Because the default Redis cluster helm chart configuration might not be optimal for your use case.

Make a local copy of values.yaml from https://github.com/bitnami/charts/blob/master/bitnami/redis-cluster/values.yaml. You can modify the content in values.yaml and apply the config changes to the Redis cluster by running:
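A sketch, reusing the release name from the install step:

    helm upgrade redis-cluster bitnami/redis-cluster --namespace redis -f values.yaml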

There are a lot of configurations that can be customized in values.yaml. Below is a simple example of increasing the default number of nodes from 3 to 6 in the Redis cluster.
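A hypothetical excerpt of values.yaml; the exact key names depend on the chart version (recent Bitnami redis-cluster charts group them under a cluster block):

    cluster:
      nodes: 6      # total number of Redis nodes in the cluster
      replicas: 1   # number of replicas attached to each master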

Access the Redis Cluster through RedisInsight

Although some people are perfectly happy and productive using the redis-cli command-line utility to interact with the Redis cluster, I've found it more intuitive and productive to use a web UI for the same tasks. There are a few open-source web UIs available, but we opted for RedisInsight, developed by Redis Labs. The web UI can be deployed into K8S as a Deployment. Below is a slightly modified version of what is provided in the Redis Labs official documentation. The main difference is that a PVC (persistent volume claim) has been added, so that the configuration won't be lost on restart:
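The manifest below is a sketch of that modified deployment, adapted from the RedisInsight documentation; the resource names, storage size, and image tag are assumptions (a v1-era tag is used since the UI here listens on port 8001):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: redisinsight-pvc
      namespace: redis
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redisinsight
      namespace: redis
      labels:
        app: redisinsight
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redisinsight
      template:
        metadata:
          labels:
            app: redisinsight
        spec:
          containers:
            - name: redisinsight
              image: redislabs/redisinsight:1.14.0
              ports:
                - containerPort: 8001   # RedisInsight web UI port
              volumeMounts:
                - name: db
                  mountPath: /db        # RedisInsight persists its configuration here
          volumes:
            - name: db
              persistentVolumeClaim:
                claimName: redisinsight-pvc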

Save the above YAML into redisinsight.yaml and deploy it into K8S by running:
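    kubectl apply -f redisinsight.yaml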

After the deployment completes, start the port forwarding:
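A sketch, assuming the Deployment name and namespace used above:

    kubectl port-forward deployment/redisinsight 8001:8001 --namespace redis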

Then you can access the RedisInsight web UI by opening http://localhost:8001 in your web browser. Click the Connect to a Redis Database button in the UI, and the following popup window will show up:

The Host is the Redis cluster service's IP, available in the K8S console. Port is the default Redis port, 6379. Username defaults to default. Name can be any name of your choice. The Password needs to be retrieved from the Kubernetes secret through the kubectl command line as described in the previous section. After clicking the ADD REDIS DATABASE button, it will prompt you to choose all or any one of the Redis cluster members as the seed node(s) to connect to the cluster; either choice works. Once the connection configuration is done, you should have a nicely featured, fully functional web UI to view and manage the Redis cluster you've just installed.

As you can see in the above screenshot, there are 3 master and 3 slave nodes in the Redis cluster we have just provisioned. It also shows how many keys are in each partition and how much memory is being used.

Automate Cluster Upgrade using CircleCI

Since we have committed the values.yaml for the Redis cluster to a GitHub repository, we would like to automate cluster upgrades through a CI/CD tool to avoid error-prone manual operations on changes such as:

  • A new Redis image containing bug fixes or feature enhancements
  • New cluster configurations

We are using CircleCI, as we have been using it to automate interactions with K8S clusters since the very beginning. Both image name changes and other configuration updates result in a modification to the values.yaml file being merged into the master branch, which triggers CircleCI to initiate a new deployment workflow:

In the above screenshot, we see a new workflow with approval steps for the staging and production environments to upgrade the corresponding Redis clusters.
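As a rough sketch of what such a workflow might look like in .circleci/config.yml; the job names, executor image, and helm invocation are assumptions, and installing helm/kubectl plus the GKE credentials setup are omitted for brevity:

    version: 2.1
    jobs:
      upgrade-redis-staging:
        docker:
          - image: cimg/base:stable
        steps:
          - checkout
          - run:
              name: Upgrade staging Redis cluster via Helm
              command: |
                helm upgrade redis-cluster bitnami/redis-cluster \
                  --namespace redis -f values.yaml
    workflows:
      redis-cluster-upgrade:
        jobs:
          # Manual approval gate before touching staging; production
          # would mirror this pair of jobs with its own gate.
          - hold-staging:
              type: approval
              filters:
                branches:
                  only: master
          - upgrade-redis-staging:
              requires:
                - hold-staging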

Enable DataDog Monitoring

You will need the DataDog DaemonSet agent installed in the target K8S cluster and the Redis integration enabled in the DataDog console. Add the following DataDog-specific pod annotations to values.yaml and redeploy through the helm upgrade command as mentioned above:
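A sketch using Datadog's Autodiscovery annotation format for the redisdb check. It assumes the Redis container in the pod is named redis-cluster, and the exact nesting of podAnnotations in values.yaml depends on the chart version (password configuration for the check is omitted here):

    redis:
      podAnnotations:
        ad.datadoghq.com/redis-cluster.check_names: '["redisdb"]'
        ad.datadoghq.com/redis-cluster.init_configs: '[{}]'
        ad.datadoghq.com/redis-cluster.instances: '[{"host": "%%host%%", "port": "6379"}]'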

Phantom Node IP Issue upon Restart

A few times after the Redis cluster was restarted for maintenance, the cluster got into a weird state. The RedisInsight UI had trouble connecting to the cluster, and some node IPs in the cluster were not accessible at all, causing random connection issues for the Redis client applications. It looks like a known problem, as described in issue 4645: the stale IP is the old IP of one of the Redis nodes from before the restart. A local patch to the Redis node startup command in redis-statefulset.yaml in the helm chart solved the issue:
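Below is a sketch of the patched startup script, based on the commonly cited fix for this issue; the nodes.conf path and the POD_IP environment variable (injected via the Kubernetes downward API) are assumptions for the Bitnami chart layout:

    # Rewrite this node's own entry ("myself") in nodes.conf so the stale
    # pre-restart IP is replaced with the pod's current IP before startup.
    if [ -f /bitnami/redis/data/nodes.conf ]; then
      sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" \
        /bitnami/redis/data/nodes.conf
    fi
    exec redis-server /opt/bitnami/redis/etc/redis.conf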

The sed command was added to replace the IP in nodes.conf with the up-to-date local pod IP, making sure the obsolete old IP is wiped out consistently.

After this change, the issue did not come back.

Redis Client Connection Issue upon Cluster Restart

The Redis client applications use the Redis cluster's service IP to connect to the cluster. The client application is a Java application that uses the Redisson client library to talk to the cluster. The client library uses the service IP as a seed to discover the IPs of all the nodes in the Redis cluster. After initialization completes, the client uses the individual nodes' IPs instead of the fronting service IP for further communication. If the Redis cluster is restarted due to an upgrade or other maintenance operation, the Redis nodes in the cluster will come up on new IPs even though the service IP remains the same. The client applications end up holding connections to the old IPs and running into Redis connection errors.

To recover from this type of issue, we added logic in the client applications to do the following when encountering Redis connection errors:

  • Reinitialize the Redis cluster connection through the service IP
  • Retry the failed Redis operations with a specified retry count and interval
  • Emit the Redis connection errors as custom metrics to DataDog

After these changes, we were able to know when these errors happen and whether they are lasting. So far, we have seen them as transient, and the system quickly recovered from this type of error.
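As a rough Java sketch of the first two items, a Redisson client can seed through the K8S service DNS name and be given retry settings; the service address and retry values below are placeholder assumptions, not our exact configuration:

    // Minimal Redisson cluster client configuration sketch.
    import org.redisson.Redisson;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class RedisClientFactory {
        public static RedissonClient create(String password) {
            Config config = new Config();
            config.useClusterServers()
                  // Seed through the K8S service name so reinitializing the
                  // client resolves to live node IPs after a cluster restart.
                  .addNodeAddress("redis://redis-cluster.redis.svc.cluster.local:6379")
                  .setPassword(password)
                  .setRetryAttempts(3)      // retry failed operations
                  .setRetryInterval(1500);  // milliseconds between retries
            return Redisson.create(config);
        }
    }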

Have you seen any other issues with a Redis cluster running in a K8S environment? I would love to hear and learn from you.

