Zero-downtime RabbitMQ migration to Kubernetes





RabbitMQ is a message broker written in Erlang that lets you build a failover cluster with full data replication across several nodes, where each node can serve read and write requests. Operating many Kubernetes clusters in production, we maintain a large number of RabbitMQ installations and have faced the need to migrate data from one cluster to another without downtime.

We needed this operation in at least two cases:

  1. Transferring data from a RabbitMQ cluster that is not running in Kubernetes to a new, already "Kubernetized" cluster (i.e. one running in K8s pods).
  2. Migrating RabbitMQ within Kubernetes from one namespace to another (for example, if environments are separated by namespaces, to move the infrastructure from one environment to another).

The recipe proposed in this article focuses on (but is by no means limited to) situations where there is an old RabbitMQ cluster (of, say, 3 nodes) located either in K8s or on some old servers, and an application hosted in Kubernetes (already there or moving there) that works with it:



... and we are faced with the task of migrating it to a new production in Kubernetes.

First, the general approach to the migration itself will be described, followed by the technical details of its implementation.

Migration Algorithm


The first, preliminary stage before any action is to check that high availability (HA) is enabled in the old RabbitMQ installation. The reason is obvious: we do not want to lose any data. To perform this check, open the RabbitMQ management panel and, in the Admin → Policies tab, make sure the value is set to ha-mode: all:
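The same check can also be performed from the command line on any node of the old cluster; a minimal sketch, assuming the policy lives in the default vhost /:

  # The output should contain the ha-all policy with "ha-mode":"all".
  rabbitmqctl list_policies -p /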



The next step is to bring up a new RabbitMQ cluster in Kubernetes pods (in our case, for example, one of 3 nodes, although their number may differ).

After that, we merge the old and new RabbitMQ clusters, obtaining a single cluster of 6 nodes:



Then data synchronization between the old and new RabbitMQ clusters is initiated. Once all the data has been synchronized across every node in the cluster, we can switch the application over to the new cluster:



After these operations, it is enough to remove the old nodes from the RabbitMQ cluster, and the move can be considered complete:



We have used this scheme in production many times. However, for our own convenience, we implemented it as part of a specialized system that rolls out typical RMQ configurations across sets of Kubernetes clusters (for the curious, this is the addon-operator we recently wrote about). Below are standalone instructions that anyone can apply on their own installations to try the proposed solution in action.

Trying it in practice


Requirements


The requirements are very simple:

  1. a Kubernetes cluster (minikube will do as well);
  2. a RabbitMQ cluster (it may be deployed on bare metal or set up as a regular cluster in Kubernetes from the official Helm chart).

For the example below, I deployed RMQ in Kubernetes and called it rmq-old.

Preparing the test bench


1. Download the Helm chart and edit it a bit:

  helm fetch --untar stable/rabbitmq-ha  

For convenience, we set a password and ErlangCookie, and define the ha-all policy so that by default the queues are synchronized between all nodes of the RMQ cluster:

  rabbitmqPassword: guest
  rabbitmqErlangCookie: mae9joopaol7aiVu3eechei2waiGa2we
  definitions:
    policies: |-
      {
        "name": "ha-all",
        "pattern": ".*",
        "vhost": "/",
        "definition": {
          "ha-mode": "all",
          "ha-sync-mode": "automatic",
          "ha-sync-batch-size": 81920
        }
      }

2. Install the chart:

  helm install . --name rmq-old --namespace rmq-old

3. Go to the RabbitMQ management panel, create a new queue, and add several messages. They will be needed so that after the migration we can make sure all the data has been preserved and nothing was lost:
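If you prefer the command line to the UI, the same test data can be created with rabbitmqadmin (the CLI shipped with the management plugin); a sketch with a hypothetical queue name and the default guest credentials:

  # Create a test queue and publish ten messages into it.
  rabbitmqadmin -u guest -p guest declare queue name=test-migration durable=true
  for i in $(seq 1 10); do
    rabbitmqadmin -u guest -p guest publish exchange=amq.default \
      routing_key=test-migration payload="message $i"
  done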



The test bench is ready: we have an “old” RabbitMQ with data that needs to be transferred.

RabbitMQ Cluster Migration


1. First, let's deploy a new RabbitMQ in another namespace with the same ErlangCookie and password for the user. To do this, we perform the operations described above, changing the final RMQ installation command to the following:

  helm install . --name rmq-new --namespace rmq-new
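Before joining the clusters, it is worth making sure both installations really share the same Erlang cookie; a minimal sketch (the pod names follow the rabbitmq-ha chart's defaults and the cookie path is the usual RabbitMQ location, both of which may differ in your setup):

  # The two values must be identical, otherwise the nodes will refuse to cluster.
  kubectl -n rmq-old exec rmq-old-rabbitmq-ha-0 -- cat /var/lib/rabbitmq/.erlang.cookie
  kubectl -n rmq-new exec rmq-new-rabbitmq-ha-0 -- cat /var/lib/rabbitmq/.erlang.cookie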

2. Now you need to combine the new cluster with the old one. To do this, go to each of the new RabbitMQ pods and execute the commands:

  export OLD_RMQ=rabbit@rmq-old-rabbitmq-ha-0.rmq-old-rabbitmq-ha-discovery.rmq-old.svc.cluster.local && \
  rabbitmqctl stop_app && \
  rabbitmqctl join_cluster $OLD_RMQ && \
  rabbitmqctl start_app

The OLD_RMQ variable contains the address of one of the old nodes of the RMQ cluster.

These commands stop the RabbitMQ application on the given new node, attach the node to the old cluster, and start it again.
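Instead of opening a shell in every new pod by hand, the same steps can be scripted from outside; a sketch assuming the rabbitmq-ha chart's default pod naming (rmq-new-rabbitmq-ha-0..2) and that bash is available in the image:

  OLD_RMQ=rabbit@rmq-old-rabbitmq-ha-0.rmq-old-rabbitmq-ha-discovery.rmq-old.svc.cluster.local
  for i in 0 1 2; do
    # Join each new node to the old cluster in turn.
    kubectl -n rmq-new exec rmq-new-rabbitmq-ha-$i -- \
      bash -c "rabbitmqctl stop_app && rabbitmqctl join_cluster $OLD_RMQ && rabbitmqctl start_app"
  done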

3. The RMQ cluster of 6 nodes is ready:
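This can also be confirmed from the command line on any of the nodes; a minimal sketch:

  # All six nodes (3 old + 3 new) should be listed under running_nodes.
  rabbitmqctl cluster_status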



You must wait until the messages are synchronized between all nodes. It is not hard to guess that the synchronization time depends on the hardware the cluster is deployed on and on the number of messages. In the described scenario there are only 10 of them, so the data was synchronized instantly, but with a sufficiently large number of messages synchronization can take hours.

So, the synchronization status:



Here, +5 means that the messages are already present on 5 more nodes (besides the one indicated in the Node field). So the synchronization was successful.
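The same information can be obtained from the command line; a sketch (the slave_pids / synchronised_slave_pids columns apply to classic mirrored queues and may be named differently in newer RabbitMQ versions):

  # For each queue, show how many mirrors exist and how many of them are in sync.
  rabbitmqctl list_queues name messages slave_pids synchronised_slave_pids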

4. It remains only to switch the RMQ address in the application to the new cluster (the specific actions here depend on your technology stack and other specifics of the application), after which you can say goodbye to the old one.
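As one illustration only: if the connection string is passed to the application via an environment variable, the switch might look like this (the namespace, deployment name, variable name and credentials are hypothetical):

  # Point the application at the Service of the new RMQ cluster.
  kubectl -n my-app set env deployment/my-app \
    AMQP_URL=amqp://guest:guest@rmq-new-rabbitmq-ha.rmq-new.svc.cluster.local:5672/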

For this last operation (i.e. after the application has already been switched to the new cluster), go to each node of the old cluster and execute the commands:

  rabbitmqctl stop_app
  rabbitmqctl reset

The cluster has "forgotten" the old nodes; now you can delete the old RMQ, and with that the move is complete.
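If the old cluster was itself deployed from the Helm chart, the leftover release can then be removed as well; a sketch for Helm 2 (which the --name syntax above implies):

  # Remove the old release and its namespace once nothing uses them anymore.
  helm delete --purge rmq-old
  kubectl delete namespace rmq-old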

Note: if you use RMQ with certificates, nothing fundamentally changes: the migration is carried out in the same way.

Conclusions


The described scheme is suitable for almost all cases when we need to migrate RabbitMQ or simply move to a new cluster.

In our case, difficulties arose only once, when RMQ was accessed from many places and we had no way to change the RMQ address everywhere to the new one. We then launched the new RMQ in the same namespace with the same labels, so that it would be picked up by the existing Services and Ingress; when the pods started, we manipulated the labels by hand, removing them at first so that no requests would hit the still-empty RMQ, and adding them back after the messages had synchronized (a rough sketch of this follows below).
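A rough sketch of that label juggling (the label key, its value, the namespace and the pod name are hypothetical and depend on how your Services select their pods):

  # Strip the selector label while the new RMQ is still empty,
  # so the existing Service stops routing traffic to it...
  kubectl -n production label pod rmq-new-rabbitmq-ha-0 app-
  # ...and put it back once the messages have fully synchronized.
  kubectl -n production label pod rmq-new-rabbitmq-ha-0 app=rabbitmq-ha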

We used the same strategy when upgrading RabbitMQ to a new version with a modified configuration: everything worked like clockwork.

P.S.


As a logical continuation of this material, we are preparing articles about MongoDB (migration from a bare-metal server to Kubernetes) and MySQL (how we prepare this DBMS to run inside Kubernetes). They will be published in the coming months.
