How we built a reliable PostgreSQL cluster on Patroni

Today, high availability is expected always and everywhere, not just in large, expensive projects. Temporarily unavailable sites with the message "Sorry, maintenance in progress" still turn up, but they usually draw a condescending smile. Add to that life in the cloud, where starting an extra server takes a single API call and nobody has to think about running physical hardware, and there are no excuses left for not building a critical system reliably with clustering and redundancy.

We will describe which solutions we considered for making the databases behind our services reliable and what we arrived at, plus a demo with far-reaching conclusions.

Legacy in High Availability Architecture


This is clearly visible in the evolution of various open-source systems. Older solutions had to bolt high availability on as demand grew, and the quality of those additions varied. New-generation solutions put high availability at the heart of their architecture. MongoDB, for example, positions the cluster as the primary deployment mode; the cluster scales horizontally, which is a strong competitive advantage of that DBMS.

Let's go back to PostgreSQL. It is one of the oldest popular open-source projects, first released back in 1995. For a long time the project team did not consider high availability a problem the system itself needed to solve. Replication for creating copies of data became built in only in version 8.2 in 2006, and it was file-based (log shipping). Streaming replication arrived in version 9.0 in 2010, and it is the basis for building a wide variety of clusters. This genuinely surprises people who come to PostgreSQL after enterprise SQL or modern NoSQL systems: the standard solution from the community is just a master-replica pair with synchronous or asynchronous replication. Out of the box, the master is switched manually, and client switching is also left for you to solve yourself.
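For reference, a minimal stock streaming-replication pair in PostgreSQL 10 (the version used later in this article) looks roughly like this. This is a sketch: the addresses, the replication user and the password are placeholders, not values from our setup:

```ini
# postgresql.conf on the master
wal_level = replica        # emit enough WAL for a standby
max_wal_senders = 5        # allow replication connections

# pg_hba.conf on the master: let the standby's replication user in
# host  replication  replicator  10.0.0.2/32  md5

# recovery.conf on the standby, created after taking a base backup
# (e.g. with pg_basebackup) from the master; PostgreSQL 10 syntax
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=secret'
```

Promotion of the standby and repointing of clients are exactly the manual steps that the tooling discussed below automates.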

How we decided to make PostgreSQL reliable and what we chose for it


However, PostgreSQL would not have become so popular without the huge number of projects and tools that help build a fault-tolerant setup that does not demand constant attention. In the Mail.ru Cloud Solutions (MCS) cloud, single PostgreSQL servers and master-replica pairs with asynchronous replication have been available since the launch of DBaaS.
Naturally, we wanted to make life easier for everyone and offer a PostgreSQL installation that could serve as the basis for highly available services, one that would not require constant monitoring or waking up at night to switch the master. This segment has both old proven solutions and a generation of new utilities built on recent developments.

Today, the high availability problem comes down not to redundancy (that part is a given) but to consensus, that is, the leader election algorithm. Major outages most often happen not because of a shortage of servers but because of consensus problems: a new leader failed to emerge, two leaders appeared in different data centers, and so on. The outage of GitHub's MySQL cluster is a good example; they published a detailed post-mortem.

The mathematical foundations here are serious. On one side there is the CAP theorem, which imposes theoretical limits on building HA solutions; on the other, there are mathematically proven consensus algorithms such as Paxos and Raft. On top of these are built the popular distributed consensus systems (DCS): ZooKeeper, etcd, Consul. So if a solution runs on some home-grown algorithm of its own, you should treat it with great caution. After analyzing a huge number of systems, we settled on Patroni, an open-source system developed mainly by Zalando.

As a lyrical digression: we also considered multi-master solutions, that is, clusters that scale horizontally for writes. We decided against such a cluster for two main reasons. First, these solutions are highly complex and therefore more fragile; making them stable for all cases is hard. Second, PostgreSQL stops being native: some features become unavailable, and some applications may hit hidden bugs.

Patroni


So how does Patroni work? The developers did not reinvent the wheel and proposed building on one of the proven DCS solutions. Configuration synchronization, leader election and quorum are all delegated to it. We chose etcd.

Patroni then takes care of applying all PostgreSQL and replication settings correctly, as well as executing switchover and failover commands (that is, planned and emergency master switches, respectively). Specifically, in the MCS cloud you can create a cluster consisting of a master, a synchronous replica and one or more asynchronous replicas. A synchronous replica guarantees that the data exists on at least two servers, and it is this replica that is the primary "master candidate".
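To make this concrete, here is a minimal per-node Patroni configuration sketch showing how etcd and synchronous replication are wired together. The names, addresses and passwords are placeholders, not our production values:

```yaml
scope: postgres10            # cluster name, identical on every node
name: node1                  # unique name of this node

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd:
  hosts: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379

bootstrap:
  dcs:                       # cluster-wide settings stored in the DCS
    ttl: 30
    loop_wait: 10
    synchronous_mode: true   # keep one replica synchronous

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgresql/10/main
  authentication:
    replication:
      username: replicator
      password: secret
```

With synchronous_mode enabled, Patroni itself picks which replica runs synchronously and prefers it as the failover candidate.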

Since etcd is deployed on the same servers, the recommended number of servers is 3 or 5 to get a proper quorum. Such a cluster scales horizontally for reads (write scaling was discussed above). Keep in mind, however, that asynchronous replicas tend to lag, especially under heavy load.

Using such replicas for reads (hot standby) makes sense for reporting or analytics tasks and offloads the master server.

If you want to build such a cluster yourself, you will need to:

  • prepare three or more servers and configure IP addressing and firewall rules between them;
  • install the etcd, Patroni and PostgreSQL packages;
  • set up the etcd cluster;
  • configure the Patroni service to work with PostgreSQL.

In total, that means getting a dozen configuration files exactly right without a single mistake, so a configuration management tool such as Ansible is definitely worth using. Even then you still have no highly available TCP balancer; building one is a separate job.
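For that balancer, a common pattern (again a sketch with placeholder addresses, not our production config) is HAProxy health-checking Patroni's REST API: the /master endpoint on port 8008 returns HTTP 200 only on the current master, so traffic always follows a failover automatically:

```
listen postgres_master
    bind *:5432
    option httpchk GET /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.0.1:5432 maxconn 100 check port 8008
    server node2 10.0.0.2:5432 maxconn 100 check port 8008
    server node3 10.0.0.3:5432 maxconn 100 check port 8008
```

Note that HAProxy itself then becomes a single point of failure unless it is also made redundant, for example with a floating IP.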

For those who need a ready-made cluster but do not want to dig through all of this, we tried to simplify life and built a ready cluster on Patroni in our cloud; it can be tested for free. In addition to the cluster itself, we made:

  • a TCP balancer: its different ports always point to the current master, the synchronous replica, or an asynchronous replica, respectively;
  • an API for switching the active Patroni master.

Both are available through the MCS cloud API as well as the web console.

Demo


To test the capabilities of the PostgreSQL cluster in the MCS cloud, let's see how a live application behaves when problems hit the DBMS.

Below is the code of an application that logs artificial events and reports them to the screen. On errors it reports them and keeps running in a loop until we stop it with Ctrl+C.

  from __future__ import print_function

  from datetime import datetime
  from random import randint
  from time import sleep
  import psycopg2


  def main():
      try:
          connection = psycopg2.connect(user="admin",
                                        password="P@ssw0rd",
                                        host="89.208.87.38",
                                        port="5432",
                                        database="myproddb")

          cursor = connection.cursor()
          cursor.execute("SELECT version();")
          record = cursor.fetchone()
          print("Connection opened to", record[0])

          cursor.execute(
              "INSERT INTO log VALUES ({});".format(randint(1, 10000)))
          connection.commit()
          cursor.execute("SELECT COUNT(event_id) from log;")
          record = cursor.fetchone()
          print("Logged a value, overall count: {}".format(record[0]))
      except Exception as error:
          print("Error while connecting to PostgreSQL", error)
      finally:
          # Note: if connect() itself failed, 'connection' is unbound here,
          # which raises "local variable 'connection' referenced before
          # assignment"; the outer loop catches it and the demo output
          # below shows exactly that error during the outage.
          if connection:
              cursor.close()
              connection.close()
              print("Connection closed")


  if __name__ == '__main__':
      try:
          while True:
              try:
                  print(datetime.now())
                  main()
                  sleep(3)
              except Exception as e:
                  print("Caught error:\n", e)
                  sleep(1)
      except KeyboardInterrupt:
          print("exit")
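The loop above simply sleeps and retries on any error. A slightly more robust pattern, shown here as a sketch rather than part of the original service, is exponential backoff around the failing operation; the sleep function is injectable so the logic is easy to test:

```python
import time


def with_retries(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on failure retry with exponentially growing delays.

    Raises the last error if all attempts fail.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            if attempt == attempts:
                raise
            print("Attempt {} failed: {}; retrying in {}s".format(
                attempt, error, delay))
            sleep(delay)
            delay *= 2


# Illustration with a flaky operation that succeeds on the third try;
# in the demo app, 'operation' would be the body of main().
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("connection refused")
    return "connected"

print(with_retries(flaky, sleep=lambda s: None))  # prints "connected"
```

During a master switch this turns a burst of error messages into a couple of short pauses, without hammering the balancer every second.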

The application needs PostgreSQL to work, so let's create a cluster in the MCS cloud using the API. In a standard terminal, with the OS_TOKEN variable holding an API access token (it can be obtained with the openstack token issue command), we type the following commands.

Creating a cluster:

  cat << EOF > pgc10.json
  {"cluster": {"name": "postgres10", "allow_remote_access": true,
   "datastore": {"type": "postgresql", "version": "10"},
   "databases": [{"name": "myproddb"}],
   "users": [{"databases": [{"name": "myproddb"}], "name": "admin", "password": "P@ssw0rd"}],
   "instances": [
    {"key_name": "shared", "availability_zone": "DP1", "flavorRef": "d659fa16-c7fb-42cf-8a5e-9bcbe80a7538",
     "nics": [{"net-id": "b91eafed-12b1-4a46-b000-3984c7e01599"}], "volume": {"size": 50, "type": "DP1"}},
    {"key_name": "shared", "availability_zone": "DP1", "flavorRef": "d659fa16-c7fb-42cf-8a5e-9bcbe80a7538",
     "nics": [{"net-id": "b91eafed-12b1-4a46-b000-3984c7e01599"}], "volume": {"size": 50, "type": "DP1"}},
    {"key_name": "shared", "availability_zone": "DP1", "flavorRef": "d659fa16-c7fb-42cf-8a5e-9bcbe80a7538",
     "nics": [{"net-id": "b91eafed-12b1-4a46-b000-3984c7e01599"}], "volume": {"size": 50, "type": "DP1"}}]}}
  EOF

  curl -s -H "X-Auth-Token: $OS_TOKEN" \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d @pgc10.json https://infra.mail.ru:8779/v1.0/ce2a41bbd1434013b85bdf0ba07c770f/clusters
  



When the cluster reaches ACTIVE status, all fields receive their current values: the cluster is ready.
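Waiting for ACTIVE can be automated with a small polling helper. This is a sketch: the status would come from a GET on the clusters endpoint shown above (the exact response shape is an assumption), so the fetcher is passed in as a callable:

```python
import time


def wait_for_active(fetch_status, timeout=600, interval=5, sleep=time.sleep):
    """Poll fetch_status() until it returns 'ACTIVE' or the timeout expires.

    fetch_status is any callable returning the cluster status string,
    e.g. a wrapper around GET .../clusters/<id> (endpoint shape assumed).
    Returns True on ACTIVE, False on timeout; raises on ERROR status.
    """
    waited = 0
    while waited <= timeout:
        status = fetch_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":
            raise RuntimeError("cluster build failed")
        sleep(interval)
        waited += interval
    return False


# Simulated status sequence for illustration.
statuses = iter(["BUILDING", "BUILDING", "ACTIVE"])
print(wait_for_active(lambda: next(statuses), sleep=lambda s: None))  # True
```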

In the GUI:



Let's try to connect and create a table:

  psql -h 89.208.87.38 -U admin -d myproddb
  Password for user admin:
  psql (11.1, server 10.7)
  Type "help" for help.

  myproddb=> CREATE TABLE log (event_id integer NOT NULL);
  CREATE TABLE
  myproddb=> INSERT INTO log VALUES (1), (2), (3);
  INSERT 0 3
  myproddb=> SELECT * FROM log;
   event_id
  ----------
          1
          2
          3
  (3 rows)

  myproddb=>
  



In the application, we fill in the actual PostgreSQL connection settings, using the address of the TCP balancer, which removes the need to switch manually to the master's address. Run it: as you can see, events are successfully written to the database.



Scheduled Master Switching


Now let's test how our application behaves during a scheduled master switchover:



Watching the application, we see that its work is indeed interrupted, but only for a few seconds: at most 9 in this particular run.



Virtual Machine Failure


Now let's simulate a crash of the virtual machine that currently holds the master. We could simply power the VM off through the Horizon interface, but that would be a graceful shutdown, which every service, including Patroni, would handle cleanly.

We need an unpredictable failure, so I asked our administrators to kill the test VM (the current master) in an abnormal way.



Our application kept working through this. Naturally, such an emergency master switch cannot go completely unnoticed.

  2019-03-29 10:45:56.071234
 Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
 Logged a value, overall count: 453
 Connection closed
 2019-03-29 10:45:59.205463
 Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
 Logged a value, overall count: 454
 Connection closed
 2019-03-29 10:46:02.661440
 Error while connecting to PostgreSQL server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

 Caught error:
  local variable 'connection' referenced before assignment
 ……………………………………………………… (several similar errors here)
 2019-03-29 10:46:30.930445
 Error while connecting to PostgreSQL server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.

 Caught error:
  local variable 'connection' referenced before assignment
 2019-03-29 10:46:31.954399
 Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
 Logged a value, overall count: 455
 Connection closed
 2019-03-29 10:46:35.409800
 Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
 Logged a value, overall count: 456
 Connection closed
 ^Cexit
  

As you can see, the application resumed work in under 30 seconds. Yes, some users of the service will notice the problem, but this was a serious server crash, and those do not happen often. A human administrator would hardly react as quickly, unless he happened to be sitting at the console with a switching script at the ready.

Conclusion


It seems to me that such a cluster gives administrators a huge advantage. Serious breakdowns and failures of the database servers will be invisible to the application and, accordingly, to the user. You don't have to repair things in a hurry or fail over to temporary configurations and servers. And if this solution is consumed as a ready-made cloud service, you won't need to spend time building it and can do something more interesting instead.
