[MariaDB]: HA with Multi-source Replication

 

The Basics

The topology uses 3 servers: A is the active master, B a stand-by master set up to replicate from A and C a multi purpose slave replicating from A. I also set up A to replicate from B. This way, if A fails, the system would start writing to B and once A comes back up, it will recover all the transactions from B through replication. As I mentioned in my previous article, typically this set up is used in combination with a tool like Master High Availability Manager (MHA) to migrate the slave from one master to the next.

For this particular case, assuming that the application(s) write to the active master only, we’re going to set the same Domain ID for all servers to  ‘1’.

SET GLOBAL gtid_domain_id = 1;

and in the my.cnf files:

gtid-domain-id=1

Setup Slave to Replicate From 2 Masters

With the basics taken care of, we now will set up server C to replicate from both server A (active master) and B (standby master). Keep in mind that now each transaction that reaches C will come from 2 different sources. For example, if you issue an INSERT on the active master A, it will replicate to C and B. Since C is also B’s slave, the same transaction will replicate from B to C. By default, C will apply both transactions, which is redundant and can lead to data inconsistencies and errors. To avoid this, it is necessary to set gtid_ignore_duplicates variable to ON.

SET GLOBAL gtid_ignore_duplicates=ON;

and in the my.cnf files:

gtid-ignore-duplicates=ON

This way, when C receives the same transaction through 2 different slave connections, it will verify the GTID and only apply it once.

The next step is to set the 2nd slave connection. In my previous article, I setup the gtid_slave_pos global variable. Since server C is already replicating successfully from server A, this variable already has the proper value and there is no need to set it up manually. We only need to define the new connection using CHANGE MASTER TO:

CHANGE MASTER 's-102' TO  MASTER_HOST='192.168.56.102', 
 MASTER_USER='repl', MASTER_PASSWORD='repl',
 master_use_gtid=slave_pos;

Notice that we add a label “s-102” after the MASTER reserved word. This string identifies the 2nd replication connection. This identifier needs to be used for every replication command. So, to start replicating from server B, you would use START SLAVE like this:

START SLAVE 's-102';

In the same way, to check the status of this replication connection you would use:

SHOW SLAVE 's-102' STATUS\G

You can also issue the commands to all replication connections by using the keyword ALL, for example:

SHOW ALL SLAVES STATUS\G
STOP ALL SLAVES;
START ALL SLAVES;

Notice that when using the keyword ALL, you also have to use SLAVES (plural form). The STOP and START commands will show as many warnings as the number of replication connections:

MariaDB [(none)]> START ALL SLAVES;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
MariaDB [(none)]> SHOW WARNINGS;
+-------+------+-----------------------+
| Level | Code | Message               |
+-------+------+-----------------------+
| Note  | 1937 | SLAVE 's-102' started |
| Note  | 1937 | SLAVE '' started      |
+-------+------+-----------------------+
2 rows in set (0.00 sec)

After completing the setup, the resulting topology will look similar to this:

+-----+         +-----+
|  A  | <- - -> |  B  |
+-----+         +-----+
   |               |
   \               /
    \             /
     V           V
        +-----+
        |  C  |
        +-----+

Failures

If server A goes away or the connection between server A and server C fails, server C will still receive all the transactions through the alternate replication connection to server B (labeled ‘s-102’). Once server A comes back online again, and/or server C can reconnect to it, server C will compare all the GTIDs coming from server A and ignore those which have already been applied.

The advantage of using this approach over the traditional single source replication is that now there is no need to re-source the slaves. When the active master fails and the applications start writing to the alternative masters, the slaves will continue to replicate from it simplifying the management of failover scenarios.

  • Ask Question