MySQL provides a built-in data replication functionality for maintaining identical copies of its data to one or more backend servers, thus providing a simple High Availability mechanism. On the other hand, the Open Source community has several projects to implement failover techniques, being one of them Heartbeat.
This article will show you how to implement a clustered, highly available and inexpensive solution based on GNU/Linux and combining MySQL as the database engine and Heartbeat as the failover mechanism. The configuration will consist of a 2-node active/passive cluster.
I assume you have MySQL up and running on both nodes and that your are working with MySQL 4.0.13 or above. If not, please refer to MySQL manual here and download a recent copy here.
How does replication works in MySQL
Replication in MySQL is very simple: one machine acts as the master server and one or more machines act as the backup servers (the replica servers). The master server keeps all changes made to its databases in binary log files, so the backup server(s) can read these files and apply the changes to its own copy of the data.
In more detail, the binary log file records all the changes (UPDATE, DELETE, INSERT…) made to the master's databases since the first time the replication was configured and started. The master also creates and maintains an index file to keep track of the binary logs created. Upon connecting, the slave server(s) obtains new updates from the binary log and aplies them to its copy of the data.
Note: As MySQL suggests, visit their website often to check the latest changes and improvements to its database replication implementation.
How does Heartbeat works
Heartbeat is a piece of software that provides High Availability features such as monitoring the availability of the machines in the cluster, transferring the virtual IPs (more on this later) in case of failures and starting and stopping services.
The Heartbeat software running on the slave server periodically checks the health of the master server by listening to its heartbeats sent via null modem cable and/or a crossover ethernet cable. Note that in the best scenario slave's main task is nothing but to monitor the health of its master. In case of a crash the slave will not receive the heartbeats from the master and then it will take over the virtual IPs and the services offered by the master.
The overall picture
Next figure shows the picture of our cluster.
As previously stated, our configuration will consist of a 2-node active/passive cluster: dbserv1, the master server and dbserv2, the slave server. Both machines are linked via serial COM port /dev/ttyS0 (null modem cable) and a crossover ethernet cable (eth0), through which they send its heartbeats to each other.
The 192.168.1.103 IP address at eth1:0 is the floating IP address, the virtual IP. This is the service IP where the master listens to and that will be transferred to the slave in case of a failure in the master. Requests from the application servers will be made through the virtual IP.
Both servers have another IP address that can be used to administer the machines: 192.168.1.101 and 192.168.1.102. Bear in mind that the virtual IP (192.168.1.103) is set up by Heartbeat, meaning that if it is not up and running in the active server there will be no access to the virtual service.
Setting up replication
1. Create a replication user on the master:
mysql -u root -p
At MySQL prompt type:
GRANT REPLICATION SLAVE ON *.* TO replica@"%" IDENTIFIED BY 'replica_passwd';
2. Stop MySQL on both the master server and the slave server. Take a snapshot of your databases from the master.
/etc/init.d/mysql stop
tar cvzf mysqldb.tgz /path/to/your/databases
In my configuration I would…
/etc/init.d/mysql stop
tar cvzf mysqldb.tgz /var/mysql-data/*
3. Copy the data to the slave
scp /path/to/mysqldb.tgz admin@dbserv2:/path/to/your/databases
If you are using InnoDB tables, copy your tablespace file(s) and associated log files to the slave. In my case, the tablespace is called ibdata and the log files are those ib_*. So:
scp /var/mysql-data/ibdata admin@dbserv2:/var/mysql-data
scp /var/log/mysql/ib_* admin@dbserv2:/var/log/mysql
4. Activate the binary log and assign a unique ID to the master:
vi /etc/my.cnf
Then add/change the following
[mysqld]
…..
# Enable binary logs. Path to bin log is optional
log-bin=/var/log/mysql/dbserv1
# If the binary log exceeds 10M, rotate the logs
max_binlog_size=10M
# Set master server ID
server-id=1
…..
Now you can start mysqld on the master. Watch the logs to see if there are problems.
/etc/init.d/mysql start
5. Log in on the slave.
vi /etc/my.cnf
Then add/change the following:
server-id=2
# This is eth0. Take a look at figure 1
master-host=192.168.100.1
master-user=replica
master-password=replica_passwd
# Port that master server is listening to
master-port=3306
# Number of seconds before retrying to connect to master. Defaults to 60 secs
#master-connect-retry
6. Uncompress the databases
cd /path/to/your/databases
tar xvzf mysqldb.tgz
chown -R mysql.mysql /path/to/your/databases
Make sure your tablespace file(s) and associated files are in place (/path/to/your/databases in our example).
7. Start mysqld on the slave. Watch the logs to see if there are problems.
/etc/init.d/mysql start
8. Check if replication is working. For example, log in on the master, create a database and see if it is replicated on the slave:
mysql -u root -p
create database replica_test;
show databases;
+----------------+
| Database |
+----------------+
| replica_test |
| mysql |
| test |
| tmp |
+----------------+
Log in on the slave server and make sure the database replica_test is created:
mysql -u root -p
show databases;
+----------------+
| Database |
+----------------+
| replica_test |
| mysql |
| test |
| tmp |
+----------------+
If you have problems, please refer to MySQL manual here.
Installing and setting up Heartbeat
Download a recent copy of Heartbeat from here and then as usual….
configure
make
make install
or:
rpm -Uhv heartbeat-1.0.4-1.i386.rpm
if you downloaded the RPM based package.
Configuring heartbeat
There are three files involved in the configuration of heartbeat:
- ha.cf: the main configuration file that describes the machines involved and how they behave.
- haresources: this configuration file specifies virtual IP (VIP) and services handled by heartbeat.
- authkeys: specifies authentication keys for the servers.
Sample /etc/ha.d/ha.cf
# Time between heartbeats in seconds
keepalive 1
# Node is pronounced dead after 15 seconds
deadtime 15
# Prevents the master node from re-acquiring cluster resources after a failover
nice_failback on
# Device for serial heartbeat
serial /dev/ttyS0
# Speed at which to run the serial line (bps)
baud 19200
# Port for udp (default)
udpport 694
# Use a udp heartbeat over the eth0 interface
udp eth0
debugfile /var/log/ha/ha.debug
logfile /var/log/ha/ha.log
# First node of the cluster (must be uname -a)
node dbserv1
# Second node of the cluster (must be uname -a)
node dbserv2
Sample /etc/ha.d/haresources
dbserv1 Ipaddress::192.168.1.103::eth1
This tells Heartbeat to set up 192.168.1.103 as the virtual IP (VIP). See figure above.
Sample /etc/ha.d/authkeys
auth 1
1 crc
2 sha1 HI!
3 md5 Hello!
This file determines the authentication keys. Must be mode 600. As I assume that our network is relatively secure I configure crc as the authentication method. There is also md5 and sha1 available.
Now start heartbeat on dbserv1 and the on dbserv2, watch the logs, then stop heartbeat on the first node and see what happens on the second node. Start again heartbeat on the first node and stop it on the second and see the logs. If all is okay, you have a 2-node cluster up and running.
What we have
At this point we have a 2-node cluster with certain degree of availability and fault tolerance. Despite this could be a valid solution for non-critical environments, in really critical environments this configuration should be improved.
Advantages
- The cluster is fault tolerant
- The cluster is relatively secure
- There is no single point of failure (comments?)
- Automatic fail over mechanism
- Proven and solid OpenSource software for production environment (my experience)
- Simple and easy to install and configure
- Easy to administer
- Inexpensive
Disadvantages
Our cluster presents almost one serious problem in critical environments (i.e. 99,99% availability). As you know, when the master node fails, the standby node takes over the service and the virtual IP address. In this scenario, when the master comes back online again, it will act as the stand-by node (remember nice_failback on from /etc/ha.d/ha.cf?). As our configuration has not implemented a two-way replication mechanism, the actual master is not generating binary logs and the actual slave is not configured to act as such. There are means to avoid this disadvantage, but this is your homework ;-). Let me know your progress.
As usual, comments are very welcome.
References:
--