DRBD and Heartbeat for high availability on Linux

Lee — Mon, 18 Jun 2007 22:46:30 +0000

I’ve been trying to get an HA solution put together for one of our software projects here at EMC, and I figured I’d share how I configured these two products in our environment. I have to write the documentation for it anyway, so I might as well post it here first for everyone else to see and learn from.

We are going to configure 2 machines in an Active/Passive failover setup, which means that if the primary machine dies, the secondary will take over its identity and continue functioning as before.

Primary: lava2042 (10.5.140.42) (192.168.1.1 for crossover interface)
Secondary: lava2138 (10.5.140.138) (192.168.1.2 for crossover interface)
HA-address: lava2222 (10.5.140.222)

Configuring heartbeat

Step 1.
Install Heartbeat and DRBD on BOTH machines you plan to configure. This should be a very straightforward step, so I’m not going to go into detail.

Step 2.
We’re going to need a dedicated link between the machines. This can be a crossover cable between an additional Ethernet port on each machine, or it can be a serial cable. In this example I’m using a crossover cable.
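The crossover port needs a static address on each side. The chkconfig commands later in this post suggest a Red Hat-style system, and on one of those the address can go in an ifcfg file. This is a sketch under several assumptions. The crossover port is eth1, which matches the bcast line in ha.cf. The netmask is 255.255.255.0, which the post does not state. The helper name write_crossover_ifcfg is hypothetical.

```shell
# write_crossover_ifcfg DEST IPADDR
# Hypothetical helper: writes a static, Red Hat-style ifcfg file for the
# crossover port. Assumes the port is eth1 and the link is a /24.
write_crossover_ifcfg() {
    dest=$1
    ipaddr=$2
    cat > "$dest" <<EOF
DEVICE=eth1
BOOTPROTO=static
IPADDR=$ipaddr
NETMASK=255.255.255.0
ONBOOT=yes
EOF
}
```

To use it on lava2042, follow these steps:

1. Run write_crossover_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth1 192.168.1.1. On lava2138, use 192.168.1.2 instead.
2. Bring the port up with ifup eth1.
3. Check the link with ping -c 3 192.168.1.2.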

Step 3.
Now we’re going to configure the /etc/ha.d/ha.cf file for our machine. Here is what I’ve put into the /etc/ha.d/ha.cf file ON EACH MACHINE:
bcast eth1
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
auto_failback on
node lava2042
node lava2138
Check this page if you have trouble or are using a serial connection instead of a crossover cable. It has instructions on how to configure this file for a serial interface.
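Heartbeat expects each node name in ha.cf to match what uname -n prints on that machine, and a mismatch is an easy mistake to make. Below is a sketch of a small check you could run on each machine before starting Heartbeat. The helper name node_listed is hypothetical.

```shell
# node_listed HA_CF NAME
# Hypothetical helper: succeeds if HA_CF contains a "node" directive
# naming NAME. A single node line may list several names, so every
# field after "node" is checked.
node_listed() {
    awk -v want="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example: node_listed /etc/ha.d/ha.cf "$(uname -n)" || echo "$(uname -n) is not listed in ha.cf"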

Step 4.
Now configure the /etc/ha.d/authkeys file ON EACH MACHINE with the kind of authentication you want. I don’t care about security in this example, so I put this in the file since it’s the fastest:
auth 2
2 crc
(See here for more information)
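Heartbeat refuses to start if authkeys is readable by anyone other than root, so the file should be mode 600. The sketch below writes the same crc setup shown above and fixes the permissions. The helper name write_authkeys is hypothetical. If the link isn’t trusted, authkeys also accepts md5 or sha1 with a shared secret in place of crc.

```shell
# write_authkeys DEST
# Hypothetical helper: writes the crc-only authkeys file from this post
# and locks it down to mode 600, which Heartbeat insists on.
write_authkeys() {
    dest=$1
    # Create the file with restrictive permissions from the start.
    ( umask 077 && printf 'auth 2\n2 crc\n' > "$dest" )
    # Also tighten an existing file that had looser permissions.
    chmod 600 "$dest"
}
```

Run write_authkeys /etc/ha.d/authkeys on each machine.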

We’ll also need to configure the /etc/ha.d/haresources file, but we won’t do that until we get DRBD working correctly.

Configuring DRBD

Step 1.
The /etc/drbd.conf file needs to be configured. It should already contain an example setup. I used the existing resource r0 and edited the nodes. Inside the “resource r0 {” block, there should be a section for each node that starts with “on” followed by the hostname. Here is what I put for my 2 nodes:
on lava2042 {
  device /dev/drbd0;
  disk /dev/sda1;
  address 10.5.140.42:7788;
  meta-disk internal;
}

on lava2138 {
  device /dev/drbd0;
  disk /dev/sda8;
  address 10.5.140.138:7788;
  meta-disk internal;
}

Now let me give a little background. I had already created the /dev/sda1 partition on lava2042 and the /dev/sda8 partition on lava2138, each 1 GB, to store the data that was going to be shared. /dev/drbd0 is the device that will actually be mounted and read from. Other than that, I left the rest of the file at its defaults. Make sure to comment out any other resources unless you need more than one filesystem replicated.
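DRBD picks the “on” section whose name matches the local hostname as reported by uname -n, so each machine’s own name has to appear in /etc/drbd.conf. Below is a sketch of a quick check using a hypothetical drbd_has_host helper. It assumes the “on” keyword, the name and the opening brace share a line, as in the config above.

```shell
# drbd_has_host CONF NAME
# Hypothetical helper: succeeds if CONF has an "on NAME {" section.
# Assumes "on", the host name and "{" are on the same line.
drbd_has_host() {
    grep -Eq "^[[:space:]]*on[[:space:]]+$2[[:space:]]*\{" "$1"
}
```

For example: drbd_has_host /etc/drbd.conf "$(uname -n)" || echo "no on section for $(uname -n)"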

Step 2.
Make sure to load the drbd module with modprobe drbd, then check the dmesg output to make sure it looks correct. (Sorry, I don’t have a copy of what it should look like. I’ll keep better notes in the future.)
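Besides dmesg, /proc/modules shows whether the module actually loaded. Once it has loaded, /proc/drbd appears as well. Below is a sketch using a hypothetical drbd_loaded helper. Its optional file argument exists only so it can be pointed at a saved copy.

```shell
# drbd_loaded [MODULES_FILE]
# Hypothetical helper: succeeds if the drbd module is listed in
# /proc/modules (or in the file given as the first argument).
drbd_loaded() {
    grep -q '^drbd ' "${1:-/proc/modules}"
}
```

On each machine, run modprobe drbd. Then run drbd_loaded && cat /proc/drbd, and look at the tail of dmesg | grep -i drbd.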

Step 3.
Now we need to initialize our metadata for DRBD. We do this by running this on EACH machine:
drbdadm create-md r0
Where r0 is the name of the resource from the /etc/drbd.conf file. You should now be able to run the following on each machine:
drbdadm up all
After running these two commands, you should be able to check dmesg and /proc/drbd to see the status of your filesystem.
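Right after drbdadm up all, the cs: field in /proc/drbd will usually show WFConnection until the peer comes up, and then Connected. The sketch below polls for that state. The helper names and the optional file argument are assumptions, and the helpers only look at the first resource listed.

```shell
# drbd_cstate [PROC_FILE]
# Hypothetical helper: prints the connection state (the value after "cs:")
# of the first resource in /proc/drbd, e.g. WFConnection or Connected.
drbd_cstate() {
    sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p' "${1:-/proc/drbd}" | head -n 1
}

# wait_connected SECONDS [PROC_FILE]
# Hypothetical helper: polls once a second until the state is Connected;
# fails if SECONDS pass first.
wait_connected() {
    remaining=$1
    while [ "$remaining" -gt 0 ]; do
        [ "$(drbd_cstate "$2")" = Connected ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

For example: wait_connected 60 || echo "DRBD peers did not connect"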

Step 4.
The next step is to force one of the machines to be the primary and create a filesystem. In this case I’m choosing lava2042 as the primary, so I will run this on the machine:
lava2042# drbdsetup /dev/drbd0 primary -o
This starts the initial sync between the machines; you should only need to do this once. After that, run this command:
lava2042# drbdadm primary all
This forces lava2042 into the primary state and makes /dev/drbd0 usable. From here you can create a filesystem:
lava2042# mkfs.ext3 /dev/drbd0 (or whatever filesystem you want)
Then mount the filesystem to check it out (make sure to unmount it when you’re done).
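The mount check can be wrapped so the filesystem is always unmounted again afterwards. That matters because the next steps demote this node to Secondary, and DRBD won’t allow that while the device is in use. This is a sketch with a hypothetical helper. The scratch mount point /mnt/drbdtest in the usage line is arbitrary and not part of the setup above.

```shell
# drbd_mount_test DEVICE MOUNTPOINT
# Hypothetical helper: mounts DEVICE, shows its usage, then unmounts it
# again so the node can later be demoted to Secondary.
drbd_mount_test() {
    dev=$1
    mnt=$2
    mkdir -p "$mnt" || return 1
    mount "$dev" "$mnt" || return 1
    df -h "$mnt"
    umount "$mnt"
}
```

For example: lava2042# drbd_mount_test /dev/drbd0 /mnt/drbdtest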

You should now be able to run drbdadm primary all on either machine and mount the filesystem, as long as both machines start out in the Secondary/Secondary state (check /proc/drbd).

Step 5.
Okay, now let’s drop back into secondary mode for lava2042 by doing this:
lava2042# drbdadm secondary all
The /proc/drbd file should look something like this:
version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by root@lava2138, 2007-06-18 09:50:33
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:316952 nr:1221300 dw:1222380 dr:346211 al:8 bm:107 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:81456 misses:98 starving:0 dirty:0 changed:98
        act_log: used:0/257 hits:262 misses:8 starving:0 dirty:0 changed:8
The important part is st:Secondary/Secondary. The filesystem needs to be in the Secondary state on both machines for Heartbeat to work properly.
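Instead of eyeballing /proc/drbd, you can check the state with a one-line pattern. This sketch uses a hypothetical helper name. It also requires ds:UpToDate/UpToDate, an extra condition beyond what the post asks for, so that Heartbeat isn’t started while a sync is still running.

```shell
# drbd_ready_for_heartbeat [PROC_FILE]
# Hypothetical helper: succeeds only if /proc/drbd shows both nodes as
# Secondary and both disks as UpToDate.
drbd_ready_for_heartbeat() {
    grep -Eq 'st:Secondary/Secondary[[:space:]]+ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}
```

For example: drbd_ready_for_heartbeat || echo "not ready: check /proc/drbd"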

And now we’re going to edit the /etc/ha.d/haresources file to take care of sharing the filesystem. Here’s what I have in the file:
lava2042 drbddisk::r0 Filesystem::/dev/drbd0::/opt/EMC::ext3 10.5.140.222 httpd
Let’s go through it piece by piece:
lava2042 – the machine that will be the primary node
drbddisk::r0 – activate the r0 resource disk (make sure r0 corresponds to whatever your resource is named)
Filesystem::/dev/drbd0::/opt/EMC::ext3 – mount /dev/drbd0 on /opt/EMC as an ext3 filesystem
10.5.140.222 – the IP address for our solution (see the beginning of the post)
httpd – the service we’re going to watch over and take care of, in this case httpd (which wasn’t really what I was configuring, but it’s the easiest to show as an example)
Don’t forget this file has to be the same on BOTH MACHINES.
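Heartbeat starts the resources on a haresources line from left to right and stops them in reverse order. That is why drbddisk comes before Filesystem, and the IP address comes before httpd. The hypothetical helper below prints the entry in that start order, which can be handy when debugging. It assumes the entry fits on one line with no backslash continuations.

```shell
# list_haresources [HARESOURCES_FILE]
# Hypothetical helper: prints the preferred node and then each resource in
# the order Heartbeat starts them. Comment and blank lines are skipped;
# backslash-continued lines are not handled.
list_haresources() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        {
            print "node: " $1
            for (i = 2; i <= NF; i++) print "resource " (i - 1) ": " $i
        }
    ' "${1:-/etc/ha.d/haresources}"
}
```

The file must be identical on both machines. Assuming root ssh access between the nodes, ssh lava2138 cat /etc/ha.d/haresources | diff /etc/ha.d/haresources - will show any difference.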

Step 6.
Make sure heartbeat and the service(s) you’re watching DO NOT start at boot. Heartbeat will start the service itself on whichever node is active, and otherwise things get really ugly if you make a mistake:
chkconfig heartbeat off
chkconfig httpd off
/etc/init.d/httpd stop
(on both machines)
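To double-check the result, chkconfig --list shows each service as on or off for every runlevel. Below is a sketch of a hypothetical check that complains if any listed service is still set to start in some runlevel.

```shell
# check_not_at_boot SERVICE...
# Hypothetical helper: fails if `chkconfig --list SERVICE` reports the
# service as "on" in any runlevel.
check_not_at_boot() {
    for svc in "$@"; do
        if chkconfig --list "$svc" 2>/dev/null | grep -q ':on'; then
            echo "$svc is still enabled at boot" >&2
            return 1
        fi
    done
    return 0
}
```

Run check_not_at_boot heartbeat httpd on both machines.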

Step 7. (The cross your fingers step)
Alright, it’s finally time to test your failover configuration. First, we need to start heartbeat on the primary machine:
lava2042# /etc/init.d/heartbeat start
Then, start it on the secondary machine:
lava2138# /etc/init.d/heartbeat start

You should now be able to ping the cluster IP (lava2222 or 10.5.140.222). You can also use df on the primary node to check that /dev/drbd0 is mounted on /opt/EMC. Check /var/log/messages on either machine for debugging information.
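To see which node currently holds the cluster address, look for it in the output of ip -o -4 addr show. Heartbeat typically adds it as an extra address on the public interface. Below is a sketch with a hypothetical helper that reads that output on standard input.

```shell
# owns_vip ADDRESS
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and
# succeeds if ADDRESS is assigned to one of this machine's interfaces.
owns_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), parts, "/")
                    if (parts[1] == vip) found = 1
                }
        }
        END { exit !found }
    '
}
```

For example: ip -o -4 addr show | owns_vip 10.5.140.222 && df -h /opt/EMC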

The moment of truth
Go to your primary node and yank the power cable out of the back. Head back to your machine and carefully watch /var/log/messages on the secondary node. You should see messages about the link going down and DRBD losing contact with its peer. Then Heartbeat should kick in and take over, mounting the filesystem and finally starting your httpd service. Congratulations, you have now successfully failed over. Because ha.cf sets auto_failback on, the resources will move back to lava2042 once it is running again and Heartbeat has been started on it.
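To put a number on the failover, the secondary can poll for the mount and report how long the takeover took. With deadtime 30 in ha.cf, Heartbeat waits about 30 seconds before declaring the primary dead, so expect at least roughly that. This is a sketch; the helper name and its 120-second default limit are assumptions.

```shell
# time_takeover MOUNTPOINT [LIMIT_SECONDS] [MOUNTS_FILE]
# Hypothetical helper: run on the secondary right after pulling the
# primary's power; prints how many seconds passed before MOUNTPOINT
# appeared in /proc/mounts, or fails after LIMIT_SECONDS (default 120).
time_takeover() {
    mnt=$1
    limit=${2:-120}
    mounts=${3:-/proc/mounts}
    start=$(date +%s)
    while :; do
        if awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts"; then
            echo "$mnt mounted after $(( $(date +%s) - start ))s"
            return 0
        fi
        [ $(( $(date +%s) - start )) -ge "$limit" ] && return 1
        sleep 1
    done
}
```

For example: lava2138# time_takeover /opt/EMC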

If you get an error, check the error messages and see if you can figure out what to do. If you need any help, leave a comment or email me and I’ll try to help. Hopefully this helps somebody, as it took me quite a while to figure out, having never worked with either piece of software.

Additional links:
Information mostly pulled from:
http://linux-ha.org/GettingStarted
http://www.linux-ha.org/DRBD/GettingStarted
http://www.linux-ha.org/DRBD/HowTo

P.S. Ralf Ramge emailed me an updated version of his bash ZFS backup script. I am still working on getting it put together to post. Thanks for the email, Ralf!
