DRBD and Heartbeat for high availability on Linux
Mon, 18 Jun 2007

I’ve been trying to get an HA solution put together for one of our software projects here at EMC, and I figured I’d share the configuration of these two products in the environment we’re using. I have to write the documentation for it anyway, so I might as well post it here for everyone else to see and learn from first ;)

We are going to configure 2 machines in an Active/Passive failover setup, which means that if the primary machine dies, the secondary will take over its identity and continue functioning as before.

Primary: lava2042 (10.5.140.42) (192.168.1.1 for crossover interface)
Secondary: lava2138 (10.5.140.138) (192.168.1.2 for crossover interface)
HA-address: lava2222 (10.5.140.222)

Configuring heartbeat

Step 1.
Install Heartbeat and DRBD on BOTH machines that you are planning on configuring. This should be a very straightforward step and I’m not going to go into detail.

Step 2.
We’re going to need a dedicated link between the machines: either a crossover cable between spare ethernet ports or a serial cable. In this example I’m using a crossover cable.
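Give the crossover interfaces the addresses listed at the top of the post. A minimal sketch, assuming eth1 is the port the crossover cable is plugged into (it matches the bcast eth1 line in ha.cf below); use your distribution’s network config files to make it stick across reboots:

lava2042# ifconfig eth1 192.168.1.1 netmask 255.255.255.0 up
lava2138# ifconfig eth1 192.168.1.2 netmask 255.255.255.0 up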

Step 3.
Now we’re going to configure the /etc/ha.d/ha.cf file. Here is what I’ve put into /etc/ha.d/ha.cf ON EACH MACHINE:
# eth1 is the crossover interface; heartbeats are broadcast over it
bcast eth1
# send a heartbeat every 2 seconds
keepalive 2
# warn after 10 seconds of silence, declare the peer dead after 30
warntime 10
deadtime 30
# allow extra time for interfaces to come up after a reboot
initdead 120
# UDP port used for the broadcast heartbeat
udpport 694
# move resources back to the primary once it comes back
auto_failback on
node lava2042
node lava2138

Check the linux-ha.org Getting Started guide (linked at the end of this post) if you have trouble or are using a serial connection instead of a crossover cable; it has instructions on how to configure this file for a serial interface.

Step 4.
Now configure the /etc/ha.d/authkeys file ON EACH MACHINE for whatever kind of authentication and integrity checking you want. I don’t care about security in this example, so I put this in the file since it’s the fastest:
auth 2
2 crc

(See the linux-ha.org authkeys documentation for more information)
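If the heartbeat link ever runs over a shared network rather than a dedicated crossover cable, a keyed hash is the safer choice. A minimal sketch (the passphrase is just a placeholder, and heartbeat refuses to start unless the file is mode 600, i.e. chmod 600 /etc/ha.d/authkeys):

auth 1
1 sha1 ReplaceWithASharedSecret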

We’ll also need to configure the /etc/ha.d/haresources file, but we won’t do that until we get DRBD working correctly.

Configuring DRBD

Step 1.
The /etc/drbd.conf file needs to be configured. It should already contain an example setup. I used the existing resource r0 and edited the nodes. Inside the “resource r0 {” block there should be sections that read “on <something>”. Here is what I put for my 2 nodes:
on lava2042 {
    device    /dev/drbd0;
    disk      /dev/sda1;
    address   10.5.140.42:7788;
    meta-disk internal;
}

on lava2138 {
    device    /dev/drbd0;
    disk      /dev/sda8;
    address   10.5.140.138:7788;
    meta-disk internal;
}

Now let me give a little background. I had already created the /dev/sda1 partition on lava2042 and the /dev/sda8 partition on lava2138, each 1 GB, to store the data that is going to be shared. /dev/drbd0 is the device that will actually be mounted and read from. Other than that, I left the entire file at its defaults. Make sure to comment out any other resources unless you need more than one filesystem replicated.
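For reference, put together the whole resource ends up looking roughly like this. Treat it as a sketch rather than a drop-in file: protocol C (synchronous replication) is what the shipped sample config uses, and the syncer rate is just an example cap on resync bandwidth:

resource r0 {
    protocol C;
    syncer {
        rate 10M;
    }

    on lava2042 {
        device    /dev/drbd0;
        disk      /dev/sda1;
        address   10.5.140.42:7788;
        meta-disk internal;
    }

    on lava2138 {
        device    /dev/drbd0;
        disk      /dev/sda8;
        address   10.5.140.138:7788;
        meta-disk internal;
    }
}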

Step 2.
Make sure to load the drbd module with modprobe drbd, and check dmesg to make sure the output looks correct (sorry, I don’t have what it should look like; I’ll keep better notes in the future).
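For reference, the whole check is just the following (the exact dmesg text varies by DRBD version):

modprobe drbd
lsmod | grep drbd      # confirm the module actually loaded
dmesg | tail           # DRBD prints its version and setup messages here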

Step 3.
Now we need to initialize the DRBD metadata. Do this by running the following on EACH machine:
drbdadm create-md r0
Where r0 is the name of the resource from the /etc/drbd.conf file. You should now be able to run the following on each machine:
drbdadm up all
After running these two commands, you should be able to check dmesg and /proc/drbd to see the status of your filesystem.

Step 4.
The next step is to force one of the machines to become primary and create a filesystem. In this case I’m choosing lava2042 as the primary, so I will run this on that machine:
lava2042# drbdsetup /dev/drbd0 primary -o
This will do the initial sync between the machines; you should only need to do this once. After that, run this command:
lava2042# drbdadm primary all
This forces lava2042 into the primary state and makes /dev/drbd0 usable. From here you can create a filesystem:
lava2042# mkfs.ext3 /dev/drbd0 (or whatever filesystem you want)
And mount the filesystem to check it out (make sure to unmount it after you’re done).

You should now be able to do a drbdadm primary all on either machine (while both are in the Secondary/Secondary state; check /proc/drbd) and mount the filesystem.
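For example, to verify by hand that the other node can take over (assuming both nodes currently show Secondary/Secondary; /mnt is just a scratch mountpoint for the test):

lava2138# drbdadm primary all
lava2138# mount /dev/drbd0 /mnt
lava2138# ls /mnt                   # the data created on lava2042 should be here
lava2138# umount /mnt
lava2138# drbdadm secondary all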

Step 5.
Okay, now let’s drop back into secondary mode for lava2042 by doing this:
lava2042# drbdadm secondary all
The /proc/drbd file should look something like this:
version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by root@lava2138, 2007-06-18 09:50:33
0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:316952 nr:1221300 dw:1222380 dr:346211 al:8 bm:107 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:81456 misses:98 starving:0 dirty:0 changed:98
act_log: used:0/257 hits:262 misses:8 starving:0 dirty:0 changed:8

(The important part is st:Secondary/Secondary.) The resource needs to be in the Secondary state on both machines in order for heartbeat to work properly.

And now we’re going to edit the /etc/ha.d/haresources file to take care of sharing the filesystem. Here’s what I have in the file:
lava2042 drbddisk::r0 Filesystem::/dev/drbd0::/opt/EMC::ext3 10.5.140.222 httpd
Let’s go through it field by field:
lava2042 – the machine that will be the primary node
drbddisk::r0 – activate the r0 resource disk (make sure r0 corresponds to whatever your resource is named)
Filesystem::/dev/drbd0::/opt/EMC::ext3 – mount /dev/drbd0 on /opt/EMC as an ext3 filesystem
10.5.140.222 – the IP address for our solution (see the beginning of the post)
httpd – the service we’re going to watch over and take care of, in this case httpd (which wasn’t really what I was configuring, but it’s the easiest to show as an example)
Don’t forget this file has to be the same on BOTH MACHINES.
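Each name in haresources maps to a resource agent script, and the bare IP address is handled by heartbeat’s IPaddr agent. A quick sanity check that the agents used here exist (these are the standard heartbeat locations; adjust to your install):

ls /etc/ha.d/resource.d/drbddisk /etc/ha.d/resource.d/Filesystem \
   /etc/ha.d/resource.d/IPaddr /etc/init.d/httpd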

Step 6.
Make sure heartbeat and the service(s) you’re watching DO NOT start at boot, otherwise things get really ugly when you screw up:
chkconfig heartbeat off
chkconfig httpd off
/etc/init.d/httpd stop
(on both machines)

Step 7. (The cross your fingers step)
Alright, it’s finally time to test your failover configuration. First, we need to start heartbeat on the primary machine:
lava2042# /etc/init.d/heartbeat start
Then, start it on the secondary machine
lava2138# /etc/init.d/heartbeat start

You should now be able to ping the cluster IP (lava2222 or 10.5.140.222). You can also check that the /dev/drbd0 filesystem is mounted on the primary node using df. Check the /var/log/messages file on either machine for debugging information.
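A quick sanity-check sequence, using the addresses and mountpoint from earlier in the post:

ping -c 3 10.5.140.222              # the cluster IP from haresources
df -h /opt/EMC                      # should show /dev/drbd0 mounted on the primary
tail -f /var/log/messages           # heartbeat and DRBD both log here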

The moment of truth
Go to your primary node and yank the power cable out of the back. Head back to your machine and carefully watch /var/log/messages on the secondary node. You should see messages about the link being down and DRBD losing contact with its peer; then heartbeat should kick in and take over, mounting the filesystem and finally starting your httpd service. Congratulations, you have now successfully failed over.

If you get an error, check the messages and see if you can figure out what to do; if you need any help, leave a comment or email me and I’ll try to help. Hopefully this helps somebody, as it took me quite a while to figure out, having never worked with either piece of software before.

Additional links:
Information mostly pulled from:
http://linux-ha.org/GettingStarted
http://www.linux-ha.org/DRBD/GettingStarted
http://www.linux-ha.org/DRBD/HowTo

P.S. Ralf Ramge emailed me an updated version of his bash ZFS backup script. I am still working on getting it put together to post. Thanks for the email, Ralf!

Not-as-simple perl script for ZFS snapshot auditing
Tue, 05 Jun 2007

Hi everyone, I’m back again with another perl script that will hopefully be useful to a few of you.

Firstly, the script: http://lee.hinmanphoto.com/files/zdiff.txt (formatting long scripts in WordPress’s crazy editor is a long and arduous process, so I’m just linking to the script in this case; if anyone knows of a better place to stick it, let me know). chmod +x it and away you go!

Edit: Sun was nice enough to host the file for me; here’s a link to their version in case the other one goes down: http://www.sun.com/bigadmin/scripts/submittedScripts/zdiff.txt

In a nutshell, here’s what it does:

  • Diff a file inside a ZFS snapshot against the current file in the filesystem and (optionally) print out the line differences
  • Recursively diff an entire snapshot using md5 sums and (optionally) print out the line differences
  • Display the md5 sums for each file in a ZFS snapshot and filesystem (this can get old to look at very quickly)

On its own that doesn’t mean a whole lot, so here’s the output from the -h option:

ZFS Snapshot diff
./zdiff.pl [-dhirv] <zfs shapshot name> [filename]

-d Display the lines that are different (diff output)
-h Display this usage
-i Ignore files that don't exist in the snapshot (only necessary for recursing)
-r Recursively diff every file in the snapshot (filename not required)
-v Verbose mode

[filename] is the filename RELATIVE to the ZFS snapshot root. For example, if
I had a filesystem snapshot called pool/data/zone@initial. The filename '/etc/passwd'
would refer to the filename /pool/data/zone/etc/passwd in the filesystem and filename
/pool/data/zone/.zfs/snapshot/initial/etc/passwd in the snapshot.

A couple of examples:
./zdiff.pl -v -r -i pool/zones/lava2019@Fri
Checks the current pool/zones/lava2019 filesystem against the snapshot
returning the md5sum difference of any files (ignore files that don't
exist in the snapshot). With verbose mode

./zdiff.pl -d pool/zones/lava2019@Mon /root/etc/passwd
Check the md5sum for /pool/zones/lava2019/root/etc/passwd and compare
it to /pool/zones/lava2019/.zfs/snapshot/Mon/root/etc/passwd. Display
the lines that are different also.

Here’s what the output is going to look like:

-bash-3.00# ./zdiff.pl -d -v -r -i pool/zones/lava2019@Fri
Recursive diff on pool/zones/lava2019@Fri
Filesystem: /pool/zones/lava2019, Snapshot: Fri
Comparing: /pool/zones/lava2019/
to: /pool/zones/lava2019/.zfs/snapshot/Fri/
** /pool/zones/lava2019/root/etc/shadow is different
** MD5(/pool/zones/lava2019/root/etc/shadow)= 04fa68e7f9dbc0afbf8950bbb84650a6
** MD5(/pool/zones/lava2019/.zfs/snapshot/Fri/root/etc/shadow)= 4fc845ff7729e804806d8129852fa494
17d16
< tom:*LK*:::::::
** /pool/zones/lava2019/root/etc/dfs/dfstab is different
** MD5(/pool/zones/lava2019/root/etc/dfs/dfstab)= 8426d34aa7aae5a512a0c576ca2977b7
** MD5(/pool/zones/lava2019/.zfs/snapshot/Fri/root/etc/dfs/dfstab)= c3803f151cb3018f77f42226f699ee1b
13d12
< share -F nfs -o rw -d "Data" /data

etc, etc, etc.

I am planning on using it so I can audit certain files on different zones (like /etc/passwd) against an initial ZFS snapshot to see what’s changed. Nice little way to keep track of stuff. Email me with any bugs. Matthew dot hinman at gmail dot com.
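If you just want to spot-check a single file without the script, the same .zfs/snapshot path convention works by hand. A minimal sketch using the paths from the example above (openssl md5 happens to print the same MD5(...)= format the script uses; digest -a md5 works on Solaris too):

FS=/pool/zones/lava2019
SNAP=Fri
FILE=root/etc/passwd
openssl md5 $FS/$FILE $FS/.zfs/snapshot/$SNAP/$FILE
diff $FS/.zfs/snapshot/$SNAP/$FILE $FS/$FILE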

Use SVM to make RAID0 and RAID1 meta-partitions
Thu, 17 May 2007

Firstly, the easy one:

RAID0:
Given 4 slices, each ~5 GB:

First, we need a metadb. I created a 100MB slice on c1t1d0s0 (an entirely separate drive which I am NOT using for the RAID) and ran this command to initialize the state database. It is a good idea to keep a minimum of 3 replicas of the database, but that is beyond the scope of this tutorial:
metadb -a -f c1t1d0s0
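If you do want the recommended three replicas and only have the one spare slice, this variant creates three copies on it (run it instead of, not in addition to, the command above):

metadb -a -f -c 3 c1t1d0s0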

Then it takes just one command to bring multiple drives together into a single slice/partition:
metainit d100 1 4 c2t2d0s0 c2t3d0s0 c2t4d0s0 c2t5d0s0
NOTE: I already created slice 0 on each of the drives.

To see the status of your meta-slice:
metastat d100
d100: Concat/Stripe
    Size: 40878080 blocks (19 GB)
    Stripe 0: (interlace: 32 blocks)
        Device      Start Block   Dbase   Reloc
        c2t2d0s0              0   No      Yes
        c2t3d0s0           4096   No      Yes
        c2t4d0s0           4096   No      Yes
        c2t5d0s0           4096   No      Yes

Device Relocation Information:
Device   Reloc   Device ID
c2t2d0   Yes     id1,sd@n6006048cb0ca0ceeef67fa7a33ce4c94
c2t3d0   Yes     id1,sd@n6006048cb275dda20f654d7248d17197
c2t4d0   Yes     id1,sd@n6006048c5aa658e3c69370f2bad75bc0
c2t5d0   Yes     id1,sd@n6006048cc092136a695a21eeaa948f88

See? Now we’ve got a 19GB slice. Feel free to newfs /dev/md/dsk/d100 and mount it somewhere fun.
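For example (newfs wants the raw device, and /data is just a placeholder mountpoint):

newfs /dev/md/rdsk/d100
mkdir -p /data
mount /dev/md/dsk/d100 /data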

Next up: RAID1
This is actually not as hard as it looks. First, make sure you initialize your state database as in the first step above. Then initialize the metadevice that will become your first submirror:
metainit d101 1 1 c2t2d0s0

Then create the mirror, which will become your final RAID1 slice, by issuing the following command:
metainit d100 -m d101

Then initialize the other slices in your mirror; in this case there are 3 additional slices:
metainit d102 1 1 c2t3d0s0
metainit d103 1 1 c2t4d0s0
metainit d104 1 1 c2t5d0s0

From there, it’s quite easy to finish it up by attaching the mirrors:
metattach d100 d102
metattach d100 d103
metattach d100 d104

Then, monitor metastat for the sync progress percentage until all the mirrors are sync’d. Finished!
metastat d100
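If you only care about the resync lines, something like this is handy (purely a convenience):

metastat d100 | grep -i resync      # shows "Resync in progress: xx % done" per submirror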

Getting EMC Celerras to work for iscsi on Solaris 10
Thu, 17 May 2007

For fun and profit!

Basically, I’m writing this down for my own reference:

1. Celerra-side:
Create the filesystems (I am using 4 because I want to stripe across all 4):
nas_fs -n iscsiRAID1_5g -c size=5G pool=clar_r5_performance
nas_fs -n iscsiRAID2_5g -c size=5G pool=clar_r5_performance
nas_fs -n iscsiRAID3_5g -c size=5G pool=clar_r5_performance
nas_fs -n iscsiRAID4_5g -c size=5G pool=clar_r5_performance

Mount filesystems:
server_mount server_2 iscsiRAID1_5g /iscsiRAID1_5g
(repeat for all 4 filesystems)

Create iscsi target:
server_iscsi server_2 -target -alias target_3 -create 1000:np=10.5.140.151
(10.5.140.151 is the datamover IP for this Celerra, “target_3” is the target name)

Create iscsi LUNs:
server_iscsi server_2 -lun -number 1 -create target_3 -size 5000 -fs iscsiRAID1_5g
server_iscsi server_2 -lun -number 2 -create target_3 -size 5000 -fs iscsiRAID2_5g
server_iscsi server_2 -lun -number 3 -create target_3 -size 5000 -fs iscsiRAID3_5g
server_iscsi server_2 -lun -number 4 -create target_3 -size 5000 -fs iscsiRAID4_5g

I am creating 4 LUNs, one for each of the 4 filesystems.

2. On the Sun side:
iscsiadm modify discovery --sendtargets enable
iscsiadm add discovery-address 10.5.140.151:3260

(10.5.140.151 is the datamover for our Celerra, it will be our iscsi target)

Run this command so you can get the initiator node name:
iscsiadm list initiator-node
It’ll spit out something that looks like this:
Initiator node name: iqn.1986-03.com.sun:01:ba88a3f5ffff.4648d8d8
Initiator node alias: -
Login Parameters (Default/Configured):
Header Digest: NONE/-
Data Digest: NONE/-
Authentication Type: NONE
RADIUS Server: NONE
RADIUS access: unknown
Configured Sessions: 1

We’re interested in the initiator node name, the part that starts with iqn.

Back on the Celerra:
server_iscsi server_2 -mask -set target_3 -initiator iqn.1986-03.com.sun:01:ba88a3f5ffff.4648d8d8 -grant 1-4
(use the initiator you got from the previous command, we are granting access to LUNs 1 through 4 (our raid LUNs))
And start the iscsi service if it hasn’t been started already:
server_iscsi server_2 -service -start
You are now completely done on the Celerra side, you can log off.

Back on the Sun:
Run this command to make sure you can see your targets alright
iscsiadm list target
Target: iqn.1992-05.com.emc:apm000650039080000-3
Alias: target_3
TPGT: 1000
ISID: 4000002a0000
Connections: 1

You should see something similar to the above. If you do, you now have a successful connection to the Celerra for iscsi. Don’t forget to create device nodes for your drives by running this:
devfsadm -i iscsi
Now run “format” and you should be able to see your drives show up. Don’t forget to open port 3260 in your firewall so that iscsi traffic can get through.
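A quick, non-interactive way to confirm the four LUNs actually showed up (format just prints its disk list and exits when it hits end-of-input):

format < /dev/null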

You should now be in business with your 4 drives. I’m still working on the RAID/mirror/striping part. I will add another post once I figure this out.

If you run into an error where the iscsi driver will not come online, take a look at this link.
