Protect VMware guest under RedHat Cluster

Most documentation on the net is about how to run a cluster-in-a-box under VMware. Very few seem to care about protecting VMware guests under a real RedHat cluster with shared storage.

This article is about exactly that. While I would not recommend using VMware in such a setup, it has been the case here, and the VMware guest actually resides on the shared storage. Relocating it is out of the question, so migrating it together with the other cluster resources is the only valid option.

To do so, I have created a simple script which accepts start/stop/status arguments. The VMware guest VMX is hard-coded into the script, but in an easy-to-change format. On stop, the script will attempt to suspend the VMware guest, and only if that fails, shut it down. Mind you that the blog’s HTML formatting might alter quotation marks into UTF-8 typographic marks which the shell will not understand.

#!/bin/bash
# This script will start/stop/status VMware machine
# Written by Ez-Aton
# http://www.tournament.org.il/run

# Hardcoded. Change to match your own settings!
VMWARE="/export/vmware/hosts/Windows_XP_Professional/Windows XP Professional.vmx"
VMRUN="/usr/bin/vmrun"
TIMEOUT=60

function status () {
    # This function will return success if the VM is up
    # (the VMX path is matched as a fixed string, since it contains spaces and dots)
    if "$VMRUN" list | grep -qF "$VMWARE"
    then
        echo "VM is up"
        return 0
    else
        echo "VM is down"
        return 1
    fi
}

function start () {
    # This function will start the VM
    if "$VMRUN" start "$VMWARE"
    then
        echo "VM is starting"
        return 0
    else
        echo "VM failed"
        return 1
    fi
}

function stop () {
    # This function will stop the VM: suspend first, shut down only if that fails
    "$VMRUN" suspend "$VMWARE"
    for i in $(seq 1 $TIMEOUT)
    do
        # Poll once a second; the status message itself is not needed here
        if ! status &>/dev/null
        then
            echo "VM Stopped"
            return 0
        fi
        sleep 1
    done
    # Still running after TIMEOUT seconds, so fall back to a soft shutdown
    "$VMRUN" stop "$VMWARE" soft
}

case "$1" in
start)     start
;;
stop)    stop
;;
status)    status
;;
*)    echo "Usage: $0 {start|stop|status}"
    exit 1
;;
esac
RET=$?

exit $RET

Since the formatting is killed by the blog, you can find the script here: vmware1
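
If you copy the script from the page instead of downloading it, the typographic quotes can be turned back into plain ones before running it. This is only a minimal sketch, assuming a UTF-8 encoded file and a standard sed; the function name fix_quotes and the file names in the usage line are made up for the example.

```shell
#!/bin/bash
# Replace typographic double quotes and the double-prime sign with a plain
# ASCII double quote, and typographic single quotes with an ASCII apostrophe.
# Reads stdin, writes stdout. fix_quotes is a hypothetical helper name.
fix_quotes () {
    sed -e 's/“/"/g' -e 's/”/"/g' -e 's/″/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g"
}
```

For example: fix_quotes < vmware.pasted > vmware.sh, followed by a look at the result before making it executable.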

I intend to build a “real” RedHat Cluster agent script, but this should do for the time being.

Enjoy!

Xen Networking - Bonding with VLAN Tagging

The simple scripts in /etc/xen/scripts which manage networking are fine for most uses. However, when your server is using bonding together with VLAN tagging (802.1Q), you should consider an alternative.

A PDF document named “BOND/VLAN/Xen Network Configuration”, written as a service to the community by Mark Nielsen, GPS Senior Consultant, Red Hat, Inc (I lost the original link, sorry), gave me a few insights on the subject. Following one of its references, I found a somewhat more elegant method of setting up bridging under RedHat, which takes managing the bridges away from xend and leaves it at the system level. Let’s see how it’s done on a RedHat-style Linux distribution.

Manage your normal networking configurations

If you’re using VLAN tagging over bonding, then you should have set up a bonding device (say, bond0) with definitions such as these:

/etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1 (use DEVICE=eth1 in the latter)

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

/etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100"
ONBOOT=yes

This is rather straightforward, and should be pretty much the default for such a setup. Now comes the more interesting part. Originally, the next configuration part would be bond0.2 and bond0.3 (in my example). The original configuration would have looked like the one below. I am stressing this because I tend to fast-read myself and miss things too often: this is not how it should look when we’re done!

/etc/sysconfig/network-scripts/ifcfg-bond0.2 (same applies to ifcfg-bond0.3)

DEVICE=bond0.2
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes

Configure bridging

To set up a bridge device for bond0.2, replace the ifcfg-bond0.2 shown above with this new /etc/sysconfig/network-scripts/ifcfg-bond0.2

DEVICE=bond0.2
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=xenbr0

Now, create a new file /etc/sysconfig/network-scripts/ifcfg-xenbr0

DEVICE=xenbr0
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Bridge

Now, on network restart, the bridge will be brought up, holding the right IP address - all done by initscripts, with no Xen intervention. You will want to repeat the “Configure bridging” part for any additional bridge you want to enable for Xen machines.
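
Repeating the bridge part by hand for every VLAN gets tedious, so here is a rough sketch of generating the two files for one more bridge. The function name make_bridge_cfg, its argument order and the address in the usage line are my own inventions for illustration. It writes TYPE=Bridge with a capital B, which, as far as I know, is the spelling the RedHat initscripts test for.

```shell
#!/bin/bash
# Write the ifcfg pair for one VLAN-backed bridge on top of bond0.
# Arguments: VLAN id, bridge name, IP address, netmask, target directory.
# make_bridge_cfg is a hypothetical helper; on a real system the target
# directory would be /etc/sysconfig/network-scripts.
make_bridge_cfg () {
    local vlan="$1" bridge="$2" ip="$3" mask="$4" dir="$5"

    # The VLAN device carries no address, it is only attached to the bridge
    cat > "$dir/ifcfg-bond0.$vlan" <<EOF
DEVICE=bond0.$vlan
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=$bridge
EOF

    # The bridge device holds the IP address
    cat > "$dir/ifcfg-$bridge" <<EOF
DEVICE=$bridge
BOOTPROTO=static
IPADDR=$ip
NETMASK=$mask
ONBOOT=yes
TYPE=Bridge
EOF
}
```

For example: make_bridge_cfg 3 xenbr1 192.168.1.2 255.255.255.0 /etc/sysconfig/network-scripts. Writing into a scratch directory first and reviewing the files is the safer way to try it.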

Don’t let Xen bring any bridges up

This is the last part of our drill, and it is very important. If you don’t do it, you’ll get a nice networking mess. As said before, Xen (community), by default, can’t handle bonding or VLAN tags, so it will attempt to create or modify bridges on eth0 or the like. Edit /etc/xen/xend-config.sxp and comment out any line containing a directive starting with “network-script”. Such a directive would be, for example

(network-script network-bridge)

Restart xend and restart networking. You should now be able to configure VMs to use xenbr0 and xenbr1, etc (according to your own personal settings).
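
Commenting out the network-script directives can also be done non-interactively. The following is a sketch only: disable_network_script is a hypothetical helper name, and it assumes each directive sits on its own line, as in the example above.

```shell
#!/bin/bash
# Comment out every active (network-script ...) directive in a xend
# configuration file, keeping the original as <file>.bak.
# Lines that are already commented out are left untouched.
# disable_network_script is a hypothetical helper name.
disable_network_script () {
    local conf="$1"
    cp -p "$conf" "$conf.bak" &&
    sed 's/^[[:space:]]*(network-script[[:space:]]/#&/' "$conf.bak" > "$conf"
}
```

For example: disable_network_script /etc/xen/xend-config.sxp, then compare the file with its .bak copy before restarting xend.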

MySQL permissions for LVM Snapshots

Taking LVM snapshots as a means of backing up MySQL is rather simple, as described here. However, if you are into security, you would strive to grant the MySQL user only the minimal permissions for the action. Per the MySQL documentation, the required privilege is “RELOAD”. That should be enough, granted on *.*, of course.
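
As an illustration, the grant could be produced like this. It is a sketch under assumptions: the account name lvmbackup is invented, the account is assumed to exist already, and the snapshot procedure is assumed to rely on FLUSH TABLES WITH READ LOCK, which is the statement RELOAD covers. grant_reload_sql is a hypothetical helper name.

```shell
#!/bin/bash
# Print the GRANT statement giving a backup account the global RELOAD
# privilege and nothing else.
# Arguments: user name, host part of the account.
# grant_reload_sql is a hypothetical helper name.
grant_reload_sql () {
    local user="$1" host="$2"
    printf "GRANT RELOAD ON *.* TO '%s'@'%s';\n" "$user" "$host"
}
```

For example: grant_reload_sql lvmbackup localhost | mysql -u root -p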

Raw devices for Oracle on RedHat (RHEL) 5

There is major confusion among DBAs regarding how to set up raw devices for Oracle RAC or Oracle Clusterware. This confusion is caused by the turn RedHat took in how raw devices are defined.

Raw devices are actually a manifestation of character devices pointing to block devices. Character devices are non-buffered, so they act as FIFO, and have no OS cache, which is why Oracle likes them so much for Clusterware CRS and voting.

On other Unix types, there are commonly two device nodes for each disk device - a block device (e.g. /dev/dsk/c0d0t0s1) and a character device (e.g. /dev/rdsk/c0d0t0s1). This is not the case for Linux, and thus a special “raw”, aka character, device has to be defined for each partition we want to participate in the cluster, either as CRS or voting disk.

On RHEL4, raw devices were set up easily using the simple and coherent file /etc/sysconfig/rawdevices, which included an internal example. On RHEL5 this is not the case, and you are required to customize the udev subsystem, in a rather less documented way.

Check out the source of this information, at this entry about raw devices. I will add it here, anyhow, with a slight explanation:

1. Add to /etc/udev/rules.d/60-raw.rules:

ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"

2. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules containing lines such as:

KERNEL=="raw[1-2]", MODE="0640", GROUP="oinstall", OWNER="oracle"

Notice this:

  1. The raw-perms.rules file name has to begin with the number 99, which defines its order during rules apply, so that it will be used after all other rules take place. Using lower numbers might cause permissions to be incorrect.
  2. The following permissions have to apply:
     • OCR device(s): root:oinstall, mode 0640
     • Voting device(s): oracle:oinstall, mode 0666
  3. You don’t have to use raw devices for ASM volumes on Linux, as the ASMLib library is very effective and easier to manage.
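
With more than one partition, the binding lines for 60-raw.rules follow the same pattern, so they can be generated. A small sketch: raw_rules is a hypothetical helper name and sdb1/sdc1 are example partitions only.

```shell
#!/bin/bash
# Print one raw-device binding rule per partition, numbering the raw
# devices from raw1 upward in the order the partitions are given.
# raw_rules is a hypothetical helper name.
raw_rules () {
    local n=1 part
    for part in "$@"
    do
        printf 'ACTION=="add", KERNEL=="%s", RUN+="/bin/raw /dev/raw/raw%d %%N"\n' "$part" "$n"
        n=$((n + 1))
    done
}
```

For example: raw_rules sdb1 sdc1 >> /etc/udev/rules.d/60-raw.rules. After udev has re-read its rules, raw -qa should list the resulting bindings.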

    Xen VMs performance collection

    Unlike VMware Server, Xen’s hypervisor does not allow an easy collection of performance information. The management machine, called “Domain-0”, is actually a privileged virtual machine, and thus gets its own small share of CPUs and RAM. Collecting performance information on it will lead to, well, collecting performance information for a single VM, and not the whole bunch.

    Local tools, such as “xentop”, allow collection of information. However, combining this with Cacti, or any other SNMP-based collection tool, is a bit tricky.

    A great solution is provided by Ian P. Christian in his blog post about Xen monitoring. He has created a Perl script to collect information. I have taken the liberty to fix several minor things with his permission. The modified scripts are presented below. Name the script (according to your version of Xen) “/usr/local/bin/xen_stats.pl” and set it to be executable:

    For Xen 3.1

    #!/usr/bin/perl -w

    use strict;

    # declare…
    sub trim($);
    # we need to run 2 iterations because CPU stats show 0% on the first, and I’m putting .1 second between them to speed it up
    my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);

    # remove the first line
    shift(@result);

    shift(@result) while @result && $result[0] !~ /^xentop - /;

    # drop the xentop banner line and the 3 heading lines that follow it
    shift(@result);
    shift(@result);
    shift(@result);
    shift(@result);

    foreach my $line (@result)
    {
    my @xenInfo = split(/[\t ]+/, trim($line));
    printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
    $xenInfo[0],
    $xenInfo[2],
    $xenInfo[3],
    $xenInfo[14],
    $xenInfo[15]
    );
    }

    # trims leading and trailing whitespace
    sub trim($)
    {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
    }

    For Xen 3.2 and Xen 3.3

    #!/usr/bin/perl -w

    use strict;

    # declare…
    sub trim($);

    # we need to run 2 iterations because CPU stats show 0% on the first, and I’m putting .1 second between them to speed it up
    my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);

    # remove the first line
    shift(@result);
    shift(@result) while @result && $result[0] !~ /^[\t ]+NAME/;
    shift(@result);

    foreach my $line (@result)
    {
    my @xenInfo = split(/[\t ]+/, trim($line));
    printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
    $xenInfo[0],
    $xenInfo[2],
    $xenInfo[3],
    $xenInfo[14],
    $xenInfo[15]
    );
    }

    # trims leading and trailing whitespace
    sub trim($)
    {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
    }

    Cron settings for Domain-0

    Create a file “/etc/cron.d/xenstat” with the following contents:

    # This will run xen_stats.pl every minute
    */1 * * * * root /usr/local/bin/xen_stats.pl > /tmp/xen-stats.new && cat /tmp/xen-stats.new > /var/run/xen-stats

    SNMP settings for Domain-0

    Add the line below to “/etc/snmp/snmpd.conf” and then restart the snmpd service

    extend xen-stats   /bin/cat /var/run/xen-stats
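
Before pointing Cacti at the host, it helps to check that the stats file holds what you expect. The sketch below pulls a single counter for a single domain out of the file, relying on the "name: ..., cpu_sec: ..." line format printed by the script above. xen_stat_field is a hypothetical helper name.

```shell
#!/bin/bash
# Print one counter for one domain from xen_stats.pl output.
# Arguments: stats file, domain name, field name (for example cpu_percent).
# xen_stat_field is a hypothetical helper name.
xen_stat_field () {
    local file="$1" dom="$2" field="$3"
    awk -F', ' -v dom="$dom" -v field="$field" '
        # Each line starts with "name: <domain>", then "key: value" pairs
        $1 == "name: " dom {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ": ")
                if (kv[1] == field) print kv[2]
            }
        }' "$file"
}
```

For example: xen_stat_field /var/run/xen-stats Domain-0 cpu_percent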

    Cacti

    I reduced Ian’s Cacti script to a per-server setup, meaning this script gets the host (dom-0) name from Cacti, but cannot support live migrations. I will try to deal with combining live migrations with Cacti in the future.

    Download my modified xen_cloud.tar.gz file. Extract it, place the script and config in their relevant locations, and import the template into Cacti. It should work like a charm.

    A note - the PHP script will work only on PHP5 and above. Works flawlessly on Centos5.2 for me.

    Oracle RAC with EMC iSCSI Storage Panics

    I have had a system panicking when running the configuration below:

    • RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
    • Dell PowerEdge servers
    • Oracle RAC 11g with Clusterware 11g
    • EMC iSCSI storage
    • EMC PowerPath
    • Vote and Registry LUNs are accessible as raw devices
    • Data files are accessible through ASM with ASMLib

    During reboots or shutdowns, the system used to panic just before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

    Tracing the problem, it seems that the iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services are brought down before the “killall” service stops CRS.

    The init.crs service script was never executed with a “stop” argument by the normal start/stop of services, because it never left a lock file (for example, in /var/lock/subsys). Its presence in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d therefore has no actual effect.

    I have solved it by changing /etc/init.d/init.crs script a bit:

    • On “Start” action, touch a file called /var/lock/subsys/init.crs
    • On “Stop” action, remove a file called /var/lock/subsys/init.crs
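
The two changes above boil down to something like the following. This is a sketch of the idea and not the actual Oracle script: crs_lock is a hypothetical helper name, and the LOCKFILE override exists only so the logic can be tried outside /var/lock/subsys.

```shell
#!/bin/bash
# Lock-file handling of the kind added to the start and stop branches.
# LOCKFILE may be overridden for testing; crs_lock is a hypothetical helper.
LOCKFILE="${LOCKFILE:-/var/lock/subsys/init.crs}"

crs_lock () {
    case "$1" in
    start)
        # Mark the service as running, so that it gets called with "stop"
        # during shutdown or reboot
        touch "$LOCKFILE"
        ;;
    stop)
        # The service is going down; remove the marker
        rm -f "$LOCKFILE"
        ;;
    esac
}
```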

    Also, although I’m not sure about its necessity, I have changed init.crs script SYSV execution order in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case and K76 on another) to K01, so it would be executed with the “stop” parameter early during shutdown or reboot cycle.

    It solved the problem, although future upgrades to Oracle ClusterWare will require being aware of this change.

    Minimal Centos5 and SSH Server with X forwarding

    Installing a minimal selection of Centos 5 packages - only the base, core and dialup groups - will leave a small-sized system, with the minimum required.

    This is a good setup to start setting up a firewall, and it lets you add any required package later using ‘yum’.

    That said, you cannot forward X over SSH in this state. A minimal setup is a minimal setup, after all. I searched for the package required to allow X forwarding over SSH, and found it - xorg-x11-xauth

    Install this one, and X will be forwarded over your SSH connection (you still need to connect with the ‘-X’ flag from the client).

    An experiment

    My brother is computer illiterate. He can use a computer for e-mail and for editing documents, spreadsheets, etc.

    I have decided to “abuse” his older laptop, an IBM X31, and install Ubuntu on it. This is some sort of an experiment. I wonder how he, a simple user, will cope with using Linux as a desktop.

    I have made sure he had the following, for now:

    • Ubuntu 8.04 32bit
    • Firefox 3 with Adblock Plus
    • Hebrew fonts, and msttfcorefonts package installed
    • OpenOffice which defaults to saving in MS Office formats - doc, xls, ppt
    • Skype
    • VLC media player
    • Hebrew layout enabled

    I will let him use it for a few days, and keep my blog up to date on this. It interests me :-)

    Engrish, anyone?

    I love it when I get an Engrish product. Their manuals, their texts - I just love it.

    This is the cover of a flash light I was given during the latest RedHat summit here in Israel.

    The interesting part is the text scanned above. I will quote (grammar and spelling mistakes are kept as in the original)

    PRODUCT CHARACTERISTICS:

    1. This product is a new science and technology product and made with high and new science and technology. It can illuminate only placing it in rhythm.

    2. No need any power no environmental pollution. Low noise and health. Comparing with common torch,it can be several times on lift.

    3. Con stantly using this health torch,  it can benefit to your palm, arm and shoulder stretching and blood circulation,so as to let your hands relax and brain clever,hand and brain coor dinate and promote your brain memory and health composition.

    A manual flash light, right? :-)

    x86 Scale Up

    I have been introduced to a very cool software/hardware combination yesterday. It has been, without exaggerating, one of the coolest things I have seen in a while.

    As you may know, x86 has an issue with scaling up: x86 architecture and pricing don’t justify scaling up to tens and hundreds of CPUs. The multi-core technology introduced in the last few years made a four-way server seem trivial today, where in the past it was a high-performance server for large (and expensive) data centers. It is very common today to purchase an eight-way server at the price of a mere commodity server - all thanks to the multi-core technology.

    However, large Unix data centers (I will emphasize - the large Unix data centers) commonly run 64 and 128 CPU machines. Although nowadays x86 is somewhat more powerful per core, for a large load set it could not rival such a many-way server. The common solution with x86 was to “scale out” - add more cheap servers and manage the workload in a more distributed way. Yes, you might pay with communication overhead, however, this can be made cheaper still.

    With distributed load sharing came the ills of communication latency. Myrinet, 10Gb/s Ethernet and Infiniband were common, yet expensive (as it was a niche market) solutions, and still - for distribution of high loads, they were well worth it. Still - a large scaled-up server based on x86 was nowhere to be found.

    No more. With ScaleMP’s concatenation you can “bundle” a set of servers using an Infiniband link into a single huge-multi-way, huge-RAM server at a relatively very low cost.

    Think about how you can purchase your current server, for example, your eight-core server (two quad-core cpus), and in time, scale it up into more powerful server (add another two quad-core cpus), or add more RAM, or more network interfaces, or whatever.

    This is not as fast as the IBM x3950 board-link (excuse me for not knowing the exact name), so it is not ideal for databases or systems which tend to create a lot of cache misses. However, for large (actually - very large) SMP systems, it could be great. It gives any company which feels that its current server might not be enough the assurance that it can actually scale up, using the same server, by adding more CPUs and more RAM at any time.

    It is supported, as far as I know, only on Linux for the time being. It closes some of the distance between the large Unix machines and modern Linux, for a fraction of the price.

    I liked it.