Configuring and Installing Remus

Host Configuration

Setting up the Dev Environment

This entire tutorial assumes that you have a Debian-based system (Debian Squeeze or Ubuntu 10.04), either 32-bit or 64-bit.
If you are behind a proxy server, you will probably have to set the following variables so that downloads can pass through it.

export http_proxy=http://<username>:<password>@<proxyaddress>:<proxyport>
export https_proxy=http://<username>:<password>@<proxyaddress>:<proxyport>
export ftp_proxy=http://<username>:<password>@<proxyaddress>:<proxyport>
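If you would rather not type the same URL three times, the exports above can be wrapped in a small helper. This is only a sketch: set_proxy is a hypothetical function name, and the values in the usage comment are placeholders. A password containing characters such as @ or : would need to be URL-encoded first.

```shell
# set_proxy USER PASSWORD HOST PORT
# Hypothetical helper: builds one proxy URL and exports it for http, https and ftp.
set_proxy() {
    proxy_url="http://$1:$2@$3:$4"
    export http_proxy="$proxy_url"
    export https_proxy="$proxy_url"
    export ftp_proxy="$proxy_url"
}

# Example with placeholder values:
#   set_proxy alice secret proxy.example.com 3128
```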

If you are behind a firewall or proxy server and cannot open the git port to use the git protocol, set the variable below so that data is exchanged with the git server over HTTP instead:
export GIT_HTTP=y

Install necessary packages
apt-get update
apt-get upgrade
apt-get install git-core mercurial screen tcpdump minicom ntp ntpdate tree debootstrap bcc bin86 gawk bridge-utils iproute libcurl3 libcurl4-openssl-dev bzip2 module-init-tools transfig tgif texinfo pciutils-dev build-essential make gcc libc6-dev zlib1g-dev python python-dev python-twisted libncurses5-dev patch libvncserver-dev libjpeg62-dev iasl libbz2-dev e2fslibs-dev uuid-dev libtext-template-perl autoconf debhelper debconf-utils docbook-xml docbook-xsl dpatch xsltproc rcconf bison flex gcc-multilib ocaml-findlib libyajl-dev yajl-tools libglib2.0-dev libsdl-ttf2.0-0 libsdl-ttf2.0-dev
apt-get clean
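Before moving on, it can save time to confirm that the main build tools from the package list actually ended up on your PATH. The following is a minimal sketch; missing_cmds is a hypothetical helper, and the list of commands checked is just a sample of what the packages above provide.

```shell
# missing_cmds CMD...
# Prints every command from the argument list that cannot be found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check a sample of the tools the build relies on.
missing=$(missing_cmds hg git gcc make patch bison flex)
if [ -n "$missing" ]; then
    echo "Still missing:"
    echo "$missing"
else
    echo "All checked build tools were found"
fi
```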

From here on, the tutorial will assume that Machine1 and Machine2 are two physical hosts and SystemHA is the VM to be protected. Please replace these names with the respective hostnames, IP addresses or whatever floats your boat :).

Download and Install Xen 4.1.2 from Xensource Repo

The patches provided in this tutorial will apply cleanly against changeset 23173:d5bb65d3ec24 in xen-4.1-testing.hg repository. First make sure that "hgext.mq=" is uncommented in /etc/mercurial/hgrc.d/hgext.rc
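A quick way to check the mq setting is to grep for an uncommented "hgext.mq" line. This is a sketch under the assumption that the extension is enabled in exactly the form named above; Mercurial also accepts other spellings (for example "mq ="), which this check does not cover. mq_enabled is a hypothetical helper name.

```shell
# mq_enabled FILE
# Succeeds if FILE contains an uncommented "hgext.mq =" line.
mq_enabled() {
    grep -Eq '^[[:space:]]*hgext\.mq[[:space:]]*=' "$1"
}

rc=/etc/mercurial/hgrc.d/hgext.rc
if [ -f "$rc" ] && mq_enabled "$rc"; then
    echo "mq extension is enabled in $rc"
else
    echo "mq extension does not appear to be enabled in $rc"
fi
```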

cd /usr/src
hg clone -r RELEASE-4.1.2 http://xenbits.xen.org/xen-4.1-testing.hg xen-4.1.2

Apply the following set of patches:

  1. 01_remus_compression.patch - adds checkpoint compression functionality (also available in upstream Xen, i.e. xen-unstable)
  2. 02_persistent_bitmap.patch - creates a permanent mapping of the PV guest in xc_domain_save, instead of mapping/unmapping in batches of 4MB. This patch will have no effect on HVMs.
  3. 03_config_fixups.patch
  4. 04_stats_fix.patch - pretty printing of remus checkpoint stats for post processing and analysis
  5. 05_timeouts.patch - increases the failure detection timeout. Once your installation is stable, please adjust the timeout values in this patch according to your needs.
  6. 06_qdisc_3.4_fix.patch - This patch enables support for sch_plug modules when using 3.4+ dom0 kernels.
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/01_remus_compression.patch -O /tmp/01_remus_compression.patch
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/02_persistent_bitmap.patch -O /tmp/02_persistent_bitmap.patch
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/03_config_fixups.patch -O /tmp/03_config_fixups.patch
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/04_stats_fix.patch -O /tmp/04_stats_fix.patch
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/05_timeouts.patch -O /tmp/05_timeouts.patch
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/06_qdisc_3.4_fix.patch -O /tmp/06_qdisc_3.4_fix.patch

###NOTE: Make sure the "hgext.mq=" line is uncommented in /etc/mercurial/hgrc.d/hgext.rc, or the following commands won't work.
cd /usr/src/xen-4.1.2
hg qinit
hg qimport /tmp/01_remus_compression.patch
hg qpush
hg qimport /tmp/02_persistent_bitmap.patch
hg qpush
hg qimport /tmp/03_config_fixups.patch
hg qpush
hg qimport /tmp/04_stats_fix.patch
hg qpush
hg qimport /tmp/05_timeouts.patch
hg qpush
hg qimport /tmp/06_qdisc_3.4_fix.patch
hg qpush
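The qimport/qpush pairs above can also be written as a loop that stops at the first patch that fails to import or apply. This is a sketch only: import_patches is a hypothetical helper, and the HG variable exists purely so the loop can be dry-run with another command (for example HG=echo).

```shell
# import_patches PATCH...
# Imports each patch into the mq queue and applies it, in the order given.
# Stops at the first failure so a rejected patch is not silently skipped.
import_patches() {
    for patch_file in "$@"; do
        "${HG:-hg}" qimport "$patch_file" || return 1
        "${HG:-hg}" qpush || return 1
    done
}

# Usage, from /usr/src/xen-4.1.2 after "hg qinit":
#   import_patches /tmp/0[1-6]_*.patch
# Afterwards, "hg qapplied" lists the patches that were applied.
```

The glob expands in lexical order, so the patches are applied as 01 through 06, the same order as the commands above.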

###add configure options according to your needs
./configure
# If behind a proxy, enable git over http in the Xen configure step.
# ./configure --enable-githttp
make clean
make install-xen
make tools

Once you have run "make tools", you should have a tools/ioemu-remote directory that contains the qemu device model code, which is used for HVM domUs. The qemu device model code currently does not handle DRBD-backed disks of HVM domUs properly. Apply the following patch, drbd-hvm-fix:
cd /usr/src/xen-4.1.2/tools/qemu-xen-traditional-dir-remote
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/drbd-hvm-fix
patch -p1 <drbd-hvm-fix
cd /usr/src/xen-4.1.2
make install-tools

Dom0 and DomU kernels

Dom0 kernel support required for Remus is present in mainline kernels 3.4+. If you are using any other kernel version, make sure you have
the sch_plug module installed in Dom0. sch_plug is not available in 3.0 kernels. Check out http://pasik.reaktio.net/xen/remus/linux3x/,
download the Makefile and sch_plug.c, and compile and install the module.
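To decide whether you need the out-of-tree module at all, you can compare the running Dom0 kernel version against 3.4. The sketch below only looks at the major and minor numbers; kernel_at_least_3_4 is a hypothetical helper, and a distribution kernel older than 3.4 may still ship sch_plug as a backport, which this check cannot know.

```shell
# kernel_at_least_3_4 VERSION
# Succeeds if VERSION (as printed by "uname -r") is 3.4 or newer.
kernel_at_least_3_4() {
    ver=${1%%-*}                 # drop a local suffix such as -amd64
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}      # drop trailing non-digits, e.g. "10+"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 4 ]; }
}

if kernel_at_least_3_4 "$(uname -r)"; then
    echo "Kernel $(uname -r): sch_plug support should be in mainline"
else
    echo "Kernel $(uname -r): check that sch_plug is installed (try: modinfo sch_plug)"
fi
```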

DomU Kernel - If you want to reduce the Remus checkpointing overhead, use a kernel with Suspend Event Channel support. As of this writing, only openSUSE kernels (3.0+) have this capability. Otherwise, Remus falls back to the slower version (xenstore based suspend/resume) for checkpointing. The slower version works for both HVM and PV DomUs.

Download and install DRBD 8.3.9 (remus version) or 8.3.11

Remus uses a modified version of DRBD (based on 8.3.9) for disk replication. You can download a tarball of the source from here.

cd /usr/src/
wget http://remusha.wikidot.com/local--files/configuring-and-installing-remus/drbd-8.3.9-remus.tar.gz
tar xzf drbd-8.3.9-remus.tar.gz
chown -R root:root drbd-8.3-remus 
cd /usr/src/drbd-8.3-remus
chmod +x autogen.sh
./autogen.sh
dpkg-buildpackage -b -uc
cd /usr/src/drbd-8.3-remus/drbd
make clean
make
make install
cd /usr/src/
dpkg -i drbd8-utils_8.3.9-5_i386.deb
cp /usr/src/drbd-8.3-remus/scripts/global_common.conf.protoD /etc/drbd.d/global_common.conf
cp /usr/src/drbd-8.3-remus/scripts/testvms_protoD.res /etc/drbd.d/SystemHA_protoD.res
###Modify SystemHA_protoD.res according to your needs
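The dpkg -i line above names the i386 package. On a 64-bit host the file produced by dpkg-buildpackage will carry a different architecture suffix. The sketch below builds the file name from the architecture; it assumes the package version stays 8.3.9-5 and that the suffix matches what "dpkg --print-architecture" reports. drbd_utils_deb is a hypothetical helper name.

```shell
# drbd_utils_deb ARCH
# Prints the expected drbd8-utils package file name for the given architecture.
# Assumption: the version string is the 8.3.9-5 used in this tutorial.
drbd_utils_deb() {
    echo "drbd8-utils_8.3.9-5_$1.deb"
}

# Usage on the build host:
#   dpkg -i "/usr/src/$(drbd_utils_deb "$(dpkg --print-architecture)")"
```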

This code is based on DRBD version 8.3.9. Here is the full patch.

Conor Winchcombe has ported the remus patch from 8.3.9 to 8.3.11, and the patched source tarball is available. If you would like to see the changes w.r.t. the base 8.3.11, here is the patch.

I have not tested the Debian packaging based installation procedure yet. However, the standard installation procedure outlined on the DRBD website should work.

Add Xen boot entry to Grub

Edit the file /etc/default/grub and add the lines below at the end of the file (an example file can be found here):

# Disable OS prober to prevent virtual machines on logical volumes from appearing in the boot menu.
GRUB_DISABLE_OS_PROBER="true"

Uncomment the following entry in /etc/default/grub (remove the leading #):
GRUB_DISABLE_LINUX_RECOVERY="true"

Create the file /etc/grub.d/08_xen and add the lines below, or download it here:
#!/bin/sh
exec tail -n +3 $0
menuentry "Xen Unstable / Debian Squeeze kernel 2.6.32.40" {
        insmod ext2
        set root='(hd0,1)'
        multiboot (hd0,1)/boot/xen.gz dummy dom0_mem=512M
        module (hd0,1)/boot/vmlinuz-2.6.32.40 dummy root=UUID=8e339522-dab5-4a81-8066-c41cc3908a15 ro quiet console=tty0 nomodeset
        module (hd0,1)/boot/initrd.img-2.6.32.40
}

You will have to change the UUID to match your system. You can find the correct value by running the blkid command.
chmod -x /etc/grub.d/20_linux_xen
chmod 755 /etc/grub.d/08_xen
update-grub2
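If you want just the UUID string for the menu entry, it can be cut out of a line of blkid output. The sketch below is one way to do that; uuid_from_blkid is a hypothetical helper and /dev/sda1 in the usage comment is a placeholder for your root partition. blkid can also print the value directly with its -s and -o options, as shown in the second usage line.

```shell
# uuid_from_blkid
# Reads blkid output on stdin and prints the value of each UUID="..." field.
# The leading whitespace in the pattern keeps PARTUUID= from matching.
uuid_from_blkid() {
    sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Usage (placeholder device name):
#   blkid /dev/sda1 | uuid_from_blkid
#   blkid -s UUID -o value /dev/sda1
```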

NB: Sometimes, it helps to give dom0 at least 2 vcpus (if it's a hyperthreaded host, at least 2 hyperthreads). Some users reported that DRBD was unstable on a dom0 with just 1 vcpu (check this forum post).

Fix init.d scripts to start xend daemon on boot

update-rc.d xencommons defaults 19 18
update-rc.d xend defaults 20 21
update-rc.d xendomains defaults 21 20

Restart the system (and cross your fingers). After reboot, in case xend hasn't started, run the following manually.

/etc/init.d/xencommons start
/etc/init.d/xend start
/etc/init.d/xendomains start

Create DRBD volume and its meta data

Most of the stuff here can also be found on the drbd.org website. Configuring the remus version of DRBD is "almost" the same as configuring DRBD 8.3. The only differences are that the setup operates in Dual Primary mode and the replication protocol is protocol D. The following assumes that

  1. you have LVM set up on your system and that your VM is on an LVM volume named SystemHA-disk.
  2. you have configured (not set up) the DRBD resource for the VM
lvcreate -n SystemHA-disk -L 10G volgroup
##/dev/volgroup/SystemHA-disk is your VM's disk.
###This should be in your drbd resource file SystemHA_protoD.res or whatever you chose to name it
resource drbd-vm {
        device /dev/drbd1;
        disk /dev/volgroup/SystemHA-disk;
        meta-disk /dev/volgroup/drbd-meta[0];
        on MACHINE1 {
            address 10.0.0.1:7791;
        }
        on MACHINE2 {
            address 10.0.0.2:7791;
        }
}

Create the meta-data for the SystemHA-disk and then bring up the resource. Do this on both machines.

drbdadm create-md drbd-vm
###Answer y or yes to all questions asked by the above command
drbdadm up drbd-vm

###Sanity check. You should see something like this in the output
root@null:~$ cat /proc/drbd 
version: 8.3.9 (api:88/proto:86-95)
GIT-hash: 3ba4fc581d6215744597b3d4c525db276ce000ee build by root@null, 2011-05-14 19:33:56

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent D r-----

Synchronize the DRBD resource for the first time

drbdadm -- --overwrite-data-of-peer primary drbd-vm
##sanity check
root@null:~$ cat /proc/drbd 
version: 8.3.9 (api:88/proto:86-95)
GIT-hash: 3ba4fc581d6215744597b3d4c525db276ce000ee build by root@null, 2011-05-14 19:33:56

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate D r-----
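Instead of eyeballing /proc/drbd, you can pull out the connection state, roles and disk states for one device. The following is a sketch that assumes the status line layout shown in the sample output above; drbd_status is a hypothetical helper name.

```shell
# drbd_status MINOR
# Reads /proc/drbd-formatted text on stdin and prints the cs, ro and ds
# fields of the device with the given minor number, one name=value per line.
drbd_status() {
    awk -v minor="$1" '
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == "cs" || kv[1] == "ro" || kv[1] == "ds")
                    printf "%s=%s\n", kv[1], kv[2]
            }
        }'
}

# Usage:
#   drbd_status 1 < /proc/drbd
```

After the initial sync has finished, the ds line should read UpToDate/UpToDate as in the sample output.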

Note: When configuring your VM, use drbd:drbd-vm for the disk, not phy:/… or any other format.

Working with Remus

The following steps assume that you have a VM named SystemHA that has booted and is backed by the previously created drbd volume.
Assuming Machine 1 is the PRIMARY and Machine 2 is the BACKUP, execute the following command on Machine 1:

nohup remus -i 40 SystemHA Machine2 >/var/log/xen/domU.log 2>&1 &
###Sanity check to see if DRBD is replicating properly
root@null:~$ cat /proc/drbd 
version: 8.3.9 (api:88/proto:86-95)
GIT-hash: 3ba4fc581d6215744597b3d4c525db276ce000ee build by root@null, 2011-05-14 19:33:56

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate D r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b def:0 chkpt:100 oos:0

Note the Primary/Primary role and the chkpt:<some value greater than 1> in the output above. This indicates that checkpoint based
replication is in progress for the disk. This chkpt number does not necessarily correspond to the current Remus checkpoint indicated in the
remus log file, so don't be alarmed.
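The two things to look for, the Primary/Primary role and a chkpt value greater than 1, can be tested in a script. This is a sketch that assumes a single DRBD resource and the status layout shown above; drbd_chkpt and is_checkpointing are hypothetical helper names.

```shell
# drbd_chkpt
# Reads /proc/drbd-formatted text on stdin and prints each chkpt counter found.
drbd_chkpt() {
    sed -n 's/.*chkpt:\([0-9][0-9]*\).*/\1/p'
}

# is_checkpointing
# Succeeds if the text on stdin shows ro:Primary/Primary and chkpt > 1.
# Only the first chkpt value is considered (single-resource assumption).
is_checkpointing() {
    status=$(cat)
    chk=$(printf '%s\n' "$status" | drbd_chkpt | head -n 1)
    printf '%s\n' "$status" | grep -q 'ro:Primary/Primary' && [ "${chk:-0}" -gt 1 ]
}

# Usage:
#   is_checkpointing < /proc/drbd && echo "disk replication is running"
```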

If you do "tail -f /var/log/xen/domU.log", you should see a steady stream of log messages like

REMUS:10:suspendAt:1316126973.112921:scall:433:rcall:569:dcall:47:suspendFor:674:ctime:571:flush:6:commit:0:tosend:136:comp:15222
Total pages sent= 0.00x
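For post-processing, a single counter can be pulled out of these log lines. The sketch below assumes the layout seen in the sample line: the literal REMUS, one number, then colon-separated name:value pairs. What each counter means and which units it uses is not covered here, and remus_stat is a hypothetical helper name.

```shell
# remus_stat NAME
# Reads Remus log lines on stdin and prints the value that follows NAME
# on every line starting with "REMUS:". Fields 1 and 2 are skipped; the
# rest is treated as name:value pairs.
remus_stat() {
    awk -F: -v key="$1" '
        $1 == "REMUS" {
            for (i = 3; i + 1 <= NF; i += 2)
                if ($i == key) print $(i + 1)
        }'
}

# Usage:
#   remus_stat tosend < /var/log/xen/domU.log
```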

Replicating to /dev/null

Sometimes, for development or analysis purposes, you don't really want to replicate to a physical machine. You just want to gather statistics
such as the number of pages that changed in a checkpoint, the size of the data sent, etc. In this case, all you need is a system that continuously checkpoints the VM
and replicates it to a sink like /dev/null, while still gathering stats.

nohup remus -i 40 --blackhole --no-net SystemHA dummyHost >/var/log/xen/domU-blackhole.log 2>&1 &
## sanity check: DRBD is in Primary/Secondary role here, since disk is not replicated and chkpt value is 1
root@null:~$ cat /proc/drbd 
version: 8.3.9 (api:88/proto:86-95)
GIT-hash: 3ba4fc581d6215744597b3d4c525db276ce000ee build by root@null, 2011-05-14 19:33:56

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate D r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b def:0 chkpt:1 oos:0

The VM SystemHA is continuously checkpointed but replicated to /dev/null. Gather up all the stats you want and then kill remus:
pkill -USR1 remus
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License