How to deploy the Ganeti cluster management software.
Created: Tue 28 June 2011 / Last updated: Tue 19 June 2012
Ganeti is a not-so-thin layer on top of a hypervisor that facilitates the management of your virtual machines. It helps you move virtual machine instances from one node to another, create an instance with DRBD replication on another node, perform live migration between nodes, and basically everything you can expect from a robust platform.
Historically, Ganeti started within Google to manage the business infrastructure (print servers, LDAP, accounting, etc.). I looked at it because it was possible to install it from source, without a patched kernel, on Debian Squeeze. The quality of the code, documentation, development process and discussions on the news group finally convinced me to try it. Here is the way I have set up the system with KVM.
This document must be read together with the official Ganeti installation documentation; some of the steps described there are not repeated here.
For reference, the software versions are Ganeti 2.4.2 and ganeti-instance-image 0.5.1. Everything is running Debian Squeeze in 64-bit, so many terms are taken from the Debian way of naming things.
Your cluster will run on your own network, which means this configuration will need to be adapted to fit your requirements. In this case, the servers are hosted with OVH and have:
- eth0 with a fixed public IP address.
- eth0.2186, a tagged VLAN interface. On this interface, two networks are available: a private network 192.168.0.0/16 and a public network 178.33.145.128/26. The public network is a RIPE block.

The goal is to have, for each VM, a private and a public IP address. The private IP address is used for the infrastructure and the public IP for outside communication. The VLAN works across the 3 datacenters of OVH.
Each server has 12GB+ RAM and two hard drives (750GB or 1.5TB). They are all basically the same. It is important to have a homogeneous fleet of servers for better predictability of performance, and it is also very important to set them up the same way. These notes are very manual; scripting things with Fabric is recommended.
The base setup and the Ganeti installation must be performed on all the nodes. As each node can become the master, you need the software on each node.
The partitions are pretty simple. The base OS is on a 25GB software RAID1 partition and each drive gets a 12GB swap partition, for a total of 36GB of virtual memory.
root@node1:~# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/md1 / ext4 errors=remount-ro 0 1
/dev/sda2 swap swap defaults 0 0
/dev/sdb2 swap swap defaults 0 0
The rest of each drive is used as a big LVM volume group named xenvg (at the origin Ganeti only supported Xen, which is why the default names often use xen). On this node there is 2.6 TiB of raw storage available for the VMs. No RAID is used; that is, if you create a non-DRBD-replicated VM, you have a single point of failure. See the replication and backup strategies below.
root@node1:~# pvdisplay
--- Physical volume ---
PV Name /dev/sda5
VG Name xenvg
PV Size 1.33 TiB / not usable 3.00 MiB
...
--- Physical volume ---
PV Name /dev/sdb5
VG Name xenvg
PV Size 1.33 TiB / not usable 3.00 MiB
...
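The official documentation covers the LVM creation; for reference, a minimal sketch matching the pvdisplay output above, assuming /dev/sda5 and /dev/sdb5 are the partitions dedicated to the VMs:
pvcreate /dev/sda5 /dev/sdb5
vgcreate xenvg /dev/sda5 /dev/sdb5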
After the storage setup, the network needs to be set up too. Ganeti supports both routed and bridged networking; bridged is used here.
You need to be sure to have the right packages to support the bridge and VLAN.
apt-get install vlan netcat fping tcpdump netmask bridge-utils
The setup is pretty simple: each node gets the dedicated address assigned by the provider on eth0 and a private IP address on eth0.2186 (replace 2186 with your own VLAN tag, or maybe your own NIC, for example eth1).
root@node1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback
# This is given by our provider, it is used only for monitoring at the
# provider level (health, load, etc) and maintenance of the server
auto eth0
iface eth0 inet static
address 188.165.237.1
netmask 255.255.255.0
network 188.165.237.0
broadcast 188.165.237.255
gateway 188.165.237.254
# This bridge is where all the VMs are connected. It bridges over the
# tagged interface.
auto xen-br0
iface xen-br0 inet static
# of course you need a different IP for each node
address 192.168.0.1
netmask 255.255.0.0
network 192.168.0.0
broadcast 192.168.255.255
bridge_ports eth0.2186
bridge_stp off
bridge_fd 0
No routes are defined on the bridge. The routes are directly defined in the VM.
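To check that the bridge came up correctly with the tagged interface enslaved, a quick sanity check (assuming the configuration above):
brctl show xen-br0
ip addr show xen-br0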
Some kernel parameters need to be adjusted too, to ensure IP forwarding and a properly working bridge.
root@node1:~# cat /etc/sysctl.conf
net.ipv4.tcp_syncookies=1
net.ipv4.ip_forward=1
net.ipv4.conf.all.accept_redirects=1
net.ipv4.conf.all.accept_source_route=1
net.ipv4.conf.all.send_redirects=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.log_martians=0
net.ipv4.conf.all.proxy_arp=0
net.ipv4.conf.default.proxy_arp=0
Then apply the changes:
sysctl -p
It is very important to disable proxy_arp on your interfaces. This is because you are not creating a pseudo-bridge but a real one.
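A quick check that the setting is effective after sysctl -p:
root@node1:~# sysctl net.ipv4.conf.all.proxy_arp
net.ipv4.conf.all.proxy_arp = 0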
Ganeti prefers the en_US.UTF-8 locale and I prefer the UTC timezone, so:
dpkg-reconfigure locales
dpkg-reconfigure tzdata
To be sure that everything is working correctly, I normally reboot.
Keep the official documentation at hand; only the Debian-specific things are given here. A rather long list of packages is required, just go ahead:
apt-get install lvm2 ssh bridge-utils iproute iputils-arping \
ndisc6 python python-pyopenssl openssl \
python-pyparsing python-simplejson \
python-pyinotify python-pycurl socat \
python-paramiko debootstrap dump kpartx \
qemu-utils gawk make drbd-utils qemu-kvm
Then download and install Ganeti itself:
mkdir -p /home/vendors
cd /home/vendors
wget http://ganeti.googlecode.com/files/ganeti-2.4.2.tar.gz
tar -xzf ganeti-2.4.2.tar.gz
cd ganeti-2.4.2
./configure --localstatedir=/var --sysconfdir=/etc
make
make install
mkdir /srv/ganeti/ /srv/ganeti/os /srv/ganeti/export
At the end, you will have to install the startup scripts and the watcher.
cp /home/vendors/ganeti-2.4.2/doc/examples/ganeti.initd /etc/init.d/ganeti
chmod +x /etc/init.d/ganeti
update-rc.d ganeti defaults 20 80
cp /home/vendors/ganeti-2.4.2/doc/examples/ganeti.cron /etc/cron.d/ganeti
The base software is now installed.
Follow the recommendations from the official documentation. In particular, be sure that the usermode helper is just /bin/true, that is, in your /etc/modules file you have a line with:
drbd minor_count=128 usermode_helper=/bin/true
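You can load the module right away and check that the parameters were picked up; a minimal sketch:
modprobe drbd minor_count=128 usermode_helper=/bin/true
cat /proc/drbd
cat /sys/module/drbd/parameters/usermode_helper  # should print /bin/true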
To be able to install instances you need to have an Operating System installation script. You need the scripts on all the nodes (maybe using Puppet to manage them) as the creation of an instance is done directly on the target node.
The easiest way to go is to use the ganeti-instance-image and ganeti-instance-debootstrap packages. These packages are only loosely coupled to the Ganeti release number, so at the moment it is possible to install them directly from the provided Debian packages.
cd /home/vendors
wget http://code.osuosl.org/attachments/download/2169/ganeti-instance-image_0.5.1-1_all.deb
dpkg -i ganeti-instance-image_0.5.1-1_all.deb
cd /srv/ganeti/os
ln -s /usr/share/ganeti/os/image
apt-get install ganeti-instance-debootstrap
ln -s /usr/share/ganeti/os/debootstrap
The symbolic links are needed as Ganeti looks for the OS definitions in /srv/ganeti/os.
Repeat all this setup for each node. Now you know why automation is needed. For example, use 192.168.0.2 as the IP of the secondary node.
It is extremely simple: first define the IP address of your cluster. In my case I selected 192.168.1.1 with the name clust1.ceondo.net. So, in the /etc/hosts of each node, I add:
192.168.1.1 clust1.ceondo.net
Then on the first node run:
gnt-cluster init clust1.ceondo.net
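The defaults happen to match this setup for storage and networking (xenvg as the volume group, xen-br0 as the master network device), but KVM may need to be enabled explicitly on your version; a hedged sketch of an init with explicit options:
gnt-cluster init --enabled-hypervisors=kvm --vg-name=xenvg \
    --master-netdev=xen-br0 clust1.ceondo.net
Check the gnt-cluster man page for the exact options of your version.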
Ok, so what do you have now?
- Your node, 192.168.0.1, is now the master of a single-node cluster.
- 192.168.1.1 is added to the xen-br0 bridge. This is done automatically by Ganeti, you must not do it yourself. This IP must be on the same subnet as your bridge, because only the IP is added and it reuses the network information of the bridge.

The next step is of course to add another node to the cluster. If you do not have a DNS server for your private network, simply add the node IP to your hosts file. For example:
192.168.0.2 node2.ceondo.net
Now on the master, run:
gnt-node add node2.ceondo.net
Doing so will add the node to the cluster and update the ssh configuration of the node to ensure communication between the nodes. So now you can get information about your nodes; here 3 instances are running on these nodes:
# gnt-node list
Node DTotal DFree MTotal MNode MFree Pinst Sinst
node1.ceondo.net 2.7T 2.6T 11.8G 514M 11.2G 2 0
node2.ceondo.net 2.7T 2.6T 11.8G 226M 11.5G 1 0
If you have set up a third node, you can add it too... but now, you are maybe more interested in creating your first instance.
The simplest way to create an instance is to boot an installation CD with KVM and VNC. All the operations are run on the master. If you want to create the instance on the secondary node, you need to download the ISO file on the secondary node too.
cd /home/vendors
wget http://cdimage.debian.org/debian-cd/6.0.1a/amd64/iso-cd/debian-6.0.1a-amd64-netinst.iso
Create your first instance without doing any installation and without starting it. Read the gnt-instance man page for more information.
gnt-instance add -t plain -s 10g -o image+default -n node1.ceondo.net \
--no-start --no-install -H kvm:vnc_bind_address=127.0.0.1 vm116.ceondo.net
Decomposing the command to help you understand what is going on:
- gnt-instance add: add an instance to the cluster.
- -t plain: the instance will run from a plain LVM volume.
- -s 10g: it will have a single disk of 10GB (the partitioning inside the disk is up to you).
- -o image+default: it will use the image OS with the default variant.
- -n node1.ceondo.net: it will be created on node1.
- --no-start --no-install: after the addition, we do not start it and we do not run the image+default OS installation scripts.
- -H kvm:vnc_bind_address=127.0.0.1: we tell the hypervisor that we want VNC bound to localhost. You can put 0.0.0.0 to bind on all the interfaces if you do not want to use an ssh tunnel, but this is not really secure, and the default Gnome VNC viewer (Remote Desktop Viewer) supports ssh tunneling very easily.
- vm116.ceondo.net: this is the name of the instance. The name must resolve. If you do not have a DNS server, put it in the hosts file of the nodes.

If you run this command, it basically just adds the instance to the cluster on node1. Now it is time to boot and install the instance. First, we need to be sure that KVM will boot with the kernel from the CD and that we do not use the serial console.
gnt-instance modify -H serial_console=false vm116.ceondo.net
gnt-instance modify -H kernel_path= vm116.ceondo.net
Ganeti offers very convenient tools to manage the configuration of your VMs. So, time to boot this instance:
gnt-instance start -d -H \
boot_order=cdrom,cdrom_image_path=/home/vendors/debian-6.0.1a-amd64-netinst.iso \
vm116.ceondo.net
When starting with the -H option, it means that for this boot, and this boot only, KVM will use these parameters. It also means that if you restart the instance, it will not have the cdrom, which is what we want.
After you run this command, run:
gnt-instance info vm116.ceondo.net
At the top, you will have the information on the VNC IP and port. So just connect your VNC client, for example to 127.0.0.1:11001, using the host (in my case provided by OVH, root@ns12345.ovh.net) as an SSH tunnel. You can now start the installation.
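For the tunnel itself, a minimal sketch from your personal computer (host and port taken from the example above):
ssh -N -L 11001:127.0.0.1:11001 root@ns12345.ovh.net
Then point your VNC client at 127.0.0.1:11001.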
To be able to clone and reuse this instance as a template for new instances, the partitions can only be ext3/ext4 or swap, and the order of the partitions in the partition table must be either:
/dev/$disk1 /boot
/dev/$disk2 swap
/dev/$disk3 /
or
/dev/$disk1 /boot
/dev/$disk2 /
I prefer to run without swap and a possible careful overcommit of the memory at the node level; Red Hat provides some good background information about it. The /boot partition is needed because the kernel used is not the kernel from the node.
Run the installation as usual; you will have to define the network connection manually. In my case, this means providing the RIPE block information:
- IP address: 178.33.145.152
- Netmask: 255.255.255.192
- Gateway: 178.33.145.190
- DNS: 8.8.8.8

A private DNS server is available on the private network, but the CD installation does not offer the ability to configure two IP addresses directly. So, everything is fine, you can finish the installation (do not forget to install SSH!) and then restart the instance without VNC:
gnt-instance modify -H vnc_bind_address= vm116.ceondo.net
gnt-instance reboot vm116.ceondo.net
Then, from your personal computer, you should be able to ssh into your instance:
$ ssh yourlogin@vm116.ceondo.net
Customize, clean, and make this instance a base for mass deployment. It will be the template used by the image OS definition. The image template will take care of changing the IP/hostname etc. for you, and even the RAM and disk size.
So, all is well: you have your instance running, but now you want to start a new instance, and better not to have to go through the CD install each time. The image OS definition does just that: you can create a template out of a running instance and reuse it to deploy as many times as you need.
First, shutdown the instance to have the disks in a consistent state for the dump:
gnt-instance shutdown vm116.ceondo.net
Now we create the default image OS definition. Using it when creating a new instance means we will pass the -o image+default option. You can create many variants, but pay attention: with too many of them, it will quickly become a nightmare to manage. So, our default will be:
SWAP=no
FILESYSTEM="ext4"
IMAGE_NAME="debian-6.0"
IMAGE_TYPE="dump"
IMAGE_DIR="/srv/ganeti/instance-image"
ARCH="x86_64"
CUSTOMIZE_DIR="/etc/ganeti/instance-image/hooks"
IMAGE_DEBUG=0
You can either put these settings in /etc/default/ganeti-instance-image as I do (this provides sane defaults for all the variants) or directly in the default variant definition in /etc/ganeti/instance-image/variants/default.conf. After you update the file, do not forget to sync it on all the cluster nodes. Again, Ganeti has some tools to do it:
gnt-cluster copyfile /etc/default/ganeti-instance-image
or
gnt-cluster copyfile /etc/ganeti/instance-image/variants/default.conf
It is now time to make the dump of the first instance to reuse it as a template.
mkdir /srv/ganeti/instance-image
cd /srv/ganeti/os/image/tools/
./make-dump vm116.ceondo.net
Now you have the files debian-6.0-x86_64-root.dump and debian-6.0-x86_64-boot.dump in your /srv/ganeti/instance-image folder. You need to sync this folder on all your nodes to create an instance from this template. You can also have a small NFS share mounted as /srv/ganeti/instance-image, and that way you do not have to sync; this is up to you. My provider OVH has some managed NAS which fits this requirement perfectly.
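If you go the sync route, Ganeti's copyfile works here too; a sketch (the target folder must already exist on the other nodes):
gnt-cluster copyfile /srv/ganeti/instance-image/debian-6.0-x86_64-boot.dump
gnt-cluster copyfile /srv/ganeti/instance-image/debian-6.0-x86_64-root.dump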
Time to create a new instance based on this template. As you can expect, it will be vm117.ceondo.net. As we do not want it to have the same IP address, we need to define the customization of the instance for the OS installation scripts. To do that, you define the network of your instance and its IP address. As you will reuse the network information many times, it gets its own definition:
# cat /etc/ganeti/instance-image/networks/subnets/ripe
GATEWAY=178.33.145.190
NETMASK=255.255.255.192
NETWORK=178.33.145.128
BROADCAST=178.33.145.191
Basically, a simple text file named ripe with the definition of the ripe network. Then, for the instance, we create a file named after the fully qualified name of the instance:
# cat /etc/ganeti/instance-image/networks/instances/vm117.ceondo.net
ADDRESS=178.33.145.157
SUBNET=ripe
As the instance has its own kernel, we need not only the interfaces hook but also the grub hook to be run at OS installation:
chmod +x /etc/ganeti/instance-image/hooks/grub
chmod +x /etc/ganeti/instance-image/hooks/interfaces
Do not worry, the boilerplate is only for the first run; next time you will just need a single file for the instance IP and subnet selection.
Time to add the instance:
gnt-instance add -t plain -o image+default -s 25g -n node1.ceondo.net vm117.ceondo.net
Done, the new instance is available and you can start to play with it. Notice that you are not tied to the disk size of the template: you can use a different size. You can also change the RAM size. Even better, you can use DRBD instead of a plain LVM volume, just pass -t drbd as the disk template.
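For example, a sketch of a DRBD-backed instance; with -t drbd you give both the primary and the secondary node, separated by a colon (vm118.ceondo.net is a hypothetical name):
gnt-instance add -t drbd -o image+default -s 25g \
    -n node1.ceondo.net:node2.ceondo.net vm118.ceondo.net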
For example, changing the number of virtual CPUs and the memory:
gnt-instance add -t plain -o image+default -s 100g -B memory=4G,vcpus=4 \
-n node3.ceondo.net vm152.ceondo.net
Now, if you haven't done it yet, do not forget to add the init and crontab files of Ganeti.
If you are using EC2, you are used to getting two network interfaces for each instance, one with a private address and one with a public address. Ganeti is extremely flexible and allows you to start an instance with two network interfaces or to add a new network interface to an existing instance:
gnt-instance modify --net add vm123
Modified instance vm123
- nic.1 -> add:mac=aa:00:00:2a:12:34,ip=None,mode=bridged,link=xen-br0
This adds a new NIC, nic.1, with a new random MAC address. The default parameters come from the cluster-wide parameters. So, if your hardware node has two bridges, one on the public network xen-br0 and one on the private network xen-br1, you would add a NIC on the private network by running:
gnt-instance modify --net add:link=xen-br1 vm123
Modified instance vm123
- nic.1 -> add:mac=aa:00:00:2a:12:35,ip=None,mode=bridged,link=xen-br1
The new NIC does not use the cluster-wide default but the specified bridge. This provides a lot of flexibility in managing your instance networking. As this is bridged networking, you have to do the traditional network configuration at the instance level, as sketched below.
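For example, inside the instance, a minimal Debian /etc/network/interfaces sketch, assuming eth0 sits on the public bridge and eth1 on the private one (the private address is hypothetical):
auto eth0
iface eth0 inet static
    address 178.33.145.157
    netmask 255.255.255.192
    gateway 178.33.145.190
auto eth1
iface eth1 inet static
    address 192.168.1.157
    netmask 255.255.0.0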
To create an instance with two network cards right from the start, based on an image, you could run:
gnt-instance add -t plain -o image+default -s 100g -B memory=4G,vcpus=4 \
-n node3.ceondo.net \
--net 0:ip=192.168.1.152 \
--net 1:ip=178.33.145.192 \
vm152.ceondo.net
You pass two --net arguments to define the two network cards.
After setting up a new system, running nmap is a good idea. You will notice that the remote API binds on all the interfaces of your master node. This is not so good, but it can be changed. As the cluster IP in this case is on the private network, it can be used; 127.0.0.1 is also an option:
root@node1:~# cat /etc/default/ganeti
RAPI_ARGS="-b 192.168.1.1"
NODED_ARGS="-b 192.168.0.1"
CONFD_ARGS="-b 192.168.0.1"
Do not forget to have this file on all your nodes. Pay attention: the remote API daemon binds to the cluster IP, while the noded and confd daemons bind to the IP of the node.
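After changing this file on a node, restart the Ganeti daemons there so the new bindings take effect, using the init script installed earlier:
/etc/init.d/ganeti restart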
Ganeti does not provide HA. It is like Amazon EC2: you can create an instance, perform backups and restores, and, better than EC2, you can move an instance to another node without downtime, but an automatic failover system is not provided.
The only provided automation is the watcher running from cron. If an instance is down in error state, it will try to start it. Nothing more, but nothing prevents you from building HA on top of Ganeti, or having HA at your application level rather than at the instance level (this is what I prefer).
For real-time replication you can use DRBD: just create your instance with the -t drbd template and Ganeti will take care of all the DRBD details. Please remember that replication is not backup. If you replicate corrupted data, the corruption is replicated too; if you drop your database in your replicated instance, you have nothing left.
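DRBD is also what enables moving an instance between its two nodes; a sketch of a live migration to the secondary node, and of a failover if the primary is down (vm118.ceondo.net as in the earlier hypothetical example):
gnt-instance migrate vm118.ceondo.net
gnt-instance failover vm118.ceondo.net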
Again, replication is not backup. This is why Google still uses tapes to perform backups! Céondo's approach, which is not necessarily the best but fits the way our software is designed, is: once you have replication, you can do backups. If your replication is well designed, you can stop the replication for the time needed to perform a backup.
Ganeti provides an easy way to back up a stopped instance and restore it:
gnt-backup export <instance>
gnt-backup import <instance>
This can be a convenient way to increase the disk size of an instance, as you can change the disk size at import time. The problem is of course that you need your instance to be down. To limit downtime, you can do an LVM snapshot and/or try to limit the size of your instances.
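For the LVM snapshot route, a hedged sketch; Ganeti names the logical volumes by UUID inside xenvg, so list them first (the snapshot name and volume are placeholders):
lvs xenvg
lvcreate -s -L 2G -n vm116-snap /dev/xenvg/<instance-disk-lv>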
The backup destination can be a NAS in another data center to allow point-in-time recovery. Once you push a backup file on the NAS, chmod it to 0444 to prevent accidents.
Oh, and backups are of no use if you do not test them. This is hard: it means that you need a special environment to restore and test without affecting your production system.
If you do not require very specific CPU features, you can pass the -cpu host flag to KVM:
dpkg-divert --add --rename --divert /usr/bin/kvm.real /usr/bin/kvm
cat <<EOF > /usr/bin/kvm
#!/bin/sh
# Wrapper: always start the real KVM binary with -cpu host
/usr/bin/kvm.real -cpu host "\$@"
EOF
chmod +x /usr/bin/kvm
If you do not need it, you should disable VNC. In our case, it was eating 6% of a CPU all the time.
gnt-instance modify -H vnc_bind_address= <instance>
Ganeti is very nice, not only because it works well, but also because when things are not going well, a lot of diagnostic tools are available to figure out what is going on. The first thing to do is to check your instance configuration:
gnt-instance info <instance>
The second one is to check the cluster info and verify it:
gnt-cluster info
gnt-cluster verify
The last one is to run some burn-in testing on your cluster:
/usr/local/lib/ganeti/tools/burnin -o image+default --disk-size=10g <newinstance>
Take a look at the information and read the manual pages; Ganeti is well designed, which means that it is usually easy to figure out what is going on.