What is an Elastic Virtual Farm? - by Sara from 13 Jul 2015
An Elastic Virtual Farm is a cluster of virtual machines able to auto-scale according to the load. It runs a batch system based on HTCondor. In its basic configuration it is composed of a master and a (configurable) fixed number of workers. When the number of queued jobs exceeds the workers' capacity, new workers are instantiated automatically on the Cloud and undeployed whenever idle. This functionality is provided by a custom daemon called elastiq, which runs on the master machine.
Each virtual farm runs in a sandboxed environment within the Cloud. This means that each cluster has its own isolated virtual network and an associated VRouter acting as gateway and providing DHCP and DNS functionalities. One machine of the cluster, normally the master, can be assigned a public IP. This is called the elastic IP and corresponds to the VRouter public address. Once a virtual machine is assigned the elastic IP (more on this later), a port forwarding rule is enabled on the VRouter for port 22. More ports can be forwarded on request.
More info:
HTCondor,
Elastiq,
Sandbox.
Let's get started! - by Sara from 13 Jul 2015
To get a user account, send us an email (link at the top right of this page) stating your requirements, e.g. operating system, disk space or anything else, or come and visit us at the Computing Centre.
You will be provided with a user account valid for this web portal and for the public login machine used to access the Cloud (one-access.to.infn.it), and you will be added to the newsletter mailing list. A new sandboxed environment and elastic IP will be created for you.
You will use this web portal only to configure and instantiate a new virtual farm. To manage your farm, log in to one-access.to.infn.it, run the command cloud-enter and enter your username and password. This will set up the proper environment to access the Cloud. Most operations can also be performed with the Cloud Dashboard, accessible from the right menu of your user space on this site.
Some preliminary actions once you have logged in to one-access for the first time:
Creating a new virtual farm. - by Sara from 14 Jul 2015
Click on Go to the User Portal in the top menu and then Create a new virtual farm in the right hand menu. Find below an explanation of the most relevant fields to be filled in.
- EC2 access key: your username
- EC2 secret key: the encrypted password
- Root ssh key: the name of the ssh keypair as stored in OpenNebula.
- OS image: for the time being only CentOS 6.6 and Ubuntu Server 14.04 images are available. If you need another OS, please contact us. Master and Workers will run the same OS.
- Master/Worker flavour: type of the instance to be run. You can specify a different flavour for Master and Workers. For a list of available flavours refer to this page. If you need a custom flavour, please contact us.
- Master/Worker user-data: a custom script (e.g. bash) to configure your virtual machines. It is executed automatically after the boot sequence. HTCondor and Elastiq are already installed, so you do not have to take care of this.
  IMPORTANT: the master user-data should always contain the instruction: service elastiq start. Otherwise, you have to run the command by hand on the master machine as a privileged user.
- Condor shared secret: any alphanumeric string. This is the password used to secure communication between Master and Workers.
- Minimum jobs waiting time: time in seconds a job must stay in the queue before Elastiq instantiates a new virtual machine.
- Number of jobs per VM: normally this should be set to the number of CPUs of the Worker virtual machine (1 job per core), unless your application is multicore.
- Maximum VM idle time: time in seconds before an idle virtual machine is automatically undeployed by Elastiq.
- Min/Max number of workers: minimum number of workers that are never undeployed and maximum number of workers that Elastiq can instantiate. The latter is in any case limited by the agreed user quota.
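To illustrate how the last four fields interact, here is a minimal sketch of a scale-up decision in shell. It is not Elastiq's actual algorithm. The function name workers_to_start and its arguments are hypothetical. The sketch simply assumes that only jobs that have waited longer than the minimum waiting time are counted, that each new worker absorbs "Number of jobs per VM" of them, and that the total never exceeds the maximum number of workers.

```shell
#!/bin/sh
# Hypothetical sketch, not Elastiq's real logic.
# usage: workers_to_start WAITING_JOBS JOBS_PER_VM RUNNING_WORKERS MAX_WORKERS
# WAITING_JOBS = jobs queued for longer than "Minimum jobs waiting time"
workers_to_start() {
    waiting=$1; per_vm=$2; running=$3; max=$4
    # round up: 5 waiting jobs with 4 jobs per VM need 2 workers
    needed=$(( (waiting + per_vm - 1) / per_vm ))
    # never go beyond the configured maximum number of workers
    room=$(( max - running ))
    [ "$room" -lt 0 ] && room=0
    [ "$needed" -gt "$room" ] && needed=$room
    echo "$needed"
}

workers_to_start 5 4 1 10    # 5 waiting jobs, 1 worker running, max 10
workers_to_start 40 4 8 10   # request capped by the maximum
```

In the second call the queue would justify ten new workers, but only two fit under the maximum, which mirrors the way the user quota limits the farm size.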
After configuring your farm, click the Submit button. This will save your farm definition and redirect you to a summary page, from which you can either instantiate or delete your newly created farm. The Instantiate button automatically deploys the Master virtual machine on the Cloud. You can check this either from the Dashboard or from the command line.
On one-access run the command:
euca-describe-instances
You will see the first virtual machine instantiated; after a while, the minimum number of workers that you specified with Min/Max number of workers should also appear.
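If you prefer to check the number of instances from a script rather than by reading the output, a sketch like the following may help. It only assumes that euca-describe-instances prints one line starting with INSTANCE per virtual machine. The helper name count_instances and the variable EXPECTED are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the INSTANCE lines read from standard input.
count_instances() {
    # "n + 0" prints 0 instead of an empty string when nothing matches
    awk '/^INSTANCE/ { n++ } END { print n + 0 }'
}

# Possible use on one-access (EXPECTED = master + minimum number of workers):
# while [ "$(euca-describe-instances | count_instances)" -lt "$EXPECTED" ]; do
#     sleep 30
# done
```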
Now it's time to associate the elastic IP with the first instance:
euca-associate-address 193.205.66.xxx -i i-000xxxxx
You can connect to the Master like this:
ssh -i privatekey.pem root@193.205.66.xxx
or alternatively:
ssh -i privatekey.pem root@cloud-gw-xxx.to.infn.it
Customizing your farm. - by Sara from 14 Jul 2015
We give here an example of user-data scripts to contextualize master and worker virtual machines. The example shows how to configure an NFS server on the Master in order to export an external disk (e.g. your persistent home space) to all Workers. First of all, attach the disk to the Master using the OpenNebula GUI or the command line. Then log in to the master and run the script /root/mount_home.sh; this will mount and export the home directory to the other nodes. The script also creates new users. Always insert a blank line at the end of the script!
Master user-data:
#!/bin/sh
# write a script to mount and export the home dir
# the script should be run by hand after attaching the device
cat <<EOF > /root/mount_home.sh
#!/bin/sh
mkdir -p /export/home
echo "/dev/vdd /home ext4 defaults,noatime 0 0" >> /etc/fstab
echo "/home /export/home none bind 0 0" >> /etc/fstab
mount -a
echo "/home 172.16.XXX.0/24(rw,sync,no_root_squash,no_subtree_check)" >> /etc/exports
service rpcbind start
service nfs start || service nfs-kernel-server start
exportfs -a
service nfs restart || service nfs-kernel-server restart
systemctl enable nfs || systemctl enable nfs-kernel-server
# add users
# it should be done in this script after mounting the home
# it is important to specify the UID in order to be consistent with
# the permissions on the homes already created on the volume
useradd -m -u 503 dummy
# always remember to start elastiq
# it is stopped at boot to give you time to mount the home
# or the nodes will not be able to import it
service elastiq start
EOF
chmod +x /root/mount_home.sh
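Note that mount_home.sh appends to /etc/fstab and /etc/exports every time it runs, so running it twice would duplicate the entries. One possible way to guard against this, sketched below, is a small helper that appends a line only if it is not already present. The name append_once is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line exists.
# usage: append_once LINE FILE
append_once() {
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# e.g. append_once "/home /export/home none bind 0 0" /etc/fstab
```

If you paste the helper inside the here-document of the master user-data, quote the delimiter (cat <<'EOF') so that $1 and $2 are not expanded at the moment the script is written.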
Workers user-data:
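#!/bin/sh
# the master address is taken from the HTCondor configuration
IP=$(condor_config_val CONDOR_HOST)
# import the home directory exported by the master
mount -t nfs "$IP:/home" /home
# same UID as on the master, to match the permissions on the volume
useradd -m -u 503 dummy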
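If a worker happens to boot before the master has exported /home, the mount in the worker user-data fails and is not attempted again. A possible workaround, shown here only as a sketch, is to retry the command a few times. The function retry, the number of attempts and the delay are illustrative choices and not part of the original set-up.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to ATTEMPTS times, DELAY seconds apart.
# usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    while ! "$@"; do
        # give up once the allowed number of attempts is reached
        [ "$n" -ge "$attempts" ] && return 1
        n=$(( n + 1 ))
        sleep "$delay"
    done
    return 0
}

# In the worker user-data this could replace the plain mount, e.g.:
# retry 10 30 mount -t nfs "$IP:/home" /home
```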
Please bear in mind that the Cloud is a volatile environment. Virtual machines could disappear at any moment (hopefully they will not) and any configuration done by hand will be lost. So write any customization (e.g. configuration changes or package installation) into the context script, so that the same environment is recovered when your instance is re-deployed. Alternatively, save everything on the persistent disk space.
Operating your farm. - by Sara from 14 Jul 2015
To operate your farm (e.g. check running instances, get accounting data...) use the Cloud Dashboard or the command line from one-access.to.infn.it. In the first case refer to the OpenNebula documentation, in the latter to this page (starting from section 2.3).