Sign up for an EC2 account at http://aws.amazon.com/ec2/.
Start the AWS Management Console (at the top of the page at http://aws.amazon.com/ec2, click "My Account/Console" and choose "AWS Management Console").
From the management console home page, choose "EC2".
Now we are ready to start an EC2 instance.
Choose an AMI (Amazon Machine Image):
In the navigation panel, click AMIs.
In "Filter", choose "Public Images", "Amazon Images", and search for "gpu".
Click on the check box for the most recent Amazon image, which should have a name like amzn-ami-gpu-hvm-2013.03.01.x86_64-ebs.
At the top of the page click "Launch". This will open a "Launch dialog".
For "Instance Type", choose "CG1 Cluster GPU"
Choose either an "On-Demand Instance" or "Request Spot Instances", as desired. (See http://aws.amazon.com/ec2/spot-instances/ for information.)
Don't do anything on the "Advanced Instance Options" page.
Choose a key pair for public key authentication (you will need this to log in using ssh). You can either import a key that you already have or generate and download one.
And on the final page, click "Launch".
If you placed a spot request, go to the "Spot Requests" page (using the navigation panel). It will take several minutes before you learn if your request was successful.
Go to the "Instances" page (using the navigation panel). You should see your instance starting. This may take several minutes.
Once your instance is launched, you are ready to log into it. Right click the instance and choose "Connect". You will get two options. The simplest is to log in using the Java client. Otherwise, follow the instructions to connect using a stand-alone SSH client.
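For the stand-alone SSH client option, the connection can be sketched as below. This is only a sketch: the login name ec2-user is the usual one on Amazon Linux images but should be treated as an assumption (the "Connect" dialog shows the right one for your AMI), and the key file and host names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: connect to an instance with a stand-alone OpenSSH client.
# "ec2-user" is an assumed default login; use whatever the Connect
# dialog shows for your AMI.
connect_ec2() {
    key_file=$1              # private key (.pem) chosen at launch
    host=$2                  # public DNS name from the "Instances" page
    user=${3:-ec2-user}

    # ssh refuses private keys that other users can read.
    chmod 400 "$key_file" || return 1
    ssh -i "$key_file" "$user@$host"
}

# Usage (hypothetical file and host names):
#   connect_ec2 ~/.ssh/my-ec2-key.pem ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```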
Now that you are logged into the instance, it is time to configure it. Some scripts that will help with this may be found at http://www.quantosanalytics.org/garland/gpu_workshop/config.zip. Copy these to the GPU server, e.g.,
wget www.quantosanalytics.org/garland/gpu_workshop/config.zip
unzip config.zip
The mail-status.sh script will send an email whenever the instance is started or stopped and once per hour (please update this script with your email address).
Run the config.sh script using sudo (first, edit the script as desired).
You will need to log out and back in after running it to update environment variables.
After logging in again, you can try compiling some of the code samples that should now be in your home directory.
Get information about the GPU using the nvidia-smi command. See man nvidia-smi for details.
cuda_examples: Try compiling 0_simple/vectorAdd. Just cd into the directory and type make. Assuming it compiles, the binary will be in ../../bin/linux/release; try running it.
arrayfire_examples: Try compiling the helloworld example. Just cd into the helloworld directory, type make, then run the executable.
pycuda: Run make tests from the pycuda subdirectory of the directory from which the config.sh script was run. This will run some tests to verify that the installation succeeded (some of these may fail).
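Before compiling anything, it can help to confirm that the re-login really did update the environment. A minimal sketch of such a check follows; nvidia-smi is the tool named above, while the assumption that config.sh also puts nvcc (the CUDA compiler) and make on the PATH is mine.

```shell
#!/bin/sh
# Sketch: report which of the named tools are found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return "$missing"
}

# Usage (nvcc and make being installed by config.sh is an assumption):
#   check_tools nvidia-smi nvcc make
```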
Now that you have successfully created and configured your instance, you may want to save it as an AMI for future use. To do this, right click the instance and choose "Create Image" (follow the instructions in the dialog).
Remember that you are being charged for as long as the instance is running, so be sure to stop it when done. Right click the instance in the EC2 dashboard and choose either "Stop" or "Terminate". If you "Stop" the instance, you can restart it later and pick up where you left off. If you "Terminate" it, any changes made in the image are lost. Note that spot instances can only be terminated, not stopped.
You will run up a big bill if you forget to stop (or terminate) an instance. If you install the mail-status.sh script, any running instance should send you an email once per hour and whenever it is started or shut down. You may find this useful.
If you set up your own instance, you will need to use public key authentication (which is much preferred over passwords anyway). I have included a private/public key pair (gpu_workshop.pem and gpu_workshop.pub) in the config directory, and the key is authorized for access to the guest accounts used in this workshop. So you can experiment with this if you like. When setting up your own server, you have the choice between either generating a new key (and downloading the private key) or importing a key file that you already have.
If you are using PuTTY, you will need to convert the private key into PuTTY format before using. See the ssh notes for details.
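If you prefer the import option, a key pair can be generated locally with OpenSSH. The following is a sketch only; the key path is a hypothetical name, and the empty passphrase is a convenience you may not want.

```shell
#!/bin/sh
# Sketch: generate an RSA key pair locally for import into EC2.
make_keypair() {
    key_path=$1     # e.g. ~/.ssh/my-ec2-key (hypothetical name)
    # -N '' sets an empty passphrase; leave it out to be prompted for one.
    ssh-keygen -q -t rsa -b 2048 -N '' -C 'ec2 key' -f "$key_path"
}

# Usage:
#   make_keypair ~/.ssh/my-ec2-key
# The private key is ~/.ssh/my-ec2-key; the file to import in the
# console is ~/.ssh/my-ec2-key.pub.
```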
Spot instances are nice because you can get time on a GPU server for as little as $0.40/hour (the usual price for a GPU server is $2.10/hour). The drawback to spot instances is that they can be terminated at any time if the spot price rises above whatever limit you have set. If you have a long-running job, you will need to save data at "checkpoints," from which you can restart the job to continue it. It is typically straightforward to do this (and is a good idea for long-running jobs anyway).
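To make the price difference concrete, here is a small sketch that estimates the bill for a run of a given length. The two rates are simply the figures quoted above; actual prices vary over time and by region.

```shell
#!/bin/sh
# Sketch: estimate the cost of a run at an hourly rate.
# Rates are the figures quoted in the text, not current prices.
ON_DEMAND_RATE=2.10   # dollars per hour
SPOT_RATE=0.40        # dollars per hour ("as little as")

run_cost() {
    # $1 = hours, $2 = hourly rate; prints the cost with two decimals
    awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

# Usage: compare a 24-hour job at each rate
#   run_cost 24 "$ON_DEMAND_RATE"
#   run_cost 24 "$SPOT_RATE"
```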
The best way to save the checkpoint data is by saving to an attached Amazon storage volume with the persistent option checked (see below). You could also, e.g., set up a script to email checkpoint data to yourself or rsync it to an accessible server.
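The checkpoint idea can be sketched in a few lines of shell. This is an illustration under assumptions, not a recipe: the checkpoint path is hypothetical (point it at your persistent volume), and the "work" is a placeholder loop that only counts steps.

```shell
#!/bin/sh
# Sketch of checkpoint/restart for a long-running loop.
# CHECKPOINT_FILE is a hypothetical path on the persistent volume.
CHECKPOINT_FILE=${CHECKPOINT_FILE:-/mnt/q/job.checkpoint}

save_checkpoint() {
    # Write to a temporary file, then rename, so that a termination in
    # the middle of a write never leaves a truncated checkpoint.
    echo "$1" > "$CHECKPOINT_FILE.tmp" &&
        mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
}

load_checkpoint() {
    # Print the last completed step, or 0 when starting fresh.
    if [ -f "$CHECKPOINT_FILE" ]; then
        cat "$CHECKPOINT_FILE"
    else
        echo 0
    fi
}

run_job() {
    total_steps=$1
    step=$(load_checkpoint)
    while [ "$step" -lt "$total_steps" ]; do
        step=$((step + 1))
        # ... one unit of real work for this step goes here ...
        save_checkpoint "$step"
    done
    echo "finished at step $step"
}

# Usage: run_job 1000
# If the instance is terminated, rerunning the same command on a new
# instance resumes from the last saved step.
```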
To attach a persistent Amazon storage volume, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html. Such volumes can be mounted in the usual way (e.g., using fstab). One option might be to attach the volume at /home.
A volume can be attached when the instance is initially launched (using the launch dialog), or from the "Volumes" page of the EC2 management console. If an empty volume is attached, it must be set up before being used, e.g. (replace /dev/sdb with your device name; change mount points as desired):
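# Run as root.
mkfs -t ext4 /dev/sdb        # create a filesystem on the empty volume
mkdir /mnt/q                 # temporary mount point
mount /dev/sdb /mnt/q
rsync -avh /home/ /mnt/q     # copy the existing home directories to the volume
mkdir -p /mnt/home           # the fstab mount point must exist (use /home in fstab to replace /home instead)
echo "/dev/sdb /mnt/home ext4 defaults,noatime 0 0" >> /etc/fstab
parted -l                    # list disks and partitions as a check
reboot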
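Appending to /etc/fstab with echo adds a duplicate line every time the setup is repeated. A slightly more careful variant, sketched below, appends only when the device is not already listed. The FSTAB variable is my own addition so that the function can be tried on a copy of the file first.

```shell
#!/bin/sh
# Sketch: append an fstab entry only if the device is not already listed.
# FSTAB defaults to /etc/fstab; override it to experiment on a copy.
add_fstab_entry() {
    device=$1
    mount_point=$2
    fstab=${FSTAB:-/etc/fstab}

    # Look for the device name in the first column.
    if awk -v d="$device" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
        echo "$device already present in $fstab"
        return 0
    fi
    echo "$device $mount_point ext4 defaults,noatime 0 0" >> "$fstab"
}

# Usage (as root):
#   add_fstab_entry /dev/sdb /mnt/home
```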
For the process of software development, it is best to work on a local copy of the code and upload to the server to run (but it is also possible to edit code on the host remotely).
The easiest way to do this is to set up an rsync script. I like to use a makefile with a target called, e.g., 'put-ec2'. An alternative would be to use git or something similar (e.g., using a github repository). This would have some real advantages at the cost of a slight increase in complexity. In the simplest case, one could just use sftp.
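A "put-ec2" helper along these lines might look like the sketch below (written as a shell function rather than a makefile target). The key file, host and remote directory are hypothetical placeholders to replace with your own.

```shell
#!/bin/sh
# Sketch of a "put-ec2" helper. All three settings are placeholders.
EC2_KEY=${EC2_KEY:-$HOME/.ssh/my-ec2-key.pem}
EC2_HOST=${EC2_HOST:-ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com}
EC2_DIR=${EC2_DIR:-work/}

put_ec2() {
    src=${1:-.}
    # -a preserves permissions and times, -v lists files, -z compresses.
    # --exclude keeps version-control metadata off the server.
    # The trailing slash on the source copies its contents rather than
    # the directory itself.
    rsync -avz --exclude '.git' -e "ssh -i $EC2_KEY" "$src/" "$EC2_HOST:$EC2_DIR"
}

# Usage: from the top of the local working copy
#   put_ec2 .
```

Only changed files are transferred on each run, so it is cheap to call this after every edit.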
I usually keep an editor window and two terminal tabs open on my local machine. One terminal tab is to run the rsync script. The other is for an ssh session on the EC2 instance. It is often convenient to run jobs on the server using either 'tmux' or 'screen' so that they continue to run when you log out. An alternative would be to run background jobs and pipe output to a file.
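The background-job alternative can be sketched as follows; the program name in the usage lines is hypothetical. The tmux commands are shown only as comments for comparison.

```shell
#!/bin/sh
# Sketch: start a job that survives logout and log its output to a file.
run_detached() {
    log_file=$1
    shift
    # nohup makes the job ignore the hangup signal sent at logout;
    # "&" puts it in the background.
    nohup "$@" > "$log_file" 2>&1 &
    # Print the process id so the job can be checked or killed later.
    echo $!
}

# Usage (hypothetical program name):
#   pid=$(run_detached train.log ./train_model)
#   tail -f train.log
#
# A roughly equivalent tmux session:
#   tmux new-session -d -s train './train_model'
#   tmux attach -t train
```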
Remember that spot instances can be terminated at any time and the root storage volume is not persistent (so either work on a local copy of your software or keep it on a persistent attached volume).
The "Default" security group disables all outside access to the server and is almost certainly NOT what you want. The "Quick-start" group enables ssh access and IS almost certainly what you want. The simplest thing to do is to edit the default group so that it is identical to the quick-start group.
There are two ways that an instance can be stopped or terminated: from the management console; or from within the instance (using the Linux 'shutdown' command). If an instance is "terminated", any changes made to it since the time it was started will be lost. If an instance is "stopped", it can be restarted from where it was.
It is possible to protect against inadvertent termination.
To protect against termination from the AWS management console, the "Termination protection" option should be checked when launching the instance (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_ChangingDisableAPITermination.html).
To protect against termination from within the instance, right click the instance from within the management console and select the "Change shutdown behavior" item (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_ChangingInstanceInitiatedShutdownBehavior.html).
Spot instances cannot be "stopped", only "terminated". The most useful way to avoid data loss in this case is to keep your work on an attached volume with the "persistent" option checked.
It is possible to take snapshots of a storage volume at various points in time. These are essentially "backups". An AMI can be created from any snapshot, serving as a "system restore".
It is possible to launch a cluster of GPU servers connected by high-speed Ethernet. The usual way to do this is to use MPI to communicate across the cluster and CUDA on each node. See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html for details.
For Android, use, e.g., ConnectBot or JuiceSSH. Both can use public key authentication.
In addition to the web-based management console, Amazon also makes a set of command line tools available (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/SettingUp_CommandLine.html). These tools will make life easier for heavy users; for occasional use, the web console is simpler. A list of commands is at http://docs.aws.amazon.com/AWSEC2/latest/CommandLineReference/command-reference.html.
Samples:
ec2-run-instances ami-cf3758a6 -t cg1.4xlarge -k <key> -g <security-group>
ec2-request-spot-instances ami-cf3758a6 -t cg1.4xlarge -k <key> -g <security-group> -p <price>
ec2-describe-instances
ec2-describe-spot-instance-requests
ec2-stop-instances <instance-id>
ec2-terminate-instances <instance-id>
The output of some of these commands can be "challenging" to read. There is a Perl interface that could be useful (Net::Amazon::EC2). Alternatively, one could write a simple Perl script to parse the output and print it in a readable format with little effort. One could also use Python (or Ruby or whatever).
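For quick use, a few lines of awk in a shell function may be enough (this swaps in awk for the Perl or Python script suggested above). The sketch below assumes the output is tab-separated, with INSTANCE records carrying the instance id in field 2, the public DNS name in field 4, the state in field 6 and the instance type in field 10. Those positions are an assumption on my part, so check them against the output of your version of the tools.

```shell
#!/bin/sh
# Sketch: reduce ec2-describe-instances output to one line per instance.
# Field positions (2 = id, 4 = public DNS, 6 = state, 10 = type) are
# assumptions; verify them against your tool version.
format_instances() {
    awk -F '\t' '$1 == "INSTANCE" {
        dns = ($4 == "") ? "-" : $4      # no DNS name unless running
        printf "%-12s %-10s %-12s %s\n", $2, $6, $10, dns
    }'
}

# Usage:
#   ec2-describe-instances | format_instances
```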