About the Whales

The Whales are two Sun machines running Ubuntu Linux at the University of Geneva, managed by the Centre Universitaire d'Informatique (CUI). One is called minke and the other humpback. The names were chosen by Stephane Marchand-Maillet.

Each machine contains:

  • 32 CPUs (8 x Quad-Core AMD Opteron™ Processor 8356)
  • 64 GB shared main memory
  • Shared filesystem: NFS

The activity of the cluster can be monitored with Hobbit; it is not available on Ganglia. For questions or problems, contact the system administrator, Nicolas Mayencourt:

Phone: +41 (0)22 379 0198

If any information on this page is incorrect or outdated, contact the system administrator or try to find someone who knows what to do. And update the webpage while you're at it…

Getting an account

You should be able to log in with your regular UNIGE account. If not, send an email to Nicolas Mayencourt.

Usage

All users should log in on either of the two machines and run their jobs through the batch system. Do not launch your programs directly from the command line! See the explanations below.

Software

Matlab and MPI are available “out of the box”.

Compilers

For GCC, the following compiler flag might be interesting:

-march=native Generate code exclusively for the type of processor in the computation nodes, including vectorization.

If you are using 'make', try this command line option:

-j 16 Run up to 16 processes at the same time.
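
A minimal sketch of how the two options above might be combined in one build command. It assumes that the project's Makefile honours a CFLAGS variable passed on the command line; the -O2 flag is an addition for illustration and not a site recommendation. The helper only prints the command, so you can check it before running it.

```shell
#!/bin/sh
# Sketch only: assumes the Makefile uses $(CFLAGS) when compiling.
CFLAGS="-O2 -march=native"   # -O2 is an illustrative addition
MAKE_JOBS=16                 # value suggested in the text above

# Print the build command instead of running it.
build_command() {
    printf 'make -j %s CFLAGS="%s"\n' "$MAKE_JOBS" "$CFLAGS"
}

build_command
```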

Running Batch Jobs

Batch jobs must be submitted to TORQUE to be run. TORQUE is a derivative of the Portable Batch System (OpenPBS), a system designed to run batch jobs from queues. The scheduler and the queues implement policies that allow coordinated use of resources. Please do not bypass the batch system. If there is something you would like to do but you do not know how to do it using the batch system, ask for help. Bypassing the batch system will only annoy your colleagues.

Working with the batch system TORQUE

To run a non-interactive job, a script is needed. The job then has to be submitted to the batch system using the command qsub. The standard output and standard error produced by the script will be put in files upon completion of the job. For interactive jobs, see the corresponding example below.
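
As a rough illustration of where that output ends up: by default, TORQUE writes the files to the directory from which qsub was called and names them after the job name (which defaults to the script name) and the numeric part of the job id. The helper `output_files' and the job id `1234.minke' below are hypothetical and only show the naming pattern.

```shell
#!/bin/sh
# Print the default output file names for a job.
# $1 = job name (defaults to the script name), $2 = job id as printed by qsub
output_files() {
    seq=${2%%.*}       # "1234.minke" -> "1234"
    echo "$1.o$seq"    # standard output
    echo "$1.e$seq"    # standard error
}

output_files job_mpi.sh 1234.minke
```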

Matlab

Matlab uses multi-threading and, unless you tell it otherwise, nothing prevents it from using more processors than you asked for. This overloads the machines and slows them down to 10% of their speed or so. So please tell Matlab to use only as many threads per job as you have reserved with the queue system, using the command maxNumCompThreads(N). Even if you reserve several CPUs in the queue system for one Matlab process and enable as many threads in Matlab, Matlab will use these threads only a small portion of the time. To use the machines most efficiently, use only one thread per process.

Graphical Interface

To use Matlab's graphical environment, log in with `ssh -X minke.unige.ch' or `ssh -Y minke.unige.ch'. Start an interactive session using qsub -I <parameters>; see below for an explanation of the parameters. Launch `matlab' normally from the interactive session's shell. To avoid using the graphical interface, start Matlab with `matlab -nodesktop'.

Matlab Batch Job

To run Matlab as a batch job, you need a script to launch it:

#!/bin/sh
 
matlab -nodesktop < my_matlabscript.m > my_output.txt

In the rest of the text we suppose that the above script is called `job_matlab.sh'. You need to make your script executable using the command 'chmod 755 job_matlab.sh', or it cannot run!

Writing Serial (normal) Matlab Scripts

Matlab is a multithreaded program and will use up to 32 threads unless you tell it otherwise. In your Matlab script, you have to tell Matlab how many threads it can use.

% Tell matlab to use 2 threads maximum
maxNumCompThreads(2); 
 
%% Generate Data
a=zeros(1,10);
b=ones(1,10);
c=b;
 
for i=1:10
        a(i) = b(i) + c(i);
end
a      % display output
quit   % close gracefully

Let's call this script 'my_matlabscript.m'. See below for how to submit it to the queue system. To submit this script, you'll need to ask for two CPUs.
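
One way to keep the two numbers in step is to read the thread limit from the Matlab script and build the qsub request from it. The sketch below does that with a hypothetical helper `matching_qsub'; the walltime of 10 minutes is a placeholder, and the helper only prints the command rather than submitting anything.

```shell
#!/bin/sh
# Read the thread limit from a Matlab script and print a matching qsub command.
# $1 = Matlab script, $2 = job script. The walltime is a placeholder.
matching_qsub() {
    threads=$(sed -n 's/^[[:space:]]*maxNumCompThreads(\([0-9][0-9]*\)).*/\1/p' "$1" | head -n 1)
    if [ -z "$threads" ]; then
        echo "no maxNumCompThreads(N) call found in $1" >&2
        return 1
    fi
    echo "qsub -l nodes=1:ppn=$threads,walltime=00:10:00 $2"
}

# Demonstration with a throwaway two-line Matlab script.
demo=$(mktemp)
printf 'maxNumCompThreads(2);\na=zeros(1,10);\n' > "$demo"
matching_qsub "$demo" job_matlab.sh
rm -f "$demo"
```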

Writing Parallel Matlab Scripts

You can use parallel programming in Matlab. Instead of having multiple threads, you can ask Matlab to create a number of processes. Here is an example that uses 2 workers, so you would need to ask the queue system for two processors:

%% Only for master process, workers use only one thread. Should not exceed nr of requested workers
maxNumCompThreads(2); 
 
%% Generate Data
a=zeros(1,10);
b=ones(1,10);
c=b;
 
%% Parallel Computation using 2 single-threaded workers
matlabpool local 2 % start a pool with 2 workers
parfor i=1:10
        a(i) = b(i) + c(i); % parallel computation here
end
matlabpool close % close the worker matlab processes in the pool
a      % display output
quit   % close gracefully

Let's call this script 'my_matlabscript_par.m'. See below for how to submit it to the queue system.

Matlab can apparently handle up to 8 workers.

Multi-thread

For OpenMP programs, a bit of magic is needed to make sure your program uses the correct number of processors. The example below shows the magic. This way, the number of CPUs used by default by OpenMP matches the number of processors-per-node (ppn) in your qsub command!

#!/bin/bash
# keep line above to ensure that we have a BASH shell
 
# magic line to limit CPU usage for OpenMP programs:
# the node file contains one line per reserved processor
export OMP_NUM_THREADS=$(wc -l < "${PBS_NODEFILE}")
 
# call your OpenMP program here
myprogram_threaded param1 param2

In the rest of the text we suppose that the above script is called `job_threads.sh'.
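
To see what the magic line computes without submitting a job, the sketch below fakes the node file that TORQUE provides for a request of `-l nodes=1:ppn=4'. TORQUE lists the host name once per reserved processor, so counting lines gives the number of threads. The temporary file stands in for the real ${PBS_NODEFILE} and is an assumption for demonstration only.

```shell
#!/bin/sh
# Simulate the node file TORQUE would provide for -l nodes=1:ppn=4.
PBS_NODEFILE=$(mktemp)
printf 'minke\nminke\nminke\nminke\n' > "$PBS_NODEFILE"

# Same idea as the magic line: one line per reserved processor.
OMP_NUM_THREADS=$(wc -l < "$PBS_NODEFILE")
OMP_NUM_THREADS=$(( OMP_NUM_THREADS + 0 ))   # drop padding some wc versions add
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

rm -f "$PBS_NODEFILE"
```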

MPI

#!/bin/sh
 
cd /program/working/directory
mpiexec myprogram param1

In the rest of the text we suppose that the above script is called `job_mpi.sh'.

Another example:

#!/bin/sh
#PBS -l nodes=2:ppn=20
#PBS -l walltime=00:01:00
 
cd /program/working/directory
mpiexec myprogram input.txt

This script contains directives for TORQUE, so they don't need to be specified on the command line when submitting a job. In the rest of the text we suppose that the above script is called `job_pbs.sh'.
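
A few more directives are often useful in such a script. The sketch below writes a hypothetical job script that names the job with -N, merges standard error into standard output with -j oe, and changes to the directory from which qsub was called via $PBS_O_WORKDIR. The job name `whale_test', the file name `job_named.sh' and the resource values are made up for illustration.

```shell
#!/bin/sh
# Write a hypothetical job script with TORQUE directives to a file.
write_job_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N whale_test
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00
#PBS -j oe

# Start in the directory from which qsub was called.
cd "$PBS_O_WORKDIR" || exit 1
mpiexec myprogram input.txt
EOF
    chmod 755 "$1"
}

job_file=${TMPDIR:-/tmp}/job_named.sh
write_job_script "$job_file"
grep '^#PBS' "$job_file"   # list the directives qsub will read
```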

Submitting a job

To run the program, one must submit a job to TORQUE with the command 'qsub'. However, several parameters must be taken into account: the number of required nodes, the running time, the required memory and the job queue. If your job exceeds its specified runtime or requested memory, it will be killed!
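
The resource request passed to `qsub -l' is a comma-separated list. As a sketch, the hypothetical helper `resource_list' below assembles it from the number of nodes, the processors per node, the runtime and an optional memory limit, and prints the resulting command instead of running it.

```shell
#!/bin/sh
# Compose the argument for qsub -l from its parts.
# $1 = nodes, $2 = processors per node, $3 = walltime, $4 = vmem (optional)
resource_list() {
    list="nodes=$1:ppn=$2,walltime=$3"
    if [ -n "${4:-}" ]; then
        list="$list,vmem=$4"
    fi
    echo "$list"
}

echo "qsub -l $(resource_list 1 8 00:01:00 4gb) job_mpi.sh"
```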

Example 1 - Submit a job

qsub -l nodes=1:ppn=8,walltime=00:01:00,vmem=4gb job_mpi.sh

This will submit the job in script `job_mpi.sh' to run on 8 processors with a runtime of up to 1 minute, using up to 4 GB of memory.

Example 2 - Using threads: match the parameters in your script!

qsub -l nodes=1:ppn=4,walltime=00:01:00 job_threads.sh

Submits `job_threads.sh' using 4 processors. The magic line in job_threads.sh will deduce that 4 threads are to be used. No vmem is specified, so the default of 8 GB of memory is requested.

Example 3 - Using directives in the script

qsub job_pbs.sh

Submits `job_pbs.sh', using 20 processors on each of the two machines and running for up to 1 minute, as specified by the TORQUE directives in the script.

Example 4 - Overriding directives in the script

qsub -l nodes=1:ppn=32 job_pbs.sh

Submits `job_pbs.sh', running for up to 1 minute as specified by the TORQUE directive in the script. The job uses 1 node and 32 processors rather than 2 nodes with 20 processors each, because the command line parameter overrides the TORQUE directive in the script.

Example 5 - Using specific nodes

qsub -l nodes=humpback,walltime=1:00 job_mpi.sh

Submits `job_mpi.sh' to the default queue with the runtime limit given on the command line, and specifically demands that the job be run on humpback.

Example 6 - Start an interactive job

If you need to provide manual input, then you do not need a script. You need to ask for an interactive session using the option -I, which will give you a command line for the requested period.

qsub -I -l nodes=1:ppn=16,walltime=1:00

How to determine the right parameters

Performance of the cluster is directly influenced by the performance of your programs. Besides that, the ability of the scheduler to realize high throughput and low return times is largely influenced by the accuracy of specified runtimes. The scalability and efficiency of your jobs affect not only you but also the other users of the cluster.

Gather information

You can use the test queue to gain insight into the performance and scalability of your program. This should give you the information you need to determine the `right' number of nodes for a job and to estimate the corresponding runtime.

It is possible to observe the behaviour of your program during runtime. Use `qstat -n' to find out on which nodes it is running. Then, use 'top' to observe your program. If it appears that it is using less memory or CPU than you requested, you could launch it with fewer resources.

Number of nodes

In theory, increasing the number of nodes should make your job run faster and decrease the computation time. However, there are numerous reasons why increasing the number of nodes might not be a good idea. Hence, the `right' number of nodes is not necessarily `as many as possible'.

For parallel jobs, it is possible to use n tasks per node, i.e. `-l nodes=1:ppn=n' where n is the number of desired processors. The `right' number of nodes is the smallest number that offers a noticeable advantage over using fewer nodes. If you have several jobs, it is more efficient to launch them at the same time on a few nodes each than sequentially on many nodes, because scalability is generally less than linear.

The following is a list of reasons to limit the number of nodes you want to use. Some can be explained by the fact that scalability is generally less than linear.

  • The machines are a shared resource. Consider the needs of your colleagues and respect the rules described for submitting jobs.
  • Even if your program scales well, there might be a number of nodes where computation is no longer a bottleneck. If communication or I/O becomes a bottleneck, increasing the nodes any further will generally not reduce the runtime.
  • Increasing the number of nodes will make it harder to schedule your job in the near future.
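
To judge whether more processors are worth requesting, it can help to compute the speedup (runtime on one processor divided by runtime on N) and the parallel efficiency (speedup divided by N) from test runs. The sketch below does this with awk; the helper name `scaling' and the timings are made up for illustration.

```shell
#!/bin/sh
# Speedup and parallel efficiency from two measured runtimes.
# $1 = runtime on 1 processor (s), $2 = runtime on N processors (s), $3 = N
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        speedup = t1 / tn
        printf "speedup %.2f, efficiency %.2f\n", speedup, speedup / n
    }'
}

scaling 120 20 8    # made-up timings: 120 s on 1 processor, 20 s on 8
```

With these numbers the job runs 6 times faster on 8 processors, an efficiency of 0.75. An efficiency that drops sharply as N grows suggests that communication or I/O has become the bottleneck.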

Runtime

The runtime of a job is an important parameter. Although each queue has a limit and a default to the runtimes of the jobs in the queue, you should consider specifying a value. If you do not specify a runtime, the queue will use the default runtime for your job with possibly disastrous results.

You should minimize the specified runtime without making it too short. If your job exceeds its specified runtime, it will be killed! However, you will be rewarded for having short runtimes as the scheduler might be able to start your job earlier. If you do not specify a runtime, check whether the default is sufficient. Remember to add some extra time to your estimate.
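
As a sketch of adding that extra time, the hypothetical helper below takes an estimated runtime in seconds and a safety margin in percent, and prints a value in the HH:MM:SS form used for walltime. The 20% margin is only an example; choose one that matches how much your runtimes vary.

```shell
#!/bin/sh
# Turn a runtime estimate into a walltime string with a safety margin.
# $1 = estimated runtime in seconds, $2 = margin in percent
walltime_with_margin() {
    total=$(( $1 * (100 + $2) / 100 ))
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

walltime_with_margin 3000 20   # 50 minutes estimated, plus 20%
```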

To reduce your runtime, try out different compilers and compiler options to create faster programs and write your programs with performance and scalability in mind.

Memory

The memory usage of a job is important as well. By default the system will give you 8 GB of memory, but you should consider specifying a smaller value if your job uses less memory. With 128 GB combined in minke and humpback, only 16 jobs can run at the same time if each uses 8 GB of memory.

You should minimize the specified memory usage without making it too small. If your job exceeds its specified memory usage, it will be killed! However, you will be rewarded for having a smaller memory footprint as more jobs can be run at the same time. Remember to add some extra to your estimated memory footprint.
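
The arithmetic behind this is simple integer division, sketched below with a hypothetical helper. The 2 GB request is an example value for a job submitted with `vmem=2gb'.

```shell
#!/bin/sh
# How many jobs fit in memory at the same time?
# $1 = total memory in GB, $2 = memory requested per job in GB
jobs_by_memory() {
    echo $(( $1 / $2 ))
}

jobs_by_memory 128 8   # default request of 8 GB per job
jobs_by_memory 128 2   # example: 2 GB per job
```

With the default request, memory allows 16 jobs at once. With 2 GB per job it allows 64, which matches the 64 CPUs of the two machines combined.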

Hints

  • If you have big jobs, you are advised to write your program in such a way that it can pick up where it left off in case of a crash or hardware failure.
  • Temporary space is available locally on each node in /tmp/. You can create a personal folder for your temporary files, but don't forget to clean up this space after use.
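
Both hints can be sketched in a few lines of shell. The example below creates a personal scratch directory under /tmp that is removed when the script exits, and runs a loop that records each completed step in a checkpoint file so that a second run resumes where the first one stopped. The function name, the number of steps and the simulated crash are hypothetical. For a real job, keep the checkpoint file somewhere that survives the job, for example in your home directory, not in the scratch directory used here for demonstration.

```shell
#!/bin/sh
# Run steps 1..N, recording progress so that a later run can resume.
# $1 = checkpoint file, $2 = last step, $3 = step after which to fail (optional, demo only)
run_steps() {
    checkpoint=$1
    last=$2
    stop_after=${3:-0}
    step=1
    # Resume after the last completed step, if a checkpoint exists.
    if [ -f "$checkpoint" ]; then
        step=$(( $(cat "$checkpoint") + 1 ))
    fi
    while [ "$step" -le "$last" ]; do
        echo "running step $step"        # replace with the real work
        echo "$step" > "$checkpoint"     # record progress once the step is done
        if [ "$step" -eq "$stop_after" ]; then
            return 1                     # simulate a crash
        fi
        step=$(( step + 1 ))
    done
    return 0
}

# Personal scratch directory under /tmp, removed when the script exits.
scratch=$(mktemp -d "/tmp/${USER:-user}_job.XXXXXX")
trap 'rm -rf "$scratch"' EXIT

run_steps "$scratch/checkpoint" 5 3 || echo "interrupted after step $(cat "$scratch/checkpoint")"
run_steps "$scratch/checkpoint" 5        # resumes at step 4
```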

Useful commands

These commands can be used on either machine.

qstat                  Show the status of the jobs in the queues, such as which jobs are running or queued, who owns them and the corresponding job-ids.
qstat -a               Show more information on the status of the jobs in the queues.
watch qstat -a         Like `qstat -a', but frequently updated.
qstat -an              Show nodes assigned to jobs in addition to `qstat -a'.
qstat -f <job-id>      Show detailed information on the specified job.
qstat -Q               Display information about the configuration of the queues.
qdel <job-id>          Remove the job from the queue. If the job is running, it will be killed.
qalter -l … <job-id>   Alter the resource requested for a queued job.
xpbs                   Graphical user interface for PBS commands.
xpbsmon                Graphical user interface to monitor the PBS nodes.
showstart <job-id>     Let MAUI show estimated time to start of a job.
showq                  Let MAUI show queue status.
diagnose               Diagnose various problems with queues, the scheduling of jobs, et cetera in MAUI.
checkjob <job-id>      Let MAUI display information about a job.
showbf                 Let MAUI show the backfill window.
showres                Let MAUI show the reservations.

Further notes

Other Computing resources: myrinet and poulailler.