The aim of this session is to get you up and running on the NCI Raijin system and to give you an introduction to MPI.
To do this session, you need to have obtained an NCI login ID and password by registering for an account (well) beforehand. See the post on Piazza.
The Raijin system is supported by the National Computational Infrastructure program. Staff at Australian Universities are allocated time on this system through a competitive process for use in their research projects. We are extremely fortunate to have been given access to this system for this course. Please use the machine with respect. Note that it is NOT administered by the CS Technical Support Group.
There is comprehensive documentation for the NCI Raijin system available here. You should familiarize yourself with the content. It will be referenced in what follows.
Log on to the Raijin system using your user ID:
ssh raijin.nci.org.au -l <username>
Each user has a file space quota. CPU time is limited collectively over the entire group, which means one user can exhaust the time allocated to everyone. Please monitor your usage of this machine.
Read the section of the userguide labelled Accounting. Execute the following commands on Raijin:
A tar file containing all the programs for this session is available here. Save
this tar file on your local desktop and then transfer it to the Raijin
system. From a terminal window on your desktop, in the
directory where you have saved the prac1.tar file, execute the scp command:
scp prac1.tar <username>@raijin.nci.org.au:~
then in a terminal window that is logged on to Raijin, untar the file:
tar -xvf prac1.tar
Raijin uses modules to provide different user environments. This allows, for instance, users to access old versions of libraries or compilers. Take a quick look at the Environment Modules section of the user manual.
Run
module avail
to see what modules are available,
module list
to see what modules you are using, and
module load openmpi
to add MPI. What version of OpenMPI are you using by default? Add the module load command to your ~/.profile for later.
You can both load and unload a module. For this prac we will run with the default environment.
UNIX editors are installed, including nano, vim and
emacs. For graphical editing, you may choose to forward X Windows to
your desktop, i.e.
ssh -X raijin.nci.org.au -l <username>
which will allow you to run emacs as a graphical
editor. Alternatively, kate will allow you to edit files over Secure
FTP from your desktop.
This program is just to get you started. Note that there are three basic requirements for all MPI codes:
#include "mpi.h"
MPI_Init(&argc, &argv);
MPI_Finalize();
You can find the header file in
/apps/openmpi/1.6.3/include/mpi.h. (Do you know what
version of OpenMPI you are using now?) Take a look at it. It provides
the definition of
MPI_COMM_WORLD in a complicated
fashion involving a global structure that is initialized in another function
in the library (it used to be easier!).
MPI_Init() and MPI_Finalize() should be the
first and last executable statements in your code -- essentially
because it is not clear what happens before or after calls to these
functions.
man MPI_Init says:
The MPI Standard does not say what a program can do before an MPI_Init or after an MPI_Finalize. In the Open MPI implementation, it should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input, or writing to standard output.
If you want to know what an MPI function does you can:
man MPI_<function>
(assuming the openmpi module has been loaded first).
Note that at the moment we are only interested in MPI1.
Compile the code:
mpicc -c mpiexample1.c
mpicc -o mpiexample1 mpiexample1.o
This will result in the executable mpiexample1.
mpicc is a wrapper that will end up calling a standard
C compiler (in this case gcc). Do
mpicc -v mpiexample1.c
to see all the details. mpicc also ensures that the
program links with the MPI library.
Run the code interactively by typing
./mpiexample1
You should find the executable runs using just one process. With some MPI implementations the code will fail because you have not defined the number of processes to be used. Using OpenMPI this is done using the command mpirun.
Try running the code interactively again, but this time with
mpirun -np 2 ./mpiexample1
mpirun -np 6 ./mpiexample1
Then try -np 20; it will fail - why? What is the
maximum number of MPI processes you can create interactively?
If you run this program enough times you may see that the order in which the output appears changes. Output to stdout is line buffered, but beyond that can appear in any order.
mpirun has a host of different options. Do
man mpirun for more information. The
-np option specifies the number of processes that
you wish to spawn.
So far we have only been running our code on one of the Raijin
nodes. In total Raijin has 3592 nodes (and 57,472 cores). Six of
these are reserved for interactive logins; the remaining nodes are
only available via a batch queuing system. (Which of the six
interactive nodes are you logged on to? Run the
hostname command if unsure.)
Go back to the userguide and read the section entitled
Submission and Scheduling including subsections of
Queue Structure, PBSPro Scheduling and PBSPro Basics.
Now we will run the same job, but using the PBS batch queuing
system. To submit a job to the queuing system we have to write a batch
script. An example of this is given in
the file batch_job. Take a look at this. Lines starting with
#PBS are commands to the queuing system, informing it of
what resources you require and how your job should be executed. We
use one of these lines to set the number of processors you want to
use. Very important is the line that limits the walltime:
#PBS -l walltime=00:00:10
Please ensure you limit walltime
similarly for any batch job that you use.
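A batch script along these lines might look as follows. This is a hypothetical sketch, not a copy of the batch_job file provided; the ncpus value is an example, and only the walltime directive is taken from the text above.

```shell
#!/bin/bash
# Hypothetical PBS batch script (sketch only; see the provided batch_job file).
#PBS -l walltime=00:00:10
#PBS -l ncpus=4

cd $PBS_O_WORKDIR      # run from the directory where qsub was issued
module load openmpi    # same module used interactively
mpirun ./mpiexample1   # mpirun takes the process count from the PBS allocation
```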
After all this setup information you run the job by issuing the
mpirun command, but taking the number of processes from the
number of processors allocated by the queuing system.
To submit your job to the queuing system, run
qsub batch_job
It will respond with something like
9485588.r-man2
where 9485588.r-man2 is the id of the job in the
queuing system. To see what is happening on the batch queue, run
aaa444@raijin:~/prac1> qstat
Job id            Name              User              Time Use S Queue
----------------  ----------------  ----------------  -------- - -----
9266041.r-man2    arg.62A           ahf564            285:28:5 R normal-node
  ---lots of jobs---
9485672.r-man2    batch_job         aaa444                   0 Q express-node
This gives a long list of jobs. In the above the top job is running as indicated by the R in the S column, while my job is queued as indicated by the Q.
Now compare the result of running
To track the progress of only your job, try
qstat <jobid>
To track all of your current jobs, use
qstat -u $USER.
To delete a job from the queue, run
qdel <jobid>
When your job completes, the combined standard output and error will be put in a file, in this case named batch_job.o9485672. Inspect this file.
Make sure you are happy with the above since you will need to use the batch system later.
Modify the code in mpiexample1.c to also print out the name of the node each process is executing on. Do this by using the system call:
Run mpiexample1 interactively. What nodes of the cluster are being used?
Throughout the course we will be measuring the elapsed time taken to run our parallel jobs. So we start by assessing how good our various timing functions are.
MPI provides a wall-clock timer, MPI_Wtime() (see man MPI_Wtime). Insert extra code to test the resolution of this routine. What do you estimate the resolution to be?
In mpiexample2.c, each process allocates an integer buffer of size BUFFLEN (= 128 integers). Each buffer is initialized to the rank of the process. Process 0 sends its buffer to process 1 and vice versa, i.e. process 0 sends a message of zeros and receives a message of 1s, while process 1 does the opposite.
mpiexample3.c is a basic pingpong code. Run the code and make sure it works. After doing so several times, do you observe a potential problem for timing this operation?
| Message size (ints) | Time for pingpong within a node | Time for pingpong between two nodes |