Cambridge-Cranfield HPCF: Programming and Compilation on Hartree
The nodes visible from outside the HPCF are hartree.hpcf.cam.ac.uk and hartree-2.hpcf.cam.ac.uk. Both support ssh. These are hartree_a and hartree_e respectively and are I/O nodes. The compute nodes are hartree_b to hartree_j, skipping e. Compilation and job submission can be done interactively on either I/O node.
As code is compiled on hartree_a, which has power3 processors, but run on the compute nodes, which have power4 processors, some care is needed. The default options tune for power4 but still allow the code to run on power3 for debugging.
The compilers mpxlf, mpxlf90 and mpxlc handle MPI for Fortran, Fortran 90 and C respectively. There is no need for -lmpi: the MPI library is linked automatically.
The current MPI release supports 64-bit programs only with the thread-safe compilers, e.g. mpxlf_r for Fortran.
mpxlf90 will not compile Fortran 90 source files whose names end in `.f90'; for these use `mpxlf90 -qsuffix=f=f90'.
For BLAS and related routines, add -lessl.
For LAPACK, add -llapack.
(This is a version I compiled myself. I used the Call Conversion Interface, which means that the parts of LAPACK provided by the ESSL library are used in preference to the standard Fortran ones. It is, of course, linked against ESSL, so you have to link with -lessl as well.)
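When you switch to -lessl it can help to have a plain reference to check results against. The sketch below, in C, spells out the operation the BLAS routine DGEMM performs for untransposed matrices stored column-major (Fortran order): C = alpha*A*B + beta*C. The name ref_dgemm is hypothetical and not part of ESSL; in real code you would call the ESSL routine and use something like this only for testing.

```c
/* Hypothetical reference version of the DGEMM operation (no transposes):
 *   C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n, all column-major as in Fortran,
 * with leading dimensions lda, ldb, ldc. Deliberately unoptimised. */
void ref_dgemm(int m, int n, int k, double alpha,
               const double *a, int lda,
               const double *b, int ldb,
               double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (int p = 0; p < k; p++)
                s += a[i + p * lda] * b[p + j * ldb];
            /* As in BLAS, beta == 0 means C is not read on input. */
            if (beta == 0.0)
                c[i + j * ldc] = alpha * s;
            else
                c[i + j * ldc] = alpha * s + beta * c[i + j * ldc];
        }
    }
}
```

For example, with A = [1 2; 3 4] stored as {1, 3, 2, 4} and B = [5 6; 7 8] stored as {5, 7, 6, 8}, a call with alpha = 1 and beta = 0 leaves C = {19, 43, 22, 50}, i.e. [19 22; 43 50].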
You might want to try the MASS library as an alternative to the standard maths library. Its maths functions can be about twice as fast as those in the standard library, and the vectorised version, which requires you to alter your code, can yield up to a sixfold speed-up.
-lmass      scalar version; no code changes needed
-lmassp3v   vector version; needs code changes
"In some cases MASS is not as accurate as the system library"
"Compared to the standard mathematical library, libm.a, the MASS library
can only differ in the last bit"
For further details of performance and accuracy, look here.
The default compiler options on hartree are -O3 -qtune=pwr4.
To get any reasonable optimisation, specify at least -O3.
-O4 or -O5 can improve performance further. They set -qessl -qhot -qipa -qarch=auto -qtune=auto -qcache=auto. The last of these options tune for the machine you are compiling on (the power3 I/O node), so if you use -O4 or higher you need to set -qtune=pwr4 -qarch=pwr4 as well, placed after the -O option, for example: mpxlf -O4 -qarch=pwr4 -qtune=pwr4 prog.f
Code built with -qarch=pwr4 may no longer run on the power3 nodes.
Other options that might help include -qlargepage.
Use -qstrict if you are worried about non-bitwise identical results.
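Higher optimisation levels may regroup floating-point operations, and because floating-point addition is not associative this can change results; -qstrict is intended to keep the evaluation order you wrote. A small C illustration: sum_left and sum_right are hypothetical names for two groupings of the same three numbers that give different answers in IEEE double precision.

```c
/* Two groupings of the same three-term sum. In exact arithmetic they
 * are equal; in IEEE double precision they need not be. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e16, b = -1e16 and c = 1.0, sum_left gives 1.0, but sum_right gives 0.0: near 1e16 adjacent doubles are 2 apart, so -1e16 + 1.0 rounds back to -1e16 and the 1.0 is lost. An optimiser that is free to regroup the sum can turn one answer into the other.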
Consider using -qipa and -Q to perform interprocedural optimisation and more procedure inlining (the former can increase compile time dramatically).
Consider using -qfloat=hsflt for faster single-precision expressions. hsflt is the highest-performing option, but it is unsafe because exponent overflow can go undetected. The highest-performing safe option is -qfloat=hssngl.
However, there is a potentially important compiler optimisation (reciprocal multiply) that is enabled only if -qfloat=hsflt is specified.
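In its usual form, reciprocal multiply replaces repeated division by the same divisor with one division to form the reciprocal followed by a multiplication per element. The C sketch below writes both forms out by hand so the effect on results is visible; scale_div, scale_recip and max_rel_diff are hypothetical names, and this shows only the arithmetic, not how the XL compiler implements the transformation. The multiplied form rounds twice, so each result can differ from the correctly rounded quotient in the last bit or so.

```c
/* Straightforward form: one division per element. */
void scale_div(double *y, const double *x, int n, double d)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] / d;
}

/* Reciprocal-multiply form: a single division, then a multiply per
 * element. Faster on most hardware, but rounds twice. */
void scale_recip(double *y, const double *x, int n, double d)
{
    const double r = 1.0 / d;
    for (int i = 0; i < n; i++)
        y[i] = x[i] * r;
}

/* Largest relative difference |a[i] - b[i]| / |a[i]| over the arrays,
 * skipping elements where a[i] is zero. */
double max_rel_diff(const double *a, const double *b, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = a[i] - b[i];
        double mag = a[i];
        if (diff < 0) diff = -diff;
        if (mag < 0) mag = -mag;
        if (mag > 0 && diff / mag > worst)
            worst = diff / mag;
    }
    return worst;
}
```

Comparing scale_div and scale_recip on the same data with max_rel_diff typically shows differences of a few parts in 10^16 for double precision: small, but not bitwise identical.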
Finally, if it is worth the effort, compile with -qpdf1 in addition to your other options and run the program on a variety of typical data sets. Then recompile with -qpdf2 instead of -qpdf1. This process optimises your code based on how it actually runs.
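Profile-directed feedback pays off most where behaviour depends on the data, such as a branch whose "usual" direction the compiler cannot know in advance. The C function below is a hypothetical hot spot of that kind, and the comment shows one way the two-stage build might look; the file names and data sets are placeholders, and it assumes you keep the other options identical in both compiles.

```c
/* Hypothetical two-stage build (names are placeholders):
 *   mpxlc -O3 -qpdf1 prog.c -o prog    instrumented build
 *   ./prog < typical1.dat              training runs on typical data
 *   ./prog < typical2.dat
 *   mpxlc -O3 -qpdf2 prog.c -o prog    rebuild using the profile
 *
 * Counts the values below a threshold. Which branch is taken more often
 * depends entirely on the input, which is what the profile records. */
long count_below(const int *v, int n, int threshold)
{
    long below = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < threshold)
            below++;
    }
    return below;
}
```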
By default the compiler works in 32-bit mode. To use more than 256 MB of memory you must link with -bmaxdata:0xY0000000, where Y is the number of 256 MB segments you want (8 to use the maximum available).
The same applies to -bmaxstack:0xY0000000, except that its default is 512 MB.
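The value after -bmaxdata is just the segment count times the segment size: 256 MB is 0x10000000 bytes, so 0xY0000000 is Y * 0x10000000. The C helpers below are a hypothetical sketch of that arithmetic, including choosing the smallest Y that covers a given memory requirement under the 8-segment limit stated above.

```c
/* One 32-bit AIX segment: 256 MB = 0x10000000 bytes. */
#define SEGMENT_BYTES 0x10000000UL

/* Value to use after -bmaxdata: (or -bmaxstack:) for the given number
 * of 256 MB segments, i.e. 0xY0000000 with Y = segments. */
unsigned long segment_flag_value(unsigned segments)
{
    return (unsigned long)segments * SEGMENT_BYTES;
}

/* Smallest number of 256 MB segments covering `megabytes` of data,
 * or -1 if more than the maximum of 8 segments would be needed. */
int segments_needed(unsigned long megabytes)
{
    unsigned long segs = (megabytes + 255) / 256;  /* round up */
    return segs > 8 ? -1 : (int)segs;
}
```

For example, a program needing about 1000 MB of data needs 4 segments, so printf("-bmaxdata:0x%lX\n", segment_flag_value(4)) prints -bmaxdata:0x40000000. The 512 MB -bmaxstack default corresponds to 2 segments, 0x20000000.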
Online manuals: http://www.hpcf.cam.ac.uk/manuals/ibm/
The man command!
Email support: hpcf-support@ucs.cam.ac.uk