Compiling on Maxwell
The PathScale compilers have a wide range of compiler options. On
the HPCF, the relevant commands have been surrounded by wrapper
scripts that provide a reasonable set of defaults
for the HPCF systems and some detection of potential pitfalls. This is
optional, and can be disabled entirely, but you are advised not to do
that unless you know what you are doing. In particular, you are
strongly advised not to do that just to make some
horrible autoconfigure script or Makefile "work". It is very likely
that such scripts have not been updated in years and are totally
inappropriate for the HPCF systems.
The following description serves two purposes. It is the specification
of the wrapper script system: how to control it and why it does what it
does. It is also a summary of the most important compiler and linker
options, with some suggestions of which ones are worth looking at
further and some indication of which ones not to use. Case is
significant, as is conventional under Unix.
Unsupported Facilities
While the gcc and g77 compilers are available, the PathScale compilers
in general give significantly better performance. The
commands that should be used are pathf90, pathcc, pathCC, mppathf90,
mppathcc and mppathCC. The mp... commands are for building MPI
programs.
General Environment Variables etc.
If the environment variable HPCF_MODE is set to yes,
the wrapper scripts will operate in HPCF mode. If it is unset or set to
no, they will do nothing and the commands will behave as
supplied by Pathscale. Other values will cause an error. It is set to
yes by default in the system login scripts. If it is unset or
set to no, none of the other environment variables will be
inspected, and the rest of this document is relevant only as a
description of Pathscale's compiler options. The operation of
HPCF_MODE relies on having /usr/local/bin first on
your search path; this is also set up by default.
If the environment variable HPCF_ARCH is unset or set to
64bit, the wrapper scripts will operate in 64 bit mode. If it
is 32bit, they will operate in 32 bit mode. Other values will
cause an error. It is set to 64bit by default. You are
strongly advised to work in 64 bit mode, even if integers in your
program are kept as 32 bits, for a large number of reasons. You cannot
link mixtures of 32 and 64 bit code, but this is not a major problem,
as mistakes are diagnosed by the linker.
The environment variable HPCF_VERBOSE can be set to
no, yes or all, and controls how many static
diagnostic options are set. It does not set any options that will slow
down execution significantly. An unset value is equivalent to
yes, and other values will cause an error. The default is
yes.
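The rules for these three variables can be summarised in a short Python sketch. This is not the wrapper script itself; the function name read_hpcf_settings and the exception class are hypothetical, and only the variable names, permitted values and defaults are taken from the description above.

```python
import os


class HpcfConfigError(ValueError):
    """Raised when an HPCF_* variable holds a value the rules do not allow."""


def read_hpcf_settings(env=None):
    """Return the effective settings, or None when HPCF mode is off.

    Hypothetical helper: it only mirrors the documented rules.
    """
    env = os.environ if env is None else env

    mode = env.get("HPCF_MODE", "no")  # unset behaves like "no"
    if mode not in ("yes", "no"):
        raise HpcfConfigError(f"HPCF_MODE={mode!r}: expected yes or no")
    if mode == "no":
        # Pass-through mode: the other variables are not inspected at all.
        return None

    arch = env.get("HPCF_ARCH", "64bit")  # unset behaves like "64bit"
    if arch not in ("64bit", "32bit"):
        raise HpcfConfigError(f"HPCF_ARCH={arch!r}: expected 64bit or 32bit")

    verbose = env.get("HPCF_VERBOSE", "yes")  # unset behaves like "yes"
    if verbose not in ("no", "yes", "all"):
        raise HpcfConfigError(f"HPCF_VERBOSE={verbose!r}: expected no, yes or all")

    return {"arch": arch, "verbose": verbose}
```

Calling read_hpcf_settings() with no argument inspects the real environment; passing a dictionary makes it easy to try out combinations without changing your shell.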
If the compilation command option -hpcf_dryrun is specified and
HPCF_MODE is set, the scripts will print out the expanded
commands that they would have called, and do nothing else. If
HPCF_MODE is not set, the option will cause a compilation error. This
may help with debugging, and in building scripts and make files for use
elsewhere, but be careful that you do not set an option that relies on
the configuration of the HPCF.
General Optimisation Features
If HPCF_MODE is yes, the compilation commands will
set the options:
- Fortran in 64-bit mode: -O2 -m64 -mcmodel=medium -WOPT:warn_uninit=on
  <arguments> -lacml -lstdc++
- Fortran in 32-bit mode: -O2 -m32 -WOPT:warn_uninit=on <arguments>
  -lacml -lstdc++
- C in 64-bit mode: -m64 -O2 -WOPT:warn_uninit=on -mcmodel=medium
  -I/opt/acml2.5.1/include <arguments> -lm -lacml -lpathfortran
  -lstdc++
- C in 32-bit mode: -m32 -O2 -WOPT:warn_uninit=on
  -I/opt/acml2.5.1/include <arguments> -lm -lacml
  -lpathfortran -lstdc++
- C++ in 64-bit mode: -m64 -O2 -mcmodel=medium -WOPT:warn_uninit=on
  -I/opt/acml2.5.1/include <arguments> -lm -lacml -lpathfortran
  -lstdc++
- C++ in 32-bit mode: -m32 -O2 -WOPT:warn_uninit=on
  -I/opt/acml2.5.1/include <arguments> -lm -lacml
  -lpathfortran -lstdc++
-O2 sets the optimisation level to keep compilation speed up
and the size of the compiled code down. Generally, levels higher than 2
should be used only after doing some analysis of where the program
is spending its time; see below for some pointers as to how to
proceed.
-m64 selects 64-bit mode.
-m32 selects 32-bit mode.
-mcmodel=medium allows the data segment to be larger than 2GB.
-WOPT:warn_uninit=on gives a warning of uninitialised variables. If you find the output unhelpful you can turn it off with -WOPT:warn_uninit=off.
-lacml selects AMD's mathematical library (ACML).
-I/opt/acml2.5.1/include includes the ACML headers for C and C++.
-lpathfortran is the required PathScale compiler run-time library.
-lm is purely for convenience, so users do not need to add it.
-lstdc++ is a library used by C++. It is included for all languages to allow cross compiling.
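As a rough illustration of how the defaults are wrapped around your own arguments, the following Python sketch rebuilds the option lists above. The function expand_command is hypothetical, and the order of the leading options is normalised, so it may differ from what the real wrappers emit; the -hpcf_dryrun option shows the actual expanded command.

```python
ACML_INCLUDE = "-I/opt/acml2.5.1/include"


def expand_command(language, arch, user_args):
    """Return the option list for one compile, following the table above.

    language: "fortran", "c" or "c++"; arch: "64bit" or "32bit".
    Hypothetical helper, not the wrapper's own code.
    """
    if language not in ("fortran", "c", "c++"):
        raise ValueError(f"unknown language {language!r}")
    if arch not in ("64bit", "32bit"):
        raise ValueError(f"unknown architecture {arch!r}")

    before = ["-O2", "-m64" if arch == "64bit" else "-m32"]
    if arch == "64bit":
        # Only the 64-bit lists include the medium memory model.
        before.append("-mcmodel=medium")
    before.append("-WOPT:warn_uninit=on")

    if language == "fortran":
        after = ["-lacml", "-lstdc++"]
    else:
        # C and C++ additionally get the ACML headers, libm and the
        # PathScale Fortran run-time library.
        before.append(ACML_INCLUDE)
        after = ["-lm", "-lacml", "-lpathfortran", "-lstdc++"]

    # User arguments sit between the leading options and the libraries.
    return before + list(user_args) + after


print(" ".join(["pathf90"] + expand_command("fortran", "64bit", ["prog.f90"])))
```

The point to notice is the placement: your own arguments come after the default options and before the libraries.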
For really serious optimisation, you are recommended
to try -O3, -O3 -OPT:Ofast or -Ofast, in order of increasing
aggressiveness. But you may well want to do this
only on the most critical parts of your code, and you will have to put
a fair amount of effort in to do this properly. There is also a
danger of a loss of numerical accuracy for some codes at these higher
levels of optimisation.
-OPT:Ofast is equivalent
to -OPT:roundoff=2:Olimit=0:div_split=on:alias=typed.
-OPT:roundoff=2 allows for fairly
extensive code transformations that may result in floating point
round-off or overflow differences in computations.
-OPT:Olimit=0 is a
generally safe option but may result in the compilation taking a long
time or consuming large quantities of memory. This option tells the
compiler to optimize the files being compiled at the specified levels
no matter how large they are.
-OPT:div_split=on allows the
conversion of x/y into x*(recip(y)) which may result in less accurate
floating point computations.
-OPT:alias=typed assumes that
the program has been coded in adherence with the ANSI/ISO C standard,
which states that two pointers of
different types cannot point to the same location in memory.
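To see why a transformation such as the one enabled by -OPT:div_split=on can change results, compare a direct division with multiplication by a reciprocal in ordinary double precision. Python is used here only to illustrate the arithmetic; it says nothing about the code the compiler actually generates, and the function names are hypothetical.

```python
def divide(x, y):
    """Direct division: a single correctly rounded operation."""
    return x / y


def divide_via_reciprocal(x, y):
    """The rewritten form x * (1/y): two operations, each rounded."""
    return x * (1.0 / y)


x = y = 49.0
direct = divide(x, y)
split = divide_via_reciprocal(x, y)

print("x / y       =", repr(direct))
print("x * (1 / y) =", repr(split))
print("identical?  ", direct == split)
```

For these inputs the direct division is exactly 1.0 while the reciprocal form is not, because 1/49 is rounded before the multiplication. Differences of this size are harmless in many codes, but they can matter in comparisons against exact values or in ill-conditioned calculations.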
-Ofast is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno.
-fno-math-errno bypasses the
setting of errno in math functions. This can result in a performance
improvement if the program does not rely on IEEE
exception handling to detect runtime floating point errors.
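If you want to apply the aggressive settings selectively, it helps to see the shorthand options written out in full so that individual parts can be dropped. The Python sketch below does that expansion. The function expand_shorthand is hypothetical, and it treats the -OPT group inside -Ofast as -OPT:Ofast, which is an assumption.

```python
# Sub-options that -OPT:Ofast stands for, as listed above.
OPT_OFAST_PARTS = ["roundoff=2", "Olimit=0", "div_split=on", "alias=typed"]

# Assumed expansion of -Ofast (the -OPT group is taken to be -OPT:Ofast).
OFAST_PARTS = ["-O3", "-ipa", "-OPT:Ofast", "-fno-math-errno"]


def expand_shorthand(options):
    """Replace -Ofast and -OPT:Ofast by the options they are equivalent to."""
    expanded = []
    for option in options:
        if option == "-Ofast":
            # -Ofast itself contains -OPT:Ofast, so expand recursively.
            expanded.extend(expand_shorthand(OFAST_PARTS))
        elif option == "-OPT:Ofast":
            expanded.append("-OPT:" + ":".join(OPT_OFAST_PARTS))
        else:
            expanded.append(option)
    return expanded


print(" ".join(expand_shorthand(["-Ofast", "solver.f90"])))
```

From the expanded form you can, for example, remove div_split=on for a file whose results are sensitive to rounding while keeping the other settings.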
Inter-Procedural Analysis (IPA) can also give significant performance
improvements and can be activated with -ipa. When you are using -ipa,
all the .o files have to have been compiled with -ipa, all libraries
have to have been compiled without -ipa, and linking must be done with
-ipa for your compilation to be successful. The total compile time can
be considerably longer with IPA than without.
Understanding your code is important in choosing the best compiler
options.
General Debugging Features
If HPCF_VERBOSE=all and HPCF_MODE=yes then the compilation
commands for all compilers will also set -v, which makes
the compilers more verbose, and -Wall -fullwarn, which turn on
all warnings.
Other options that can help with debugging include:
-trapuv traps uninitialized variables when compiled at -O0.
-g turns on debugging information for use with gdb.
For Fortran you can also use:
-ffortran-bounds-check checks array bounds.
-C performs runtime subscript range checking. Subscripts that are out of range cause fatal run time errors.
Building MPI
You should set the environment variable HPCF_MPI to
yes if you are building an MPI program, though all it
does is to trap the mistake of using the serial commands to compile or
link code. An unset value is equivalent to no, and other
values will cause an error.
The reason that you should not use the serial commands directly if you
are building an MPI program on any modern system is that
the mp... commands set up the paths and libraries correctly,
and these are quite hard to get right. We have no idea why some of them
need to be specified, nor what will happen if you get them wrong, but at
least some will cause failure to compile or link, and others will cause
serious inefficiency. Note that this applies to autoconfigure scripts
and make files as much as your own commands. This is a
serious "gotcha".
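The check that HPCF_MPI enables can be pictured with the following Python sketch. It is an illustration of the rule described above, not the wrapper's own code; the function check_mpi_usage and the choice of exception types are hypothetical, while the command names are those listed earlier.

```python
SERIAL_COMMANDS = {"pathf90", "pathcc", "pathCC"}
MPI_COMMANDS = {"mppathf90", "mppathcc", "mppathCC"}


def check_mpi_usage(command, env):
    """Raise if a serial command is used while HPCF_MPI is yes.

    env is a mapping of environment variables. Hypothetical helper.
    """
    setting = env.get("HPCF_MPI", "no")  # unset behaves like "no"
    if setting not in ("yes", "no"):
        raise ValueError(f"HPCF_MPI={setting!r}: expected yes or no")
    if setting == "yes" and command in SERIAL_COMMANDS:
        raise RuntimeError(
            f"{command} is a serial command; use mp{command} for MPI builds"
        )


# An MPI build with the MPI wrapper passes the check silently.
check_mpi_usage("mppathcc", {"HPCF_MPI": "yes"})

# The same build with the serial command is trapped.
try:
    check_mpi_usage("pathcc", {"HPCF_MPI": "yes"})
except RuntimeError as error:
    print("trapped:", error)
```

Setting HPCF_MPI=yes in the environment of a Makefile or configure run is therefore a cheap way to find out whether it quietly falls back to the serial commands.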
To run your MPI program, you must use the mpirun command;
see the local Web page on that for more details.