Hartree: Running Jobs
When you log into Hartree you are connected to one of the I/O nodes, hartree_a. The I/O nodes can be used for short interactive jobs and compilation. Batch jobs are submitted to the queuing system and processed on the computation nodes. There should be no need to log into the computation nodes directly.
Interactive Usage
Short serial jobs can be run on the I/O nodes. To run an MPI executable for testing or debugging, use the poe command, e.g.
hartree_a [1] poe prog.x -procs 2
Your current directory must contain a host.list file for node allocation, with one line per processor requested, e.g.
hartree_a [2] cat host.list
hartree_a
hartree_a
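For larger processor counts the host.list file can be generated rather than typed by hand. The following is a minimal sketch for a POSIX shell; the function name make_host_list is hypothetical and not part of the system, and the host name is passed in explicitly rather than detected.

```shell
# Hypothetical helper: write one host name per requested processor.
# Usage: make_host_list <nprocs> <hostname> [outfile]
make_host_list() {
    nprocs=$1
    host=$2
    out=${3:-host.list}
    : > "$out"                      # create or truncate the file
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        echo "$host" >> "$out"      # one line per processor
        i=$((i + 1))
    done
}

# Two processors on hartree_a, matching the example above.
make_host_list 2 hartree_a
```

The processor count given here should match the value passed to poe with -procs.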
Note that executables compiled with certain Power4 optimisations may not run interactively on the I/O nodes. See the webpages on compilation for more details.
Batch Jobs
On Hartree the LoadLeveler scheduler is used. Queue names take the form [u | t | s][2 | 4 | 8 | 16 | 32]: a letter giving the time limit ('u' for 10 minutes, 't' for 2 hours, 's' for 24 hours) followed by the number of processors. The 's' queues are for production runs; the 't' and 'u' queues are for testing and development. All queues of 8 or more processors have exclusive use of the node(s) they run on. A single node contains eight 1.1GHz Power4 processors and 16GB of shared memory.
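The naming scheme can be expressed as a small lookup. The sketch below is illustrative only: pick_class is a hypothetical helper, and it assumes that every letter and processor-count combination in the scheme exists as a queue, which the text implies but does not state outright.

```shell
# Hypothetical helper: build a class name from a run time and a processor count.
# Usage: pick_class <minutes> <nprocs>
pick_class() {
    minutes=$1
    nprocs=$2
    # Time limits from the text: u = 10 minutes, t = 2 hours, s = 24 hours.
    if [ "$minutes" -le 10 ]; then
        prefix=u
    elif [ "$minutes" -le 120 ]; then
        prefix=t
    elif [ "$minutes" -le 1440 ]; then
        prefix=s
    else
        echo "pick_class: no queue runs longer than 24 hours" >&2
        return 1
    fi
    # Round the processor count up to the next available queue size.
    for size in 2 4 8 16 32; do
        if [ "$nprocs" -le "$size" ]; then
            echo "$prefix$size"
            return 0
        fi
    done
    echo "pick_class: no queue has more than 32 processors" >&2
    return 1
}

pick_class 90 8     # prints t8
```

The result is the value to place on the class line of a job script.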
For communication-intensive codes (e.g. any FFT-based codes) it is recommended to use the 8 processor queues. Jobs requiring larger resources should use Franklin.
To run an MPI program on 8 processors in the 2hr queue a suitable job submission script is
hartree_a [6] cat test.job
#!/bin/sh
#@ class = t8
#@queue
date
./myprog.mpi
Note that the mpirun command is not used. The first three lines are required by the LoadLeveler scheduler; only the class should be altered to reflect the queue you wish to submit to.
This script can be submitted with
hartree_a [7] llsubmit test.job
stderr will be sent to LoadL.err.$(jobid).$(stepid)
stdout will be sent to LoadL.out.$(jobid).$(stepid)
llsubmit: Processed .. Submit Filter: "/var/loadl/home/prefilter".
llsubmit: The job "hartree_a.823" has been submitted.
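When submitting from a script it can be handy to capture the job name that llsubmit reports. The following sketch assumes the final line of output has exactly the form shown above; job_name_from_llsubmit is a hypothetical helper, not a LoadLeveler command.

```shell
# Hypothetical helper: read llsubmit output on stdin and print the job name.
job_name_from_llsubmit() {
    sed -n 's/^llsubmit: The job "\([^"]*\)" has been submitted\.$/\1/p'
}

# Example using the line shown above; on the system one would pipe
# the output of "llsubmit test.job" into the function instead.
echo 'llsubmit: The job "hartree_a.823" has been submitted.' | job_name_from_llsubmit
```

Lines that do not match the pattern are ignored, so the rest of the llsubmit output can be passed through unchanged.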
The llq command is used to monitor the status of the job.
hartree_a [8] llq
Id                   Owner   Submitted   ST PRI Class  Running On
-------------------- ------- ----------- -- --- ------ -----------
hartree_e.14120.0    spqr1   2/23 10:43  ST 50  t8     hartree_b
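To check on a single job from a script, the ST column can be picked out of the listing. This is a sketch under the assumption that llq output keeps the column layout shown above; job_state is a hypothetical helper.

```shell
# Hypothetical helper: print the ST column for one job from llq output on stdin.
# Assumes the layout shown above, where "Submitted" spans two fields
# (date and time), so ST is the fifth whitespace-separated field.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Example with the sample listing; on the system: llq | job_state hartree_e.14120.0
job_state hartree_e.14120.0 <<'EOF'
Id                   Owner   Submitted   ST PRI Class  Running On
-------------------- ------- ----------- -- --- ------ -----------
hartree_e.14120.0    spqr1   2/23 10:43  ST 50  t8     hartree_b
EOF
```

For the sample listing this prints the ST value of the one job shown; a job that is not in the listing produces no output.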
The llcancel command is used to delete jobs.
hartree_a [9] llcancel hartree_e.14120.0
llcancel: Cancel command has been sent to the central manager.