
JetJob Suite of Scripts

The JetJob suite provides a simple way to submit and monitor jobs.

SJOB communicates with the PBS server to carry out the everyday commands we all use. It creates a suitable submission script and submits the job for you. Letting SJOB handle submission also lets you use MJOB to see when your job is complete.

All scripts described herein can be found in /home/usrb/b1/jturney/bin.

If the queue you ask for is full and another queue has free nodes, SJOB checks with the queueing system and suggests the other queue. To disable this behavior (useful when calling SJOB from scripts), set the environment variable SJOB_SUGGEST to 'n'.
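As a minimal sketch of scripted submission, the bash function below sets SJOB_SUGGEST to n and runs sjob once in each of several directories. The function name submit_all and the directory names in the usage line are hypothetical; in csh or tcsh the equivalent setting is `setenv SJOB_SUGGEST n`.

```shell
#!/bin/bash
# Hypothetical helper: submit one job from each listed directory with
# SJOB's queue suggestions turned off (SJOB_SUGGEST=n).
# Usage: submit_all QUEUE PROGRAM DIR...
submit_all() {
  local queue=$1 prog=$2
  shift 2
  local dir
  for dir in "$@"; do
    # Subshell: neither the cd nor the exported variable leaks to the caller.
    ( export SJOB_SUGGEST=n; cd "$dir" && sjob "$queue" "$prog" ) ||
      echo "submission failed in $dir" >&2
  done
}
```

For example, `submit_all reg molpro scan_01 scan_02 scan_03` submits three MOLPRO jobs to the reg queue without SJOB suggesting another queue.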

How to use SJOB

Using SJOB is simple. As with any new program, you just have to learn what it does and does not accept.

A call to SJOB has the following form. (SJOB displays the table below when required arguments are missing or when --help is passed.)

% sjob queue[=nodes] progname [inputfile [jobname [outputfile]]]
Option Description
queue: The queue to submit the job to. SJOB accepts any name here, but on the current cluster only the following will succeed:

long
reg
opt

nodes: Optional; its meaning depends on the queue you submit to. It has no effect on the long queue.

For the reg queue the only valid options are bigmem, bigmem2, p3000, and p3200.

For the opt queue, give the number of nodes and, optionally, the number of processors per node. For example, 2:ppn=2 requests 2 nodes with 2 processors each.

The following are valid queue specifications for SJOB:

long
reg=bigmem2
opt=2:ppn=2   Appending :ppn=2 is no longer required; SJOB adds it automatically when needed.
opt=2 Same as previous example.

progname: The program to run. To be safe, make sure the directory containing the executable is in your PATH.

Valid options:
ACES, ACESII, ACES2, aces, acesii, aces2: uses /home/usrb/b1/jturney/bin/runaces2_default, which runs the Sattelmeyer compilation of ACESII.
UTACES, UTACESII, UTACES2, utaces, utacesii, utaces2: uses the new UT ACESII from Prof. Stanton.
PSI3, psi3: uses the PSI3 in your PATH or, if you request 1 Opteron node, a 64-bit version of PSI3.
PSI2, psi2: uses the PSI2 in your PATH. Be careful when both PSI2 and PSI3 are in your PATH.
MPQC, mpqc: uses the version compiled by Steven.
MOLPRO, molpro: uses /usr/local/share/molpro/i686-generic-linux2.4b/molpro for serial runs and /usr/local/share/molpro/x86_64/bin/run_molpro for parallel runs.
NWCHEM, nwchem: uses /usr/local/share/nwchem/bin/nwchem for serial runs and /usr/local/share/nwchem4.7/bin64/run_nwchem for parallel runs.
NWCHEM4.6, nwchem4.6: uses /usr/local/share/nwchem/bin/nwchem for serial runs and /usr/local/share/nwchem/bin64/run_nwchem for parallel runs.
GAMESS, gamess: uses a version of GAMESS with Block Localized Wavefunction (BLW) capabilities.

inputfile: The name of the input file. If omitted, ZMAT is assumed for ACESII and input.dat for all other programs. PSI2 ignores this option; its input must be named input.dat.
jobname: By default, the job is named after the directory that contains it. Give a different name here to override this.
outputfile: By default, the program given in progname chooses the output file name. If the program supports a different name, you can set it here. PSI2 ignores this option.
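The argument rules in the table are compact enough to express as a few bash functions. This is a hypothetical sketch of the documented behavior, not SJOB's source: the function names and the QUEUE/NODES globals are invented for illustration, and the canonical program names other than ACESII, MOLPRO and NWCHEM (which appear in SJOB's output in the examples) are assumptions. Following the text literally, only ACESII defaults to ZMAT.

```shell
#!/bin/bash
# Hypothetical sketch of SJOB's documented argument rules; not its source.

# Split "queue[=nodes]" into the globals QUEUE and NODES.
# On the opt queue, ":ppn=2" is appended when only a node count is given.
parse_queue_spec() {
  local spec=$1
  QUEUE=${spec%%=*}                 # text before the first '='
  NODES=""
  if [[ $spec == *=* ]]; then
    NODES=${spec#*=}                # text after the first '='
  fi
  if [[ $QUEUE == opt && -n $NODES && $NODES != *ppn=* ]]; then
    NODES="$NODES:ppn=2"
  fi
}

# Map any accepted progname alias to one canonical name.
canonical_program() {
  case $1 in
    ACES|ACESII|ACES2|aces|acesii|aces2)             echo ACESII ;;
    UTACES|UTACESII|UTACES2|utaces|utacesii|utaces2) echo UTACESII ;;
    PSI3|psi3)           echo PSI3 ;;
    PSI2|psi2)           echo PSI2 ;;
    MPQC|mpqc)           echo MPQC ;;
    MOLPRO|molpro)       echo MOLPRO ;;
    NWCHEM|nwchem)       echo NWCHEM ;;
    NWCHEM4.6|nwchem4.6) echo NWCHEM4.6 ;;
    GAMESS|gamess)       echo GAMESS ;;
    *) echo "unknown program: $1" >&2; return 1 ;;
  esac
}

# Pick the input file: PSI2 always reads input.dat, an explicit name wins
# otherwise, and the fallback is ZMAT for ACESII and input.dat for the rest.
choose_input() {
  local prog=$1 given=${2:-}
  if [[ $prog == PSI2 ]]; then
    echo input.dat
  elif [[ -n $given ]]; then
    echo "$given"
  elif [[ $prog == ACESII ]]; then
    echo ZMAT
  else
    echo input.dat
  fi
}

# Default job name: the name of the current directory.
default_jobname() {
  basename "$PWD"
}
```

For example, `parse_queue_spec opt=4` leaves QUEUE=opt and NODES=4:ppn=2, the same request as the opt=4:ppn=2 NWChem example.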

Example usage of SJOB

To submit an ACESII job to a specific machine type on the reg queue (here bigmem2 or p3200), use either of the following:

% sjob reg=bigmem2 aces
SJOB - Job submitted
  ID      : 1000.clortho.ccqc.uga.edu
  Path    : /home/usrb/b1/jturney/chem/acestest
  Program : ACESII
% sjob reg=p3200 aces2
SJOB - Job submitted
  ID      : 1001.clortho.ccqc.uga.edu
  Path    : /home/usrb/b1/jturney/chem/acestest
  Program : ACESII

To submit a MOLPRO job to any machine on the reg queue:

% sjob reg molpro
SJOB - Job submitted
  ID      : 1002.clortho.ccqc.uga.edu
  Path    : /home/usrb/b1/jturney/chem/molprotest
  Program : MOLPRO
or with a different input file:
% sjob reg molpro h2.input
SJOB - Job submitted
  ID      : 1003.clortho.ccqc.uga.edu
  Path    : /home/usrb/b1/jturney/chem/molprotest
  Program : MOLPRO

To submit an NWChem job to the Opteron queue to run in parallel on 4 nodes with 2 processors each:

% sjob opt=4:ppn=2 nwchem
SJOB - Job submitted
  ID      : 1004.clortho.ccqc.uga.edu
  Path    : /home/usrb/b1/jturney/chem/nwchemtest
  Program : NWCHEM
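SJOB writes the submission script for you, and this page does not show its exact contents. For readers curious what such a script involves, here is a hypothetical generator built from standard PBS directives: -N sets the job name, -q the queue, -l nodes=... the node request, and PBS_O_WORKDIR holds the directory qsub was run from. The function name make_pbs_script is invented for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a PBS script like the one SJOB generates.
# Usage: make_pbs_script QUEUE NODES JOBNAME COMMAND   (NODES may be "")
make_pbs_script() {
  local queue=$1 nodes=$2 name=$3 cmd=$4
  echo "#!/bin/sh"
  echo "#PBS -N $name"                  # job name shown by qstat
  echo "#PBS -q $queue"                 # destination queue
  if [[ -n $nodes ]]; then
    echo "#PBS -l nodes=$nodes"         # e.g. 4:ppn=2
  fi
  echo 'cd "$PBS_O_WORKDIR"'            # run where qsub was invoked
  echo "$cmd"
}
```

From bash, `make_pbs_script opt 4:ppn=2 nwchemtest "$cmd" > job.pbs` followed by `qsub job.pbs` would submit it; `$cmd` is a placeholder, since the exact command line SJOB uses for each program is not documented here.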

Monitoring your jobs

Use MJOB to monitor your jobs. It runs in a continuous loop, refreshing every minute. I recommend leaving MJOB running in its own window so that you always know the status of your jobs.

MJOB now displays the walltime and executing node(s) of every running job.

% mjob
MJOB - Monitor jobs compatible with SJOB
Report for jturney

Jobs currently waiting in the queue:
  No jobs in queue.

Jobs currently running in the queue:
  24977 29:22:40 opt19/0+opt19/1
                 ~jturney/molpro/lion_triplet/lion_ts/cas

Jobs possibly completed/aborted:
  23945 SUCCESS  ~jturney/molpro/hnc/a_linear/ccsd
  24981 SUCCESS  ~jturney/molpro/lion_triplet/lino_ts/cas
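The refresh loop MJOB runs can be pictured with plain PBS commands. The bash function below is a rough, hypothetical approximation using only qstat; it refreshes on the same one-minute default but does not reproduce MJOB's per-directory tracking or its list of completed/aborted jobs. The name watch_jobs and its ROUNDS argument are invented for illustration.

```shell
#!/bin/bash
# Bare-bones, hypothetical stand-in for MJOB built on qstat.
# Usage: watch_jobs [SECONDS] [ROUNDS]   (ROUNDS=0, the default, loops forever)
watch_jobs() {
  local interval=${1:-60} rounds=${2:-0} n=0
  while (( rounds == 0 || n < rounds )); do
    date
    qstat -u "$(id -un)"              # this user's jobs known to PBS
    n=$((n + 1))
    sleep "$interval"
  done
}
```

Run it as `watch_jobs` in a spare window, just as with MJOB, and stop it with Ctrl-C.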

Deleting your jobs

To delete a job from the queue and/or from the MJOB list, use DJOB. DJOB deletes running and waiting jobs as well as completed or aborted ones.

Before submitting another job in a directory that holds a previously completed job, run djob first; otherwise MJOB can become confused.

% djob 12979
Attempting to qdel: 12979 done.

If you are in the directory of the job you want to delete, run DJOB with no arguments:

% djob
Attempting to qdel: 12979 done.

If you run several jobs in the same directory and MJOB becomes confused, fix it by running:

% djob all
Attempting to qdel: 12979 12980 12981 done.
This deletes all jobs in the directory, including those that are still running.

If you run several jobs in the same directory, one of them is still running, and MJOB becomes confused, try:

% djob done
Attempting to qdel: 12979 12980 12981 done.
This removes only the completed jobs in the current directory.
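Following the advice to run DJOB before resubmitting in a directory, a small bash wrapper saves a step. It uses djob done rather than djob all, so a job still running in the directory is left alone. The function name resubmit is hypothetical, and the sketch does not rely on djob's exit status, which this page does not document.

```shell
#!/bin/bash
# Hypothetical wrapper: clear this directory's completed jobs from MJOB's
# list, then submit a new job. Arguments are passed straight to sjob.
resubmit() {
  djob done
  sjob "$@"
}
```

For example, `resubmit reg molpro h2.input` clears the old entries and resubmits the MOLPRO job from the earlier example.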