README for the examples directory of the Condor 6.0 distribution 

This file describes how to use the various example programs in this
directory.  It is intended to serve as both a tutorial on how to use
Condor, and as a way to test the basic functioning of your
installation.  It is not intended as an exhaustive test of Condor.  

There are C, C++ and Fortran jobs included.  The Makefile assumes you
have the GNU compilers for these various languages installed.  If you
want to use your vendor's compiler for any or all of these, simply
edit the first 3 lines of the Makefile accordingly, or do the
compilations by hand, as described below.

To build all the programs and link them for use with Condor, simply
type "make".  This will compile the programs of all three languages
and create a number of executables linked for Condor, named
[program].remote.  If you only wish to build the examples for a given
language, type "make c", "make c++", or "make fortran" accordingly.

To build these programs by hand (or any other programs you wish to
use with Condor), simply put "condor_compile" at the front of the
command you would otherwise use to compile your code.  condor_compile
has two installation modes: regular install and full install.  See the
INSTALL file for details.  If using the regular install, you can only
use condor_compile with the following commands:

	gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld

If you use the full installation, you can use condor_compile with any
command you would use to build your program.  For example, you can do
things like: 

	condor_compile make

and it will link any binaries produced by your makefile for Condor.
However, given a regular installation of condor_compile, to build the
programs in this directory, simply type:

	condor_compile [compiler] [prog].[Ccf] -o [prog].remote

For example:

	condor_compile cc loop.c -o loop.remote
	condor_compile fort77 printer.f -o printer.remote
	condor_compile g++ env.C -o env.remote
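
The by-hand pattern above can be scripted.  The following is a minimal
sketch, not part of the distribution: the helper names compiler_for
and build_command are hypothetical, and the gcc/g++/g77 choices simply
mirror the GNU compilers the Makefile assumes.  It only prints the
commands, so it is safe to run before Condor is installed.

```shell
#!/bin/sh
# Hypothetical helper: choose a compiler from the source file's extension.
# Patterns are case-sensitive, so .c (C) and .C (C++) are distinct.
compiler_for() {
    case "$1" in
        *.c) echo gcc ;;
        *.C) echo g++ ;;
        *.f) echo g77 ;;
        *)   return 1 ;;
    esac
}

# Print the condor_compile command line for one source file.
build_command() {
    src=$1
    prog=${src%.*}                      # strip the extension: loop.c -> loop
    compiler=$(compiler_for "$src") || return 1
    echo "condor_compile $compiler $src -o $prog.remote"
}

# Dry run: show the commands rather than executing them.
for src in loop.c env.C printer.f; do
    build_command "$src"
done
```

To actually build, pipe the printed commands to sh, or substitute the
compilers you listed in the Makefile.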
	
Normally, Condor will only run jobs on idle machines.  However, to
speed up the testing process, you may want to force Condor to run jobs
on the machine you are using.  To do this, edit the local config file
(described in the INSTALL file) and add the following lines:

START    = Owner == "your user name" || Owner == "condor"
SUSPEND  = False
CONTINUE = True
PREEMPT  = False
KILL     = False

Of course, you should change "your user name" to the string containing
your actual user name.  For example:

START = Owner == "coltrane" || Owner == "condor"

To make these changes take effect, use "condor_reconfig" to
reconfigure the local host, or "condor_reconfig [hostname1]
[hostname2] ..." to reconfigure remote hosts.  (NOTE: condor_reconfig
is an administrative command and can be found in the "sbin" directory
of your Condor release directory.)  Be sure to undo these changes once
you are done testing, or you may find Condor starting up jobs on your
machine when you don't want them.
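
If you would rather not type the five settings by hand, a small script
can print them with your user name filled in.  This is only a sketch:
the function name test_policy is hypothetical, and the user name is
assumed to be whatever "id -un" reports unless you pass one.

```shell
#!/bin/sh
# Print the testing policy shown above for the given user name.
test_policy() {
    user=$1
    cat <<EOF
START    = Owner == "$user" || Owner == "condor"
SUSPEND  = False
CONTINUE = True
PREEMPT  = False
KILL     = False
EOF
}

# Default to the current user; pass a name as the first argument to override.
test_policy "${1:-$(id -un)}"
```

Append the output to your local config file, and remove those lines
again when you are done testing.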

Once you have compiled all the test programs you wish to use and have
set up Condor to run the jobs on your machine, you can execute the
"submit" script in this directory to submit all the tests to Condor.
This script just executes the command:

condor_submit [program].cmd

for each job you have compiled.  Please read the condor_submit man
page for details about these .cmd files and what they do.  You can
find all the man pages in the doc/commands directory of this
installation.  
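
The loop described above could look roughly like the following.  It is
a sketch, not the shipped "submit" script: it assumes every compiled
program is named [program].remote and sits next to its [program].cmd
file, so a vanilla job such as sh_loop is not covered.  The function
name submit_files is hypothetical.

```shell
#!/bin/sh
# List the .cmd files in directory $1 whose .remote executable exists.
submit_files() {
    dir=${1:-.}
    for exe in "$dir"/*.remote; do
        [ -e "$exe" ] || continue       # no matches: the glob stays literal
        cmd=${exe%.remote}.cmd
        if [ -f "$cmd" ]; then
            echo "$cmd"
        fi
    done
    return 0
}

# Submit each file if condor_submit is on the PATH; otherwise just show
# what would be run.
for cmd in $(submit_files .); do
    if command -v condor_submit >/dev/null 2>&1; then
        condor_submit "$cmd"
    else
        echo "would run: condor_submit $cmd"
    fi
done
```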

When you submit a job, Condor places it in the "job queue".  You
can view the job queue on the current machine by typing "condor_q".
After running submit, your job queue should look something like:

-- Schedd: perdita.cs.wisc.edu : <128.105.73.32:63364>
   ID    OWNER           SUBMITTED    CPU_USAGE  ST PRI SIZE CMD

   1.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  io.remote         
   2.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  env.remote        
   3.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  fstream.remote    
   4.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  loop.remote       
   4.1   condor          6/13 10:58   0+00:00:00 I  0   0.5  loop.remote       
   4.2   condor          6/13 10:58   0+00:00:00 I  0   0.5  loop.remote       
   4.3   condor          6/13 10:58   0+00:00:00 I  0   0.5  loop.remote       
   4.4   condor          6/13 10:58   0+00:00:00 I  0   0.5  loop.remote       
   5.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  registers.remote  
   6.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  reader.remote     
   7.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  printer.remote    
   8.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  fortIO.remote     
   9.0   condor          6/13 10:58   0+00:00:00 I  0   0.0  sh_loop           

13 jobs; 13 idle, 0 running, 0 held

Notice that loop.remote is in the queue with 5 entries: 4.0, 4.1, etc.
This is a "job cluster".  See the section on loop.c below for
details. 

All of these jobs have a status (listed in the "ST" column) of "I",
idle, meaning they are not being run by Condor.  If a job starts
executing, it will have a status of "R", running:

   ID    OWNER           SUBMITTED    CPU_USAGE  ST PRI SIZE CMD

   1.0   condor          6/13 10:58   0+00:00:00 R  0   0.5  io.remote         
   2.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  env.remote        
   3.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  fstream.remote    

   ...

13 jobs; 12 idle, 1 running, 0 held
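
When the queue is long, it can help to tally the "ST" column rather
than read it by eye.  The sketch below assumes the column layout shown
above, in which the status is the sixth whitespace-separated field; the
function name count_states is hypothetical.

```shell
#!/bin/sh
# Count jobs per status code in condor_q-style output read from stdin.
# Only lines whose first field looks like a job ID (cluster.process)
# are counted, so headers and the summary line are skipped.
count_states() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { n[$6]++ }
         END { printf "idle=%d running=%d\n", n["I"], n["R"] }'
}

# Example using three lines in the format shown above.
count_states <<'EOF'
   ID    OWNER           SUBMITTED    CPU_USAGE  ST PRI SIZE CMD
   1.0   condor          6/13 10:58   0+00:00:00 R  0   0.5  io.remote
   2.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  env.remote
   3.0   condor          6/13 10:58   0+00:00:00 I  0   0.5  fstream.remote
EOF
```

On a machine with Condor installed, "condor_q | count_states" would
tally the live queue in the same way.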

At this point, if you do a ps command to find all the Condor daemons
on your machine (such as "ps -aux | egrep condor_"), you should see
processes for condor_shadow, condor_starter and condor_exec.  If the
job is executing on another machine, you will only see the
condor_shadow process, and the execute machine will have the
condor_starter and condor_exec processes.

If the job checkpoints, it will have a status of "I", idle, and the
CPU_USAGE should be updated to list the amount of CPU time the job has
used.  You can force a job to checkpoint on a machine by typing
"condor_checkpoint hostname" (or just "condor_checkpoint" for the
local host).  This will make the job checkpoint but it will
immediately restart and continue to execute on the machine.  You can
issue a "condor_vacate hostname" command to force the job to
checkpoint, and then be returned to the queue, not restarted.  Both
condor_checkpoint and condor_vacate are administrative commands that
are in the sbin directory (just like condor_reconfig).

Once the job has completed, it is removed from the job queue and
placed in the "history file".  You can view the history file by typing
"condor_history", whose output will look something like:

 ID      OWNER            SUBMITTED    CPU_USAGE ST PRI SIZE CMD               
   1.0   condor          6/13 10:58   0+00:00:00 C  0   0.9  io.remote         

You will notice that the status is now "C", for completed.  (Note:
since the io.remote program spends all of its time doing I/O system
calls, it uses very little CPU time, and might be listed with
CPU_USAGE 0, as above.  This is normal.  Other jobs will have a larger
value here, depending on how much CPU time they actually used.)

By using condor_q and condor_history, you can follow the progress of
your jobs as they run.  You can also run "condor_status" to see the
status of all the machines in your pool.  Most of the jobs in this
directory are also submitted so they create a "user log", a file that
details major events in the life of the job such as when it was
submitted, when it checkpoints, what machines it runs on, and when it
completes.  These user logs are called [program].log.  See the *.cmd
files or the condor_submit man page for details on how to create these
files.

The following sections describe the details of the various programs,
what they test, and the expected output of the job.  Please read them
to make sure your tests are properly executed.

**********************************************************************
C tests
**********************************************************************

io.c 

   This program tests the proper functioning of Condor during heavy
   I/O system calls.  While the test is running, be sure to issue some
   "condor_vacate hostname" commands (where hostname is the name of
   the machine where the job is executing) to force the job to
   checkpoint.  If there is a problem with the job, it will exit right
   away.  If there are no problems, it will print to the io.out file
   the message:

       Normal End Of Job
       Be sure the job checkpointed at least once.

loop.c

   This program just computes in a loop for the given number of
   seconds.  The time it takes is passed as a command-line argument.
   See the file loop.cmd to change the time it computes.  This program
   also demonstrates how to submit multiple instances of the same
   executable with different parameters.  Read the file loop.cmd which
   is well documented.  Also, read the condor_submit man page for the
   complete story.
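
   As a rough illustration of how one submit file can queue several
   instances, the sketch below prints a hypothetical submit file with
   one "queue" statement per run time.  It is not the shipped
   loop.cmd, and the function name cluster_submit_file is made up;
   read loop.cmd itself for the real settings.

```shell
#!/bin/sh
# Print a hypothetical submit file that queues one loop.remote job per
# run time given on the command line.  Each "queue" statement adds one
# job to the cluster, using the "arguments" value in effect at that point.
cluster_submit_file() {
    echo "executable = loop.remote"
    echo "log        = loop.log"
    for seconds in "$@"; do
        echo "arguments  = $seconds"
        echo "queue"
    done
}

# Three jobs in one cluster, computing for 60, 120 and 180 seconds.
cluster_submit_file 60 120 180
```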

registers.c

   This program uses a recursive function call to place known values
   in most or all of the processor's general purpose registers.  The
   program checkpoints and restarts with several copies of function
   "foo" on the stack.  After the restart, each copy of the function
   foo will print out a set of values, and then return to the previous
   level of recursion.  Normal output will look like:

	 About to checkpoint
	 Returned from checkpoint
	 args: 35 35 35 35 35 35 35 35
	 args: 34 34 34 34 34 34 34 34
	 locals: -35 -35 -35 -35 -35 -35 -35 -35

	 args: 33 33 33 33 33 33 33 33
	 locals: -34 -34 -34 -34 -34 -34 -34 -34

	 ...

	 args: 0 0 0 0 0 0 0 0
	 locals: -1 -1 -1 -1 -1 -1 -1 -1

**********************************************************************
C++ tests
**********************************************************************

env.C

   This program checks for the proper handling of the environment by
   Condor during checkpoints.  Correct output will look something
   like: 

	 environ[0] = "alpha=a"
	 environ[1] = "bravo=b"
	 environ[2] = "charlie=c"
	 argv[0] = "condor_exec.61.0"
	 argv[1] = "foo"
	 argv[2] = "bar"
	 argv[3] = "glarch"

	 Environment vector OK
	 Argument vector OK

	 Environment vector OK
	 Argument vector OK

	 Environment vector OK
	 Argument vector OK

	 Normal End Of Job

fstream.C

   This program checks for proper functioning of C++ file stream
   operations within Condor. Once it is completed, the file
   fstream.output should be the same as the file fstream.C.  The
   command:

	diff fstream.output fstream.C

   should yield no differences.

**********************************************************************
Fortran tests
**********************************************************************

fortIO.f

   This program simply checks the functioning of file I/O in Fortran.
   It writes "Hello" to a file named x.x in the current directory.
   fortIO.out and fortIO.err should be empty.

printer.f
 
   This program also checks proper Fortran I/O with Condor.  When it
   is done, the file printer.out should contain only the following
   lines: 

	 set i = 0
	 set i = 1

reader.f

   This program also checks proper Fortran I/O with Condor.  When it
   is done, the file reader.out should have two lines, which both list
   the same number of lines read by the program:

       Read 4404 lines
       Read 4404 lines

   The actual number will vary, since the file it is reading is
   supplied on stdin (see the files reader.cmd and reader.in) and is
   the reader.remote binary, which varies in size. 
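
   Because the count varies, the useful check is that both lines
   agree.  A minimal sketch of that check follows; the function name
   check_reader_output is hypothetical, and it assumes each line has
   the "Read N lines" form shown above.

```shell
#!/bin/sh
# Succeed if the file has exactly two lines and both are the same
# "Read N lines" report.
check_reader_output() {
    file=$1
    [ "$(wc -l < "$file" | tr -d ' ')" = "2" ] || return 1
    first=$(sed -n 1p "$file")
    second=$(sed -n 2p "$file")
    case "$first" in
        *"Read "*" lines"*) ;;          # allow leading or trailing blanks
        *) return 1 ;;
    esac
    [ "$first" = "$second" ]
}

# Check reader.out if it is present in the current directory.
if [ -f reader.out ]; then
    if check_reader_output reader.out; then
        echo "reader.out looks right"
    else
        echo "reader.out is not what was expected"
    fi
fi
```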

**********************************************************************
Vanilla jobs
**********************************************************************

sh_loop

    This program is just a shell script, and is submitted to Condor in
    the "vanilla" universe.  Vanilla jobs are not linked with the
    Condor libraries and therefore do not perform remote system calls
    or checkpoint.  However, any program can be run as a vanilla job,
    including shell scripts.  See the manual for complete details on
    using vanilla jobs, or the condor_submit man page.  sh_loop stays
    in a loop and prints out a number, then sleeps for a second.  At
    the end, sh_loop.out should contain the numbers from 1 to 60 and
    the message "Normal End-of-Job".
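
    For readers who want to see the shape of such a job without
    opening the script, here is a hypothetical stand-in written from
    the description above.  The shipped sh_loop may differ; the
    function name count_loop and its two optional arguments are
    assumptions added so the loop can be tried quickly.

```shell
#!/bin/sh
# Print the numbers 1..count, pausing between each, then a final message.
# Usage: count_loop [count] [delay-in-seconds]; defaults are 60 and 1.
count_loop() {
    count=${1:-60}
    delay=${2:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$i"
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Normal End-of-Job"
}

# Short demonstration: three numbers with no delay.
count_loop 3 0
```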

**********************************************************************
PVM jobs
**********************************************************************

    The PVM subdirectory contains an example PVM job.  You must have
    Condor PVM support installed to run this example.

**********************************************************************
DAGMan jobs
**********************************************************************

    The final example is one of inter-job dependencies in Condor,
    using the Directed Acyclic Graph Manager (DAGMan).  It is all
    contained within its own subdirectory, "dagman".  If you cd into
    this directory you will find a README in both plain ASCII text
    (README.txt) and in HTML (README.html).
