LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.1
8 March 2010

This lists only the most significant changes from SLURM v2.0 to v2.1
with respect to Chaos systems. See the file RELEASE_NOTES for other
changes.

For system administrators:
* The pam_slurm Pluggable Authentication Module for SLURM, previously
  distributed separately, has been moved within the main SLURM distribution
  and is packaged as a separate RPM.
* Added MaxTasksPerNode configuration parameter to control how many tasks
  the slurmd daemon can launch. The default value is 128 (the same as in
  SLURM v2.0).
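  For example, the limit could be lowered in slurm.conf (the parameter name
  is from this note; the value shown is purely illustrative):

  ```
  # slurm.conf fragment -- MaxTasksPerNode caps the number of tasks a
  # single slurmd daemon will launch. 128 is the default; 64 is only an
  # illustrative value.
  MaxTasksPerNode=64
  ```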
* Changed "scontrol show job" command:
  - ReqProcs (number of processors requested) is replaced by NumCPUs (number
    of CPUs requested or actually allocated).
  - ReqNodes (number of nodes requested) is replaced by NumNodes (number of
    nodes requested or actually allocated).
  - Added a --detail option to "scontrol show job" to display the cpu/memory
    allocation information on a node-by-node basis.
  - Reorganized the output into functional groupings.
* Added command "sacctmgr show problems" to display problems in the accounting
  database (e.g. accounts with no users, users with no UID, etc.).
* A mechanism has been added for SPANK plugins to set environment variables
  for Prolog, Epilog, PrologSlurmctld and EpilogSlurmctld programs using the
  functions spank_get_job_env, spank_set_job_env, and spank_unset_job_env. See
  "man spank" for more information.

Mostly for users:
* Added "--signal=<int>@<time>" option to salloc, sbatch and srun commands to
  notify programs before reaching the end of their time limit.
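  The warning arrives as an ordinary signal, so a batch script can trap it
  and checkpoint before being killed. A minimal sketch (the signal choice,
  lead time, and checkpoint action are illustrative; the script signals
  itself at the end so the sketch runs even without SLURM present):

  ```shell
  #!/bin/bash
  #SBATCH --signal=USR1@60     # ask SLURM for SIGUSR1 60 seconds before the limit

  # Illustrative handler: a real job would write a checkpoint here.
  checkpoint() { echo "caught SIGUSR1: checkpointing"; }
  trap checkpoint USR1

  # Stand-in for the real workload. SLURM would deliver the signal near the
  # time limit; here we send it to ourselves so the sketch runs anywhere.
  kill -USR1 $$
  echo "job finished"
  ```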
* Added support for job step time limits.
* Sbatch response changed from "sbatch: Submitted batch job #" written to
  stderr to "Submitted batch job #" written to stdout.
* Added a --detail option to "scontrol show job" to display the cpu/memory
  allocation information on a node-by-node basis.
* Added a new job wait reason, ReqNodeNotAvail: a required node is not
  available (down or drained).
* Added environment variable support to sattach, salloc, sbatch and srun
  to permit user control over exit codes so application exit codes can be
  distinguished from those generated by SLURM. SLURM_EXIT_ERROR specifies the
  exit code when a SLURM error occurs. SLURM_EXIT_IMMEDIATE specifies the
  exit code when the --immediate option is specified and resources are not
  available. Any other non-zero exit code would be that of the application
  run by SLURM.
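  In practice this lets a wrapper script tell the three cases apart. A
  minimal sketch (the numeric values below are placeholders set for the
  demo, not SLURM defaults; a real wrapper would export the two variables
  before invoking srun and then inspect srun's exit status):

  ```shell
  #!/bin/bash
  # Placeholder values for the demo -- in real use you export these yourself
  # so SLURM commands exit with them (variable names from this note).
  SLURM_EXIT_ERROR=100
  SLURM_EXIT_IMMEDIATE=101

  # Map an exit code to one of the cases described above.
  classify() {
    case "$1" in
      0)                       echo "application succeeded" ;;
      "$SLURM_EXIT_ERROR")     echo "SLURM error" ;;
      "$SLURM_EXIT_IMMEDIATE") echo "--immediate: resources unavailable" ;;
      *)                       echo "application exited with code $1" ;;
    esac
  }

  classify 0
  classify "$SLURM_EXIT_ERROR"
  classify 7
  ```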

SLURM state files in version 2.1 are different from those of version 2.0.
After installing SLURM version 2.1, plan to restart without preserving 
jobs or other state information. While SLURM version 2.0 is still running, 
cancel all pending and running jobs (e.g.
"scancel --state=pending; scancel --state=running"). Then stop and restart 
daemons with the "-c" option or use "/etc/init.d/slurm startclean".
