DPH benchmarks
==============

This directory contains several DPH benchmarks.  Currently, only the following 
are working:

sumsq	   - parallel version of `sum [x * x | x <- [1..n]]'
dotp       - dot product of two dense vectors
smvm       - sparse matrix/vector multiplication

For more details and benchmark results, see

  http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus


Benchmark driver
----------------

This directory contains a benchmark driver `runbench.hs', which runs all 
available benchmarks on a set of standard configuartions while choosing the 
number of threads in dependence on the capabilities of the hardware.


Options of the benchmark framework
----------------------------------

The following options are common to all benchmarks:

  --runs=N                Repeat each benchmark N times
  -r N

  --verbose[=N]           Set the verbosity level
  -v[N]

  --help                  Show a help screen


Running benchmarks
------------------

For parallel benchmarks, you usually want to use

  benchmark --runs=<R> <INPUT> +RTS -N<N>

Here, N is the number of threads to use and R the number of times the
benchmark should be repeated (you probably want something between 3 and 10).  
Some benchmarks require further <INPUT>, some don't.

The output will look as follows:

  ....: wall_best/cpu_best wall_avg/cpu_avg wall_worst/cpu_worst

Here, wall_{best|avg|worst} is the best, average and worst wall-clock time,
respectively; cpu_{best|avg|worst} is the CPU time. Note that for parallel
benchmarks on a multiprocessor, the wall-clock time will typically decrease
with more threads whereas the CPU time will slightly increase.  The results
are in milliseconds. 

For sequential benchmarks, the number of threads does not have to be
specified, i.e., --threads and +RTS -N can be omitted.

At higher verbosity levels, more information (in particular, the timings of
the individual runs) will be displayed.
