"MapReduce-MPI WWW Site"_mws - "MapReduce-MPI Documentation"_md :c

:link(mws,http://www.cs.sandia.gov/~sjplimp/mapreduce.html)
:link(md,Manual.html)

:line

Background :h3

MapReduce is the programming paradigm popularized by Google
researchers "Dean and Ghemawat"_#Dean.  Their motivation was to enable
analysis programs to be rapidly developed and deployed within Google
to operate on the massive data sets residing on their large
distributed clusters.  Their paper introduced a novel way of thinking
about certain kinds of large-scale computations as "map" operations
followed by "reduces".  The power of the paradigm is that when cast in
this way, a traditionally serial algorithm now becomes two highly
parallel application-specific operations (requiring no communication)
sandwiched around an intermediate operation that requires parallel
communication, but which can be encapsulated in a library since the
operation is independent of the application.
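
To make this concrete, here is a minimal sketch of what such a program
looks like with this library's C++ interface.  The mymap() and
myreduce() bodies are placeholders that do no useful work; see the
"MapReduce-MPI Documentation"_md for the full interface and exact
signatures.

#include "mpi.h"
#include "stdio.h"
#include "string.h"
#include "mapreduce.h"
#include "keyvalue.h"
using namespace MAPREDUCE_NS; :pre

// map callback: invoked once per task, emits key/value pairs
void mymap(int itask, KeyValue *kv, void *ptr)
{
  char key[32];
  sprintf(key,"task%d",itask);
  int one = 1;
  kv->add(key,strlen(key)+1,(char *) &one,sizeof(int));
} :pre

// reduce callback: invoked once per unique key with all its values
void myreduce(char *key, int keybytes, char *multivalue,
              int nvalues, int *valuebytes, KeyValue *kv, void *ptr)
{
  // application-specific processing of the nvalues values for this key
} :pre

int main(int narg, char **args)
{
  MPI_Init(&narg,&args);
  MapReduce *mr = new MapReduce(MPI_COMM_WORLD);
  mr->map(100,&mymap,NULL);    // highly parallel, no communication
  mr->collate(NULL);           // all-to-all communication, inside the library
  mr->reduce(&myreduce,NULL);  // highly parallel, no communication
  delete mr;
  MPI_Finalize();
} :pre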

The Google implementation of MapReduce was a C++ library with
communication between networked machines via remote procedure calls.
It allows for fault tolerance when large numbers of machines are
used, and can use disks as out-of-core memory to process huge data
sets.  Thousands of MapReduce programs have since been written by
Google researchers and are part of the daily compute tasks run by the
company.

While I had heard about MapReduce, I didn't appreciate its power for
scientific computing on a monolithic distributed-memory parallel
machine until I read an SC08 paper by "Tu, et al"_#Tu of D. E. Shaw
Research.  They showed how to think about tasks such as the
post-processing of simulation output as MapReduce operations.  In this
context MapReduce can be useful for computations that would normally
be thought of as serial, such as reading in a large data set and
scanning it for events of a desired kind.  As before, the computation
can be formulated as a highly parallel "map" followed by a "reduce".
The encapsulated parallel operation in the middle requires all-to-all
communication to reorganize the data, a familiar MPI operation.

Tu's implementation of MapReduce was in parallel Python with
communication between processors via MPI, again allowing disks to be
used for out-of-core operations.

This MapReduce-MPI (MR-MPI) library is a very simple and lightweight
implementation of the basic MapReduce functionality, borrowing ideas
from both the "Dean and Sanjay"_#Dean and "Tu, et al"_#Tu papers.  It
has the following features:

C++ library using MPI for inter-processor communication.  This allows
precise control over the memory allocated during a large-scale
MapReduce. :ulb,l

C++ and C and Python interfaces provided.  A C++ interface means that
one or more MapReduce objects can be instantiated and invoked by the
user's program.  A C interface means that the library can also be
called from C or other high-level languages such as Fortran.  A Python
interface means the library can be called from a Python script,
allowing you to write serial map() and reduce() functions in Python.
If your machine can run Python in parallel, you can also run a
parallel MapReduce in that manner. :l

Small, portable.  The entire library is a few thousand lines of C++
code in a handful of C++ files which can be built on any machine with
a C++ compiler.  For parallel operation, you link with MPI, a standard
message passing library available on all distributed memory machines.
For serial operation, a provided dummy MPI library can be substituted.
The Python wrapper can be installed on any machine with a
version of Python that includes the ctypes module, typically Python
2.5 or later. :l

In-core or out-of-core operation.  Each MapReduce object created
allocates per-processor "pages" of memory, where the page size is
determined by the user.  Typical MapReduce operations can be performed
using just a few such pages.  If your data set (key/value pairs) fits
in a single page, then the library performs its operations in-core.
If your data set exceeds the page size, then processors write to
temporary disk files as needed and subsequently read from them.  This
allows processing of data sets that are larger than will fit in the
aggregate memory of all the processors.  A short sketch of this
setting appears after this list. :l,ule
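
As an illustration of the in-core versus out-of-core behavior, here is
a minimal sketch that sets the page size via the memsize setting (in
Mbytes, default 64) before any operations allocate pages.  The 1024
value is arbitrary, and mymap() is the placeholder callback from the
sketch above.

MapReduce *mr = new MapReduce(MPI_COMM_WORLD);
mr->memsize = 1024;         // per-processor page size in Mbytes (default 64)
mr->map(100,&mymap,NULL);   // runs in-core if the key/value pairs fit in
                            // one page, else spills to temporary disk files
delete mr; :pre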

This library also has the following limitation:

No fault tolerance.  Current MPI implementations do not enable easy
detection of a dead processor.  So like most MPI programs, a MapReduce
operation will hang or crash if a processor goes away. :ulb,l,ule

Finally, I call attention to "recent work"_#Gray by Alexander Gray and
colleagues at Georgia Tech.  They show that various kinds of
scientific computations such as N-body forces via multipole
expansions, k-means clustering, and machine learning algorithms, can
be formulated as MapReduce operations.  Thus there is an expanding set
of data-intensive or compute-intensive problems that may be amenable to
solution using a MapReduce library such as this.

:line

:link(Dean)
[(Dean)] J. Dean and S. Ghemawat, "MapReduce: Simplified Data
Processing on Large Clusters", OSDI'04 conference (2004); J. Dean and
S. Ghemawat, "MapReduce: Simplified Data Processing on Large
Clusters", Communications of the ACM, 51, p 107-113 (2008).

:link(Tu)
[(Tu)] T. Tu, C. A. Rendleman, D. W. Borhani, R. O. Dror,
J. Gullingsrud, M. O. Jensen, J. L. Klepeis, P. Maragakis, P. Miller,
K. A. Stafford, D. E. Shaw, "A Scalable Parallel Framework for
Analyzing Terascale Molecular Dynamics Trajectories", SC08 proceedings
(2008).

:link(Gray)
[(Gray)] A. Gray, Georgia Tech, http://www.cc.gatech.edu/~agray
