
                         Linux-HA Phase I Requirements

   This document describes a set of general Linux-HA requirements that have
   been presented to the mailing list, with no objections raised.  Of course,
   it would have been nice to have had a good discussion, but I take silence
   to mean assent :-)

Linux-HA Phase I General Goals

     * Simple
     * Reliable
     * Easy to configure
     * Easy to test
     * Easy to monitor
     * Redundant hardware and software are verified to be in working condition

Linux-HA Phase I Requirements

   The short-term goal of Phase I is to provide a more realistic demonstration
   of Linux-HA, in a form that a certain set of customers (users) can actually
   use in production.

   This demonstration focuses on providing high-availability web service.
   The rationale for choosing web service is simple:  it is well understood,
   and Linux has a significant presence in the web server market.  This will
   attract more initial users and testers than most other applications would.

   The following minimal requirements on the web service are considered
   sufficient for this demonstration:
     * An active/standby methodology is acceptable.  Load sharing need not be
       explicitly supported.
     * Data on standby nodes must be continually replicated from their paired
       active nodes over dedicated LANs.  I am referring specifically to
       application data, not cluster configuration data.
       Comment: It is expected that we will use "poor man's replication"
       between the active and standby nodes.
     * IP address takeover between active and standby hosts, with the ability
       to start and stop applications as IP addresses move around the cluster.
     * Basic cluster monitoring capabilities via a /proc-like interface.
     * Simple configuration and installation documentation.
     * Basic support for either resource groups or dependencies.
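   To illustrate what dependency support implies for starting and stopping
   applications, here is a minimal Python sketch (Python 3.9 or later, for
   graphlib).  It assumes each resource simply lists the resources that must
   be running before it can start; the resource names and the start_order /
   stop_order helpers are hypothetical and not part of any Linux-HA
   interface.  On takeover, resources start in dependency order; on release,
   they stop in the reverse order, so the web server is never running
   without its IP address.

```python
from graphlib import TopologicalSorter

# Hypothetical resource group for the HA web service: each resource maps to
# the set of resources that must already be running before it can start.
WEB_GROUP = {
    "replicated_data": set(),
    "ip_address": set(),
    "httpd": {"ip_address", "replicated_data"},
}


def start_order(deps):
    """Return an order in which every resource starts after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stop in reverse start order, so dependents stop before what they need."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    print("takeover:", start_order(WEB_GROUP))
    print("release: ", stop_order(WEB_GROUP))
```

   A resource group is the special case where the resources form a simple
   chain; general dependencies let independent resources (here the data and
   the address) come up in either order.  static_order() raises
   graphlib.CycleError if the dependencies are circular.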

Restrictions Allowed for Demonstration

   The following restrictions are considered acceptable for the demonstration:
     * It is not necessary to provide load sharing between members of the
       cluster (an active/standby methodology is acceptable)
     * A single active/standby pair is sufficient at the beginning
     * No application-level notification of cluster transitions need be
       provided (though see the start/stop requirement above)
     * No hardware diagnostics need be provided

Post-Demonstration Requirement Candidates

   After these demonstration requirements have been met, it is expected that
   the following capabilities will be added (not listed in priority order):
     * Integration of hardware and software diagnostics into the architecture
     * Support for in-node IP interface failover (failing over between NICs
       within a single host)
     * Application notification of cluster transitions (support for arbitrary
       application failover)
     * A plug-in module interface for cluster management, allowing support
       for active/standby, n+1, load sharing, etc.
     * Cluster management that uses diagnostic information in its failover
       strategy
     * An arbitrary number of nodes in the cluster
     * Multiple pairs of active/standby servers in the cluster
     * Easily configured support for common servers such as ftp, smtp, pop3,
       imap, DNS, and others (?)
       This is intended to be something more sophisticated than changing run
       levels, since changing run levels only supports the active/standby
       model.  Note that these kinds of services may be started and stopped
       with /etc/rc.d/init.d scripts, but will likely not be tied to run
       levels.
     * Load sharing between the active/replicator servers via NFS (?)
     * Support for other replication configurations, for example:
          * Shared SCSI
          * GFS
          * User-defined replication methods
     * Sophisticated, cool GUI monitoring capabilities
     * Cool GUI configuration tools
     * Other cool and feasible things, as people are moved to do them :-)

   I have a bias against making the customer write shell scripts to move
   resources around for "normal" cases.  This is in harmony with the goals of
   "easy to configure" and cool GUI configuration tools.
