E-values for pair-wise local sequence alignment
===============================================

If we compare two completely random sequences, we may find weak
alignments that exist just by chance.  The expected number of such
chance alignments is called the "E-value".  The E-value depends on:
the alignment score threshold, the sequence lengths, the scoring
scheme (score matrix and gap costs), and the letter abundances.

E-values provide guidance for choosing alignment score thresholds: the
score threshold should be high enough that alignments are unlikely to
exist just by chance.

The following tables show E-values expressed as "alignments per square
gigabase".  In other words, if we compared two completely random
sequences of length 1 billion each, this is how many alignments we
would expect to find.  For example, if we compare DNA sequences with
60% A+T, using the HoxD70 matrix with a gap cost of 400 + 30 * (gap
length), we expect about 14 alignments with score >= 4000 per square
gigabase.
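
Because the expected number of chance alignments is proportional to
the product of the two sequence lengths, a table value can be rescaled
to other lengths.  The following is a minimal Python sketch: the
function name and the example lengths are illustrative assumptions,
and the linear scaling ignores edge effects in short sequences.

```python
SQUARE_GIGABASE = 1e18  # 10^9 letters times 10^9 letters


def rescale_evalue(per_square_gigabase, length1, length2):
    """Rescale a table value to two sequences of the given lengths.

    Assumes the expected count is proportional to length1 * length2.
    """
    return per_square_gigabase * (length1 * length2) / SQUARE_GIGABASE


# Hypothetical example: a table value of 14 alignments per square
# gigabase, applied to a 3-gigabase sequence versus a 1-megabase one.
expected = rescale_evalue(14, 3_000_000_000, 1_000_000)
print(f"{expected:.2g}")
```

With these hypothetical lengths, the search space is 0.003 square
gigabases, so the expected count is about 0.042.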

The E-values were calculated with this formula, where 10^18 is the
product of the two sequence lengths (one gigabase each):

E-value = 10^18 * K * exp(-lambda * score)

The values of K and lambda are also shown: they depend on the scoring
scheme and letter abundances.  They were calculated using ALP
(ftp://ftp.ncbi.nih.gov/pub/spouge/BLAST_Gumbel/).
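
As a check on the formula, here is a minimal Python sketch that
recomputes the example given earlier (HoxD70, gap cost of 400 + 30 *
(gap length), 60% A+T, score 4000).  The function name is an
illustrative choice; K and lambda are taken from the corresponding
table row.

```python
import math

SQUARE_GIGABASE = 1e18  # product of two sequence lengths of 10^9


def evalue_per_square_gigabase(score, lam, k):
    """Expected number of chance alignments with at least this score,
    per square gigabase: 10^18 * K * exp(-lambda * score)."""
    return SQUARE_GIGABASE * k * math.exp(-lam * score)


# lambda and K for HoxD70 with gaps, 60% A+T, as listed in the table.
e = evalue_per_square_gigabase(4000, lam=0.00908, k=0.0806)
print(f"{e:.2g}")
```

The result rounds to 14, matching the table entry.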

For short sequences -- very roughly, shorter than 100 letters -- these
E-values are inaccurate (too high).  This can be fixed using a
so-called finite size correction, which is available in ALP.
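
Ignoring the finite size correction, the formula can be inverted to
find the lowest score threshold that keeps the E-value at or below a
chosen target.  This is a minimal Python sketch; the function name and
the target of 1 alignment per square gigabase are illustrative
assumptions, and K and lambda are the tabulated values for Blosum62
with gap cost 11 + 2 * (gap length).

```python
import math


def min_score_for_evalue(target, lam, k, area=1e18):
    """Smallest integer score with area * K * exp(-lambda * score)
    at or below the target E-value.

    Solves the E-value formula for the score and rounds up.
    """
    return math.ceil(math.log(area * k / target) / lam)


# Hypothetical target: at most 1 chance alignment per square gigabase.
threshold = min_score_for_evalue(1.0, lam=0.299, k=0.0883)
print(threshold)
```

This gives a threshold of 131, which is consistent with the protein
table: the E-value is about 1.2 at score 130 and falls below 1 just
above it.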


Scoring scheme for DNA:
match score = 1, mismatch cost = 1, gap cost = 2 + (gap length)

===  =======  =======  ======  ======  ======  ======  ======  ======  ======
AT%  Lambda   K        30      35      40      45      50      55      60
===  =======  =======  ======  ======  ======  ======  ======  ======  ======
50   0.989    0.167    22000   150     1.1     0.0078  5.6e-5  4.0e-7  2.8e-9
60   0.916    0.138    1.6e+5  1600    17      0.17    0.0018  1.8e-5  1.9e-7
70   0.677    0.0479   7.2e+7  2.5e+6  83000   2800    95      3.2     0.11
===  =======  =======  ======  ======  ======  ======  ======  ======  ======


Scoring scheme for DNA:
match score = 1, mismatch cost = 1, gap cost = infinity

===  =======  =======  ======  ======  ======  ======  ======  ======  ======
AT%  Lambda   K        20      25      30      35      40      45      50
===  =======  =======  ======  ======  ======  ======  ======  ======  ======
50   1.10     0.111    3.1e+7  1.3e+5  520     2.1     0.0086  3.5e-5  1.4e-7
60   1.05     0.109    8.3e+7  4.3e+5  2300    12      0.063   3.3e-4  1.7e-6
70   0.895    0.101    1.7e+9  1.9e+7  2.2e+5  2500    29      0.33    0.0037
===  =======  =======  ======  ======  ======  ======  ======  ======  ======
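
The tabulated values come from ALP, but for gapless alignment lambda
can also be cross-checked by hand: in standard Karlin-Altschul theory
it is the positive solution of sum(p_i * p_j * exp(lambda * s_ij)) = 1.
For match score 1 and mismatch cost 1 this reduces to a closed form.
The Python sketch below assumes that A and T are equally abundant, and
likewise C and G (the tables state only the combined A+T percentage);
the function name is illustrative.

```python
import math


def ungapped_lambda(at_fraction):
    """Lambda for match score 1, mismatch cost 1, no gaps."""
    # Assumption: A = T and C = G in abundance.
    a = t = at_fraction / 2
    c = g = (1 - at_fraction) / 2
    # Probability that two random letters match.
    p_match = a * a + c * c + g * g + t * t
    # p_match * exp(lam) + (1 - p_match) * exp(-lam) = 1 has the
    # positive root exp(lam) = (1 - p_match) / p_match.
    return math.log((1 - p_match) / p_match)


for at in (0.5, 0.6, 0.7):
    print(f"{at:.0%}  {ungapped_lambda(at):.3f}")
```

Under this assumption the results are close to 1.10, 1.05, and 0.895,
in agreement with the Lambda column above.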


Scoring scheme for DNA:
match/mismatch scores = HoxD70, gap cost = 400 + 30 * (gap length)

===  =======  =======  ======  ======  ======  ======  ======  ======  ======
AT%  Lambda   K        3000    3500    4000    4500    5000    5500    6000
===  =======  =======  ======  ======  ======  ======  ======  ======  ======
50   0.00936  0.0895   57000   530     4.9     0.046   4.2e-4  3.9e-6  3.6e-8
60   0.00908  0.0806   1.2e+5  1300    14      0.14    0.0015  1.7e-5  1.8e-7
70   0.00714  0.0312   1.6e+7  4.4e+5  12000   350     9.8     0.28    0.0077
===  =======  =======  ======  ======  ======  ======  ======  ======  ======


Scoring scheme for DNA:
match/mismatch scores = HoxD70, gap cost = infinity

===  =======  =======  ======  ======  ======  ======  ======  ======  ======
AT%  Lambda   K        2000    2500    3000    3500    4000    4500    5000
===  =======  =======  ======  ======  ======  ======  ======  ======  ======
50   0.0104   0.184    1.7e+8  9.4e+5  5200    29      0.16    8.7e-4  4.8e-6
60   0.0102   0.181    2.5e+8  1.5e+6  9300    57      0.35    0.0021  1.3e-5
70   0.00911  0.164    2.0e+9  2.1e+7  2.2e+5  2300    24      0.26    0.0027
===  =======  =======  ======  ======  ======  ======  ======  ======  ======


Scoring scheme for proteins:
match/mismatch scores = Blosum62, gap cost = 11 + 2 * (gap length)
amino acid abundances = standard

=======  =======  ======  ======  ======  ======  ======  ======  ======
Lambda   K        80      90      100     110     120     130     140
=======  =======  ======  ======  ======  ======  ======  ======  ======
0.299    0.0883   3.6e+6  1.8e+5  9100    460     23      1.2     0.058
=======  =======  ======  ======  ======  ======  ======  ======  ======


Scoring scheme for proteins:
match/mismatch scores = Blosum62, gap cost = infinity
amino acid abundances = standard

=======  =======  ======  ======  ======  ======  ======  ======  ======
Lambda   K        60      70      80      90      100     110     120
=======  =======  ======  ======  ======  ======  ======  ======  ======
0.318    0.0973   5.0e+8  2.1e+7  8.7e+5  36000   1500    63      2.6
=======  =======  ======  ======  ======  ======  ======  ======  ======
