last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format.  It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

or::

  last-dotplot -f /Library/Fonts/Arial.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
this as appropriate.

Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message.  You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome.  'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.

==========  =============================
Pattern     Meaning
----------  -----------------------------
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences with
names starting in "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages & data about the plot.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --trim1
      Trim unaligned sequence flanks from the 1st (horizontal) genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd (vertical) genome.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.

Text options
~~~~~~~~~~~~

  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).
  --lengths1
      Show sequence lengths for the 1st (horizontal) genome.
  --lengths2
      Show sequence lengths for the 2nd (vertical) genome.

Annotation options
~~~~~~~~~~~~~~~~~~

These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes.  This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.

  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as stripes, with
      coordinates given by the first three BED fields.  The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format.  The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.

Gene options
~~~~~~~~~~~~

  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.

Colors
~~~~~~

Colors can be specified in `various ways described here
<http://effbot.org/imagingbook/imagecolor.htm>`_.
