$Id: TODO,v 1.186 2004/04/07 20:43:17 flacoste Exp $

 ______________________________
'                              `
| this file is largely obsolete |
`-------------------------------'

* proxy

- update lr_config, user-manual, website, address.cf, online responder etc.
  for new service and superservice.  integrate squid2dlf, part of welf,
  socks2dlf.  write a ms_proxy2dlf.  write more reports.

* database

- add PostgreSQL plugin

- add more reports

- add "query-type" extension. Possible types are "create table", "insert",
  "select", "show", etc.

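  A rough sketch of how a "query-type" value could be derived from a logged
  statement.  This is an illustration in Python, not Lire code: the function
  name query_type and the fallback value "other" are assumptions, and only
  the four types named above are recognised.

```python
import re

# The four types named in the TODO item; anything else maps to "other"
# (the "other" bucket is an assumption, not part of the original note).
QUERY_TYPES = ("create table", "insert", "select", "show")


def query_type(statement):
    """Classify a SQL statement by its leading keyword(s)."""
    # Collapse runs of whitespace and lowercase, so "CREATE   TABLE" matches.
    normalized = " ".join(statement.split()).lower()
    for qtype in QUERY_TYPES:
        # \b keeps "selection ..." from being classified as "select".
        if re.match(re.escape(qtype) + r"\b", normalized):
            return qtype
    return "other"
```
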
* socks

- finish or purge socks stuff.

* ftp

- look at Pure FTPD: log formats: CLF!!! (Apache!), Stats format

  see: http://pureftpd.sourceforge.net/README       Pure FTPD
       http://www.shagged.org/ftpstats/             Stats format

* www

- Report on specific webpages: regexp in config file, matched against 
  interesting dlf fields: get the trackfilter stuff more generic, get
  the nonpics stuff merged in this new scheme.

- Lire::WWW::Filename::Attack

  Add more attacks. See e.g. http://www.securityfocus.com/.
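
  For the "report on specific webpages" item, a minimal sketch of named
  regexps matched against one dlf field.  Everything here is hypothetical:
  the field name requested_page, the dict-shaped record and both function
  names are made up for illustration, and the real config file format is
  still to be decided.

```python
import re


def compile_page_filters(patterns):
    """Compile {name: regexp} pairs, e.g. as read from a config file."""
    return {name: re.compile(rx) for name, rx in patterns.items()}


def matching_filters(record, filters, field="requested_page"):
    """Return the names of all filters whose regexp matches the field."""
    value = record.get(field, "")
    return [name for name, rx in filters.items() if rx.search(value)]


filters = compile_page_filters({
    "docs": r"^/docs/",
    "pics": r"\.(png|gif|jpe?g)$",
})
hits = matching_filters({"requested_page": "/docs/logo.png"}, filters)
```

  With a scheme like this, trackfilter and nonpics would just be two named
  patterns in the same table.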

* email

- Get email convertors to use error messages and status messages in the log.
  Redefine the dlf format for this. (Requested by schr)

- In email convertors: use timestamps in a smarter way: do not use same 
  timestamp on all dlf lines about same message.


* responder

- Make it easy to publish html reports on a website automatically, and have
  the responder use this.  E.g.: have the responder respond with a URL like
  http://logreport.net/reports/0914616612616663161/, where an html report
  with graphics is published for, say, a week.

- Add an HTTP file upload interface to the responder.  (Josh is working on
  this.)


* all

- Sanitize debug output, and use it. (Requested by, a.o., schr)

- Clean up stuff which goes to stderr. document a policy on this, so that we
  can build a lr2dlf thingie.

- Decide on a filesystem layout for a .dlf archive.  This should be used to
  combine old reports / logs to reports over longer periods.  It should enable
  dealing sanely with logs which don't span a 00:00 - 23:59 period.  A possible 
  implementation:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The archive should store files in .xml and .dlf format.  It should reside
somewhere under /var/lib/lire/data.  (The current Lire .deb creates
/var/lib/lire .)

<for table: see the Lire User Manual>

Per kept file, we wanna be able to find out:

   - filename
   - service
   - superservice
   - timerange
   - subject/hostname/fromaddress (maybe even complete mailheaders
       of email message which contained the log file)
   - some external id (e.g. hostname, to be able to merge different reports
       which report on the same thing)
   - format (xml, log, report, or maybe even something else)

We use an 'LR_ID' to identify a job for the Lire system, i.e. a received
email message or local log file.

We use a 'REPORT_ID' to identify a report.  One log file could get split in
parts about e.g. different days.  For each day, a separate report could get
generated.  Other ways to split are possible (e.g. for log files which carry
lines about different hosts or even services.)
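
The per-day split mentioned above could look roughly like this.  A minimal
sketch, assuming each log line has already been paired with an epoch
timestamp; split_by_day and the (epoch, line) record shape are made up for
illustration.  Days are cut in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def split_by_day(records):
    """Group (epoch_seconds, line) pairs by UTC day, keyed as yyyymmdd."""
    by_day = defaultdict(list)
    for epoch, line in records:
        day = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y%m%d")
        by_day[day].append(line)
    return dict(by_day)
```

Each resulting group could then get its own REPORT_ID, while all of them
share the LR_ID of the job they came from.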

Perhaps it's wise to include an LR_ID in the generated report.

We could store meta information in an index file (e.g.
/var/lib/lire/data/meta/index), which could look like:

 LR_ID-9871614364-1456 subject gelfand test
 LR_ID-9871614364-1456 service email
 LR_ID-9871614364-1456 time_span 2001050427
 LR_ID-9871614364-1456 type rcvdemail

 LR_ID-987161443426-234 time_span 20010527-20010528
 LR_ID-98716144999-234 time_span 200105270104-200105282359
 LR_ID-98716144988-261 time_span 200105
 LR_ID-98716144988-261 type report

 LR_ID-98716144988-261 dlflines 45

 $LR_ID time_rfc "$RFC_TIME"
 $LR_ID time_begin "$TIME_BEGIN"
 $LR_ID time_end "$TIME_END"
 $LR_ID time_span "$LR_TIME"

 LR_ID-98716144988-261 extid gelfand_20010513

That is: idtag space key space value-with-possibly-embedded-spaces .
type can be: rcvdemail, sntemail, report, log, dlf
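
A minimal sketch of reading such an index, assuming one record per line and
exactly the "idtag space key space value" shape described above.  The name
parse_index is hypothetical, and skipping malformed lines and letting a
repeated key overwrite the earlier value are assumptions, not decisions.

```python
def parse_index(text):
    """Parse 'idtag key value' lines into {idtag: {key: value}}."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two spaces only: the value may contain spaces.
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # assumption: silently skip malformed lines
        idtag, key, value = parts
        # Assumption: a repeated key overwrites the earlier value.
        index.setdefault(idtag, {})[key] = value
    return index


sample = """\
LR_ID-9871614364-1456 subject gelfand test
LR_ID-9871614364-1456 service email
LR_ID-98716144988-261 type report
"""
index = parse_index(sample)
```
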

Perhaps we should think of some relational database model, and implement it
accordingly.

time ranges should be UTC, in "almost human readable format":
yyyymm[dd[hh[mm[ss]]]][-yyyymm[dd[hh[mm[ss]]]]]
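
The format above maps directly onto a regular expression.  A sketch,
assuming only the digit structure is checked (month 13 or hour 25 would
still pass); parse_time_span is a hypothetical name.

```python
import re

# yyyymm[dd[hh[mm[ss]]]]: six digits, then up to four optional 2-digit parts.
_STAMP = r"\d{6}(?:\d{2}(?:\d{2}(?:\d{2}(?:\d{2})?)?)?)?"
_SPAN_RE = re.compile(rf"({_STAMP})(?:-({_STAMP}))?")


def parse_time_span(span):
    """Return (begin, end) as digit strings; end is None for a single stamp."""
    match = _SPAN_RE.fullmatch(span)
    if match is None:
        raise ValueError(f"not a valid time span: {span!r}")
    return match.group(1), match.group(2)
```
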

The directory layout could be:

                                  service.subservice (sub)reporttype
 /var/lib/lire/data/report/xml/email/postfix/complete/extid/20010527-20010528
 /var/lib/lire/data/report/html/
 /var/lib/lire/data/report/ascii/
 /var/lib/lire/data/email/raw/
 /var/lib/lire/data/email/plain/
 /var/lib/lire/data/log/dlf/www[/apachecommon?]/viewtype/extid/200105
                                                ^^^^^^^^

We should get rid of "subservices" like apache's common.
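
A sketch of building a path in this layout, assuming the subservice level is
dropped as suggested.  The function name archive_path and its parameter
names are made up for illustration; the component order follows the
log/dlf example above.

```python
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath("/var/lib/lire/data")


def archive_path(kind, fmt, superservice, view, extid, time_span,
                 root=ARCHIVE_ROOT):
    """Join the layout components into one archive path.

    kind is e.g. "log" or "report", fmt e.g. "dlf" or "xml".
    """
    return root / kind / fmt / superservice / view / extid / time_span
```
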

Where should different 'views' go?  And filtered logs?  E.g., currently we
have 'filter' and 'filter_messages' for email.  They are filters from dlf to
dlf.

Currently ( Fri Jun 22 00:17:54 CEST 2001 ) these fields are used by the
various scripts:

field           set by                          read by

extid       lr_processmail(if ARCH),
              lr_log2mail(if ARCH)

time_span   lr_dlf2xml(if ARCH)       lr_processmail (if ARCHIVE set, to
                                        construct name stored file),
                                      lr_log2mail (if ARCHIVE set),
                                      lr_log2report (ARCH) ,lr_log2xml (ARCH)

time_rfc    lr_dlf2xml (if ARCH)
time_begin  lr_dlf2xml (if ARCH)
time_end    lr_dlf2xml (if ARCH)

loglines    lr_log2report (if ARCH)
dlflines    lr_log2xml                       lr_log2report (and purged if
                                               ARCH unset)


After the system has run for a while, /var/lib/lire/data could be holding
files like these:

data/email/raw/email/exim/exim_anon_from_hibou/20001202121106-20011130081041
data/log/dlf/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
data/log/dlf/email/complete/localhost/20001202121106-20011130081041
data/log/dlf/email/filter/exim_anon_from_hibou/20001202121106-20011130081041
data/log/dlf/email/filter/localhost/20001202121106-20011130081041
data/log/dlf/email/filter_messages/exim_anon_from_hibou/20001202121106-20011130081041
data/log/dlf/email/filter_messages/localhost/20001202121106-20011130081041
data/log/dlf/www/complete/localhost/20010626053604-20010626142307
data/log/dlf/www/filter_pics/localhost/20010626053604-20010626142307
data/log/dlf/www/filter_trackpage/localhost/20010626053604-20010626142307
data/log/raw/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
data/log/raw/www/complete/localhost/20010626053604-20010626142307
data/meta/index
data/report/ascii/email/exim/complete/exim_anon_from_hibou/20001202121106-20011130081041
data/report/xml/email/complete/exim_anon_from_hibou/20001202121106-20011130081041
data/report/xml/www/complete/localhost/20010626053604-20010626142307

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

- Extract the name of the loghost from the submitted log file, use this in the
  generated report.

- Maybe it's useful to store complete header of incoming mail in a control
  file.


* Documentation

- include doc/images/reportgen.png and dlf2email.png in the developers (or
  users) manual.

- use pod2html or dwww to generate manpages in html format.

- Deal with obsolete docs: Merge ./doc/blurb with current README file.

- Add pointer to cvs-hibou ... copyright.xml to developers reference.

* Other projects

- Take a look at others:

  - an exhaustive list is being maintained by Tina Bird, from securityfocus's
      LogAnalysis mailinglist

  - syslog-summary, a python script by Lars Wirzenius <liw@iki.fi>:
      This program summarizes the contents of a log file written by syslog,
      by displaying each unique (except for the time) line once, and also
      the number of times such a line occurs in the input. The lines are
      displayed in the order they occur in the input.

      License is GPL

      http://packages.debian.org/stable/admin/syslog-summary.html

  - logtools, a bunch of tools, written in C++, for managing log files, by
    Russell Coker <russell@coker.com.au>.  GPL-ed.
    fmerge - merge common-log-format web logs in order without sorting (good for
    when you have a gig of logs).
    logprn - like "tail -f", but after a specified period of inactivity it will
      run a program (such as lpr) and pipe the new data to it.
    funnel - pipe one stream of data to several files or processes.
    clfsplit - split CLF format web logs by client IP address.
    clfdomainsplit - split CLF logs by server domain.

    http://www.coker.com.au/logtools/

  - xlogmaster, a monitoring program, soon to be replaced by GNU AWACS, by the
     GNU project.

    The Xlogmaster is a program that lets you monitor everything that's going
    on on your system in a very quick and comfortable way. It allows reading
    log files, devices or running status-gathering programs, translating all
    data (if wished) and displaying it with filters for highlighting /
    lowlighting / hiding lines or taking actions upon user-defined events. 

    http://www.gnu.org/software/xlogmaster/

  - Log Tool, a syslog parser and convertor, by A.L.Lambert <max@xjack.org>,
     sponsored by ManISec Inc.

    Logtool is a command line program that will parse syslog (and
    syslog-like) log files into a more palatable format. It will take anything
    resembling a standard syslog file (this includes syslog-ng, and probably
    most of the other variants out there), and crunch it into one of the
    following formats for your viewing pleasure:
    * ANSI (colorized for easy "at a glance" viewing)
    * ASCII (for e-mail'ed reports, and term's that don't support color)
    * CSV (for importing into your favorite spreadsheet/database)
    * HTML (for generating web pages)
    * RAW (for no good reason)

    http://www.xjack.org/logtool/


  - Analog, shows the usage patterns on your web server
    http://www.analog.cx/

  - acidlab -- Analysis Console for Intrusion Databases

    The Analysis Console for Intrusion Databases (ACID) is a PHP-based
    analysis engine to search and process a database of incidents generated
    by security-related software such as IDSes and firewalls (e.g. Snort,
    ipchains).

    http://acidlab.sourceforge.net/

  - pflogsumm , a Postfix log file analyser, written in perl, GPL-ed

    pflogsumm.pl is designed to provide an over-view of postfix activity, with
    just enough detail to give the administrator a "heads up" for potential
    trouble spots

    http://jimsun.linxnet.com/postfix_contrib.html

  - fwanalog , a shell script to analyze firewall log files, by Balázs Bárány,
    GPL-ed

    fwanalog is a shell script that parses 
    and summarizes firewall log files. It currently (version 0.1) understands 
    logs from ipf (tested with OpenBSD 2.8's ipf) and Linux 2.4 iptables

    http://tud.at/programm/fwanalog/

  - The Webalizer , a web server log file analysis program, by
    Bradford L. Barrett <brad@mrunix.net>

    The Webalizer is a fast, free web server log file analysis program. It
    produces highly detailed, easily configurable usage reports in HTML
    format, for viewing with a standard web browser.

    http://www.mrunix.net/webalizer/


  - Bug#98702: wnpp: ITP: libunix-syslog-perl

  - Take a look at the CPAN perl module Log::LogLite.

  - logcheck, http://www.psionic.com/abacus/logcheck

  - Take a look at what nedstat does.  E.g. on 
    http://v1.nedstatbasic.net/stats?AAsIQwGx7TwVU48pu4qo/jS/exEw

  - NISCA, an MRTG replacement

    NISCA is a replacement for MRTG. It stands for "Network Interface
    Statistics Collection Agent".
    It gives traffic statistics on the network interfaces on routers and
    switches and whatnot. It doesn't require SNMP to do its job.

    http://www.isthisthingon.org/nisca

  - swatch -- log file viewer with regexp matching
    http://www.oit.ucsb.edu/~eta/swatch

  - fwlogwatch -- Firewall log analyzer: ipchains, netfilter/iptables,
    ipfilter, Cisco IOS and Cisco PIX log summary reports in text and HTML
    form, http://cert.uni-stuttgart.de/projects/fwlogwatch/

  - News reporting, see vanbaal@gelfand email message 
    200103240017.f2O0H1N03210@nerys.ehv.lx

  - webreport : http://www.inter7.com/webreport/
    web report is a web log statistics reporting program
    especially designed for virtual hosting sites. It is also very useful
    for single hosting sites. The main difference between web report and
    other statistics programs is a configuration file which allows for
    easy manipulation of the features.

  - log2mail : log2mail is a small daemon watching log files and sending
    mail to a specified address if a regular expression is matched.

  - modlogan : they're doing very similar stuff as we are.

  - LogTrend : http://www.logtrend.org/english/

* Various

- secondary.com ns records.

- debian package: Make sure service/all/etc/lr_spoold caller gets installed 
  in /etc/init.d/ when doing a make, in case we're on a GNU/Linux platform.

- get lr_rawmail2mail to deal with more than one log file. see mail wytze to 
  development: we should parse subject on the client side.

- get scripts in bin/ to behave sanely when run as script --help
  and script --version.

- Add new services:
  [23-Mar:13:52 jama] joostvb: how high are auth.log and kern.log on your todo
         list?
  [23-Mar:13:54 jama] these can be useful in case of irregularities.
  [23-Mar:13:55 joostvb] not that high, yet
  [23-Mar:13:55 jama] and maybe scanlog reporting.
  [23-Mar:13:55 jama] ok.
  [23-Mar:13:56 joostvb] now they're on it, thanks :)

I.e., auth.log and kern.log could be useful in case "irregularities" occur.

- Document list of supported services / superservices in _one_ place.  Refer
  to this place in manpages.

- Think about using mktemp or tempfile.  (Currently we use our own tmpdir.)
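
  To illustrate the mktemp/tempfile idea: a minimal sketch of the same
  pattern in Python, not a drop-in for Lire's own scripts.  The function
  name process_in_tmpdir, the "lr_" prefix and the file name work.dlf are
  assumptions.  The point is that the library picks a unique, private
  directory and removes it afterwards, so no home-grown tmpdir handling is
  needed.

```python
import os
import tempfile


def process_in_tmpdir(data):
    """Write data to a scratch file in a private temp dir, then clean up.

    Returns (bytes_written, tmpdir_still_exists) so the cleanup is visible.
    """
    with tempfile.TemporaryDirectory(prefix="lr_") as tmpdir:
        path = os.path.join(tmpdir, "work.dlf")
        with open(path, "w") as handle:
            handle.write(data)
        size = os.path.getsize(path)
    # The directory and its contents are removed when the with-block exits.
    return size, os.path.exists(tmpdir)
```
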


* Configuration

- Decide whether we need AC_CHECK_PROG(HASPDFXMLTEX, pdfxmltex, yes, no),
  AC_CHECK_PROG(HASJAVA, java, yes, no) and DBKXSLXHTML in configure.in.
  Currently (Sat Jun 23 11:08:06 CEST 2001) these are not being used.


* Images

- sanitize layout of graphics: Y-range is too large, X's too wide (rotate 90
  degrees?), or label?


* Packaging

- FreeBSD port?  Openpackages package for *BSD's ( http://openpackages.org/ ) ?


* Other

 cvs-sourceforge/logreport/service/doc/BUGS for stuff
  about Lire
 cvs-sourceforge/logreport/docs/website/new_content.txt for stuff
  about the website (logreport/docs/devel/notes.txt should get
  merged with this, actually)


