
Here's a tutorial on writing useful configs:

First, every log_analysis config should contain a "config_version"
directive.  So, start your config (say, log_analysis.conf) like so:

config_version 0.44

Then run log_analysis on your syslog file and check the output, e.g.:

log_analysis /var/log/syslog.20020719

If you use a non-standard basename (for example,
/var/log/routers.20020719), log_analysis won't know what type of log
it is, and will complain.  You can force the issue by adding this to
your config:

add arr syslog_filenames=
routers

And then run:

log_analysis -f log_analysis.conf /var/log/routers.20020719

So far, so good.  Now, suppose log_analysis complains that there are a
bunch of unknowns, such as:

Unknowns for type syslog:
1          kernel: firewall log: inp DENY eth0 PROTO=17 10.128.104.1:4745 255.255.255.255:137 (#22)
1          kernel: firewall log: inp DENY eth0 PROTO=17 10.128.104.1:67 255.255.255.255:68 (#23)
1          kernel: firewall log: inp DENY eth0 PROTO=17 10.128.104.1:67 255.255.255.255:68 (#24)
1          kernel: firewall log: inp DENY eth0 PROTO=17 10.128.104.1:67 255.255.255.255:68 (#25)

Suppose you want those four messages to go under a common heading,
like maybe "firewall denied".  You also want them rewritten so that
the three identical port 68 packets are counted together.  For
example, suppose this output is your goal:

firewall denied:
3          from 10.128.104.1 to 255.255.255.255 port 68
1          from 10.128.104.1 to 255.255.255.255 port 137


Start off by adding a line to your config that says:

logtype: syslog

Copy the raw message part of one of the lines above (everything after
the count) into your config, prefixed with the keyword "pattern:",
like so:

pattern: kernel: firewall log: inp DENY eth0 PROTO=17 10.128.104.1:67 255.255.255.255:68 (#24)

Next, edit that line, and escape (i.e. put a backslash, "\", in front
of) any punctuation or other special characters in the data.  [1]
For example:

pattern: kernel\: firewall log\: inp DENY eth0 PROTO\=17 10\.128\.104\.1\:67 255\.255\.255\.255\:68 \(\#24\)
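
To see why the escaping matters, here is a rough sketch in Python.
It assumes Python's "re" module treats these particular characters the
same way Perl regexes do (log_analysis itself uses Perl, so this is
only an illustration, not how the tool works internally):

```python
import re

line = ("kernel: firewall log: inp DENY eth0 PROTO=17 "
        "10.128.104.1:67 255.255.255.255:68 (#24)")

# The escaped pattern from the tutorial, used directly as a regex.
escaped = (r"kernel\: firewall log\: inp DENY eth0 PROTO\=17 "
           r"10\.128\.104\.1\:67 255\.255\.255\.255\:68 \(\#24\)")

# The same text with no escaping at all.
unescaped = line

escaped_matches = re.fullmatch(escaped, line) is not None

# Unescaped, "(#24)" becomes a group that matches "#24" without the
# parentheses, so the literal "(#24)" in the log line no longer matches.
unescaped_matches = re.fullmatch(unescaped, line) is not None

# An unescaped "." matches any character, so a non-IP string slips through.
dot_is_wildcard = re.fullmatch(r"10.128.104.1", "10x128y104z1") is not None

print(escaped_matches, unescaped_matches, dot_is_wildcard)
```

The extra backslashes in front of ":" and "=" don't hurt; the ones in
front of "(", ")" and "." are the ones that change the result.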

Now, figure out which parts are variable.  For example, "eth0" is an
interface, "17" is an IP protocol, "10.128.104.1" is the source IP,
"67" is the source port, "255.255.255.255" is the destination IP,
"68" is the destination port, and "24" is some sort of counter.
Replace each variable with something like ($pat{ip}), ($pat{int}),
etc.  You can get a list of available "pats" by running
log_analysis -I pat.  Note that the parentheses here should *not* be
escaped.  [2] For example:

pattern: kernel\: firewall log\: inp DENY ($pat{word}) PROTO\=($pat{int}) ($pat{ip})\:($pat{int}) ($pat{ip})\:($pat{int}) \(\#($pat{int})\)

Each variable is saved into a special token: $1, $2, $3, etc.  So,
the interface would be in $1 (i.e. "eth0"), the protocol in $2, source
IP in $3, etc.
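
If the numbering is hard to picture, the following Python sketch may
help.  The PAT table is a hypothetical stand-in for $pat{word},
$pat{int} and $pat{ip}; the real definitions are whatever
"log_analysis -I pat" reports, and may well differ:

```python
import re

# Hypothetical stand-ins for log_analysis's $pat{...} entries.
# The "ip" entry uses a non-capturing group, (?:...), so that it does
# not add extra numbered captures of its own.
PAT = {
    "word": r"\w+",
    "int": r"\d+",
    "ip": r"\d{1,3}(?:\.\d{1,3}){3}",
}

pattern = re.compile(
    r"kernel\: firewall log\: inp DENY ({word}) PROTO\=({int}) "
    r"({ip})\:({int}) ({ip})\:({int}) \(\#({int})\)".format(**PAT)
)

line = ("kernel: firewall log: inp DENY eth0 PROTO=17 "
        "10.128.104.1:67 255.255.255.255:68 (#24)")

m = pattern.fullmatch(line)

# Number the captured groups from 1, the way $1, $2, ... are numbered.
captures = {f"${i}": value for i, value in enumerate(m.groups(), start=1)}

for name, value in captures.items():
    print(name, value)
```

Each pair of unescaped parentheses, counted left to right, gets the
next number, which is why the destination port ends up in $6.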

Next, decide what format you want the data to be rewritten in.  For
example, you might want the per-entry format to be "from SOURCE_IP to
DEST_IP port DEST_PORT", leaving out the source port (which is usually
uninteresting) and the counter (which is useless info).  So, add a
"format" line:

format: from $3 to $5 port $6

Now, you just need the category name to log to.  If you like "firewall
denied", you'd say:

dest: firewall denied

And you're done with this type of log message.  If you have more
unknowns, just keep adding patterns, formats, and dests.
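
As a sanity check on the pattern/format/dest idea, here is a minimal
Python sketch of the rewrite-and-count step.  It is not how
log_analysis is implemented; PAT, rewrite() and summarize() are
hypothetical names made up for this illustration, and the sub-patterns
are guesses at what $pat{word}, $pat{int} and $pat{ip} match:

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins for $pat{word}, $pat{int} and $pat{ip}.
PAT = {"word": r"\w+", "int": r"\d+", "ip": r"\d{1,3}(?:\.\d{1,3}){3}"}

PATTERN = re.compile(
    r"kernel\: firewall log\: inp DENY ({word}) PROTO\=({int}) "
    r"({ip})\:({int}) ({ip})\:({int}) \(\#({int})\)".format(**PAT)
)
FORMAT = "from $3 to $5 port $6"
DEST = "firewall denied"

PREFIX = "kernel: firewall log: inp DENY eth0 PROTO=17 "
LINES = [
    PREFIX + "10.128.104.1:4745 255.255.255.255:137 (#22)",
    PREFIX + "10.128.104.1:67 255.255.255.255:68 (#23)",
    PREFIX + "10.128.104.1:67 255.255.255.255:68 (#24)",
    PREFIX + "10.128.104.1:67 255.255.255.255:68 (#25)",
]


def rewrite(fmt, match):
    """Replace each $N in fmt with the Nth captured group."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), fmt)


def summarize(lines):
    """Count rewritten messages per category; skip non-matching lines."""
    categories = defaultdict(Counter)
    for line in lines:
        match = PATTERN.fullmatch(line)
        if match:
            categories[DEST][rewrite(FORMAT, match)] += 1
    return categories


report = summarize(LINES)
for dest, counts in report.items():
    print(f"{dest}:")
    for entry, count in counts.most_common():
        print(f"{count:<10} {entry}")
```

Because the format leaves out the source port and the counter, the
three port 68 lines rewrite to the same string and are counted
together.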

If you have a *lot* of unknowns, or log_analysis is taking a long time
to run, you might find the "-u unknowns" and "-U" options useful.
Here's how I use them:

log_analysis -f myconfig.conf -u unknowns -U syslogfile
<control-c after a few seconds>
log_analysis -f myconfig.conf -u unknowns syslogfile
<edit myconfig.conf as described above>
log_analysis -f myconfig.conf -u unknowns syslogfile
<edit and repeat until not many unknowns left>
rm -r unknowns
log_analysis -f myconfig.conf -u unknowns -U syslogfile
<control-c after a few more seconds, and repeat>

log_analysis can do quite a bit more, but this should get you started.
If something here isn't clear, feel free to post a few lines of sample
logs and describe what you want done with them.


[1] Strictly speaking, you only need to escape characters that have a
special meaning in Perl regexes, such as "(" or ")", not ":" or "=".
If you want to be lazy and escape less, read the Perl documentation or
experiment.  When in doubt, escape.  From a future compatibility
perspective, escaping is an even better idea.

[2] You can actually use any Perl regex here if you want to; just
remember to put it in parentheses.  [3]

[3] Strictly speaking, you only need to put things in parentheses if
you might want to refer to them later, for example, in a format.  Only
things in parentheses get saved to $1, $2, etc.  But I recommend
putting all variable parts in parentheses.

