Preamble:
=========

This document lists all analysis components, i.e. components that
emit events to the reporting infrastructure under certain conditions.
The components in the following list are annotated with short
codes describing their properties, to speed up the search for a
suitable analysis component.

  Property codes:

  * (A)utoconfiguration: This component may learn from the input
    data and adapt itself to new inputs.
  * (F)iltering: This component just filters input and distributes
    it to other analysis components.
  * (H)ardwired: This component generates events by hard rules.
    This is the opposite of "statistical triggering".
  * (N)ondeterministic: This component may react differently to
    the same input in two runs.
  * (R)eporting: This component will generate analysis reports
    for evaluation by an analyst. Those components can be very
    useful in the configuration phase to understand the processed
    data better.
  * (S)tatistical triggering: This component uses statistical
    methods to trigger on unexpected data. Such components may
    miss relevant events or cause false positives.

List of components:

* EnhancedNewMatchPathValueComboDetector (AH): Same as
  NewMatchPathValueComboDetector but also supports value transformation
  and storage of extra data.

* HistogramAnalysis.HistogramAnalysis (R): Create histogram reports
  for parsed values.

* HistogramAnalysis.PathDependentHistogramAnalysis (R): Create
  path-dependent histogram reports.

* MatchValueAverageChangeDetector (AS): Detect when the average
  of a given parsed value changes over time.

* AtomFilters.MatchValueFilter (F): Use value of parsed element
  to forward input data to other analyzers.

* MatchValueStreamWriter (F): Forward selected input data, e.g.
  as a CSV list, to other components via a stream, e.g. to perform
  analysis in another tool.

* MissingMatchPathValueDetector (AH): Detect when values for a
  given path are not received for a longer timespan, e.g. a host,
  service or address stopped sending/reporting.

* MissingMatchPathListValueDetector (AH): Like MissingMatchPathValueDetector
  but using more than one match path for key extraction.

* NewMatchPathDetector (AH): Generate events when new parser paths
  are found.

* NewMatchPathValueComboDetector (AH): Same as NewMatchPathValueDetector
  but considers the combination of values for a list of data paths,
  e.g. source IP, destination IP, destination port for link analysis.

* NewMatchPathValueDetector (AH): Generate events when new parsed
  values are observed for a given path, e.g. new MAC addresses,
  user names, ...

* TimeCorrelationDetector (ANR): Try to detect time correlations
  and report them.

* TimeCorrelationViolationDetector.TimeCorrelationViolationDetector (H):
  Detect changes in time correlation on a given ruleset.

* TimestampCorrectionFilters.SimpleMonotonicTimestampAdjust (F):
  Adjust a decreasing timestamp of a new record to the maximum observed
  so far to ensure monotonicity for other analysis components.

* TimestampsUnsortedDetector.TimestampsUnsortedDetector (HR):
  This detector is useful to detect algorithm malfunction or
  configuration errors, e.g. invalid timezone configuration.

* AllowlistViolationDetector (FH): Check all inputs against a ruleset,
  create events for violations and forward input to other components.
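
The timestamp adjustment described for SimpleMonotonicTimestampAdjust
in the list above can be illustrated with a minimal standalone sketch.
It does not use the aminer API; the function name
"monotonic_timestamps" is an assumption made for this illustration only.

```python
def monotonic_timestamps(timestamps):
    """Raise every decreasing timestamp to the maximum observed so far."""
    adjusted = []
    maximum = None
    for timestamp in timestamps:
        # Track the largest timestamp seen up to and including this record.
        if maximum is None or timestamp > maximum:
            maximum = timestamp
        adjusted.append(maximum)
    return adjusted


print(monotonic_timestamps([10, 12, 11, 15, 14]))
# prints [10, 12, 12, 15, 15]
```

Components further down the chain then never see time running backwards,
at the price of losing the original value of out-of-order timestamps.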


HistogramAnalysis.HistogramAnalysis:
====================================

This component performs a histogram analysis on one or more input
properties. The properties are parsed values denoted by their
parsing path. Those values are then handed over to the selected
"binning function", which calculates the histogram bin.

* Binning:

Binning can be done using one of the predefined binning functions
or by creating your own subclass of "HistogramAnalysis.BinDefinition".

  * LinearNumericBinDefinition: Binning function working on numeric
    values and sorting them into bins of same size.

  * ModuloTimeBinDefinition: Binning function working on parsed
    datetime values but applying a modulo function to them. This
    is useful for analysis of periodic activities.
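
The arithmetic behind the two binning functions can be sketched
without aminer. The functions below are hypothetical stand-ins written
for this illustration, not the BinDefinition classes themselves. Their
parameter names are assumptions. The modulo function takes its
parameters in the same order as the first five constructor arguments
in the example that follows.

```python
def linear_numeric_bin(value, lower_limit, bin_size, bin_count):
    """Return the index of the equal-width bin holding value, or None if outside."""
    if value < lower_limit:
        return None
    index = int((value - lower_limit) // bin_size)
    if index >= bin_count:
        return None
    return index


def modulo_time_bin(timestamp, modulo_value, time_unit, lower_limit, bin_size, bin_count):
    """Fold a timestamp in seconds into one period, then bin it linearly."""
    value_in_units = (timestamp % modulo_value) / time_unit
    return linear_numeric_bin(value_in_units, lower_limit, bin_size, bin_count)


# 13:30:00 on the second day after the epoch lands in the bin for hour 13.
print(modulo_time_bin(86400 + 13 * 3600 + 1800, 3600 * 24, 3600, 0, 1, 24))
# prints 13
```

With a modulo of one day and 24 bins of one hour, every folded value
falls into some bin, which is why outlier bins make no sense for this
binning function.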


* Example:

The following example creates a HistogramAnalysis using only the
property "/model/line/time", binned on a per-hour basis and sending
a report every week:

  from aminer.analysis import HistogramAnalysis
  # Use a time-modulo binning function
  modulo_time_bin_definition=HistogramAnalysis.ModuloTimeBinDefinition(
      3600*24, # Modulo values in seconds (1 day)
      3600,    # Division factor to get down to reporting unit (1h)
      0,       # Start of lowest bin
      1,       # Size of bin in reporting units
      24,      # Number of bins
      False)   # Disable outlier bins, not possible with time modulo
  histogram_analysis=HistogramAnalysis.HistogramAnalysis(
      aminer_config,
      [('/model/line/time', modulo_time_bin_definition)],
      3600*24*7,  # Reporting interval (weekly)
      report_event_handlers,        # Send report to those handlers
      reset_after_report_flag=True)  # Zero counters after sending of report
  # Send the appropriate input feed to the component
  atom_filter.add_handler(histogram_analysis)


HistogramAnalysis.PathDependentHistogramAnalysis:
=================================================

This component creates a histogram for only a single input property,
e.g. an IP address, but for each group of correlated match paths.
Assume there are two paths that include the input property but
diverge after the property was found. This might be, for example,
the client IP address in ssh log atoms, where the parsing path
may split depending on whether the log atom reports a successful
login, a logout or some error. This analysis component will then
create separate histograms, one for the path common to all atoms
and one for each disjoint part of the subpaths found.

The component uses the same binning functions as the standard
HistogramAnalysis.HistogramAnalysis, see documentation there.
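
The grouping idea can be illustrated with a simplified batch sketch.
It is not the algorithm used by the component, which has to process
atoms one by one. Here all atoms are assumed to be available up front
as dictionaries mapping match paths to values. The function name and
the "/model/ssh/..." paths are hypothetical.

```python
from collections import Counter


def path_dependent_histograms(atoms, property_path, bin_function):
    """Count bins once for the common paths and once per diverging path set."""
    relevant = [atom for atom in atoms if property_path in atom]
    if not relevant:
        return {}
    # Paths present in every atom that contains the property.
    common = frozenset.intersection(*(frozenset(atom) for atom in relevant))
    histograms = {common: Counter()}
    for atom in relevant:
        bin_index = bin_function(atom[property_path])
        histograms[common][bin_index] += 1
        # Paths that only some atoms have get their own histogram.
        remainder = frozenset(atom) - common
        if remainder:
            histograms.setdefault(remainder, Counter())[bin_index] += 1
    return histograms


atoms = [
    {'/model/ssh/ip': '10.0.0.1', '/model/ssh/login': 'ok'},
    {'/model/ssh/ip': '10.0.0.1', '/model/ssh/logout': 'bye'},
    {'/model/ssh/ip': '10.0.0.2', '/model/ssh/login': 'ok'},
]
# Use the IP address itself as the bin to keep the example short.
result = path_dependent_histograms(atoms, '/model/ssh/ip', lambda ip: ip)
for paths, histogram in result.items():
    print(sorted(paths), dict(histogram))
```

The sketch yields one histogram for the shared IP path and one each for
the login and logout branches, so a client that only ever logs out
stands out in the logout histogram.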

* Example:

  # Perform path-dependent histogram analysis:
  from aminer.analysis import HistogramAnalysis
  # Use a time-modulo binning function
  modulo_time_bin_definition=HistogramAnalysis.ModuloTimeBinDefinition(
      3600*24, # Modulo values in seconds (1 day)
      3600,    # Division factor to get down to reporting unit (1h)
      0,       # Start of lowest bin
      1,       # Size of bin in reporting units
      24,      # Number of bins
      False)   # Disable outlier bins, not possible with time modulo
  path_dependent_histogram_analysis=HistogramAnalysis.PathDependentHistogramAnalysis(
      aminer_config,
      '/model/line/time',  # The value properties to check
      modulo_time_bin_definition,
      3600*24*7,                  # Reporting interval (weekly)
      report_event_handlers,        # Send report to those handlers
      reset_after_report_flag=True)  # Zero counters after sending of report
  # Send the appropriate input feed to the component
  atom_filter.add_handler(path_dependent_histogram_analysis)


AllowlistViolationDetector:
===========================

This detector manages a list of allowlist rules to filter parsed
atoms. All atoms not hit by any allowlist rule will cause events
to be generated. When an atom is matched by a rule, it will be
regarded as allowlisted by default but there is also an option
to call user-defined functions on a matching rule via MatchAction
elements, e.g. to forward the atom to another analyzer in one
pass. Predefined actions are:

  * EventGenerationMatchAction: Generate events when a rule matches,
    e.g. to report interesting matches, violations or for debugging.

  * AtomFilterMatchAction: Filter out the parsed atoms on match
    and forward them to other handlers, e.g. analysis components.

* Rules:

The ruleset of this detector is created from classes defined in
aminer.analysis.Rules. See below for a short list of supported rules
or the source for full documentation:

  * AndMatchRule: match only if all subrules match
  * DebugMatchRule: print debugging text when matching
  * DebugHistoryMatchRule: keep history of matched LogAtoms
  * IPv4InRFC1918MatchRule: match IPs in private networks
  * ModuloTimeMatchRule: match cyclic time values, e.g. nighttime
  * NegationMatchRule: match only if other rule did not
  * OrMatchRule: match if any subrule matches
  * ParallelMatchRule: match if any subrule matches but do not
    stop at first successful match
  * PathExistsMatchRule: match if parsed data contains given path
  * StringRegexMatchRule: match if parsed data string matches
    given regular expression. If applicable, Value[X]MatchRule
    should be used instead.
  * ValueDependentDelegatedMatchRule: select match rules according
    to values from parsed data
  * ValueDependentModuloTimeMatchRule: like ModuloTimeMatchRule
    but select limits according to values from parsed data
  * ValueListMatchRule: match if value is in given lookup list
  * ValueMatchRule: match if parsed data contains specific value
  * ValueRangeMatchRule: match if parsed data value is within
    given range
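
How such rules compose, and how unmatched atoms become violations, can
be sketched with a few stand-in classes. These are not the classes from
aminer.analysis.Rules. All class names, the function "find_violations"
and the paths "/model/user" and "/model/error" are assumptions made for
this illustration. Parsed atoms are modelled as plain dictionaries.

```python
class PathExists:
    """Match if the parsed data contains the given path."""
    def __init__(self, path):
        self.path = path

    def match(self, parsed):
        return self.path in parsed


class ValueEquals:
    """Match if the given path exists and holds exactly this value."""
    def __init__(self, path, value):
        self.path = path
        self.value = value

    def match(self, parsed):
        return self.path in parsed and parsed[self.path] == self.value


class AllOf:
    """Match only if all subrules match."""
    def __init__(self, sub_rules):
        self.sub_rules = sub_rules

    def match(self, parsed):
        return all(rule.match(parsed) for rule in self.sub_rules)


class Negation:
    """Match only if the wrapped rule does not."""
    def __init__(self, sub_rule):
        self.sub_rule = sub_rule

    def match(self, parsed):
        return not self.sub_rule.match(parsed)


def find_violations(atoms, allowlist_rules):
    """Return all atoms that are not hit by any allowlist rule."""
    return [atom for atom in atoms
            if not any(rule.match(atom) for rule in allowlist_rules)]


rules = [
    PathExists('/model/services/exim/msg/queue/pid'),
    AllOf([ValueEquals('/model/user', 'root'), Negation(PathExists('/model/error'))]),
]
atoms = [
    {'/model/services/exim/msg/queue/pid': 4711},
    {'/model/user': 'root'},
    {'/model/user': 'root', '/model/error': 'denied'},
]
print(find_violations(atoms, rules))
# prints [{'/model/user': 'root', '/model/error': 'denied'}]
```

Because any() stops at the first matching rule, the order of rules
matters as soon as rules carry side effects such as match actions.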


* Example:

  # Run allowlisting over the parsed lines.
  from aminer.analysis import Rules
  from aminer.analysis.AllowlistViolationDetector import AllowlistViolationDetector
  violation_action=Rules.EventGenerationMatchAction('Analysis.GenericViolation',
      'Violation detected', anomaly_event_handlers)
  allowlist_rules=[]
  # Filter out things so bad that we do not want to accept the
  # risk that a too broad allowlisting rule will accept the data
  # later on.
  allowlist_rules.append(Rules.ValueMatchRule('/model/services/cron/msgtype/exec/user', 'hacker', violation_action))
  # Ignore Exim queue run start/stop messages
  allowlist_rules.append(Rules.PathExistsMatchRule('/model/services/exim/msg/queue/pid'))
  # Add a debugging rule in the middle to see everything not allowlisted
  # up to this point.
  allowlist_rules.append(Rules.DebugMatchRule(False))
  # Ignore hourly cronjobs, but only when started at the expected time
  # and the duration is not too long.
  allowlist_rules.append(Rules.AndMatchRule([
      Rules.ValueMatchRule('/model/services/cron/msgtype/exec/command', '(   cd / && run-parts --report /etc/cron.hourly)'),
      Rules.ModuloTimeMatchRule('/model/syslog/time', 3600, 17*60, 17*60+5)]))

  atom_filter.add_handler(AllowlistViolationDetector(allowlist_rules, anomaly_event_handlers))
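
The time window used by the ModuloTimeMatchRule in the example above
(modulo 3600, limits 17*60 and 17*60+5) can be checked by hand with a
small standalone sketch. The function below is a hypothetical stand-in,
not the aminer rule. It assumes that both limits are inclusive and it
ignores timezone handling.

```python
def modulo_time_matches(timestamp, seconds_modulo, lower_limit, upper_limit):
    """Check whether a timestamp in seconds falls into a cyclic time window."""
    return lower_limit <= timestamp % seconds_modulo <= upper_limit


# 13:17:03 is three seconds into minute 17 of the hour, inside the window.
print(modulo_time_matches(13 * 3600 + 17 * 60 + 3, 3600, 17 * 60, 17 * 60 + 5))
# prints True

# 13:18:00 is 60 seconds after the window start, outside the window.
print(modulo_time_matches(13 * 3600 + 18 * 60, 3600, 17 * 60, 17 * 60 + 5))
# prints False
```

Under these assumptions, the rule in the example accepts the hourly
cron message only when it is logged within five seconds after minute 17
of any hour.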
