ANNOUNCE: amavisd-new-2.8.2-rc1 release candidate is available

Wed Sep 4 19:24:44 CEST 2013

A preview of the coming version 2.8.2 of amavisd-new is available at:

  http://www.ijs.si/software/amavisd/amavisd-new-2.8.2-rc1.tar.bz2
  http://www.ijs.si/software/amavisd/amavisd-new-2.8.2-rc1.tar.xz

Release notes are at:

  http://www.ijs.si/software/amavisd/release-notes.txt

amavisd-new-2.8.2-RC1 release notes

Contents:
  COMPATIBILITY
  BUG FIXES
  NEW FEATURES
  OTHER
  WHY REDIS?

COMPATIBILITY

There are no incompatible changes since the previous release.

Version 2.8.2 drops the dependency on the Perl module Redis, and makes
the dependency on modules Convert::TNEF and Convert::UUlib truly
optional.

BUG FIXES

- if SQL logging was disabled, the pen pals feature was non-functional
  even when a Redis storage backend was available and collecting data;
  now pen pals is fully functional with a Redis database backend and
  no SQL;

- provide our own Redis client code, avoiding the Redis CPAN module's
  bugs, its slowness and its lack of IPv6 support.
  A noteworthy Redis CPAN module bug is #38 (failing to re-select
  a non-zero-index database after an automatic re-connect to a server).
  See: https://github.com/melo/perl-redis/issues/38
       https://github.com/melo/perl-redis/issues/28

- fixed a regexp in parsing wildcarded signing domain in a DKIM key
  declaration and in a wildcarded sender pattern in signing options
  (an exotic feature rarely used, compatibility with dkim_milter);

- drop hard-coded dependency on modules Convert::TNEF and Convert::UUlib.
  The Convert::TNEF was made optional in amavisd-new-2.8.0, but the
  program still failed if the module could not be loaded at startup.

  Both of these modules are now loaded at run time when first used,
  subject to @decoders setting. The use of module Convert::UUlib
  (the do_ascii entry) is disabled in a default setting of @decoders,
  and the module Convert::TNEF (the do_tnef entry) is not used
  if an external TNEF decoder (the do_tnef_ext entry) is available,
  or if it is disabled in the @decoders list.

NEW FEATURES

- IP address reputation

  When a Redis storage backend is enabled, besides the existing pen pals
  functionality, it now also offers information updating and retrieval
  on IP address reputation. This function is enabled by default when
  @storage_redis_dsn is nonempty, but can be disabled by setting
  $enable_ip_repu to false (to 0 or undef), per policy bank if necessary.
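
  As a minimal sketch of the two ways of switching the feature off:
  the policy bank name MYNETS and its 'originating' key are assumptions
  borrowed from a typical amavisd.conf, so adjust them to your setup.

```perl
# amavisd.conf (use one of the two alternatives)

# alternative 1: disable IP reputation globally
# $enable_ip_repu = 0;

# alternative 2: leave it enabled globally, disable it in one policy bank
# only; assigning a hash like this replaces any earlier definition of the
# same bank, so merge the key into an existing definition if there is one
$policy_bank{'MYNETS'} = {
  originating    => 1,   # assumed to be already present in this bank
  enable_ip_repu => 0,   # no IP reputation while this bank is loaded
};
```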

  For each mail message a list of public IP addresses is collected from
  its 'Received' trace header fields in a mail header section. A redis
  server maintains a database of each IP address encountered. For each
  IP address an entry carries the following counters: a number of spam
  messages having this IP address in a trace header, a number of ham
  messages, a number of banned or infected messages, and a total number
  of messages. Also a timestamp of the last encounter is kept (currently
  only used for logging purposes). Each entry is subject to automatic
  expiry, so that infrequently encountered IP addresses are eventually
  automatically purged from a database.

  When a new mail message is being processed, a lookup on all its public
  IP addresses from a trace is done. For each IP address found in a
  database a spam score is computed based on a ratio of ham versus all
  messages, and based on a total number of messages. The largest spam
  score of all encountered IP addresses is then contributed as a spam
  score of a message.

  A formula for computing spam score of each IP address is currently
  hard-coded, is non-linear and takes into account the total number of
  encounters, diluted by the ratio of ham messages versus all messages
  seen with this IP address. The computed score cannot be negative,
  i.e. the IP reputation can only contribute to spamminess of a message
  and cannot serve as a 'whitelisting' negative score.
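
  The aggregation step described above can be illustrated as follows.
  This is only a sketch and not the amavisd code: the hard-coded per-IP
  formula is not reproduced here, the function name is hypothetical and
  the per-IP scores in the usage example are made-up inputs.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: takes a hash of already computed per-IP scores
# (IP address => score) and returns the contribution to a message score,
# i.e. the largest per-IP score, never below zero.
sub message_ip_repu_score {
  my(%score_by_ip) = @_;
  return max(0, values %score_by_ip);
}

# usage with made-up per-IP scores
my $contribution = message_ip_repu_score(
  '192.0.2.1'    => 0.4,
  '198.51.100.7' => 2.5,
);
print "IP reputation contributes $contribution\n";
```

  A message with no public IP addresses found in a database yields
  a contribution of zero in this sketch.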

  A time-to-live of each IP entry is assigned dynamically: frequently
  encountered IP addresses are given longer expiration times (days),
  while infrequent IP addresses are short-lived and eventually expire,
  typically in a few hours.

  It is possible to exclude certain IP addresses or networks from
  contributing spam score by listing them in an @ip_repu_ignore_networks
  list, e.g.:

    @ip_repu_ignore_networks =
      qw( 192.0.2.44 192.0.2.45 198.51.100.0/24 2001:db8::1:25 );

  This does not preclude a redis lookup or updating of counts for IP
  addresses matching the list, it just clears the resulting score to
  zero. The mechanism is appropriate for excluding a site's own mailers
  (MSA and MX), or local (e.g. departmental) mailers, which may on
  occasion emit a spammy message, but should never receive a score
  penalty. There is no need to include private IP address networks in
  the list, as these are already exempt from the IP reputation database.

  An associated list of lookup tables @ip_repu_ignore_maps (whose only
  default entry is the \@ip_repu_ignore_networks) offers more flexibility
  if needed, and is a member of policy banks.
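
  A sketch of overriding the list of lookup tables within a policy bank
  follows. The variable @partner_relays and the use of the MYNETS bank
  are assumptions made up for this example. It relies only on the fact
  that the default entry is itself a reference to a list of networks.

```perl
# amavisd.conf
@ip_repu_ignore_networks = qw( 192.0.2.44 2001:db8::/64 );

# hypothetical additional list of networks, the name is made up
my @partner_relays = qw( 203.0.113.0/24 );

# within this bank both lists are consulted; merge the key into an
# existing definition of the bank instead of replacing it
$policy_bank{'MYNETS'} = {
  originating         => 1,   # assumed to be already present
  ip_repu_ignore_maps => [ \@ip_repu_ignore_networks, \@partner_relays ],
};
```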

  Like other self-learning mechanisms (e.g. SpamAssassin's auto-learn,
  and AWL), the quality of the result depends on the quality of other
  spam-gauging rules - the better spam/ham classification works
  (SpamAssassin), the more useful IP reputation becomes. For the purpose
  of IP reputation's spam and ham counts, a mail is considered spam if
  it is flagged with a contents category CC_SPAM or CC_SPAMMY (i.e. at
  tag2_level or above), and is considered ham when its final score is
  below 2.0. Intermediate scores are considered unclassified.
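
  The classification rule above can be sketched as a small function.
  It is an illustration only: the function name is hypothetical, the
  tag2_level of 6.2 in the usage example is just a sample value, and
  banned or infected messages (counted separately) are not covered.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of amavisd: decides which IP reputation
# counter a message would increment, given its final spam score and the
# applicable tag2_level.
sub ip_repu_class {
  my($score, $tag2_level) = @_;
  return 'spam' if $score >= $tag2_level;  # CC_SPAMMY or CC_SPAM
  return 'ham'  if $score < 2.0;           # fixed ham threshold
  return 'unclassified';                   # intermediate scores
}

# usage with a sample tag2_level of 6.2
for my $score (0.3, 4.1, 6.2, 15) {
  printf("%s => %s\n", $score, ip_repu_class($score, 6.2));
}
```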

  A nice feature of the mechanism is that it reacts fairly quickly
  to a new rush-in of unwanted messages from some IP address, either
  foreign, or local.

  For insight on the IP address reputation behaviour, search the log
  for ' redis: IP '. At log level 2 only spammy hits are logged, at
  log level 3 also the clean hits are shown. The log entry shows
  spam, ham, banned+infected and unclassified counts for an IP address,
  a percentage of unwanted (spam+banned+infected) messages out of the
  total count, and the associated score.

  On the redis side it suffices to start a redis server on a loopback
  interface: apart from possibly changing its 'bind' setting in
  redis.conf, no other configuration changes are necessary, and a
  database need not be initialized. Here is an example configuration
  in amavisd.conf:

    @storage_redis_dsn = (
      { server => '127.0.0.1:6379', db_id => 1 },
    );

    # list your MX and MSA mailer IP addresses or networks here:
    @ip_repu_ignore_networks = qw( 192.0.2.44 2001:db8::/64 );

  A redis server needs to support Lua scripting, which is available
  since version 2.6. Support for IPv6 is available since version 2.8.0.

OTHER

- dropped the dependency on the CPAN module Redis, providing our own
  implementation of the client side of the redis protocol
  (Amavis::TinyRedis). It is faster and smaller, and supports opening
  sessions with a redis server over IPv6 (or over IPv4 or over a Unix
  socket). The redis server supports IPv6 starting with version 2.8.0.

  Currently supported options in @storage_redis_dsn are:
  server, db_id, password, and ttl.

  The 'server' option specifies an INET or INET6 socket (a host IP
  address or name and a port number) or an absolute path to a Unix
  socket.
  An IPv6 address must be enclosed in square brackets. The default
  value is '127.0.0.1:6379'. Match this with your redis configuration.

  Option 'db_id' specifies a redis database index (given to a "SELECT"
  redis command). Its value is a (small) integer, defaults to 0.
  This allows for independent databases to co-exist on the same redis
  server, e.g. an amavis database and a SpamAssassin Bayes database.

  The 'ttl' option can override a global setting $storage_redis_ttl
  on a per-server basis. Its value is an integer, representing a number
  of seconds for expiration time of pen pals records. It defaults to
  $storage_redis_ttl, which in turn defaults to 16 days (in seconds).
  This setting does not affect IP reputation records, whose expiration
  time is computed dynamically.

  Example:
    $storage_redis_ttl = 22*24*3600;  # 22 days for pen pals records
    @storage_redis_dsn = (  # alternative servers, use the first which works
      { server => '[::1]:6379',      db_id => 1 },
      { server => '127.0.0.1:6379',  db_id => 1, password => 'abc...' },
      { server => '/tmp/redis.sock', db_id => 1, ttl => 8*24*3600 },
    );

  Btw, make sure to keep the setting $database_sessions_persistent
  at its default value (1, i.e. enabled), otherwise Redis performance
  will suffer somewhat.

- store only information essential for pen pals operation in a Redis
  storage backend, to save memory on a database server; information on
  inbound messages is no longer stored there, i.e. only information on
  originating messages is kept;

- more informative logging of pen pals query results when using a Redis
  storage backend. The redis support code (Lua and protocol handling)
  was largely rewritten for efficiency since amavisd-new 2.8.1.

- added LDAP attribute amavisDisclaimerOptions 1.3.6.1.4.1.15312.2.2.1.47
  to LDAP.schema;  contributed by Quanah Gibson-Mount;

- filter for public IP addresses from a Received trace only once;

- add one digit of precision in the TIMING log report to reported small
  elapsed times (below 5 ms);

- documentation README.sql-mysql: added "CREATE INDEX msgs_idx_mail_id..."
  with a note on an InnoDB requirement for a foreign key;  by Jernej Porenta;

WHY REDIS?

A redis database was chosen initially because SpamAssassin 3.4.0 supports
keeping its Bayes database in a redis server, which makes it very fast,
so this makes a redis database readily available to amavisd too.

Redis has some features that make it suitable for use as a pen pals
database, for Bayes storage, and now for IP reputation:

- automatic expiration of entries based on key's individual time-to-live
  setting makes explicit database maintenance unnecessary;

- being accessible over inet (or Unix sockets), it allows several
  amavisd hosts to use a common redis server, possibly running on a
  dedicated host;

- supports Lua scripting, which makes it possible to perform multiple
  basic operations in one go as a single application's functional
  operation. It reduces multiple network round-trip times to a single
  network transaction, reducing network packet rate and latency;

- compared to SQL storage for pen pals (and for Bayes database), the
  redis read speed is faster, and the write speed is MUCH faster;

- as an in-memory database with optional periodic disk persistence
  it is suitable for use as pen pals, IP reputation and Bayes storage:
  it is fast, and a potential redis server restart reloads data from
  the last snapshot, thus only losing the last minute or two of
  updates when trouble strikes, which is acceptable for these three
  databases;

- makes it possible to eliminate SQL r/w storage if its only purpose
  was to provide pen pals functionality (and SpamAssassin's Bayes);

Mark