ANNOUNCE: amavisd-new-2.9.0 has been released
Mark Martinec via amavis-users
amavis-users at amavis.org
Fri May 9 20:30:35 CEST 2014
Version 2.9.0 of amavisd-new has been released
and is available at:
http://www.ijs.si/software/amavisd/amavisd-new-2.9.0.tar.xz
http://www.ijs.si/software/amavisd/amavisd-new-2.9.0.tar.bz2
Release notes are at:
http://www.ijs.si/software/amavisd/release-notes.txt
There are no changes between 2.9.0-rc2 and 2.9.0 except for a
version bump.
Enjoy!
Mark
amavisd-new-2.9.0 release notes
Contents:
COMPATIBILITY
NEW FEATURES SUMMARY
RELAXED LICENSE
BUG FIXES
NEW FEATURES
OTHER
WHY REDIS?
COMPATIBILITY
This version drops the dependency on the Perl module Redis, and makes
the dependencies on modules Convert::TNEF and Convert::UUlib truly optional.
The following change may affect third-party log parsers:
To facilitate forensic log analysis and troubleshooting, the 'FWD from'
and 'SEND from' log entries at level 1 now carry one additional prefixed
information field: the unique internal mail_id of the message, possibly
followed by a parent_mail_id in parentheses, e.g.:
(00525-02) XE9xnQYjrWyd FWD from <...> -> <...>, ...
(00495-02) v1pyIOMQkUYD(CIcqao-vCDO9) SEND from <...> -> <...>, ...
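A third-party log parser that needs to pick up the new prefix could match it with a regular expression. The following is a minimal Python sketch, not part of amavisd. It assumes mail_id values consist of letters, digits, '_', '+' and '-', as in the two samples above. The helper name parse_fwd_send and its group names are hypothetical.

```python
import re

# Hypothetical pattern for the level-1 entries shown above:
#   "(log_id) mail_id[(parent_mail_id)] FWD|SEND from ..."
LINE_RE = re.compile(
    r"^\((?P<log_id>[^)]+)\)\s+"                          # e.g. (00525-02)
    r"(?P<mail_id>[A-Za-z0-9_+-]+)"                       # e.g. XE9xnQYjrWyd
    r"(?:\((?P<parent_mail_id>[A-Za-z0-9_+-]+)\))?\s+"    # optional (parent)
    r"(?P<verb>FWD|SEND) from (?P<rest>.*)$"
)


def parse_fwd_send(line):
    """Return a dict of the prefix fields, or None for other log lines."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_fwd_send(
        "(00495-02) v1pyIOMQkUYD(CIcqao-vCDO9) SEND from <...> -> <...>, ..."))
```

Because parent_mail_id is an optional group, a parser written this way accepts entries both with and without a parent.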
No other incompatibilities with the previous version 2.8.1 are expected.
NEW FEATURES SUMMARY
- structured log/reporting to a Redis server in JSON format;
- IP address reputation (uses a Redis server);
- added two minor content categories to the major ccat CC_UNCHECKED:
encrypted (=1) and over-limits/mail-bomb (=2);
- introduced a by-recipient setting %final_destiny_maps_by_ccat.
RELAXED LICENSE
Some utility / auxiliary programs that were previously released under a
3-clause BSD license are now available under a more relaxed 2-clause BSD
license (also known as the "Simplified BSD License" or the "FreeBSD
License").
Affected programs are: amavis-mc, amavis-services, amavisd-status,
amavisd-snmp-subagent-zmq, amavisd-release, amavisd-submit, p0f-analyzer.pl,
amavisd-nanny, amavisd-agent, amavisd-snmp-subagent, amavisd-signer,
JpegTester.pm, and TinyRedis.pm.
Note that TinyRedis.pm is provided in the package as a separate file
and includes a documentation section. A copy of it is also included in
the file amavisd, so the separate file is not needed for Amavis
operation. The separate copy is provided under a 2-clause BSD license
so that it may be useful to third parties if desired. Eventually it
could be moved to CPAN as an independent module.
The license of the main program 'amavisd' remains unchanged: GPLv2.
BUG FIXES
- fixed "Insecure dependency in sprintf" in Sophos SAVI av-scanner,
reported by Maciej Uhlig;
- fixed the interface code to virus scanners Sophie, Trophie and fpscand,
where a time-out on a long-running virus scan would leave the connection
to the virus scanner open, so that a late response from the scanner to a
previous request could be interpreted as the result of the current scan;
reported by David Schweikert;
- fixed a bug in transforming an IPv6 address from an alternative form
into the preferred form. One effect of this bug was that an IPv4-mapped
IPv6 address was declared syntactically incorrect; reported by Patrick Domack;
- if SQL logging was disabled, the pen pals feature was non-functional even
when a Redis storage back-end was available and collecting data; now
pen pals is fully functional with a Redis database back-end and no SQL;
- provided our own Redis client code, avoiding the Redis CPAN module's bugs,
its slowness and its lack of IPv6 support.
The noteworthy Redis CPAN module bug is #38 (failing to re-select
a non-zero-index database after an automatic reconnect to a server).
See: https://github.com/melo/perl-redis/issues/38
https://github.com/melo/perl-redis/issues/28
- fixed a regexp for parsing a wildcarded signing domain in a DKIM key
declaration and a wildcarded sender pattern of signing options
(this feature is rarely used; it exists for compatibility with
dkim_milter);
- dropped the hard-coded dependency on modules Convert::TNEF and
Convert::UUlib.
Convert::TNEF was made optional in amavisd-new-2.8.0, but the
program still failed if the module could not be loaded at startup.
Both of these modules are now loaded at run time when first used, if
specified in the @decoders setting. The module Convert::UUlib
(the do_ascii entry) is disabled in the default setting of @decoders,
and the module Convert::TNEF (the do_tnef entry) is not used if an
external TNEF decoder (the do_tnef_ext entry) is available, or if it
is disabled in the @decoders list;
- imported the missing do_log_safe() into Amavis::LDAP::Connection to avoid
a warning: _WARN: \t(in cleanup)
Undefined subroutine &Amavis::LDAP::Connection::do_log_safe
called at (eval 101) line 76 during global destruction;
a patch by Quanah Gibson-Mount;
- at startup amavis could try to find a decoder for the 7z and zip
extensions twice; a fix by Quanah Gibson-Mount;
- fixed the amavisd-new-courier.patch which resulted in two instances
of sub post_bind_hook(). Only tested for syntax. Thanks to Eray Aslan.
NEW FEATURES
- Structured logging/reporting in JSON format is now available through
a redis server.
Each processed mail message and each generated mail message (e.g.
a delivery status notification) generates a structured data object
(internally a perl associative array). Its fields carry information
on most attributes of a mail message and its processing, similar
to what is available for logging via macros. Unlike a plain text
log which can be difficult to parse and inconsistent due to user
configurability of the log template, the data object contains
information in a structured form as key/value pairs, where each
value can be a scalar or a list or an associative array.
This internal data object is then serialized to JSON and sent to a
redis server, where it is appended to a list under a key (an arbitrary
string) configured by the $redis_logging_key setting. This list serves
as a queue of log events, which may be pulled from the queue by some
third-party application, e.g. by the logstash utility or by some
home-grown program. A redis server is quite handy for this purpose as
it offers blocking requests for pulling events from a queue, which
makes it easy to interface with an event processing program. The
queue also allows independent and asynchronous operation of the
amavisd child processes filling the queue and a log analyzer pulling
entries from it.
Structured logging to redis is enabled when @storage_redis_dsn is
configured (see the 'IP address reputation' section below), the
setting $redis_logging_key is set to a nonempty and nonzero string,
and $redis_logging_queue_size_limit is set to a positive integer
(the maximal number of entries allowed in the queue).
Both the $redis_logging_key and $redis_logging_queue_size_limit are
undefined by default, so structured logging to redis is disabled
by default even if @storage_redis_dsn is configured.
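The enabling conditions above can be restated as a single predicate. This is a hypothetical Python sketch, not amavisd code. It mimics Perl truthiness for the key, so undef (None), '' and '0' count as unset.

```python
def redis_logging_enabled(storage_redis_dsn, redis_logging_key,
                          queue_size_limit):
    """Hypothetical restatement of when structured logging to redis is on."""
    # Perl treats undef, '' and '0' as false; any other string is true
    key_ok = redis_logging_key not in (None, "", "0")
    # must be a positive integer (bool excluded, as it is an int subclass)
    limit_ok = (isinstance(queue_size_limit, int)
                and not isinstance(queue_size_limit, bool)
                and queue_size_limit > 0)
    return bool(storage_redis_dsn) and key_ok and limit_ok
```

With both settings left undefined (their default), the predicate is false even when the DSN list is configured. This matches the default-off behaviour described above.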
The string in $redis_logging_key determines the key in a redis
database where the event queue (a redis list) is maintained.
Semantically, it is the name of the queue. This setting is a component
of policy banks, so log entries can be fed into different redis
queues depending on the policy bank loaded for each mail message.
To prevent a queue in the redis server from growing out of bounds,
e.g. when an event-pulling program is temporarily nonfunctional or
its processing is falling behind, the $redis_logging_queue_size_limit
setting imposes a maximal number of events that amavisd may push into
the queue, i.e. the maximal queue size. Once the limit is reached, new
log events from amavisd are discarded for as long as the queue size
stays at the limit. As a redis database is kept in memory, it makes
sense to choose $redis_logging_queue_size_limit low enough that the
queue does not use too much memory if the log processing program goes
down, but high enough that short outages of the log processing program
do not lose any log events. The setting $redis_logging_queue_size_limit
is global (not a component of policy banks).
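The drop-when-full policy can be modelled in a few lines. This is a hypothetical Python sketch of the described behaviour, not the code amavisd runs against redis. New events are appended at the tail and refused while the queue is at its limit. The consumer takes the oldest event from the head, as BLPOP does.

```python
from collections import deque


class BoundedLogQueue:
    """Toy model of a size-limited log queue (not amavisd code)."""

    def __init__(self, limit):
        self.limit = limit
        self.items = deque()
        self.dropped = 0          # events discarded because the queue was full

    def push(self, event):
        if len(self.items) >= self.limit:
            self.dropped += 1     # at the limit: the NEW event is discarded
            return False
        self.items.append(event)  # append at the tail of the list
        return True

    def pop_oldest(self):
        # the consumer removes from the head, i.e. the oldest event first
        return self.items.popleft() if self.items else None
```

In this policy the newest events are lost while the consumer is down, and already-queued events are kept. Once the consumer drains a slot, pushes succeed again.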
An example setting:
@storage_redis_dsn = ( { server => '[::1]:6379', db_id => 1 } );
$redis_logging_queue_size_limit = 300000;
# takes about 250 MB of redis memory per 100000 log entries
$redis_logging_key = 'amavis-log';
$policy_bank{'MYNETS'} = {
  originating => 1,
  redis_logging_key => 'amavis-log-myusers',  # overrides global setting
};
The oldest event may be pulled from the listed queues with the redis
command:
BLPOP amavis-log amavis-log-myusers 0
so from a command line this may look like:
$ redis-cli -h ::1 -p 6379 -n 1
BLPOP amavis-log 0
The BLPOP redis command blocks if the queue is empty and only returns
when the queue becomes nonempty, which makes it easy to use. For high
event rates it may be more efficient to batch one LLEN and multiple
BLPOP calls in a Lua script executed on the redis server and return
events in chunks.
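A home-grown consumer can be a simple loop around BLPOP. This is a minimal Python sketch under these assumptions:

- The blpop callable follows the signature of blpop(keys, timeout=0) in the third-party redis-py package. That method returns a (key, value) pair, or None on timeout.
- The blpop callable is passed in, so the loop can be tried without a server.
- The names consume, handle and max_events are hypothetical.

```python
import json


def consume(blpop, handle, keys=("amavis-log", "amavis-log-myusers"),
            max_events=None):
    """Pull amavis JSON log events from redis lists, oldest first.

    blpop: callable like redis-py's Redis.blpop(keys, timeout=0)
    handle: called as handle(queue_name, event_dict) for each event
    max_events: stop after this many events (None = run forever)
    """
    count = 0
    while max_events is None or count < max_events:
        reply = blpop(list(keys), timeout=0)  # timeout 0: block until data
        if reply is None:                     # only possible with a timeout
            continue
        key, value = reply
        if isinstance(key, bytes):
            key = key.decode()
        handle(key, json.loads(value))        # json.loads accepts bytes too
        count += 1
    return count
```

Against a live server, this might be called as consume(redis.Redis(host="::1", port=6379, db=1).blpop, my_handler), where my_handler is your own function.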
An example of a logstash plugin configuration for pulling amavis log
events from a redis server and feeding them to Elasticsearch:
input {
  redis {
    type => "amavis"
    host => "::1"
    db => 1
    data_type => "list"
    key => "amavis-log"
    codec => json {}
  }
}
filter {
  date { match => [ "time_unix", "UNIX" ] }
}
output {
  # stdout { codec => rubydebug }
  elasticsearch_http {
    host => "127.0.0.1"
    port => 9200
    index_type => "%{type}"
    document_id => "%{mail_id}"
    codec => json {}
  }
}
As an alternative to sending log events to a redis server, it is
possible to use the macro [:report_json] in a log template, which
expands to a full JSON representation of a log event. As these strings
are fairly long (typically 2 kB to 3 kB), this is not a good solution
when logging to syslog. It may be usable when logging to a file, but
it is not an efficient solution and has not been tested in production.
Here is a (fake) example of a structured log report entry. It is shown
with '=>' separating keys from values, where actual JSON uses ':'; fields
are loosely ordered by their semantics. Not all fields are always
present. When a boolean field is missing, it should be interpreted as
false.
{
  "@timestamp" => "2014-05-06T09:29:47.048Z",
  "time_unix" => 1399368587.048,
  "time_iso_week_date" => "2014-W19-2",
  "partition" => "19",
  "type" => "amavis",
  "host" => "mailer.example.net",
  "src_ip" => "::1",
  "dst_ip" => "::1",
  "dst_port" => 10024,
  "log_id" => "82329-04",
  "mail_id" => "Jnk7NzYB8pvl",
  "mail_id_related" => ["men7HTERZaOF"],
  "client_port" => 41831,
  "client_ip" => "2001:db8::143:1",
  "ip_trace" => ["2001:db8::143:1", "192.0.2.242"],
  "os_fp" => "Windows XP; dist: 6; raw_mtu: 1340; ...",
  "originating" => true,
  "policy_banks" => ["PROXY-ORIGINATING", "MYNETS"],
  "size" => 302694,
  "digest_body" => "a4a7db6307c140b12f57feaf076663f8",
  "mail_from" => "mailing-list-1@example.com",
  "rcpt_to" => ["recip2@example.org", "recip1@example.net"],
  "rcpt_num" => 2,
  "message_id" => "<003701cf690d$b671b3f0$23551bd0@example.com>",
  "author" => ["sending-user@example.com"],
  "to_addr" => ["recip1@example.net"],
  "cc_addr" => ["recip2@example.org"],
  "subject" => "Fw: An example 123 - test",
  "subject_rot13" => "Sj: Na rknzcyr 123 - grfg",
  "user_agent" => "Microsoft Office Outlook 12.0",
  "is_bulk" => true,
  "is_mlist" => true,
  "action" => ["PASS"],
  "actions_performed" => "RelayedInternal RelayedOutbound",
  "checks_performed" => "V S H B F P",
  "content_type" => "Clean",
  "dkim_new_sig" => ["example.com"],
  "dsn_sent" => false,
  "elapsed" => { "Receiving" => 0.009,
                 "Decoding" => 0.053,
                 "VirusCheck" => 0.326,
                 "SpamCheck" => 2.116,
                 "Sending" => 0.118,
                 "Amavis" => 0.215,
                 "Total" => 2.672,
               },
  "message" => "82329-04 PASS Clean <mailing-list-1@example.com> -> <recip2@example.org>,<recip1@example.net>",
  "queued_as" => ["3gNFyR4Mfjzc3", "3gNFyR4n6Lzc4"],
  "recipients" => [
    { "action" => "PASS",
      "ccat_main" => "Clean",
      "queued_as" => "3gNFyR4Mfjzc3",
      "rcpt_is_local" => false,
      "rcpt_to" => "recip2@example.org",
      "smtp_code" => "250",
      "smtp_response" => "250 2.0.0 from MTA(smtp:[::1]:10013): 250 2.0.0 Ok: queued as 3gNFyR4Mfjzc3",
      "spam_score" => -2.0
    },
    { "action" => "PASS",
      "ccat_main" => "Clean",
      "mail_id_related" => "men7HTERZaOF",
      "penpals_age" => 1114599,
      "queued_as" => "3gNFyR4n6Lzc4",
      "rcpt_is_local" => true,
      "rcpt_to" => "recip1@example.net",
      "smtp_code" => "250",
      "smtp_response" => "250 2.0.0 from MTA(smtp:[::1]:10013): 250 2.0.0 Ok: queued as 3gNFyR4n6Lzc4",
      "spam_score" => -5.272
    }
  ],
  "smtp_code" => ["250"],
  "spam_score" => -2.0,
  "tests" => ["ALL_TRUSTED", "AM.PENPAL", "BAYES_00",
              "MSGID_MULTIPLE_AT", "RP_MATCHES_RCVD"],
  "tests_ham" => ["AM.PENPAL","BAYES_00","ALL_TRUSTED","RP_MATCHES_RCVD"],
  "tests_spam" => ["MSGID_MULTIPLE_AT"],
}
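A consumer should therefore default absent booleans to false explicitly. The following is a small Python sketch with these assumptions:

- Events arrive as real JSON text, not in the '=>' notation used above.
- The field names come from the example above.
- The summarize helper and its output keys are hypothetical.

```python
import json

# boolean fields from the example above; absent means false
BOOLEAN_FIELDS = ("originating", "is_bulk", "is_mlist", "dsn_sent")


def summarize(event_json):
    """Reduce one amavis JSON log event to a few fields (hypothetical)."""
    ev = json.loads(event_json)
    summary = {name: bool(ev.get(name, False)) for name in BOOLEAN_FIELDS}
    summary["mail_id"] = ev.get("mail_id")
    summary["spam_score"] = ev.get("spam_score")
    # per-recipient entries carry their own flags, e.g. rcpt_is_local
    summary["local_rcpts"] = [
        r["rcpt_to"] for r in ev.get("recipients", [])
        if r.get("rcpt_is_local", False)
    ]
    return summary
```
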
- IP address reputation
When a Redis storage back-end is enabled, it now offers, besides the
existing pen pals functionality, updating and retrieval of information
on IP address reputation. This function is enabled by default when
@storage_redis_dsn is nonempty, but can be disabled by setting
$enable_ip_repu to false (0 or undef), per policy bank if necessary.
For each mail message, a list of public IP addresses (IPv4 or IPv6) is
collected from the 'Received' trace header fields in its header
section. A redis server maintains a database entry for each IP address
encountered. Each entry carries a set of counters of the mail messages
seen in the past with this IP address in a trace header: the number of
spam messages, the number of ham messages, the number of banned or
infected messages, and the total number of messages. The timestamps of
the first and last encounter are also kept. Each entry (a set of
counters) is subject to automatic expiry, so that infrequently
encountered IP addresses are eventually purged from the database by
the redis server itself.
As a sending IP address may change its role (e.g. a machine that was
infected and sending spam has since been cleaned, or a NAT-ted address
is reassigned to someone else), a crude form of data aging is currently
implemented: entries older than three days since their creation are
discarded. This may be refined in the future.
When a new mail message is being processed, all public IP addresses
from its trace are looked up. For each IP address found in the
database, a spam score is computed based on the ratio of ham to all
messages, and on the total number of messages. The largest spam score
calculated over all encountered IP addresses is then added to the
total spam score of the message.
The formula for computing the spam score of each IP address is
currently hard-coded and non-linear; it takes into account the total
number of encounters of an IP address, diluted by the ratio of ham
messages to all messages seen with this IP address. The computed score
cannot be negative, i.e. the IP reputation can only contribute to the
spamminess of a message and cannot serve as a 'whitelisting' negative
score. For the exact formula in use, see query_and_update_ip_reputation()
in the file amavisd.
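The aggregation step (largest per-IP score, never negative) can be sketched independently of the formula itself. In this hypothetical Python sketch, score_fn stands in for the hard-coded formula in amavisd, which is not reproduced here. The clamping to zero is an assumption about how a generic formula would be kept nonnegative.

```python
def message_ip_repu_score(ip_entries, score_fn):
    """Combine per-IP reputation scores into one contribution (sketch).

    ip_entries: mapping of IP address -> counters found in the database
                (addresses with no database entry are simply absent)
    score_fn:   stand-in for amavisd's hard-coded per-IP formula
    Only the largest per-IP score counts; negatives are clamped to zero.
    """
    scores = [max(0.0, score_fn(counters)) for counters in ip_entries.values()]
    return max(scores, default=0.0)
```

For example, with the toy formula lambda c: c["spam"] - c["ham"] (not amavisd's), one IP scoring 8 and another scoring -5 yield a contribution of 8.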
A time-to-live of each IP entry is assigned dynamically: frequently
encountered IP addresses are given longer expiration times (days),
while infrequent IP addresses are short-lived and expire, typically
within a few hours.
It is possible to exclude certain IP addresses or networks from
contributing to the spam score by listing them in the
@ip_repu_ignore_networks list, e.g.:
@ip_repu_ignore_networks =
  qw( 192.0.2.44 192.0.2.45 198.51.100.0/24 2001:db8::1:25 );
This does not preclude a redis lookup on an IP address matching the
list, but its score is taken as zero and the counters for such an
address are not updated. The mechanism is appropriate for excluding a
site's own mailers (MSA and MX), or local (e.g. departmental) mailers,
which may on occasion emit a spammy message but should never receive a
score penalty. There is no need to include private IP networks in the
list, as these are already exempt from the IP reputation database.
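Matching an address against such a list amounts to network membership tests. This Python sketch uses the standard ipaddress module. It treats a bare address as a single-host network, /32 for IPv4 or /128 for IPv6. The helper names are hypothetical. The sketch does not model the private-network exemption: Python's own notion of "private" also covers the documentation ranges used here, so it would not mirror amavisd.

```python
import ipaddress


def parse_ignore_networks(entries):
    """Turn @ip_repu_ignore_networks-style entries into network objects.

    A bare address becomes a single-host network (/32 or /128).
    """
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def is_ignored(ip, networks):
    """True if ip lies in one of networks; v4 never matches v6 and vice versa."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)
```

Per the notes above, an ignored address would still be looked up, but it would score zero and its counters would not be updated.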
An associated list of lookup tables @ip_repu_ignore_maps (whose only
default entry is \@ip_repu_ignore_networks) offers more flexibility
if needed, and is a member of policy banks.
Like other self-learning mechanisms (e.g. SpamAssassin's auto-learn,
AWL, TxRep), the quality of the result depends on the quality of other
spam-gauging rules: the better the spam/ham classification works
(SpamAssassin), the more useful IP reputation becomes. For the purpose
of IP reputation's spam and ham counts, a mail is considered spam if
its score is at or above 5, and ham if its final score is below 0.5.
This is currently hard-coded (see sub save_info_final).
Intermediate scores are considered unclassified.
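The thresholds above translate directly into a three-way classification. In this Python sketch the thresholds are the values stated in these notes. The function name is hypothetical, and the handling of banned and infected mail is left out.

```python
SPAM_THRESHOLD = 5.0   # final score at or above this: counted as spam
HAM_THRESHOLD = 0.5    # final score below this: counted as ham


def classify_for_ip_repu(final_score):
    """Return 'spam', 'ham' or 'unclassified' for IP reputation counting."""
    if final_score >= SPAM_THRESHOLD:
        return "spam"
    if final_score < HAM_THRESHOLD:
        return "ham"
    return "unclassified"
```
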
A nice feature of the mechanism is that it reacts fairly quickly to a
new rush of unwanted messages from some IP address, foreign or local.
For insight on the IP address reputation behaviour, search the log
for ' redis: IP '. At log level 2 only spammy hits are logged, at
log level 3 also the clean hits are shown. The log entry shows
spam, ham, banned+infected and unclassified counts for an IP address,
a percentage of unwanted (spam+banned+infected) messages out of the
total count, and the associated score.
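A per-IP entry and the percentage shown in that log line can be modelled as follows. This is a hypothetical Python sketch: field names are illustrative, not the keys amavisd stores in redis. It assumes each message falls into exactly one category, so the unclassified count is whatever remains of the total.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IpRepuCounters:
    """Toy model of one per-IP reputation entry (field names hypothetical)."""
    spam: int = 0
    ham: int = 0
    banned_infected: int = 0
    total: int = 0
    first_seen: Optional[float] = None
    last_seen: Optional[float] = None

    def record(self, kind, now):
        """kind is 'spam', 'ham', 'banned_infected' or 'unclassified'."""
        if kind == "spam":
            self.spam += 1
        elif kind == "ham":
            self.ham += 1
        elif kind == "banned_infected":
            self.banned_infected += 1
        self.total += 1                 # every encounter counts in the total
        if self.first_seen is None:
            self.first_seen = now
        self.last_seen = now

    @property
    def unclassified(self):
        # assumption: each message lands in exactly one category
        return self.total - self.spam - self.ham - self.banned_infected

    @property
    def unwanted_percent(self):
        """Share of spam + banned + infected messages out of the total."""
        if not self.total:
            return 0.0
        return 100.0 * (self.spam + self.banned_infected) / self.total
```
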
Apart from starting a redis server on a loopback interface (no
configuration changes other than its 'bind' setting in redis.conf are
necessary, and the database need not be initialized), here is an
example configuration in amavisd.conf:
@storage_redis_dsn = (
  { server => '[::1]:6379', db_id => 1 },
  { server => '127.0.0.1:6379', db_id => 1 },
);
# list your MX and MSA mailer IP addresses or networks here:
@ip_repu_ignore_networks = qw( 192.0.2.44 2001:db8::/64 );
A redis server needs to support Lua scripting, which is available
since version 2.6. Support for IPv6 is available since version 2.8.0
of the redis server.
- Added support for decompressing LZ4 streams in mail attachments when
an external utility lz4c is available and the 'file' utility recognizes
such streams (probably since version file-5.17). The default settings
of @decoders and $map_full_type_to_short_type_re now recognize LZ4;
if these settings are replaced in a configuration file, the config
file needs to be updated to include the new entry.
- Added two minor content categories to the major ccat CC_UNCHECKED
to allow distinguishing between the reasons for a decoder failure:
* a minor ccat 1 now indicates that at least one mail part was
encrypted or otherwise scrambled (e.g. a password-protected archive);
* a minor ccat 2 now indicates that one of the limits for protection
against mail bombs was exceeded (e.g. $MAXLEVELS, $MAXFILES,
$MAX_EXPANSION_QUOTA, $MAX_EXPANSION_FACTOR).
Based on a suggestion and a patch by Carsten Wolff.
The additional information can be used in any of the *_maps_by_ccat
settings, e.g.:
$subject_tag_maps_by_ccat{CC_UNCHECKED.',1'} =
[ '***UNCHECKED(Encrypted)*** ' ];
$subject_tag_maps_by_ccat{CC_UNCHECKED.',2'} =
[ '***UNCHECKED(OverLimit)*** ' ];
or:
$defang_by_ccat{CC_UNCHECKED.',2'} = 1;
- introduced a setting %final_destiny_maps_by_ccat, which makes it
possible to specify a by-recipient final destiny for each content
category, e.g. use D_REJECT on spam to some users, and D_BOUNCE,
D_DISCARD or D_PASS for others. Introduced mostly for completeness.
As a backward compatibility measure, the existing %final_destiny_by_ccat
is now an alias for the new %final_destiny_maps_by_ccat;
- added a setting $outbound_disclaimers_only. When set to true and
disclaimers are enabled, it will only allow adding disclaimers
to non-local recipients. For backward compatibility the default
value is false (undef). Based on a patch by Quanah Gibson-Mount;
- the $recipient_delimiter setting can now hold a multi-character string,
specifying all characters that can delimit an address extension from
a base e-mail address. Previously this setting was restricted to a
single character (typically a '+' or a '-').
When parsing an existing e-mail address, any of the characters in
$recipient_delimiter can delimit an address extension. When adding an
address extension (through %addr_extension_maps_by_ccat), the first
character in the $recipient_delimiter string is used as the delimiter.
This brings the setting in line with Postfix 2.11, which added support
for multiple recipient delimiters, and with a similar feature in Dovecot.
A patch contributed by Patrick Domack.
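The two rules above (any listed character splits, the first listed character joins) are sketched below in Python for a setting such as $recipient_delimiter = '+-'. The helper names are hypothetical. The sketch assumes the leftmost delimiter in the local part starts the extension, as in Postfix; amavisd's exact splitting rule may differ.

```python
def split_address(address, delimiters="+-"):
    """Split into (base address, extension or None).

    Assumption: the leftmost delimiter character in the local part
    starts the extension.
    """
    local, at, domain = address.rpartition("@")
    if not at:                     # no '@': treat the whole string as local
        local, domain = address, ""
    for i, ch in enumerate(local):
        if ch in delimiters:
            return local[:i] + at + domain, local[i + 1:]
    return address, None


def add_extension(address, extension, delimiters="+-"):
    """Add an extension using the FIRST character of delimiters."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address + delimiters[0] + extension
    return local + delimiters[0] + extension + at + domain
```

With '+-', both user+spam@example.com and user-spam@example.com parse to base user@example.com with extension spam. A newly added extension always uses '+'.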
- added macros report_json and rot13 (to be used in a log template):
* the macro 'report_json' expands to a JSON representation of a
structured log event;
* the macro 'rot13' replaces the string in its argument with an
obfuscated string in which letters are shifted by 13 positions in the
English alphabet (a popular variant of the Caesar cipher, used to
conceal spoilers); this may serve to (poorly) hide strings such as a
mail Subject or an e-mail address from casual browsing of a log.
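The transformation behind the 'rot13' macro is a fixed letter substitution. It is its own inverse, so applying it twice restores the original. This is a Python sketch of ROT13 itself, not of the amavisd macro code. It reproduces the subject / subject_rot13 pair from the example log event above.

```python
import string

# map a-z and A-Z onto the same alphabets rotated by 13; other chars unchanged
_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13(text):
    """Shift ASCII letters by 13 positions; digits and punctuation stay."""
    return text.translate(_ROT13)
```

For instance, rot13("Fw: An example 123 - test") gives "Sj: Na rknzcyr 123 - grfg", which is why this only hides text from casual reading.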
OTHER
- dropped the dependency on the CPAN module Redis by providing our own
client-side implementation of the redis protocol (Amavis::TinyRedis).
It is faster and smaller, and supports opening sessions with a
redis server over IPv6 (or over IPv4 or over a Unix socket).
The redis server supports IPv6 starting with version 2.8.0.
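A minimal client needs little beyond the redis wire protocol (RESP). Commands go out as an array of bulk strings, and replies come back as simple strings, errors, integers, bulk strings or arrays. The following Python sketch shows that wire format. It is not Amavis::TinyRedis, which is written in Perl, and the function names are hypothetical.

```python
def encode_command(*args):
    """Encode a redis command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_reply(buf, pos=0):
    """Parse one RESP reply from buf at pos; return (value, next_pos).

    Handles +simple, -error, :integer, $bulk and *array replies;
    a null bulk string or null array ($-1 / *-1) yields None.
    """
    end = buf.index(b"\r\n", pos)
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    pos = end + 2
    if kind == b"+":
        return line.decode(), pos
    if kind == b"-":
        raise RuntimeError(line.decode())
    if kind == b":":
        return int(line), pos
    if kind == b"$":
        n = int(line)
        if n < 0:
            return None, pos
        return buf[pos:pos + n], pos + n + 2   # skip data and trailing CRLF
    if kind == b"*":
        n = int(line)
        if n < 0:
            return None, pos
        items = []
        for _ in range(n):
            item, pos = parse_reply(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("unknown RESP type %r" % kind)
```

For example, "BLPOP amavis-log 0" is sent as b"*3\r\n$5\r\nBLPOP\r\n$10\r\namavis-log\r\n$1\r\n0\r\n". Its reply is a two-element array holding the queue name and the event.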
Currently supported options in @storage_redis_dsn are:
server, db_id, password, and ttl.
The 'server' specifies an INET or INET6 socket (a host IP address
or name and a port number) or an absolute path to a Unix socket.
An IPv6 address must be enclosed in square brackets. The default
value is '127.0.0.1:6379'. Match this with your redis configuration.
Option 'db_id' specifies a redis database index (given to a "SELECT"
redis command). Its value is a (small) integer, defaults to 0.
This allows for independent databases to co-exist on the same redis
server, e.g. an amavis database and a SpamAssassin Bayes database.
The 'ttl' option can override the global setting $storage_redis_ttl
on a per-server basis. Its value is an integer: the expiration time of
pen pals records in seconds. It defaults to $storage_redis_ttl, which
in turn defaults to 16 days (in seconds).
This setting does not affect IP reputation records, whose expiration
time is computed dynamically.
Example:
$storage_redis_ttl = 22*24*3600;  # 22 days for pen pals records
@storage_redis_dsn = (  # alternative servers, use the first which works
  { server => '[::1]:6379', db_id => 1 },
  { server => '127.0.0.1:6379', db_id => 1, password => 'abc...' },
  { server => '/tmp/redis.sock', db_id => 1, ttl => 8*24*3600 },
);
Btw, make sure to keep the setting $database_sessions_persistent
at its default value (1, i.e. enabled), otherwise Redis performance
will suffer somewhat.
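The three accepted forms of the 'server' option can be told apart by their first character. This is a hypothetical Python sketch of that parsing, assuming a port is always given for INET/INET6 forms. A host name is reported as 'inet' here even though it might resolve to an IPv6 address.

```python
from typing import NamedTuple, Optional


class RedisEndpoint(NamedTuple):
    family: str           # 'unix', 'inet6', or 'inet' (IPv4 or host name)
    host: Optional[str]
    port: Optional[int]
    path: Optional[str]


def parse_server(server="127.0.0.1:6379"):
    """Parse a 'server' value: '/path', '[ipv6]:port' or 'host:port'."""
    if server.startswith("/"):                 # absolute path: Unix socket
        return RedisEndpoint("unix", None, None, server)
    if server.startswith("["):                 # IPv6 must be in brackets
        host, sep, port = server[1:].partition("]:")
        if not sep:
            raise ValueError("expected [ipv6]:port, got %r" % server)
        return RedisEndpoint("inet6", host, int(port), None)
    host, sep, port = server.rpartition(":")
    if not sep or not host:
        raise ValueError("expected host:port, got %r" % server)
    return RedisEndpoint("inet", host, int(port), None)
```

Applied to the example above, the three entries map to an inet6, an inet and a unix endpoint respectively. The default value corresponds to 127.0.0.1 port 6379.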
- store only essential information for pen pals operation to a Redis
storage back-end to save memory on a database server; information on
inbound messages is no longer stored there, i.e. only information on
originating messages is kept;
- more informative logging of pen pals query results when using a Redis
storage back-end. The redis support code (Lua and protocol handling)
was largely rewritten for efficiency since amavisd-new 2.8.1.
- added LDAP attribute amavisDisclaimerOptions 1.3.6.1.4.1.15312.2.2.1.47
to LDAP.schema; contributed by Quanah Gibson-Mount;
- reduced EDNS payload size from 1240 bytes to a conservative default
of 1220 bytes when calling Mail::DKIM verifier;
- optimization: filter for public IP addresses from a Received trace
only once;
- added one digit of precision to small elapsed times (below 5 ms)
in the TIMING log report;
- in a milter setup (AM.PDP) the log-id wasn't unique; a request
sequence number is now added to it; a patch by Andreas Schulze;
- avoid writing a notification about a warm reload to stdout, for the
benefit of a cron job; a patch by Andreas Schulze;
- reduced the log level of some of the less useful log messages in a
milter setup; a patch by Andreas Schulze;
- documentation README.sql-mysql: added "CREATE INDEX msgs_idx_mail_id..."
with a note on an InnoDB requirement for a foreign key; by Jernej Porenta;
WHY REDIS?
A redis database was chosen initially because SpamAssassin 3.4.0
supports keeping its Bayes database in a redis server, which makes it
very fast; this makes a redis database readily available to amavisd too.
Redis has some features that make it suitable for use as a pen pals
database, for Bayes storage, and now for IP reputation and structured
logging:
- automatic expiration of entries based on each key's individual
time-to-live setting makes explicit database maintenance unnecessary;
- being accessible over INET (or Unix) sockets allows several amavisd
hosts to use a common redis server, possibly running on a dedicated host;
- supports Lua scripting, which makes it possible to perform multiple
basic operations in one go as a single application's functional
operation. It reduces multiple network round-trip times to a single
network transaction, reducing network packet rate and latency;
- compared to SQL storage for pen pals (and for Bayes database),
the redis read speed is somewhat faster, and the write speed is
MUCH faster;
- being an in-memory database with optional periodic disk persistence
makes it suitable for use as pen pals, IP reputation and Bayes
storage: it is fast, and a potential redis server restart reloads data
from the last snapshot, thus only losing the last minute or two of
updates when trouble strikes, which is acceptable for these three
databases;
- makes it possible to eliminate SQL r/w storage if its only purpose
was to provide pen pals functionality (and SpamAssassin's Bayes);
Caveat:
The redis server does not offer access controls or strong authentication
mechanisms. When running the server on the same host as amavisd, the
solution is straightforward: just bind the redis server to a loopback
interface or use a Unix socket. If network access is desired, consider
protecting the redis server with a firewall (host-local, or on a
dedicated subnet).