Bayes expire files

Mark Martinec Mark.Martinec+amavis at ijs.si
Thu Oct 20 21:00:51 CEST 2011


Julien,

> "Gorn" <Gorn at xs4all.nl> wrote:
> > http://old.nabble.com/bayes_toks.expire-problem-td22502372.html
> 
> I already tried some of the approaches suggested there, without
> much success.

You did try disabling auto-expire and running it manually,
as indicated in that thread?
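
For reference, a minimal sketch of what that amounts to, assuming the
setting goes into local.cf (or wherever your SpamAssassin config lives):

  # stop SpamAssassin from starting an expiry run on its own
  bayes_auto_expire 0

The expiry can then be run by hand (or from cron) with
'sa-learn --force-expire', as the user that owns the bayes files.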

> root at rei ~ % ls -lah /usr/jails/mail/var/amavis/.spamassassin/
> total 92138
> drwx------  2 110  110     9B 19 oct 12:35 .
> drwxr-xr-x  6 110  110     9B 19 oct 12:44 ..
> -rw-------  1 110  110    25K 19 oct 12:46 bayes_journal
> -rw-------  1 110  110   4,9M 19 oct 12:35 bayes_seen
> -rw-------  1 110  110    39M 19 oct 12:35 bayes_toks
> -rw-------  1 110  110    65M 18 oct 11:31 bayes_toks.expire10463
> -rw-------  1 110  110   128T 19 oct 12:16 bayes_toks.expire21624
> -rw-------  1 110  110   8,0T 19 oct 11:35 bayes_toks.expire64012
> -rw-r-----  1 110  110   109B 18 oct 11:31 razor-agent.log
> 
> On an 80 GB disk, this is a very good compression :)

 :-)

If a temporary tokens database gets so much larger than
the original database, my guess is that the current database
is corrupted.

> In debug mode, I got a lot of:
> Oct 19 12:16:16.061 [21624] dbg: locker: refresh_lock:
> refresh /var/amavis/.spamassassin/bayes.lock
> 
> and after some time:
> HASH: Out of overflow pages.  Increase page size
> Segmentation fault (core dumped)

For bayes databases of any substantial size, an SQL-based bayes
usually offers faster and more reliable operation. Instructions are
in the sql directory of the SpamAssassin distribution (files README.bayes
and bayes_mysql.sql or bayes_pg.sql). Choose either MySQL with InnoDB
and Mail::SpamAssassin::BayesStore::MySQL as bayes_store_module,
or a fairly recent version of PostgreSQL. With bayes on SQL it is usually
just fine to leave auto-expiry enabled.
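
A rough sketch of the corresponding local.cf settings for the MySQL
case. The database name, user name and password below are placeholders
made up for illustration; adjust them to your setup and follow
README.bayes for creating the tables:

  # placeholder values, not taken from any real setup
  bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
  bayes_sql_dsn       DBI:mysql:sa_bayes:localhost
  bayes_sql_username  sa_user
  bayes_sql_password  changeme

For PostgreSQL the module is Mail::SpamAssassin::BayesStore::PgSQL
and the DSN starts with DBI:Pg: instead.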

As long as the rest of your SA rules and network tests are good,
it is not a big deal to start a new bayes database from scratch and
leave it to auto-learning. For the first couple of hours it may be
prudent to lower the scores of the BAYES_00 and BAYES_99 rules.
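
Something along these lines in local.cf would do, as a sketch. The
values are only illustrative and assumed to be milder than the scores
you currently have; remove the lines again once the new database has
seen enough mail:

  # illustrative values only, meant as a temporary override
  score BAYES_00 -0.5
  score BAYES_99  1.5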

Btw, if starting from scratch, it is also a good idea to set:
  bayes_auto_learn_on_error 1
(introduced with SpamAssassin 3.3).
See the Mail::SpamAssassin::Plugin::AutoLearnThreshold man page
for a description of this setting.

  Mark

