Bayes expire files

Julien Gormotte julien at gormotte.info
Fri Oct 21 09:02:06 CEST 2011


On Thu, 20 Oct 2011 21:00:51 +0200,
Mark Martinec <Mark.Martinec+amavis at ijs.si> wrote:

> Julien,
> 
> > "Gorn" <Gorn at xs4all.nl> wrote:
> > > http://old.nabble.com/bayes_toks.expire-problem-td22502372.html
> > 
> > I already tried some of the approaches suggested there, with no
> > good results.
> 
> You did try disabling auto-expire and running it manually,
> as indicated in that thread?

Yes, I set:
bayes_expiry_max_db_size                300000
bayes_auto_expire                       0

And ran:
sa-learn --force-expire

It ran for quite some time, and I got these huge files. Before that,
the files were using "just" 34 GB.

> 
> > root at rei ~ % ls -lah /usr/jails/mail/var/amavis/.spamassassin/
> > total 92138
> > drwx------  2 110  110     9B 19 oct 12:35 .
> > drwxr-xr-x  6 110  110     9B 19 oct 12:44 ..
> > -rw-------  1 110  110    25K 19 oct 12:46 bayes_journal
> > -rw-------  1 110  110   4,9M 19 oct 12:35 bayes_seen
> > -rw-------  1 110  110    39M 19 oct 12:35 bayes_toks
> > -rw-------  1 110  110    65M 18 oct 11:31 bayes_toks.expire10463
> > -rw-------  1 110  110   128T 19 oct 12:16 bayes_toks.expire21624
> > -rw-------  1 110  110   8,0T 19 oct 11:35 bayes_toks.expire64012
> > -rw-r-----  1 110  110   109B 18 oct 11:31 razor-agent.log
> > 
> > On an 80 GB disk, this is a very good compression :)
> 
>  :-)
> 
> If a temporary tokens database gets so much larger than
> the original database, my guess is that the current database
> is corrupted.

I tried running:
sa-learn --clear

and then:
sa-learn --force-expire

It did not remove the expire files, so I deleted them manually. I'll see
what happens.

> 
> > In debug mode, I got a lot of:
> > Oct 19 12:16:16.061 [21624] dbg: locker: refresh_lock:
> > refresh /var/amavis/.spamassassin/bayes.lock
> > 
> > and after some time:
> > HASH: Out of overflow pages.  Increase page size
> > Segmentation fault (core dumped)
> 
> For bayes databases of any substantial size choosing an SQL-based
> bayes usually offers a faster and more reliable operation.
> Instructions are in the sql directory of the SpamAssassin
> distribution (files README.bayes and bayes_mysql.sql or
> bayes_pg.sql). Choose either MySQL with InnoDB and
> Mail::SpamAssassin::BayesStore::MySQL as bayes_store_module, or a
> fairly recent version of PostgreSQL. With a bayes on SQL it is
> usually just fine to leave auto-expiry enabled.

I'll see what happens after my last operations, and it may be a good
idea to try the SQL backend afterwards.
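
The SQL setup described above might look roughly like the following
local.cf fragment. This is only a sketch: it assumes a MySQL database
named "spamassassin" on localhost and a hypothetical account "sa_user";
the DSN, user name and password are placeholders, not values from this
thread.

```
# local.cf sketch: Bayes on MySQL
# (database name, user name and password below are placeholders)
bayes_store_module   Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn        DBI:mysql:spamassassin:localhost
bayes_sql_username   sa_user
bayes_sql_password   changeme

# With an SQL backend, auto-expiry can usually stay enabled
bayes_auto_expire    1
```

The tables themselves come from bayes_mysql.sql in the sql directory of
the distribution. If the existing DBM database is still readable, it can
be dumped with "sa-learn --backup" before the switch and reloaded with
"sa-learn --restore" afterwards; with a corrupted database, starting
empty is probably the safer choice.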

> 
> As long as the rest of your SA rules and network tests are good,
> it is not a big deal to start a new bayes database from scratch and
> leave it to auto-learning. For the first couple of hours it may be
> prudent to lower the scores of BAYES_00 and BAYES_99 rules.
> 
> Btw, if starting from scratch, it is also a good idea to set:
>   bayes_auto_learn_on_error 1
> (introduced with SpamAssassin 3.3). 
> See Mail::SpamAssassin::Plugin::AutoLearnThreshold man page
> for a description of this setting.
> 
>   Mark

I'll take some time to look into this as soon as I can, thanks for the
advice :)
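
Put together, the from-scratch settings suggested above could be
sketched as a local.cf fragment like the one below. The score values are
illustrative assumptions only (the thread gives no numbers), chosen just
to show the override syntax.

```
# local.cf sketch for a freshly started Bayes database
bayes_auto_learn            1
# Available from SpamAssassin 3.3 on
bayes_auto_learn_on_error   1

# Temporarily softened Bayes scores; these values are assumptions,
# not recommendations from the thread
score BAYES_00  -0.5
score BAYES_99   2.0
```

The two score overrides are meant to be removed again once the new
database has seen enough mail for its results to look sane.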

