Increasing spam filtering with spamassassin

Nikolaos Milas nmilas at noa.gr
Sat Aug 27 15:37:52 CEST 2016


On 27/8/2016 1:02 AM, Marc Pujol wrote:

> ...
> At this point I would ditch the entire database and start from scratch, disabling auto-learning first (put "bayes_auto_learn 0" in your config).
> ...
> You could also try to move/copy your /root/.spamassassin database over to the amavis location (check the permissions!).
> ...

Thank you all for your suggestions. Your remarks were in fact all 
correct, and I have started to understand and correct things (I believe).

I have started by doing the above, and here is the result on a specimen:

    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_toks
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_seen
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    found bayes db version 3
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes: DB
    journal sync: last sync: 0
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    corpus size: nspam = 1680, nham = 0
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    cannot use bayes on this message; none of the tokens were found in
    the database
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes: not
    scoring message, returning undef
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes: DB
    expiry: tokens in DB: 154999, Expiry max size: 300000, Oldest atime:
    1219096335, Newest atime: 1472206207, Last expire: 1471602636,
    Current time: 1472299481
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes: DB
    journal sync: last sync: 0
    Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
    untie-ing
    ...
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes:
    tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_toks
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes:
    tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_seen
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes:
    found bayes db version 3
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes: DB
    journal sync: last sync: 0
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes:
    corpus size: nspam = 1680, nham = 0
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes:
    cannot use bayes on this message; none of the tokens were found in
    the database
    Aug 27 15:39:16 mailgw3 amavis[8866]: (08866-03) SA dbg: bayes: not
    scoring message, returning undef
    ...

So, now "corpus size: nspam = 1680, nham = 0" and the currently normal 
situation is (as seen above) "cannot use bayes on this message; none of 
the tokens were found in the database".
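
To keep an eye on these numbers without reading the debug output by hand, here is a minimal Python sketch (not part of amavisd-new or SpamAssassin; all function names are hypothetical). It pulls the corpus size and the token atime range out of log text wrapped like the excerpt above. The thresholds of 200 are an assumption: they are the stock SpamAssassin defaults for bayes_min_spam_num and bayes_min_ham_num, so adjust them if your configuration overrides those settings.

```python
import re
from datetime import datetime, timezone

# Assumed thresholds: stock SpamAssassin defaults for
# bayes_min_spam_num / bayes_min_ham_num. Change if overridden locally.
MIN_SPAM = 200
MIN_HAM = 200

CORPUS_RE = re.compile(r"corpus size: nspam = (\d+), nham = (\d+)")
ATIME_RE = re.compile(r"Oldest atime: (\d+), Newest atime: (\d+)")

# Two entries copied from the log excerpt, including the mail client's wrapping.
SAMPLE = """\
Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
corpus size: nspam = 1680, nham = 0
Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes: DB
expiry: tokens in DB: 154999, Expiry max size: 300000, Oldest atime:
1219096335, Newest atime: 1472206207, Last expire: 1471602636,
Current time: 1472299481
"""


def normalize(log_text):
    """Collapse wrapped lines so patterns can match across line breaks."""
    return " ".join(log_text.split())


def corpus_sizes(log_text):
    """Return the last reported (nspam, nham) pair, or None if absent."""
    matches = CORPUS_RE.findall(normalize(log_text))
    if not matches:
        return None
    nspam, nham = matches[-1]
    return int(nspam), int(nham)


def bayes_ready(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True only if both corpora reach the (assumed) minimum sizes."""
    return nspam >= min_spam and nham >= min_ham


def atime_range(log_text):
    """Return (oldest, newest) token access times as UTC datetimes, or None."""
    match = ATIME_RE.search(normalize(log_text))
    if match is None:
        return None
    oldest, newest = (
        datetime.fromtimestamp(int(ts), tz=timezone.utc) for ts in match.groups()
    )
    return oldest, newest


if __name__ == "__main__":
    nspam, nham = corpus_sizes(SAMPLE)
    print("nspam:", nspam, "nham:", nham, "ready:", bayes_ready(nspam, nham))
    oldest, newest = atime_range(SAMPLE)
    print("oldest token:", oldest.date(), "newest token:", newest.date())
```

With nham = 0 the readiness check fails under those assumed defaults, which suggests Bayes will stay out of scoring until enough ham has been learned as well. The atime conversion also shows how old the oldest tokens in the copied database are.
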

It seems things are more under control, since a lot of messages are no 
longer automatically designated as "ham". I don't see spam detections, but 
at least I don't see spam wrongly classified (and auto-learned!) as ham either!

I am waiting for user feedback and I am trying to monitor spam filtering 
behavior as much as I can.

Any and all additional advice will be appreciated!

With regard to the comments on rpmforge repo obsolescence, you are 
right, but I am afraid there is no easy way to currently switch to EPEL 
packages, because, as far as I remember, the respective amavisd-new / 
clamd / spamassassin EPEL packages are not using the same paths / 
structure / setup, and I don't want to mess things up.

When we rebuild a new system as a successor, it will probably be using 
CentOS 8 (probably in two years or so)... Then we will use EPEL for 
sure! (We currently have a lot of CentOS 5 systems to rebuild using 
CentOS 7 in the immediate future...)

Thanks again,
Nick




More information about the amavis-users mailing list