Increasing spam filtering with spamassassin
Nikolaos Milas
nmilas at noa.gr
Fri Aug 26 23:15:23 CEST 2016
On 26/8/2016 9:48 PM, Dino Edwards wrote:
> First question: how many ham/spam messages have you used to train
> Bayes? It needs at least 200 of each to even start working. Also, a
> BAYES_00=-1.9 score usually points to poor training of the Bayes
> database. Basically, it means that those spam messages look a lot
> like what SpamAssassin has been told are ham messages.
>
> How are you training the database?
>
Thank you Dino for your reply.
I collect spam mails in .eml format from users and upload them (usually
via FTP) into a dedicated, initially *empty* directory
(/root/reported-spam) on the server.
After each upload of new messages, I run:
# sa-learn --spam /root/reported-spam
Learned tokens from 18 message(s) (18 message(s) examined)
After running the above command, I empty the directory
(/root/reported-spam) so that it is ready for the next upload of spam mails.
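As a rough sketch, the learn-then-empty step could be wrapped in a small shell function so that the directory is only emptied when sa-learn exits successfully. The function name train_spam_dir and the SA_LEARN variable are illustrative, not part of SpamAssassin, and the sketch assumes that sa-learn, as invoked, updates the same Bayes database that amavisd reads.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and train_spam_dir are hypothetical names.
# SA_LEARN can be overridden to point at a different sa-learn binary.
SA_LEARN="${SA_LEARN:-sa-learn}"

train_spam_dir() {
    dir="$1"
    # Refuse to run on a missing directory.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Learn every message in the directory as spam; stop if that fails.
    "$SA_LEARN" --spam "$dir" || return 1
    # Learning succeeded, so empty the directory for the next upload.
    find "$dir" -type f -exec rm -f {} +
}

# Example call (as root, matching the manual steps):
#   train_spam_dir /root/reported-spam
```

If sa-learn fails, the uploaded messages stay in place and can be retried.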
I do not train for ham. I once did that in the past when some messages
were misclassified as spam.
Today I tried adding the following to /etc/mail/spamassassin/local.cf:
bayes_min_ham_num 0
bayes_min_spam_num 0
to make sure that these thresholds do not stop Bayesian filtering.
Finally, I also increased the logging level (in /etc/amavisd.conf):
$log_level = 3;
$sa_debug = 'bayes';
while trying to find more details on what is happening, and I noticed
messages like:
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_toks
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_seen
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: found bayes db version 3
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: corpus size: nspam = 3440, nham = 717405
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: untie-ing
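The "corpus size" debug line shows how lopsided the database is. A minimal sketch for pulling the two counts out of such lines and printing the ham-to-spam ratio follows; corpus_ratio is a hypothetical helper name, and the field layout is assumed to match the debug line shown in this message.

```shell
#!/bin/sh
# Sketch only: corpus_ratio is a hypothetical helper, not an amavisd or
# SpamAssassin tool. It reads log lines on stdin.
corpus_ratio() {
    awk '/corpus size: nspam/ {
        # The count sits two fields after each label ("nspam = 3440,").
        for (i = 1; i <= NF; i++) {
            if ($i == "nspam") nspam = $(i + 2) + 0   # "+ 0" drops the trailing comma
            if ($i == "nham")  nham  = $(i + 2) + 0
        }
        if (nspam > 0)
            printf "nspam=%d nham=%d ham_per_spam=%.1f\n", nspam, nham, nham / nspam
    }'
}

# Example call:
#   grep 'corpus size' /var/log/amavisd.log | tail -n 1 | corpus_ratio
```

For the counts logged here (nspam = 3440, nham = 717405) this prints nspam=3440 nham=717405 ham_per_spam=208.5, i.e. roughly 208 learned ham messages for every learned spam message.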
Interestingly, all Bayesian scores are extremely low (zero or very close to zero):
# grep 'bayes: score =' /var/log/amavisd.log
Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
Aug 26 20:14:49 mailgw3 amavis[24794]: (24794-02) SA dbg: bayes: score = 0
Aug 26 20:35:50 mailgw3 amavis[24794]: (24794-14) SA dbg: bayes: score = 0
Aug 26 20:39:41 mailgw3 amavis[24794]: (24794-15) SA dbg: bayes: score = 1.11022302462516e-16
Aug 26 20:49:38 mailgw3 amavis[24794]: (24794-19) SA dbg: bayes: score = 5.55111512312578e-17
Aug 26 21:19:55 mailgw3 amavis[25627]: (25627-12) SA dbg: bayes: score = 0
Aug 26 21:35:48 mailgw3 amavis[26085]: (26085-03) SA dbg: bayes: score = 1.11022302462516e-16
Aug 26 21:35:54 mailgw3 amavis[26087]: (26087-03) SA dbg: bayes: score = 0
Aug 26 21:46:37 mailgw3 amavis[26085]: (26085-09) SA dbg: bayes: score = 1.77635683940025e-15
Aug 26 21:53:17 mailgw3 amavis[26085]: (26085-12) SA dbg: bayes: score = 1.88737914186277e-15
Aug 26 22:07:24 mailgw3 amavis[26087]: (26087-15) SA dbg: bayes: score = 5.32351940307763e-14
Aug 26 22:49:35 mailgw3 amavis[26691]: (26691-16) SA dbg: bayes: score = 0
Aug 26 23:01:04 mailgw3 amavis[27067]: (27067-02) SA dbg: bayes: score = 5.55111512312578e-17
Aug 26 23:13:18 mailgw3 amavis[27065]: (27065-05) SA dbg: bayes: score = 0
Aug 26 23:30:51 mailgw3 amavis[27065]: (27065-11) SA dbg: bayes: score = 0
Aug 26 23:35:49 mailgw3 amavis[27065]: (27065-12) SA dbg: bayes: score = 2.22044604925031e-16
Aug 26 23:56:13 mailgw3 amavis[27067]: (27067-20) SA dbg: bayes: score = 2.43476726314862e-05
Aug 26 23:59:58 mailgw3 amavis[27673]: (27673-02) SA dbg: bayes: score = 0
Aug 27 00:02:39 mailgw3 amavis[27707]: (27707-02) SA dbg: bayes: score = 2.22044604925031e-16
Aug 27 00:04:33 mailgw3 amavis[27707]: (27707-03) SA dbg: bayes: score = 1.11022302462516e-16
Aug 27 00:05:25 mailgw3 amavis[27673]: (27673-04) SA dbg: bayes: score = 1.11022302462516e-16
Aug 27 00:06:45 mailgw3 amavis[27673]: (27673-05) SA dbg: bayes: score = 5.55111512312578e-17
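To summarize such a list instead of reading it by eye, the grep output could be piped through a small awk filter. This is only a sketch: bayes_score_summary is a hypothetical helper name, and the 0.01 cutoff assumes the stock rule definitions, in which BAYES_00 covers probabilities below 0.01.

```shell
#!/bin/sh
# Sketch only: bayes_score_summary is a hypothetical helper.
# It expects lines ending in "score = <number>" on stdin.
bayes_score_summary() {
    awk 'NF > 0 {
        score = $NF + 0          # last field is the Bayes probability
        total++
        if (score < 0.01) low++  # assumed BAYES_00 range
    }
    END { printf "total=%d below_0.01=%d\n", total, low }'
}

# Example call:
#   grep 'bayes: score =' /var/log/amavisd.log | bayes_score_summary
```

All 22 values listed above are below 0.01; the largest is 2.43476726314862e-05.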
Any ideas?
Nick