Increasing spam filtering with spamassassin

Nikolaos Milas nmilas at noa.gr
Fri Aug 26 23:15:23 CEST 2016


On 26/8/2016 9:48 PM, Dino Edwards wrote:

> First question I have: how many ham/spam messages have you used to
> train Bayes? You need at least 200 of each for it to even start
> working. Also, a BAYES_00=-1.9 score usually points to poor training of
> the Bayes database. Basically, it means that those spam messages look a
> lot like what SpamAssassin has been told are ham messages.
>
> How are you training the database?
>

Thank you Dino for your reply.

I collect spam messages from users in .eml format and upload them
(usually via FTP) into a dedicated, initially *empty* directory
(/root/reported-spam) on the server.

After each upload of new messages, I run:

    # sa-learn --spam /root/reported-spam
    Learned tokens from 18 message(s) (18 message(s) examined)

After running this command, I empty the directory (/root/reported-spam)
so that it is ready for the next upload of spam messages.
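
The learn-then-empty routine above could be wrapped roughly as follows. This is only a sketch under stated assumptions: learn_reported_spam and the SA_LEARN variable are hypothetical names introduced here, and the sketch assumes that sa-learn, run this way, writes to the same Bayes database that amavisd reads (the debug log later in this message shows /var/amavis/var/.spamassassin), which is worth verifying separately. The directory is emptied only if sa-learn exits successfully.

```shell
#!/bin/sh
# Sketch only: the function name and SA_LEARN are hypothetical.
# SA_LEARN lets the command be swapped out (for example in a dry run).
SA_LEARN="${SA_LEARN:-sa-learn}"

learn_reported_spam() {
    spam_dir="$1"

    # Nothing to learn if the directory contains no files.
    if [ -z "$(find "$spam_dir" -type f 2>/dev/null | head -n 1)" ]; then
        echo "no messages in $spam_dir"
        return 0
    fi

    # Same invocation as in the message: sa-learn --spam <directory>.
    if "$SA_LEARN" --spam "$spam_dir"; then
        # Empty the directory only after a successful run.
        find "$spam_dir" -type f -exec rm -f {} +
    else
        echo "sa-learn failed; leaving $spam_dir untouched" >&2
        return 1
    fi
}
```

It would be called as `learn_reported_spam /root/reported-spam` after each upload.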

I do not train on ham. I did so only once, in the past, when some
messages were misclassified as spam.

Today I tried adding the following to /etc/mail/spamassassin/local.cf:

    bayes_min_ham_num   0
    bayes_min_spam_num  0

to make sure that these thresholds do not prevent Bayesian filtering 
from running.

Finally, I also increased the logging level (in /etc/amavisd.conf):

    $log_level = 3;
    $sa_debug = 'bayes';

while trying to find more details on what is happening, and I noticed 
messages like:

    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_toks
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: tie-ing to DB file R/O /var/amavis/var/.spamassassin/bayes_seen
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: found bayes db version 3
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: corpus size: nspam = 3440, nham = 717405
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: DB journal sync: last sync: 1472231151
    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: untie-ing
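
The "corpus size" entry in this excerpt reports how many spam and ham messages the Bayes database has recorded. A minimal sketch for extracting the two counts and their ratio from the log is given below. It assumes each log entry occupies a single line in the log file; corpus_ratio is a hypothetical helper name, not part of amavisd or SpamAssassin.

```shell
#!/bin/sh
# Sketch only: corpus_ratio is a hypothetical helper.
# Reads amavisd debug log lines on stdin and reports the last
# "corpus size" entry seen, plus the ham-to-spam ratio.
corpus_ratio() {
    awk '
        /bayes: corpus size:/ {
            # Expected form: "corpus size: nspam = 3440, nham = 717405"
            if (match($0, /nspam = [0-9]+/))
                nspam = substr($0, RSTART + 8, RLENGTH - 8) + 0
            if (match($0, /nham = [0-9]+/))
                nham = substr($0, RSTART + 7, RLENGTH - 7) + 0
        }
        END {
            if (nspam > 0)
                printf "nspam=%d nham=%d ham_per_spam=%.1f\n", nspam, nham, nham / nspam
        }
    '
}
```

For example, `grep 'bayes: corpus size' /var/log/amavisd.log | tail -n 1 | corpus_ratio` would, for the entry quoted above, print `nspam=3440 nham=717405 ham_per_spam=208.5`.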

Interestingly, every Bayes score logged is at or very near zero:

    # grep 'bayes: score =' /var/log/amavisd.log

    Aug 26 20:14:48 mailgw3 amavis[24795]: (24795-01) SA dbg: bayes: score = 0
    Aug 26 20:14:49 mailgw3 amavis[24794]: (24794-02) SA dbg: bayes: score = 0
    Aug 26 20:35:50 mailgw3 amavis[24794]: (24794-14) SA dbg: bayes: score = 0
    Aug 26 20:39:41 mailgw3 amavis[24794]: (24794-15) SA dbg: bayes: score = 1.11022302462516e-16
    Aug 26 20:49:38 mailgw3 amavis[24794]: (24794-19) SA dbg: bayes: score = 5.55111512312578e-17
    Aug 26 21:19:55 mailgw3 amavis[25627]: (25627-12) SA dbg: bayes: score = 0
    Aug 26 21:35:48 mailgw3 amavis[26085]: (26085-03) SA dbg: bayes: score = 1.11022302462516e-16
    Aug 26 21:35:54 mailgw3 amavis[26087]: (26087-03) SA dbg: bayes: score = 0
    Aug 26 21:46:37 mailgw3 amavis[26085]: (26085-09) SA dbg: bayes: score = 1.77635683940025e-15
    Aug 26 21:53:17 mailgw3 amavis[26085]: (26085-12) SA dbg: bayes: score = 1.88737914186277e-15
    Aug 26 22:07:24 mailgw3 amavis[26087]: (26087-15) SA dbg: bayes: score = 5.32351940307763e-14
    Aug 26 22:49:35 mailgw3 amavis[26691]: (26691-16) SA dbg: bayes: score = 0
    Aug 26 23:01:04 mailgw3 amavis[27067]: (27067-02) SA dbg: bayes: score = 5.55111512312578e-17
    Aug 26 23:13:18 mailgw3 amavis[27065]: (27065-05) SA dbg: bayes: score = 0
    Aug 26 23:30:51 mailgw3 amavis[27065]: (27065-11) SA dbg: bayes: score = 0
    Aug 26 23:35:49 mailgw3 amavis[27065]: (27065-12) SA dbg: bayes: score = 2.22044604925031e-16
    Aug 26 23:56:13 mailgw3 amavis[27067]: (27067-20) SA dbg: bayes: score = 2.43476726314862e-05
    Aug 26 23:59:58 mailgw3 amavis[27673]: (27673-02) SA dbg: bayes: score = 0
    Aug 27 00:02:39 mailgw3 amavis[27707]: (27707-02) SA dbg: bayes: score = 2.22044604925031e-16
    Aug 27 00:04:33 mailgw3 amavis[27707]: (27707-03) SA dbg: bayes: score = 1.11022302462516e-16
    Aug 27 00:05:25 mailgw3 amavis[27673]: (27673-04) SA dbg: bayes: score = 1.11022302462516e-16
    Aug 27 00:06:45 mailgw3 amavis[27673]: (27673-05) SA dbg: bayes: score = 5.55111512312578e-17
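
To put a number on this, the grep output can be summarized with a short awk filter. The sketch below is illustrative only: score_summary is a hypothetical helper name, it assumes each log entry occupies a single line with the score as the last field, and the 0.01 cutoff is chosen on the assumption that BAYES_00 covers Bayes probabilities from 0 to 0.01.

```shell
#!/bin/sh
# Sketch only: score_summary is a hypothetical helper.
# Reads amavisd debug log lines on stdin and counts how many Bayes
# scores fall below 0.01, along with the highest score seen.
score_summary() {
    awk '
        /bayes: score = / {
            s = $NF + 0          # score is the last field on the line
            n++
            if (s < 0.01) low++
            if (n == 1 || s > max) max = s
        }
        END {
            printf "messages=%d below_0.01=%d max=%g\n", n, low, max
        }
    '
}
```

Usage would be `grep 'bayes: score =' /var/log/amavisd.log | score_summary`; if every message lands below the cutoff, the two counts printed are equal.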

Any ideas?

Nick

