Bayes filter only learns HAM but no SPAM

Dieter Scholz rd-disc at gmx.net
Mon Jun 22 09:36:38 CEST 2015


Hello,

I already posted this question on the SpamAssassin list, but nobody could 
help me. Perhaps this list is more appropriate.

I use Debian Jessie with the distribution packages of postfix, 
amavisd-milter, amavis and spamassassin. I configured a pre-queue filter 
setup where amavis is called through amavisd-milter using the AM.PDP 
protocol.

Everything works well except for one thing: the Bayes filter does not 
seem to auto-learn spam, although ham learning works fine.

Here is the output of sa-learn:

> Jun 22 09:15:38.416 [21616] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2f7a8b0), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
> Jun 22 09:15:38.435 [21616] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x35c6f48)
> Jun 22 09:15:38.435 [21616] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x2f7a8b0) implements 'learner_is_scan_available', priority 0
> Jun 22 09:15:38.435 [21616] dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
> Jun 22 09:15:38.436 [21616] dbg: bayes: tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
> Jun 22 09:15:38.436 [21616] dbg: bayes: found bayes db version 3
> Jun 22 09:15:38.437 [21616] dbg: config: score set 1 chosen.
> Jun 22 09:15:38.439 [21616] dbg: sa-learn: spamtest initialized
> Jun 22 09:15:38.439 [21616] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x2f7a8b0) implements 'learner_dump_database', priority 0
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0         31          0  non-token data: nspam
> 0.000          0       2863          0  non-token data: nham
> 0.000          0     155110          0  non-token data: ntokens
> 0.000          0 1434540257          0  non-token data: oldest atime
> 0.000          0 1434957301          0  non-token data: newest atime
> 0.000          0 1434956246          0  non-token data: last journal sync atime
> 0.000          0 1434929056          0  non-token data: last expiry atime
> 0.000          0     398389          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
> Jun 22 09:15:38.440 [21616] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x2f7a8b0) implements 'learner_close', priority 0
> Jun 22 09:15:38.440 [21616] dbg: bayes: untie-ing

The 31 spam mails (nspam) are there only because I trained them manually. 
As you can see, ham is learned automatically (nham is 2863), but spam is 
not.
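
For anyone who wants to watch these counters over time, the nspam/nham 
values can be pulled out of the dump with a few lines of Python. This is 
only a minimal sketch: the function name parse_bayes_magic is made up for 
illustration, and it assumes the dump lines look exactly like the ones 
quoted above (four numeric columns followed by "non-token data: <label>").

```python
def parse_bayes_magic(text):
    """Return {label: int} for the 'non-token data' lines of a Bayes dump.

    Hypothetical helper; assumes the column layout shown in the dump above.
    """
    prefix = "non-token data: "
    counters = {}
    for line in text.splitlines():
        # Drop mail-style "> " quoting and surrounding whitespace.
        line = line.lstrip("> ").strip()
        # Four numeric columns, then the free-text description.
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith(prefix):
            # The third column carries the value (count or timestamp).
            counters[parts[4][len(prefix):]] = int(parts[2])
    return counters


if __name__ == "__main__":
    import sys
    stats = parse_bayes_magic(sys.stdin.read())
    print("nspam:", stats.get("nspam"), "nham:", stats.get("nham"))
```

Fed with the dump above, this reports nspam 31 and nham 2863. Debug lines 
are ignored because they do not end in a "non-token data" description.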

I checked the logs and found several mails with a spam score above 10. 
(I am aware of the restriction that a mail is only auto-learned as spam 
if header rules and body rules each contribute at least 3 points.)

I suspect spam mails are not being fed into Bayes learning.

Here is a log extract of a mail session when spam is detected by 
Spamassassin:

> Jun 21 16:20:51 mailx1 postfix/smtpd[30631]: 5E3A2C0178: client=fall-lakeland.atl.sa.earthlink.net[207.69.195.103]
> Jun 21 16:20:51 mailx1 postfix/cleanup[30700]: 5E3A2C0178: message-id=<E1Z6g1Z-0002CR-00 at pop-canoe.atl.sa.earthlink.net>
> Jun 21 16:20:51 mailx1 amavis[13552]: (13552-08) Checking: tfC5YNQqzex2 AM.PDP-SOCK [207.69.195.103] <eBay at reply4.ebay.com> -> <XXX>
> Jun 21 16:20:51 mailx1 amavis[13552]: (13552-08) p001 1 Content-Type: text/html, size: 3922 B, name:
> Jun 21 16:20:57 mailx1 amavis[13552]: (13552-08) Blocked SPAM {RejectedInbound}, AM.PDP-SOCK [207.69.195.103] [69.34.169.160] <eBay at reply4.ebay.com> -> <XXX>, Queue-ID: 5E3A2C0178, Message-ID: <E1Z6g1Z-0002CR-00 at pop-canoe.atl.sa.earthlink.net>, mail_id: tfC5YNQqzex2, Hits: 24.854, size: 5145, 6087 ms
> Jun 21 16:20:57 mailx1 amavis[13552]: (13552-08) TIMING-SA total 6033 ms - parse: 1.78 (0.0%), extract_message_metadata: 18 (0.3%), get_uri_detail_list: 1.96 (0.0%), tests_pri_-1000: 14 (0.2%), tests_pri_-950: 1.35 (0.0%), tests_pri_-900: 1.41 (0.0%), tests_pri_-400: 28 (0.5%), check_bayes: 27 (0.4%), b_tokenize: 10 (0.2%), b_tok_get_all: 8 (0.1%), b_comp_prob: 6 (0.1%), b_tok_touch_all: 0.59 (0.0%), b_finish: 0.61 (0.0%), tests_pri_0: 5935 (98.4%), check_dkim_adsp: 6 (0.1%), check_spf: 77 (1.3%), poll_dns_idle: 11 (0.2%), check_razor2: 1389 (23.0%), check_dcc: 4278 (70.9%), check_pyzor: 43 (0.7%), tests_pri_500: 10 (0.2%), get_report: 0.78 (0.0%)
> Jun 21 16:20:57 mailx1 amavis[13552]: (13552-08) size: 5145, TIMING [total 6090 ms] - got data: 0.0 (0%)0, check_init: 1.3 (0%)0, digest_hdr: 0.9 (0%)0, digest_body_dkim: 0.2 (0%)0, collect_info: 2.4 (0%)0, mkdir parts: 0.7 (0%)0, mime_decode: 4.5 (0%)0, get-file-type1: 12 (0%)0, parts_decode: 0.2 (0%)0, check_header: 0.5 (0%)0, AV-scan-1: 7 (0%)0, AV-scan-2: 11 (0%)1, spam-wb-list: 0.8 (0%)1, SA msg read: 0.3 (0%)1, SA parse: 2.3 (0%)1, SA check: 6029 (99%)100, decide_mail_destiny: 7 (0%)100, notif-quar: 0.3 (0%)100, prepare-dsn: 0.6 (0%)100, report: 1.6 (0%)100, main_log_entry: 4.8 (0%)100, update_snmp: 1.7 (0%)100, rundown: 0.9 (0%)100
> Jun 21 16:20:57 mailx1 postfix/cleanup[30700]: 5E3A2C0178: milter-reject: END-OF-MESSAGE from fall-lakeland.atl.sa.earthlink.net[207.69.195.103]: 5.7.0 Reject, id=13552-08 - spam; from=<eBay at reply4.ebay.com> to=<susette.kleiner at rudolf.de> proto=ESMTP helo=<fall-lakeland.atl.sa.earthlink.net>
> Jun 21 16:20:57 mailx1 postfix/smtpd[30631]: disconnect from fall-lakeland.atl.sa.earthlink.net[207.69.195.103]

First of all, I cannot see the names of the rules that caused the high 
score of 24.854. Is this a configuration problem? Second, I do not see a 
line with the DSN code and autolearn=spam. Is this normal?

The following config extracts may be helpful in investigating the 
problem:

50-user (amavis):
> $sa_tag_level_deflt  = -99;
> $sa_tag2_level_deflt = 5.0;
> $sa_kill_level_deflt = 5.0;
> $sa_dsn_cutoff_level = 99;
>
> $final_virus_destiny  = D_REJECT;
> $final_banned_destiny = D_REJECT;
> $final_spam_destiny   = D_REJECT;
> $final_bad_header_destiny = D_PASS;

local.cf (spamassassin):
> bayes_auto_learn 1
> bayes_auto_learn_threshold_nonspam -0.001
> bayes_auto_learn_threshold_spam 7.0
> bayes_min_ham_num 20
> bayes_min_spam_num 20
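
To make clear what I expect to happen with these thresholds, here is a 
simplified sketch of the autolearn decision as I understand it from the 
SpamAssassin documentation. It is not SpamAssassin's actual code: the 
function name and its arguments are hypothetical, the exact handling of 
scores that sit precisely on a threshold is an assumption, and the real 
logic uses a score computed without the Bayes rules and applies further 
checks that are left out here.

```python
def autolearn_decision(score, head_points, body_points,
                       threshold_nonspam=-0.001, threshold_spam=7.0,
                       required_points=3.0):
    """Return 'ham', 'spam' or 'no' for a scanned mail.

    Simplified model only. 'score' stands for the autolearn score (computed
    without the Bayes rules); 'head_points' and 'body_points' are the points
    contributed by header rules and body rules respectively.
    """
    if score < threshold_nonspam:
        return "ham"
    if score >= threshold_spam:
        # A high score alone is not enough: header and body rules must
        # each contribute at least 'required_points'.
        if head_points >= required_points and body_points >= required_points:
            return "spam"
        return "no"
    # Between the two thresholds nothing is learned.
    return "no"
```

If this model is roughly right, a mail scoring 24.854 should be learned 
as spam as long as header and body rules each contribute at least 3 
points. I cannot verify that for the mail above because the rule names 
are missing from my log.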

I hope you can help me, because at the moment I'm really clueless.

Thanks in advance.

Dieter

