Is my Bayes working?

Dino Edwards dino.edwards at mydirectmail.net
Sat May 23 17:34:29 CEST 2020


bayes_auto_learn is probably working against you. You should never turn it on until you are absolutely sure your Bayes filter is trained correctly, which usually takes at least 200 spam and 200 ham messages. Personally, I never turn it on, even after I have trained my spam filter.
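
As a rough way to check the training volume, here is a minimal Python sketch that reads the counters printed by `sa-learn --dump magic` and compares them with SpamAssassin's default minimums (`bayes_min_spam_num` and `bayes_min_ham_num`, both 200 by default). The helper names `parse_magic` and `enough_training` are made up for illustration and are not part of SpamAssassin.

```python
import re

# SpamAssassin's default minimums before Bayes results are used
# (bayes_min_spam_num / bayes_min_ham_num). Adjust if local.cf overrides them.
MIN_SPAM = 200
MIN_HAM = 200

# A magic row looks like:
#   0.000          0       5785          0  non-token data: nspam
MAGIC_ROW = re.compile(r"\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")

def parse_magic(dump_text):
    """Return {label: value} for every 'non-token data' row in the dump."""
    counters = {}
    for line in dump_text.splitlines():
        m = MAGIC_ROW.match(line)
        if m:  # debug lines and wrapped continuation lines simply do not match
            counters[m.group(2)] = int(m.group(1))
    return counters

def enough_training(counters, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True if both the spam and ham message counts reach the minimums."""
    return counters.get("nspam", 0) >= min_spam and counters.get("nham", 0) >= min_ham
```

With the counters quoted in the original message (nspam 5785, nham 14487) this check passes. It only says the volume threshold is met; it says nothing about whether the messages were learned into the right category.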

What messages are you running your cron script against?

________________________________
From: sse450 <sse450 at gmail.com>
Sent: Saturday, May 23, 2020 4:07 AM
To: amavis-users at amavis.org
Subject: Is my Bayes working?

Hello,

I set up amavisd (2.12.0), spamassassin (3.4.2), postfix, and dovecot on
CentOS 8 about one month ago and run sa-learn every night as a crontab
entry. A considerable amount of data has accumulated in the database. But,
still, I get BAYES_00=-1.9 for a very spammy mail:

X-Spam-Flag: YES
X-Spam-Score: 29.813
X-Spam-Level: *****************************
X-Spam-Status: Yes, score=29.813 tagged_above=-999 required=3
tests=[AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_00=-1.9,
CUSTOM_DMARC_FAIL=2, DCC_CHECK=1.1, DCC_REPUT_70_89=0.1,
DIGEST_MULTIPLE=0.293, DKIM_ADSP_CUSTOM_MED=0.001, DMARC_NONE=0.1,
FORGED_GMAIL_RCVD=2.5, FORGED_MUA_OUTLOOK=1.927, FORM_FRAUD_5=0.001,
FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001,
FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25,
FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001,
FROM_MISSP_FREEMAIL=2.01,
FROM_MISSP_MSFT=0.001,FROM_MISSP_REPLYTO=1.717, FROM_MISSP_XPRIO=0.001,
FROM_NOT_REPLYTO=2, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001,
FSL_NEW_HELO_USER=0.001, HK_SCAM=0.001, KAM_DMARC_NONE=0.25,
KAM_DMARC_STATUS=0.01, MALFORMED_FREEMAIL=1.142, MISSING_HEADERS=1.021,
MISSING_MID=0.497, NML_ADSP_CUSTOM_MED=0.9, NSL_RCVD_HELO_USER=0.001,
PYZOR_CHECK=1.392,RCVD_IN_MSPIKE_BL=0.001, RCVD_IN_MSPIKE_L4=0.001,
RCVD_IN_RP_RNBL=1.31, RCVD_IN_SBL_CSS=3.335,
REPLYTO_WITHOUT_TO_CC=1.552, SPF_HELO_PASS=-0.001,
SPF_SOFTFAIL=0.665,SPOOFED_FREEMAIL=1.999, SPOOFED_FREEM_REPTO=0.693,
TO_NO_BRKTS_FROM_MSSP=1.655, TO_NO_BRKTS_MSFT=0.001,
T_DEAR_BENEFICIARY=0.01, T_FILL_THIS_FORM_SHORT=0.01,
T_HK_NAME_FM_MR_MRS=0.01] autolearn=no autolearn_force=no
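
To pick the Bayes result out of a long header like the one above, a small Python sketch along these lines could be used. The function names `parse_tests` and `bayes_rules` are hypothetical helpers, not part of amavisd or SpamAssassin. BAYES_00 is the rule for a Bayes spam probability of 0 to 1%, so seeing it in the list suggests Bayes was consulted and rated the message as ham-like.

```python
import re

def parse_tests(status_header):
    """Return {rule_name: score} from the tests=[...] part of X-Spam-Status."""
    m = re.search(r"tests=\[(.*?)\]", status_header, re.DOTALL)
    if not m:
        return {}
    scores = {}
    for item in m.group(1).split(","):
        name, sep, value = item.strip().partition("=")
        if not sep:
            continue  # skip empty fragments or entries without a score
        scores[name.strip()] = float(value)
    return scores

def bayes_rules(scores):
    """Keep only the BAYES_* rules, e.g. {'BAYES_00': -1.9}."""
    return {name: s for name, s in scores.items() if name.startswith("BAYES_")}
```

The parser tolerates folded header lines and missing spaces after commas, both of which occur in the header above.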

It seems to me that Bayes is not working, but I don't know why. Here is
some info from my server:

/etc/mail/spamassassin/local.cf:

# bayes
use_bayes           1
bayes_auto_learn    1
bayes_auto_expire   1
# Store bayesian data in MySQL
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn      DBI:mysql:sa_bayes:127.0.0.1:3306
bayes_sql_username sa_bayes
bayes_sql_password xxxxxxxxx
bayes_sql_override_username amavis

root at winsvr:/# sa-learn -D --dump magic

May 23 09:57:00.510 [23968] dbg: config: read file
/etc/mail/spamassassin/local.cf
...
May 23 09:57:02.270 [23968] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x5621708f6b48) implements
'learner_new', priority 0
May 23 09:57:02.270 [23968] dbg: bayes: learner_new
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x5621708f6b48),
bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
May 23 09:57:02.293 [23968] dbg: bayes: using username: amavis
May 23 09:57:02.293 [23968] dbg: bayes: learner_new: got
store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x5621725b6cd0)
May 23 09:57:02.293 [23968] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x5621708f6b48) implements
'learner_is_scan_available', priority 0
May 23 09:57:02.304 [23968] dbg: bayes: database connection established
May 23 09:57:02.304 [23968] dbg: bayes: found bayes db version 3
May 23 09:57:02.305 [23968] dbg: bayes: Using userid: 1
May 23 09:57:02.305 [23968] dbg: config: score set 3 chosen.
May 23 09:57:02.306 [23968] dbg: dns: EDNS, UDP payload size 4096
May 23 09:57:02.306 [23968] dbg: dns: servers obtained from Net::DNS :
[xxx.162.133.5]:53, [xxx.162.130.5]:53, [xxx.162.137.5]:53
May 23 09:57:02.306 [23968] dbg: dns: nameservers set to xxx.162.133.5,
xxx.162.130.5, xxx.162.137.5
May 23 09:57:02.307 [23968] dbg: dns: using socket module:
IO::Socket::IP version 0.39
May 23 09:57:02.307 [23968] dbg: dns: is Net::DNS::Resolver available? yes
May 23 09:57:02.307 [23968] dbg: dns: Net::DNS version: 1.15
May 23 09:57:02.307 [23968] dbg: sa-learn: spamtest initialized
May 23 09:57:02.307 [23968] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x5621708f6b48) implements
'learner_dump_database', priority 0
0.000          0          3          0  non-token data: bayes db version
0.000          0       5785          0  non-token data: nspam
0.000          0      14487          0  non-token data: nham
0.000          0     323279          0  non-token data: ntokens
0.000          0 1587406453          0  non-token data: oldest atime
0.000          0 1590215255          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1590176626          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     202221          0  non-token data: last expire reduction count
May 23 09:57:02.308 [23968] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x5621708f6b48) implements
'learner_close', priority 0
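
The atime values in the dump are Unix timestamps in seconds, which are hard to read at a glance. A minimal Python sketch for converting them to UTC follows; `atime_to_utc` is a name chosen here for illustration.

```python
from datetime import datetime, timezone

def atime_to_utc(epoch_seconds):
    """Format a Unix timestamp (seconds) as a UTC date and time string."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")

# Values copied from the dump above.
oldest = atime_to_utc(1587406453)   # oldest atime
newest = atime_to_utc(1590215255)   # newest atime
print("oldest token access:", oldest)
print("newest token access:", newest)
```

The oldest atime works out to 2020-04-20 and the newest to 2020-05-23, which matches a database that has been in use for about a month. The expire atime delta of 43200 is in seconds, that is, 12 hours.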

root at winsvr:~# su amavis -c 'sa-learn -D --dump magic'

plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Plugin/SpamCop.pm:
lib/Mail/SpamAssassin/Plugin/SpamCop.pm: Permission denied at (eval 50)
line 1.
plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm:
lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm: Permission denied at
(eval 51) line 1.
...
ERROR: Bayes dump returned an error, please re-run with -D for more
information

If I run

chown -R amavis.amavis /usr/share/perl5/vendor_perl/Mail/SpamAssassin

then:

root at winsvr:/# su amavis -c 'sa-learn -D --dump magic'

May 23 10:05:47.129 [24046] dbg: config: read file
/etc/mail/spamassassin/local.cf
...
May 23 10:05:48.785 [24046] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x55a459ba3a08) implements
'learner_new', priority 0
May 23 10:05:48.785 [24046] dbg: bayes: learner_new
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x55a459ba3a08),
bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
May 23 10:05:48.808 [24046] dbg: bayes: using username: amavis
May 23 10:05:48.808 [24046] dbg: bayes: learner_new: got
store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x55a45ad4d5a8)
May 23 10:05:48.808 [24046] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x55a459ba3a08) implements
'learner_is_scan_available', priority 0
May 23 10:05:48.818 [24046] dbg: bayes: database connection established
May 23 10:05:48.819 [24046] dbg: bayes: found bayes db version 3
May 23 10:05:48.819 [24046] dbg: bayes: Using userid: 1
May 23 10:05:48.819 [24046] dbg: config: score set 3 chosen.
May 23 10:05:48.820 [24046] dbg: dns: EDNS, UDP payload size 4096
May 23 10:05:48.821 [24046] dbg: dns: servers obtained from Net::DNS :
[xxx.162.133.5]:53, [xxx.162.130.5]:53, [xxx.162.137.5]:53
May 23 10:05:48.821 [24046] dbg: dns: nameservers set to xxx.162.133.5,
xxx.162.130.5, xxx.162.137.5
May 23 10:05:48.821 [24046] dbg: dns: using socket module:
IO::Socket::IP version 0.39
May 23 10:05:48.821 [24046] dbg: dns: is Net::DNS::Resolver available? yes
May 23 10:05:48.821 [24046] dbg: dns: Net::DNS version: 1.15
May 23 10:05:48.821 [24046] dbg: sa-learn: spamtest initialized
May 23 10:05:48.821 [24046] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x55a459ba3a08) implements
'learner_dump_database', priority 0
0.000          0          3          0  non-token data: bayes db version
0.000          0       5785          0  non-token data: nspam
0.000          0      14487          0  non-token data: nham
0.000          0     323279          0  non-token data: ntokens
0.000          0 1587406453          0  non-token data: oldest atime
0.000          0 1590215255          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1590176626          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     202221          0  non-token data: last expire reduction count
May 23 10:05:48.822 [24046] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x55a459ba3a08) implements
'learner_close', priority 0

Interestingly, even after the chown, su amavis -c 'sa-learn -D --dump magic'
still sometimes gives a permission denied error.

root at winsvr:/# sa-learn -D --spam
/usr/share/doc/spamassassin/sample-spam.txt
...
May 23 10:27:35.496 [24330] dbg: bayes:
31dcbefd2524b07c65d551d282ce77902f3804c7 at sa_generated already learnt
correctly, not learning twice
Learned tokens from 0 message(s) (1 message(s) examined)
May 23 10:27:35.496 [24330] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x5573c2433810) implements
'learner_close', priority 0

root at winsvr:/# su amavis  -c 'sa-learn -D --spam
/usr/share/doc/spamassassin/sample-spam.txt'
...
May 23 10:18:38.666 [24267] dbg: bayes:
31dcbefd2524b07c65d551d282ce77902f3804c7 at sa_generated already learnt
correctly, not learning twice
Learned tokens from 0 message(s) (1 message(s) examined)
May 23 10:18:38.666 [24267] dbg: plugin:
Mail::SpamAssassin::Plugin::Bayes=HASH(0x561762cc7150) implements
'learner_close', priority 0

Is BAYES_00=-1.9 normal for the sample spam email? Perhaps I need
to accumulate more training data. Or does something look wrong with my setup?

I would appreciate any help.

Thank you.

