Increasing spam filtering with spamassassin

Sat Aug 27 19:00:17 CEST 2016

>   Aug 27 15:04:41 mailgw3 amavis[7963]: (07963-15) SA dbg: bayes:
>   corpus size: nspam = 1680, nham = 0

This looks much better. Now you should train it with some ham messages too (at least a thousand or two). If, and only if, you or your users maintain a clean inbox or archive, it is just a matter of training the system on that. The only important part is that you should be reasonably sure that such inboxes/archives do *NOT* contain any spam.
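
As a rough sketch of what that training step could look like, the snippet below only builds the sa-learn command lines; it does not run them. The helper name and the Maildir path are made up for illustration. On an amavis setup the Bayes database usually belongs to the amavis user, so the actual command would most likely have to run as that user (an assumption about your installation).

```python
import shlex

def sa_learn_cmd(kind, path, mbox=False):
    """Build an sa-learn command line for training on ham or spam."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        # Treat the path as an mbox file instead of a directory of messages.
        cmd.append("--mbox")
    cmd.append(path)
    return cmd

# Hypothetical path: a Maildir folder that is known to be free of spam.
ham_cmd = sa_learn_cmd("ham", "/home/alice/Maildir/cur")
print(shlex.join(ham_cmd))

# To really train, run it as the user owning the Bayes database, e.g.:
#   subprocess.run(ham_cmd, check=True)
```

Running `sa-learn --dump magic` afterwards should show the updated nspam/nham counters.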

Once you have about 2k samples each of ham and spam, you can probably just train the system further on the misclassified e-mails only (either ham that ended up in a spam folder, or spam that got through). This will lead to better Bayes assessments over time, and then you can start increasing the weight of the corresponding rules (BAYES_00, BAYES_05, etc.).
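
If you want to keep an eye on that threshold, here is a minimal sketch that pulls the counters out of a "corpus size" debug line like the one you posted and checks them against the ~2k target. The function names and the default target are my own choices, not anything SpamAssassin or amavis provides.

```python
import re

# Matches the counters in a line such as:
#   "... SA dbg: bayes: corpus size: nspam = 1680, nham = 0"
CORPUS_RE = re.compile(r"nspam\s*=\s*(\d+),\s*nham\s*=\s*(\d+)")

def corpus_counts(log_text):
    """Return (nspam, nham) found in the log text, or None if absent."""
    match = CORPUS_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def ready_for_corrections_only(nspam, nham, target=2000):
    """True once both corpora reach the target size (assumed ~2k here)."""
    return nspam >= target and nham >= target

line = "SA dbg: bayes: corpus size: nspam = 1680, nham = 0"
nspam, nham = corpus_counts(line)
print(nspam, nham, ready_for_corrections_only(nspam, nham))
```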

Just keep in mind that BAYES_00 means "the Bayes analysis puts the probability of this e-mail being spam at 1% or less", i.e. "Bayes is pretty confident that this is NOT spam". In contrast, BAYES_99 matching means "Bayes is at least 99% sure that this is a spam message". As an example, this is what I got on your last e-mail:

X-Spam-Status: No, score=-106.9 required=3 tests=[BAYES_00=-1.9,
	RCVD_IN_DNSWL_HI=-5, SHORTCIRCUIT=-100] autolearn=disabled
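
To make it easier to see how much each rule contributes, here is a small sketch that splits such an X-Spam-Status value into its parts. It assumes the header has the same layout as the one above (rule names in upper case, each followed by "=score"); the function name is invented for this example.

```python
import re

def parse_spam_status(value):
    """Split an X-Spam-Status value into verdict, score, threshold, rules."""
    # Unfold continuation lines and collapse runs of whitespace.
    flat = " ".join(value.split())
    verdict = flat.split(",", 1)[0].strip()
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", flat).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", flat).group(1))
    # Rule names are upper case, so score=, required= and autolearn= are skipped.
    rules = {
        name: float(points)
        for name, points in re.findall(r"([A-Z][A-Z0-9_]*)=(-?\d+(?:\.\d+)?)", flat)
    }
    return verdict, score, required, rules

status = """No, score=-106.9 required=3 tests=[BAYES_00=-1.9,
\tRCVD_IN_DNSWL_HI=-5, SHORTCIRCUIT=-100] autolearn=disabled"""

verdict, score, required, rules = parse_spam_status(status)
for name, points in rules.items():
    print(name, points)
# The total score is the sum of the individual rule scores.
print(abs(sum(rules.values()) - score) < 1e-9)
```

In this header BAYES_00 contributes -1.9 of the total; that per-rule value is the one you would adjust once the Bayes database is well trained.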

Marc.