Improving filtering of SPAM...

Thu Jun 20 22:40:54 CEST 2013

On 6/20/2013 4:32 PM, Nick Rosier wrote:
> The core problem is that, from what I see from the logs the score for
> the spam messages gets rated lower than zero so the system is learning
> these messages as ham and as such the filter becomes less effective as
> time goes on.

That's a problem that needs to be corrected. It sounds as though your
Bayes database is now "borked", for lack of a better term, and you will
probably need to retrain it from scratch.

First, I would disable auto-learn in SpamAssassin. It has ruined many
a Bayes database because the default autolearn-as-ham threshold was set
too low in earlier versions (the developers and maintainers agree on
this subject, and I believe this has been rectified of late).
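
For what it's worth, auto-learn is controlled by the bayes_auto_learn
setting in your SpamAssassin configuration (typically local.cf, though
where that file lives depends on your install). A rough sketch of
switching it off in a config file's text follows; the helper name is
my own invention, and if you run spamd it will probably need a restart
to pick up the change:

```python
import re

# Matches an existing "bayes_auto_learn <value>" line, but not the
# related bayes_auto_learn_threshold_* settings or commented-out lines.
_AUTO_LEARN = re.compile(r"^[ \t]*bayes_auto_learn[ \t]+\S+.*$", re.MULTILINE)


def disable_auto_learn(conf_text):
    """Return conf_text with Bayes auto-learning switched off.

    Hypothetical helper for illustration; it only edits text and does
    not touch any file or restart any daemon.
    """
    setting = "bayes_auto_learn 0"
    if _AUTO_LEARN.search(conf_text):
        # Rewrite every existing setting in place.
        return _AUTO_LEARN.sub(setting, conf_text)
    # No existing setting: append one, keeping the text newline-terminated.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + setting + "\n"
```

You would read your local.cf, pass its contents through this, and
write the result back (after taking a backup, naturally).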

I would empty and then hand-train your Bayes database with a few thousand
hams and spams, and then enable auto-learn once you fully understand the
implications of doing so.
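
The emptying and hand-training can be done with sa-learn. Here is a
minimal sketch of the sequence, assuming your hand-sorted mail sits in
two directories with one message per file (the directory names and
function names are placeholders of mine; mbox files would need
sa-learn's --mbox option instead). Run it as whichever user owns the
Bayes database:

```python
import subprocess


def retrain_commands(ham_dir, spam_dir):
    """Build the sa-learn invocations for a from-scratch retrain."""
    return [
        ["sa-learn", "--clear"],           # wipe the existing Bayes database
        ["sa-learn", "--ham", ham_dir],    # learn the hand-sorted ham
        ["sa-learn", "--spam", spam_dir],  # learn the hand-sorted spam
        ["sa-learn", "--dump", "magic"],   # print token/message counts to check
    ]


def retrain(ham_dir, spam_dir):
    """Run each step in order, stopping at the first failure."""
    for argv in retrain_commands(ham_dir, spam_dir):
        subprocess.run(argv, check=True)
```

Calling retrain("corpus/ham", "corpus/spam") would then rebuild the
database in one go; the final dump lets you confirm that the ham and
spam counts look sane before you trust the results.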

Another point that cannot be stressed enough is to *retain your corpus*
when you hand-sort ham/spam! You will save yourself a great deal of
time, frustration, and headaches if you follow this one simple rule
(provided your country's privacy laws permit it). Taking this measure
ensures that if there's a problem with your Bayes DB, you can easily
re-train it using your stored corpus.
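
One way to keep such a corpus, sketched under my own assumptions (a
corpus directory with ham/ and spam/ subdirectories, and files named
after a hash of their contents so the same message is never stored
twice), is to copy each message as you sort it:

```python
import hashlib
import shutil
from pathlib import Path


def file_into_corpus(message, label, corpus_root):
    """Copy a hand-sorted message into corpus_root/ham or corpus_root/spam.

    Hypothetical helper: the layout and the .eml suffix are illustrative
    choices, not anything SpamAssassin requires.
    """
    if label not in ("ham", "spam"):
        raise ValueError("label must be 'ham' or 'spam'")
    dest_dir = Path(corpus_root) / label
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the copy after its content hash so duplicates collapse.
    digest = hashlib.sha256(Path(message).read_bytes()).hexdigest()
    dest = dest_dir / (digest + ".eml")
    if not dest.exists():
        shutil.copy2(message, dest)
    return dest
```

The resulting ham/ and spam/ directories are exactly what you would
later hand to sa-learn --ham and sa-learn --spam if the database ever
needs rebuilding.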

Good luck out there!

-Ben