Improving filtering of SPAM...

Fri Jun 21 02:57:46 CEST 2013

I see. Makes sense. Thank you. Lots of work to do...

Regards,

Christopher Kurtis Koeber

-----Original Message-----
From: amavis-users [mailto:amavis-users-bounces+ckoeber=gmail.com at amavis.org] On Behalf Of Ben Johnson
Sent: Thursday, June 20, 2013 4:41 PM
To: amavis-users at amavis.org
Subject: Re: Improving filtering of SPAM...

On 6/20/2013 4:32 PM, Nick Rosier wrote:
> The core problem is that, from what I see from the logs the score for 
> the spam messages gets rated lower than zero so the system is learning 
> these messages as ham and as such the filter becomes less effective as 
> time goes on.

That's a problem that needs to be corrected. It sounds as though your Bayes
database is now "borked", for lack of a better term, and you will probably
need to retrain it from scratch.

Firstly, I would disable auto-learn in SpamAssassin. It has ruined many a
Bayes database because the default autolearn-as-ham threshold was set too low
in earlier versions (the developers and maintainers agree on this subject,
and I believe this has been rectified of late).
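
To make the failure mode concrete, here is a rough sketch of how a score
threshold decides what gets auto-learned. It is a simplification, not
SpamAssassin's actual code: the real check uses a score computed without the
Bayes rules and applies extra conditions before learning a message as spam.
The two constants are the documented defaults of
bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam; the
function name is made up for illustration. Auto-learn itself is switched off
by setting bayes_auto_learn to 0 in local.cf.

```python
# Documented SpamAssassin defaults; adjust if your local.cf overrides them.
HAM_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Hypothetical helper: return 'ham', 'spam', or 'no' for a given score.

    Simplified model only. A message scoring below the ham threshold is
    learned as ham, one at or above the spam threshold is learned as spam,
    and anything in between is left alone.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# A spam message that slips through with a negative score is learned as ham,
# which is exactly the situation described in the quoted message.
print(autolearn_decision(-0.5))
print(autolearn_decision(5.0))
print(autolearn_decision(15.0))
```

Under this model, every spam that scores below the ham threshold pushes the
database further in the wrong direction, so the damage compounds over time.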

I would empty your Bayes database, hand-train it on a few thousand hams and
spams, and re-enable auto-learn only once you fully understand the
implications of doing so.

Another point that cannot be stressed enough is to *retain your corpus* when
you hand-sort ham/spam! You will save yourself innumerable headaches, time,
and frustration if you follow this one simple rule (provided your country's
privacy laws permit). Taking this measure ensures that if there's a problem
with your Bayes DB, you can easily re-train it using your stored corpus.
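
As a minimal sketch of what re-training from a stored corpus could look like,
the script below wipes the Bayes database and then feeds it the hand-sorted
ham and spam. It relies on three sa-learn options (--clear, --ham, --spam).
The directory paths and function names are assumptions made up for this
example, and it needs to run as whichever user owns the Bayes database (with
amavisd-new that is typically the amavis user, not root).

```python
import subprocess


def build_retrain_commands(ham_dir, spam_dir, sa_learn="sa-learn"):
    """Hypothetical helper: list the sa-learn calls for a from-scratch retrain."""
    return [
        [sa_learn, "--clear"],               # wipe the existing Bayes database
        [sa_learn, "--ham", str(ham_dir)],   # learn every message in ham_dir as ham
        [sa_learn, "--spam", str(spam_dir)], # learn every message in spam_dir as spam
    ]


def retrain(ham_dir, spam_dir, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in build_retrain_commands(ham_dir, spam_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example corpus locations; these paths are placeholders, not a convention.
    retrain("/var/spool/corpus/ham", "/var/spool/corpus/spam", dry_run=True)
```

The dry run is the default so that the destructive --clear step only happens
once the printed commands have been checked.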

Good luck out there!

-Ben