The nonsense of training spam filters and spam folders [was: training spamsassin]

Daemon A daemoncesar at hotmail.com
Tue Feb 24 09:36:28 CET 2015


When the user clicks "Not Spam" in Thunderbird, the e-mail goes to the inbox.
I would like to add the senders of these inbox e-mails to a whitelist.
> Date: Tue, 24 Feb 2015 10:55:21 +0300
> From: ta at geuka.net
> To: amavis-users at amavis.org
> Subject: The nonsense of training spam filters and spam folders [was: training spamsassin]
> 
> 
> > I would like to run a script on each user's mailbox that adds the sender
> > to the whitelist.
> 
> Please at least try to understand how a spam filter works. Read the
> introduction section of the amavis documentation to understand what
> amavis is and does, and what it is not and does not do.
> Understand how Bayesian filters work.
> Start with the wikipedia articles
> http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering and
> http://en.wikipedia.org/wiki/Bayes%27_theorem
> 
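To make the Bayes point concrete, here is a minimal Python sketch of a naive Bayes word filter. It is a toy under stated assumptions, not the way spamassassin implements its Bayes module: the training messages, the whitespace tokenizer and the add-one smoothing are illustrative choices only.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Return P(spam | words), treating words as independent (the 'naive' part)."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_tokens = sum(counts[True].values())
    ham_tokens = sum(counts[False].values())
    # Start from the prior odds: spam messages seen versus ham messages seen.
    log_odds = math.log(totals[True] / totals[False])
    for word in set(text.lower().split()):
        # Add-one (Laplace) smoothing keeps unseen words from giving zero probability.
        p_word_spam = (counts[True][word] + 1) / (spam_tokens + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_tokens + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data, made up for this illustration.
TRAINING = [
    ("cheap pills buy now", True),
    ("win money now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]

counts, totals = train(TRAINING)
before = spam_probability("buy cheap pills now", counts, totals)

# A user clicks "Not Spam" on a spammy message, so it is retrained as ham.
counts, totals = train(TRAINING + [("buy cheap pills", False)])
after = spam_probability("buy cheap pills now", counts, totals)

print(f"before retraining: {before:.2f}, after retraining: {after:.2f}")
```

In this toy data set a single "Not Spam" click on a spammy message lowers the spam probability of similar messages, which is the shift in detection that the quoted text warns about.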
> Once you have understood how this filter works, you won't want to
> train it any more, because you will see that training pushes the
> detection in a direction you usually don't want.
> 
> The kind of "white" listing you might want is amavis' pen pal feature.
> The documentation for amavis is poor; that is known. Read the change logs and
> have a look at the source code. That is the real documentation.
> To find out how a feature has to be configured you don't have to be a
> Perl guru. Some basic Perl knowledge is usually fine.
> 
> White listing? Why?
> If a mail is tagged as spam then in 99.999% of all cases it is spam. And
> in 99.999% of all cases of "false" positives the sender has done
> something really wrong or his mail client/mail server is fucked up.
> Why should I white list them?
> If they are not able to send an at least halfway correct mail, they don't
> want to communicate.
> It is like a stretch of road where a few idiots are driving on the wrong side.
> Do the other drivers accept this? Will they also start driving on
> the wrong side on this stretch? Surely not.
> And in case there is a serious, real reason why I must white list a
> sender (at the moment I have no idea what this would be, I never needed it):
> Don't mess around with the filter.
> Exclude them before the spam filter, or write a spamassassin rule and
> deduct some points if this rule matches.
> Samples and a how-to for writing rules can be found in the online spamassassin
> documentation https://wiki.apache.org/spamassassin/WritingRules
> 
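The "deduct some points" idea can be sketched in a few lines of Python. This is only a model of additive rule scoring, not spamassassin's rule syntax or engine: the rule names, the scores, the threshold of 5.0 and the partner.example domain are all hypothetical, and the From header is assumed to hold a bare address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Headers = Dict[str, str]


@dataclass
class Rule:
    name: str
    score: float  # positive adds spam points, negative deducts them
    matches: Callable[[Headers], bool]


SPAM_THRESHOLD = 5.0  # hypothetical cut-off for this sketch

RULES: List[Rule] = [
    Rule("SUBJECT_ALL_CAPS", 3.0, lambda h: h.get("Subject", "").isupper()),
    Rule("MISSING_MESSAGE_ID", 2.5, lambda h: "Message-ID" not in h),
    # The "white list" rule: it does not bypass the filter, it only deducts points.
    Rule("KNOWN_PARTNER", -4.0,
         lambda h: h.get("From", "").lower().endswith("@partner.example")),
]


def total_score(headers: Headers, rules: List[Rule] = RULES) -> float:
    """Sum the scores of all rules that match the message headers."""
    return sum(rule.score for rule in rules if rule.matches(headers))


def is_spam(headers: Headers, rules: List[Rule] = RULES) -> bool:
    """A message counts as spam once its total reaches the threshold."""
    return total_score(headers, rules) >= SPAM_THRESHOLD


stranger = {"From": "someone@elsewhere.example", "Subject": "URGENT INVOICE"}
partner = {"From": "billing@partner.example", "Subject": "URGENT INVOICE"}

print(total_score(stranger), is_spam(stranger))
print(total_score(partner), is_spam(partner))
```

The same sloppy message is tagged for the unknown sender but passes for the partner, and every other rule keeps running for both. Nothing in the learned filter data is touched.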
> And when you have understood this and thought it all through to the end, then
> you don't want any white list, spam folder or quarantine.
> You filter all incoming mail during delivery in real time and reject all
> spam hard with 5xx.
> All mail from authenticated users you filter post-queue (yes, we filter all mail,
> in and out), because all spam filters may be busy at the moment
> and I know of no mail client able to handle 4xx errors properly. If the
> sender restrictions are correct (sender_mismatch and so on) it is safe
> to bounce them so that your client gets a report on why his mail was
> rejected. You might have to change the report templates to make them
> easier for clients to understand.
> My experience is: the time on the sending computer is not set correctly,
> combined with several other mistakes like "This is an important mail, so
> I write EVERYTHING IN CAPITAL LETTERS", or home-brew software that
> simply creates completely broken mails. Assembling a correctly formatted
> mail is more difficult than it looks.
> 
> Real time filtering. You don't want to support spammers.
> If you first accept with a 250 response code and then filter: 250 means
> accepted for delivery. Whether it ends up in the inbox or the quarantine, or is
> discarded, does not matter: it is delivered and the spammer gets paid.
> What to do with the accepted spam?
> I cannot bounce it: the sender is usually faked, I produce backscatter and end up on an RBL.
> I cannot discard it: I don't know of a single country where this would not be a
> crime.
> I have to deliver it.
> So I throw it into a quarantine or spam folder where it will be lost.
> Which client checks the spam folder frequently? None.
> From time to time (quota warning: mailbox nearly full) the entire spam
> folder is deleted: mails are lost.
> Ever checked on a quarantine system like maya how often users are
> checking it? I can tell you: never.
> What about the rare false positives?
> It might be a really important mail. Who will pay for the potential damage?
> Sender: "I informed you about the changes in time. I have a 250 delivered.
> You got the mail."
> Receiver: "I did not get this mail."
> Court: "250 response code means: Delivered to your premises. If you
> lose it in house: Your problem."
> 
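The pre-queue decision described above boils down to choosing the SMTP reply while the sending server is still connected. A minimal Python sketch follows, assuming a hypothetical Verdict type and illustrative reply texts. Only the reply code classes (2xx accepted, 4xx temporary failure, 5xx permanent failure) come from the SMTP standard.

```python
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    """Hypothetical outcomes of a content filter run during the SMTP session."""
    CLEAN = "clean"
    SPAM = "spam"
    FILTER_BUSY = "filter_busy"


def prequeue_reply(verdict: Verdict) -> Tuple[int, str]:
    """Pick the SMTP reply while the sending server is still connected."""
    if verdict is Verdict.CLEAN:
        # 250: we accept responsibility for delivering this message.
        return 250, "2.0.0 Ok: queued"
    if verdict is Verdict.SPAM:
        # 5xx: permanent rejection; the sending side must tell its own user.
        return 554, "5.7.1 Message rejected as spam"
    # 4xx: temporary failure; a sending server keeps the mail and retries later.
    return 451, "4.3.0 Content filter temporarily unavailable"


def responsibility_stays_with_sender(code: int) -> bool:
    """Anything other than a 2xx reply leaves the message with the sender."""
    return not 200 <= code < 300


for verdict in Verdict:
    code, text = prequeue_reply(verdict)
    print(verdict.name, code, text, responsibility_stays_with_sender(code))
```

Since spam never receives a 250 here, the receiving side never has to bounce, discard or quarantine it, and a false positive is reported by the sender's own system instead of vanishing into a folder.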
> If I have to check a quarantine or spam folder frequently, what do I
> need it for?
> Then I want it all in my inbox. That makes it easier.
> If I get all this crap in my inbox: what do I need a spam filter for? It
> is absolutely useless.
> And don't tag mails as spam by changing the subject: you break DKIM
> signatures.
> 
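Why subject tagging breaks DKIM can be shown with a simplified Python sketch. It is not a DKIM verifier: real DKIM also hashes the DKIM-Signature header itself and checks a public-key signature over the result. The sketch only approximates the "relaxed" header canonicalization and hashes the signed headers, and the example addresses and the "***SPAM***" tag are made up.

```python
import hashlib
from typing import Dict, List


def relaxed_header(name: str, value: str) -> str:
    """Approximate 'relaxed' canonicalization: lowercase name, collapse whitespace."""
    return f"{name.lower()}:{' '.join(value.split())}\r\n"


def header_digest(headers: Dict[str, str], signed: List[str]) -> str:
    """Hash the canonicalized headers that the signer listed as signed."""
    data = "".join(relaxed_header(name, headers[name]) for name in signed)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


SIGNED_HEADERS = ["From", "To", "Subject"]  # what a signer might list in h=

original = {
    "From": "alice@sender.example",
    "To": "bob@receiver.example",
    "Subject": "Contract changes",
}
# Harmless whitespace changes survive relaxed canonicalization...
respaced = dict(original, Subject="Contract   changes")
# ...but a spam tag changes the signed content itself.
tagged = dict(original, Subject="***SPAM*** Contract changes")

print(header_digest(original, SIGNED_HEADERS) == header_digest(respaced, SIGNED_HEADERS))
print(header_digest(original, SIGNED_HEADERS) == header_digest(tagged, SIGNED_HEADERS))
```

Once the Subject is among the signed headers, any tag added to it changes the hash, so a verifier further down the line (a forwarding target or the mail client) can no longer validate the signature.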
> If I do pre-queue real time filtering, the rare bounced false
> positives give the sender the information within seconds: not
> delivered. He can try again, pick up the phone or whatever, but the
> information will not be lost.
> 
> Andreas
> 
> 
> 
 		 	   		  