The nonsense of training spam filters and spam folders [was: training spamsassin]

Olivier Nicole Olivier.Nicole at cs.ait.ac.th
Tue Feb 24 09:49:03 CET 2015


Daemon,

> When the user clicks "Not Spam" in Thunderbird, the e-mail goes to the inbox.
> I would like to add the senders of these e-mails to a whitelist.

This goes *way* beyond the role of Amavis; it's POP or IMAP you are
talking about.

Olivier

>> Date: Tue, 24 Feb 2015 10:55:21 +0300
>> From: ta at geuka.net
>> To: amavis-users at amavis.org
>> Subject: The nonsense of training spam filters and spam folders [was: training spamsassin]
>> 
>> 
>> > I would like to run a script on each user's mailbox that adds the
>> > sender to the whitelist.
>> 
>> Please at least try to understand how a spam filter works. Read the
>> introduction section of the amavis documentation to understand what
>> amavis is and does, and what it is not and does not do.
>> Understand how Bayesian filters work.
>> Start with the Wikipedia articles
>> http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering and
>> http://en.wikipedia.org/wiki/Bayes%27_theorem
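
A minimal sketch of the idea behind those two articles, assuming the
textbook naive Bayes formula with equal priors. It is not the code
SpamAssassin actually runs; the function names and the numbers in the
usage note are made up for illustration.

```python
from math import exp, log


def token_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | token) from training counts (equal priors assumed)."""
    s = spam_hits / n_spam if n_spam else 0.0  # share of spam mails with the token
    h = ham_hits / n_ham if n_ham else 0.0     # share of ham mails with the token
    if s + h == 0.0:
        return 0.5                             # unseen token: no evidence either way
    p = s / (s + h)
    return min(max(p, 0.01), 0.99)             # clamp so one token never decides alone


def spam_probability(token_probs):
    """Combine per-token probabilities, assuming the tokens are independent.

    p = prod(p_i) / (prod(p_i) + prod(1 - p_i)), computed in log space.
    """
    eta = sum(log(1.0 - p) - log(p) for p in token_probs)
    if eta > 700.0:                            # exp() would overflow: certainly ham
        return 0.0
    return 1.0 / (1.0 + exp(eta))
```

With these hypothetical counts, a token seen in 40 of 100 spam mails and
1 of 100 ham mails gets about 0.98. If users then train 30 more mails
containing it as ham, token_probability(40, 31, 100, 130) drops to about
0.63, so every later mail with that token is scored as less spammy.
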
>> 
>> Once you have understood how this filter works, you will not want to
>> train it any more, because you will have understood that training
>> pushes the detection in a direction you usually don't want.
>> 
>> The kind of "white" listing you might want is amavis' pen pal feature.
>> The documentation for amavis is poor; that is known. Read the change
>> logs and have a look at the source code. This is the real documentation.
>> To find out how a feature has to be configured you don't have to be a
>> Perl guru. Some basic Perl knowledge is usually enough.
>> 
>> White listing? Why?
>> If a mail is tagged as spam, then in 99.999% of all cases it is spam.
>> And in 99.999% of all cases of "false" positives the sender has done
>> something really wrong or his mail client/mail server is fucked up.
>> Why should I white list them?
>> If they are not able to send an at least somewhat correct mail, they
>> don't want to communicate.
>> It is like a section of road where a few idiots are driving on the
>> wrong side. Do the other drivers accept this? Will they also start
>> driving on the wrong side on this section? Surely not.
>> And in case there is a serious, real reason why I must white list a
>> sender (at the moment I have no idea what that would be; I have never
>> needed it): don't mess around with the filter.
>> Exclude them before the spam filter, or write a spamassassin rule and
>> deduct some points if this rule matches.
>> You can find samples and a how-to on writing rules in the online
>> spamassassin documentation: https://wiki.apache.org/spamassassin/WritingRules
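
To make "deduct some points" concrete, here is a toy model of additive
rule scoring: every matching rule adds its score and the total is
compared with a threshold. The rule names, the scores and the threshold
are all hypothetical. This is not SpamAssassin rule syntax; see the wiki
page above for that.

```python
# Hypothetical threshold and rule scores, for illustration only.
REQUIRED_SCORE = 5.0

RULE_SCORES = {
    "SUBJ_ALL_CAPS": 1.5,
    "DATE_IN_FUTURE": 2.0,
    "BAYES_HIGH": 3.0,
    "LOCAL_KNOWN_SENDER": -3.0,  # the "deduct some points" rule
}


def total_score(hits):
    """Sum the scores of all rules that matched the message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits) >= REQUIRED_SCORE
```

The same three hits that reach the threshold on their own stay below it
once the negative-score rule for the known sender also matches, and the
filter itself is left untouched.
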
>> 
>> And when you have understood this and thought it all through to the
>> end, you don't want any white list, spam folder or quarantine.
>> Filter all incoming mail in real time during delivery and reject all
>> spam hard with 5xx.
>> Filter all mail from authenticated users (yes, we filter all mail,
>> in and out) post-queue (all spam filters may be busy at the moment,
>> and I know of no mail client that handles 4xx errors properly). If the
>> sender restrictions are correct (sender_mismatch and so on) it is safe
>> to bounce them, so your client gets a report on why his mail was
>> rejected. You might have to change the report templates to make them
>> easier for clients to understand.
>> In my experience the causes are: the time on the sending computer is
>> not set correctly, combined with several other mistakes like "This is
>> an important mail, so I write EVERYTHING IN CAPITAL LETTERS", or some
>> home-brew software simply creates completely broken mails. Assembling
>> a correctly formatted mail is more difficult than it looks.
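
The policy described above (reject external spam during the SMTP
session, bounce spam from authenticated users after queueing) can be
written down as a small decision function. This is only a sketch of the
policy, not amavis or MTA configuration; the names Action and decide are
invented for the example.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver to the inbox"
    REJECT_5XX = "reject during the SMTP session with a 5xx reply"
    BOUNCE_REPORT = "bounce after queueing, with a report to the sender"


def decide(is_spam, authenticated):
    """Pick the action for one message under the policy sketched above."""
    if not is_spam:
        return Action.DELIVER
    if authenticated:
        # Post-queue filter: the sender is a verified local user, so a
        # bounce reaches the real author and cannot become backscatter.
        return Action.BOUNCE_REPORT
    # Pre-queue filter: the mail was never accepted with 250, so the
    # sending server has to tell its own user about the failure.
    return Action.REJECT_5XX
```

Note that no branch ends in a spam folder or a quarantine: a message is
either delivered, or the sender is told that it was not.
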
>> 
>> Real time filtering. You don't want to support spammers.
>> If you first accept with a 250 response code and then filter: 250 means
>> accepted for delivery. Whether it ends up in the inbox, in quarantine or
>> is discarded does not matter: it is delivered and the spammer gets paid.
>> What to do with the accepted spam?
>> I cannot bounce it: the sender is usually faked, I produce backscatter
>> and I end up on an RBL.
>> I cannot discard it: I don't know of one country where this would not
>> be a crime.
>> I have to deliver it.
>> So I throw it into a quarantine or spam folder where it will be lost.
>> Which client checks the spam folder frequently? None.
>> From time to time (quota warning: mailbox nearly full) the entire spam
>> folder is deleted: mails are lost.
>> Ever checked on a quarantine system like maya how often users are
>> checking it? I can tell you: never.
>> What about the rare false positives?
>> It might be a really important mail. Who will pay for the potential damage?
>> Sender: "I informed you about the changes in time. I have a 250
>> delivered. You got the mail."
>> Receiver: "I did not get this mail."
>> Court: "250 response code means: delivered to your premises. If you
>> lose it in house: your problem."
>> 
>> If I have to check a quarantine or spam folder frequently, what do I
>> need it for?
>> Then I want it all in my inbox. That makes it easier.
>> If I get all this crap in my inbox: what do I need a spam filter for?
>> It is absolutely useless.
>> And don't tag mails as spam by changing the subject: you break DKIM
>> signatures.
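
Why a rewritten subject breaks the signature: a DKIM signature covers a
hash over the signed headers, and Subject is normally one of them (this
sketch assumes it is). The code below only imitates the "relaxed" header
canonicalization in simplified form and hashes the result. A real
verifier also checks the body hash and the public-key signature, so
treat the function names and the sample headers as illustrative.

```python
import hashlib
import re


def relaxed_header(name, value):
    """Simplified 'relaxed' canonicalization of one header field."""
    value = re.sub(r"\r?\n", "", value)            # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value).strip()  # collapse runs of whitespace
    return f"{name.lower().strip()}:{value}\r\n"


def header_digest(headers, signed_names):
    """SHA-256 over the canonicalized headers, in the signed order."""
    h = hashlib.sha256()
    for name in signed_names:
        h.update(relaxed_header(name, headers[name]).encode("utf-8"))
    return h.hexdigest()


original = {"From": "alice@example.org", "Subject": "Quarterly report"}
tagged = dict(original, Subject="***SPAM*** Quarterly report")
signed = ["From", "Subject"]
```

header_digest(original, signed) and header_digest(tagged, signed)
differ, so a signature made over the original headers no longer
verifies once the subject has been tagged.
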
>> 
>> If I do pre-queue real time filtering: the rare rejected false
>> positives give the sender, within seconds, the information: not
>> delivered. He can try again, pick up the phone or whatever, but the
>> information will not be lost.
>> 
>> Andreas
>> 
>> 
>> 
