Tweaking Reputation Parameters

Vernon Schryver vjs@calcite.rhyolite.com
Tue, 20 Sep 2005 07:17:51 -0600 (MDT)


> From: Georg Graf 

> > > ||  REP_ARGS="-t rep,90 -t rep-total,1000"
> > 
> > 90% and 1000 seem rather high.  
>
> You saw that in this case (only "-t rep,80") it did not work for
> me. What would you suggest next? My idea was

Based on your choice of "-t CMN,25,50",
I would try "-t rep,30 -t rep-total,50".

I would also turn off DCC Reputations for the mailboxes that cannot
tolerate false positives.

> Hmm. This comes from my effort to set the reputation parameters
> in a way that they do not yield "false positives" where "false
> positives" means mails that people want to get and that are not
> commercial. I am aware there is no way for the DCC to know that
> ;)

I define spam as "unsolicited bulk mail," not "unsolicited commercial
mail." When used with per-user whitelists, the DCC can detect spam
using that definition.  The target counts detect "bulk" and each user's
individual whiteclnt file defines "(un)solicited."
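
As a rough sketch of that definition, and not of how the DCC is actually
implemented: the function name, the default threshold, and the use of a
plain set of sender addresses as the whitelist are all made up for
illustration.

```python
def is_spam(target_count, sender, whitelist, bulk_thold=50):
    """Illustrative only: spam = bulk AND unsolicited.

    target_count -- how many targets the message is known to have reached
    sender       -- envelope sender (hypothetical stand-in for whatever a
                    real per-user whitelist keys on)
    whitelist    -- set of senders this one mailbox has asked to hear from
    bulk_thold   -- count at which mail is considered "bulk" (assumed value)
    """
    is_bulk = target_count >= bulk_thold
    is_solicited = sender in whitelist
    return is_bulk and not is_solicited


# The same bulk message is spam for one mailbox and wanted mail for another.
subscriber = {"news@example.com"}
stranger = set()
print(is_spam(1000, "news@example.com", subscriber))  # solicited bulk
print(is_spam(1000, "news@example.com", stranger))    # unsolicited bulk
```

The point of the split is that the bulk test is the same for everyone,
while the solicited test can only be answered per mailbox.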


> I think I have a fundamental problem with reputations. The higher
> I set the rep-total value, the more I can be sure that (100-rep)%
> of mail from a host are not bulk messages. If I lower the
> rep-total value, then I trust the reputation values even if I
> don't know much about a host.
>
> What do you think about these arguments?

First, the (100-rep)% of mail from an IP address is only the mail that
the DCC did not detect as bulk when it was delivered.  It might have been
detected as bulk if it had been delivered later.  Or it might have had
better "hash busting."

Second, that argument misses the idea of reputations.  If you refuse
to believe someone who tells lies 90% of the time, you know you will not
be crediting the liar's 10% true statements.  If you do not hire convicted
embezzlers as accountants, it is not because you think that all
embezzlers always steal all of the time, but that you think the chances
of a new crime are high.  An X% DCC Reputation does not mean "the next
message from 10.2.3.4 is spam" but (I hope) "the next message from
10.2.3.4 is spam with probability at least X%".
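
To make the roles of the two thresholds concrete, here is a minimal
sketch.  It assumes that "rep" is the percentage of a host's messages
that were detected as bulk and that "rep-total" is the minimum number of
messages that must have been seen from the host before its reputation is
used at all.  The function names and the exact comparisons are
illustrative guesses, not the DCC's code.

```python
def reputation_percent(bulk_count, total_count):
    """Percentage of a host's messages that were detected as bulk."""
    if total_count == 0:
        return 0.0
    return 100.0 * bulk_count / total_count


def flagged_by_reputation(bulk_count, total_count,
                          rep_thold=30, rep_total_thold=50):
    """Illustrative only: apply "-t rep,30 -t rep-total,50" style thresholds.

    A host with too little history (fewer than rep_total_thold messages)
    is never flagged, however bad its percentage looks so far.
    """
    if total_count < rep_total_thold:
        return False
    return reputation_percent(bulk_count, total_count) >= rep_thold


# 40 bulk messages out of 100 seen: enough history, 40% >= 30%.
print(flagged_by_reputation(40, 100))
# 9 bulk out of 10 seen: 90% bulk, but too little history to judge.
print(flagged_by_reputation(9, 10))
# The same first host under "-t rep,90 -t rep-total,1000" is not flagged.
print(flagged_by_reputation(40, 100, rep_thold=90, rep_total_thold=1000))
```

Under these assumptions, raising rep-total only delays the judgment
until more mail has been seen; it says nothing about the messages that
were not detected as bulk.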

A mailbox that cannot tolerate false positives should not use reputations.
It should also not use SpamAssassin or Bayesian filters because those
also are merely probabilistic detectors of spam.  It should also not
use the DCC without a real per-mailbox whitelist, because what is
legitimate, solicited bulk mail for one mailbox is spam for another.


> I use the "common choice": "-t CMN,25,50". Since the mail really
> had only 11 recipients, this would have done the job, I think.

For that particular message, yes, but so would "-t rep-total,50".


Vernon Schryver    vjs@rhyolite.com