Vernon Schryver
vjs@calcite.rhyolite.com
Tue, 20 Sep 2005 07:17:51 -0600 (MDT)
> From: Georg Graf

> > > || REP_ARGS="-t rep,90 -t rep-total,1000"
> >
> > 90% and 1000 seem rather high.
>
> You saw that in this case (only "-t rep,80") it did not work for
> me. What would you suggest next? My idea was

Based on your choice of "-t CMN,25,50" I would try
    "-t rep,30 -t rep-total,50"
I would also turn off DCC Reputations for the mailboxes that cannot
tolerate false positives.

> Hmm. This comes from my effort to set the reputation parameters
> in a way that they do not yield "false positives" where "false
> positives" means mails that people want to get and that are not
> commercial. I am aware there is no way for the DCC to know that
> ;)

I define spam as "unsolicited bulk mail," not "unsolicited commercial
mail." When used with per-user whitelists, the DCC can detect spam
using that definition. The target counts detect "bulk" and each
user's individual whiteclnt file defines "(un)solicited."

> I think I have a fundamental problem with reputations. The higher
> I set the rep-total value, the more I can be sure that (100-rep)%
> of mail from a host are not bulk messages. If I lower the
> rep-total value, then I trust the reputation values even if I
> don't know much about a host.
>
> What do you think about these arguments?

First, the (100-rep)% of messages from an IP address are only those
that were not detected as bulk by the DCC at the time they were
delivered. They might have been detected as bulk if they had been
delivered later. Or they might have had better "hash busting."

Second, that argument misses the idea of reputations. If you refuse
to believe someone who tells lies 90% of the time, you know you will
not be crediting the liar's 10% true statements. If you do not hire
convicted embezzlers as accountants, it is not because you think that
all embezzlers always steal all of the time, but that you think the
chances of a new crime are high. An X% DCC Reputation does not mean
"the next message from 10.2.3.4 is spam" but (I hope) "the next
message from 10.2.3.4 is spam with probability at least X%".
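
To make the interaction of the two thresholds concrete, here is a minimal Python sketch. It is not DCC code. It assumes, as this thread describes, that a reputation is the percentage of an IP address's messages detected as bulk and that it is only trusted once at least rep-total messages have been seen. The names RepSettings and reputation_flags, the counts in the example, and the use of ">=" for both comparisons are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepSettings:
    """Hypothetical stand-in for "-t rep,X -t rep-total,N"."""
    rep_percent: float  # reputation threshold, in percent
    rep_total: int      # minimum messages seen before the percentage is trusted


def reputation_flags(bulk_seen: int, total_seen: int, s: RepSettings) -> bool:
    """Return True if the sender's reputation alone would flag its next message."""
    if total_seen <= 0 or total_seen < s.rep_total:
        return False  # too little history to trust the percentage
    reputation = 100.0 * bulk_seen / total_seen
    return reputation >= s.rep_percent


strict = RepSettings(rep_percent=90, rep_total=1000)  # the settings questioned above
loose = RepSettings(rep_percent=30, rep_total=50)     # the settings suggested above

# An invented sender: 120 messages seen, 60 of them detected as bulk (50%).
print(reputation_flags(60, 120, strict))  # False: fewer than 1000 messages seen
print(reputation_flags(60, 120, loose))   # True: 50% >= 30% and 120 >= 50
```

Under these assumptions the strict settings never consult the reputation of a sender with a short history, which is why they rarely have any effect.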

A mailbox that cannot tolerate false positives should not use
reputations. It should also not use SpamAssassin or Bayesian filters
because those also are merely probabilistic detectors of spam. It
should also not use the DCC without a real per-mailbox whitelist,
because what is legitimate, solicited bulk mail for one mailbox is
spam for another.

> I use the "common choice": "-t CMN,25,50". Since the mail really
> had only 11 recipients, this would have done the job, I think.

For that particular message, yes, but so would "-t rep-total,50"


Vernon Schryver    vjs@rhyolite.com