Dealing with erroneous "many" counts

Vernon Schryver vjs@calcite.rhyolite.com
Fri Jan 24 16:48:08 UTC 2003


> From: "John R Levine" <johnl@iecc.com>

> I have a spamtrap set up that automatically does a -tmany on mail that
> hits spam-only addresses.  It works fine, but now and then I fat-finger
> something and mistag something that I shouldn't have.
>
> Is there any way to undo a mistake beyond adding it to my local whitelist?

As the FAQ in http://www.rhyolite.com/anti-spam/dcc/FAQ.html#delck says:

  What if I make a mistake with dccproc -t many and report legitimate mail
  as spam? 
    It is possible to delete checksums from the distributed DCC database
    with the cdcc delck operation. However, it is not worth the trouble.
    Unless the same (as far as the fuzzy checksums are concerned)
    message is sent again, no one is likely to notice the mistake before the
    report of the message's checksums expire from the DCC servers'
    databases for lack of repetition. 


> More generally, is there anything in DCC to defend against users or leaf
> nodes who are overenthusiastic or malicious in their tagging?  I realize
> the straightforward approach is not to let people who do that participate,
> but if I use this for a lot of mail, it's gotta be entirely or mostly
> automatic.

One of my major disagreements with systems that seem to rely on "voting,"
such as Cloudmark's Razor, is with the notion that it is possible to
avoid users who are overenthusiastic or malicious in their tagging.
That might work with only about a dozen participants, and even then there
will be mistakes.  I think that in a large scale system, you must plan on
plenty of mistakes and worse.

As I see it, the raw DCC data only tells you about bulk mail.  A message
whose checksums have counts larger than 1 must have been seen somewhere else,
and so is at least somewhat bulky.  A count of "many" does not imply that a
message is spam, but only that it has been seen elsewhere and declared
extremely bulky.  To determine whether a bulky message sent to you is also
spam, the system must know whether it is unsolicited as well as bulky.
I don't see any way to do that except with white-lists.
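
To make that distinction concrete, here is a minimal sketch in Python of
the decision as described above.  It is not DCC code: the MANY sentinel,
the Recipient class, the bulk_threshold default, and the classify function
are all hypothetical names invented for illustration, and the whitelist is
just a set of sender addresses.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a "many" report. The value DCC uses
# internally is not assumed here; infinity simply exceeds any threshold.
MANY = float("inf")


@dataclass
class Recipient:
    # Hypothetical local setting: counts at or above this are "very bulky".
    bulk_threshold: float = 10
    # Senders of bulk mail this recipient has asked for (lowercase addresses).
    whitelist: set = field(default_factory=set)


def classify(sender: str, count: float, recipient: Recipient) -> str:
    """Combine a bulk count with a local whitelist.

    The count alone only says how bulky a message is; only the
    whitelist can say whether that bulk mail was solicited.
    """
    if count <= 1:
        return "not bulk"            # never reported by anyone else
    if sender.lower() in recipient.whitelist:
        return "solicited bulk"      # bulky, but the recipient wants it
    if count >= recipient.bulk_threshold:
        return "unsolicited bulk"    # bulky and not whitelisted
    return "somewhat bulky"          # seen elsewhere, below the threshold
```

With this sketch, a mailing list message mistakenly reported as "many"
still comes out as "solicited bulk" for any recipient who has whitelisted
the list, which is why a mistaken report does little harm to those who
maintain whitelists.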


Vernon Schryver    vjs@rhyolite.com


