DCC clients changing Subject header survey

Vernon Schryver vjs@calcite.rhyolite.com
Thu Sep 26 17:14:57 UTC 2002

> From: Brandon Long <blong@fiction.net>

> I'm not sure I like '-t many' as an automated system.  In fact, the more
> I use dcc, the more I'm not liking the -t many part.  I know the correct
> answer is to whitelist, but I'm starting to have to check mail that is
> marked "many" because of several incorrectly marked messages (including
> one from the ACLU, of all places).

Shouldn't you have whitelisted the ACLU to keep that mail from hitting
your DCC threshold when 10, 20, 50, or 1000 other ACLU members start
using the DCC?

What is the "real" count?  Do you really care about the number of
copies of a CERT advisory that have been seen by users of DCC
clients but not white-listed?  As more subscribers to CERT advisories
use DCC clients, the advisories' counts increase.  As subscribers
white-list, the advisories' counts decrease.
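
To make that counting concrete, here is a minimal Python sketch. It is
not how the DCC servers are implemented. `Recipient` and
`effective_count` are hypothetical names, and the only assumption is the
one stated above: a copy adds to the count only when its recipient runs
a DCC client and has not white-listed the sender.

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    uses_dcc: bool      # runs a DCC client that reports checksums
    whitelisted: bool   # has white-listed this sender locally


def effective_count(recipients):
    """Count copies reported by DCC users who have not white-listed the sender."""
    return sum(1 for r in recipients if r.uses_dcc and not r.whitelisted)


# Fifty subscribers to one advisory, all running DCC clients, none white-listing.
subscribers = [Recipient(uses_dcc=True, whitelisted=False) for _ in range(50)]
before = effective_count(subscribers)

# Twenty of those subscribers then white-list the sender.
for r in subscribers[:20]:
    r.whitelisted = True
after = effective_count(subscribers)
```

In this toy model the count moves with the behavior of the subscribers,
not with any property of the message itself.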

>                                     I know that with the correct
> architecture it would be hard, but it would be nice to be able to
> opt-out and get the "real" count.  Yes, someone could "game" the system
> to artificially raise the number on a message, but if I wanted a communal
> spam filter, I would have chosen razor.  Having worked with many opt-in
> mailing lists before, I can guarantee that a large enough audience is
> going to include people who don't remember opt'ing in, or are too lazy
> to get off the list, and they're going to start marking things as spam.
> Or administrators who turn old accounts into spam traps.  The weekly
> United E-Fare notices are currently being marked as spam, for instance.
> (I had them white listed, but apparently the distribution method
> changed)

If you don't like rejecting at "many", then you must do as I do and
recommend, and reject at a value that says "bulk," such as 5 or 50.
A change in United's distribution method would require a change in
white-listing regardless of spam trapping, because that traffic would
get counts above any reasonable "bulk" threshold.

> Is it possible for dcc to keep track of both?  That something was marked
> as spam and the actual bulk count?  Does anyone use the -t to specify a
> count other than many?

MAPS is running DCC servers in their "beta test" (don't ask me how
you can beta test something for a year and a half) with only trusted
people running DCC clients.  If you can trust them to never inflate
a checksum's count by accident (e.g. run dccproc 100 times on a message)
or use `dccproc -t many`, then you can trust their DCC servers to have
"real" counts.  However, you still need white-lists for legitimate
bulk mail. 

The reason there is a log for an RSS entry
(see http://work-rss.mail-abuse.org/cgi-bin/nph-rss?query= )
is that one of the people still at MAPS once had a script go crazy.
For me, that says all that needs to be said about trusting people
to never make mistakes; your mileage may vary.

I also do not like Vipul's Razor/Cloudmark or any communal spam
marking tools because I think relying on people to mark spam is
crazy.  When you have more than a few people marking spam, you will
have a significant amount of legitimate mail marked as spam, at least
as I define "significant."

You can't avoid human error or malicious mischief, so you must not
depend on its absence and you may as well try to utilize it.
I think a count of "many" means no more than "many" or "lots."

That is why the DCC servers only tell about "bulk."  To determine
"spam", you MUST add something such as a local white-list.  That's
why the DCC source now includes those prototype point-and-click
white-listing CGI scripts.  There are ISPs using the DCC with global
white-lists and DCC rejection thresholds of 20 or 50.  I don't
understand how they can do that, but I'm not in charge of or a user
of their operations and so have no standing to offer an opinion.
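
The split between "bulk" and "spam" can be sketched in a few lines of
Python. This is an illustration only, not the logic of any DCC client.
`MANY`, `is_bulk`, `should_reject`, and the addresses are hypothetical,
and treating "many" as a count larger than any threshold is an
assumption made for the example.

```python
MANY = float("inf")  # hypothetical stand-in for a count reported as "many"


def is_bulk(count, threshold):
    """The server side only answers this question: is the mail bulky?"""
    return count >= threshold


def should_reject(sender, count, threshold, whitelist):
    """Bulk mail is treated as spam only if the sender is not white-listed."""
    if sender in whitelist:
        return False
    return is_bulk(count, threshold)


# Hypothetical local white-list of solicited bulk senders.
whitelist = {"advisories@example.org"}

reject_unknown = should_reject("offers@example.com", 60, 50, whitelist)
reject_listed = should_reject("advisories@example.org", MANY, 50, whitelist)
reject_private = should_reject("friend@example.net", 1, 5, whitelist)
```

The white-list check comes first, so solicited bulk mail passes no
matter how high its count climbs.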

SpamAssassin users say their rules yield a tolerable false positive
rate, and perhaps letting a DCC determination of "bulk" count for only
half of a SpamAssassin threshold is a good choice for them.  It wouldn't
be for me, because I think the false positive rate must be below 0.1%
and ought to be below 0.01%.
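
As a rough sketch of that arithmetic in Python: the function names are
hypothetical, the threshold of 5.0 is an assumed value for the example,
and nothing here is SpamAssassin's actual scoring code. A "bulk" answer
contributes half the threshold, so other rules must supply the rest
before a message is marked.

```python
def combined_score(other_rules_score, dcc_says_bulk, threshold=5.0):
    """Add half the threshold when the DCC reports the message as bulk."""
    dcc_points = threshold / 2 if dcc_says_bulk else 0.0
    return other_rules_score + dcc_points


def is_spam(other_rules_score, dcc_says_bulk, threshold=5.0):
    """Mark the message only when the combined score reaches the threshold."""
    return combined_score(other_rules_score, dcc_says_bulk, threshold) >= threshold


def expected_false_positives(legit_messages, fp_rate):
    """Legitimate messages expected to be misfiled at a given false positive rate."""
    return legit_messages * fp_rate


# Bulk alone is not enough; bulk plus some other evidence is.
bulk_only = is_spam(0.0, True)
bulk_plus_rules = is_spam(3.0, True)

# What 0.1% and 0.01% mean for 10,000 legitimate messages.
at_tenth_percent = expected_false_positives(10_000, 0.001)
at_hundredth_percent = expected_false_positives(10_000, 0.0001)
```

At 0.1%, about ten of every 10,000 legitimate messages are lost; at
0.01%, about one.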

Having SpamAssassin automatically mark mail that the other SpamAssassin
rules don't like would not significantly inflate the DCC counts.
It could not cause false positives unless your correspondents include
people who mark everything as spam.  Really private mail is not seen
by anyone else and so can't have its DCC counts inflated.  Having
SpamAssassin mark its spam with a DCC count of "many" would only tell
DCC clients "this mail is awfully bulky."

Vernon Schryver    vjs@rhyolite.com
