why I like DCC but dislike dccproc -t many-rantings (and a few ideas)

Vernon Schryver vjs@calcite.rhyolite.com
Fri Oct 11 14:32:31 UTC 2002

> From: "Tony L. Svanstrom" <tony@svanstrom.com>

> ...
>  #1 Some people report an e-mail as 'many' by mistake.
> #2 Some people think their other filters are 100% accurate, so they autoreport
> with 'many'.
>  #3 Some report e-mails as 'many' because they think it'll benefit the average
> user of DCC.
>  (#4 People find DCC because they're trying to get rid of spam; so a lot of
> people think that a high score means that the e-mail is spam.)
> #1, #2 and #3 are all problems that lowers the reliability of DCC, and they're
> all related to dccproc; as that is, AFAIK, what the average not too root(ish)
> person is using, which, IMNSHO, is the same group of people that tend to make
> these mistakes more often than others.

You and I disagree about the validity of your conclusion.  Even a
"many" mistake is good data, because it correctly marks the message
as "quite bulky."

I also think you are making a mistake in focusing on "many."  It is
necessary for the DCC clients to report the number of SMTP recipient
addresses.  It is impossible to limit DCC clients to reporting a number
of recipients that is functionally less than "many."  A DCC client
needs the ability to say "the envelope for this message addressed it
to 200 people," but for the purpose of rejecting unsolicited bulk mail,
"200" is the same as "many."  "-t many" is identical to "-t 16777200",
and 16777200 is merely the largest possible target count after
reserving some special values.
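
As a rough illustration of why "200" and "many" are functionally the
same, here is a minimal sketch.  The function names and the threshold
of 50 are hypothetical and chosen only for this example; this is not
code from the DCC source.

```python
# Hypothetical sketch; names and the threshold are illustrative only.
MANY = 16777200  # largest possible target count, per the text above


def parse_targets(arg: str) -> int:
    """Turn a '-t' style argument into a target count, capped at MANY."""
    if arg.lower() == "many":
        return MANY
    count = int(arg)
    if count < 1:
        raise ValueError("target count must be positive")
    return min(count, MANY)


def is_bulk(total_targets: int, threshold: int = 50) -> bool:
    """Local policy (assumed): totals at or above threshold count as bulk."""
    return total_targets >= threshold
```

With any plausible rejection threshold, a report of 200 targets and a
report of "many" lead to the same decision.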

> ...
>  #1 Why isn't it a good idea to force people to run their own DCC-server to
> make it possible to report an e-mail more than once (compared to letting any
> user do it)?

That seems to concern policies for running dccproc.  There is nothing
that can be done in the source to control the policies followed by
the people who install the binaries they build on their own systems.
Worse, there is nothing the people running DCC servers and clients on
their hosts can do to ensure that other people running servers and
clients on other hosts will follow more restrictive policies.
There is nothing that could be done to change the policies of the
people now running DCC clients.

Your policy sounds more like the way people now at MAPS would like to 
run DCC clients and servers.  You might get joy by contacting MAPS about 
the DCC "Beta Program" they've been running since 2000 at
I think they'd serve their users better by shutting down that program,
since their network sees relatively few checksums and so the DCC does
not perform as well.

>  #2 AFAIK whitelists stop the checksums from being reported to the server, why
> is that if DCC is just about checking bulkiness?

The purpose of checking bulkiness is to detect spam or unsolicited bulk mail.
Checksums for solicited bulk mail would only clutter the databases.

It is also desirable to maximize the privacy of purely local bulk mail
by not reporting its checksums to outsiders, even without envelope checksums.

>  #3 To maximize the use of DCC as many e-mails possible should, of course, be
> reported, but never more than once per e-mail; why not tie reporting to a hash
> of the env-to, making it possible for everyone to report all e-mails passing
> thru their system?

That would increase the size of the database by a factor of at least
100 and slow down DCC transactions significantly.  Instead of finding
and updating a total count for the checksum of the message, the DCC
server would first have to look for, and fail to find, a record containing both
the env-to and the message checksum.  Flooding would not only involve
distributing message checksums and their total recipient counts but
also the env-to checksums for every bulky message.
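
A minimal sketch of the difference in bookkeeping follows.  It is only
an illustration under assumed names and in-memory data structures, not
how the DCC server stores anything; the strings stand in for real
checksums.

```python
from collections import defaultdict


def report_total(totals, msg_sum, targets):
    """Current scheme: one running total per message checksum."""
    totals[msg_sum] += targets


def report_once_per_recipient(seen, totals, msg_sum, env_to_sum):
    """Proposed scheme: count a report only if this (env-to, message)
    pair has not been recorded before."""
    key = (env_to_sum, msg_sum)
    if key in seen:
        return False  # duplicate report, ignored
    seen.add(key)     # one extra record per recipient
    totals[msg_sum] += 1
    return True


def records_needed(recipients):
    """Return (records today, records under the proposal) for one
    bulk message sent to the given number of recipients."""
    today = defaultdict(int)
    report_total(today, "msg-sum", recipients)

    seen, proposed = set(), defaultdict(int)
    for i in range(recipients):
        report_once_per_recipient(seen, proposed, "msg-sum", f"env-to-{i}")
    return len(today), len(proposed) + len(seen)
```

In this toy model a single message to 100 recipients needs one record
today and one record plus 100 pair records under the proposal, and
every report costs an extra lookup that usually fails.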

There would also be intolerable risks to privacy.  Consider asking
the nearest DCC server about the checksums of the env-to value
"bgates@microsoft.com" and of the message "ok, we'll sell AOL/Time-Warner."

Regardless, it would be impossible to impose that change in
policies and mechanisms on the existing DCC network.  About 1000 people
have installed DCC client and server software to look at perhaps
10,000,000 mail messages per day.

> ...
> c3: This would also obsolete the -Q option, which is, IMHO, another thing that
> combined with my other "problems" with DCC lowers the use/stability.

I don't see how `dccproc -Q` and `dccm -Q` can be considered harmful.
Even if they are harmful, they are necessary for maintenance, for
diagnosing problems, and for situations where the flow of mail causes
individual messages to be checked more than once.

Vernon Schryver    vjs@rhyolite.com
