false positives

Vernon Schryver vjs@calcite.rhyolite.com
Mon Oct 31 23:41:26 UTC 2005

> From: Jeff Mincy 

> > I think that is mistaken.  The simplest DCC+SpamAssassin installation
> > will not have reasonable dccifd or dccproc thresholds like 10, 20, 50,
> > or even 500.  The default DCC client thresholds are "never".

> SpamAssassin defaults to using 999999 when checking the X-DCC header
> and interprets 'many' to be 999999 and ok to be 0 when comparing.

Yes, as I tried to say.  See also the SpamAssassin code (not just
documentation) that looks for "bulk" in X-DCC headers.

>   X-DCC-wuwien-Metrics: telesterion.delphioutpost.com 1290; Body=0 Fuz1=many Fuz2=many
> The Fuz1=many Fuz2=many triggers the DCC_CHECK rule in SpamAssassin.
> I am using -Q (Only query instead of reporting and then querying), so
> I have not reported the message.  That header means that 'many' other
> people have reported receiving this message?

While it could be that more than 16 million people received that
message, it is more likely that one or more recipients arranged to
report its checksums to DCC servers with local recipient counts of
"many".  Those responsible have probably misconfigured their DCC clients.
An original design goal of the DCC is to not worry about such problems.
Because it is impossible to keep anonymous bad guys or the merely
momentarily confused from reporting legitimate mail with inflated
(e.g. "many") counts, all DCC users should view DCC bulk count as
saying only "this message is bulky."  If you whitelist your solicited
bulk mail, the incompetence or evil of others in inflating their counts
does you no harm.  You won't see any inflated counts for private mail,
unless you need to trim your lists of correspondents.
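
To make the header discussion concrete, here is a minimal sketch of how
a client might read the counts in an X-DCC header and decide that a
message is "bulky."  It is not SpamAssassin's code.  The function names
are hypothetical, and the values 999999 for "many" and 0 for "ok" are
simply the figures quoted above.

```python
import re

# Assumed sentinel values, taken from the figures quoted in this thread:
# "many" is compared as 999999 and "ok" as 0.
MANY = 999999
OK = 0


def parse_dcc_counts(header_value):
    """Return a dict such as {'Body': 0, 'Fuz1': 999999} from an X-DCC header value."""
    # The counts follow the first ';' as space-separated Name=value pairs.
    _, _, tail = header_value.partition(";")
    counts = {}
    for name, value in re.findall(r"(\w+)=(\w+)", tail):
        if value == "many":
            counts[name] = MANY
        elif value == "ok":
            counts[name] = OK
        elif value.isdigit():
            counts[name] = int(value)
        # Any other value is ignored in this sketch.
    return counts


def looks_bulky(counts, threshold=MANY):
    """True when any checksum count reaches the threshold."""
    return any(count >= threshold for count in counts.values())


header = "telesterion.delphioutpost.com 1290; Body=0 Fuz1=many Fuz2=many"
counts = parse_dcc_counts(header)
print(counts, looks_bulky(counts))
```

With the default threshold only "many" trips the check, which says
nothing more than "this message is bulky."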

> SpamAssassin uses dcc_options to control which options are passed to dccproc.
> Anybody who uses dcc in SpamAssassin (use_dcc 1) and has not whitelisted sourceforge
> or added -Q to dcc_options will report the sourceforge email.

If everyone used `dccproc -Q`, then all DCC counts would be 0.  Thus,
those who do not report their incoming mail but only use DCC -Q queries
could be considered freeloaders.

> This gets us back to my original statement 
>    The easiest installation and use of DCC through SpamAssassin will
>    wind up reporting newsletters and having newsletters tagged by DCC.

Which gets back to the responses to your initial question.  We said
that the DCC is a bulk mail detector and that it is incumbent on
DCC users to whitelist their legitimate bulk mail.

> Sure, it is not possible for all current newsletters to be whitelisted.
> I was suggesting only that dcc come with more whitelist entries.
> Maybe as separate include files.

You are far from the first to make that request.  My answer has long
been "Feel free to publish whitelists on your website, FTP server, or
elsewhere.  I am not able or competent to maintain such whitelists."

I think the only (imperfect) solution is for bulk mail senders to publish
markers that their subscribers can use to whitelist their bulk mail.
An IETF standard says bulk mailers should include List-ID lines.  Common
mailing list software including Mailman adds such lines.  Look at this
message for an example.
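
As a rough sketch of what a subscriber could do with such a marker, the
following pulls the identifier out of a List-ID line and compares it
with a local whitelist.  The whitelist contents, the sample message,
and the function names are hypothetical.  Since anyone can add a
List-ID line to a message, this is only as trustworthy as the header.

```python
import re
from email import message_from_string

# Hypothetical local whitelist of list identifiers the user subscribed to.
WHITELISTED_LIST_IDS = {"dcc.example.org"}


def list_id(raw_message):
    """Return the identifier inside <...> of the List-ID header, or None."""
    msg = message_from_string(raw_message)
    value = msg.get("List-ID")  # header names are matched case-insensitively
    if value is None:
        return None
    match = re.search(r"<([^<>]+)>", value)
    return match.group(1).strip().lower() if match else None


def is_whitelisted_bulk(raw_message):
    """True when the message carries a List-ID found in the local whitelist."""
    return list_id(raw_message) in WHITELISTED_LIST_IDS


sample = (
    "From: sender@example.org\n"
    "List-ID: Example discussion list <dcc.example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(is_whitelisted_bulk(sample))
```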

> Yes - the conversion is inexact - that's pretty much what I was getting
> at.  It would be nice if the whitelists were more similar.  That's all.

It would be nice if my words were Law, because overnight I would end
spam and the need for tools like the DCC.  Some people, including survivors
of spammers, might disagree about whether that would be nice.  More
seriously, such a whitelisting standard is impossible.  There are too
many tests that a whitelist entry might reasonably bypass:
  - FTP access
  - HTTP access
      HTTP access with cookies
      HTTP access with popups (for any of the many notions of "popup")
  - SMTP
      delivery to 1 local target
      delivery without DNS blacklist, greylist, body URL, DCC,
          DCC Reputation, and/or other checks
      delivery of HTML mail, with(out) web bugs, graphical attachments,
          executable attachments, ...

> I thought that only checksums were exchanged.  

Yes, and only checksums of bulk mail.

>                                                I was specifically
> thinking more about bulk whitelisted email.  For example, I don't
> see much of a privacy concern with knowing that many people have
> whitelisted newsletter messages from sourceforge.  All I would get
> back is a count, like many?

If you know or can guess the contents of a message, then a checksum
can violate privacy.  Think about asking whether any checksum system
has seen a checksum for an interesting message.
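
A minimal sketch of that guessing attack follows.  It uses a plain MD5
of the whole text as a stand-in for a real DCC checksum, and the set of
"seen" checksums and the sample messages are hypothetical.

```python
import hashlib


def checksum(body):
    """Plain MD5 of the text, standing in for a real DCC checksum."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Hypothetical set of checksums that a shared service has already seen.
seen = {checksum("Your lab results are ready at clinic X.")}


def service_has_seen(guess):
    """Anyone who can guess the text can ask whether it was seen."""
    return checksum(guess) in seen


print(service_has_seen("Your lab results are ready at clinic X."))
print(service_has_seen("Some unrelated text."))
```

The checksum hides nothing from someone who already has a candidate
text; it only confirms or denies the guess.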

Some organizations object on privacy grounds to any DCC MD5 checksums
leaving their network and talk about various (generally European)
privacy laws.  I have little sympathy for such concerns, because bulk
messages have little expectation of privacy.  Until I'm Emperor of the
World, their views matter more than mine.

Vernon Schryver    vjs@rhyolite.com
