false positives

Vernon Schryver vjs@calcite.rhyolite.com
Thu Oct 20 18:28:30 UTC 2005


> From: Jeff Mincy 

> The easiest installation and use of DCC through SpamAssassin will
> wind up reporting newsletters and having newsletters tagged by DCC.
> By simple I mean installing DCC with no extra setup and enabling DCC
> in the Spamassassin user_prefs.

I think that is mistaken.  The simplest DCC+SpamAssassin installation
will not have reasonable dccifd or dccproc thresholds like 10, 20, 50,
or even 500.  The default DCC client thresholds are "never".
The X-DCC headers added by dccifd might say "Fuz1=50000" but I don't
think SpamAssassin will notice.  The X-DCC header will not contain the
"bulk" string and the counts won't contain "Many" (to be translated by
SpamAssassin into "999999").
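
As a rough illustration of that distinction, here is a minimal Python
sketch of the kind of check a filter might make on an X-DCC header value.
It is not SpamAssassin's actual code.  The header layout in the comment,
the names dcc_counts() and looks_bulk(), and the set of checksum names
are assumptions made for the example.

```python
import re

# Assumed header value shape (hypothetical example):
#   mail.example.com 1234; bulk Body=many Fuz1=many Fuz2=many
COUNT_RE = re.compile(r"\b(Body|Fuz1|Fuz2)=(\w+)")


def dcc_counts(header_value):
    """Return {checksum name: count}, translating "many" to 999999."""
    counts = {}
    for name, raw in COUNT_RE.findall(header_value):
        if raw.lower() == "many":
            counts[name] = 999999
        elif raw.isdigit():
            counts[name] = int(raw)
        # Anything else is ignored rather than guessed at.
    return counts


def looks_bulk(header_value):
    """True only if the header says "bulk" or reports a "many" count."""
    # Look only after the first ';' so a host name cannot match by accident.
    body = header_value.split(";", 1)[-1]
    has_bulk_word = re.search(r"\bbulk\b", body) is not None
    has_many = re.search(r"=many\b", body, re.IGNORECASE) is not None
    return has_bulk_word or has_many
```

Under these assumptions, a header value such as
"mail.example.com 1234; Body=1 Fuz1=50000 Fuz2=1" carries a large
numeric count but is not treated as bulk, because no threshold was set
to turn that count into "bulk" or "many".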


> >              Whitelists let individual users enforce their individual
> > notions of which bulk mail is solicited.  For example, Microsoft has
> > sent me unsolicited bulk mail.  That it is spam for me should have no
> > bearing on whether it is spam for you.
>
> It is not spam if you signed up with the company to receive the
> newsletter or specials (etc) and if you can control the email from
> the company.

Depending on what it means, that is either what I tried to say or wrong.
I'm worried about the phrase "and if you can control the email from the
company."  If I did not ask for it, then it is always spam, even if I
might be able to beg the company to stop sending it.  Many users scream
"SPAM" instead of unsubscribing from legitimate mail that they explicitly
subscribed to.  That is irrelevant to the fact that most spammers claim
their spam doesn't stink, often with variations of
http://www.rhyolite.com/anti-spam/that-which-we-dont.html

> I agree that users have to have local whitelists and should maintain
> the whitelist, but I also think that the default DCC whitelist should
> come with more whitelist entries for well known and reasonable newsletters.

For years I tried some of that but gave up.  See old versions of the
whitecommon file in the DCC source.  For one thing, there are too many
newsletters to count.

Another and bigger reason is that there are many newsletters whose
publishers give away "free" or "courtesy subscriptions." Even newsletters
that have substantial numbers of real subscribers often add a few (or
not so few) involuntary targets to their lists.

Consider some obvious cases.  Would you whitelist Yahoo's Groups because
most of them are legitimate, despite the years-long and continuing history
of group owners unilaterally "subscribing" victims?
What about Microsoft's bcentral.com/bcentralhost.com/linkexchange.com/-
listbot.com/listbuilder.com system?
What about the unsolicited bulk mail that Microsoft has sent to people
with only UNIX boxes warning about Windows security problems?  Would
you whitelist all of it because only a tiny fraction of it is spam?

I think "solicited" is always an individual, personal attribute, and
so whitelisting must be equally individual.



> It would be easier if there was more similarity between different
> whitelists.  For example, could the whitelist_from_rcvd syntax used
> for SpamAssassin be read by dcc, eg:
>   whitelist_from_rcvd BJs_MemberServices@bjs.chtah.com cheetahmail.com
> This would allow a common whitelist file to be included by both.

You might write a cron job that converts many entries from one format
to another, but judging from
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
you cannot convert that line to a single DCC whiteclnt line.  You might
convert it to a pair of DCC lines like
   OK2 env_from  BJs_MemberServices@bjs.chtah.com
   OK2 substitute mail_host cheetahmail.com
but the conversion is inexact.
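
A minimal sketch of such a converter in Python, suitable for calling
from a cron job, might look like the following.  The function name
sa_to_whiteclnt() is hypothetical, and skipping addresses that contain
glob characters is an assumption made to keep the sketch conservative.
The output has the same inexactness as the hand conversion above: it is
two separate whiteclnt lines, not one combined test.

```python
def sa_to_whiteclnt(line):
    """Convert one whitelist_from_rcvd line to a pair of whiteclnt lines.

    Returns an empty list for any line that is not a simple
    whitelist_from_rcvd entry.
    """
    fields = line.split()
    if len(fields) != 3 or fields[0].lower() != "whitelist_from_rcvd":
        return []
    _, sender, relay = fields
    # Assumption: glob patterns have no direct equivalent here, so skip them.
    if "*" in sender or "?" in sender:
        return []
    return [
        f"OK2 env_from {sender}",
        f"OK2 substitute mail_host {relay}",
    ]


def convert_file(lines):
    """Convert every convertible line, preserving input order."""
    out = []
    for line in lines:
        out.extend(sa_to_whiteclnt(line))
    return out
```

Feeding it the BJs line quoted above yields the env_from and
mail_host pair shown earlier; comments and other SpamAssassin
settings are dropped.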


> Is that a problem?  It might be kind of interesting to know how many
> other people have seen the same whitelisted message.  The DCC count
> with the threshold is being used as a binary (a message is either
> known to be bulk or it is not known to be bulk) - and the usual use of
> DCC is to equate Bulk with Spam.  Three or four values might be more
> useful: message is whitelisted bulk (either dcc default or user),
> other bulk or not bulk.

That sounds like "feature bloat."

A more compelling argument is that information which third parties
might find interesting but that is none of their business should not
be disclosed or even collected.  Mail that you whitelist is no one's
concern but your own and the sender's, and often not even the sender's.
It is not merely a good thing but an important privacy feature that
whitelisted mail is not reported to any third party including DCC
servers.


Vernon Schryver    vjs@rhyolite.com


