false positives

Jeff Mincy mincy@rcn.com
Mon Oct 31 22:14:56 UTC 2005


On Thu, 20 Oct 2005, vjs@calcite.rhyolite.com wrote:

>> From: Jeff Mincy 
> 
>> The easiest installation and use of DCC through SpamAssassin will
>> wind up reporting newsletters and having newsletters tagged by DCC.
>> By simple I mean installing DCC with no extra setup and enabling DCC
>> in the Spamassassin user_prefs.
> 
> I think that is mistaken.  The simplest DCC+SpamAssassin installation
> will not have reasonable dccifd or dccproc thresholds like 10, 20, 50,
> or even 500.  The default DCC client thresholds are "never".
> The X-DCC headers added by dccifd might say "Fuz1=50000" but I don't
> think SpamAssassin will notice.  The X-DCC header will not contain the
> "bulk" string and the counts won't contain "Many" (to be translated by
> SpamAssassin into "999999").

SpamAssassin defaults to a threshold of 999999 when checking the X-DCC
header, and when comparing it interprets 'many' as 999999 and 'ok' as 0.
This is from the man page:

Name
       Mail::SpamAssassin::Plugin::DCC - perform DCC check of messages

User Options
       dcc_body_max NUMBER
       dcc_fuz1_max NUMBER
       dcc_fuz2_max NUMBER
           This option sets how often a message's body/fuz1/fuz2
           checksum must have been reported to the DCC server
           before SpamAssassin will consider the DCC check as
           matched.

           As nearly all DCC clients are auto-reporting these
           checksums, you should set this to a relatively high
           value, e.g. 999999 (this is DCC's MANY count).

           The default is 999999 for all these options.

       dcc_options options
           Specify additional options to the dccproc(8) command.
           Please note that only characters in the range
           [0-9A-Za-z ,._/-] are allowed for security reasons.

           The default is "-R".
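
As an illustration of the character restriction quoted above, here is a minimal Python sketch that checks a candidate dcc_options string against that character class. The function name dcc_options_ok is hypothetical, and this is not SpamAssassin's own code; only the character class is taken from the man page.

```python
import re

# Character class copied from the man page excerpt above.
ALLOWED = re.compile(r"[0-9A-Za-z ,._/-]*")

def dcc_options_ok(options):
    """Return True if every character of `options` is in the allowed class."""
    return ALLOWED.fullmatch(options) is not None

print(dcc_options_ok("-R"))            # the documented default
print(dcc_options_ok("-R -Q"))         # default plus query-only
print(dcc_options_ok("-R; rm -rf /"))  # ';' is outside the class
```

The first two strings pass and the third fails, since ';' is not in the allowed class.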

This is the header that I get back for a message from Sourceforge
with Subject: SOURCEFORGE.NET UPDATE - 2005-10-31 EDITION

  bash% /usr/local/bin/dccproc -H -Q -S mail_host -S Sender -S List-ID -S From -l /home/jeff/.dcc -w /var/dcc/whiteclnt -R < /home/jeff/mail/backup/msg.UUuO
  X-DCC-wuwien-Metrics: telesterion.delphioutpost.com 1290; Body=0 Fuz1=many Fuz2=many

The Fuz1=many Fuz2=many triggers the DCC_CHECK rule in SpamAssassin.
I am using -Q (Only query instead of reporting and then querying), so
I have not reported the message.  That header means that 'many' other
people have reported receiving this message?
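
To make the comparison concrete, here is a rough Python sketch of the threshold check as described in the man page text above: 'many' is read as 999999, 'ok' as 0, and each count is compared with its dcc_*_max setting. The function names and the ">=" comparison are assumptions for illustration, not SpamAssassin's actual implementation.

```python
import re

MANY = 999999  # value SpamAssassin substitutes for "many", per the man page

def parse_dcc_counts(header):
    """Extract Body/Fuz1/Fuz2 counts from an X-DCC-*-Metrics header line."""
    counts = {}
    for name, value in re.findall(r"\b(Body|Fuz1|Fuz2)=(\w+)", header):
        value = value.lower()
        if value == "many":
            counts[name.lower()] = MANY
        elif value == "ok":
            counts[name.lower()] = 0
        elif value.isdigit():
            counts[name.lower()] = int(value)
    return counts

def dcc_matched(header, body_max=MANY, fuz1_max=MANY, fuz2_max=MANY):
    """True if any count reaches its threshold (assumed ">=" comparison)."""
    counts = parse_dcc_counts(header)
    limits = {"body": body_max, "fuz1": fuz1_max, "fuz2": fuz2_max}
    return any(counts.get(key, 0) >= limit for key, limit in limits.items())

header = ("X-DCC-wuwien-Metrics: telesterion.delphioutpost.com 1290; "
          "Body=0 Fuz1=many Fuz2=many")
print(parse_dcc_counts(header))
print(dcc_matched(header))
```

Under these assumptions, the Sourceforge header above matches with the default thresholds, while a header carrying a plain number such as Fuz1=50000 does not.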

SpamAssassin uses dcc_options to control which options are passed to dccproc.
Anybody who uses DCC in SpamAssassin (use_dcc 1) and has not whitelisted
Sourceforge or added -Q to dcc_options will report the Sourceforge email.

This gets us back to my original statement 
   The easiest installation and use of DCC through SpamAssassin will
   wind up reporting newsletters and having newsletters tagged by DCC.
 
>> >              Whitelists let individual users enforce their individual
>> > notions of which bulk mail is solicited.  For example, Microsoft has
>> > sent me unsolicited bulk mail.  That it is spam for me should have no
>> > bearing on whether it is spam for you.
>>
>> It is not spam if you signed up with the company to receive the
>> newsletter or specials (etc) and if you can control the email from
>> the company.
> 
> Depending on what it means, that is either what I tried to say or wrong.
> I'm worried about the phrase "and if you can control the email from the
> company."  If I did not ask for it, then it is always spam, even if I
> might be able to beg the company to stop sending it.  Many users scream
> "SPAM" instead of unsubscribing from legitimate mail that they explicitly
> subscribed to.  That is irrelevant to the fact that most spammers claim
> their spam doesn't stink, often with variations of
> http://www.rhyolite.com/anti-spam/that-which-we-dont.html

The key question is who has control.  If somebody else signed me up (i.e.
forged my email address) for a newsletter, then I didn't have control.
If the newsletter then asks me to verify the subscription, then I'm back
in control.

>> I agree that users have to have local whitelists and should maintain
>> the whitelist, but I also think that the default DCC whitelist should
>> come with more whitelist entries for well known and reasonable newsletters.
> 
> For years I tried some of that but gave up.  See old versions of the
> whitecommon file in the DCC source.  For one thing, there are too many
> newsletters to count.

Sure, it is not possible for all current newsletters to be whitelisted.
I was suggesting only that dcc come with more whitelist entries.
Maybe as separate include files.

> Another and bigger reason is that there are many newsletters whose
> publishers give away "free" or "courtesy subscriptions." Even newsletters
> that have substantial numbers of real subscribers often add a few (or
> not so few) involuntary targets to their lists.
> 
> Consider some obvious cases.  Would you whitelist Yahoo's Groups because
> most of them are legitimate and despite years of continuing history
> of group owners unilaterally "subscribing" victims?
> What about Microsoft's bcentral.com/bcentralhost.com/linkexchange.com/-
> listbot.com/listbuilder.com system?
> What about the unsolicited bulk mail that Microsoft has sent to people
> with only UNIX boxes warning about Windows security problems?  Would
> you whitelist all of it because only a tiny fraction of it is spam?
> 
> I think "solicited" is always an individual, personal attribute, and
> so whitelisting must be equally individual.

No - I would not whitelist any newsletter that allows unconfirmed forged
"subscriptions", certainly not in a default system wide manner.

>> It would be easier if there was more similarity between different
>> whitelists.  For example, could the whitelist_from_rcvd syntax used
>> for SpamAssassin be read by dcc, eg:
>>   whitelist_from_rcvd BJs_MemberServices@bjs.chtah.com cheetahmail.com
>> This would allow a common whitelist file to be included by both.
> 
> You might write a cron job that converts many entries from one format
> to another, but judging from
> http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
> you cannot convert that line to a single DCC whiteclnt line.  You might
> convert it to a pair of DCC lines like
>    OK2 env_from  BJs_MemberServices@bjs.chtah.com
>    OK2 substitute mail_host cheetahmail.com
> but the conversion is inexact.

Yes - the conversion is inexact - that's pretty much what I was getting
at.  It would be nice if the whitelists were more similar.  That's all.
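
The cron-job conversion suggested in the quoted text could be sketched in Python along these lines. The function name is hypothetical, and the output simply mirrors the pair of whiteclnt lines quoted above; it does not handle wildcard addresses or any other whitelist directive.

```python
def whitelist_from_rcvd_to_whiteclnt(line):
    """Convert one SpamAssassin whitelist_from_rcvd line to two whiteclnt lines."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "whitelist_from_rcvd":
        raise ValueError("expected: whitelist_from_rcvd ADDRESS RELAY_DOMAIN")
    _, address, relay_domain = parts
    return [
        "OK2 env_from " + address,
        "OK2 substitute mail_host " + relay_domain,
    ]

example = "whitelist_from_rcvd BJs_MemberServices@bjs.chtah.com cheetahmail.com"
for converted in whitelist_from_rcvd_to_whiteclnt(example):
    print(converted)
```

As I read it, the inexactness is that SpamAssassin requires the address and the relay to match together, while the two DCC lines stand as separate whitelist entries.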

>> Is that a problem?  It might be kind of interesting to know how many
>> other people have seen the same whitelisted message.  The DCC count
>> with the threshold is being used as a binary (a message is either
>> known to be bulk or it is not known to be bulk) - and the usual use of
>> DCC is to equate Bulk with Spam.  Three or four values might be more
>> useful: message is whitelisted bulk (either dcc default or user),
>> other bulk or not bulk.
> 
> That sounds like "feature bloat."
> 
> A more compelling argument is that information which third parties
> might find interesting but that is none of their business should not
> be disclosed or even collected.  Mail that you whitelist is no one's
> concern but your own and the sender's, and often not even the sender's.
> It is not merely a good thing but an important privacy feature that
> whitelisted mail is not reported to any third party including DCC
> servers.

I thought that only checksums were exchanged.  I was specifically
thinking more about bulk whitelisted email.  For example, I don't
see much of a privacy concern with knowing that many people have
whitelisted newsletter messages from sourceforge.  All I would get
back is a count, like many?

But ok, nevermind.

-jeff


