How to improve DCC handling of attachments?

Gary Mills mills@cc.umanitoba.ca
Sat Jul 8 20:25:19 UTC 2006


On Fri, Jul 07, 2006 at 03:20:28PM -0600, Vernon Schryver wrote:
> > From: Gary Mills 
> 
> > I'm just looking at a complaint from a user about DCC unexpectedly
> > rejecting an e-mail message.  It had a single sender, a single
> > recipient and unique content as an attachment.  Clearly, it was not
> > bulk mail.  However, DCC had already seen 2801 messages with the same
> > fuz2 checksum, and rejected it as bulk mail.  This behavior is quite
> > confusing to users, and difficult for me to explain to them.

I should say, first of all, that I'm extremely pleased with the
operation of DCC.  Complaints are few, but there are some.

> What if you ask such users to look at the text and ask themselves
> whether it is substantially identical to a zillion other messages?

The sender may not have control over the format.  I don't know, but
in this case it may have been entirely generated by some Exchange
facility for sending files.  The recipient can only go by what is
captured in the DCC logs.  MIME-encoded binary files, and HTML for
that matter, are gibberish to most people.

> > It was a multi-part MIME message, apparently generated by Microsoft
> > Exchange.  The first part was text/plain, with a fixed format except
> > for the name of the attached file.  The second part was
> > application/x-zip-compressed.  I presume that DCC ignored that part
> > in computing the checksum.
> >
> > How can messages like this be treated as unique by DCC without opening
> > the doors to spam?
> 
> I can't think of more than:
> 
>  1. tell users not to do that.
>      Chances are that zipped body did not need the double or triple
>      encryption of compression and whatever else it had.  If it did,
>      it should probably not have been sent via email but via HTTP
>      or FTP.

The users are typically external.  In this case, it was an employee
of a supplier.  They just use whatever facilities are available to them.
I have no idea of the contents of the ZIP file.  That's not my concern.
I was hoping that DCC could be made to be more cooperative in this case.

>      Don't many sites block such mail because it is so often a Microsoft
>      worm/virus?

That's a policy issue.  Virus e-mail often contains a ZIP file, but
many ZIP files in e-mail contain legitimate files.

>  2. tell users to whitelist their correspondents that do that.

That's what we do now.  However, inter-personal e-mail messages
containing unique binary attachments are clearly not bulk mail.
By that definition, DCC should not be identifying them as such.
Is this technically possible?
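
To make the question concrete, here is a minimal Python sketch of the
difference between a fingerprint that covers only the text part and one
that also covers the attachment bytes.  It is a toy, not DCC's FUZ2
algorithm: SHA-256 stands in for the fuzzy checksum, and the helper names
(build_message, text_fingerprint, full_fingerprint) and the message text
are hypothetical.  The text is identical in both messages, which stands in
for a fuzzy checksum that tolerates a changed file name.

```python
import hashlib
from email.message import EmailMessage


def build_message(filename: str, payload: bytes) -> EmailMessage:
    """Build a two-part message: templated text plus a ZIP-typed attachment."""
    msg = EmailMessage()
    # Hypothetical fixed text, standing in for the Exchange-generated part.
    msg.set_content("Please find the attached file.\n")
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="x-zip-compressed",
        filename=filename,
    )
    return msg


def text_fingerprint(msg: EmailMessage) -> str:
    """Hash only the text/plain parts, ignoring every attachment."""
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            digest.update(part.get_content().encode("utf-8"))
    return digest.hexdigest()


def full_fingerprint(msg: EmailMessage) -> str:
    """Hash the decoded bytes of every leaf part, attachments included."""
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        digest.update(part.get_payload(decode=True))
    return digest.hexdigest()


first = build_message("report.zip", b"first unique payload")
second = build_message("invoice.zip", b"second unique payload")

# Same templated text, so the text-only fingerprints collide ...
print(text_fingerprint(first) == text_fingerprint(second))
# ... while the attachment bytes make the full fingerprints differ.
print(full_fingerprint(first) == full_fingerprint(second))
```

The catch is the one raised in the original question: a fingerprint that
includes attachment bytes is trivially defeated by bulk senders who vary
the attachment, so it could at most supplement the fuzzy checksums.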

>  3. whitelist that particular FUZ2 checksum for your entire installation
> 
>      That checksum might already be in John Levine's list of checksums of
>      empty and test messages at http://www.iecc.com/dcc-testmsg-whitelist.txt
>      You might be able to convince him to add it.

It's not there.
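
For anyone who wants to repeat that check against a local copy of the
list, a rough Python sketch follows.  It assumes only that each entry sits
on one line, that lines starting with `#' are comments, and that the
checksum appears as space-separated hex words; the function names and the
sample entries are hypothetical and are not taken from the real file.

```python
def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace to one space."""
    return " ".join(text.lower().split())


def find_checksum(whitelist_text: str, checksum: str) -> list[str]:
    """Return the non-comment lines that contain the given checksum."""
    wanted = normalize(checksum)
    if not wanted:
        return []  # an empty query would otherwise match every line
    return [
        line
        for line in whitelist_text.splitlines()
        if not line.lstrip().startswith("#") and wanted in normalize(line)
    ]


# Hypothetical sample data; the real whitelist layout may differ.
sample = """\
# made-up entries for illustration only
ok Fuz2 0123abcd 4567ef01 89ab2345 cdef6789
ok Fuz2 deadbeef 00112233 44556677 8899aabb
"""

print(find_checksum(sample, "DEADBEEF 00112233  44556677 8899aabb"))
print(find_checksum(sample, "ffffffff ffffffff ffffffff ffffffff"))
```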

>      Current versions of the DCC source include 
>      /var/dcc/libexec/fetch-testmsg-whitelist 
>      to refresh local copies of that whitelist occasionally.  See also
>      http://www.rhyolite.com/anti-spam/dcc/dcc-tree/misc/fetch-testmsg-whitelist

That's a good idea.  However, we shouldn't be relying on one person to
identify messages that DCC treats as empty, extract the relevant checksums
and update that file.  That's too much to ask of anyone.

>      I've stopped using that whitelist for various reasons including
>      realizing that no one at rhyolite.com cares to receive empty or
>      test messages consisting of Microsoft XML junk, free mail provider
>      advertising, etc.

I understand that, but again that's a policy decision that must be made
by each organization.  Some people here want to receive at least some
of the things that you mention, provided that it's not spam.

For an example of the latter, I get many copies of a new form of spam
sent to our `abuse' address.  It consists of a single line of text,
often just one word, along with a small image file that advertises a
performance-enhancing drug.  Clearly, this is intended to get past
spam filters that attempt to identify spam by scanning the header and
body text.  It's going to take image analysis to handle this spam.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


