DCC Checksums: How are they determined?

Peter Beckman beckman@purplecow.com
Sun Nov 17 03:18:55 UTC 2002

One other thing:

I've noticed that the From:, Message-ID:, and Received: checksums are no
longer attached to my mail, and they usually show up blank in the output of
dccproc -C.  I read through the change log looking for "header" and "from"
but could not find any note about the server no longer keeping the header
checksums.  Is this decided on a per-server basis, or are these now legacy
tags?


On Wed, 6 Nov 2002, Vernon Schryver wrote:

> > From: Peter Beckman <beckman@purplecow.com>
> > Subject: DCC Checksums: How are they determined?
> > I am thinking about writing some software to document and track spam using
> > DCC checksums.  I want to keep a copy of the body of the spam, and use the
> > checksums from DCC to identify whether another one is the same or different.
> That is not far from a description of the DCC.
> > ...
> >                      Fuz2: 9b2f514d ad3c6167 14f3ccc4 ee2abb78    many
> >
> > I assume it is just an MD5 hash on each of these pieces of information.
> > The checksum is just split into 4 groups of 8 chars each for viewing
> > pleasure.
> Yes, those 4 groups are merely easier to read than 32 consecutive
> hex digits.
> > Does DCC do anything special to attachments?  For example, a spam with 3
> > 30K image attachments, are they considered in the checksum?
> I actively discourage discussions of details of how the checksums are
> computed beyond what is written in the documentation.  There is no
> profit for people who dislike spam in helping spammers who generally
> can't read C.  So let's just say attachments are "considered," and
> not talk about what "considered" might mean.
> > There are usually several "Received: " lines in an email.  Which does DCC
> > checksum, or is it all of them?
> The main dcc man page says the following where it discusses the
> checksums:
> ]           Received     last Received: header line in the SMTP message
> See http://www.rhyolite.com/anti-spam/dcc/dcc-tree/dcc.html#X-DCC-Headers
> > When it checksums the headers, does it checksum the "From: " as well as
> > the address or just the address?  If just the data, how does it deal with
> > multiple received lines?  Concatenate?
> I don't understand that question where it involves From: and Received:
> headers.  I also don't understand 'the "From:" as well as the address'.
> The header checksum covers the entire From header line, with some minor
> exceptions including whitespace and an optional pair of outer <>'s.
> Perhaps the question would be answered by trying `dccproc -Q` on some
> test messages.
> > ...
> > What happens to the fuzzy 1 and 2 checksums?  Are all [a-z]@[a-z].[a-z]
> > (example) replaced with ####### or something?  Are all proper names, or the
> > first few lines of a spam, replaced with ### or something standard to
> > remove personalization?
> How personalizations are handled is an inappropriate topic for public
> discussions.  In fact there are very few people with whom I'll discuss
> that stuff in private.
> One of the ground rules of the DCC is that new versions of the client
> code must be distributed periodically to deal with changes in spam
> personalizations.  It's been a year since the last change, but there's
> no reason to hurry the next one by giving spammers aid and comfort in
> the form of public discussions.
> Vernon Schryver    vjs@rhyolite.com
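
To make the grouped display and the From: handling discussed above a little
more concrete, here is a minimal Python sketch.  It is not the DCC's code.
It assumes the checksum is an MD5 digest (as guessed in the question), uses
an invented normalization that only drops whitespace and one optional outer
pair of <>'s, and reads "last Received: header line" as the final Received:
line in header order.  The function names are hypothetical.

```python
import hashlib
from email import message_from_string
from typing import Optional


def normalize_from_value(value: str) -> str:
    # Hypothetical normalization, NOT the DCC's real rules: drop all
    # whitespace, then strip one optional outer pair of angle brackets.
    squeezed = "".join(value.split())
    if len(squeezed) >= 2 and squeezed.startswith("<") and squeezed.endswith(">"):
        squeezed = squeezed[1:-1]
    return squeezed


def grouped_hex(digest: bytes) -> str:
    # Show a 16-byte digest as four groups of 8 hex digits, which is
    # easier to read than 32 consecutive hex digits.
    hex_digits = digest.hex()
    return " ".join(hex_digits[i:i + 8] for i in range(0, len(hex_digits), 8))


def from_checksum(value: str) -> str:
    # Assumption: the checksum is an MD5 digest of the normalized value.
    digest = hashlib.md5(normalize_from_value(value).encode("utf-8")).digest()
    return grouped_hex(digest)


def last_received(raw_message: str) -> Optional[str]:
    # Assumption: "last Received: header line" means the final Received:
    # line in the order the header lines appear in the message.
    received = message_from_string(raw_message).get_all("Received")
    return received[-1] if received else None


if __name__ == "__main__":
    print(from_checksum(" <beckman@purplecow.com> "))
    print(last_received("Received: from a\nReceived: from b\n\nbody\n"))
```

Comparing from_checksum values for two messages is only a rough stand-in
for what `dccproc -Q` reports; the real client applies rules that are
deliberately left undocumented.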
