DCC Checksums: How are they determined?

Peter Beckman beckman@purplecow.com
Thu Nov 7 06:27:57 UTC 2002

I am thinking about writing some software to document and track spam using
DCC checksums.  I want to keep a copy of the body of each spam, and use the
checksums from DCC to tell whether another message is the same or different.

How does DCC generate the checksums?  They look like MD5:
                                                      checksum  server
                     From: e7cb78fc 719aa0eb 13f4eb0b 5d2283e6
               Message-ID: ff753cbf c635525c d820995e 0e45a308
                 Received: 8b3e3f21 16850068 f10f3a1f 996f00d5
                     Body: c8533bd5 b9bea813 98000b63 c5a08c8f    many
                     Fuz1: ff846207 d59e42d4 492fb9d5 61165f50    many
                     Fuz2: 9b2f514d ad3c6167 14f3ccc4 ee2abb78    many

I assume it is just an MD5 hash of each of these pieces of information,
with the checksum split into 4 groups of 8 hex characters each for viewing.
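
A minimal Python sketch of that assumption, for illustration only.  It
shows a plain MD5 digest printed as 4 groups of 8 hex characters, which
matches the layout above.  Whether DCC really hashes the raw data this way
is exactly the open question, and the function name md5_groups is made up
for this example.

```python
import hashlib


def md5_groups(data: bytes) -> str:
    """Return the MD5 digest of data as 4 space-separated groups of 8 hex chars.

    Assumption: this only mimics the display format seen in the DCC output;
    it is not known to be how DCC computes its checksums.
    """
    digest = hashlib.md5(data).hexdigest()  # 32 hex characters
    return " ".join(digest[i:i + 8] for i in range(0, 32, 8))


if __name__ == "__main__":
    # Hypothetical input: the raw text of one header value.
    print(md5_groups(b"someone@example.com"))
```

If the guess were right, hashing the same header value or body twice would
give the same four groups, which is what would make the checksums usable as
lookup keys for stored spam.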

Does DCC do anything special to attachments?  For example, if a spam has
three 30K image attachments, are they included in the checksum?

There are usually several "Received: " lines in an email.  Which does DCC
checksum, or is it all of them?

When it checksums the headers, does it checksum the header name ("From: ")
as well as the address, or just the address?  If just the data, how does it
deal with multiple Received lines?  Does it concatenate them?

What goes into the fuzzy 1 and 2 checksums?  Are all addresses of the form
[a-z]@[a-z].[a-z] (as an example) replaced with ####### or something?  Are
all proper names, or the first few lines of a spam, replaced with ### or
something standard to remove personalization?
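
To make the question concrete, here is a rough Python sketch of the kind of
normalization being guessed at: replace anything that looks like an email
address with a fixed placeholder, collapse whitespace, lowercase, then hash.
This is not DCC's actual fuzzy algorithm; the pattern, the placeholder, and
the names normalize and fuzzy_guess are all assumptions made for
illustration.

```python
import hashlib
import re

# Assumed pattern for "something that looks like an email address".
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(body: str) -> str:
    """Strip simple personalization from a message body (illustrative guess)."""
    text = ADDRESS.sub("#######", body)  # mask addresses with a placeholder
    text = " ".join(text.split())        # collapse runs of whitespace
    return text.lower()                  # ignore case differences


def fuzzy_guess(body: str) -> str:
    """MD5 of the normalized body; a stand-in, not a real DCC checksum."""
    return hashlib.md5(normalize(body).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "Hi bob@example.com,  buy now"
    b = "Hi alice@example.org, BUY now"
    print(fuzzy_guess(a) == fuzzy_guess(b))
```

Under this guess, two copies of a spam that differ only in the recipient
address, spacing, or letter case would produce the same value, while any
other change to the text would not.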


Peter Beckman            Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com                             http://www.purplecow.com/
