Peter Beckman
beckman@purplecow.com
Thu Nov 7 06:27:57 UTC 2002
I am thinking about writing some software to document and track spam using
DCC checksums. I want to keep a copy of the body of each spam, and use the
checksums from DCC to tell whether another message is the same or different.
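A minimal sketch of the tracking idea described above, assuming the checksum arrives as a plain string already reported by DCC. The class name SpamArchive and its methods are hypothetical, and nothing here talks to DCC itself:

```python
class SpamArchive:
    """Keep one copy of each spam body, keyed by its checksum string."""

    def __init__(self):
        self._bodies = {}  # checksum -> first body seen with that checksum
        self._counts = {}  # checksum -> number of times it was reported

    def record(self, checksum: str, body: str) -> bool:
        """Store the message; return True if this checksum is new."""
        is_new = checksum not in self._bodies
        if is_new:
            self._bodies[checksum] = body
        self._counts[checksum] = self._counts.get(checksum, 0) + 1
        return is_new

    def seen(self, checksum: str) -> int:
        """Return how many times this checksum has been recorded."""
        return self._counts.get(checksum, 0)


archive = SpamArchive()
archive.record("c8533bd5 b9bea813 98000b63 c5a08c8f", "first copy of the spam")
archive.record("c8533bd5 b9bea813 98000b63 c5a08c8f", "second copy of the spam")
print(archive.seen("c8533bd5 b9bea813 98000b63 c5a08c8f"))
```

Only the first body is kept per checksum; later copies just bump the counter.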
How does DCC generate the checksums? It looks like MD5:
                         checksum                  server
From:       e7cb78fc 719aa0eb 13f4eb0b 5d2283e6
Message-ID: ff753cbf c635525c d820995e 0e45a308
Received:   8b3e3f21 16850068 f10f3a1f 996f00d5
Body:       c8533bd5 b9bea813 98000b63 c5a08c8f    many
Fuz1:       ff846207 d59e42d4 492fb9d5 61165f50    many
Fuz2:       9b2f514d ad3c6167 14f3ccc4 ee2abb78    many
I assume it is just an MD5 hash on each of these pieces of information.
The checksum is just split into 4 groups of 8 chars each for viewing
pleasure.
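To make the assumption above concrete, here is a short sketch that takes an MD5 digest and prints it as 4 groups of 8 hex characters. It only illustrates the display format guessed at here; whether DCC hashes the raw field this way is not confirmed, and format_md5 is a hypothetical helper name:

```python
import hashlib


def format_md5(data: bytes) -> str:
    """Return the MD5 of data as 4 space-separated groups of 8 hex chars."""
    digest = hashlib.md5(data).hexdigest()  # always 32 hex characters
    return " ".join(digest[i:i + 8] for i in range(0, 32, 8))


# Assumption: the input is the raw header value or body bytes.
print(format_md5(b"beckman@purplecow.com"))
```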
Does DCC do anything special with attachments? For example, if a spam has
three 30K image attachments, are they included in the checksum?
There are usually several "Received: " lines in an email. Which does DCC
checksum, or is it all of them?
When it checksums the headers, does it checksum the "From: " label as well
as the address, or just the address? If just the data, how does it deal with
multiple Received lines? Concatenate?
How are the fuzzy 1 and 2 checksums computed? Are all [a-z]@[a-z].[a-z]
(example) replaced with ####### or something? Are all proper names, or the
first few lines of a spam, replaced with ### or something standard to
remove personalization?
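The kind of normalization being asked about could look roughly like the sketch below: email addresses are replaced with a fixed placeholder and whitespace is collapsed before hashing, so two copies that differ only in the recipient address get the same key. This is purely an illustration of the guess in the question, not DCC's actual fuzzy algorithm; the ADDRESS pattern and the fuzzy_key name are assumptions:

```python
import hashlib
import re

# Assumed, deliberately loose pattern for things that look like addresses.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+")


def fuzzy_key(body: str) -> str:
    """Hash the body after masking addresses and collapsing whitespace."""
    text = ADDRESS.sub("#", body.lower())  # mask personalized addresses
    text = " ".join(text.split())          # collapse runs of whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


a = fuzzy_key("Hi bob@example.com,  buy now")
b = fuzzy_key("hi alice@test.org, buy now")
print(a == b)
```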
Thanks!
Peter
---------------------------------------------------------------------------
Peter Beckman Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com http://www.purplecow.com/
---------------------------------------------------------------------------