DCC Checksums: How are they determined?

Vernon Schryver vjs@calcite.rhyolite.com
Thu Nov 7 06:57:22 UTC 2002


> From: Peter Beckman <beckman@purplecow.com>
> Subject: DCC Checksums: How are they determined?

> I am thinking about writing some software to document and track spam using
> DCC checksums.  I want to keep a copy of the body of the spam, and use the
> checksum's from DCC to identify if another one is the same or different.

That is not far from a description of the DCC.


> ...
>                      Fuz2: 9b2f514d ad3c6167 14f3ccc4 ee2abb78    many
>
> I assume it is just an MD5 hash on each of these pieces of information.
> The checksum is just split into 4 groups of 8 chars each for viewing
> pleasure.

Yes, those 4 groups are merely easier to read than 32 consecutive
hex digits.
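
As an illustration of the display format only, here is a minimal Python
sketch that prints a 32-digit MD5 hex digest as four groups of 8 and joins
such groups back together. The function names are hypothetical, and nothing
here reproduces what the DCC feeds into its checksums; it only shows the
grouping, which is handy if you store checksums as plain 32-digit strings.

```python
import hashlib


def grouped_md5(data: bytes) -> str:
    """Return the MD5 of data as four space-separated groups of 8 hex digits."""
    digest = hashlib.md5(data).hexdigest()  # always 32 hex digits
    return " ".join(digest[i:i + 8] for i in range(0, 32, 8))


def ungroup(text: str) -> str:
    """Join a grouped checksum back into one hex string by dropping whitespace."""
    return "".join(text.split())


if __name__ == "__main__":
    print(grouped_md5(b""))
    print(ungroup("9b2f514d ad3c6167 14f3ccc4 ee2abb78"))
```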


> Does DCC do anything special to attachments?  For example, a spam with 3
> 30K image attachments, are they considered in the checksum?

I actively discourage discussions of details of how the checksums are
computed beyond what is written in the documentation.  There is no
profit for people who dislike spam in helping spammers who generally
can't read C.  So let's just say attachments are "considered," and
not talk about what "considered" might mean.


> There are usually several "Received: " lines in an email.  Which does DCC
> checksum, or is it all of them?

The main dcc man page says the following where it discusses the
checksums:

]           Received     last Received: header line in the SMTP message

See http://www.rhyolite.com/anti-spam/dcc/dcc-tree/dcc.html#X-DCC-Headers
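
For tracking software that wants to pick out the same header line, a
minimal Python sketch follows. It assumes "last" means the final Received:
header in the order the headers appear in the message; the function name is
hypothetical, and the sketch only extracts the header value. It does not
compute a DCC checksum.

```python
from email import message_from_string


def last_received(raw_message: str):
    """Return the value of the final Received: header, or None if there is none."""
    msg = message_from_string(raw_message)
    # get_all() returns the header values in the order they appear.
    received = msg.get_all("Received") or []
    return received[-1] if received else None


if __name__ == "__main__":
    sample = (
        "Received: from a.example by b.example\n"
        "Received: from c.example by a.example\n"
        "From: someone@example.com\n"
        "\n"
        "body text\n"
    )
    print(last_received(sample))
```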


> When it checksum's the headers, does it checksum the "From: " as well as
> the address or just the address?  If just the data, how does it deal with
> multiple received lines?  Concatenate?

I don't understand that question where it involves From: and Received:
headers.  I also don't understand 'the "From:" as well as the address'.
The header checksum covers the entire From header line, with some minor
exceptions including whitespace and an optional pair of outer <>'s.
Perhaps the question would be answered by trying `dccproc -Q` on some
test messages.

> ...
> What happens to the fuzzy 1 and 2 checksums?  Are all [a-z]@[a-z].[a-z]
> (example) replaced with ####### or something?  Are all proper names, or the
> first few lines of a spam, replaced with ### or something standard to
> remove personalization?

How personalizations are handled is an inappropriate topic for public
discussions.  In fact there are very few people with whom I'll discuss
that stuff in private.

One of the ground rules of the DCC is that new versions of the client
code must be distributed periodically to deal with changes in spam
personalizations.  It's been a year since the last change, but there's
no reason to hurry the next one by giving spammers aid and comfort in
the form of public discussions.


Vernon Schryver    vjs@rhyolite.com


