Maximizing effectiveness against 'empty' spam?

Robert P.Thille
Fri Jun 18 18:48:04 UTC 2004

On Jun 16, 2004, at 10:52 PM, Clive Cleland wrote:

> Perhaps you could recognise the missing body in the application that
> makes the dccifd calls?  The message passes through that application
> anyway, so it should be fairly straightforward to observe a missing
> body, and act accordingly.

Well, if we wanted to throw away (or otherwise consistently treat) all 
empty-bodied messages, I could do something about them in the 
'connector' software I'm writing.  But unless I reinvent DCC, or 
otherwise maintain a database of messages I've seen, I wouldn't have a 
way to tell one empty message from another.
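
For illustration, the connector-side check for a missing body could look 
roughly like the sketch below.  It assumes the connector holds the raw 
message as bytes; the helper name has_empty_body is hypothetical and is 
not part of DCC or dccifd.  It only detects an empty body, it does not 
distinguish one empty message from another.

```python
import re

# Hypothetical helper for the 'connector'; not part of DCC or dccifd.
# The body is whatever follows the first blank line after the headers.
_HEADER_END = re.compile(rb"\r?\n\r?\n")


def has_empty_body(raw_message: bytes) -> bool:
    """Return True if nothing but whitespace follows the header block."""
    match = _HEADER_END.search(raw_message)
    if match is None:
        # No blank line at all: the message consists of headers only.
        return True
    body = raw_message[match.end():]
    return body.strip() == b""
```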

I think the approach I'm going to suggest for handling these empty 
messages is to add a threshold for the Message-ID header.  That will at 
least catch the truly identical (and rfc compliant ;-) messages.
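
To make the idea concrete, here is a rough sketch of what a count 
threshold on Message-ID amounts to.  It is only a local, in-memory 
illustration of the counting idea, not how DCC computes, stores, or 
shares its checksums; the names seen and over_threshold are made up, 
and REJECT_AT simply mirrors the DCCIFD_REJECT_AT value of 10 
discussed in this thread.

```python
from collections import Counter
from email import message_from_bytes

REJECT_AT = 10  # assumption: mirrors DCCIFD_REJECT_AT=10 from this thread

# Local stand-in for the counts a DCC server would keep (illustration only).
seen = Counter()


def over_threshold(raw_message: bytes) -> bool:
    """Count this message's Message-ID; True once it has been seen more
    than REJECT_AT times.  Messages without a Message-ID are not counted."""
    msg_id = message_from_bytes(raw_message).get("Message-ID")
    if msg_id is None:
        return False
    key = msg_id.strip()
    seen[key] += 1
    return seen[key] > REJECT_AT
```

Copies that differ only in their 'Received' headers share one 
Message-ID, so they land on the same counter even though the body 
gives nothing to checksum.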

And On Jun 16, 2004, at 10:10 PM, Vernon Schryver wrote:

>> From: Robert Thille <>
>> However, with the DCCIFD_REJECT_AT set at '10', sending 11 identical
>> messages with empty bodies doesn't get any messages rejected.
> All three DCC body checksums require some minimal bits on which
> to compute their sums.
Right.  I wasn't sure that Fuz1 and Fuz2 were just body checksums, or 
over the whole message (body and header).

>> I'm guessing that the problem is the '-t CMN' part, since the man page
>> lists CMN as 'Body, Fuz1, Fuz2'.  Do Fuz1 and Fuz2 cover the headers,
>> or are they just 'fuzzy' checksums of the body?
> The Fuz1, Fuz2, and Body checksums are only of the SMTP body after
> the blank line separating the headers from the rest.

Useful information.

> The best way to figure out what is going on is to turn on logging by
> setting DCCM_LOG_AT or DCCIFD_LOG_AT to 0 in dcc_conf or turning on
> "option log-all" in your main or a per-user whiteclnt file.  The
> resulting log files will have the checksums and their counts.

I've got LOG_AT set to 1, and am getting log files which are helpful 
(though not so much in the case where they are all zero :-)

> The idea of the DCC is to count targets of substantially identical
> messages.  "Substantially identical" gets boring near empty and
> nearly empty messages such as "thanks" and "test".

Well, you can have 'empty' messages which are SPAM and are exactly, or 
very nearly identical: spammer creates a single message, puts a URL in 
the subject,  and does a rcpt-to to many users at many systems around 
the net.  By the time DCC would see it, it would have a new 'Received' 
header, but would otherwise be identical.  While not a large portion of 
the SPAM I see, I can see those messages becoming 'interesting' if DCC 
is widely deployed and does not address them.

Also, I think the other engineers and manager types may see mail (spam) 
which appears empty, because they have HTML rendering disabled, or 
remote-image loading turned off, and so overestimate the amount of spam 
they receive with empty bodies.

> I'll spare you the rant triggered by such familiar and destructive "QA"
> noise.  Why are so many QA groups rewarded only for shouting "BUG" and
> never penalized for shoddy work?
Well, in my case, I would defend our QA guy.  He's just now been tasked 
with testing this and isn't fully familiar with DCC. My description of 
'as configured it should reject the 11th+ of a series of identical 
messages' should have been amended to include 'non-empty'.

> Why do most test suites contain
> orders of magnitude more and worse bugs than the code they purport to
> test?
I think in most cases it's due to lack of resources, or the quality of 
the resources allocated to test vs. development.  But these questions 
are rhetorical aren't they? :-)

>   Why do the pointy haired hire so many hopeless political hacks
> as test czars? ... but I'm trying to spare you.
To keep them company?  To have someone they can relate to?  Oops, there 
I go again, answering rhetorical questions... :-)

BTW, I noticed that my first mail to the list got graylisted by the 
email server.  Is that something that happens to everyone, or is there 
some list I (or my IPs) have gotten onto that I don't want to be on?



Robert Thille                7575 Meadowlark Dr.; Sebastopol, CA 95472
Home: 707.824.9753    Office/VOIP: 707.780.1560     Cell: 707.217.7544   YIM:rthille
Cyclist, Mountain Biker, Freediver, Kayaker, Rock Climber, Hiker, Geek
May your spirit dive deep the blue, where the fish are many and large!
