Maximizing effectiveness against 'empty' spam?

Robert P. Thille
Fri Jun 18 18:48:04 UTC 2004

On Jun 16, 2004, at 10:52 PM, Clive Cleland wrote:

> Perhaps you could recognise the missing body in the application that
> makes the dccifd calls?  The message passes through that application
> anyway, so it should be fairly straightforward to observe a missing
> body, and act accordingly.

Well, if we wanted to throw away (or otherwise consistently treat) all 
empty-bodied messages, I could do something about them in the 
'connector' software I'm writing.  But unless I reinvent DCC, or 
otherwise maintain a database of messages I've seen, I wouldn't have a 
way to tell one empty message from another.

I think the approach I'm going to suggest for handling these empty 
messages is to add a threshold for the Message-ID header.  That will at 
least catch the truly identical (and RFC-compliant ;-) messages.
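
To make the Message-ID idea concrete, here is a minimal sketch of what a
threshold on that header amounts to.  It is only an in-memory toy: the
class name, the REJECT_AT constant, and the "more than 10 sightings"
rule are assumptions taken from the '10' setting discussed in this
thread, not DCC's actual code or interface.  In a real deployment the
counting would be done by the DCC servers across many clients.

```python
from collections import Counter
from email import message_from_string

# Hypothetical threshold, mirroring the '10' discussed in this thread.
REJECT_AT = 10


class MessageIdCounter:
    """Toy in-memory counter of Message-ID sightings (not DCC itself)."""

    def __init__(self, reject_at=REJECT_AT):
        self.reject_at = reject_at
        self.seen = Counter()

    def should_reject(self, raw_message):
        """Count this message's Message-ID; True once it exceeds the threshold."""
        msg = message_from_string(raw_message)
        msg_id = (msg.get("Message-ID") or "").strip()
        if not msg_id:
            # No Message-ID header at all: nothing to count on.
            return False
        self.seen[msg_id] += 1
        return self.seen[msg_id] > self.reject_at
```

With this rule the first ten copies of a message pass and the eleventh
and later copies are flagged, even when the body is empty.  A message
with no Message-ID header is never flagged, which is the "RFC-compliant"
caveat above.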

And on Jun 16, 2004, at 10:10 PM, Vernon Schryver wrote:

>> From: Robert Thille <>
>> However, with the DCCIFD_REJECT_AT set at '10', sending 11 identical
>> messages with empty bodies doesn't get any messages rejected.
> All three DCC body checksums require some minimal bits on which
> to compute their sums.
Right.  I wasn't sure whether Fuz1 and Fuz2 were just body checksums, or 
covered the whole message (body and headers).

>> I'm guessing that the problem is the '-t CMN' part, since the man page
>> lists CMN as 'Body, Fuz1, Fuz2'.  Do Fuz1 and Fuz2 cover the headers,
>> or are they just 'fuzzy' checksums of the body?
> The Fuz1, Fuz2, and Body checksums are only of the SMTP body after
> the blank line separating the headers from the rest.

Useful information.
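
As an illustration of why an empty body leaves nothing to count, here is
a rough sketch of a body-only checksum.  The function names are
hypothetical and SHA-256 is just a stand-in; this is not how DCC
computes its Body, Fuz1, or Fuz2 checksums.  It only shows the split at
the blank line after the headers.

```python
import hashlib


def split_message(raw: bytes):
    """Split a raw message at the first blank line into (headers, body)."""
    normalized = raw.replace(b"\r\n", b"\n")
    headers, _sep, body = normalized.partition(b"\n\n")
    return headers, body


def body_checksum(raw: bytes):
    """Return a hex digest of the body only, or None if the body is empty.

    SHA-256 is a stand-in here, not DCC's real checksum algorithm.
    """
    _headers, body = split_message(raw)
    if not body.strip():
        # Nothing after the blank line: no bits to compute a sum on.
        return None
    return hashlib.sha256(body).hexdigest()
```

Two copies of a message that differ only in their headers (a new
'Received' line, say) get the same checksum, while a message whose only
content is a URL in the Subject gets none at all.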

> The best way to figure out what is going on is to turn on logging by
> setting DCCM_LOG_AT or DCCIFD_LOG_AT to 0 in dcc_conf or turning on
> "option log-all" in your main or a per-user whiteclnt file.  The
> resulting log files will have the checksums and their counts.

I've got LOG_AT set to 1, and am getting log files which are helpful 
(though not so much in the case where they are all zero :-)

> The idea of the DCC is to count targets of substantially identical
> messages.  "Substantially identical" gets boring near empty and
> nearly empty messages such as "thanks" and "test".

Well, you can have 'empty' messages which are spam and are exactly, or 
very nearly, identical: a spammer creates a single message, puts a URL 
in the subject, and does a rcpt-to to many users at many systems around 
the net.  By the time DCC would see it, it would have a new 'Received' 
header, but would otherwise be identical.  While they are not a large 
portion of the spam I see, I can see those messages becoming 
'interesting' if DCC is widely deployed and does not address them.

Also, I think the other engineers and manager types may see mail (spam) 
which appears empty, because they have HTML rendering disabled, or 
remote-image loading turned off, and so overestimate the amount of spam 
they receive with empty bodies.

> I'll spare you the rant triggered by such familiar and destructive "QA"
> noise.  Why are so many QA groups rewarded only for shouting "BUG" and
> never penalized for shoddy work?
Well, in my case, I would defend our QA guy.  He's just now been tasked 
with testing this and isn't fully familiar with DCC. My description of 
'as configured it should reject the 11th+ of a series of identical 
messages' should have been amended to include 'non-empty'.

> Why do most test suites contain
> orders of magnitude more and worse bugs than the code they purport to
> test?
I think in most cases it's due to lack of resources, or the quality of 
the resources allocated to test vs. development.  But these questions 
are rhetorical, aren't they? :-)

>   Why do the pointy haired hire so many hopeless political hacks
> as test czars? ... but I'm trying to spare you.
To keep them company?  To have someone they can relate to?  Oops, there 
I go again, answering rhetorical questions... :-)

BTW, I noticed that my first mail to the list got graylisted by the
email server.  Is that something that happens to everyone, or is there
some list I (or my IPs) have gotten onto that I don't want to be on?



