Maximizing effectiveness against 'empty' spam?

Vernon Schryver vjs@calcite.rhyolite.com
Fri Jun 18 21:18:24 UTC 2004


> From: "Robert P.Thille" 

> Well, if we wanted to throw away (or otherwise consistently treat) all 
> empty bodied messages I could do something about them in the 
> 'connector' software I'm writing.  But unless I reinvent DCC, or 
> otherwise maintain a database of messages I've seen I wouldn't have a 
> way to judge one empty message from another.

How can a pair of empty messages differ?  More to the point, why
would you want to deliver one and reject or discard the other?

In theory I can see rejecting messages that have the same Subject:
lines.  In practice too many people send empty messages with very
common subjects like "I'm leaving now".
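To make the first point concrete, here is a minimal sketch of checksum counting in the spirit of the DCC. It assumes, purely for illustration, that a body checksum is a SHA-256 of the whitespace-trimmed body. The real DCC checksums are normalized differently and include fuzzy checksums. `body_checksum` and `TargetCounter` are hypothetical names. Every empty or whitespace-only body maps to the same value, so all such messages pool into a single target count and cannot be told apart.

```python
import hashlib
from collections import Counter

def body_checksum(body: str) -> str:
    """Hypothetical body checksum: SHA-256 of the whitespace-trimmed body.

    The real DCC normalizes bodies differently and also computes fuzzy
    checksums; this is only a stand-in to show the counting idea.
    """
    return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()

class TargetCounter:
    """Accumulate the number of targets (recipients) seen per body checksum."""

    def __init__(self) -> None:
        self.counts = Counter()

    def report(self, body: str, num_recipients: int = 1) -> int:
        """Add this message's recipients and return the running total."""
        ck = body_checksum(body)
        self.counts[ck] += num_recipients
        return self.counts[ck]

counter = TargetCounter()
print(counter.report(""))          # 1
print(counter.report("\r\n\r\n"))  # 2: same checksum as the first empty body
print(counter.report("thanks"))    # 1: a different checksum
```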


> I think the approach I'm going to suggest for handling these empty 
> messages is to add a threshold for the Message-ID header.  That will at 
> least catch the truly identical (and rfc compliant ;-) messages.

I don't understand that:
  - some MTAs retransmit single messages with new Message-IDs.
  - many messages (usually bulk and especially spam) do not have
      Message-IDs.
  - in practice assumptions about the global uniqueness of Message-IDs
      are hard to defend.
  - why would identical or dissimilar Message-IDs make messages spam or not?
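The first two objections can be made concrete with a hypothetical per-Message-ID threshold. The class and the threshold value are illustrative assumptions and not part of the DCC. A bulk mailer that generates a fresh Message-ID for each copy never reaches the threshold. Neither does one that omits the header. And reaching the threshold says nothing about whether the message is spam.

```python
from collections import Counter
from typing import Optional

class MessageIdThreshold:
    """Hypothetical filter: flag a Message-ID once it has been seen `threshold` times."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.seen = Counter()

    def over_threshold(self, message_id: Optional[str]) -> bool:
        if not message_id:
            # No Message-ID header: there is nothing to count, so it always passes.
            return False
        self.seen[message_id] += 1
        return self.seen[message_id] >= self.threshold

f = MessageIdThreshold(threshold=3)
# A bulk mailer that mints a fresh Message-ID per copy never trips the threshold.
print(any(f.over_threshold(f"<{i}@bulk.example>") for i in range(100)))  # False
# Neither does one that omits the header entirely.
print(any(f.over_threshold(None) for _ in range(100)))                    # False
```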


> > The idea of the DCC is to count targets of substantially identical
> > messages.  "Substantially identical" gets boring near empty and
> > nearly empty messages such as "thanks" and "test".
>
> Well, you can have 'empty' messages which are SPAM and are exactly, or 
> very nearly identical: spammer creates a single message, puts a URL in 
> the subject,  and does a rcpt-to to many users at many systems around 
> the net. 

I have not seen many of those, and I think I look at a lot of spam.


>           By the time DCC would see it, it would have a new 'Received' 
> SPAM I see, I can see those messages becoming 'interesting' if DCC 
> widely deployed and does not address them.

Not unless (1) the defaults of popular MTAs or MUAs are changed (back)
to answering return receipts (assuming you meant "return receipt"
instead of "rcpt-to") or (2) popular MUAs make URLs in Subject:
lines "clickable."

It is possible to count messages with identical Subject: lines with
the DCC.  The checksum now called the "substitute" header checksum
can be used for that.  It was originally used only for Subject
lines.  That it is no longer stored by default by DCC servers, and is
no longer called the Subject checksum, should hint at the apparent
usefulness of Subject line checksums.
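As a rough sketch of counting a single header such as Subject:, the example below hashes a normalized header value and accumulates targets per checksum. The normalization (lowercasing and collapsing whitespace) and the use of SHA-256 are assumptions for illustration. They are not the DCC's actual substitute-checksum algorithm, and `header_checksum` and `HeaderCounter` are hypothetical names. The usage shows the practical weakness: unrelated, legitimate messages with a common subject build up a count just as bulk mail would.

```python
import hashlib
from collections import Counter

def header_checksum(name: str, value: str) -> str:
    """Hypothetical header checksum: lowercase the value and collapse whitespace,
    then take SHA-256 of "name:value". Not the DCC's real normalization."""
    normalized = " ".join(value.lower().split())
    return hashlib.sha256(f"{name.lower()}:{normalized}".encode("utf-8")).hexdigest()

class HeaderCounter:
    """Count targets per checksum of one chosen header, e.g. Subject."""

    def __init__(self, header: str = "Subject") -> None:
        self.header = header
        self.counts = Counter()

    def report(self, value: str, num_recipients: int = 1) -> int:
        """Add this message's recipients and return the running total."""
        ck = header_checksum(self.header, value)
        self.counts[ck] += num_recipients
        return self.counts[ck]

subjects = HeaderCounter("Subject")
# Unrelated, legitimate messages with a common subject pile up like bulk mail.
print(subjects.report("I'm leaving now"))      # 1
print(subjects.report("I'm  leaving NOW", 4))  # 5
```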

No matter what you are designing, there are tradeoffs.  Perfection is
undesirable.  No one would want a bridge with an infinite load limit.
Empty mail messages are unlikely to be significant spam problems because
Subject lines are too small for effective advertising text.  Even where
URLs in Subject: lines are clickable, people will quickly learn to
ignore them because they would so often be spam.  There are far more
important limitations in the idea of the DCC than empty messages.


> Also, I think the other engineers and manager types may see mail (spam) 
> which appears empty, because they have HTML rendering disabled, or 
> remote-image loading turned off, and so overestimate the amount of spam 
> they receive with empty bodies.

Such messages are not empty as far as the DCC checksums are concerned.
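To illustrate, here is a small check built on Python's standard `email` package. An HTML-only message whose only visible content is a remote image still has a non-empty body, so it would not fall into the same bucket as a truly empty message. `body_is_empty` is a hypothetical helper, not part of the DCC, and the DCC's own body handling differs in detail.

```python
from email import message_from_string

def body_is_empty(raw: str) -> bool:
    """Hypothetical helper: True only if every leaf MIME part has a
    whitespace-only payload."""
    msg = message_from_string(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no text of their own
        payload = part.get_payload()
        if isinstance(payload, str) and payload.strip():
            return False
    return True

html_only = (
    "Subject: hello\n"
    "Content-Type: text/html\n"
    "\n"
    '<html><body><img src="http://images.example/ad.png"></body></html>\n'
)
print(body_is_empty(html_only))             # False
print(body_is_empty("Subject: hello\n\n"))  # True
```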



> BTW, I noticed that my first mail to the list got graylisted by the
> rhyolite.com email server.  Is that something that happens to everyone,
> or is there some list I (or my IPs) have gotten onto that I don't want
> to be on?

Greylisting is extremely effective against the current worst spam.  See
http://www.google.com/search?q=greylist and 
http://www.rhyolite.com/anti-spam/dcc/greylist.html
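For readers new to the technique, here is a minimal greylisting sketch under commonly described assumptions. It keys on the (client IP, envelope sender, envelope recipient) triple, temporarily rejects the first attempt with a 4xx reply, and accepts a retry that arrives after a delay. The delay, window, reply texts, and class name are hypothetical. The DCC's own greylist keys and ages its entries in its own way. Legitimate MTAs retry after a temporary failure while much spamware does not, which is why a first message from an unfamiliar source is delayed.

```python
import time
from typing import Dict, Optional, Tuple

# (client IP, envelope sender, envelope recipient)
Triple = Tuple[str, str, str]

class Greylist:
    """Minimal greylisting sketch; delays and replies are illustrative assumptions."""

    def __init__(self, delay: float = 300.0, window: float = 4 * 3600.0) -> None:
        self.delay = delay    # seconds a new triple must wait before it is accepted
        self.window = window  # seconds after which an old entry is treated as new
        self.first_seen: Dict[Triple, float] = {}

    def check(self, ip: str, sender: str, rcpt: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (ip, sender.lower(), rcpt.lower())
        first = self.first_seen.get(key)
        if first is None or now - first > self.window:
            self.first_seen[key] = now  # new or stale triple: start the clock
            return "451 greylisted, try again later"
        if now - first < self.delay:
            return "451 greylisted, try again later"  # retried too soon
        return "250 OK"

g = Greylist(delay=300, window=3600)
print(g.check("192.0.2.1", "a@example.com", "b@example.net", now=0))    # 451 greylisted, try again later
print(g.check("192.0.2.1", "a@example.com", "b@example.net", now=600))  # 250 OK
```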


Vernon Schryver    vjs@rhyolite.com


