What can DCC do for image spam?

Vernon Schryver vjs@calcite.rhyolite.com
Thu Apr 26 20:59:37 UTC 2007

> From: Gary Mills <mills@cc.umanitoba.ca>

> I've been reading Ironport's advertizing.  They claim good success
> rates on blocking image spam.  In addition to analysis of the message
> body, they use OCR techniques to extract the text from the image, as
> well as examining the composition of the image for features typical of
> current spam.  Could DCC do anything of this sort for image attachments?

For that last question, perhaps the DCC clients might do something
more for images, but it would be like -Gon (greylisting) and -B
(DNS blacklists) and not directly related to Distributed Checksum
Clearinghouses.

Concerning the advertising, I think the real question is which or how
many contradictory brands of wishful thinking are you willing to believe?
The two brands relevant here are "CAPTCHAs prevent abuse" and "OCR can
decode evil images."

If the good guys can use OCR to convert spam images to text
that might be analyzed with keywords (e.g. so-called Bayesian filters)
or even DCC body checks, then the bad guys can use OCR to bypass CAPTCHAs
for automated account sign-ups etc.  Worse, the good guys generally
need to use already heavily loaded computers to decode 100 or 10,000
times as many evil images (one per image spam) as the bad guys need
to decode CAPTCHAs on their lightly loaded attack systems.

Examining the composition of the image for features typical of current
spam would involve looking for animation or statistical characteristics
of pixels of fuzzed-out text.  That sounds to me like sooner or later
rejecting most images, which sounds rather like treating images like
Microsoft program text and rejecting all of them.  I don't see any harm
in requiring that images be transported with a protocol other than
SMTP (e.g. HTTP or FTP), but that may say more about me and my continued
use of a pure text mail user agent that cannot handle any MIME at all.

That's not to say that you can't make a system that uses a bunch
of spam filters including image analysis and get good results.
I am claiming that if you skip the image analysis and stick to
simpler things such as checking the URLs that anchor the images in
DNS blacklists, you probably will get results as good or better.
(Spam that requires spam targets to manually copy URLs from fuzzy
images to a browser sounds like a bad idea, which might be why the
images are often only covers for <A HREF> links.)
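
As a rough illustration of that simpler check, here is a minimal Python
sketch, not how any DCC client does it.  It collects the host names of
<A HREF> links that wrap an <IMG> and forms the name one would look up
in a name-based DNS blacklist.  The zone "dnsbl.example.org" and the
names ImageLinkParser, image_link_hosts, dnsbl_query_name and is_listed
are placeholders invented for this example.  Real lists have their own
rules about which part of a host name to query.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageLinkParser(HTMLParser):
    """Collect host names of <a href> links that enclose an <img>."""

    def __init__(self):
        super().__init__()
        self._open_hrefs = []  # hrefs of the <a> tags currently open
        self.hosts = []        # unique host names, in order of appearance

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag == "a":
            self._open_hrefs.append(dict(attrs).get("href"))
        elif tag == "img" and self._open_hrefs:
            href = self._open_hrefs[-1]
            host = urlsplit(href).hostname if href else None
            if host and host not in self.hosts:
                self.hosts.append(host)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_hrefs:
            self._open_hrefs.pop()


def image_link_hosts(html):
    """Return host names of links wrapped around images in an HTML body."""
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hosts


def dnsbl_query_name(host, zone):
    """Build the DNS name to look up, e.g. host.zone."""
    return f"{host.rstrip('.')}.{zone}"


def is_listed(host, zone, resolve=socket.gethostbyname):
    """True if the blacklist zone answers for this host.

    `resolve` can be replaced so the logic can be tried without real DNS.
    Any successful answer counts as listed here, which is an assumption;
    real lists may encode meanings in the returned address.
    """
    try:
        resolve(dnsbl_query_name(host, zone))
        return True
    except socket.gaierror:
        return False
```

A filter built this way would call image_link_hosts() on each HTML part
and is_listed() on each host it returns.  That costs one DNS query per
distinct host, which is far cheaper than running OCR on every image.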

Vernon Schryver    vjs@rhyolite.com
