darkmark darkmark@filament.org
Fri Apr 18 15:09:20 UTC 2003

Hello all,

I'm noticing a good deal of spam starting to make its way past DCC, when it
used to correctly detect the bulkness of close to 95% of all spam sent to
my domain.  The tag-ignore code helped out somewhat for a while.  But now
spammers are content to send spam filled with random character strings
and one or two real links or images.  There could be two solutions
to this:

- change or expand the tag-ignore code into tag-parsing code, and rely only
on correctly formatted references, like "img src", "a href" or "form" tags,
for computing the checksum.

- add some sort of dictionary interface to discard non-words, or an
algorithm that distinguishes random strings from real words and keeps only
pronounceable words for checksumming.
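
To make the second option concrete, here is a minimal Python sketch. It is
not DCC code: word_checksum and the word set passed to it are hypothetical
names, and MD5 over the surviving words is only a stand-in for whatever
checksum DCC would really compute.

```python
import hashlib
import re


def word_checksum(text, dictionary):
    """Checksum only the tokens that appear in `dictionary`.

    `dictionary` is an assumed set of lowercase real words, for example
    loaded from a system word list. Random filler strings are dropped.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t in dictionary]
    if not kept:
        return None  # nothing recognizable to checksum
    # MD5 is a placeholder here, not DCC's actual checksum.
    return hashlib.md5(" ".join(kept).encode("utf-8")).hexdigest()
```

With a word set containing "buy", "cheap", "pills" and "now", the bodies
"Buy cheap pills now xkqjz" and "wprtl buy cheap qqzzk pills now" reduce to
the same four words and so get the same checksum.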

I don't like the second idea, since it would be easy to change the spam
software to send out random real dictionary words instead.  The first idea
holds more weight, since spammers must send you to a real site or use real
web-presentable images.
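
As a rough sketch of the first idea, the Python below parses the HTML body,
keeps only the targets of "a href", "img src" and "form action", and
checksums those. It is not DCC code: ReferenceCollector,
reference_checksum and the tag-to-attribute table are hypothetical, MD5 is
a stand-in for the real checksum, and lowercasing the references is a
crude normalization that a real implementation would want to refine.

```python
import hashlib
from html.parser import HTMLParser

# Assumed mapping of tag name to the attribute that holds its reference.
REFERENCE_ATTRS = {"a": "href", "img": "src", "form": "action"}


class ReferenceCollector(HTMLParser):
    """Collect link, image and form targets; ignore all other content."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        wanted = REFERENCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.references.append(value.strip().lower())


def reference_checksum(html_body):
    """Return a hex digest over the sorted, de-duplicated references."""
    collector = ReferenceCollector()
    collector.feed(html_body)
    collector.close()
    if not collector.references:
        return None  # no references; fall back to another checksum
    digest = hashlib.md5()  # placeholder, not DCC's actual checksum
    for ref in sorted(set(collector.references)):
        digest.update(ref.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Two messages with different random filler but the same links and images
would then produce the same checksum, regardless of the order or the case
of the tags.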

All the best,

Mark Atkinson

