Create checksums based on URL content?

Vernon Schryver
Sat Jan 3 20:42:14 UTC 2004

> From: "John R Levine" <>

> It might be interesting to do bulk counts of the URLs themselves, but I
> agree that fetching the pages is a terrible idea.

Someone else has had a similar response.  I think it would give a new
meaning to "slashdotted."  As soon as 100 people had referred each
other to, the next such message would be reported
as spam.  Then there are URLs in trailers added by free providers, and
the "yellow/blue/black ribbon" URLs that people add to their signature
to promote causes.
(Note that URLs do figure in the current DCC fuzzy checksums.)
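
To make the false-positive problem concrete, here is a minimal sketch of
a naive bulk URL counter.  It is not how the DCC computes its fuzzy
checksums; the regular expression, the names UrlCounter and
extract_urls, and the threshold of 100 are assumptions made only for
illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: a deliberately simple matcher for http/https URLs.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical threshold, taken from the "100 people" figure above.
BULK_THRESHOLD = 100


def extract_urls(body):
    """Return the distinct URLs in one message body, minus trailing punctuation."""
    return {url.rstrip(".,;:!?)") for url in URL_RE.findall(body)}


class UrlCounter:
    """Count how many messages have mentioned each URL."""

    def __init__(self, threshold=BULK_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def report(self, body):
        """Record one message and return its URLs that were already bulk.

        A URL is flagged only if `threshold` earlier messages carried it,
        so the message after the first `threshold` copies is the first one
        flagged.  Each URL counts once per message.
        """
        urls = extract_urls(body)
        flagged = {u for u in urls if self.counts[u] >= self.threshold}
        self.counts.update(urls)
        return flagged


if __name__ == "__main__":
    counter = UrlCounter(threshold=3)
    note = "Have a look at http://example.com/story."
    for _ in range(4):
        print(counter.report(note))
```

Nothing in the counter can tell a popular link passed among friends, a
free provider's trailer, or a ribbon URL in a signature from a spammer's
link.  All it sees is that the same string turned up in many messages.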

The DCC is just past the edge of a steep and slippery slope.  At
the top of the slope is detecting entirely identical copies of bulk
mail.  At the bottom is detecting characteristics at best distantly
related to "unsolicited and bulk" such as the average number of
syllables or naughty words like "remove".

The DCC cannot and should not be expected to do everything.  I figure
the DCC is good primarily against spam from the Fortune 500,000 that
will be the main problem after AOL, Microsoft, and the DMA use the
"You CAN SPAM" act to squash Ralsky &co.

There are good uses for other mechanisms including 
  - IP and domain name blacklists,
  - blacklists of naughty URLs or words,
  - greylisting,
  - whitelisting,
  - lawyers.

Concerning lawyers, I am wondering if it would be good to add something
like the following to the web pages of the archives of this mailing
list:

  Notice: The operator of this website will not give, sell, or otherwise
  transfer addresses maintained by this website to any other party for
  the purposes of initiating, or enabling others to initiate, electronic
  messages in the definitions of the CAN-SPAM Act of 2003.

I'd leap onto that work except why would any addresses be maintained by
any website except to initiate electronic messages?  This mailing list
would be kind of silly if no mail were sent to any of the subscribed
addresses.  Then there are the contact addresses for the DCC servers
and the subscribe addresses for the mailing lists themselves.

Vernon Schryver

