HTML vs. bulk

Brandon Long
Fri Jan 3 23:59:30 UTC 2003

On 01/03/03 Vernon Schryver uttered the following other thing:
> All of that is interesting.  Thanks.
> I'm thinking about defining a FUZ3 checksum that would ignore text
> bounded by <html>...</html> and otherwise be similar to the FUZ2
> checksum.  When what remains of the message is too little to generate
> a checksum, then as with the other fuzzy checksums, no checksum would
> be reported to the DCC server.  However, like some of the SMTP header
> and envelope checksums, a constant checksum for the null string would
> be generated for local blacklisting (or even white-listing).
> This would allow DCC clients (e.g. entire enterprises) or individual
> users at enterprises using per-user whitelists (e.g. with dccproc
> or the dccm per-user whitelists) to blacklist all messages without
> enough plaintext to generate a FUZ3 checksum.
> What do you think?
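
For concreteness, here is a rough sketch of the behaviour proposed above. It is not DCC's FUZ2 or FUZ3 algorithm. The word threshold (MIN_WORDS), the word-splitting rule, the SHA-256 hash, and the function name fuz3_sketch are all assumptions made up for illustration.

```python
import hashlib
import re

# Hypothetical threshold; the real minimum is not stated in this thread.
MIN_WORDS = 10

# Text bounded by <html>...</html>, matched case-insensitively across lines.
HTML_BLOCK = re.compile(r"<html\b.*?</html>", re.IGNORECASE | re.DOTALL)


def fuz3_sketch(body):
    """Return (reported, local) checksums for a message body.

    reported is what would be sent to the server, or None when too
    little text remains. local is what local white/blacklists would
    see; it falls back to a constant checksum of the null string.
    """
    remaining = HTML_BLOCK.sub(" ", body)
    words = re.findall(r"[a-z]+", remaining.lower())
    if len(words) < MIN_WORDS:
        # Too little plaintext: report nothing, but expose a constant
        # value so such messages can be listed locally.
        return None, hashlib.sha256(b"").hexdigest()
    digest = hashlib.sha256(" ".join(words).encode("ascii")).hexdigest()
    return digest, digest
```

With this sketch, a message that is nothing but an <html>...</html> block yields no reported checksum and the constant local one, so a per-user blacklist entry for that constant would match every such message.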

It seems like a single, very specific case, more in line with a
SpamAssassin-style heuristic test than one that will actually catch bulk
mail.  Have you looked at many multipart/mixed messages?  Will this catch
more real bulk mail than the current FUZ2?

I could see how a fuzzy checksum that ignored HTML tags could be useful,
since it could ignore user-specific identifiers in links in HTML
mail, which might be harder to ignore in plain text mail.  Do you have
reason to believe that the checksum will work better on just the plain
text part of a multipart/mixed message?
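
A minimal sketch of the tag-ignoring idea, assuming a naive regex tag stripper and SHA-256 over lowercased words. Neither is how DCC computes its fuzzy checksums, and the name tagless_checksum and the example.com URLs are hypothetical.

```python
import hashlib
import re

# Naive tag matcher: everything from '<' to the next '>'.
TAG = re.compile(r"<[^>]*>")


def tagless_checksum(text):
    """Checksum the words left after dropping anything that looks like a tag."""
    stripped = TAG.sub(" ", text)
    words = re.findall(r"[a-z]+", stripped.lower())
    return hashlib.sha256(" ".join(words).encode("ascii")).hexdigest()


# Same HTML mailing sent to two recipients; only the tracking id differs,
# and it lives inside the <a> tag, so it is dropped with the tag.
html_a = '<p>Click <a href="http://example.com/?id=alice42">here</a> to win</p>'
html_b = '<p>Click <a href="http://example.com/?id=bob17">here</a> to win</p>'

# Plain text version: the id sits in the visible text and survives.
text_a = "Click http://example.com/?id=alice42 to win"
text_b = "Click http://example.com/?id=bob17 to win"

same_html = tagless_checksum(html_a) == tagless_checksum(html_b)
same_text = tagless_checksum(text_a) == tagless_checksum(text_b)
```

Here same_html comes out true and same_text false, which is the asymmetry I mean: in HTML the per-user part can hide inside a tag and be dropped with it, while in plain text it is part of the body.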

 "... a boy who gets a 'C-minus' in Appreciation of Television
      can't be all bad." -- Robert Heinlein,  _Starship Troopers_
