HTML vs. bulk

John R Levine johnl@iecc.com
Sat Jan 4 04:10:11 UTC 2003


> I'm thinking about defining a FUZ3 checksum that would ignore text
> bounded by <html>...</html> and otherwise be similar to the FUZ2
> checksum.

If you're going to do something like that, I'd parse enough of the MIME
headers to pick the plain text out of multipart/alternative and checksum
that.  MIME in general is hugely complex, but picking out one known part
is easy to do in a single simple scan.
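
As a rough sketch of picking out the plain text part, here is a Python
version. It leans on the standard library's email parser rather than the
hand-rolled single scan described above, so treat it as an illustration
of the idea and not as how DCC does or would do it. The function name
plain_text_part is made up for this example.

```python
import email
from email import policy
from typing import Optional


def plain_text_part(raw_message: bytes) -> Optional[str]:
    """Return the first text/plain body found in the message, or None.

    Hypothetical helper: a real checksummer would probably scan the
    MIME boundaries directly instead of building a full parse tree.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # walk() visits the message itself and then every nested part, so
    # this also handles a bare text/plain message with no multipart.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            return part.get_content()
    return None
```

The returned text is what would be fed to the fuzzy checksum in place of
the whole body. A message with only an HTML part returns None, and the
caller would have to decide what to checksum in that case.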

If you want to goof around with HTML, take out <!--html comments--> to
catch this month's trendy hashbuster, which hides strings related to
the victim's address in comments in the HTML.
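
The comment stripping suggested above could look something like this in
Python. It is a minimal sketch using a regular expression, not a real
HTML parser, and the name strip_html_comments is invented for the
example. It says nothing about how the FUZ2 checksum itself works.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets
# a comment span several lines.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html: str) -> str:
    """Remove <!-- ... --> comments before the text is checksummed."""
    return _HTML_COMMENT.sub("", html)
```

With the comments gone, two copies of the same spam that differ only in
the per-recipient string hidden in a comment reduce to identical text,
so a checksum over the result would match.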

Regards,
John Levine, johnl@iecc.com, Primary Perpetrator of "The Internet for Dummies",
Information Superhighwayman wanna-be, http://iecc.com/johnl, Sewer Commissioner
"I dropped the toothpaste", said Tom, crestfallenly.



