John R Levine
johnl@iecc.com
3 Jan 2003 23:10:11 -0500
> I'm thinking about defining a FUZ3 checksum that would ignore text
> bounded by <html>...</html> and otherwise be similar to the FUZ2
> checksum.

If you're going to do something like that, I'd parse enough of the MIME headers to pick the plain text out of multipart/alternative and checksum that. MIME in general is hugely complex, but picking out one known part is easy to do in a single simple scan.

If you want to goof around with HTML, take out <!--html comments--> to catch this month's trendy hashbuster that sticks strings related to the victim's address into comments in the HTML.

Regards,
John Levine, johnl@iecc.com, Primary Perpetrator of "The Internet for Dummies",
Information Superhighwayman wanna-be, http://iecc.com/johnl, Sewer Commissioner
"I dropped the toothpaste", said Tom, crestfallenly.
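
A minimal sketch of the two suggestions above, in Python, offered as an illustration and not as how FUZ2 or any proposed FUZ3 checksum actually works. It leans on the standard library's `email` parser, not a hand-written single scan, and the function names `pick_plain_text` and `strip_html_comments` are hypothetical. What gets checksummed afterwards is left to the caller.

```python
import re
from email import message_from_string

# Matches <!-- ... --> comments, including ones that span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def pick_plain_text(raw_message):
    """Return the text/plain part of a multipart/alternative message.

    Returns None if the message is not multipart/alternative or has no
    text/plain alternative. (Hypothetical helper, for illustration only.)
    """
    msg = message_from_string(raw_message)
    if msg.get_content_type() != "multipart/alternative" or not msg.is_multipart():
        return None
    for part in msg.get_payload():
        if part.get_content_type() == "text/plain":
            data = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "us-ascii"
            try:
                return data.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to a lossy ASCII decode.
                return data.decode("ascii", errors="replace")
    return None


def strip_html_comments(html):
    """Remove HTML comments so per-recipient filler does not alter the text."""
    return HTML_COMMENT.sub("", html)
```

A caller might try `pick_plain_text` first and, when it returns None, fall back to running `strip_html_comments` over the HTML body before checksumming. That ordering is an assumption of this sketch, not something stated above.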