HTML vs. bulk

Daniel V Klein
Sat Jan 4 18:17:02 UTC 2003

> > I'm thinking about defining a FUZ3 checksum that would ignore text
> > bounded by <html>...</html> and otherwise be similar to the FUZ2
> > checksum.

Were it me, I'd do one of these two:

1) Parse out the text/html part, and rather than ignore the HTML, simply
extend the definition of "whitespace" to include anything in <angles>.

2) Ignore the last 1 or 2 lines of a message - spammers seem to be putting
a unique coda on their messages: a line or two of garbage.
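
To make the two ideas concrete, here is a rough Python sketch of the
normalization step only. It is not the FUZ2 algorithm or any DCC code: the
names normalize, sketch_checksum and drop_tail_lines are made up for
illustration, and MD5 is just a stand-in for whatever fuzzy hash would
really be used. It also assumes the text/html part has already been pulled
out of the message and decoded.

```python
import hashlib
import re

# Idea 1: anything in <angles> counts as whitespace.
TAG = re.compile(r"<[^>]*>")
WHITESPACE = re.compile(r"\s+")


def normalize(body, drop_tail_lines=0):
    """Return the body with tags treated as whitespace and the tail dropped."""
    lines = body.splitlines()
    # Discard trailing blank lines so the "tail" means real content lines.
    while lines and not lines[-1].strip():
        lines.pop()
    # Idea 2: ignore the last 1 or 2 lines (the spammer's unique coda).
    if drop_tail_lines:
        lines = lines[:-drop_tail_lines]
    text = TAG.sub(" ", "\n".join(lines))
    # Collapse every run of whitespace (including former tags) to one space.
    return WHITESPACE.sub(" ", text).strip()


def sketch_checksum(body, drop_tail_lines=0):
    """Stand-in checksum over the normalized body (hypothetical, not FUZ2)."""
    normalized = normalize(body, drop_tail_lines)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

With this, two copies of the same message that differ only in markup and in
a one-line garbage coda come out the same when drop_tail_lines=1, and
differ when the tail is kept.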


