Michael Grant
mg-dcc1@grant.org
Sun Mar 17 07:52:10 UTC 2002
I'm new to this list. I must admit that I've had the idea of using fuzzy checksums to spot spam for years. Recently I started working on something to do this; then, a couple of days ago, a friend pointed me at the DCC project. Oh well, it figures, someone had to have had the same idea!

I have made some interesting headway on my own fuzzy functions. I had a brief look at fuz1 and fuz2 in the source. fuz1 seems to be based around md5. I was never able to get enough fuzz out of md5 myself, even when computing md5 sums per line and such. What I found worked surprisingly well was simply this: for each line of a message, convert the space-separated words to numbers and take their root mean square.

I'm happy to share the code. Should I post it here or what?

I also ran some tests against my old email to see how many false positives I would catch. For me, it was about 1 in 150,000, and I have to say that the one message did closely resemble one of the spams in my spam file.

Michael Grant
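
To make the root-mean-square idea above concrete, here is a minimal sketch in Python. It is not the code offered in this post, which was not included. Everything specific in it is an assumption made for illustration: the word-to-number mapping (sum of UTF-8 byte values), the rounding of each line's value to an integer, the Jaccard-style comparison of two messages, and the names word_to_number, line_rms, message_signature and similarity.

```python
import math


def word_to_number(word):
    # Assumption: a word's number is the sum of its UTF-8 byte values.
    # The post does not say how words were converted to numbers.
    return sum(word.encode("utf-8"))


def line_rms(line):
    # Root mean square of the numbers for the space-separated words on a line.
    nums = [word_to_number(w) for w in line.split(" ") if w]
    if not nums:
        return 0.0
    return math.sqrt(sum(n * n for n in nums) / len(nums))


def message_signature(text, bucket=1.0):
    # One quantized RMS value per non-blank line.
    # Assumption: rounding into buckets is where the "fuzz" comes from.
    return [round(line_rms(line) / bucket)
            for line in text.splitlines() if line.strip()]


def similarity(a, b):
    # Assumption: compare messages by the fraction of line values they share.
    sa, sb = set(message_signature(a)), set(message_signature(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    spam = "buy cheap pills now\nclick here today\nlimited offer"
    variant = "buy cheap pills now\nclick here today\nunsubscribe link 12345"
    print(similarity(spam, spam))
    print(similarity(spam, variant))
```

With this particular mapping, reordering the letters within a word or the words within a line leaves the line's value unchanged, which is one plausible source of fuzziness. A larger bucket would tolerate bigger edits at the cost of more collisions.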