Mediratta, Bharat
bharat@fusionone.com
Sun, 2 Sep 2001 16:38:47 -0700
DCC -- how do I effectively use it?

Howdy.

I'm working on a personal anti-spam project that I'd like to eventually distribute freely (probably under GPL). It is a thin IMAP client that can monitor a mailbox, detect any new spam and move it to a separate spam mailbox. It currently uses DCC as the arbiter for spamminess. The code is operational and probably worthy of being shipped as a beta. However, it's not as effective as I would have hoped.

My problem is that I'm not getting very many positive hits from DCC. I know that I'm connected to DCC properly because it does identify certain spam messages correctly, but unfortunately it misses a large percentage of them. I ran it against a folder containing spam detected with spambouncer and other tools and in some (admittedly) small trials it had about a 25% hit rate.

Perhaps I'm using DCC incorrectly? Since I'm in development, I've been using dcc.rhyolite.com in anonymous mode. I hope that I'm not imposing too much of a load there. My script calls dccproc, passes in the message and parses the results. Most of my results indicate that DCC has never seen the message before (i.e., I get counts of 1 for all of the metrics).

Any ideas? I've been using DCC for about 3 hours now so any/all suggestions are welcome. If you're interested in my script, I can make it available.

-Bharat
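For readers following along, here is a rough sketch of the "parse the results" step described in the message. It is not the author's script. It assumes dccproc's result has already been captured as a single X-DCC header line of the form `X-DCC-<brand>-Metrics: <server> <id>; Body=1 Fuz1=1 Fuz2=1`, where a count is either a number or the word `many`. The function names and the threshold of 10 are hypothetical choices made for illustration.

```python
import re

# Assumed header shape (not taken from the original post), e.g.:
#   X-DCC-EXAMPLE-Metrics: host.example.com 101; Body=1 Fuz1=1 Fuz2=many
_PAIR = re.compile(r"([\w-]+)=(\w+)")

# Stand-in for the non-numeric count "many".
MANY = float("inf")


def parse_dcc_counts(header):
    """Return {checksum_name: count} from one X-DCC header line."""
    # Drop the header name, then the "server id" part before the semicolon.
    _, _, value = header.partition(":")
    _, _, checks = value.partition(";")
    counts = {}
    for name, raw in _PAIR.findall(checks):
        if raw == "many":
            counts[name] = MANY
        elif raw.isdigit():
            counts[name] = int(raw)
    return counts


def looks_bulk(counts, threshold=10):
    """Hypothetical policy: bulk if any checksum count reaches the threshold."""
    return any(count >= threshold for count in counts.values())


if __name__ == "__main__":
    line = "X-DCC-EXAMPLE-Metrics: host.example.com 101; Body=1 Fuz1=1 Fuz2=1"
    counts = parse_dcc_counts(line)
    print(counts, looks_bulk(counts))
```

With counts of 1 for every checksum, as reported in the message, a policy like this never flags the mail.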