DCC -- how do I effectively use it?

Mediratta, Bharat bharat@fusionone.com
Sun, 2 Sep 2001 16:38:47 -0700


Howdy.  I'm working on a personal anti-spam project that I'd like
to eventually distribute freely (probably under the GPL).  It is a
thin IMAP client that can monitor a mailbox, detect any new
spam and move it to a separate spam mailbox.  It currently uses
DCC as the arbiter for spamminess.  The code is operational and
probably worthy of being shipped as a beta.  However, it's not as
effective as I had hoped.
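
A minimal sketch of the monitor-and-move loop described above, in
Python.  It is not the script from this message: the function names,
the "spam" mailbox name and the is_spam callback (which would wrap
the DCC check) are all hypothetical, and conn is assumed to be an
already logged-in imaplib.IMAP4 object with the inbox selected.

```python
def move_to_spam(conn, uid, spam_mailbox="spam"):
    """Copy one message (by UID) to spam_mailbox, then delete the original."""
    status, _ = conn.uid("COPY", uid, spam_mailbox)
    if status != "OK":
        return False  # copy failed: leave the original where it is
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    conn.expunge()
    return True


def filter_new_mail(conn, is_spam, spam_mailbox="spam"):
    """Check every unseen message and move the ones is_spam() flags.

    is_spam is a hypothetical callback taking the raw message bytes and
    returning True or False. Returns the list of UIDs that were moved.
    """
    status, data = conn.uid("SEARCH", None, "UNSEEN")
    if status != "OK" or not data or not data[0]:
        return []
    moved = []
    for raw_uid in data[0].split():
        uid = raw_uid.decode()
        # BODY.PEEK[] fetches the whole message without marking it as read
        status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        if status != "OK" or not parts or not isinstance(parts[0], tuple):
            continue
        raw_message = parts[0][1]
        if is_spam(raw_message) and move_to_spam(conn, uid, spam_mailbox):
            moved.append(uid)
    return moved
```

Copying before flagging the original as deleted means a failed copy
never loses mail; the message simply stays in the inbox.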

My problem is that I'm not getting very many positive hits from 
DCC.  I know that I'm connected to DCC properly because it does
identify certain spam messages correctly, but unfortunately it
misses a large percentage of them.  I ran it against a folder
containing spam detected with spambouncer and other tools, and
in some admittedly small trials it had a hit rate of only about 25%.

Perhaps I'm using DCC incorrectly?  Since I'm in development, I've 
been using dcc.rhyolite.com in anonymous mode.  I hope that I'm not 
imposing too much of a load there.  My script calls dccproc, passes 
in the message and parses the results.  Most of my results indicate
that DCC has never seen the message before (i.e., I get counts of
1 for all of the metrics).
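
To make the "parses the results" step concrete, here is a hedged
Python sketch.  It assumes the result is an X-DCC-...-Metrics header
whose checksum counts follow a semicolon as name=count pairs, with
"many" standing in for a very large count; the sample header text,
the function names and the threshold of 10 are illustrative
assumptions, not values taken from this message or from DCC itself.

```python
import re

# Assumed header shape, for example:
#   X-DCC-EXAMPLE-Metrics: host.example.com 101; Body=1 Fuz1=1 Fuz2=many
_PAIR = re.compile(r"(\w+)=(\d+|many)")


def parse_dcc_counts(header):
    """Return {checksum name: count} from one metrics header line."""
    # Everything before the first ';' is the server name and id; skip it.
    _, _, tail = header.partition(";")
    counts = {}
    for name, value in _PAIR.findall(tail):
        counts[name] = float("inf") if value == "many" else int(value)
    return counts


def looks_bulk(counts, threshold=10):
    """True if any checksum count reaches the (hypothetical) threshold."""
    return any(count >= threshold for count in counts.values())
```

With counts of 1 for every checksum, as in most of the results
described above, looks_bulk() returns False no matter how low a
sensible threshold is set.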

Any ideas?  I've been using DCC for about 3 hours now so any and all
suggestions are welcome.  If you're interested in my script, I
can make it available.

-Bharat
