DCC -- how do I effectively use it?

Mediratta, Bharat bharat@fusionone.com
Mon, 3 Sep 2001 02:15:09 -0700


> From: Vernon Schryver [mailto:vjs@calcite.rhyolite.com]
>
> However, you'd be better served with your own DCC server exchanging
> "floods" of checksums with other DCC servers.  Besides being
> more robust, faster, and using even less bandwidth, with your
> own server you could look at your copy of the database of checksums
> with dblist.

I'm definitely heading in that direction.  In fact I'll contact
you privately to get an id and password.

> Other people with access to the same checksums seem to have
> had better luck.  However, I think 25% is nothing to sneeze at.

Absolutely.  And it will only get better with time.  I just wanted
to make sure that I wasn't pointing at the wrong database.

>   - bugs in the IMAP client code might be changing the messages so
>    that their checksums don't match.  

Entirely possible.  I'm using Net::IMAP on top of cclient-0106191041
on FreeBSD 4.3.  My code assembles the message by combining the
raw rfc822.header and rfc822.text values and passes it to dccproc.
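
For illustration, here is a minimal Python sketch of that assembly step
(the script described above is built on Net::IMAP, so this is not that
code). The function names are hypothetical. It assumes the header and
text arrive as raw bytes, and it leaves the body bytes untouched so the
checksums are not disturbed.

```python
import subprocess
from typing import Sequence


def assemble_message(header: bytes, text: bytes) -> bytes:
    """Join a raw header block and a raw body into one message.

    Trailing line breaks on the header are replaced by exactly one
    blank line (CRLF CRLF). The body bytes are passed through as-is.
    """
    head = header.rstrip(b"\r\n")
    return head + b"\r\n\r\n" + text


def run_checker(message: bytes, command: Sequence[str]) -> str:
    """Feed the message to an external checker on stdin, return its stdout.

    The command is supplied by the caller, so no particular
    command-line flags are assumed here.
    """
    result = subprocess.run(command, input=message,
                            capture_output=True, check=True)
    return result.stdout.decode("iso-8859-1")
```

A call might look like run_checker(msg, ["dccproc", "-H"]), assuming
dccproc reads the message on stdin and that -H prints only the X-DCC
header; check the man page for your version before relying on that.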

>   - I'm still fighting hassles with quoted-printable and making
>    dccproc get the same checksums as dccm.  One often sees messages
>    converted from quoted-printable and with CRLF
>    converted to CR while the other doesn't.

If I can help track this down, let me know.  
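
To make the mismatch concrete, here is a small Python sketch. It uses
MD5 over the raw bytes purely as a stand-in (it is not DCC's checksum
code), and normalizing line endings to LF is an arbitrary choice for
the example. The sample bodies are made up.

```python
import hashlib
import quopri


def digest(body: bytes) -> str:
    """Stand-in checksum: MD5 of the bytes exactly as given."""
    return hashlib.md5(body).hexdigest()


def normalize(body: bytes, is_quoted_printable: bool) -> bytes:
    """Decode quoted-printable if needed, then use LF line endings."""
    if is_quoted_printable:
        body = quopri.decodestring(body)
    return body.replace(b"\r\n", b"\n")


# The same text as one program might see it (still encoded, CRLF)
# and as another might see it (already decoded, LF).
as_sent = b"Price: 100=25 off\r\nBuy now\r\n"
as_stored = b"Price: 100% off\nBuy now\n"

raw_match = digest(as_sent) == digest(as_stored)
normalized_match = (digest(normalize(as_sent, True))
                    == digest(normalize(as_stored, False)))
```

Here raw_match is False and normalized_match is True: the two copies
only agree once both sides apply the same decoding and line-ending
rules.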
 
>   - as part of those hassles, I've changed the fuz1 checksum in
>    version 1.0.28 to not ignore the last line.  Until everyone starts
>    using that code, the effectiveness of the fuz1 checksum
>    will be reduced.

Where can I get 1.0.28?

>   - the spammers who like you differ from those who like DCC users
> 
>   - your name is early in the typical spammer's somewhat alphabetical
>    lists 
> 
>   - you are rejecting only on "many" instead of a threshold appropriate
>    for the number of your local users.  (Yes, that wouldn't apply to
>    checksums with counts of 1.)

Right now my simplistic algorithm says that a message is maybe spam if
the count for any of Message-ID, Received, Body or Fuz1 is greater than
10, and definitely spam if any count is greater than 50 (or "many").
But yeah, mostly the problem is that the messages haven't been seen before.
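
As a rough Python sketch of that rule: the thresholds (10 and 50) and
the four checksum names come from the description above. The function
names and the assumed "Name=count" token layout of the header line are
illustrative guesses, not taken from the DCC documentation.

```python
import re

MAYBE_THRESHOLD = 10
DEFINITE_THRESHOLD = 50
CHECKED = ("Message-ID", "Received", "Body", "Fuz1")

# Assumed layout: whitespace-separated tokens such as "Body=12"
# or "Fuz1=many" somewhere in the header line.
_COUNT_RE = re.compile(r"([A-Za-z0-9-]+)=(many|\d+)")


def parse_counts(header_line: str) -> dict:
    """Extract Name=count pairs; "many" is treated as infinity."""
    counts = {}
    for name, value in _COUNT_RE.findall(header_line):
        counts[name] = float("inf") if value == "many" else int(value)
    return counts


def classify(counts: dict) -> str:
    """Return "spam", "maybe" or "ok" from the highest checked count."""
    worst = max(counts.get(name, 0) for name in CHECKED)
    if worst > DEFINITE_THRESHOLD:
        return "spam"
    if worst > MAYBE_THRESHOLD:
        return "maybe"
    return "ok"
```

For example, classify(parse_counts("Body=12 Fuz1=3")) gives "maybe",
and any line containing "Fuz1=many" gives "spam".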

> ] Will you also support a mode of operation where the MTA has already
> ] "dcc"ed the message and put its (DCC's) header in the message?  i.e.
> ] simply parse the IMAP INBOX for messages with existing DCC headers
> ] with values of n>1 where n is some configurable value (rather than
> ] using dccproc on the messages)?

I figure that if the MTA has dcc'd the message (or spambounced it or
used some other spam detection code), the mail client/server can do 
filtering as appropriate.  My script is purely to glue DCC together 
with a system that has no inherent spam detection.

By the way, y'all rock.  It's nice to work with professionals.

-Bharat
