SpamAssassin/DCC integration

Vernon Schryver vjs@calcite.rhyolite.com
Wed May 1 22:03:52 UTC 2002


> From: Brandon Long <blong@fiction.net>

> On 05/01/02 Craig R Hughes uttered the following other thing:
> > Hearing good reports from cutting edge SpamAssassin users who've installed
> > recent CVS builds and activated the new DCC/SA integration stuff.
> > 
> > And already we're seeing the first person who's concerned with DCC performance,
> > and wants to run their own local dccd.  So here's my question: how easy would it
> > be to re-implement the DCC client side in perl (haven't looked at the DCC source
> > yet), so that SpamAssassin doesn't have to fork a dcc client process for each
> > message it processes, and doesn't have to create a copy of the message text to
> > pipe into the dcc process, etc.  It'd be neat (and probably a heck of a lot
> > faster) to do the DCC client side stuff in perl, and just call it directly from
> > the main SA code instead of forking and piping.  Anyone looked at doing this
> > before?

The performance problem in that particular case cannot be related to
forking or anything else that might be changed on the client side,
because the volume of mail involved is not large.  I also doubt that
running a DCC server locally would help this particular client speed
problem.  If it is painful to do the equivalent of at most 2 DNS
transactions per message, then shoveling 6-10 MByte/day of the full
spam database over the wire is likely to be too painful to imagine.

If done more often than once an hour, the DCC client transaction is one
UDP round trip or about the same as one DNS transaction where the right
DNS server is already known.  An SMTP server typically does at least 3
DNS transactions: one for the reverse DNS lookup of the SMTP client IP
address, a second for the forward DNS lookup of the reverse name, a third
for the domain name in the Mail_From command, plus additional DNS lookups
for each DNS blacklist.  Each DNS lookup involves at least one UDP round
trip, and up to 3 or even more UDP round trips if the system must ask a
root DNS server, a TLD server (e.g. .com), and then a server for example.com.
If done less often than once an hour, the DCC client transaction is one
extra DCC round trip plus some fuzz to measure the RTT to other servers.
`cdcc info` will tell you how slow the DCC transaction is.
I guess `time nslookup` or `time dig` would approximate the speed of DNS.
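
The round trip counts above can be put side by side with a little
arithmetic. This is only a rough sketch: the function names and the
50 ms round trip time are assumptions for illustration, not measurements,
and the "fuzz" for probing other DCC servers is ignored.

```python
# Rough count of UDP round trips per incoming message, using the figures
# from the paragraph above. All names and numbers are illustrative.

def dns_round_trips(blacklists, trips_per_lookup=1):
    """Reverse lookup + forward lookup + Mail_From domain + one per blacklist.

    trips_per_lookup is 1 when the right DNS server is already known and
    about 3 when a root, a TLD server, and the domain's server are asked.
    """
    lookups = 3 + blacklists
    return lookups * trips_per_lookup


def dcc_round_trips(used_within_hour=True):
    """One round trip for a busy client, one extra after an idle hour."""
    return 1 if used_within_hour else 2


def estimate_ms(round_trips, rtt_ms):
    """Total wait if every round trip costs the same assumed RTT."""
    return round_trips * rtt_ms


if __name__ == "__main__":
    rtt_ms = 50.0  # assumed RTT, not a measured value
    for blacklists in (0, 2):
        for trips in (1, 3):
            n = dns_round_trips(blacklists, trips)
            print(f"DNS: {blacklists} blacklists, {trips} trips/lookup: "
                  f"{n} round trips, ~{estimate_ms(n, rtt_ms):.0f} ms")
    for busy in (True, False):
        n = dcc_round_trips(busy)
        print(f"DCC: busy={busy}: {n} round trips, "
              f"~{estimate_ms(n, rtt_ms):.0f} ms")
```

Under these assumptions the DCC query is a small fraction of the DNS
work the SMTP server already does for each message.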


Rewriting the code from C to Perl would certainly not be my first thought
for improving CPU or disk speed.  I also bet forking and exec'ing a C
program is probably a lot faster than forking and exec'ing the Perl (or
any) interpreter to run a Perl version of the DCC client code.
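
That bet is easy to check on a given machine. The following is a minimal
sketch, assuming `true` stands in for a small C program and `perl -e 1`
for starting the Perl interpreter with nothing to do; the helper name
`mean_spawn_seconds` is made up for this example. Both timings include
the same overhead from the Python harness, so only the difference between
them is interesting.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(argv, runs=20):
    """Average wall-clock time to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-ins chosen for illustration; substitute the real DCC client
    # and a real Perl script to get numbers that matter.
    candidates = {
        "small C program (true)": ["true"],
        "Perl interpreter (perl -e 1)": ["perl", "-e", "1"],
    }
    for label, argv in candidates.items():
        try:
            seconds = mean_spawn_seconds(argv)
            print(f"{label}: {seconds * 1000:.2f} ms per run")
        except FileNotFoundError:
            print(f"{label}: not installed on {sys.platform}")
```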
 
To directly answer the question, porting the DCC client code to Perl
would be non-trivial.
 

> What would probably be better would be to just make a "library" version
> of the dcc client... and then you can wrap that library for whichever
> scripting language you might want.
>
> Actually, dcc is already mostly a library... it would just involve some
> documentation of the library API...  

The big cost is not the documenting but the freezing of the interface.
For example, I've had to whack at the whitelist library code to
accommodate the fancier locking needed for per-user dccm whitelists.
(Each whitelist can be used by multiple processes, and each process
can involve threads.  The hash file for each whitelist must be maintained
automatically, and without stalling more threads or processes than
absolutely necessary.  In other words, maybe there are good reasons
why the sendmail automatic /etc/mail/alias updating is being deprecated.)

>                                      remember that dcc actually does all
> of the work on the client side, the server only gets and receives
> checksum information, which means a version written in perl or whatever
> would have to re-write each of the checksum routines, etc.

The checksum routines would be a pain.  Another pain would be all of
the code that maintains the shared map file of DCC server IP addresses,
IDs, passwords, and round trip times.


Vernon Schryver    vjs@rhyolite.com


