dccm timeout lets some spam through

Vernon Schryver vjs@calcite.rhyolite.com
Thu Apr 24 15:28:28 UTC 2003

> From: Gary Mills <mills@cc.UManitoba.CA>

> ...
> Apr 24 02:46:48 electra dccm[26800]: [ID 702911 mail.error] skip asking DCC 4.000 seconds more after failure
> Apr 24 02:46:49 electra dccm[26800]: [ID 109917 mail.error] DCC, mi_rd_cmd: read returned -1: Connection reset by peer
> ...

Note that mi_rd_cmd() is a function in the sendmail libmilter library.
Those two messages suggest something was seriously wrong.

> Should dccm not have changed over to the other dccd, instead of attempting
> to use the busy one?  Maybe it did, and didn't log that information.

If UNIX file descriptors stop working, then all of dccm's efforts to talk
to DCC servers will fail.  It is only when dccm cannot get an answer from
any DCC server that it skips asking dccd.  If you prefer to temporarily
reject mail with 4yz SMTP status codes instead of passing it when no
DCC server answers, you can add "-x" to DCCM_ARGS in /var/dcc/dcc_conf.

> ...
> This suggests that the problem is not related to a busy dccd, but is more
> likely a result of overload of dccm.  Now that this has happened again,
> I'm going to see if sendmail can limit the load it imposes on dccm.
> Hmm, I'm just looking at sendmail's libmilter code.  It checks to see if
> the file descriptor is larger than FD_SETSIZE.  If it is, it closes the
> socket and sets the error number to ERANGE.  I also notice that sendmail
> has a symbol _FFR_USE_POLL that tells libmilter to use poll() rather than
> select().  `poll()' doesn't have the restriction to 1024 file descriptors
> that select() has.  Maybe _FFR_USE_POLL is my solution?

The DCC client library used in dccm and elsewhere also uses select().
If socket() is returning file descriptors larger than select() can handle,
then very bad things will happen.  The least bad I can think of is that
dccm won't be able to hear dccd.
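
To make the select() limit concrete, here is a minimal sketch of the kind
of guard described above.  It is not code from the DCC client library or
from libmilter; the function names fd_fits_select() and select_readable()
are made up for illustration, and the choice of ERANGE simply mirrors the
libmilter behavior Gary describes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: can this descriptor be stored in an fd_set?
 * FD_SET() on a descriptor >= FD_SETSIZE writes outside the set. */
int fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.  A descriptor that
 * select() cannot represent is refused with ERANGE before any system
 * call is made. */
int select_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    if (!fd_fits_select(fd)) {
        errno = ERANGE;
        return -1;
    }

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000L;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

Without a check like this, a process with more than FD_SETSIZE open
descriptors can corrupt memory next to the fd_set instead of failing
cleanly.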

I guess the long-run fix is for me to:
  - add yet more auto-conf and #ifdef stuff to have the DCC client library
      use poll() on Solaris
  - add documentation to suggest the use of _FFR_USE_POLL
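
A rough sketch of what such an #ifdef might look like follows.  This is
only an illustration under assumptions, not the DCC client library's
code: HAVE_POLL is a hypothetical configure-time symbol, and
wait_readable() and read_with_timeout() are invented names.

```c
#include <errno.h>
#include <unistd.h>

#ifdef HAVE_POLL                /* hypothetical symbol set by configure */
#include <poll.h>
#else
#include <sys/select.h>
#endif

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if a read() would not block, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    int n;
#ifdef HAVE_POLL
    /* poll() takes the descriptor by value, so FD_SETSIZE does not apply.
     * A positive result may also mean hangup or error, which the
     * following read() will report. */
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
#else
    /* select() fallback: refuse descriptors that do not fit in an fd_set. */
    fd_set rfds;
    struct timeval tv;

    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = ERANGE;
        return -1;
    }
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000L;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
#endif
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Read up to len bytes, waiting at most timeout_ms for data to arrive.
 * Returns the read() result, or -1 with errno set to ETIMEDOUT. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int r = wait_readable(fd, timeout_ms);

    if (r < 0)
        return -1;
    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

Callers see the same interface either way; only the build decides
whether large descriptor numbers are usable.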

Until then, you should probably set dccm to use -j220 if you use
per-user whitelists and -j490 if not.

The trigger for the problem may be that something gets too slow, causing
your system to have many simultaneous SMTP transactions.  Or perhaps
someone hits your system with hundreds of simultaneous messages.

I trust you are not running the cron-job on your two DCC servers at the
same time, so that you always have a non-busy DCC server.

Vernon Schryver    vjs@rhyolite.com
