dccm timeout lets some spam through

Gary Mills mills@cc.UManitoba.CA
Tue Apr 22 21:36:47 UTC 2003

On Tue, Apr 22, 2003 at 09:54:42AM -0600, Vernon Schryver wrote:
> > From: Gary Mills <mills@cc.UManitoba.CA>
> > Apr 22 02:46:38 electra dccm[401]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Result too large), try again
> > Apr 22 02:46:39 electra dccm[401]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Result too large), try again
> > Apr 22 02:46:40 electra dccm[401]: [ID 702911 mail.error] no answer from naos.cc.umanitoba.ca (,6277) after 0 ms
> > Apr 22 02:46:40 electra dccm[401]: [ID 702911 mail.error] skip asking DCC 1.000 seconds more after failure
> > The result was that `dccm' would time out attempting to contact `dccd'.
> > Here's an example from a DCC log file several hours after the beginning
> > of the incident:
> >
> >   skip asking DCC 160.704 seconds more after failure
> >   ...
> >   result: accept
> That confounds two different symptoms.  The "skip asking" and "no answer"
> messages concern dccm's failure to hear from dccd.  The "invalid socket"
> complaint is from the sendmail libmilter code.  The two problems might
> have a common cause, but they are superficially independent.

That sounds about right.  It's curious that dccm could not hear from
dccd when two dccd servers were running.  One was busy, but the other
should have responded.  Something else was wrong there.

> What ended the problem, restarting dccm?

Yes, although I restarted both.

> I assume (based in part on `grep -i 'too large' /usr/include/sys/errno.h`
> on a Solaris system) that "Result too large" means that libmilter.a
> was told EOVERFLOW by accept().  However, I cannot find any clue in
> `man -s 3xnet accept` or `man -s 3socket socket` why Solaris would 
> whine about overflow when doing an accept.

Yes, that's quite peculiar.  I've always been able to find the exact
error message in errno.h.  There have been cases where system functions
returned more error codes than were documented, however.  In this case,
I really can't guess what the actual error code was.
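The text after "invalid socket" is normally produced by strerror() or an equivalent, not copied from the comments in errno.h, so grepping those comments can point at the wrong constant.  On several systems, including BSD-derived ones, "Result too large" is the strerror() text for ERANGE rather than EOVERFLOW; whether Solaris behaves the same is worth confirming on the affected host.  A minimal sketch of that check follows; the function names and the scan bound of 256 are assumptions made for illustration, not part of dccm or libmilter.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on errno values to scan; an illustration-only
 * choice, comfortably above the errno ranges of common Unix systems. */
#define MAX_ERRNO_SCAN 256

/* Hypothetical helper: return the first errno value whose strerror()
 * text exactly equals msg, or 0 if no value in range matches. */
int find_errno_for_message(const char *msg)
{
    for (int e = 1; e < MAX_ERRNO_SCAN; e++) {
        if (strcmp(strerror(e), msg) == 0)
            return e;
    }
    return 0;
}

/* Hypothetical helper: print the local messages for the two candidate
 * codes so they can be compared with the syslog line by eye. */
void print_candidates(void)
{
    printf("ERANGE    (%d): %s\n", ERANGE, strerror(ERANGE));
#ifdef EOVERFLOW
    printf("EOVERFLOW (%d): %s\n", EOVERFLOW, strerror(EOVERFLOW));
#endif
}
```

Calling `find_errno_for_message("Result too large")` on the machine that logged the error returns the numeric code the local C library associates with that text, or 0 if the message does not come from strerror() at all.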

> My best but wild guess is that the dccm process ran out of file
> descriptors or something similar.  Some hints in /usr/include/sys/select.h
> on a Solaris system suggest that select() may sometimes be limited to
> 1024 descriptors.  It might be worthwhile to limit the number of
> concurrent messages handled by sendmail to 512.

Yes, that's possible.  select() on Solaris can only handle descriptors
numbered below FD_SETSIZE, which is the length of the bit string used
by select().  FD_SETSIZE defaults to 1024 for 32-bit programs, and has
no other purpose.  poll() has no such limit.  I'll look into setting that
limit in sendmail, although I'd prefer not to set arbitrary limits unless
they aid performance.
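The difference between select() and poll() described above can be sketched in a few lines of C.  This is an illustration only, not code from dccm or libmilter (whose internals are not shown in this thread); the function names are hypothetical.  The last function checks whether the process's soft RLIMIT_NOFILE would let it open descriptors that select() could not watch, which is the situation the file-descriptor guess above describes.

```c
#include <poll.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: select() uses a fixed-size bit string, so only
 * descriptors numbered 0 .. FD_SETSIZE-1 may be passed to FD_SET(). */
bool fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable
 * with poll(), which takes an array of struct pollfd instead of a bit
 * string and so has no FD_SETSIZE ceiling.
 * Returns 1 if readable, 0 on timeout, -1 on error or hangup. */
int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    int n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;              /* poll() itself failed */
    if (n == 0)
        return 0;               /* timed out; a negative fd also lands here */
    return (p.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical helper: returns 1 if the soft descriptor limit allows
 * descriptors at or above FD_SETSIZE, 0 if not, -1 if getrlimit() fails. */
int nofile_limit_exceeds_fd_setsize(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur > (rlim_t)FD_SETSIZE ? 1 : 0;
}
```

If `nofile_limit_exceeds_fd_setsize()` returns 1 in a process that relies on select(), a busy server can be handed a descriptor that `fd_fits_select()` rejects; capping concurrency, as suggested above, or using poll() avoids that case.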

-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
