DCC 1.2.14 and "temp failing commands"

Vernon Schryver vjs@calcite.rhyolite.com
Wed Oct 22 18:35:40 UTC 2003

> From: Spike Ilacqua <spike@indra.com>

> I upgraded to 1.2.14 over the weekend and now, about twice a day, we
> start getting:
> Oct 21 12:55:09 net sm-mta[18299]: h9LIt8HQ018299: Milter: connect:
> host=[], addr=, temp failing commands
> at which point sendmail starts refusing connections.  Stopping and
> restarting dccm clears the problem and make sendmail happy.  I'm going
> to have to downgrade unless there is a fix.  Any ideas?

I'd be surprised if the problems are related to recent changes because
I can't think of anything relevant that has been changed recently.

By coincidence, I made some closer observations of such sendmail
complaints yesterday.  It's long been evident that if for some reason
dccm stalls on my system, the sendmail-milter connection can break.
The stalling I've seen has been caused either by the BSD/OS mmap()
flushing bugs or by gdb breakpoints in dccm.  To counter the milter
connection breaking, several versions ago I added a mechanism for dccm
to re-exec() itself if the milter hook returns after dccm has been
up for a while.
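
A minimal sketch, in C, of the kind of re-exec() fallback described
above.  It is not dccm's code: the names should_reexec(), reexec_self()
and the 60-second MIN_UPTIME_SECS threshold are hypothetical, and the
sketch assumes the original argv is still available when the milter
hook returns.

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical threshold: only re-exec if we had been up this long,
 * so that a failure at startup does not turn into an exec loop. */
#define MIN_UPTIME_SECS 60

/* True if the process has run long enough that an unexpected return
 * from the milter hook should be answered with a re-exec. */
bool should_reexec(time_t started, time_t now)
{
    return difftime(now, started) >= MIN_UPTIME_SECS;
}

/* Replace the process image with a fresh copy of the same program.
 * execvp() returns only on failure, so reaching the return is an error. */
int reexec_self(char *const argv[])
{
    execvp(argv[0], argv);
    return -1;
}
```

A caller would record time(NULL) at startup and, when the hook returns,
call reexec_self(argv) only if should_reexec() says so.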

Yesterday I used gdb to poke at a broken connection.  Once the
breakage happens, the accept() system call consistently returns
apparently valid sockets but with a null socket name.  The sendmail
milter library requires that the name of the accepted socket be non-null
and have the right address family.  If not, it complains "accept()
returned invalid socket (Invalid argument), try again".  I think it
doesn't actually use the socket name.
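
The check described above can be sketched in C roughly as follows.
This illustrates the idea and is not the libmilter source;
accepted_name_ok() is a hypothetical helper, and the caller is assumed
to pass the address and length that accept() filled in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: decide whether the peer name that accept()
 * returned looks usable.  A zero length (a "null" name) or an
 * unexpected address family is treated as an invalid socket. */
bool accepted_name_ok(const struct sockaddr *sa, socklen_t len,
                      int want_family)
{
    if (sa == NULL || len == 0)
        return false;               /* no name was filled in */
    return sa->sa_family == want_family;
}
```

With a check like this, a descriptor that is otherwise valid is still
rejected when the kernel hands back an empty name, which matches the
symptom seen in gdb.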

I theorize that if the listen socket's queue ever fills up, the BSD/OS
kernel code stops filling in the socket name.

One trouble with the dccm re-exec() kludge is that the libmilter hook
does not give up and return until it has seen 16 failures.
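
To illustrate why the re-exec() is slow to trigger, here is a minimal
sketch in C of a failure counter of the sort described.  Only the
figure 16 comes from the message above; the names, the assumption that
the failures must be consecutive, and the assumption that the 16th
failure is the one that gives up are all hypothetical.

```c
#include <stdbool.h>

/* Figure taken from the message; the macro name is hypothetical. */
#define MAX_ACCEPT_FAILS 16

struct fail_counter {
    int consecutive;    /* failures since the last success */
};

/* Record the outcome of one accept() attempt.  Returns true once the
 * listener should give up and return to its caller.
 * Assumption: a success resets the count. */
bool note_accept_result(struct fail_counter *fc, bool ok)
{
    if (ok) {
        fc->consecutive = 0;
        return false;
    }
    fc->consecutive++;
    return fc->consecutive >= MAX_ACCEPT_FAILS;
}
```

Under these assumptions the hook keeps retrying through 15 bad
accept() results, and the re-exec() can only happen after the 16th.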

Vernon Schryver    vjs@rhyolite.com
