dccm running out of file descriptors

Gary Mills mills@cc.UManitoba.CA
Fri Jan 30 01:35:27 UTC 2004


On Thu, Jan 29, 2004 at 12:03:30PM -0700, Vernon Schryver wrote:
> > From: Gary Mills <mills@cc.UManitoba.CA>
> 
> > We had another incident this morning.  I have log messages from that
> > time, but `lsof' output from about two hours later, when dccm had
> > recovered. ...
> 
> Did dccm recover spontaneously or was it restarted?

It recovered, but after I ran `lsof' on it, I restarted it just
to be sure it was healthy.

> > Of the sockets, there were 2846 TCP and 801 UDP.  799 of the sockets
> > were in the Wait_Data_Xfr state.  Of the TCP connections, 2671 were
> > idle, like this:
> >
> > COMMAND  PID   USER   FD   TYPE        DEVICE SIZE/OFF    NODE NAME
> > dccm    9546 daemon 3910u  IPv4 0x30002868bd0      0t0     TCP electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)
> 
> 2671 TCP sockets looks more than a little strange.
> What does that `lsof` line mean?  What are the '*' characters?  Do they
> mean the socket is bound to port 0 at both ends?  Or does that line
> mean the socket is not complete, perhaps because accept() has not been done?

I'm not sure.  If `lsof' uses the `netstat' definitions, it means:

     IDLE  Idle, opened but not bound.

I just checked the currently running `dccm'.  It had 68 TCP connections,
none of which were idle.  The highest file descriptor was 2298.
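
For anyone wanting to repeat this kind of tally, here is a minimal Python
sketch of how `lsof' output could be counted by protocol and state.  It is
not what I actually ran.  The name `tally_lsof' is made up for illustration,
and it assumes every column up to NODE is populated on the socket lines, as
in the sample line quoted above.

```python
import re
from collections import Counter

# Trailing "(STATE)" at the end of the NAME column, e.g. "(IDLE)".
STATE_RE = re.compile(r"\(([^()]+)\)\s*$")

def tally_lsof(lines):
    """Count sockets by protocol and by (protocol, state) in lsof output."""
    counts = Counter()
    for line in lines:
        # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue  # header or a line without all nine columns
        proto, name = fields[7], fields[8]
        if proto not in ("TCP", "UDP"):
            continue  # not an Internet socket
        counts[proto] += 1
        match = STATE_RE.search(name)
        if match:
            counts[(proto, match.group(1))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "COMMAND  PID   USER   FD   TYPE        DEVICE SIZE/OFF    NODE NAME",
        "dccm    9546 daemon 3910u  IPv4 0x30002868bd0      0t0     TCP "
        "electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)",
    ]
    for key, n in tally_lsof(sample).items():
        print(key, n)
```

Lines where `lsof' leaves a column blank would be skipped or miscounted by
this simple split, so the totals should be treated as approximate.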

> Are you running sendmail on one system and dccm on another?
> Could there be a TCP socket leak in the milter library?

Actually, sendmail on two hosts, and dccm on one of them.

> Are you linking dccm with the libmilter generated by the fairly recent
> sendmail you seem to be using?

Yes, although it may be a version or two older.

Here are my sendmail statistics from that day:

                                      From     To   From     To
Date             Load Average        Local  Local  ESMTP  ESMTP
Jan 28 03:16     1.21,1.32,1.47        547    544   5150    291
Jan 28 04:16     1.90,2.20,1.93        524    536   5193    592
Jan 28 05:16     1.72,1.84,1.90        501    554   4564    310
Jan 28 06:16     1.07,1.09,1.22         85     78    936     53
Jan 28 07:16     1.24,1.50,1.73       1026    920   5680    548
Jan 28 08:16     3.72,2.53,2.34       2014   1569   6114   2162
Jan 28 09:16     4.27,4.17,3.33       1805   1847   7221   1969

`dccm' ran out of file descriptors between 05:30 and 06:30, which
corresponds to that hour of very low e-mail activity.  It may have
been the result of an I/O overload that began earlier in the evening.
I've since eliminated that problem.  All those idle sockets could
be the result of something that only happens under overload conditions.
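
One rough way to test that idea would be to log the descriptor count at
intervals and see whether it climbs only under load.  A minimal Python
sketch, assuming a /proc filesystem with a per-process fd directory and
permission to read it.  The function names and the limit of 4096 are
hypothetical; the actual limit for `dccm' is not something I have stated
here.

```python
import os

def count_fds(pid):
    """Number of open descriptors for pid, read from /proc/<pid>/fd."""
    return len(os.listdir("/proc/%d/fd" % pid))

def near_limit(count, limit, fraction=0.9):
    """True when count has reached the given fraction of the limit."""
    return count >= fraction * limit

if __name__ == "__main__":
    n = count_fds(os.getpid())      # own process, just to show the call
    print(n, near_limit(n, 4096))   # 4096 is an assumed limit
```

Run from cron against the `dccm' PID, something like this could show
whether the descriptors pile up gradually or all at once.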

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


