Gary Mills
mills@cc.UManitoba.CA
Fri Jan 30 01:35:27 UTC 2004
On Thu, Jan 29, 2004 at 12:03:30PM -0700, Vernon Schryver wrote:
> > From: Gary Mills <mills@cc.UManitoba.CA>
>
> > We had another incident this morning. I have log messages from that
> > time, but `lsof' output from about two hours later, when dccm had
> > recovered. ...
>
> Did dccm recover spontaneously or was it restarted?
It recovered, but after I ran `lsof' on it, I restarted it just
to be sure it was healthy.
> > Of the sockets, there were 2846 TCP and 801 UDP. 799 of the sockets
> > were in the Wait_Data_Xfr state. Of the TCP connections, 2671 were
> > idle, like this:
> >
> > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> > dccm 9546 daemon 3910u IPv4 0x30002868bd0 0t0 TCP electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)
>
> 2671 TCP sockets looks more than a little strange.
> What does that `lsof` line mean? What are the '*' characters? Do they
> mean the socket is bound to port 0 at both ends? Or does that line
> mean the socket is not complete, perhaps because accept() has not been done?
I'm not sure. If `lsof' uses the `netstat' definitions, it means:
IDLE Idle, opened but not bound.
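One way to test the "port 0" reading of Vernon's question: a TCP socket that has been created but never bound or connected reports the wildcard address and port 0, and lsof typically prints a wildcard address or port as `*`. The Python sketch below is not taken from dccm, libmilter or lsof, and the helper name unbound_socket_name is hypothetical. It does not explain why the quoted lsof line still names both hosts. Whether Solaris lsof labels such a socket IDLE is an assumption based on the netstat definition above.

```python
import socket

def unbound_socket_name():
    # Create a TCP socket but never call bind(), connect() or listen() on it.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # An unbound socket reports the wildcard address and port 0.
        return s.getsockname()
    finally:
        s.close()

print(unbound_socket_name())  # prints ('0.0.0.0', 0) on Linux
```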
I just checked the currently running `dccm'. It had 68 TCP connections,
none of which were idle. The highest file descriptor was 2298.
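Counts like these can be tallied from saved `lsof -p PID` output. The Python sketch below assumes the nine-column layout in the quoted line above: NODE is the eighth field, and any TCP/UDP state appears in parentheses at the end of NAME. The function name tally_lsof is hypothetical. Lines whose SIZE/OFF field is empty shift the columns, so such sockets may be missed.

```python
import re
from collections import Counter

def tally_lsof(lines):
    """Count sockets by (protocol, state) in `lsof -p PID` output and
    return the highest numeric file descriptor seen."""
    counts = Counter()
    highest_fd = -1
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue  # skip the header and short lines
        fd = re.match(r"\d+", fields[3])  # e.g. "3910u" -> "3910"; "cwd" -> None
        if fd:
            highest_fd = max(highest_fd, int(fd.group()))
        proto = fields[7]
        if proto in ("TCP", "UDP"):
            # The state, if any, is the trailing "(...)" on the line.
            state = re.search(r"\(([^()]*)\)\s*$", line)
            counts[(proto, state.group(1) if state else "")] += 1
    return counts, highest_fd
```

Passing the lines of a saved lsof listing, for example an open file object, returns a Counter such as `{("TCP", "IDLE"): 2671, ...}` together with the highest descriptor number.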
> Are you running sendmail on one system and dccm on another?
> Could there be a TCP socket leak in the milter library?
Actually, sendmail on two hosts, and dccm on one of them.
> Are you linking dccm with the libmilter generated by the fairly recent
> sendmail you seem to be using?
Yes, although it may be a version or two older.
Here are my sendmail statistics from that day:
                                 From     To   From     To
Date          Load Average      Local  Local  ESMTP  ESMTP
Jan 28 03:16  1.21,1.32,1.47      547    544   5150    291
Jan 28 04:16  1.90,2.20,1.93      524    536   5193    592
Jan 28 05:16  1.72,1.84,1.90      501    554   4564    310
Jan 28 06:16  1.07,1.09,1.22       85     78    936     53
Jan 28 07:16  1.24,1.50,1.73     1026    920   5680    548
Jan 28 08:16  3.72,2.53,2.34     2014   1569   6114   2162
Jan 28 09:16  4.27,4.17,3.33     1805   1847   7221   1969
`dccm' ran out of file descriptors between 05:30 and 06:30, which
corresponds to that hour of very low e-mail activity. It may have
been the result of an I/O overload that began earlier in the evening.
I've since eliminated that problem. All those idle sockets could
be the result of something that only happens under overload conditions.
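The quiet hour is easy to confirm from the statistics table. The Python sketch below sums the four mail columns for each row as a crude activity index. It assumes each row counts messages for the interval ending at the listed time. Adding "from" and "to" counts double-counts relayed mail, so the index is only a rough measure. The 06:16 row comes out roughly five times lower than any other row.

```python
# Hourly figures copied from the sendmail statistics table:
# (time, from local, to local, from ESMTP, to ESMTP)
stats = [
    ("03:16", 547, 544, 5150, 291),
    ("04:16", 524, 536, 5193, 592),
    ("05:16", 501, 554, 4564, 310),
    ("06:16", 85, 78, 936, 53),
    ("07:16", 1026, 920, 5680, 548),
    ("08:16", 2014, 1569, 6114, 2162),
    ("09:16", 1805, 1847, 7221, 1969),
]

# Crude activity index: the sum of all four mail columns per row.
totals = {stamp: sum(counts) for stamp, *counts in stats}
quietest = min(totals, key=totals.get)
print(quietest, totals[quietest])  # prints: 06:16 1152
```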
--
-Gary Mills- -Unix Support- -U of M Academic Computing and Networking-