Gary Mills
mills@cc.UManitoba.CA
Wed Jan 28 16:25:39 UTC 2004
On Sun, Jan 11, 2004 at 09:59:00AM -0700, Vernon Schryver wrote:
> > From: Gary Mills <mills@cc.UManitoba.CA>
>
> > I just took a look.  It's quiet now.  `dccm' was using 109 threads.
> > Yes, `lsof' shows sockets.  Here are the file descriptors by type:
>
> I see nothing wrong there.
> What does lsof say when things are sick?

We had another incident this morning.  I have log messages from that
time, but the `lsof' output is from about two hours later, when dccm
had recovered.  Here's the beginning and end of the log errors:

Jan 28 05:29:52 electra dccm[9546]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Too many open files), try again
Jan 28 05:29:53 electra dccm[9546]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Too many open files), try again
Jan 28 05:29:54 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/var/dcc/log/028/05/tmp.4iae5i): Too many open files
...
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files

`dccm' is running with a context limit of 800 and a file descriptor
limit of 5120.  Here's the tail of the lsof output, showing the highest
file descriptor that was still in use:

COMMAND  PID   USER    FD   TYPE  DEVICE         SIZE/OFF  NODE  NAME
dccm     9546  daemon  4939u  IPv4  0x300042da7a8  0t0       UDP   electra.cc.umanitoba.ca:41544 (Wait_Data_Xfr)
dccm     9546  daemon  4978u  IPv4  0x30011c5ca60  0t0       UDP   electra.cc.umanitoba.ca:33901 (Wait_Data_Xfr)
dccm     9546  daemon  4979u  IPv4  0x3002e4834a8  0t0       UDP   electra.cc.umanitoba.ca:33902 (Wait_Data_Xfr)
dccm     9546  daemon  5041u  IPv4  0x30002910cd0  0t0       UDP   electra.cc.umanitoba.ca:57025 (Wait_Data_Xfr)

These are the file descriptor types:

    3647 IPv4
     538 VREG
       4 VCHR
       1 VDIR
       1 DOOR

Of the sockets, there were 2846 TCP and 801 UDP.  799 of the sockets
were in the Wait_Data_Xfr state.  Of the TCP connections, 2671 were
idle, like this:

COMMAND  PID   USER    FD   TYPE  DEVICE         SIZE/OFF  NODE  NAME
dccm     9546  daemon  3910u  IPv4  0x30002868bd0  0t0       TCP   electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)
dccm     9546  daemon  3911u  IPv4  0x3002db21bd0  0t0       TCP   electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
dccm     9546  daemon  3912u  IPv4  0x3000e2faaa0  0t0       TCP   electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
dccm     9546  daemon  3913u  IPv4  0x3002f424550  0t0       TCP   electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)

Are there any clues in this information?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
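
The per-type and per-state counts quoted in the message can be reproduced from saved `lsof' output with a short script. The following is only a sketch, not the method used in the original post: the function name `summarize_lsof` is hypothetical, and it assumes the column layout seen in the excerpts above (TYPE in the fifth column, the protocol in the NODE column for socket rows, and an optional parenthesized state as the last field).

```python
import sys
from collections import Counter


def summarize_lsof(lines):
    """Tally lsof rows by TYPE, by protocol (sockets only), and by state.

    Assumes the column order COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME,
    as in the excerpts quoted in the message.
    """
    by_type, by_proto, by_state = Counter(), Counter(), Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or truncated lines and the header row.
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        fd_type = fields[4]
        by_type[fd_type] += 1
        if fd_type in ("IPv4", "IPv6") and len(fields) >= 8:
            # For socket rows the NODE column holds the protocol (TCP or UDP).
            by_proto[fields[7]] += 1
            last = fields[-1]
            # A trailing "(STATE)" field, e.g. (IDLE) or (Wait_Data_Xfr).
            if last.startswith("(") and last.endswith(")"):
                by_state[last.strip("()")] += 1
    return by_type, by_proto, by_state


if __name__ == "__main__":
    # Reads lsof output from standard input and prints the three tallies.
    types, protos, states = summarize_lsof(sys.stdin)
    for title, counts in (("TYPE", types), ("PROTOCOL", protos), ("STATE", states)):
        print(title)
        for name, n in counts.most_common():
            print(f"{n:8d} {name}")
```

Saved as, say, `lsof_summary.py` (a made-up file name), it could be fed the output of `lsof -p 9546` on standard input. Running it while the "Too many open files" errors are being logged, rather than two hours later, would show which descriptor type is actually approaching the 5120 limit.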