dccm running out of file descriptors

Gary Mills mills@cc.UManitoba.CA
Wed Jan 28 16:25:39 UTC 2004


On Sun, Jan 11, 2004 at 09:59:00AM -0700, Vernon Schryver wrote:
> > From: Gary Mills <mills@cc.UManitoba.CA>
> 
> > I just took a look.  It's quiet now.  `dccm' was using 109 threads.
> > Yes, `lsof' shows sockets.  Here are the file descriptors by type:
> 
> I see nothing wrong there.
> What does lsof say when things are sick?

We had another incident this morning.  I have the log messages from that
time, but the `lsof' output is from about two hours later, after dccm had
recovered.  Here are the first and last of the logged errors:

Jan 28 05:29:52 electra dccm[9546]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Too many open files), try again
Jan 28 05:29:53 electra dccm[9546]: [ID 125918 mail.error] DCC: accept() returned invalid socket (Too many open files), try again
Jan 28 05:29:54 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/var/dcc/log/028/05/tmp.4iae5i): Too many open files
...
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files
Jan 28 06:29:23 electra dccm[9546]: [ID 925838 mail.error] dcc_mkstemp(/usr/local/dcc/whiteclnt): Too many open files

`dccm' is running with a context limit of 800 and a file descriptor
limit of 5120.  Here's the tail of the lsof output, showing the highest
file descriptors that were still in use:

COMMAND  PID   USER   FD   TYPE        DEVICE SIZE/OFF    NODE NAME
dccm    9546 daemon 4939u  IPv4 0x300042da7a8      0t0     UDP electra.cc.umanitoba.ca:41544 (Wait_Data_Xfr)
dccm    9546 daemon 4978u  IPv4 0x30011c5ca60      0t0     UDP electra.cc.umanitoba.ca:33901 (Wait_Data_Xfr)
dccm    9546 daemon 4979u  IPv4 0x3002e4834a8      0t0     UDP electra.cc.umanitoba.ca:33902 (Wait_Data_Xfr)
dccm    9546 daemon 5041u  IPv4 0x30002910cd0      0t0     UDP electra.cc.umanitoba.ca:57025 (Wait_Data_Xfr)

These are the file descriptor types:

3647 IPv4
 538 VREG
   4 VCHR
   1 VDIR
   1 DOOR

Of the 3647 sockets, 2846 were TCP and 801 were UDP.  799 sockets were
in the Wait_Data_Xfr state, and 2671 of the TCP connections were idle,
like these:

COMMAND  PID   USER   FD   TYPE        DEVICE SIZE/OFF    NODE NAME
dccm    9546 daemon 3910u  IPv4 0x30002868bd0      0t0     TCP electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)
dccm    9546 daemon 3911u  IPv4 0x3002db21bd0      0t0     TCP electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
dccm    9546 daemon 3912u  IPv4 0x3000e2faaa0      0t0     TCP electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
dccm    9546 daemon 3913u  IPv4 0x3002f424550      0t0     TCP electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
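The per-protocol and per-state counts above can be produced by tallying the NODE column and the parenthesized state at the end of each lsof line.  Here is a minimal sketch that assumes every socket line has all nine columns, as in the listings above.  Real lsof output can leave some columns empty, and such lines would need more careful parsing.  The function name `tally_sockets` is hypothetical.

```python
import re
from collections import Counter

# A few lsof lines copied from the listings above.  Columns are:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
SAMPLE = """\
COMMAND  PID   USER   FD   TYPE        DEVICE SIZE/OFF    NODE NAME
dccm    9546 daemon 4939u  IPv4 0x300042da7a8      0t0     UDP electra.cc.umanitoba.ca:41544 (Wait_Data_Xfr)
dccm    9546 daemon 3910u  IPv4 0x30002868bd0      0t0     TCP electra.cc.umanitoba.ca:*->naos.cc.umanitoba.ca:* (IDLE)
dccm    9546 daemon 3911u  IPv4 0x3002db21bd0      0t0     TCP electra.cc.umanitoba.ca:*->electra.cc.umanitoba.ca:* (IDLE)
"""

# Matches the trailing "(STATE)" that lsof appends to the NAME column.
STATE_RE = re.compile(r"\(([^)]+)\)\s*$")


def tally_sockets(lsof_text):
    """Count IPv4 descriptors by (protocol, state) in lsof output."""
    counts = Counter()
    for line in lsof_text.splitlines()[1:]:  # skip the header line
        # At most 9 fields, so any spaces inside NAME stay in the last field.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[4] != "IPv4":
            continue
        proto = fields[7]  # NODE column holds TCP or UDP for sockets
        m = STATE_RE.search(fields[8])
        state = m.group(1) if m else "?"
        counts[(proto, state)] += 1
    return counts


for (proto, state), n in sorted(tally_sockets(SAMPLE).items()):
    print(f"{n:5d} {proto} {state}")
# prints:
#     2 TCP IDLE
#     1 UDP Wait_Data_Xfr
```

Against a live process, you would pass in the saved output of something like `lsof -p 9546` in place of the sample text.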

Are there any clues in this information?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-