dccm misbehaving on Solaris 9

Gary Mills mills@cc.umanitoba.ca
Wed Mar 1 20:52:20 UTC 2006

On Wed, Mar 01, 2006 at 08:59:26AM -0700, Andy Rudoff wrote:
> >>	dccm[2920]: [ID 702911 mail.error] fdopen(whiteclnt): Resource 
> >>	temporarily unavailable
> [...]
> > I've been trying to reproduce the failure from fdopen().  That
> >message from dccm only happens when fdopen() returns 0 and sets errno
> >to EAGAIN, but the Solaris `man fdopen` does not mention EAGAIN.
> The man page might be lacking in a detail or two here :-)
> I can think of two ways that errno can be EAGAIN on return from fdopen().
> First, fdopen() can return NULL without setting errno (an ugly little
> fact that is documented in the Solaris man page).  So if fdopen() finds
> there are no stdio streams left and errno was already set to EAGAIN from
> some previous syscall, it is technically possible to get the NULL/EAGAIN
> combination.

Ah, yes:

     The fdopen() function may fail and not set  errno  if  there
     are no free stdio streams.

I didn't see that in the OpenSolaris source, though.  Maybe it's been
fixed there.  In any case, 32-bit Solaris programs have a limit of
255 stdio streams, and require the file descriptors to be 255 or below.
I assume that fdopen() will set EBADF if the file descriptor is out of
range.  Curiously, the Solaris 9 man page also says this:

           The number of streams currently open  in  the  calling
           process is either FOPEN_MAX or STREAM_MAX.

Apparently, fdopen() does set errno under some conditions, such as
reaching that limit.

With those restrictions, it is easy to run out of stdio streams.  The
usual workaround is to reserve low file descriptors for stdio, and to
limit the number of stdio streams.  When dccm was misbehaving the
other day, the file descriptor limit was 5120.  It had 748 open files.
The highest file descriptor was 1442.  This is fairly normal on a busy
e-mail server that's handling hundreds of simultaneous SMTP sessions.
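
One way to keep the low descriptors free is to move sockets and other
non-stdio descriptors out of the way as soon as they are created.  This
is only a sketch: move_fd_high() and STDIO_FD_LIMIT are names invented
here, and the value 256 assumes that stdio can use descriptors 0 through
255 only.

```c
#include <fcntl.h>
#include <unistd.h>

#define STDIO_FD_LIMIT 256   /* assumption: stdio can only use fds 0..255 */

/* Hypothetical helper: if fd lies in the range that stdio needs,
 * duplicate it to the lowest free descriptor >= STDIO_FD_LIMIT and close
 * the original.  Returns the descriptor to use from now on.  On failure
 * it returns -1 with errno set by fcntl(), and fd is left open. */
int move_fd_high(int fd)
{
    int newfd;

    if (fd < 0 || fd >= STDIO_FD_LIMIT)
        return fd;           /* invalid, or already out of the way */

    newfd = fcntl(fd, F_DUPFD, STDIO_FD_LIMIT);
    if (newfd < 0)
        return -1;           /* e.g. the descriptor limit is too low */

    close(fd);               /* free the low slot for stdio */
    return newfd;
}
```

Note that F_DUPFD does not copy the close-on-exec flag, so a caller that
depends on it would have to set it again on the new descriptor.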

> But that's not what I think happened above.  fdopen() calls calloc()
> and if it gets NULL back, the errno is preserved on return from fdopen()
> (this detail is sadly missing from the Solaris man page).  calloc()
> can indeed fail with EAGAIN.  If calloc() fails because the process memory
> limit is hit, then it sets errno to ENOMEM, but if it fails because the
> system is out of swap space, it sets EAGAIN.  The idea being, I guess, that
> the resource exhaustion may be temporary so the application can try again
> later.

Yes, it has to allocate the FILE structure, and fill in some fields.
I doubt that it ran out of swap.  The server has 16 gigs of memory and
50 gigs of swap.  I wonder if some transient condition in a
multi-threaded program could cause a similar error.

-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
