Vernon Schryver
vjs@calcite.rhyolite.com
Wed, 24 Oct 2001 11:51:55 -0600 (MDT)
> From: Levent Serinol <lserinol@yahoo.com>
> When I had about 600 concurrent dccproc applications
> started to report that "dccd server is not responding"
> for a while later I've got following error
>
> Oct 24 20:01:04 gemini dccd[10217]: [ID 802041
> mail.error] graceful stop
>
> and my machine crashed :-(

No matter what any application or user program does or tries to do,
the system must not crash. Absolutely every crash of the system is a
bug in the operating system. This may be contrary to what users of
Microsoft software may think, but it has been part of the definition
of an operating system for more than 30 years.

> without running dccd (using dccproc and letting it to
> report not responding errors) machine can handle 700
> concurrency.
>
> I think graceful stop caused by flooding limit ?
>
> Which parameters do I have to check on machine and
> dccd ?

I think "graceful stop" happens only as the result of the `cdcc stop`
command. See the top of the for(;;) loop in recv_job() in dccd/dccd.c.
Grep says stopint is set <0 only in dccd/work.c by the DCC_AOP_STOP
administrative command.

One thing to try is to turn on tracing of administrative operations
with `dccd -T ADMN` or `cdcc "trace admn on"`. However, I doubt you'll
see anything. I suspect dccd's memory was corrupted, and that might be
related to whatever operating system bug was tickled to crash the
system. It all smells like a kernel VM bug to me.

Dccd is single threaded, and so knows nothing about concurrency in the
rest of the system. I would expect 600 dccproc processes talking to the
dccd process to stress the kernel's paging code, especially if the
system does not have lots of physical memory.

vjs