Oct 24 20:01:04 gemini graceful stop

Vernon Schryver vjs@calcite.rhyolite.com
Wed Oct 24 17:51:55 UTC 2001


> From: Levent Serinol <lserinol@yahoo.com>

> When I had about 600 concurreny  dccproc applications
> started to report that "dccd server is not responding"
> for a while later I've got following error 
>
> Oct 24 20:01:04 gemini dccd[10217]: [ID 802041
> mail.error] graceful stop
>
> and my machine crashed :-(

No matter what any application or user program does or tries to do, the
system must not crash.  Absolutely every crash of the system is a bug
in the operating system.  This may be contrary to what users of Microsoft
software may think, but it has been part of the definition of an operating
system for more than 30 years.

> without running dccd (using dccproc and letting it to
> report not responding errors) machine can handle 700
> concurreny. 
>
> I think graceful stop caused by flooding limit ? 
>
> Which parameters do I have to check on machine and
> dccd ?

I think "graceful stop" happens only as the result of the `cdcc stop` command.
See the top of the for(;;) loop in recv_job() in dccd/dccd.c.
Grep says stopint is set <0 only in dccd/work.c by the DCC_AOP_STOP
administrative command.
One thing to try is to turn on tracing of administrative operations
with `dccd -T ADMN` or `cdcc "trace admn on"`.  However, I doubt you'll
see anything.  I suspect dccd's memory was corrupted, and that might be
related to whatever operating system bug was tickled to crash the system.
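
To make the reasoning concrete, here is a minimal sketch in C of the
pattern described above: a flag that only the stop operation makes
negative, checked at the top of an endless receive loop.  It is not the
dccd source.  The names sketch_op, sketch_stopint, sketch_handle_op(),
and sketch_recv_jobs() are invented for illustration; only stopint,
recv_job(), and DCC_AOP_STOP come from the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for request types; the real DCC_AOP_STOP
 * opcode is defined in the DCC sources, not here. */
enum sketch_op { SKETCH_OP_WORK, SKETCH_OP_STOP };

/* Negative means "stop requested", like stopint as described above. */
int sketch_stopint = 0;

/* Only the stop operation ever makes the flag negative. */
void sketch_handle_op(enum sketch_op op)
{
    if (op == SKETCH_OP_STOP)
        sketch_stopint = -1;
}

/* Handle queued operations until a stop is requested or the queue is
 * empty.  Returns the number of operations handled. */
size_t sketch_recv_jobs(const enum sketch_op *ops, size_t n_ops)
{
    size_t handled = 0;

    assert(ops != NULL || n_ops == 0);
    sketch_stopint = 0;
    for (size_t i = 0;; ++i) {
        /* The flag is tested at the top of the loop; this is where a
         * daemon would log its "graceful stop" message and exit. */
        if (sketch_stopint < 0)
            break;
        if (i >= n_ops)
            break;
        sketch_handle_op(ops[i]);
        ++handled;
    }
    return handled;
}
```

In a design like this, the loop can only take the stop branch after a
stop operation has been handled.  If the message shows up when nobody
asked for a stop, something else must have changed the flag, which is
why corrupted memory is the suspect.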

It all smells like a kernel VM bug to me.  Dccd is single threaded, and
so knows nothing about concurrency in the rest of the system.
I would expect 600 dccproc processes talking to the dccd process to
stress the kernel's paging code, especially if the system does not have
lots of physical memory.


vjs


