802 copies of dccm running

Vernon Schryver vjs@calcite.rhyolite.com
Wed Jan 3 03:23:26 UTC 2007


> From: Gary Mills 

> Really!  Does the parent broadcast to all of the children and then
> deal with the one that responds first?  That model won't scale very
> well.

The protocol is a standard flavor.  The parent sends a request to a socket.
The request includes a return address for a response.  The request is
received, processed, and answered by a single child.


>        Surely the parent must know which children are busy and which
> are idle.  Can't the parent just pick the first idle one? 

There are perhaps subtle implementation problems related to speed,
correctness, and robustness with having the parent know the state
of all children.

However, there is a bigger problem with that model that you in particular
should care about.  For the parent to be able to send to an individual
child over a private channel, the parent would need to devote a
file descriptor to each child.  Since every mail message being processed
by dccm might need a DNSBL lookup simultaneously, that additional use
of file descriptors would reduce the -j limit on concurrent dccm jobs.
The current scheme uses no additional file descriptors because talking
to the DNSBL child is done with the socket normally used to talk to the
DCC server.

>                                                            They should
> all respond equally quickly.

Hidden in that thought is the problem.  See
http://www.google.com/search?q=unix+%22thundering+herd+problem%22


> My system call trace is puzzling.  When a helper is in its poll/recvfrom
> loop, poll() claims that one file descriptor is ready for reading, but
> recvfrom() says there's nothing there.  I don't know how that's even
> possible.  What could cause that behavior?

  - All of the idle children are asleep in select/poll waiting for
   something to happen on either the pipe that tells them that the
   parent has died or the socket on which requests arrive.

  - A request arrives on the file descriptor, so the kernel awakens
   all of the idle children.

  - One child gets into the recvfrom() system call first, receives the
   request, and starts working.

  - The thundering herd of other idlers get "sorry no data" from
   recvfrom() on the non-blocking socket and go back to sleep.
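
The sequence above can be reproduced in miniature.  What follows is only
a sketch in Python, not the dccm code: a single process stands in for the
parent and all of the idle children, and the name simulate_herd and its
parameter are made up for the illustration.  It assumes a system with
AF_UNIX datagram sockets.  Every simulated idler is told the socket is
readable, but only the first recvfrom() finds anything.

```python
import select
import socket


def simulate_herd(n_idle=4):
    """Return (awakened, served, empty_handed) for one request and n_idle idlers."""
    # One datagram socket pair: `requests` plays the shared socket that all
    # idle children watch, `parent` plays the sending side.
    parent, requests = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    requests.setblocking(False)
    try:
        parent.send(b"lookup")

        # Every idler is in select/poll on the same socket, so every one
        # of them is told "readable" for this single request.
        awakened = 0
        for _ in range(n_idle):
            ready, _w, _x = select.select([requests], [], [], 1.0)
            awakened += bool(ready)

        # The awakened idlers then race into recvfrom().  Only the first
        # finds the datagram; the rest get EAGAIN ("sorry no data").
        served = empty_handed = 0
        for _ in range(awakened):
            try:
                requests.recvfrom(2048)
                served += 1
            except BlockingIOError:
                empty_handed += 1
        return awakened, served, empty_handed
    finally:
        parent.close()
        requests.close()
```

With four idlers, all four are awakened, one is served, and three go back
to sleep empty-handed, which is the poll()/recvfrom() pattern in the trace.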


I coded a fix that uses write(1 byte)/read() on that pipe to awaken a
single child, and SIGALRM to awaken children that have been asleep too
long and need to kill themselves.  However, I can't make SIGALRM work
in some POSIX threads implementations.  So I'm ripping out support for
external filters to make it possible to use SIGALRM.
(The only external filter ever hooked to dccm/dccproc/dccifd that I know
of requires threads, but no one cares about using it inside dccm, dccifd,
or dccproc.)
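
The pipe half of that idea can be sketched as follows.  This is an
illustration in Python under stated assumptions, not the fix itself:
the function wake_one and its parameters are hypothetical, it needs a
POSIX system with fork(), and it leaves out SIGALRM entirely.  Each child
blocks in a 1-byte read() on the pipe, so one written byte can be
consumed by only one child.  To let the sketch finish, the parent then
closes its end of the pipe, and the remaining children see EOF, much as
they would if the parent had died.

```python
import os


def wake_one(n_children=4, tokens=1):
    """Return (children awakened by a token, children that only saw EOF)."""
    wake_r, wake_w = os.pipe()      # parent writes tokens, children read
    res_r, res_w = os.pipe()        # children report back what they saw
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: drop the write end so EOF arrives once the parent
            # closes its copy.
            os.close(wake_w)
            os.close(res_r)
            try:
                got = os.read(wake_r, 1)    # sleeps until a token or EOF
                os.write(res_w, b"W" if got else b"E")
            finally:
                os._exit(0)
        pids.append(pid)

    os.close(wake_r)
    os.close(res_w)
    os.write(wake_w, b"x" * tokens)  # each byte awakens exactly one child
    os.close(wake_w)                 # everyone else sees EOF and exits

    report = b""
    while True:
        chunk = os.read(res_r, 64)
        if not chunk:
            break
        report += chunk
    os.close(res_r)
    for pid in pids:
        os.waitpid(pid, 0)
    return report.count(b"W"), report.count(b"E")
```

Calling wake_one(4, 1) reports one awakened child and three that saw only
EOF; no child is awakened just to find nothing to do.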


Vernon Schryver    vjs@rhyolite.com


