802 copies of dccm running

Gary Mills mills@cc.umanitoba.ca
Tue Jan 2 15:51:11 UTC 2007


On Mon, Jan 01, 2007 at 09:32:37AM -0700, Vernon Schryver wrote:
> > From: Gary Mills 
> 
> >                           So, I did a triage by setting `no-body' and
> > `no-MX' within DNSBL_ARGS in dcc_conf.  The difference was dramatic!
> > The number of `dccm' processes stabilized at 5.  It does go higher
> > when the traffic increases, but it settles back to 5. 
> 
> If you are only checking the SMTP client IP address, then that is
> probably best done by sendmail with FEATURE(`enhdnsbl'...) and applying
> /var/dcc/libexec/hackmc to the mc/cf file so that dccm will know about
> sendmail's DNSBL hit.

I prefer to do it through DCC because I need the native DCC
whitelisting facility to work for messages rejected by the RBL.
I can also add more DCC checking later, that way.

> > No, but I did run `truss' on some of the DNS helper processes.  There
> > were many lines like this:
> >
> > poll(0xFFBFEAD8, 2, 60000)      (sleeping...)
> > poll(0xFFBFEAD8, 2, 60000)                      = 1
> > recvfrom(37, 0xFFBFEEA8, 1936, 0, 0xFFBFECF8, 0xFFBFECF4) Err#11 EAGAIN
> 
> I suspect that and all other idle helpers were awakened by a request from
> the main `dccm` process for a DNS resolution.  Some other helper won
> the race to recvfrom(), so this one went back to sleep.
> 
> Perhaps I should switch to a blocking socket and use a timer to reap
> excess helpers.  That waking up to lose the race and go back to sleep
> is wasteful.  The next version of dccm will reap excess helpers much
> more aggressively, which would also help.

Really!  Does the parent broadcast to all of the children and then
deal with the one that responds first?  That model won't scale very
well.  Surely the parent must know which children are busy and which
are idle.  Can't the parent just pick the first idle one?  They should
all respond equally quickly.

My system call trace is puzzling.  When a helper is in its poll/recvfrom
loop, poll() claims that one file descriptor is ready for reading, but
recvfrom() says there's nothing there.  I don't know how that's even
possible.  What could cause that behavior?

When a helper actually receives a message, the trace begins like this:

poll(0xFFBFEAD8, 2, 60000)                      = 1
recvfrom(37, "BEEFDEAD\0\01B C E98 5 K".., 1936, 0, 0xFFBFECF8, 0xFFBFECF4) = 324
door_info(4, 0xFFBFC0B8)                        = 0
door_call(4, 0xFFBFC0A0)                        = 0
open("/etc/inet/ipnodes", O_RDONLY|O_LARGEFILE) = 5

It ends like this:

open("/etc/hosts", O_RDONLY|O_LARGEFILE)        = 5
fcntl(5, F_DUPFD, 0x00000100)                   = 256
close(5)                                        = 0
fcntl(256, F_SETFD, 0x00000001)                 = 0
read(256, " #\n #   I n t e r n e t".., 1024)   = 323
read(256, 0x00117808, 1024)                     = 0
close(256)                                      = 0
sendto(37, "DEADBEEF\0\01B C\0\0\0\0".., 272, 0, 0xFFBFECF8, 16) = 272

> > poll() behavior.  The poll() system call is used internally by the
> > select() function.  File descriptor 37 is a UDP socket which, I
> 
> In some circles in decades past, a selling point of poll() over select()
> was that only one of the waiting processes needed to be awakened,
> avoiding that wasteful rush.

Is that the `thundering herd' problem?  I believe that it only affected
CPU scheduling of those processes within the kernel: the extra processes
would be awakened and put back to sleep without ever returning to user
space, so no additional system calls would be executed.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


