802 copies of dccm running

Vernon Schryver vjs@calcite.rhyolite.com
Sun Dec 31 17:14:23 UTC 2006


> From: Gary Mills 

> Notice that the number of `dccm' processes now greatly exceeds the
> number of `sendmail' processes.  Most of those correspond to incoming
> SMTP connections.  I also watched the number of threads used by the
> main `dccm' process.  It was generally 50 or less.  `named' used seven
> threads throughout.

The number of `named` threads seems irrelevant.

> I stopped the test at this point because it would surely run away in
> time, or when the system got busy.  Why are all those `dccm' processes
> even needed?

I bet that the excess dccm threads and processes are waiting for
DNS requests to finish.  If I'm right, stopping sendmail for at
most 91 seconds would make all but 3 or 4 dccm child threads
disappear.



I do not believe there is a thread leak with `dccm -B`, because other
systems have been running `dccm -B` for more than a year.

One of them now runs SunOS 5.8 with sendmail 8.13.6 and has run
`dccm -B` for months without being restarted.  It receives only about
10% as much traffic as dcc1.cc.umanitoba.ca, but a thread leak should
still be evident on it.  At the moment it has been running for more
than a week.  `ps -L` finds 64 lightweight processes (LWPs) in the
main dccm process.  Gdb is confused and cannot list them, but it
reports that almost all of the thread IDs I try are defunct.

Each dccm DNS helper child process has 4 threads but does not use
threading, because the resolver library is not thread-safe.  So 3
of those threads are some sort of overhead.  This suggests that on
Solaris, `ps -L | grep dccm | wc` can report a number as large as 5
times the dccm -j limit.
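
The "5 times" figure can be read as simple arithmetic.  The sketch
below assumes a model that is NOT taken from the dccm source: the main
dccm process has up to one thread per -j job, and there are up to -j
DNS helper processes, each showing 4 LWPs.  The names dccm_lwp_bound,
dccm_lwps_suspicious and LWPS_PER_HELPER are hypothetical.

```c
/* Hypothetical model (an assumption for illustration only):
 *   main dccm process:   up to j_limit threads
 *   DNS helper children: up to j_limit processes, each showing
 *                        LWPS_PER_HELPER LWPs (1 used, the rest overhead)
 */
#define LWPS_PER_HELPER 4

/* Largest LWP count the model allows for a given -j limit. */
long dccm_lwp_bound(long j_limit)
{
    long main_lwps = j_limit;
    long helper_lwps = j_limit * LWPS_PER_HELPER;

    return main_lwps + helper_lwps;     /* 5 * j_limit */
}

/* Return 1 if an observed LWP count is above the model's bound,
 * which would be worth a closer look as a possible leak; else 0. */
int dccm_lwps_suspicious(long observed, long j_limit)
{
    return observed > dccm_lwp_bound(j_limit);
}
```

Under this model, a count at or below 5 times -j is expected and says
nothing about a leak; only a count that keeps growing past it would.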


Does your /etc/resolv.conf set the resolver library timeouts?
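
One way to see which timeouts the resolver library actually picked up
is to call res_init() and read the global _res state, the same
traditional, non-thread-safe interface that the HAVE__RES and
HAVE_RES_INIT checks look for.  This is a minimal sketch; the function
name resolver_timeouts is hypothetical, and some systems (Solaris,
macOS) may need -lresolv or similar libraries when linking.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose u_char etc. used by <resolv.h> */
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Load /etc/resolv.conf through the resolver library and report the
 * per-try timeout in seconds (retrans) and the number of tries (retry)
 * it will use.  Returns 0 on success, -1 if res_init() fails. */
int resolver_timeouts(int *retrans, int *retry)
{
    if (res_init() != 0)
        return -1;
    *retrans = _res.retrans;
    *retry = _res.retry;
    return 0;
}
```

If the reported values are the library defaults rather than what
/etc/resolv.conf asks for, the file's timeout settings are not being
honored by the library that dccm uses.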

Does /var/dcc/build/dcc/include/dcc_config.h say that ./configure
found the BIND resolver hooks, with lines like these?
    /* BIND resolver library */
    #define HAVE_RESOLV_H 1
    #define HAVE_ARPA_NAMESER_H 1
    #define HAVE__RES 1
    #define HAVE_RES_INIT 1
    #define HAVE_RES_QUERY 1
    #define HAVE_DN_EXPAND 1
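
A quick way to answer that is to search the header text for each of
those six lines.  The sketch below assumes autoconf's usual convention
of writing "#define NAME 1" for a successful check and a commented-out
"#undef NAME" for a failed one; missing_resolver_macros is a
hypothetical name, and it takes the header contents as a string.

```c
#include <stdio.h>
#include <string.h>

/* The resolver hooks listed above. */
static const char *const resolver_macros[] = {
    "HAVE_RESOLV_H", "HAVE_ARPA_NAMESER_H", "HAVE__RES",
    "HAVE_RES_INIT", "HAVE_RES_QUERY", "HAVE_DN_EXPAND",
};

/* Return how many resolver macros are not defined to 1 in config_text,
 * printing the name of each missing one. */
int missing_resolver_macros(const char *config_text)
{
    char pattern[64];
    int missing = 0;
    size_t i;

    for (i = 0; i < sizeof(resolver_macros) / sizeof(resolver_macros[0]); i++) {
        /* Match the whole "#define NAME 1" so one name cannot match
         * as a prefix of a longer one. */
        snprintf(pattern, sizeof(pattern), "#define %s 1", resolver_macros[i]);
        if (strstr(config_text, pattern) == NULL) {
            printf("missing: %s\n", resolver_macros[i]);
            missing++;
        }
    }
    return missing;
}
```

A plain `grep` of the header for the same names gives the same answer;
the point is that all six should be present.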

I assume you have not set any of the -B timeouts with -Bset:xxx.

Have you tried -Bset:debug=4 to see what is happening?  That will make
a lot of noise on a busy system, perhaps too much to determine anything.


Vernon Schryver    vjs@rhyolite.com



More information about the DCC mailing list