DCC chewing up cpu time

Vernon Schryver vjs@calcite.rhyolite.com
Sat Oct 13 07:51:57 UTC 2007

> From: John L 

> My dcc daemon has of late been close to compute bound, currently using 
> about 80% of my 3GHz server.  It's FreeBSD 6.2, 1GB process size on a 2GB 
> machine.  Recently upgraded from 1.3.59 to 1.3.64 which made no 
> difference.

I've noticed excessive use of CPU cycles on FreeBSD 6.2 when the database
gets bigger than physical memory, but that should not apply in this case.
The window on a 2 GByte system (announced by dccd when it opens the
database at start and after dbclean) should be more than 860 MByte.

The low service time of "0 ms delay" says dccd doesn't think it is slow.

A 2 GByte machine implies a database limit of about 1 GByte.  That is
a difficult limit to stick to when DCC servers see about 1.3 GByte/day
added by flooding to their database.

I'm trying to figure out a way to have dccd run dbclean when the database
gets large and dccd gets slow, but not when there are no backup DCC
servers available and not when the system is otherwise heavily loaded.
Experiments on a 2 GByte Solaris system show that a midday dbclean can
take less than 30 minutes and make a big difference in how dccd keeps up.
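
The conditions above can be written down as a single predicate. This is
only a sketch of the idea, not anything dccd does today: the ServerState
fields, the thresholds, and the choice to treat "large" as "bigger than
the window" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical snapshot of what dccd would need to know."""
    db_bytes: int         # current database size
    window_bytes: int     # window announced when the database is opened
    delay_ms: float       # service delay as reported by cdcc status
    backup_servers: int   # other DCC servers clients could use meanwhile
    load_average: float   # load from everything else on the machine


def should_run_dbclean(s, max_delay_ms=100.0, max_load=2.0):
    """Return True only if all four conditions from the text hold.

    max_delay_ms and max_load are made-up thresholds.
    """
    too_big = s.db_bytes > s.window_bytes      # "database gets large"
    too_slow = s.delay_ms > max_delay_ms       # "dccd gets slow"
    has_backup = s.backup_servers > 0          # not without backup servers
    idle_enough = s.load_average < max_load    # not when heavily loaded
    return too_big and too_slow and has_backup and idle_enough
```

Requiring all four conditions at once keeps a cleaning run from starting
at the moment when clients would have nowhere else to go.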

> Any suggestions?  Here's a pair of cdcc status outputs a minute apart.
> The enormous numbers of dup and white entries are rather odd.  The
> number of locally generated entries is not large, maybe 20 a minute.

>      0 ms delay  46 NOPs  44 ADMN  0 query  4 clients in 24 hours

>      flood on   4 streams   4 out active 3 in 956703953 total flooded in
> 5587832 accepted   0 stale 951116078 dup  1212137513 white    0 delete
> 5592334 reports added between Oct 12 12:37:30.959929 EDT and Oct 13 01:55:44

>      flood on   4 streams   4 out active 3 in 970047582 total flooded in
> 5590419 accepted   0 stale 964457120 dup  1213350335 white    0 delete
> 5594939 reports added between Oct 12 12:37:30.959929 EDT and Oct 13 01:56:44

The white and dup values are crazy, but the total-flooded-in is also
too high compared to this server's peers as seen through the server
status web page.
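
To put numbers on that, the two status outputs quoted above can be
subtracted from each other. The figures below are copied from the quoted
output; the dictionary keys and the helper name are just labels chosen
for this sketch.

```python
# Flood counters from the two quoted cdcc status outputs, taken at
# Oct 13 01:55:44 and Oct 13 01:56:44 (one minute apart).
first = {
    "total flooded in": 956703953,
    "accepted": 5587832,
    "stale": 0,
    "dup": 951116078,
    "white": 1212137513,
}
second = {
    "total flooded in": 970047582,
    "accepted": 5590419,
    "stale": 0,
    "dup": 964457120,
    "white": 1213350335,
}


def per_minute_deltas(a, b):
    """Return how much each counter grew between the two samples."""
    return {name: b[name] - a[name] for name in a}


deltas = per_minute_deltas(first, second)
for name, value in deltas.items():
    print(f"{name:>17}: {value:>10} per minute")
```

The differences come to 13343629 flooded in, 2587 accepted, 13341042 dup,
and 1212822 white in that one minute, so nearly everything arriving is
being counted as a duplicate.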

Judging from the source, those "white" counts should be generating
a bunch of complaints that would be visible in the system log if
detailed flood tracing were turned on with
    cdcc "id 1107; trace flood2 on"
That can generate a lot of noise even with the rate limiting,
so it might be wise to turn it off before long.

The syslog lines should include the name and IP address of the peer
and the timestamp for the offending entry.
The entry can be found in the database of the sending peer with
`dblist -T timestamp`

Except for the crazy bad "white" counts, I would guess that this 
server had been down and is only now catching up with incoming floods.

The command `cdcc "flood FFWD in ID"` asks the peer with that ID
to fast-forward its flood, or skip everything to the current time.
Applying this to all but 1 or 2 of a server's peers can help get a
bogged-down system caught up.
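
As a rough sketch of "all but 1 or 2", the following builds the command
lines without running them. The peer IDs are hypothetical placeholders,
and the helper name is made up. Whether an `id ...;` prefix like the one
in the trace command is also needed is not covered here, so treat the
output as something to review before use.

```python
def ffwd_commands(peer_ids, keep):
    """Build one 'flood FFWD in ID' command per peer not listed in keep."""
    keep = set(keep)
    return [f'cdcc "flood FFWD in {pid}"' for pid in peer_ids if pid not in keep]


# Hypothetical server-IDs of this server's flooding peers.
peers = [1001, 1002, 1003, 1004]

# Leave one peer flooding normally and fast-forward the rest.
for cmd in ffwd_commands(peers, keep=[1001]):
    print(cmd)
```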

Vernon Schryver    vjs@rhyolite.com
