dccifd failover under load

Vernon Schryver vjs@calcite.rhyolite.com
Tue May 2 21:51:41 UTC 2006

> From: Ian Brewer 

> currently being used by the client, the second server is immediately 
> picked (according to cdcc info), but the fail_more() code also fires and 
> gives me 64 seconds worth of "Continue not asking" messages. Shouldn't 
> the second server be available for use straight away?

The second server should be available immediately.

> It's almost as if any queued requests are counted as failed and are
> causing the fail_more code to go off, even though a new working server
> has been found.

If somehow a bunch of requests do in fact fail, then something like
that should happen.

> I have included a modified dccif-test program that can be used to show 
> the problem. I run the program, then "watch -n 1 'cdcc info'" on another 
> terminal while shutting the in-use server down.

I doubt that `watch` on a FreeBSD system does what you intended, so I
used `repeat 100 sh foo`, where foo was a shell script consisting of
`cdcc info; sleep 1`
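
For anyone without `watch` or the csh `repeat` builtin, roughly the same polling can be done in plain sh. This is only a minimal sketch: the function name `poll` and its argument order are made up for illustration and are not part of the DCC distribution.

```shell
# poll COUNT INTERVAL COMMAND [ARGS...]
# Hypothetical helper: run COMMAND COUNT times, sleeping INTERVAL
# seconds after each run. Not part of DCC.
poll() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                # run the command exactly as given
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Something like `poll 100 1 cdcc info` in one terminal should then approximate `repeat 100 sh foo` while the in-use server is stopped from another.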

I see no problems.  The count from your program climbs steadily and pauses
when I run `cdcc "id 101; stop"`.  Then it resumes, albeit at a slower
pace because the other servers are distant.

What version of the DCC code are you using?  The current version is either
1.3.31 or 2.3.31 depending on whether you are using the commercial version.
I've continued to work on the server switching machinery since the earliest

Exactly (other than passwords) what do you see from `cdcc info`?

Vernon Schryver    vjs@rhyolite.com
