dccifd failover under load

Ian Brewer Ian_Brewer@vwiz.co.uk
Wed, 03 May 2006 09:23:57 +0100


This is a multi-part message in MIME format.
--------------090204060702060102010609
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Vernon,

I am using version 1.3.31 of the client and server code. The only thing 
I have tweaked is the DCCD_ARGS to "-R 4000,50,300,0.1" to make sure the 
servers accept the request rate.

I have attached the output of cdcc info while stopping "server1".

Some interesting times are:

08:48:11 *server1 100% of 32 requests ok    0.55 ms RTT
           server2  75% of 32 requests ok    0.63 ms RTT

08:48:12  server1 100% of  4 requests ok   10.65 ms RTT
          *server2 100% of 32 requests ok    0.95 ms RTT

...
08:48:17  server1 100% of  4 requests ok   10.65 ms RTT
          *server2 100% of 32 requests ok    0.54 ms RTT

08:48:18
# skipping asking DCC server 64 seconds more
           server1  20% of 20 requests ok 3969.25 ms RTT
          *server2 100% of 32 requests ok    1.07 ms RTT


/var/log/maillog on the client has the following entries:
May  3 08:48:18 client dccifd[28141]: no DCC answer from server1 after 
6062 ms
May  3 08:48:18 client dccifd[28141]: no DCC answer from server1 after 
6032 ms

It looks to me that there are queued requests to server1 even though we 
are using server2 happily. When the client times out these requests, the 
switching code get confused somehow and I get a 64 second timeout.

Do you think this is a linux only issue? If it would help, I could 
install a freeBSD client and retest.

Ian.

Vernon Schryver wrote:
>>From: Ian Brewer 
> 
> 
>>currently being used by the client, the second server is immediately 
>>picked (according to cdcc info), but the fail_more() code also fires and 
>>gives me 64 seconds worth of "Continue not asking" messages. Shouldn't 
>>the second server be available for use straight away?
> 
> 
> The second server should be available immediately.
> 
> 
>>Its almost as if any queued requests are counted as failed and causing 
>>the fail_more code to go off, even though a new working server has been 
>>found.
> 
> 
> If somehow a bunch of requests do in fact fail, then something like
> that should happen.
> 
> 
>>I have included a modified dccif-test program that can be used to show 
>>the problem. I run the program, then "watch -n 1 'cdcc info'" on another 
>>  terminal while shutting the in-use server down.
> 
> 
> I doubt that `watch` on a FreeBSD system does what you intended, so I
> used `repeat 100 sh foo`, where foo was a shell script consisting of
> `cdcc info; sleep 1`
> 
> I see no problems.  The count from your program spirals up and pauses when
> when I run `cdcc "id 101; stop`.  Then it resumes, albeit at a slower
> pace because the other servers are distant.
> 
> 
> What version of the DCC code are you using?  The current version is either
> 1.3.31 or 2.3.31 depending on whether you are using the commercial version.
> I've continued to work on the server switching machinery since the earliest
> releases.
> 
> Exactly (other than passwords) what do you see from `cdcc info`?
> 
> 
> Vernon Schryver    vjs@rhyolite.com
> _______________________________________________
> DCC mailing list      DCC@rhyolite.com
> http://www.rhyolite.com/mailman/listinfo/dcc
> 
> 

--------------090204060702060102010609
Content-Type: text/plain;
 name="cdccoutput"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="cdccoutput"

# 05/03/06 08:48:10 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 100.63 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
# *10.83.1.202,-                                                     ID S1_ID
#     100% of 32 requests ok    0.73 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
#  10.83.1.203,-                                                     ID S2_ID
#      75% of 32 requests ok    0.63 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:11 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 100.63 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
# *10.83.1.202,-                                                     ID S1_ID
#     100% of 32 requests ok    0.55 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
#  10.83.1.203,-                                                     ID S2_ID
#      75% of 32 requests ok    0.63 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:12 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    0.95 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:13 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    0.65 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:14 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    0.61 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:15 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    2.41 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:16 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.76 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:17 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#     100% of  4 requests ok   10.65 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    0.54 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:18 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 64 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:19 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 63 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:20 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 62 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:21 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 61 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:22 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 60 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:23 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 59 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:24 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 58 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:26 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 56 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:27 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 55 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:28 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 54 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:29 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 53 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:30 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 52 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:31 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 51 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:32 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 50 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:33 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 49 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:34 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 48 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:35 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 47 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:36 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 46 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================
# 05/03/06 08:48:37 BST  /var/dcc/map
# Re-resolve names after 10:46:52  Check RTTs after 09:01:52
# 8000.00 ms threshold, 101.48 ms average    2 total, 2 working servers
# skipping asking DCC server 45 seconds more
IPv6 off

server1,-                    <client_id> <client passwd>
#  10.83.1.202,-                                                     ID S1_ID
#      20% of 20 requests ok 3969.25 ms RTT              0 ms queue wait

server2,-                    <client_id> <client passwd>
# *10.83.1.203,-                                                     ID S2_ID
#     100% of 32 requests ok    1.07 ms RTT              0 ms queue wait

================================================================================

--------------090204060702060102010609--