Vernon Schryver
vjs@calcite.rhyolite.com
Fri, 4 Jan 2002 08:55:49 -0700 (MST)
> From: me <lar@trib.com>

> I have an inbound sendmail-dccm machine handling about 6000 messages/hour

It would be swell if the bulky checksums among those 6000 could be exchanged with other DCC users. The inter-server flooding mechanism sends only the reports of checksums that look like spam, since by default `dccd -t` sets the flooding threshold to 20, and only for the body checksums. Since almost all legitimate mail is sent to far fewer than 20 targets, that radically reduces the cost of inter-server flooding. I'd be quite happy to help set up an exchange of checksums.

> ...
> My problem is that as long as the CommuniGatePro machine isn't sending
> outbound email to the outbound machine everything works fine. (very low
> loads) When I activate the outbound messages from CommuniGate the outbound
> logging services quickly shows the following message:
>
> Jan 3 19:14:59 mailhost0 dccm[1179]: clock reads impossible -12171512
> Jan 3 19:16:05 mailhost0 dccm[1229]: clock reads impossible -10038471
> Jan 3 19:16:16 mailhost0 dccm[1233]: clock reads impossible -1548276
> Jan 3 19:16:20 mailhost0 dccm[1247]: clock reads impossible -10274395
> Jan 3 19:16:48 mailhost0 dccm[1270]: clock reads impossible -5320351
> Jan 3 19:16:58 mailhost0 dccm[1291]: clock reads impossible -10567373
> Jan 3 19:18:03 mailhost0 dccm[1348]: clock reads impossible -3456553

Each of those messages says that the DCC client library thinks time has jumped backwards by 12, 10, 1.5, etc. seconds, or forward by more than an hour, since it started asking a DCC server about a set of checksums. Such time jumps would mess up the RTT measurements. Since the syslog timestamps don't look crazy, it seems unlikely that time is really bouncing around.

If the system running dccm is SunOS or AIX, it is conceivable that a bug I fixed in 1.0.38 is the cause: I had not been building some of the DCC client library to be thread-safe. It looks like dcc.trib.com is running 1.0.37.
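The sanity check behind that log message can be sketched roughly as follows. This is a hypothetical Python illustration, not DCC's code: the names `parse_log_line`, `classify_elapsed`, and `LOG_RE` are invented here, and the assumption that the logged number is an elapsed time in microseconds is inferred only from the figures above (-12171512 is about 12 seconds), not from the DCC source.

```python
import re

# Assumption: the number dccm logs is an elapsed time in microseconds.
# This is inferred from the figures in the message (-12171512 is about
# 12 seconds), not taken from the DCC source.
US_PER_SEC = 1_000_000
# A forward jump of more than an hour is also treated as impossible.
MAX_FORWARD_US = 3600 * US_PER_SEC

# Hypothetical pattern for the syslog lines quoted above.
LOG_RE = re.compile(r"dccm\[(\d+)\]: clock reads impossible (-?\d+)")


def parse_log_line(line: str):
    """Return (pid, elapsed_us) for a matching log line, else None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def classify_elapsed(elapsed_us: int) -> str:
    """Describe an elapsed time measured since a request was started."""
    if elapsed_us < 0:
        return f"backwards by {-elapsed_us / US_PER_SEC:.1f} s"
    if elapsed_us > MAX_FORWARD_US:
        return f"forward by {elapsed_us / US_PER_SEC:.1f} s"
    return "ok"


if __name__ == "__main__":
    sample = [
        "Jan 3 19:14:59 mailhost0 dccm[1179]: clock reads impossible -12171512",
        "Jan 3 19:16:16 mailhost0 dccm[1233]: clock reads impossible -1548276",
    ]
    for entry in sample:
        pid, elapsed = parse_log_line(entry)
        print(pid, classify_elapsed(elapsed))
    # prints:
    # 1179 backwards by 12.2 s
    # 1233 backwards by 1.5 s
```

Both directions are rejected because either kind of jump would corrupt the RTT measurement for that request.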
I've never seen that impossible clock message actually generated. I've tweaked it to be a little less obscure in 1.0.42, which I just released. 1.0.42 also includes a fix for a dccm crash seen on SunOS.

> cdcc info reports:
> ...
> dcc.trib.com,- 32XXX XXXXXXX
> # * 63.229.150.17,6277 DCC.TRIB.COM TRIB server-ID XXXXX
> # 100% of 32 requests ok 70.75 ms RTT 6 ms queue wait

70 ms is a very long RTT under the circumstances as I understand them. Either dcc.trib.com is far more distant from the client than seems likely, the client or the server is awfully busy, or something is broken.

> (numbers and password replaced with 'X's)

When cdcc is not run as root or as a UID that can normally read the /var/dcc/map file, it does not blab the passwords.

> The process table shows dozens of dccm processes. The number of outgoing
> messages should have been a few hundred/hour, nothing compared to the other
> machine.

Are there really dozens of dccm processes running? What kinds of operating systems are involved? Systems without kernel thread support (and some with it) show a single process for all of the threads that make up a dccm process. If there are dozens of separate dccm processes, why?

Vernon Schryver
vjs@rhyolite.com