DCC use of connect() and sendto() on FreeBSD

Vernon Schryver vjs@calcite.rhyolite.com
Thu Sep 23 04:24:48 UTC 2004


> From: Jamie Clark <jamie@zeroth.org>

> There's nothing sinister in the port. 

we don't agree.

>                                       It uses the 1.2.48 distribution 
> tarball from
> http://www.rhyolite.com/. 

That release is more than 4 months old.

>                           It applies no patches (in the patch(1) sense)

A "global regex" is not exactly an improvement on a `patch` style patch.


> however it does a global regex to replace the hardwired string literal
> '/usr/local' with ${PREFIX} to conform to package building requirements.

There is no such "hardwired string literal '/usr/local'" except for
people who don't use the ./configure script.
Many installations choose directories other than /usr/local.  
That I've not figured out how to account for individual tastes is
a major reason why I've never distributed ports/packages/whatevers.
I've forgotten which directory whoever is promulgating that sinister
BSD port chose instead of /usr/local, except that it seemed as unlikely
to meet universal approval as /usr/local or anything else.


> It also makes a required change to the PTHREAD_LDFLAGS to build
> across the BSDs. Nothing that changes the way the code should work.

That's not so great either, unless "build across the BSDs" means
something other than what one would guess or one cannot use the
./configure script.  The official DCC source with the ./configure script
builds on all major relatives of 4.3BSD that I've heard of.


> In particular the connect() manpage and this kernel code
> from netinet/udp_usrreq.c lead me to believe that an intervening
> operation is needed to disassociate the socket before
> reconnecting:

> In other words "if UDP socket is connected then a subsequent connect will
> return EISCONN".  I wrote test code to confirm this. An intervening
> connect() to invalid address does seem to disconnect the socket so that a
> later connect() will succeed.

Yes, and that is the purpose of the connect() at line 1500 of
dcclib/clnt_send.c in version 1.2.54.
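
For readers who have not seen the idiom, here is a minimal sketch of
dissolving a UDP association by connecting to an address whose family
is AF_UNSPEC, which is the sa_family = 0 visible in the truss output
quoted further down.  This is not the code in the DCC source, and the
name udp_dissolve() is invented for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper, not taken from the DCC source: ask the kernel
 * to dissolve the association of a connected UDP socket by connecting
 * it to an address whose family is AF_UNSPEC (0).
 *
 * Returns 0 if connect() reported success, otherwise the errno it set.
 * A nonzero result does not by itself mean the socket is still
 * connected: in the 4.9-RELEASE trace in this message the call fails
 * with EINVAL and the later sendto() calls nevertheless succeed.
 */
int
udp_dissolve(int fd)
{
    struct sockaddr_in sin;

    /* An all-zero address with only the family set to AF_UNSPEC. */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_UNSPEC;

    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
        return 0;
    return errno;
}
```

Because the return value is ambiguous, a caller that cares should check
the state of the socket afterward, for example with getpeername(),
instead of trusting the result of connect().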


> It seemed that the dcc client code was not consistently disassociating the
> socket before reconnect or sendto. You can see in my truss output.

That is not what I see in your truss output.  Instead I see the client
code trying to disassociate the socket before using sendto().


> I've been developing on FreeBSD since 1994 so I'm not exactly a newbie.
> I am not proclaiming to have solved this problem, nor suggesting that I
> have taken the time to fully understand the logic within clnt_send.c.

I read http://www.rhyolite.com/pipermail/dcc/2004/002388.html
as proclaiming a problem, a diagnosis, and a fix of changing the
#ifdef's or the #define.

> What I am saying is that I have observed the truss output (syscall trace)
> from a failing dccm and cdcc and I can see that connect() and sendto()
> are failing on already connected sockets. Something is wrong.

There is no question that something is wrong.  The questions
concern what the problem is and the appropriate fix.


> Here is the relevant portion of a truss output of "cdcc rtt" run
> outside of any jail, straight from the source tree of a clean
> build of dcc-dccd-1.2.54 downloaded from rhyolite.com yesterday:
>
> connect(0x4,{ AF_INET 194.109.153.82:6277 },16)  = 0 (0x0)
> sendto(0x4,0xbfbfda58,0x28,0x0,0x0,0x0)          = 40 (0x28)
> connect(0x4,{ sa_len = 16, sa_family = 0, sa_data = { 0x18, 0x85, 0xc2, 0x6d, 0x99, 0x52, 00, 00, 00, 00, 00, 00, 00, 00,,0) ERR#22 'Invalid argument'
> sendto(0x4,0xbfbfda58,0x28,0x0,0xbfbfc804,0x10)  ERR#56 'Socket is already connected'
> sendto(0x4,0xbfbfda58,0x28,0x0,0xbfbfc804,0x10)  ERR#56 'Socket is already connected'

Notice that the second connect() is to an invalid address,
just as you mentioned.


Here is some similar `truss` output on 4.9-RELEASE:

connect(0x4,{ AF_INET 142.27.70.214:6277 },16)   = 0 (0x0)
sendto(0x4,0xbfbfd598,0x28,0x0,0x0,0x0)          = 40 (0x28)
connect(0x4,{ sa_len = 16, sa_family = 0, sa_data = { 0x18, 0x85, 0x8e, 0x1b, 0x46, 0xd6, 00, 00, 00, 00, 00, 00, 00, 00,,0) ERR#22 'Invalid argument'
sendto(0x4,0xbfbfd598,0x28,0x0,0xbfbfc344,0x10)  = 40 (0x28)
sendto(0x4,0xbfbfd598,0x28,0x0,0xbfbfc3a4,0x10)  = 40 (0x28)
sendto(0x4,0xbfbfd598,0x28,0x0,0xbfbfc404,0x10)  = 40 (0x28)

Notice the lack of EISCONN from the later sendto()'s after the
disconnect or connect(invalid).


One of my pet peeves is the widely held notion that `truss`, `strace`,
or other system call tracing is of great value for diagnosing problems.
When that's all that's available, that's what you use.  However, a
system call trace is not a substitute for running the application in a
debugger.

The official DCC client code can be fetched and built with -g by running
`updatedcc -e DBGFLAGS=-g`.  Then you could try:
   % gdb cdcc
   b disconnect
   r -h /tmp
   host dcc.dcc-servers.net
   rtt

If and when the process stops in disconnect(), continue with the
gdb "finish" command.  Then use `netstat -a -p udp | grep 6277`,
fstat, lsof, or whatever to see if the cdcc process still has a UDP
socket bound to an IP address of a public DCC server after the connect()
system call with the invalid address.

When I do that on a 4.9-RELEASE system, a line from that `netstat | grep`
pipeline changes from the following at the start of disconnect():
    udp4     124      0  calcite.3683       avas.cnc.bc..6277  
to the following after disconnect() returns:
    udp4     124      0  *.3683             *.*                

Given your `truss` output, I'd be surprised if the breakpoint in
disconnect() were not hit.
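
If you would rather check the same thing from inside a small test
program of your own than with netstat, something like the following
sketch reports the foreign address of an AF_INET UDP socket in roughly
the form netstat prints it.  The function name udp_peer_string() is
made up for this example, and nothing like it is claimed to exist in
the DCC source.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: write the foreign address of an AF_INET socket
 * into buf as "a.b.c.d.port", or as "*.*" when the socket has no peer.
 * Any getpeername() failure is reported as "no peer".
 * Returns 1 if a peer was found, 0 otherwise.
 */
int
udp_peer_string(int fd, char *buf, size_t len)
{
    struct sockaddr_in peer;
    socklen_t plen = sizeof(peer);
    char ip[INET_ADDRSTRLEN];

    memset(&peer, 0, sizeof(peer));
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) != 0
        || peer.sin_family != AF_INET
        || inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip)) == NULL) {
        snprintf(buf, len, "*.*");
        return 0;
    }

    /* Address and port joined with a dot, as in the netstat lines. */
    snprintf(buf, len, "%s.%u", ip, (unsigned)ntohs(peer.sin_port));
    return 1;
}
```

Calling it before and after the connect() to the invalid address
should show the change from a real foreign address to "*.*" on a
system where the disconnect works.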


