DNSBL helper not answering

Vernon Schryver vjs@calcite.rhyolite.com
Fri Nov 24 18:26:45 UTC 2006


> From: Daniel Gehriger 

> >     LIB=-lresolv ./configure ...

> No, this doesn't change anything (and I did rm config.cache). But 
> manually editing configure and adding -lresolv (target system "Linux") 
> to LIBS yields:

> > #undef HAVE__RES
> > #undef HAVE_RES_INIT
> > #define HAVE_RES_QUERY 1
> > #define HAVE_DN_EXPAND 1
>
> To get it to compile, I also had to link against the pthread library.

I do not understand what you changed.  Was what you did equivalent to this?
    LIB="-lresolv -lpthread" ./configure


> Maybe you're interested in the symbols contained in libresolv.so:
>
> > vps183:~ # nm /lib/libresolv.so.2  | grep -i -E '(res_.?init|res_query|dn_expand)'

What about _res?
The lack of res_init() and the _res structure must turn off dnsbl_res_init()
in dcclib/dnsbl.c.  They are used to shorten the resolver library's
timeout to the dccifd -B values.  See for example
http://www.penguin-soft.com/penguin/man/3/resolver.html


> > If so, I'll try to figure out how to make ./configure add -lresolv for
> > Linux systems that have that library.
>
> case "$TARGET_SYS" in
>      Linux)
>          PTHREAD_LDFLAGS="$PTHREAD_LDFLAGS -pthread"
>          LIBS="$LIBS -lresolv"
>          ;;

I'm vaguely aware of those lines.  The figuring out I meant included
probing libresolv.a to see that it exists and contains the right stuff
on the target Linux system.  I would also have to discover whether it
contains bad things.  There are hints that the Linux libresolv.a differs
from the resolver functions in libc.a by being thread-aware.  Dragging
threading into unthreaded DCC programs such as dccd, dblist, and dccproc
would be unacceptable.


> > Separately, does grossly inflating the timeouts seem to make the
> > problem go away?
> >   -Bset:msg-secs=100  -Bset:URL-secs=91
> > (I vaguely recall something about very old default resolver timeouts
> > of 90 seconds.)
>
> No, I already tried that. It's not a timeout problem - the DNSBL helper 
> just won't start, not even after 90 seconds. 

The timeouts have nothing to do with starting DNSBL helpers, but with
waiting for them to finish.
I still doubt that the problem is simply a matter of DNSBL helpers
not starting.  I've tried all of the error-bailouts in the path to the
fork() and exec() of the helpers.  I still suspect that
  - when the first message arrives, dccifd
      1. checks the count of idle helpers and starts one if necessary,
      2. decreases the count of idle helpers,
      3. sends a message to the helper asking about names or IP addresses,
      4. after 10 or 20 seconds and no answer from the helper, gives
          up and increases the count of idle helpers
  - for the next message, dccifd
      skips #1 since the number of idle helpers is >0, but does #2-#4.
      Since the helper process is really still stuck in gethostby*(),
      nothing useful happens.
   (it's a little more complicated, because 2 helpers usually get started
   early to minimize some waiting)
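
The bookkeeping I suspect can be sketched as a toy model.  This is only
an illustration of the sequence above, not the dccifd code; the class
HelperPool, its fields, and the resolver_hangs flag are names invented
for this sketch, and it models a single helper rather than the two that
usually get started early.

```python
class HelperPool:
    """Toy model of the suspected idle-helper accounting (hypothetical)."""

    def __init__(self):
        self.idle = 0      # helpers the parent believes are idle
        self.started = 0   # helper processes actually started
        self.stuck = 0     # helpers still blocked inside the resolver

    def handle_message(self, resolver_hangs):
        # 1. start a helper only if none is believed to be idle
        if self.idle == 0:
            self.started += 1
            self.idle += 1
        # 2. claim a helper
        self.idle -= 1
        # 3. send the request; a helper that is still stuck cannot answer
        healthy_helper = self.stuck < self.started
        answered = healthy_helper and not resolver_hangs
        if healthy_helper and resolver_hangs:
            self.stuck += 1    # this helper is now blocked in the resolver
        # 4. on timeout (or answer) count the helper as idle again
        self.idle += 1
        return answered


if __name__ == "__main__":
    pool = HelperPool()
    print(pool.handle_message(resolver_hangs=True))   # first message
    print(pool.handle_message(resolver_hangs=False))  # next message
    print(pool.started, pool.idle, pool.stuck)
```

In this model the second message gets no answer even though the
resolver would respond by then, because the only helper is counted as
idle while it is still stuck, so no new helper is started.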

>                                              Besides, I also had to 
> update the corresponding timeouts in postfix 

What do Postfix installations do about DNSBL lookups within Postfix that
take more than a few seconds?  How do they stop the resolver library from
stalling?

Does your resolver library allow lines like this in /etc/resolv.conf ?
    options timeout:2 attempts:1
They would be compatible with the default dccifd -B timeout values,
but too short for other applications.
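
A rough way to see what such a line buys is to multiply it out.  The
sketch below assumes a deliberately simplified model in which the
resolver waits the full timeout for every attempt on every name server;
real resolver libraries may back off or divide the timeout differently,
so treat the result as an approximation.  The function names and the
count of two name servers are made up for the illustration.

```python
def parse_resolver_options(line):
    """Parse an 'options ...' line from resolv.conf into a dict."""
    fields = line.split()
    if not fields or fields[0] != "options":
        raise ValueError("not an options line: %r" % line)
    opts = {}
    for field in fields[1:]:
        name, _, value = field.partition(":")
        # options without a value (such as 'rotate') are stored as True
        opts[name] = int(value) if value else True
    return opts


def approx_worst_case_stall(opts, nameservers):
    """Simplified bound: timeout * attempts * number of name servers."""
    return opts["timeout"] * opts["attempts"] * nameservers


if __name__ == "__main__":
    opts = parse_resolver_options("options timeout:2 attempts:1")
    # assumed: two 'nameserver' lines in /etc/resolv.conf
    print(approx_worst_case_stall(opts, nameservers=2))
```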


>                                                and the sending MX may 
> also have its own timeout below 90 seconds.

Many SMTP clients have timeouts of less than 90 seconds for the DATA
command, but they are spammers.  A slow SMTP server is by itself a
good spam filter.  Section 4.5.3.2 of RFC 2821 says

   Based on extensive experience with busy mail-relay hosts, the minimum
   per-command timeout values SHOULD be as follows:
    ...
   DATA Termination: 10 minutes.
      This is while awaiting the "250 OK" reply.  When the receiver gets
      the final period terminating the message data, it typically
      performs processing to deliver the message to a user mailbox.  A
      spurious timeout at this point would be very wasteful and would
      typically result in delivery of multiple copies of the message,
      since it has been successfully sent and the server has accepted
      responsibility for delivery.  See section 6.1 for additional
      discussion.


Vernon Schryver    vjs@rhyolite.com


