Problem on dcc 1.3.30 - Continue Not Asking DCC...

Breno Moiana breno@haxent.com.br
Thu Mar 9 18:37:59 UTC 2006


Hello, Vernon.  Thanks for the quick reply!

About your considerations:



Vernon Schryver wrote:

>>From: Breno Moiana 
>>    
>>
>
>  
>
>>We have a DCC server set up on an email provider, handling around 3 
>>million email messages a day.
>>    
>>
>
>That volume could easily justify a second DCC server.  The DCC client
>code prefers the fastest working known DCC server.  When the currently
>chosen server stops working, it tries another.
>  
>
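
As a rough illustration of the selection rule described above (prefer the fastest working server, move to another when it stops answering), here is a minimal Python sketch. It is not the DCC client's actual code: the Server class, the pick_server function, and the second address are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical record of what a client knows about one server."""
    address: str
    rtt_ms: float   # most recent measured round-trip time
    working: bool   # set to False once requests to it keep failing


def pick_server(servers: List[Server]) -> Optional[Server]:
    """Return the working server with the lowest RTT, or None if none work."""
    candidates = [s for s in servers if s.working]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.rtt_ms)


servers = [
    Server("127.0.0.1", rtt_ms=50.0, working=True),
    Server("192.0.2.10", rtt_ms=80.0, working=True),  # made-up second server
]
first = pick_server(servers)    # the faster local server
first.working = False           # simulate it going silent
second = pick_server(servers)   # the client falls back to the other one
```

With only one server in the list, the fallback step has nothing to return, which is the situation a second server is meant to avoid.
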
We have thought about it, and now that you mention it as a possible 
solution, we will look into it carefully.
I know this might be a stupid question, but how can I verify the need 
for a secondary server? I mean, the CPU is constantly idle, and nearly 
all my memory is being used for cache... where is the bottleneck? Would 
a second server help me even if I have a lot of unused hardware on this 
server already?

One option I can think of is to install VMware on this machine and run 
two servers on this hardware. Would this work?


>>Without any apparent reason, something happens to DCC that makes it stop 
>>responding. Here is the log from the beginning of the problem:
>>
>>: Mar  9 09:29:38 dcc dccifd[4782]: no DCC answer from 127.0.0.1,6277 
>>after 18264 ms
>>: Mar  9 09:29:38 dcc dccifd[4782]: continue not asking DCC 64 seconds 
>>after failure
>>    
>>
>
>The "continue not asking" messages mean that dccifd, dccproc, or dccm
>has seen consecutive failures while trying to talk to the DCC server
>and so is passing all mail.  In many situations, it is better to fail
>by passing all mail than to block all mail.
>  
>
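
The fail-open behaviour described above can be pictured with a small Python sketch. This is only an assumption about the general shape of such a mechanism, not how dccifd, dccproc, or dccm implement it: the FailOpenGate class, the failure threshold, and the pause length are all made-up for illustration.

```python
import time


class FailOpenGate:
    """Hypothetical fail-open gate: after repeated failures, stop asking
    the server for a while and let mail through unchecked."""

    def __init__(self, max_failures=3, holdoff_s=120.0, clock=time.monotonic):
        self.max_failures = max_failures  # assumed threshold, not DCC's value
        self.holdoff_s = holdoff_s        # assumed pause, not DCC's value
        self.clock = clock
        self.failures = 0
        self.skip_until = 0.0

    def should_ask(self):
        """True if the server should be queried for this message."""
        return self.clock() >= self.skip_until

    def record(self, ok):
        """Record the outcome of one query to the server."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many consecutive failures: pass mail without asking.
            self.skip_until = self.clock() + self.holdoff_s


now = [0.0]                           # fake clock so the example is repeatable
gate = FailOpenGate(clock=lambda: now[0])
for _ in range(3):
    gate.record(ok=False)             # three consecutive timeouts
asks_during_pause = gate.should_ask()
now[0] = 121.0                        # the pause has elapsed
asks_after_pause = gate.should_ask()
```

While should_ask() returns False, a filter built this way would accept mail without a checksum lookup, which matches the "pass all mail" trade-off described above.
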
I completely agree. That's why I have been falling into RBLs. I think 
that getting into the occasional spamcop list is better than not 
delivering mail. Besides, we do have other filters in place, so most of 
our spam is still filtered out.

>All UNIX flavors I've looked closely at for dccd performance deal poorly
>with large mmap() files.  None of them seem to properly page or
>swap-to-file as they should for mmap() files.  Solaris is not good but
>least bad.  Linux is worst.  I've watched Linux grind to a halt as it
>apparently slops the entire dccd database from swap space on the disk
>to the filesystem, also on the disk.  FreeBSD is between the extremes.
>It sometimes decides to push the entire database from RAM to the file
>in a single effort.  When you're talking about GBytes, the rest of the
>system gets very slow or even stops for tens of seconds.
>
>Dccd has lots of code that periodically tries to encourage the operating
>system to flush parts of the database to the file.  I've never found a
>combination that really works on any UNIX flavor.  Msync() generally
>seems to do nothing.  Madvise() seems to be useless.  Fsync() after
>every operation would probably prevent the hiccups, but would make every
>operation take 10s instead of fractions of milliseconds.
>  
>
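
For readers unfamiliar with the calls mentioned above, the following Python sketch shows what "encouraging the operating system to flush parts of the file" looks like in practice. It is not dccd's code. The flush_in_chunks helper and the chunk size are hypothetical; mmap.flush() is the standard-library call that maps to msync() on UNIX. Whether the kernel then behaves well on a multi-GByte mapping is exactly the open question in the quoted text, and this sketch does not answer it.

```python
import mmap
import os
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # flush offsets must be aligned to this


def flush_in_chunks(mm, chunk_pages=256):
    """Flush the mapping back to its file one slice at a time (msync on
    UNIX) instead of leaving the whole file to be pushed out at once."""
    chunk = PAGE * chunk_pages
    flushed = 0
    for offset in range(0, len(mm), chunk):
        size = min(chunk, len(mm) - offset)
        mm.flush(offset, size)
        flushed += size
    return flushed


# A four-page temporary file stands in for a large database file.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, PAGE * 4)
with mmap.mmap(fd, PAGE * 4) as mm:
    mm[0:5] = b"hello"                       # dirty the first page
    total = flush_in_chunks(mm, chunk_pages=1)
os.close(fd)
with open(path, "rb") as f:
    head = f.read(5)                         # read back through the file
os.remove(path)
```
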
I haven't experienced any noticeable system performance issues on this 
machine so far, and I have been very focused on it for the last two 
weeks.  The database size is well under 2 GB; right now it is 1.35 GB. 
I don't know what other information I could add to help the diagnosis 
on this point.

>>Please notice that the RTT to the server remains low all the time, at 
>>around 50ms.
>>    
>>
>
>50 ms is a fairly large RTT for a local server.
>  
>
Right now, it is working, and cdcc info gives me:

---cut
127.0.0.1,-                 RTT-1000 ms  anon
# *127.0.0.1,-                                               OiComBR ID 1004
#     100% of 32 requests ok   51.57-1000 ms RTT        50 ms queue wait
---/cut

>>Sometimes, but not always, the errors stop when I manually run the 
>>cron-dccd script:
>>
>>Mar  8 17:54:22 dcc dccd[4748]: 1.3.30 database /var/dcc/dcc_db 
>>reopened with 2016 MByte window
>>    
>>
>
>Could the database be growing larger than 2 GByte, and then Linux going
>into its crazy mode of swapping the mmap() dcc_db and dcc_db.hash files
>to swap space?  I ask because dbclean run by the cron script will 
>shrink the file.
>  
>
I don't think so... right now, the database is at 1.35 GB, and it has 
not grown, at least not in the last half hour. I ran the cron script a 
couple of hours ago; I am not sure whether that is what lets it work 
without the file size increasing, though.

>>Any help will be greatly appreciated, as we are falling into RBLs every 
>>other day, due to the occasional lack of DCC service (we allow email to 
>>pass when the DCC doesn't respond).
>>    
>>
>
>If it is better for the DCC client to fail by blocking mail, then
>you could add -x to DCCM_ARGS or DCCIFD_ARGS in /var/dcc/dcc_conf.
>That has two effects.  It turns off the "continue not asking" mechanism
>so that the DCC client asks every time.  Second, it causes dccm or
>dccifd (when dccifd is in proxy mode such as a postfix before-queue
>filter) to tell the local MTA to give the distant client MTA or mail
>sender a 4yz try-again failure.
>
>Perhaps the best thing to do is to run 2 local DCC servers, each
>flooding the other.  Each should run the cron job (and so dbclean)
>at different times, and perhaps more than once per day.  Each 
>should be known in /var/dcc/map files on DCC client systems.
>
>
>Vernon Schryver    vjs@rhyolite.com
>_______________________________________________
>DCC mailing list      DCC@rhyolite.com
>http://www.rhyolite.com/mailman/listinfo/dcc
>
Well, we still think that letting email pass when it fails is the lesser 
of evils.

About the "continue not asking" mechanism, I noticed that sometimes the 
system just comes out of its idleness and returns to a responsive 
state without any intervention. Another thing is that even when I 
stop/start the service, it keeps counting from where it was. Can I reset 
the counter? Is there some command to tell the server: "Hey, try it now, 
let's see if what I did worked for you"?

I am not sure about the second server. Granted, redundancy is always 
welcome, and it would be nice to have it on DCC as well. However, we 
have already had 5 million emails a day running here without problems. 
The server also doesn't necessarily stop responding at peak times, which 
is what I would expect if the dcc process had a high-load problem, and 
most of the server hardware is not being used.
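
One way to check whether the failures really are unrelated to peak times is to count the "no DCC answer" log lines per hour. The Python sketch below assumes the syslog line format shown in the log excerpt earlier in this message; the failures_by_hour function and the regular expression are illustrative only, and the log lines would have to come from whatever file syslog writes dccifd messages to on the machine in question.

```python
import re
from collections import Counter

# Pattern written against the log lines quoted earlier in this thread.
NO_ANSWER = re.compile(
    r"(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"dcc\w*\[\d+\]:\s+no DCC answer from \S+\s+after\s+(?P<ms>\d+) ms"
)


def failures_by_hour(lines):
    """Count 'no DCC answer' log lines per (month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = NO_ANSWER.search(line)
        if m:
            counts[(m["mon"], int(m["day"]), int(m["hour"]))] += 1
    return counts


sample = [
    "Mar  9 09:29:38 dcc dccifd[4782]: no DCC answer from 127.0.0.1,6277 "
    "after 18264 ms",
    "Mar  9 09:29:38 dcc dccifd[4782]: continue not asking DCC 64 seconds "
    "after failure",
]
counts = failures_by_hour(sample)
```

Comparing these hourly counts with the hourly message volume would show whether the timeouts cluster around busy periods or around something else, such as the times the cron job runs.
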

We are considering a second server, but I am not sure if that will solve 
the problem, or only hide its effect.

Thanks once more for the attention!

Best Regards,

Breno Moiana.
===============
Haxent Consulting



