Too many early morning pages

Gary Mills mills@cc.UManitoba.CA
Sat Feb 14 17:34:14 UTC 2004


I had another incident this morning where sendmail was rejecting
connections.  This time, I was able to collect some better information.
The culprit seems to be `dbclean'.  `cron-dccd' runs at 03:45, but the
problem happened later.  It was around 07:00 when I noticed it.
`sendmail' was logging these milter errors:

Feb 14 07:11:43 electra sm-mta[18080]: [ID 801593 mail.error] i1EDAhF9018080: Milter (dcc): timeout before data read
Feb 14 07:11:43 electra sm-mta[18080]: [ID 801593 mail.info] i1EDAhF9018080: Milter (dcc): to error state
Feb 14 07:11:43 electra sm-mta[18080]: [ID 801593 mail.error] i1EDAhF9018080: Milter (dcc): init failed to open
Feb 14 07:11:43 electra sm-mta[18080]: [ID 801593 mail.info] i1EDAhF9018080: Milter (dcc): to error state
Feb 14 07:11:43 electra sm-mta[18080]: [ID 801593 mail.info] i1EDAhF9018080: Milter: initialization failed, temp failing commands

It was also hitting its connection-rate and child-process limits:

Feb 14 07:16:52 electra sm-mta[13673]: [ID 702911 mail.info] deferring connections on daemon MTA: 50 per second
Feb 14 07:16:53 electra sm-mta[13673]: [ID 702911 mail.info] rejecting connections on daemon MTA: 600 children, max 600

`dbclean' ran twice, according to these logs:

Feb 14 06:06:14 electra dccd[23006]: [ID 968768 mail.notice] 837646 free hash entries among 16760832 total; starting `dbclean -DPq -i 1032`
Feb 14 06:06:24 electra dccd[10259]: [ID 287260 mail.notice] database cleaning begun
Feb 14 07:26:13 electra dccd[10259]: [ID 308827 mail.notice] 1.2.30 database /usr/local/dcc/dcc_db reopened with 2046 MByte window
Feb 14 07:30:13 electra dccd[10259]: [ID 287260 mail.notice] database cleaning begun
Feb 14 07:40:09 electra dccd[10259]: [ID 308827 mail.notice] 1.2.30 database /usr/local/dcc/dcc_db reopened with 2046 MByte window

The first run took a very long time, about 80 minutes by these timestamps
(06:06:24 to 07:26:13) against about 10 minutes for the second, and seems
to have caused the problems.  The mail server is running DCC 1.2.30, not
compiled with
`--enable-big-db'.  It does have "DBCLEAN_ARGS='-F'" in dcc_conf.  I took
an `lsof' snapshot of `dccm' at 07:23.  It had 295 IDLE TCP connections,
but did not run out of file descriptors.
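
To put numbers on the two cleaning runs, here is a minimal sketch that
differences the timestamps in the dccd log lines quoted above.  It assumes
that "database cleaning begun" marks the start of a run and that the next
"reopened with" line marks its end; that pairing is my reading of the logs,
not something DCC documents here.  The function name and the `year'
parameter are made up for illustration (syslog lines carry no year).

```python
from datetime import datetime

# The four dccd lines quoted above, copied verbatim.
LOG = """\
Feb 14 06:06:24 electra dccd[10259]: [ID 287260 mail.notice] database cleaning begun
Feb 14 07:26:13 electra dccd[10259]: [ID 308827 mail.notice] 1.2.30 database /usr/local/dcc/dcc_db reopened with 2046 MByte window
Feb 14 07:30:13 electra dccd[10259]: [ID 287260 mail.notice] database cleaning begun
Feb 14 07:40:09 electra dccd[10259]: [ID 308827 mail.notice] 1.2.30 database /usr/local/dcc/dcc_db reopened with 2046 MByte window
"""


def cleaning_durations(log_text, year=2004):
    """Return (start, end, elapsed) for each begun/reopened pair of lines.

    Assumption: a run starts at "database cleaning begun" and ends at the
    next "reopened with" line from dccd.
    """
    runs = []
    started = None
    for line in log_text.splitlines():
        if " dccd[" not in line:
            continue
        # The first 15 characters of a syslog line are "Mon DD HH:MM:SS".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "database cleaning begun" in line:
            started = stamp
        elif "reopened with" in line and started is not None:
            runs.append((started, stamp, stamp - started))
            started = None
    return runs


if __name__ == "__main__":
    for start, end, elapsed in cleaning_durations(LOG):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}  {elapsed}")
```

On these four lines it reports 1:19:49 for the first run and 0:09:56 for
the second, so the 07:11 milter timeouts and the 07:16 connection limits
both fall inside the first run.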

There is a second DCC server that was running normally at the time.
Why didn't `dccm' just start using it while `dbclean' was running on
the first one?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


