OS and architecture migration for DCC

Gary Mills mills@cc.umanitoba.ca
Tue Jun 10 01:23:54 UTC 2008


I'm planning an upgrade of our e-mail server, which is also one of our
dccd and grey servers.  I'm doing this first on a test e-mail server
that has only isolated dccd and grey servers.  It gets a bit of spam,
but the databases are mostly empty.  The migration is from Solaris 9
to Solaris 10, from SPARC (big-endian) to x86 (little-endian), and
from UFS to ZFS.  Talk about a torture test!  I only expect the byte
order to be a problem.

I'm using the same version of DCC in both cases, but recompiled for
the Solaris 10 x86 server.  On that server, I started with a DCC
directory (/usr/local/dcc) that was a copy of the one from Solaris 9
SPARC.  I then reinstalled DCC so that the executables would all be
x86 ones.  The daemons logged some interesting errors, presumably due
to the byte order, but eventually seemed to run normally.  This was
for dccd:

Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db has page size 16128 incompatible with 15654912 in dcc_db.hash
Jun  9 16:03:02 setup01 dccd[1198]: [ID 702911 mail.error] dcc_db says it contains 1158682945536393216 bytes or more than the actual size of 8257536
Jun  9 16:03:02 setup01 dccd[1199]: [ID 702911 mail.notice] database initially broken; starting `/usr/local/dcc/libexec/dbclean -Pq4SRbad -i 9003`
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 839192 mail.notice] 1.3.86 repairing /usr/local/dcc/dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 702911 mail.error] explicit repair of dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 702911 mail.error] unexpected EOF in dcc_db at 0x7e0000 instead of 0x1014780000000000
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 394617 mail.notice] expired 19 records and 17 checksums, obsoleted 63 checksums in dcc_db
Jun  9 16:03:02 setup01 dbclean[1199]: [ID 582593 mail.notice] hashed 262205 records containing 393286 checksums, compressed 25 records
Jun  9 16:03:03 setup01 dbclean[1199]: [ID 838263 mail.notice] 6709240 hash entries total, 131619 or 1% used
Jun  9 16:03:03 setup01 dccd[1198]: [ID 702911 mail.notice] unrecognized hash_len=33554432 in /usr/local/dcc/dccd_clients
Jun  9 16:03:03 setup01 dccd[1198]: [ID 702911 mail.notice] 1.3.86 listening to port 6277  /usr/local/dcc  window=1911MB  real=33,553,660KB  max RSS=1920MB  DB max=2400MB

This was for grey:

Jun  9 16:08:38 setup01 dccd grey[1230]: [ID 702911 mail.error] grey_db is not a greylist database but must be
Jun  9 16:08:38 setup01 dccd grey[1232]: [ID 702911 mail.notice] database initially broken; starting `/usr/local/dcc/libexec/dbclean -Pq4SRbad -Gon -i 9003 -Gon`
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 839192 mail.notice] 1.3.86 repairing /usr/local/dcc/grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 702911 mail.error] explicit repair of grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 702911 mail.error] unexpected EOF in grey_db at 0x888000 instead of 0x60217b0000000000
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 304167 mail.notice] expired 4 records and 10 checksums in grey_db
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 582593 mail.notice] hashed 268691 records containing 403101 checksums, compressed 0 records
Jun  9 16:08:38 setup01 dbclean grey[1232]: [ID 838263 mail.notice] 638968 hash entries total, 135365 or 21% used
Jun  9 16:08:39 setup01 dccd grey[1230]: [ID 702911 mail.notice] unrecognized hash_len=33554432 in /usr/local/dcc/grey_clients
Jun  9 16:08:39 setup01 dccd grey[1230]: [ID 702911 mail.notice] 1.3.86 listening to port 6276  /usr/local/dcc  window=273MB  real=33,553,660KB  max RSS=1920MB  DB max=2400MB
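
As a rough check of the byte-order guess, the oversized numbers in the two logs above can be byte-swapped to see whether they turn into plausible values.  This is only a sketch: the helper names are mine, and treating these fields as 64-bit and 32-bit integers written on SPARC is an assumption, not something taken from the DCC source.

```python
import struct

def swap64(value):
    """Reverse the byte order of a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack(">Q", value))[0]

def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

# Values copied from the log lines above.
dcc_db_claimed = 0x1014780000000000    # dbclean: "instead of" offset for dcc_db
grey_db_claimed = 0x60217b0000000000   # dbclean: "instead of" offset for grey_db
dcc_db_actual = 0x7e0000               # where EOF was actually found in dcc_db
grey_db_actual = 0x888000              # where EOF was actually found in grey_db
hash_len = 33554432                    # from dccd_clients and grey_clients

# The decimal size dccd reported for dcc_db is the same number as the hex offset.
assert dcc_db_claimed == 1158682945536393216

for name, claimed, actual in (("dcc_db", dcc_db_claimed, dcc_db_actual),
                              ("grey_db", grey_db_claimed, grey_db_actual)):
    swapped = swap64(claimed)
    print(f"{name}: swapped {hex(swapped)}, fits in file: {swapped <= actual}")

print("hash_len swapped:", swap32(hash_len))
```

After swapping, both claimed sizes come out slightly smaller than the real file sizes, which is at least consistent with the databases having been written big-endian and read little-endian.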

dccm was much better behaved:

Jun  9 16:12:06 setup01 dccm[1256]: [ID 702911 mail.notice] 1.3.86 listening to inet:3331 with /usr/local/dcc

When I first ran `cdcc info', I got these errors logged:

Jun  9 16:13:19 setup01 dccd[1198]: [ID 702911 mail.notice] bad client or server-ID 25165824 from 130.179.16.64,33473 for NOP
Jun  9 16:13:19 setup01 dccd grey[1230]: [ID 702911 mail.notice] bad client or server-ID 25165824 from 130.179.16.64,33473 for NOP

I fixed that by removing the `map' file and reloading it from `map.txt'.
After that, `cdcc info' ran normally.
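
The bad ID looks like the same byte-order effect.  A minimal sketch, assuming the old `map' file held the ID as a 4-byte big-endian integer (I have not checked the actual file layout, and the variable names are only for illustration):

```python
# The ID that dccd complained about, as read on the x86 (little-endian) host.
logged_id = 25165824

# Recover the four bytes that a little-endian host must have read.
raw = logged_id.to_bytes(4, "little")
print(raw.hex())

# Interpret the same bytes the way a SPARC (big-endian) host would.
original_guess = int.from_bytes(raw, "big")
print(original_guess)
```

If that assumption holds, the second number printed is the ID that was originally stored on the SPARC server.  Rebuilding `map' from `map.txt' avoids the problem because the text file has no byte order.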

With single dccd and grey servers, is there any other way to do this
migration and still maintain the data in the databases?  Is there a
way to check their consistency, or has this already been done in the
startup?  On the production e-mail server, where there are also dccd and
grey servers running on another machine, is there a better way to use
those to rebuild the database?  I'm assuming that the network protocol
between DCC clients and servers is independent of byte order, so it's
only the on-disk databases that might have problems.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-


