Discrepancy between public servers

Vernon Schryver vjs@calcite.rhyolite.com
Fri Jul 24 15:24:10 UTC 2009

> From: Cedric Knight <cedric@gn.apc.org>

> I run dccifd with SpamAssassin using the public DCC servers, and a user
> has recently reported a non-bulk email getting caught as spam.

The "many" checksum value implies that a very similar message was
reported by a DCC client as spam.  The FUZ2 checksum ignores binary
bits, and so the similarity might be free mail provider advertising
or a user signature.  In other words, I am sure that as far as the DCC
network is concerned, the message was bulk.

>                                                                 I
> checked and it seems odd to me that there is such a large discrepancy
> for the offending checksum between different DCC servers:

>                      Body: 7e1aab3a 07098a85 b63e65cc 2f25ac14       1
>                      Fuz1: ebc11451 b12abc2d 5f876d7c 8afcae55       1
>                      Fuz2: 67279754 87ed81a3 be38bdad 1e7e51af    many

>                      Body: 7e1aab3a 07098a85 b63e65cc 2f25ac14       0
>                      Fuz1: ebc11451 b12abc2d 5f876d7c 8afcae55       0
>                      Fuz2: 67279754 87ed81a3 be38bdad 1e7e51af       0

No, database corruption has very different symptoms.

> I've checked on some other hits, and it's not unique to that one sample:

That is more evidence that no database corruption is involved.

The message body checksum counts of 0, 0, and many for the second sample
suggest a database record that is more than a day old and has been trimmed
of uninteresting counts.

The likely situation is that the mail messages are being reported
by only a few DCC clients and so reports of their checksums are not
being flooded throughout the DCC network.  
The mail systems that are reporting the mail without counts of "many"
are probably close as the packets fly to dcc.misty.com.
Because dcc.misty.com has more RAM than most of the public DCC servers
but receives less traffic than some other public servers, it uses
longer database expirations.  That would make it remember reports of
less bulky mail longer than other servers.
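
To illustrate only the expiration effect, here is a toy sketch in Python.
It is not how the DCC server stores or expires records.  The ToyServer
class, the 30 and 2 day windows, and the day numbers are all made up for
the example, and it assumes both servers received the same single report.

```python
from dataclasses import dataclass, field

MANY = "many"  # toy stand-in for a DCC count of "many"


@dataclass
class ToyServer:
    """Hypothetical server that forgets reports older than expire_days."""
    name: str
    expire_days: float
    reports: list = field(default_factory=list)  # (day, checksum, count)

    def report(self, day, checksum, count):
        self.reports.append((day, checksum, count))

    def query(self, today, checksum):
        # Keep only the reports for this checksum that have not expired.
        live = [c for (d, ck, c) in self.reports
                if ck == checksum and today - d <= self.expire_days]
        if MANY in live:
            return MANY
        return sum(live)  # 0 when nothing is remembered


fuz2 = "67279754 87ed81a3 be38bdad 1e7e51af"
long_memory = ToyServer("long", expire_days=30)   # assumed window
short_memory = ToyServer("short", expire_days=2)  # assumed window

# The same report of "many" reaches both servers on day 0.
for server in (long_memory, short_memory):
    server.report(day=0, checksum=fuz2, count=MANY)

# Five days later the two servers no longer agree.
print(long_memory.query(today=5, checksum=fuz2))
print(short_memory.query(today=5, checksum=fuz2))
```

With these made-up numbers the server with the longer window still
answers "many" while the other answers 0, which is the kind of
discrepancy shown in the quoted output.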

I've heard reports that seem similar.  One case involves a relatively
low volume newsletter operator who insists that his mail is not
spam, although some of the targets of his mail have evidently wired
their mail systems to report his mail with counts of "many" to the
DCC network.  Are the relevant messages in this case from a mailing list?

> $ cdcc "add RTT-4000 ms"

I like creating new map files, as in
  rm -f /tmp/map
  cdcc -h /tmp "new map; add dcc.misty.com"

> $ dccproc -Q -d -C <fp-dcc-bbc.eml

dccproc knows about -i, but for such tests feeding the checksums to
`/var/dcc/libexec/dccsight -dCQ` seems easier.

> note recvfrom(???,0): Connection refused

That debugging message is supposed to indicate the receipt of an ICMP
Unreachable packet, but its form shows that the current version of the
DCC software is not being used.
Besides, the "???" is obviously bogus, and I vaguely recall fixing
a relevant bug some time ago.

Vernon Schryver    vjs@rhyolite.com
