Any DCC servers available?

Vernon Schryver vjs@calcite.rhyolite.com
Mon Jul 16 05:50:49 UTC 2001


> From: "Brian J. Murrell" <dcc-list@interlinx.bc.ca>

> ...
> OK.  Do you mind if I point my client at it for a few weeks (too
> long?) until I see how well it does?

Feel free.

>                                       Does your server see a lot of
> spam to checksum or are you like me in that you are only processing
> e-mail for a small sample of users at your site?

The command `cdcc "host dcc.rhyolite.com; stats"` displays a bunch
of stuff about my server.  That recently said:

  dcc.rhyolite.com calcite.rhyolite.com 192.188.61.3,6277
	  server-ID 101  /var/dcc/map  23:04:43
      version 1.0.19  DB locked  tracing ANON CLNT 
    73725 hash entries  57139 used   2634096 DB bytes
      5 ms delay 224 NOPs  12014 ADMN     8 query    5 clients
    240 reports    0>10        0>100      0>1000    40 many
       answers    25>10        0>100      0>1000    54 many  101 whitelisted
      0 bad IDs    0 passwds   0 error responses     4 retransmitted
      0 answers rate-limited   0 anonymous           0 rejected reports
      flood on   6 streams   1 out active 5 in     32658 total flooded in
   9461 accepted   5 stale 23101 dup     91 white    0 delete  0 bad id
    189 mmap   378227 hashed 189688 records mapped   9711 added
    since Jul 12 19:05:15.215415 MDT

From that you can see
  - I last restarted dccd July 12
  - my database has 57,139 checksums (and you can deduce that's mostly bulk
     mail)
  - since July 12, it has had 240 direct reports from DCC clients 
     and 32,658 reports of more or less bulky mail from at least 6 other
     DCC servers, of which 9,461 were unique.  The flooding algorithm
     does not have an equivalent of the netnews Path: line, and so a
     star-connected network of servers sees duplicates.
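
As a rough check of the flooding numbers above, the per-outcome counters on
the "accepted ... stale ... dup" line appear to add up to the "total flooded
in" figure, and duplicates come to about 71% of it. The sketch below only
redoes that arithmetic on the quoted output. The variable names are
illustrative and are not part of cdcc or dccd.

```python
# Counters copied from the `stats` output quoted above.
flooded_in = 32658
outcomes = {
    "accepted": 9461,
    "stale": 5,
    "dup": 23101,
    "white": 91,
    "delete": 0,
    "bad id": 0,
}

# The per-outcome counters add up to the total flooded in.
assert sum(outcomes.values()) == flooded_in

# Share of flooded reports that were duplicates or newly accepted.
dup_share = outcomes["dup"] / flooded_in
accepted_share = outcomes["accepted"] / flooded_in

# Hash table occupancy: used entries / total entries.
hash_load = 57139 / 73725

print(f"duplicates: {dup_share:.1%}")
print(f"accepted:   {accepted_share:.1%}")
print(f"hash load:  {hash_load:.1%}")
```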

> Of course it would help if I whitelist all of my real bulk traffic
> like mailing lists etc. right?

yes, unless you want to reject absolutely all bulk traffic,
including mailing lists such as this one.
(which reminds me to add this list to my white lists)

>                                 dccproc[1] does not even make a
> connection to the server for whitelisted traffic, correct?

yes, provided you mean a client-side whitelist instead of the
server-side whitelist. 


> > One of the features of the servers is that they flood reports of
> > checksums of bulk mail among each other. ...

> The idea is to create a network of servers exchanging checksum
> databases then, not unlike the way usenet servers work with articles?

exactly.

> Is there provision in the protocol for knowing when duplicate
> information is being received in the case of one server being
> connected to several other servers?

Each report ought to be uniquely identified by a server ID and a
timestamp applied by that server.  "Ought to" because there is
a server-ID translating mechanism to deal with duplicate IDs
and other issues.
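
The idea of recognizing duplicates by a (server ID, timestamp) pair can be
sketched as follows. This is a minimal illustration, not dccd's actual code:
the `Report` and `FloodReceiver` names, the in-memory set, and the field
types are all assumptions, and the server-ID translation mentioned above is
left out.

```python
from typing import NamedTuple


class Report(NamedTuple):
    """Hypothetical flooded report; field names are illustrative only."""
    server_id: int   # ID of the server that first received the report
    timestamp: int   # timestamp applied by that server
    checksum: str    # checksum being reported


class FloodReceiver:
    """Counts accepted and duplicate reports arriving from several peers."""

    def __init__(self) -> None:
        self.seen: set[tuple[int, int]] = set()
        self.accepted = 0
        self.dup = 0

    def receive(self, report: Report) -> bool:
        """Return True if the report is new, False if already seen."""
        key = (report.server_id, report.timestamp)
        if key in self.seen:
            self.dup += 1
            return False
        self.seen.add(key)
        self.accepted += 1
        return True


receiver = FloodReceiver()
r = Report(server_id=101, timestamp=995262649, checksum="abc123")
receiver.receive(r)   # first copy, arriving from one peer
receiver.receive(r)   # same report relayed by a second peer
print(receiver.accepted, receiver.dup)
```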

> ...
> Got 'er working to your server!

If your reverse DNS name starts with "adsl" and you've sent 27
queries, yes, you have.  (See `cdcc clients`)

> ...
> [1] I am only interested in seeing the metrics while I am evaluating --
> if I like the results I will set up the Sendmail hooks.

The promise of the DCC is in lots of checksum reports.  I suspect,
but can't really tell, that the volume I currently have access to
is 20,000-50,000 messages/day, which is a good start but only a start.

  ......


] From: "Brian J. Murrell" <dcc-list@interlinx.bc.ca>

] If I have a piece of spam in my mailbox and I know it was not
] targeted at me, should I do a "dccproc -t many" on it to tell the DCC
] server(s) that this is definitely spam?  i.e. as a community service
] to other users of the DCC server.

Yes, that's what I do.  I start the traceroutes, begin the queries to
whois.abuse.net, and open web page URLs that may be innocent.
While those drag along, I use `dccproc -t many` to get the spam into
the database.  I sometimes first use `dccproc -Q` to see if someone
has beaten me to it or to check various things, such as the operation
of the de-quoted-printable machinery in the checksumming.  For now,
most of those 20-50K messages do not involve spam traps,
but people report very good luck with thresholds of 50-200.


] Even if I don't know for sure that there were any other recipients
] (as a single user, I can't know for absolutely sure that there were
] "many" recipients) should I still?

Yes.  If you were the only recipient, the worst that does is add a
record to the databases of cooperating (flooding) servers.  If no
one else gets a copy of that message, then no one else can have that
message rejected with the help of that record.  If you send a message
to two friends and one of them reports it as spam with `dccproc -t
many` so the other doesn't receive it, then maybe you need to change
friends.


] Does the fact that the spam was already queried/submitted to the DCC
] server change that in any way?  i.e. does a query for an e-mail using
] "dccproc" followed by an "assertion" (using dccproc -t many) that it
] was spam skew/screw the database in any way?

`dccproc -Q` does nothing to the database.
`dccproc` (or equivalently `dccproc -t 1`) adds a record with a
target count of 1.
`dccproc -t many` is simply a shorthand for doing `dccproc -t 1` so
many times that the fixed width total count overflows.  It's about 24
bits wide.
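
One way to picture the three cases is a saturating counter. The sketch
below assumes a ceiling of exactly 2**24 - 1 (the text only says "about 24
bits") and uses made-up names (`MANY`, `add_targets`); it is not the DCC
wire format or server code.

```python
# Assumed ceiling for the total target count; the real width is only
# described as "about 24 bits".
MANY = (1 << 24) - 1


def add_targets(total: int, reported) -> int:
    """Return the new total after a report.

    `reported` is an integer target count, or the string "many".
    A query (`dccproc -Q`) would not call this at all.
    """
    if reported == "many":
        return MANY
    # Saturate instead of wrapping around.
    return min(total + reported, MANY)


total = 0
total = add_targets(total, 1)        # like plain `dccproc`
print(total)
total = add_targets(total, "many")   # like `dccproc -t many`
print(total == MANY)
```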

Again, the DCC has a 0% false positive rate for detecting bulk mail
if you define bulk as >1 recipient and assume no misconfiguration
such as not knowing you are reporting the same message twice.
How much of that bulk mail is spam depends on white lists.
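
The split between "bulk" and "spam" can be sketched as two separate tests.
The function names are hypothetical, the default threshold of 50 is taken
from the 50-200 range people report using, and comparing with >= is an
assumption rather than documented client behavior.

```python
def is_bulk(total_targets: int) -> bool:
    """Bulk in the sense used above: more than one recipient."""
    return total_targets > 1


def should_reject(total_targets: int, whitelisted: bool,
                  threshold: int = 50) -> bool:
    """Reject only bulk mail that is not whitelisted.

    Whitelisted sources (for example, mailing lists you subscribe to)
    pass regardless of how bulky the message is.
    """
    if whitelisted:
        return False
    return total_targets >= threshold


print(is_bulk(2))                            # two recipients is already bulk
print(should_reject(200, whitelisted=True))  # bulky but whitelisted
print(should_reject(200, whitelisted=False)) # bulky and not whitelisted
```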


Vernon Schryver    vjs@rhyolite.com


