Distributed Checksum Clearinghouses

Overview

The Distributed Checksum Clearinghouses or DCC is an anti-spam content filter that runs on a variety of operating systems. The counts of message checksums that it maintains can be used by SMTP servers and mail user agents to detect and reject or filter spam or unsolicited bulk mail. DCC servers exchange or "flood" common checksums. The checksums include values that are constant across common variations in bulk messages, including "personalizations."

There are graphs of recently detected spam. Those graphs suggest the effectiveness of the system. For example, if you assume that 80% of all mail is spam and those graphs indicate that DCC finds 70% of mail is spam, then DCC detects about 88% of spam (70/80 = 87.5%).
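
The arithmetic behind that example can be checked with a short sketch. The function name and its parameters are illustrative assumptions and not part of DCC; the figures are the ones assumed in the paragraph above, and the calculation assumes that everything DCC flags really is spam.

```python
def detection_rate(spam_share_of_mail: float, flagged_share_of_mail: float) -> float:
    """Fraction of spam caught, assuming every flagged message is spam."""
    if not 0 < spam_share_of_mail <= 1:
        raise ValueError("spam_share_of_mail must be in (0, 1]")
    return flagged_share_of_mail / spam_share_of_mail


if __name__ == "__main__":
    # 80% of all mail assumed to be spam, 70% of all mail flagged by DCC.
    rate = detection_rate(0.80, 0.70)
    print(f"{rate:.1%}")  # 87.5%, which the text rounds to 88%
```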

The idea of DCC is that if mail recipients could compare the mail they receive, they could recognize unsolicited bulk mail. A DCC server totals reports of checksums of messages from clients and answers queries about the total counts for checksums of mail messages. A DCC client reports the checksums for a mail message to a server and is told the total number of recipients of mail with each checksum. If one of the totals is higher than a threshold set by the client and according to local whitelists the message is unsolicited, the DCC client can log, discard, or reject the message.
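
The report-and-threshold idea described above can be sketched as a toy, in-memory model. This is not the DCC protocol or its data structures: the class, the function names, and the use of SHA-256 as an exact checksum are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Set


class ToyClearinghouse:
    """In-memory stand-in for a DCC server: totals reports per checksum."""

    def __init__(self) -> None:
        self._totals = Counter()

    def report(self, checksums: Iterable[str], recipients: int = 1) -> Dict[str, int]:
        """Add one report and return the running total for each checksum."""
        checksums = list(checksums)
        for checksum in checksums:
            self._totals[checksum] += recipients
        return {checksum: self._totals[checksum] for checksum in checksums}


def simple_checksum(text: str) -> str:
    """Exact checksum used only for this sketch; real DCC checksums are fuzzy."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unsolicited_bulk(totals: Dict[str, int], threshold: int,
                        sender: str, whitelist: Set[str]) -> bool:
    """Bulk if any total exceeds the client's threshold and the sender is not whitelisted."""
    return sender not in whitelist and any(t > threshold for t in totals.values())


if __name__ == "__main__":
    server = ToyClearinghouse()
    for _ in range(25):  # the same message reported 25 times
        totals = server.report([simple_checksum("Buy now!")])
    print(is_unsolicited_bulk(totals, threshold=20,
                              sender="bulk@example.com", whitelist=set()))
```

The client, not the server, applies the threshold and the whitelist, which mirrors the division of labor in the paragraph above: the server only counts.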

Because simplistic checksums of spam would not be effective, the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves. Since DCC started being used in late 2000, the fuzzy checksums have been modified several times.
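
As a rough illustration of what "fuzzy" means here, the following sketch discards details that bulk mailers commonly vary before hashing the rest. It is emphatically not DCC's algorithm: the normalization rules and the function name are assumptions chosen only to show the idea.

```python
import hashlib
import re


def toy_fuzzy_checksum(body: str) -> str:
    """Hash the body after ignoring case, greetings, addresses, digits, and punctuation.

    Illustrative only; the real DCC fuzzy checksums work differently.
    """
    text = body.lower()
    # Drop anything that looks like an e-mail address.
    text = re.sub(r"\S+@\S+", "", text)
    # Drop greeting lines such as "dear alice," that personalize a message.
    text = re.sub(r"^(dear|hi|hello)\b.*$", "", text, flags=re.MULTILINE)
    # Keep ASCII letters only, so spacing, digits, and punctuation do not matter.
    text = re.sub(r"[^a-z]+", "", text)
    return hashlib.sha256(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    first = "Dear Alice,\nYou have WON $1,000,000! Reply to claim.\n"
    second = "Dear Bob,\nYou have won  $1.000.000!! Reply to claim.\n"
    print(toy_fuzzy_checksum(first) == toy_fuzzy_checksum(second))
```

An exact checksum of the two sample bodies would differ, which is why the text calls simplistic checksums ineffective.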

Unless used with isolated DCC servers and so losing much of its power, DCC causes some additional network traffic. However, the client-server interaction for a mail message consists of exchanging a single pair of UDP/IP datagrams of about 150 bytes. That is often less than the several pairs of UDP/IP datagrams required for a single DNS query. SMTP servers make DNS queries to check the envelope Mail_From value and often several more. As with the Domain Name System, DCC servers should be placed near active clients to reduce DCC network costs. DCC servers exchange or flood reports of checksums, but only the checksums of bulk mail.

Listings and Removals

Do not send comments or questions about your "DCC listing" to any address at Rhyolite Software unless an SMTP server operated by Rhyolite Software LLC rejected your mail. Contact instead the operators of the system that rejected your mail.

DCC does not "list" domain names or IP addresses, but detects bulk mail messages. Domain names, IP addresses, and so forth are "listed" independently by DCC users. If DCC users want to receive your bulk mail, they must whitelist it by adding your IP address, SMTP envelope sender, RFC 2369 SMTP List-* headers, or other characteristics of your mail to their whiteclnt files. Do not send "please remove my address" requests unless you want your domain name, mailbox, or IP address added to a blacklist.

A separate facility called DCC Reputations, supported by the commercial version of the DCC software, does automatically compute reputations for sending bulk mail. However, it makes no sense to ask for IP addresses to be removed from the distributed DCC Reputation database. A reputation for sending lots of bulk mail expires automatically a week to 30 days after the last bulk email reported by a DCC Reputation client mail system.

Spam is unsolicited bulk mail, and only mail targets can say whether a message is solicited. A virtue of DCC and DCC Reputations spam filtering is that mail targets decide whether they have subscribed to bulk mail or want to hear from senders with DCC Reputations for sending bulk mail. The opinions of bulk mail senders about whether their messages are spam are irrelevant.

Download

The current version of the DCC source is version 1.3.155, August 30, 2014. It is available at dcc-servers.net. It is usually best to update an existing installation with the /var/dcc/libexec/updatedcc script. Some previous versions are available.

License

The non-commercial DCC software is distributed under a license that is free only to organizations that do not sell filtering devices or services except to their own users and that participate in the global DCC network. ISPs that use DCC to filter mail for their own users are intended to be covered by the free license. You can redistribute unchanged copies of the free source, but you may not redistribute modified, "fixed," or "improved" versions of the source or binaries. You also can't call it your own or blame anyone for the results of using it.

Organizations that do not qualify for the free license are welcome to inquire about licensing the commercial version of the DCC software by email to sales@rhyolite.com or via the form. The commercial DCC version supports DCC Reputations.

Please note that contrary to obsolete web pages you might find with search engines, Rhyolite Software is currently the exclusive source of commercial DCC software. No other organizations can sell or market DCC software except as part of their own products.

Selling the bandwidth and, most important, human system administration work of the public DCC servers to third parties has always been wrong. Sellers of products, "appliances," or managed mail services must contract for or provide their own DCC servers, as well as obtain a commercial license for the DCC software.

DCC Client Problems

Incorrectly configured firewalls are a common cause of problems for DCC clients using the public DCC servers. Your firewalls must allow responses to requests from dccproc or dccifd on your system to come from UDP port 6277 at the public servers.

Another common cause of DCC client problems is the use of ancient versions redistributed by some organizations, including Linux packagers. Those versions can try so hard to get answers that they trigger the denial-of-service (DoS) defenses in the public DCC servers. See a discussion of problems associated with old versions of the DCC software.

Excessive requests are a third common cause. The public DCC servers have various defenses against DoS attacks including rate limiting or delaying responses based on the maximum of the requests made today and a recent daily average. When the delays would reach 4 seconds, the public servers completely ignore additional requests. If your mail system processes more than 100,000 messages per day, you should use your own, probably private DCC server connected to the global network of DCC servers.
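
The behavior described above can be sketched as follows. Only two things come from the text: the load is the maximum of today's requests and a recent daily average, and requests are ignored once the delay would reach 4 seconds. The daily allowance and the linear ramp are assumptions invented for this sketch; the actual formula used by the public servers is not given here.

```python
from typing import Optional

MAX_DELAY_SECONDS = 4.0          # from the text: at 4 seconds, requests are ignored
DAILY_ALLOWANCE = 100_000        # assumption: reuses the 100,000 messages/day figure
DELAY_PER_EXCESS_REQUEST = 1e-4  # assumption: purely illustrative linear ramp


def response_delay(requests_today: int, recent_daily_average: float) -> Optional[float]:
    """Return the delay in seconds, or None when the request would be ignored."""
    # The text says limiting is based on the maximum of the two measures.
    load = max(requests_today, recent_daily_average)
    excess = max(0.0, load - DAILY_ALLOWANCE)
    delay = excess * DELAY_PER_EXCESS_REQUEST
    if delay >= MAX_DELAY_SECONDS:
        return None  # the server stops answering entirely
    return delay


if __name__ == "__main__":
    for today, average in [(50_000, 60_000), (110_000, 90_000), (10_000, 200_000)]:
        print(today, average, response_delay(today, average))
```

Because the larger of the two measures is used, a client that was busy recently stays limited even on a quiet day.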

If the public DCC servers are not working for you, your firewalls allow UDP port 6277, and you are not sending an excessive number of requests, then the cause might be excessive or objectionable DCC operations that have been received from your network. See the blacklist of DCC clients used by the public DCC servers.

Documentation and Source

Each of the several parts of DCC has its own man page.

The code seems to be compatible with flavors of UNIX-like systems. See the list of systems in the installation instructions.

Operational DCC Services

A useful anti-spam scheme is more than just code, and that is particularly true of the Distributed Checksum Clearinghouses, DCC, which are based on sharing information about bulk mail. If you do not run your own DCC server, you need to point your DCC client to someone else's server. The DCC client code does the right thing when it cannot contact any of the servers it knows about; it quickly passes the mail without worrying about its bulkiness. Given more than one server, the DCC client code uses the fastest or closest.

When using someone else's server, you must either contact them for a DCC client-ID and corresponding password or use the server as an anonymous client.

Public DCC servers for anonymous DCC clients handling fewer than 100,000 mail messages per day are provided by people and organizations in the following list. The default contents of the /var/dcc/map file point to these servers.

Organization                                                   Contact
DelMarVa OnLine                                                Sven Willenberger
www.eatserver.nl                                               dcc@eatserver.nl
Etherboy.com                                                   Dave Lugo
INFN (National Institute for Nuclear Physics) - Bari           Domenico Diacono
INFN (National Institute for Nuclear Physics) - Turin          Alberto D'Ambrosio
MGT Consulting                                                 --
INAF IASF (National Institute for Astrophysics)-Palermo-Italy  Giacomo Fazio
Peregrine Computer Consultants Corporation                     Kevin A. McGrail
Quonix Networks                                                John Von Essen
Sonic.net, Inc.                                                Kelsey Cummings
Tilastokeskus - Statistikcentralen                             --
Universität Trier                                              Horst Scheuermann
Vienna University of Economics and Business Administration     Franz Schaefer

The IP addresses of the public DCC servers are published under the DNS names dcc1.dcc-servers.net, dcc2.dcc-servers.net, dcc3.dcc-servers.net, dcc4.dcc-servers.net, and dcc5.dcc-servers.net. Use them by adding those names to your /var/dcc/map file with cdcc "add dcc1.dcc-servers.net" and so forth. The names are installed automatically when the DCC programs are installed with the ./configure script and Makefile in the source. See the installation instructions.
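
For installations whose map file lacks those names, the "and so forth" can be spelled out with a small script. This is a sketch under assumptions: the helper names are hypothetical, cdcc is assumed to be on the PATH and run by a user allowed to change /var/dcc/map, and the only cdcc operation used is the "add" operation quoted above. By default the script only prints the commands.

```python
import subprocess

# The five names given in the text: dcc1 through dcc5.
PUBLIC_SERVER_NAMES = [f"dcc{n}.dcc-servers.net" for n in range(1, 6)]


def cdcc_add_commands(names=PUBLIC_SERVER_NAMES):
    """Build one argument vector per server, equivalent to: cdcc "add NAME"."""
    return [["cdcc", f"add {name}"] for name in names]


def add_public_servers(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for argv in cdcc_add_commands():
        if dry_run:
            print(argv[0], f'"{argv[1]}"')
        else:
            # Assumes cdcc is installed and on the PATH.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    add_public_servers(dry_run=True)
```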

Note well that it has always been wrong to take the bandwidth and, most important, the human system administration work of the public DCC servers and resell them to third parties. Blunt words for that include theft and stealing. Vendors of "spam appliances" or services including DCC such as "managed email" must provide DCC servers of their own or contract for DCC services from others. They must also buy a license for the commercial version of the DCC software.

Flooding Checksums among Private DCC servers

The effectiveness of DCC filtering increases with checksums "flooded" or exchanged with other DCC servers. The spam filtering results of violating the free license by not connecting a local, private server to the global network of DCC servers may be disappointing.

Mail systems that handle more than 100,000 mail messages per day should have a local DCC server so that processing incoming mail is not delayed by the time required for the UDP packets used by the DCC client protocol to cross the Internet. Organizations that deal with more than 500,000 mail messages per day benefit from two or more local DCC servers to ensure that at least one local DCC server is available despite system maintenance. Organizations that deal with fewer than 100,000 mail messages per day use less bandwidth of their own and of the servers in the global network by using the public servers.
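
The sizing guidance above reduces to two thresholds, sketched here. The function name is hypothetical, and the text does not say what to do at exactly 100,000 or 500,000 messages per day, so treating those boundary values as belonging to the lower tier is an assumption.

```python
def recommended_local_servers(messages_per_day: int) -> int:
    """Minimum number of local DCC servers suggested by the thresholds in the text.

    Returns 0 when the public servers are the better choice.
    """
    if messages_per_day < 0:
        raise ValueError("messages_per_day must not be negative")
    if messages_per_day > 500_000:
        return 2  # "two or more", so that one survives system maintenance
    if messages_per_day > 100_000:
        return 1  # avoid waiting for UDP packets to cross the Internet
    return 0      # use the public servers (boundary case is an assumption)


if __name__ == "__main__":
    for volume in (20_000, 250_000, 2_000_000):
        print(volume, recommended_local_servers(volume))
```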

The first step in configuring a DCC server to flood checksums is agreeing on the server-IDs of all participating servers. There is a private list of the DCC servers, server-IDs and so forth in the global network of DCC servers at http://www.rhyolite.com/dcc/private/. It is readable only by server operators. Contact Vernon Schryver at vjs@rhyolite.com for server-IDs. Subscriptions to the DCC-servers mailing list are available only to operators of servers in the global network.

Other Resources

Whitelists
Use of DCC to reject unsolicited bulk mail generally requires a whitelist of solicited bulk mail sources in the local common /var/dcc/whiteclnt or /var/dcc/whitecommon files or in a per-user whiteclnt file.

Whitelist of blank or test messages
It can be useful to whitelist practically blank messages from various sources and common test messages. I.E.C.C. offers such a whitelist of blank messages that can be copied or included into a /var/dcc/whiteclnt file.

The DCC source includes a script named /var/dcc/libexec/fetch-testmsg-whitelist intended to be invoked by cron to periodically fetch new copies.

Blacklists
Blacklists such as those used at rhyolite.com can be used as "spam traps" to feed DCC. For example, sendmail can use an "access_db" to mark spam, and then report it via dccm.

DNS Blacklists
The DCC clients dccm, dccifd, and dccproc can check domain names and IP addresses in SMTP envelope Mail_From values and in URLs in mail message bodies against DNS blacklists (DNSBL) such as the SBL. See the installation instructions and DNSBL_ARGS in the configuration file, dcc_conf, in the DCC home directory.

Greylisting
The DCC sendmail milter, dccm, and the general MTA interface, dccifd, can use a form of greylisting.

Logos
Some logos that can be displayed on web pages are available.

CGI Demonstration
There is a demonstration of the proof-of-concept CGI scripts that allow users to maintain individual whitelists and monitor individual logs of rejected mail at http://www.rhyolite.com/dcc-demo-cgi-bin/ or http://cgi-demo:cgi-demo@www.rhyolite.com/dcc-demo-cgi-bin/. It requires a user name of cgi-demo and a password that is the same as the user name.

Mailing lists
Subscriptions to the main DCC mailing list are available by completing a form. Because the list is protected against spam not only with a DCC client but also with blacklists that include many free mail providers, it is not available via those free providers. However, the archive for the mailing list is open.

There are also mailing lists for DCC server operators and public DCC server operators, but they are closed except to operators.

Answers to Questions
See the DCC FAQ and the archive of the DCC mailing list for information about connections between DCC and mail user agents and mail transfer agents.

The DCC FAQ also answers questions about the resources needed by a DCC server.

Technical questions or comments can be sent to Rhyolite Software. More extensive assistance can also be hired from Rhyolite Software.

DCC Reputations

DCC Reputations are a distinct mechanism based on and contributing to DCC data. In part to minimize abuse by anonymous users, DCC Reputations are available only in the commercial version of the DCC software.

History

DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC software was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001.

Contact Vernon Schryver at vjs@rhyolite.com or use the form.

$Date: 2014/09/14 18:15:13 $