Distributed Checksum Clearinghouses

The current version of the DCC source is version 1.3.98, October 09, 2008. The DCC source is available at dcc-servers.net and Rhyolite Software. It is usually best to update an existing installation with the /var/dcc/libexec/updatedcc script.

The variations of version 1.2.74 redistributed by some organizations are more than 3 years old. Many problems have been fixed and improvements made since that version was released. DCC clients of that version and older can be expected to stop working with the public DCC servers in the coming months, when the version of the DCC client-server protocol that they use is finally disabled on the public DCC servers.

There are graphs of recently detected spam. Those graphs suggest the effectiveness of the system. For example, if you assume that 70% of all mail is spam and those graphs indicate that the DCC finds that 50% of mail is spam, then the DCC is about 70% effective (50/70).
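
The arithmetic behind that estimate can be written out as a small Python sketch. The function name is hypothetical, and the 70% spam share is the assumption made in the paragraph above, not a measurement.

```python
def dcc_effectiveness(spam_share: float, detected_share: float) -> float:
    """Fraction of spam that is caught, given the share of all mail that is
    spam and the share of all mail that the DCC marks as spam."""
    if not 0 < spam_share <= 1:
        raise ValueError("spam_share must be in (0, 1]")
    return detected_share / spam_share


if __name__ == "__main__":
    # 70% of mail assumed to be spam, 50% of mail detected as spam.
    print(f"{dcc_effectiveness(0.70, 0.50):.0%}")
```

With the numbers from the text the ratio is 0.50 / 0.70, or a little over 71%, which the text rounds to 70%.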

Overview

The Distributed Checksum Clearinghouses or DCC is an anti-spam content filter that runs on a variety of operating systems. As of the middle of 2007, it involves millions of users, more than six hundred thousand client computer systems, and more than 250 servers collecting and counting checksums related to more than 300 million mail messages on weekdays. The counts can be used by SMTP servers and mail user agents to detect and reject or filter spam or unsolicited bulk mail. DCC servers exchange or "flood" common checksums. The checksums include values that are constant across common variations in bulk messages, including "personalizations."

The idea of the DCC is that if mail recipients could compare the mail they receive, they could recognize unsolicited bulk mail. A DCC server totals reports of checksums of messages from clients and answers queries about the total counts for checksums of mail messages. A DCC client reports the checksums for a mail message to a server and is told the total number of recipients of mail with each checksum. If one of the totals is higher than a threshold set by the client and according to local whitelists the message is unsolicited, the DCC client can log, discard, or reject the message.
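
The exchange described above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the class and function names are hypothetical, a plain MD5 stands in for the real DCC checksums, and nothing here reflects the actual DCC wire protocol or database.

```python
import hashlib
from collections import Counter


def exact_checksum(body: str) -> str:
    """Stand-in checksum: a plain MD5 of the body (not a real DCC checksum)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class ToyServer:
    """Totals reports per checksum, like the server role described above."""

    def __init__(self) -> None:
        self.totals = Counter()

    def report(self, checksums, recipients=1):
        """Add one report and return the running total for each checksum."""
        for checksum in checksums:
            self.totals[checksum] += recipients
        return {checksum: self.totals[checksum] for checksum in checksums}


def client_verdict(totals, threshold, whitelisted):
    """Return 'bulk' when any total is higher than the client's threshold
    and the local whitelist does not mark the message as solicited."""
    if whitelisted:
        return "ok"
    if any(total > threshold for total in totals.values()):
        return "bulk"
    return "ok"


if __name__ == "__main__":
    server = ToyServer()
    checksum = exact_checksum("Buy now!")
    totals = {}
    for _ in range(5):  # five recipients report the same message
        totals = server.report([checksum])
    print(client_verdict(totals, threshold=3, whitelisted=False))
```

What the client does with a "bulk" verdict (log, discard, or reject) is a local policy choice and is left out of the sketch.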

Because simplistic checksums of spam would not be effective, the main DCC checksums are fuzzy and ignore aspects of messages. The fuzzy checksums are changed as spam evolves. Since the DCC started being used in late 2000, the fuzzy checksums have been modified several times.
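
To show why a checksum that ignores aspects of a message survives small variations, here is a deliberately naive Python sketch. It is not the DCC's fuzzy checksum algorithm, which is not described here; the normalization rule (lower-case, letters only) and the function names are assumptions chosen for illustration.

```python
import hashlib
import re


def exact_checksum(text: str) -> str:
    """Checksum of the message text exactly as received."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def toy_fuzzy_checksum(text: str) -> str:
    """Checksum after discarding case, digits, punctuation, and spacing.

    A toy normalization for illustration only, not the DCC algorithm.
    """
    normalized = re.sub(r"[^a-z]+", " ", text.lower()).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = "Buy  NOW!!! Offer #1234"
    second = "buy now offer #9876"
    print(exact_checksum(first) == exact_checksum(second))
    print(toy_fuzzy_checksum(first) == toy_fuzzy_checksum(second))
```

The two sample messages differ in case, spacing, punctuation, and the trailing number, so their exact checksums differ while the toy fuzzy checksums match.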

Unless used with isolated DCC servers and so losing much of its power, the DCC causes some additional network traffic. However, the client-server interaction for a mail message consists of exchanging a single pair of UDP/IP datagrams of about 150 bytes. That is often less than the several pairs of UDP/IP datagrams required for a single DNS query. SMTP servers make DNS queries to check the envelope Mail_From value and often several more. As with the Domain Name System, DCC servers should be placed near active clients to reduce the DCC network costs. DCC servers exchange or flood reports of checksums, but only the checksums of bulk mail. Since most mail is not bulk and only representative checksums of bulk mail need to be exchanged, flooding checksums among DCC servers involves a manageable amount of data.
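
A rough estimate of the client-server traffic follows from those figures. The Python sketch below assumes that each of the two datagrams is about 150 bytes and that one pair is exchanged per message; the function names are hypothetical.

```python
DATAGRAM_BYTES = 150        # approximate size from the text, assumed per datagram
DATAGRAMS_PER_MESSAGE = 2   # one request and one response
SECONDS_PER_DAY = 86_400


def daily_dcc_bytes(messages_per_day: int) -> int:
    """Approximate client-server bytes per day for the given mail volume."""
    return messages_per_day * DATAGRAMS_PER_MESSAGE * DATAGRAM_BYTES


def average_bits_per_second(messages_per_day: int) -> float:
    """The same traffic expressed as an average bit rate over a day."""
    return daily_dcc_bytes(messages_per_day) * 8 / SECONDS_PER_DAY


if __name__ == "__main__":
    volume = 100_000
    print(daily_dcc_bytes(volume), "bytes per day")
    print(round(average_bits_per_second(volume)), "bits per second on average")
```

Under these assumptions, 100,000 messages per day cost about 30 million bytes per day, or an average of less than 3 kbit/s.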

License

The non-commercial DCC software is distributed under a license that is free only to organizations that do not sell filtering devices or services except to their own users and that participate in the global DCC network. ISPs that use the DCC to filter mail for their own users are intended to be covered by the free license. You can redistribute unchanged copies of the free source, but you may not redistribute modified, "fixed," or "improved" versions of the source or binaries. You also can't call it your own or blame anyone for the results of using it.

Organizations that do not qualify for the free license are welcome to inquire about licenses for the commercial version by email to sales@rhyolite.com or via the form. The commercial version supports DCC Reputations.

Selling the bandwidth and, most important, the human system administration work of the public DCC servers to third parties has always been wrong. Sellers of products, "appliances," or managed mail services that include the DCC must provide DCC servers of their own or contract for them from others, and must also obtain a commercial license for the DCC source.

Listings and Removals

Do not send comments or questions about your "DCC listing" to any address at Rhyolite.com unless a Rhyolite.com mail server rejected your mail. Contact instead the operators of the system that rejected your mail. The DCC detects bulk mail messages instead of mail senders. The DCC does not "list" domain names or IP addresses, but detects bulk mail messages. Domain names, IP addresses, and so forth are "listed" independently by ISPs and others that might also be using the DCC.

A separate facility called DCC Reputations, supported by the commercial version of the DCC software, does automatically compute reputations for sending bulk mail. It makes no sense to ask for IP addresses to be removed from the distributed DCC Reputation database. A reputation for sending lots of bulk mail expires automatically a week to 30 days after the last bulk email reported by a DCC Reputation client mail system.

If the targets of your bulk mail really want to receive it, they may need to whitelist your IP address, domain name, SMTP List-id: header, or other signature in your messages in whiteclnt files. Spam is unsolicited bulk mail, and only mail targets can say whether a message is solicited. A major virtue of DCC and DCC Reputations spam filtering is that mail targets decide whether they have subscribed to bulk mail or want to hear from senders with DCC Reputations for sending bulk mail. The views of bulk mail senders about whether their messages are spam are irrelevant.

Do not send any mail to Rhyolite.com from addresses or domains among the Rhyolite Software blacklists, because mail from those sources is unlikely to be seen. If you have no alternative, try using the contact web form.

DCC Client Problems

Incorrectly configured firewalls are a common cause of problems for DCC clients using the public DCC servers. Your firewalls must allow responses to requests from dccproc or dccifd on your system to come from UDP port 6277 at the public servers.

Another common cause of DCC client problems is the use of ancient versions redistributed by some organizations, including Linux packagers. Those versions can try so hard to get answers that they trigger the denial-of-service (DoS) defenses in the public DCC servers.

Excessive requests are a third common cause. The public DCC servers have various defenses against DoS attacks including rate limiting or delaying responses based on the maximum of the requests made today and a recent daily average. When the delays would reach 4 seconds, the public servers completely ignore additional requests. If your mail system processes more than 100,000 messages per day, you should use your own, probably private DCC server connected to the global network of DCC servers.
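
The shape of that defense can be sketched in Python. Only two things below come from the text: the load is the maximum of today's requests and a recent daily average, and requests are ignored once the delay would reach 4 seconds. How the public servers turn load into a delay is not stated, so the sketch takes that mapping as a parameter, and example_delay is a purely hypothetical ramp.

```python
IGNORE_AT_SECONDS = 4.0  # from the text: at this delay, requests are ignored


def effective_load(requests_today: int, recent_daily_average: float) -> float:
    """Load used for rate limiting: the larger of the two figures."""
    return max(requests_today, recent_daily_average)


def server_action(requests_today, recent_daily_average, delay_for_load):
    """Return ('answer', delay) or ('ignore', None) for one more request.

    delay_for_load is a caller-supplied function; the real mapping used by
    the public servers is not described in this document.
    """
    delay = delay_for_load(effective_load(requests_today, recent_daily_average))
    if delay >= IGNORE_AT_SECONDS:
        return ("ignore", None)
    return ("answer", delay)


def example_delay(load: float) -> float:
    """Hypothetical ramp: no delay up to 100,000 requests, then one second
    for every further 100,000. Illustrative only."""
    return max(0.0, (load - 100_000) / 100_000)


if __name__ == "__main__":
    print(server_action(50_000, 20_000, example_delay))
    print(server_action(300_000, 0, example_delay))
    print(server_action(10_000, 600_000, example_delay))
```

The third call shows why a quiet day does not immediately help a client whose recent average is high: the maximum of the two figures still drives the decision.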

If the public DCC servers are not working for you, your firewalls allow UDP port 6277, and you are not sending an excessive number of requests, then the cause might be excessive or objectionable DCC operations that have been received from your network. See the blacklist of DCC clients used by the public DCC servers.

Documentation and Source

Each of the several parts of the DCC has its own man page.

The code seems to be compatible with many flavors of UNIX-like systems. See the list of systems in the installation instructions. The long-range plan is to port the DCC client code widely, including to common personal computer operating systems.

Operational DCC Services

A useful anti-spam scheme is more than just code, and that is particularly true of the Distributed Checksum Clearinghouses, DCC, which are based on sharing information about bulk mail. If you do not run your own DCC server, you need to point your DCC client to someone else's server. The DCC client code does the right thing when it cannot contact any of the servers it knows about; it quickly passes the mail without worrying about its bulkiness. Given more than one server, the DCC client code uses the fastest or closest.
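
The client behavior described above (prefer the fastest server, and pass mail when no server answers) can be sketched as follows. This is an illustration in Python rather than the DCC client's actual logic; the function names and the round-trip times are hypothetical.

```python
def pick_server(round_trip_times):
    """Return the name of the reachable server with the lowest round-trip
    time, or None when no server is reachable.

    round_trip_times maps a server name to seconds, or to None when the
    server did not answer.
    """
    reachable = {name: rtt for name, rtt in round_trip_times.items()
                 if rtt is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


def handle_message(round_trip_times, query):
    """Ask the best server about a message, or pass the mail unchecked
    when no server can be contacted."""
    server = pick_server(round_trip_times)
    if server is None:
        return "pass"
    return query(server)


if __name__ == "__main__":
    measured = {
        "dcc1.dcc-servers.net": 0.08,   # hypothetical timings
        "dcc2.dcc-servers.net": 0.03,
        "dcc3.dcc-servers.net": None,   # no answer
    }
    print(pick_server(measured))
    print(handle_message({"dcc1.dcc-servers.net": None}, lambda s: "asked"))
```

Passing mail when every server is unreachable keeps an outage of the DCC from delaying or losing mail, at the cost of letting some bulk mail through.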

When using someone else's server, you must contact its operators for a DCC client-ID and corresponding password, unless the server accepts anonymous clients.

Public DCC servers for anonymous DCC clients handling fewer than 100,000 mail messages per day are provided by the people and organizations in the following list. The default contents of the /var/dcc/map file point to these servers.

Note well that it has never been right to take and resell the bandwidth and, most important, human system administration work of the public DCC servers to third parties. Blunt words for that include theft and stealing. Vendors of "spam appliances" or services including the DCC such as "managed email" must provide DCC servers of their own or contract for DCC services from others. They must also buy a license for the commercial version of the DCC software.

Organization                                                   Contact
College of New Caledonia                                       Kevin W. Gagel
Cumberland Technologies                                        DCC Administration
BlackSpider Technologies                                       David Saunders
DelMarVa OnLine                                                Sven Willenberger
www.eatserver.nl                                               dcc@eatserver.nl
Etherboy.com                                                   Dave Lugo
Gdansk University of Technology                                Krzysztof Snopek
INFN (National Institute for Nuclear Physics) - Bari           Domenico Diacono
INFN (National Institute for Nuclear Physics) - Turin          Alberto D'Ambrosio
Iron Hill, LLC                                                 Frank Black
MGT Consulting                                                 dcc@misty.com
--                                                             Vincent Schonau
INAF IASF (National Institute for Astrophysics)-Palermo-Italy  Giacomo Fazio
Quanteam                                                       Soeren Gerlach
Pacific Internet, Singapore (NASDAQ: PCNTF)                    Hostmaster
Sihope Communications                                          Sihope NOC
Sonic.net, Inc.                                                Kelsey Cummings
Telefonica O2 Czech Republic                                   Pavel Urban and Lenka Sevcikova
Tilastokeskus - Statistikcentralen                             --
UNC Wilmington                                                 Tony Copeland
Universität Trier                                              Horst Scheuermann
Vienna University of Economics and Business Administration     Georg Graf

The IP addresses of the public DCC servers are published under the DNS names dcc1.dcc-servers.net, dcc2.dcc-servers.net, dcc3.dcc-servers.net, dcc4.dcc-servers.net, and dcc5.dcc-servers.net. Use them by adding those names to your /var/dcc/map file with cdcc "add dcc1.dcc-servers.net" and so forth. The names are automatically installed in the file when the DCC programs are installed with the ./configure script and Makefile in the source. See the installation instructions.

Flooding Checksums among Private DCC servers

The effectiveness of DCC filtering increases with checksums "flooded" or exchanged with other DCC servers. Running a local, private server without connecting it to the global network of DCC servers violates the free license, and the spam filtering results may be disappointing.

Mail systems that handle more than 100,000 mail messages per day should have a local DCC server so that processing incoming mail is not delayed by the time required for the UDP packets used by the DCC client protocol to cross the Internet. Organizations that deal with more than 500,000 mail messages per day benefit from two or more local DCC servers to ensure that at least one local DCC server is available despite system maintenance. Organizations that deal with fewer than 100,000 mail messages per day use less bandwidth of their own and of the servers in the global network by using the public servers.
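
Those sizing rules can be summarized in a few lines of Python. The thresholds come from the paragraph above; the function name is hypothetical, and the treatment of a volume of exactly 100,000 or exactly 500,000 messages is an assumption, since the text only speaks of "more than" and "fewer than."

```python
def local_servers_recommended(messages_per_day: int) -> int:
    """Minimum number of local DCC servers suggested for a mail volume.

    0 means the public servers are the better choice; 2 stands for
    "two or more".
    """
    if messages_per_day > 500_000:
        return 2
    if messages_per_day > 100_000:
        return 1
    return 0  # assumption: exactly 100,000 is still treated as a small site


if __name__ == "__main__":
    for volume in (50_000, 200_000, 600_000):
        print(volume, local_servers_recommended(volume))
```

A site near either threshold should weigh mail delays and maintenance windows rather than rely on the exact cut-off.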

See the DCC FAQ about the resources needed by a DCC server.

The first step in configuring a DCC server to flood checksums is agreeing on the server-IDs of all participating servers. There is a private list of the DCC servers, server-IDs and so forth in the global network of DCC servers at http://www.rhyolite.com/dcc/private/. It is readable only by server operators. Contact vjs@rhyolite.com for server-IDs. Subscriptions to the DCC-servers mailing list are available only to operators of servers in the global network.

Other Resources

Whitelists
Use of the DCC to reject unsolicited bulk mail generally requires a whitelist of solicited bulk mail sources in the local common /var/dcc/whiteclnt file or in a per-user whiteclnt file.

Whitelist of blank or test messages
It can be useful to whitelist practically blank messages from various sources and common test messages. I.E.C.C. offers such a whitelist of blank messages that can be copied or included into a /var/dcc/whiteclnt file.

The DCC source includes a script named /var/dcc/libexec/fetch-testmsg-whitelist intended to be invoked by cron to periodically fetch new copies.

Blacklists
Blacklists such as those used at rhyolite.com can be used as "spam traps" to feed the DCC. For example, sendmail can use an "access_db" to mark spam, and then report it via dccm.

DNS Blacklists
The DCC clients dccm, dccifd, and dccproc can check domain names and IP addresses in SMTP envelope Mail_From values and in URLs in mail message bodies against DNS blacklists (DNSBL) such as the SBL. See the installation instructions and DNSBL_ARGS in the configuration file, dcc_conf, in the home directory.

Greylisting
The DCC sendmail milter, dccm, and the general MTA interface, dccifd, can use a form of greylisting.

Logos
Some logos that can be displayed on web pages are available.

CGI Demonstration
There is a demonstration of the proof-of-concept CGI scripts that allow users to maintain their whitelists and monitor their individual logs of rejected mail at https://www.rhyolite.com/DCC-demo-cgi-bin or https://cgi-demo:cgi-demo@www.rhyolite.com/DCC-demo-cgi-bin. It requires the user name cgi-demo and the same string, cgi-demo, as the password.

Questions
See the DCC FAQ and the archive of the DCC mailing list for information about connections between the DCC and mail user agents and mail transfer agents.

Technical questions or comments can be sent to Rhyolite Software. More extensive assistance can also be hired from Rhyolite Software.

DCC Reputations

DCC Reputations are a distinct mechanism based on and contributing to DCC data. In part to minimize abuse by anonymous users, DCC Reputations are available only in the commercial version of the DCC software.

History

The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001.

Contact vjs@rhyolite.com by mail or using the form. Do not send mail to the spam trap.

$Date: 2008/10/09 01:29:02 $