dccm and dccd (greylist) - Another newbie-

Vernon Schryver vjs@calcite.rhyolite.com
Sun Jan 15 03:21:42 UTC 2006

> From: "Paul R. Ganci" <ganci@nurdog.com>

> >Why not let sendmail+dccm reject spam?  Why complicate it by involving
> >SpamAssassin?  
> >
> Because of the chance of a false positive (FP). SpamAssassin allows one 
> to process the added DCC header to one's content. Therefore I have finer 
> control over what is considered spam. In this instance it is a "black & 
> white" decision made on something that actually is grey.

On the contrary, checking the archives of this mailing list would show
that using SpamAssassin as a substitute for DCC whitelists does not
prevent false positives.

> Okay one can do this but then what does one do with scores < 
> DCCM_REJECT_AT? Perhaps some percentage of that is spam too. 

So use local blacklists, DNS blacklists, DCC reputations, SpamAssassin,
or other mechanisms to handle those DCC false negatives.

>                                                              Most 
> probably some percentage of rejected Email was legitimate. 

For the DCC that percentage is practically 0% if you maintain whitelists
of solicited bulk mail senders.  Unless you are running a mail system
where you get to define acceptable email, such as a company mail system,
that probably requires per-user whitelists instead of just the
global /var/dcc/whiteclnt file.

>                                                            Effective 
> spam detection requires a multi-layered, multi-tool approach. 

I agree with that.

>                                                               Set 
> DCCM_REJECT_AT too low and you get FPs. 

That is wrong, provided you use the DCC as designed and so have
proper whitelists.

>                                         Set it too high and you get too 
> much spam allowed in. I don't believe a single DCCM_REJECT_AT score 
> addresses this problem.

You've evidently (and I think admittedly) never really tried to use
the DCC as designed.

> A while back I was asking about a two tier system where there were two 
> controls ... namely a DCCM_REJECT_AT control which DCC uses to reject 
> Email and a DCCM_BULK_AT control which DCC uses to only add a bulk 
> designation to the DCC header ... i.e. no rejection. Then I could set 
> the DCCM_REJECT_AT to a "large" value which would reject the most 
> obvious stuff with low risk for FP and let my SpamAssassin use the 
> "bulk" DCC header to determine what to do with stuff which was scored 
> DCCM_BULK_AT <= Score < DCCM_REJECT_AT. Email scored <DCCM_BULK_AT would 
> not be assessed any SpamAssassin penalty. With two DCC controls I can 
> assert much finer influence over what is bulk and what isn't and make 
> the most efficient use of DCC and SpamAssassin together in regards to 
> the number of FPs found and computer resources used.

Didn't I answer that you might run two DCC daemons?  One would reject
with one DCCM_REJECT_AT threshold, while the other would use -QAaIGNORE
and only add headers.  If I didn't, it might have been because I think
it would be a poor idea for several reasons.
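
For illustration only, the two-threshold decision described in the quoted
text can be sketched in Python. This is not how dccm works: DCCM_BULK_AT
is the hypothetical second control proposed above, and the function and
parameter names are invented for this sketch. It also assumes plain
integer checksum counts and ignores the special count "many".

```python
def classify(dcc_count, bulk_at, reject_at):
    """Map a DCC checksum count to 'reject', 'bulk', or 'pass'.

    bulk_at and reject_at stand in for the hypothetical DCCM_BULK_AT
    and the DCCM_REJECT_AT thresholds discussed in the thread.
    """
    if bulk_at > reject_at:
        raise ValueError("bulk_at must not exceed reject_at")
    if dcc_count >= reject_at:
        return "reject"   # obvious bulk mail: refuse during the SMTP transaction
    if dcc_count >= bulk_at:
        return "bulk"     # only mark the header; a later filter decides
    return "pass"         # below both thresholds: no penalty


if __name__ == "__main__":
    for count in (3, 50, 5000):
        print(count, classify(count, bulk_at=20, reject_at=1000))
```

Running two daemons, one that rejects and one that only adds headers,
gives roughly the same three outcomes without any new setting.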

> >   - using site-local as well as per-user logs and whitelists to
> >      identify solicited bulk email.

> For a large site this still seems to me to be impractical.

Yes, for the largest sites, there are problems with DCC whitelists.
However, it seems clear to me that the other person is not running a
mail system for 100,000 or more users.  If you are also not running a
big site, why do you care about such problems?

One can speculate forever about problems entirely outside one's
experience or even academic knowledge, but usually only to waste time.

> For a large site this still seems to me to be impractical. Some things 
> come to mind:
>     1.) How large can a whitelist grow before it takes a figurative 
> "days" for DCC to read and process?

Because that question necessarily assumes I am an incompetent boob,
it offends me.  Have you done any benchmarking to see if that might
be a problem?  Maybe I am an incompetent poseur, but
  - DCC whitelists are hash tables, which implies that their performance
      after being (re)built is close to O(1).
  - DCC whitelists use CIDR blocks to handle the main source of growth,
     blocks of IP addresses
  - there is no reason to have large enough whitelists that might
     cause problems.  
  - because they are simple hash tables, they'd break before they got
     that big.
  - no plausible per-user DCC whitelist would be large enough to matter
  - to prevent speed problems related to DNS delays, host names cannot be
     in per-user DCC whitelists
  - dccm and dccifd let a separate thread rebuild the main hash
     so that the normal DCC work can continue in parallel with waiting for
     DNS answers.
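
To make the first two points concrete, here is a minimal Python sketch of
a whitelist that keeps sender addresses in a hash set and covers blocks
of IP addresses with CIDR entries. It is not the DCC implementation; the
class and method names are made up for the example. The point is that a
lookup costs about the same however many entries the list holds, because
the work depends on the number of distinct prefix lengths, not on the
number of whitelisted addresses.

```python
import ipaddress


class Whitelist:
    """Toy whitelist: hashed sender addresses plus hashed CIDR blocks."""

    def __init__(self):
        self.senders = set()
        # (IP version, prefix length) -> set of network addresses as integers
        self.networks = {}

    def add_sender(self, address):
        self.senders.add(address.lower())

    def add_cidr(self, cidr):
        net = ipaddress.ip_network(cidr, strict=False)
        key = (net.version, net.prefixlen)
        self.networks.setdefault(key, set()).add(int(net.network_address))

    def has_sender(self, address):
        return address.lower() in self.senders

    def has_ip(self, ip):
        addr = ipaddress.ip_address(ip)
        bits = addr.max_prefixlen
        # One hash probe per prefix length in use, not one per entry.
        for (version, plen), nets in self.networks.items():
            if version != addr.version:
                continue
            mask = ((1 << plen) - 1) << (bits - plen)
            if (int(addr) & mask) in nets:
                return True
        return False


if __name__ == "__main__":
    wl = Whitelist()
    wl.add_cidr("192.0.2.0/24")          # one entry covers 256 addresses
    wl.add_sender("news@example.com")
    print(wl.has_ip("192.0.2.77"), wl.has_ip("198.51.100.1"))
    print(wl.has_sender("News@Example.com"))
```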

In other words, with the help of people running dccm with DCC whitelists
at sites with 30K or more users, I think the significant performance/size/etc.
problems of DCC whitelists have been anticipated or found and addressed.

>     2.) When can I safely turn on DCCM_REJECT_AT? Even for my small site 
> (400 subs) we are growing. I can monitor the logs for a time, create my 
> whitelist and then turn on DCCM_REJECT_AT. However as soon as I add my 
> next batch of new users aren't they subject to losing legitimate bulk 
> Email not found in my whitelist? I argue new users will sooner or later 
> end up with a false positive.  As I am a volunteer for a rural, 
> mountain internet coop in the CO Rockies, I don't have time to monitor 
> logs and maintain whitelists given my day job responsibilities. 
> Moreover, I try to avoid FPs like the plague. I just get tired hearing 
> complaints from those 400 subs ... the job just doesn't pay enough. :)

Why do you ignore what I keep writing about per-user whitelists and
controls?  Why not do as some ISPs do and let individual users monitor
and control things themselves?  At least one ISP (in Colorado by
coincidence) has built web pages that function somewhat like the proofs
of concept in the DCC cgi-bin directory that let individual users
maintain their own whitelists and turn DCC rejections on and off.

You can see a demo of the proof of concept scripts by following the link in 
(user name cgi-demo and password cgi-demo).

>     3.) How do the scripts work when an organization has multiple Email 
> servers with multiple instances of DCC? How is all the data from the 
> various logs combined to form one unique whitelist used by all flooded 
> servers?

I would use one of the standard tools to solve such a problem. 
If your system has that problem, I guess we could talk about the
standard solutions.

>    4.) How would users maintain their own DCC whitelists from a single 
> location given that there exist multiple DCC servers and the fact that 
> the incoming Email servers where DCC resides are not the servers with 
> user accounts?

See #3.

>    5.)  Unfortunately both DCC and SpamAssassin allow for both global 
> and per user whitelisting. How do I reconcile with the end user that he 
> has to whitelist senders in two places now? I don't believe I can make a 
> single whitelist available to both tools ... hence I have 2 times the 
> work and twice the user hassle.

To keep my sanity I resolutely pay little attention to SpamAssassin.
With that I don't mean to say anything bad about SpamAssassin, but to
point out that I don't control or even influence the SA source.

I suspect one could convert DCC whitelists to SA whitelists or vice
versa.  I vaguely recall some talk in this mailing list or elsewhere
about something like that.
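
As a rough idea of what such a conversion might look like, here is a
Python sketch. It rests on assumptions that should be checked against
the documentation of both tools: that the DCC whitelist entries of
interest have the simple form "ok env_from address" or "ok from address",
and that SpamAssassin's whitelist_from directive is an acceptable target.
The function name is invented, and everything else in a whitelist (IP
blocks, other counts, other header types) is simply skipped.

```python
def dcc_to_sa(lines):
    """Turn simple DCC 'ok' sender entries into whitelist_from lines.

    Assumes entries of the form '<count> <type> <value>' with an
    optional trailing '#' comment; anything else is ignored.
    """
    converted = []
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        fields = line.split()
        if len(fields) != 3:
            continue
        count, kind, value = fields
        if count.lower() == "ok" and kind.lower() in ("env_from", "from"):
            converted.append(f"whitelist_from {value}")
    return converted


if __name__ == "__main__":
    sample = [
        "# solicited bulk mail",
        "ok env_from news@example.com   # newsletter",
        "ok ip 192.0.2.0/24",
        "many from junk@example.net",
    ]
    for entry in dcc_to_sa(sample):
        print(entry)
```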

Vernon Schryver    vjs@rhyolite.com
