Paul R. Ganci
ganci@nurdog.com
Sat Jan 14 18:45:32 UTC 2006
Vernon Schryver wrote:
>Why not let sendmail+dccm reject spam? Why complicate it by involving
>SpamAssassin?
>
Because of the chance of a false positive (FP). SpamAssassin lets one
process the added DCC header however one likes, so I have finer control
over what is considered spam. Rejecting in sendmail+dccm is a "black &
white" decision made on something that actually is grey.
>As I've said many times, I think that is wrong. The right way to use the
>DCC has nothing to do with SpamAssassin. Assuming you are using sendmail,
>it consists of:
>
> - installing dccm as described in the INSTALL.txt or .html file
>
> - letting sendmail+dccm reject unsolicited bulk email
> To do that, set DCCM_REJECT_AT in /var/dcc/dcc_conf to what
> you consider "bulk". Common choices range from 5 to 500, with
> small values appropriate for small sites
>
>
Okay, one can do this, but then what does one do with scores <
DCCM_REJECT_AT? Some percentage of that mail is probably spam too, and
most probably some percentage of the rejected Email was legitimate. Effective
spam detection requires a multi-layered, multi-tool approach. Set
DCCM_REJECT_AT too low and you get FPs. Set it too high and you get too
much spam allowed in. I don't believe a single DCCM_REJECT_AT score
addresses this problem.
A while back I was asking about a two tier system where there were two
controls ... namely a DCCM_REJECT_AT control which DCC uses to reject
Email and a DCCM_BULK_AT control which DCC uses to only add a bulk
designation to the DCC header ... i.e. no rejection. Then I could set
the DCCM_REJECT_AT to a "large" value which would reject the most
obvious stuff with low risk for FP and let my SpamAssassin use the
"bulk" DCC header to determine what to do with stuff which was scored
DCCM_BULK_AT <= Score < DCCM_REJECT_AT. Email scored <DCCM_BULK_AT would
not be assessed any SpamAssassin penalty. With two DCC controls I can
exert much finer influence over what is bulk and what isn't, and make
the most efficient use of DCC and SpamAssassin together with regard to
the number of FPs found and computer resources used.
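
The two-control idea above can be sketched roughly as follows. This is only an illustration in Python, not anything DCC ships: DCCM_BULK_AT is the control being asked for and does not exist in dcc_conf today, and the function name `classify` and the example thresholds (20 and 500) are made up for the sketch.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"      # below DCCM_BULK_AT: no SpamAssassin penalty
    TAG_BULK = "tag_bulk"  # mark as bulk in the DCC header, let SpamAssassin decide
    REJECT = "reject"      # at or above DCCM_REJECT_AT: reject at the MTA


def classify(count, bulk_at, reject_at):
    """Map a DCC checksum count onto the proposed two-tier scheme.

    bulk_at stands in for the hypothetical DCCM_BULK_AT control,
    reject_at for the existing DCCM_REJECT_AT control.
    """
    if bulk_at > reject_at:
        raise ValueError("bulk_at must not exceed reject_at")
    if count >= reject_at:
        return Action.REJECT
    if count >= bulk_at:
        return Action.TAG_BULK
    return Action.ACCEPT


if __name__ == "__main__":
    # Example thresholds only; pick values that suit the site.
    for count in (3, 20, 499, 500):
        print(count, classify(count, bulk_at=20, reject_at=500).value)
```

Only mail in the middle band would cost SpamAssassin any extra work, which is the point of having the second control.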
> - using site-local as well as per-user logs and whitelists to
> identify solicited bulk email.
> To do that, follow comments in /var/dcc/dcc_conf about setting
> DCCM_LOG_AT= your notion of bulk, leaving DCCM_REJECT_AT blank,
> and monitoring bulk mail in /var/dcc/log. Each time you see
> solicited bulk mail, whitelist the sender.
> That can be done by pointing-and-clicking if you set up the
> CGI scripts as described in /var/dcc/cgi-bin/README.
>
>
For a large site this still seems to me to be impractical. Some things
come to mind:
1.) How large can a whitelist grow before it takes a figurative
"days" for DCC to read and process?
2.) When can I safely turn on DCCM_REJECT_AT? Even for my small site
(400 subs) we are growing. I can monitor the logs for a time, create my
whitelist and then turn on DCCM_REJECT_AT. However, as soon as I add my
next batch of new users, aren't they subject to losing legitimate bulk
Email not found in my whitelist? I argue new users will sooner or later
end up with a false positive. As I am a volunteer for a rural,
mountain internet coop in the CO Rockies, I don't have time to monitor
logs and maintain whitelists given my day job responsibilities.
Moreover, I try to avoid FPs like the plague. I just get tired of hearing
complaints from those 400 subs ... the job just doesn't pay enough. :)
3.) How do the scripts work when an organization has multiple Email
servers with multiple instances of DCC? How is all the data from the
various logs combined to form one unique whitelist used by all flooded
servers?
4.) How would users maintain their own DCC whitelists from a single
location given that there exist multiple DCC servers and the fact that
the incoming Email servers where DCC resides are not the servers with
user accounts?
5.) Unfortunately, both DCC and SpamAssassin allow for both global
and per-user whitelisting. How do I explain to the end user that he now
has to whitelist senders in two places? I don't believe I can make a
single whitelist available to both tools ... hence I have twice the
work and twice the user hassle.
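
For question 5, one conceivable workaround (untested, and only a sketch) would be to keep one master list of sender addresses and generate an entry for each tool from it. The Python below assumes DCC whitelist lines of the form `ok env_from address` and SpamAssassin `whitelist_from address` lines. The function name `render_whitelists` is hypothetical, and the exact DCC entry syntax should be checked against the DCC documentation before relying on it.

```python
def render_whitelists(senders):
    """Build DCC-style and SpamAssassin-style whitelist text from one list.

    Addresses are trimmed, lower-cased, de-duplicated and sorted so that
    both outputs stay in step with each other.
    """
    cleaned = sorted({s.strip().lower() for s in senders if s.strip()})
    # Assumed DCC whitelist entry form: "ok <TAB> env_from <TAB> address"
    dcc_text = "\n".join(f"ok\tenv_from\t{addr}" for addr in cleaned)
    # SpamAssassin configuration directive, one address per line
    sa_text = "\n".join(f"whitelist_from {addr}" for addr in cleaned)
    return dcc_text, sa_text


if __name__ == "__main__":
    dcc_text, sa_text = render_whitelists(
        ["News@Example.com ", "news@example.com", "list@example.org"]
    )
    print(dcc_text)
    print(sa_text)
```

Note that the two tools do not match on exactly the same thing (envelope sender versus message headers), so the generated files would be an approximation of each other at best.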
Admittedly questions 3-5 are likely due to my lack of understanding of
what the scripts do and my opinion that whitelists/blacklists are just
too dynamic to effectively maintain. Our Coop had a serious user revolt
on its hands when we attempted to reject Email at the MTA based upon
public DNSBLs. Even our private sendmail whitelist/blacklist constructed
from our sendmail log files was too volatile to maintain. Hence we
purposely went to a SpamAssassin approach and leave it to the user to
decide what to do with Spam at least to some degree. The Coop is still
willing to reject outright "high" enough scoring stuff since a
statistical analysis indicated it was "always" spam. Hence the reason I
am pushing for two DCC controls and have some fear of outright Email
rejection.
Vernon, I want to make it perfectly clear I am not picking on DCC ... it
is clearly one of the most useful UBE killers I have found. I also
understand why you suggest the usage you do. Clearly catching bulk
before it ever gets to MailScanner or SpamAssassin is worthwhile ... it
avoids the resource hogs they are. However, in the scenario I describe,
I don't have to maintain a DCC whitelist, I can still reject some bulk
Email upfront with less FP risk, and can process questionable stuff in
more detail at some resource cost. This spam detection procedure would
work best in my situation. I am sure there are many others who will
differ with this opinion.
--
Paul (ganci@nurdog.com)