Good starting numbers for spamassassins dcc

Vernon Schryver vjs@calcite.rhyolite.com
Sat May 2 13:39:50 UTC 2009


> From: Michał Grzędzicki <lazy@iq.pl>

> By default SpamAssassin uses 99999 as dcc_body/fuz1/fuz2_max, which is
> the same as DCC's "many".
> This is only 1/6th of the messages.

Are you referring to the difference between 19% tagged as "many" and
the 51% with bulky counts according to the graphs for your server at
https://www.rhyolite.com/dcc/private/.... ?
That is a low value.  Are you doing DCC filtering after other filters?

> I'm planning to add variable scoring to SpamAssassin's DCC.PM to make it
> more useful (now only messages with many reports are flagged).
> I'm thinking about 40 reports getting 1/10 of the base score, up to 10,000
> reports (or many; where does it start?) getting the whole base score.
> 500 reports may be treated as likely spam with 1/2 of the base score; in
> between, maybe use 2 linear functions or one of higher order.
>
> Base score should be around 4/5 of mark as spam score.
>
> What would be good thresholds for very unlikely spam, likely spam, and
> surely spam?

I doubt that would help.  The DCC detects bulk email.  Spam is unsolicited
bulk email.  Mail messages that have been seen 100 or 10,000 times are
equally bulky, and neither is more likely to be spam.  Contrast Amazon
online order confirmations with Amazon advertisements.  Both are very
bulky, but only some of the Amazon advertisements are spam.

That is why I have always said the best way to use DCC is with per-user
whitelists.  Each user's whitelist indicates which streams of bulk mail
are solicited.
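
As a rough illustration of that point, the decision can be sketched in
Python. This is not DCC's whitelist format or code; the function name
`is_spam_for_user` and the sender-keyed whitelists are assumptions made
only for the example.

```python
def is_spam_for_user(is_bulk: bool, sender: str, solicited_senders: set) -> bool:
    """Spam is unsolicited bulk mail: bulk AND not whitelisted by this user."""
    if not is_bulk:
        return False
    return sender.lower() not in solicited_senders


# Hypothetical per-user whitelists: each user lists the bulk streams they asked for.
whitelists = {
    "alice": {"order-confirm@shop.example"},
    "bob": set(),
}
```

The same bulk message from order-confirm@shop.example would be accepted
for alice and treated as spam for bob, regardless of how many times it
has been reported.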


I think the SpamAssassin threshold of "many"/99999 is far too high.
The SpamAssassin conversion of "many" to 99999 is a kludge that should
not have been coded.
Instead, SpamAssassin should look for "bulk" in the X-DCC header
and the dccifd or dccproc thresholds should tell dccifd or dccproc
whether to add "bulk".  See DCCM_REJECT_AT and DCCIFD_REJECT_AT
in /var/dcc/dcc_conf.  See also -c and -t in the dccproc and dccifd
man pages; -t could be added to DCCIFD_ARGS.
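
A minimal sketch of that check in Python follows. It assumes the
X-DCC header value has the layout "client server-ID; bulk Body=...
Fuz1=... Fuz2=...", with the word "bulk" present only when a threshold
was reached; the function name `dcc_says_bulk` and the sample values are
made up for the example, and this is not SpamAssassin's code.

```python
def dcc_says_bulk(header_value: str) -> bool:
    """Return True if 'bulk' appears as its own token after the ';'.

    Assumed layout: 'client server-ID; [bulk] Body=N Fuz1=N Fuz2=N'.
    """
    # Everything after the first ';' holds the optional 'bulk' and the counts.
    _, _, metrics = header_value.partition(";")
    return "bulk" in metrics.split()


# Hypothetical header values for illustration only.
tagged = "mail.example.com 1234; bulk Body=many Fuz1=many Fuz2=many"
untagged = "mail.example.com 1234; Body=1 Fuz1=2 Fuz2=2"
```

With this approach the counts never need to be compared against a
number such as 99999; the thresholds stay in the dccifd or dccproc
configuration.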


> I'm guessing this is the right ordering of the body, fuz1, and fuz2
> checksums, with body getting the most reports and fuz2 the fewest.
> Is this right?

If I understand the question, no.  All of the checksums are computed
on all mail messages, but only reports of the most bulky checksums are
flooded among DCC servers.  Body checksums are not at all fuzzy, and
so minimal personalizations can make each copy of spam have differing
DCC body checksums.
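
The effect of a non-fuzzy checksum can be shown with a small Python
sketch. SHA-256 is used here only as a stand-in for an exact checksum;
it is not the DCC body checksum algorithm, and the fuzzy checksums are
not reproduced. The function name and sample bodies are made up.

```python
import hashlib


def exact_checksum(body: str) -> str:
    """Stand-in for a non-fuzzy checksum: any change in the body changes it."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Two copies of the same bulk message, personalized only in the greeting.
copy_for_alice = "Dear Alice,\nBuy now!\n"
copy_for_bob = "Dear Bob,\nBuy now!\n"

# The exact checksums differ, so each copy would be counted separately.
same = exact_checksum(copy_for_alice) == exact_checksum(copy_for_bob)
```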


Vernon Schryver    vjs@rhyolite.com


