Hello, all...

Vernon Schryver vjs@calcite.rhyolite.com
Mon Jan 14 18:16:07 UTC 2002

> From: <dcc@kstone.win.net>

>        ...  I've hacked dccproc and qmail-qfilter to provide DCC
> checksumming in qmail.  ...

What changes to dccproc were needed to make it work with qmail?
I think some people have reported using qmail with procmail and 
something like:

    :0 f
    | /usr/local/bin/dccproc -R -w whiteclnt

combined with some procmail regular-expression parsing to look at
the X-DCC header line.

I hope that with version 1.0.43 of dccproc, the reg-exp matching
will not be needed.  One of the changes in 1.0.43 will be teaching
dccproc about thresholds, as dccm already understands them.  With that
change, dccproc will exit with 0 if the total counts from the DCC server
are below the specified thresholds and with EX_NOUSER if they are above.
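
A minimal sketch, in Python, of how a calling filter might act on that
exit status.  The helper name and the "tempfail" fallback are assumptions
made for illustration and are not part of dccproc; only the 0 and
EX_NOUSER cases come from the description above, and that behavior is
planned for 1.0.43 rather than confirmed.

```python
import os

# EX_NOUSER is defined in <sysexits.h>; Python exposes it as os.EX_NOUSER
# on Unix.  The fallback of 67 is the conventional sysexits.h value.
EX_NOUSER = getattr(os, "EX_NOUSER", 67)


def classify_dccproc_exit(returncode: int) -> str:
    """Map a dccproc exit status to a delivery decision (hypothetical helper)."""
    if returncode == 0:
        return "accept"    # counts were below the specified thresholds
    if returncode == EX_NOUSER:
        return "reject"    # counts were above the specified thresholds
    # Assumption: any other status is treated as a temporary failure.
    return "tempfail"
```

The status would come from whatever runs dccproc, for example the
returncode attribute of a subprocess.run() result.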

> ...
> And that's for a run 4 days.  /var/dcc/libexec/dbclean -i <myserver>
> doesn't seem to eliminate anything.

The idea of the DCC is to count recipients of individual messages,
and reject those messages that are unsolicited and have been sent to many
mailboxes.  Most spam spews start and end within 30 hours, but some
continue to send essentially the same message for months.  Therefore, 
DCC servers want to accumulate reports of checksums for some time.

As the dbclean man page says about -t -e, the dbclean default expiration
time is 7 days for non-bulk reports and about 90 days for bulky checksums.
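
As a rough illustration of those defaults, here is a small Python sketch
of the expiration test.  The function and constant names are hypothetical,
what makes a checksum "bulky" is left to the caller, and the real dbclean
logic may well differ in detail.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

NON_BULK_TTL = 7 * SECONDS_PER_DAY      # default for non-bulk reports
BULK_TTL = 90 * SECONDS_PER_DAY         # "about 90 days" for bulky checksums


def is_expired(age_seconds: int, is_bulk: bool) -> bool:
    """Return True if a record of this age is past its default lifetime."""
    ttl = BULK_TTL if is_bulk else NON_BULK_TTL
    return age_seconds > ttl
```

Under these defaults, nothing in a database that has been running for
only 4 days is old enough to expire, bulky or not.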

> ...
> Using an awk script to match the output of dblist -v, I'm seeing a lot of
> checksums that are identical.
> $2  > 300 && $1 ~ "Body" {print $1 " " $4 " " $5 " " $6}

Yes, the main idea of the DCC is not to collect and spread individual
reports of spam, but to count and so detect bulk mail.
`dccproc -t many` is both a response to the impossibility of keeping
bad guys from applying `dccproc -t 1` a million times to a non-bulk
message and a hack for accepting and spreading reports from good guys
of mail that is very bulky because it is unsolicited.

> Is what im using to pull out any checksum that matches a count of more
> then 300...I was going to feed those to dccsight -t many, as the many tag
> is the onlything I wrote in support to reject email on in the above
> qmail-qfilter/dccproc hack.

I don't understand why dccsight would be used there.

>                             The thing I noticed is that after processing
> dblist -v with the above awk script, I get roughly 50,000 records
> returned, but when uniqed, it winds up being about 80 unique checksums.

That sounds like some kind of mail loop that is reporting the same
messages hundreds of times each (50,000 records over 80 checksums
averages about 625 apiece).
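
One way to see which checksums are being repeated is to count them
instead of only running uniq.  The Python sketch below mirrors the awk
filter quoted above; it assumes nothing about the dblist -v layout beyond
what that script implies (whitespace-separated fields, a count in field 2),
and the function names are made up for illustration.

```python
from collections import Counter


def select_body_records(lines, min_count=300):
    """Yield "$1 $4 $5 $6" for lines where $1 contains "Body" and $2 > min_count."""
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # unlike awk, skip lines too short to hold $6
        try:
            count = int(fields[1])
        except ValueError:
            # Unlike awk, which falls back to a string comparison,
            # non-numeric counts are skipped here.
            continue
        if "Body" in fields[0] and count > min_count:
            yield " ".join((fields[0], fields[3], fields[4], fields[5]))


def duplicate_summary(lines, min_count=300):
    """Count how many times each selected checksum line appears."""
    return Counter(select_body_records(lines, min_count))
```

Calling duplicate_summary(...).most_common(10) on the dblist -v output
would list the ten most repeated checksums with their repeat counts,
which should make a loop easy to spot.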

> 1.  What gives?  Why is there checksums the server is identifing as being
> seperate, although they probably shouldn't be?  Is that because of the
> mutated from/to/subjects?

The From, To, and Subject checksums are not useful for rejecting mail.
The To checksum is never sent to a DCC server for various reasons including
privacy concerns.  The Subject checksum is useless, and I'm going to
remove it from the clients with #ifdef.  The From and envelope From
checksums are useless for rejecting based on counts.  Consider that
any busy mailing list will soon have large From, Env_From, and IP counts.
Those three are useful for white-listing legitimate mailing lists
and for local blacklisting.  To save space in server databases,
I'm considering no longer sending them to DCC servers.  Each checksum
takes 11 bytes in the server on most platforms.
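
For a sense of scale, a back-of-the-envelope sketch in Python.  It counts
only the 11 bytes per checksum mentioned above and ignores any per-record
overhead, which is an assumption; the function name is hypothetical.

```python
BYTES_PER_CHECKSUM = 11  # figure given above, "on most platforms"


def checksum_bytes(reports: int, checksums_per_report: int) -> int:
    """Estimate the server space used by the checksums alone."""
    return reports * checksums_per_report * BYTES_PER_CHECKSUM
```

If the From, Env_From, and IP checksums were no longer sent, that would
be checksum_bytes(1, 3), or 33 bytes, saved per report.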

> 2. If I feed the uniqed list above into dccsight with a -t many, is it
> going to set every one of those (including the extra ones) to many?

I'm not sure I understand.
If you tell dccsight to report some checksums with "many", then that's
the value that is reported.

> 3.  What arguments should I use with dbclean to erase every entry except
> ones with a body tag of many?

There is no such setting.  However, -e 86400 will expire everything
older than 86,400 seconds (1 day) except "many" counts.

> 4.  I run the crontab example and it doesn't seem to ever erase anything
> and I'm assuming because the default time to live for a record is one week.
> Am I correct in assuming that?
> ...

Vernon Schryver    vjs@rhyolite.com
