Vernon Schryver vjs@calcite.rhyolite.com
Thu Aug 29 16:28:36 UTC 2002

> From: Dale_Whiteaker-Lewis@Dell.com

> 	If, upon classifying a message as "bulk", DCC (through dccm) were to
> mark the headers with the actual hash that exceeded the threshold (not sure
> that's feasible), the hash itself could be used as the filename in
> quarantine.  This would have the advantage of continually overwriting a
> single copy of the bulk message, rather than quarantining thousands of
> near-identical copies.  Why would I go to these lengths?  If a message were
> seen as bulk, yet was business critical, a single copy of it would exist in
> the quarantine and could be searched for and retrieved using data in the
> procmaillog file.  This occurs to me as one way to provide most of the
> benefit of DCC to my network infrastructure with the assurance that no data
> would be lost.  Messages that did not exceed any threshold would be stored
> individually. 

Unless the majority of your mail is spam, wouldn't that last sentence
imply that most of your message storage is spent on legitimate mail?
Why store messages that do not exceed a bulk threshold instead of
delivering them (and so storing them in mailboxes)?

Dccproc is likely to be significantly more expensive than dccm.
I wouldn't be surprised if a busy SMTP server needed to be
extremely muscular to apply SpamAssassin to every message.

The dccm log files contain at most the first 30 KBytes of the body.
A server dealing with 1,000,000 messages/day would need at most about
30 GBytes of storage per day.  So why not just store everything using
dccproc or dccm log files?
(Should there be yet another option to dccm to remove that 30K log limit?)
(dccproc records the entire message.)
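
The 30 GByte figure is just the worst case of every message filling
the 30 KByte log limit.  A rough sketch of that arithmetic follows.
The constant and function names are made up for illustration, KBytes
and GBytes are taken as decimal units, and headers and other log
overhead are ignored, so treat the result as an estimate of the body
storage only.

```python
# Worst-case daily log storage, assuming every message body reaches
# the 30 KByte limit.  All names here are illustrative, not part of DCC.
MESSAGES_PER_DAY = 1_000_000
LOG_LIMIT_BYTES = 30 * 1000  # 30 KBytes, decimal units assumed


def max_daily_log_bytes(messages_per_day, per_message_limit):
    """Upper bound on bytes logged per day for truncated message bodies."""
    return messages_per_day * per_message_limit


total = max_daily_log_bytes(MESSAGES_PER_DAY, LOG_LIMIT_BYTES)
print(f"{total / 1e9:.1f} GBytes/day")
```

Since many messages are shorter than 30 KBytes, the real total would
usually be lower than this bound.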

If you did store a single copy of bulk mail, how would you deal with
the privacy concerns of letting people know who else got the message?
Sometimes the "blind" part of "bcc" is important to people.
(That is why dccm creates separate per-user log files for each addressee
of a single message with many RCPTs, and why sendmail does not put
the envelope addressees in Received headers when there is more than one.)

Where would you record all of the addressees for the single copy
of a message that arrives in separate SMTP transactions?

Vernon Schryver    vjs@rhyolite.com

