Looking for critique of idea for local integration of DCC and SA

Schmitt, Andy C - CISU-2 acschmitt@bpa.gov
Thu Aug 29 16:21:26 UTC 2002

That sounds pretty neat; I've run into the problem of storing gigabytes of
utter garbage just because it was addressed to a different person each time.
I hadn't thought of your method.

However, I think you'll outsmart yourself with this trick unless you have a
very small user base.  Note that I haven't used SA before and don't know
your setup, so some of the points below might not apply.

First of all, the human factor is important; Jill on 4th floor might not
want mail that begins with "Dear Bob,".  The fuzzy checksum would be the
same; if you used a stricter checksum, you'd end up with separate copies of
messages just like before, and wouldn't get anywhere.

Second, user confirmation codes also fall into this category.  Something
that sends a large-bodied, identical message with a 4-letter confirmation
code for different people might result in overwritten messages, causing
wailing and gnashing of teeth amongst your users when you resend.
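
To make the overwrite risk concrete, here is a minimal sketch in Python. The
checksum below is a toy stand-in invented for illustration (it simply ignores
words shorter than five letters); it is NOT DCC's actual fuzzy checksum, and
the function names and sample messages are hypothetical.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Tuple


def toy_fuzzy_checksum(body: str) -> str:
    """Toy stand-in for a fuzzy checksum (an assumption, not DCC's algorithm).

    Only runs of five or more letters are hashed, so short tokens such as
    a first name or a 4-letter confirmation code do not affect the result.
    """
    words = re.findall(r"[a-z]{5,}", body.lower())
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def quarantine(directory: Path, body: str) -> Path:
    """Store the message under its checksum, overwriting any earlier copy."""
    path = directory / toy_fuzzy_checksum(body)
    path.write_text(body)
    return path


def demo() -> Tuple[int, str]:
    """Quarantine two near-identical messages; return file count and contents."""
    with tempfile.TemporaryDirectory() as tmp:
        qdir = Path(tmp)
        quarantine(qdir, "Dear Bob, your confirmation code is QXZT. "
                         "Please reply within seven days.")
        last = quarantine(qdir, "Dear Jill, your confirmation code is MRKP. "
                                "Please reply within seven days.")
        return len(list(qdir.iterdir())), last.read_text()
```

Under these assumptions, only one file survives and it holds Jill's code;
Bob's copy, and with it his confirmation code, has been overwritten.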

Third, the only thing you save is space, and not much of that, since mail
is generally small and easily compressed.  You still have to resend
quarantined messages to the same set of people.  This time, however, instead
of just grepping your archives for a subject or sender, then grepping the
resulting file list for the recipient names and mailing the corresponding
files, you have to figure it out from a list without any hard
information...at least if you're using Sendmail, it becomes a miserable grep
'n' guess job.  Maybe SpamAssassin has better logging.
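
For comparison, the two-pass search over per-recipient copies described above
could look roughly like this in Python. It assumes, purely for illustration,
that the archive is a directory holding one RFC 822 message per file; the
function name and that layout are my own invention, not anything from
Sendmail or SpamAssassin.

```python
import email
from email import policy
from pathlib import Path
from typing import List


def find_copies(archive_dir: Path, subject_fragment: str,
                recipients: List[str]) -> List[Path]:
    """Return archived messages matching a subject and any wanted recipient.

    Assumes one message per file in archive_dir (a hypothetical layout).
    Matching is a case-insensitive substring test, much like grep -i.
    """
    matches = []
    for path in sorted(archive_dir.iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            msg = email.message_from_binary_file(handle, policy=policy.default)
        subject = str(msg.get("Subject", "")).lower()
        to_header = str(msg.get("To", "")).lower()
        if subject_fragment.lower() not in subject:
            continue  # first pass: subject does not match
        if any(r.lower() in to_header for r in recipients):
            matches.append(path)  # second pass: recipient matches
    return matches
```

The files returned are the ones you would then re-mail to their recipients.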

My recommendation: Just cron something to compress the stuff and delete it
after a while, maybe compress daily and delete a day's worth after a month
or so; that won't use much space.  Anyway, I'm always leery of automatic
things that overwrite data unless the data can be regenerated in five
seconds.  Anyone have any other thoughts, refutations, etc?
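
A rough sketch of the kind of job I mean, in Python, meant to be run once a
day from cron. The one-day and thirty-day thresholds, the function name, and
the one-file-per-message layout are all assumptions made for illustration;
adjust to taste.

```python
import gzip
import os
import shutil
import time
from pathlib import Path
from typing import Optional, Tuple

DAY = 24 * 60 * 60  # seconds


def compress_and_expire(quarantine_dir: Path, keep_days: int = 30,
                        now: Optional[float] = None) -> Tuple[int, int]:
    """Gzip files older than a day; delete anything older than keep_days.

    Returns (number compressed, number deleted). Age is judged by mtime,
    which is copied onto the .gz file so that compression does not reset
    the expiry clock.
    """
    now = time.time() if now is None else now
    compressed = deleted = 0
    for path in sorted(quarantine_dir.iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        age = now - mtime
        if age > keep_days * DAY:
            path.unlink()
            deleted += 1
        elif path.suffix != ".gz" and age > DAY:
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.utime(target, (mtime, mtime))  # preserve the original age
            path.unlink()
            compressed += 1
    return compressed, deleted
```

Note that this one deletes data too, but only after a month of sitting
around, which gives users plenty of time to complain first.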

								Andy Schmitt
								BPA Unix

-----Original Message-----
From: Dale_Whiteaker-Lewis@Dell.com
Sent: Thursday, August 29, 2002 8:33 AM
To: dcc@calcite.rhyolite.com
Subject: Looking for critique of idea for local integration of DCC and

	I've been using DCC off and on with SpamAssassin for a project and
ran up against the requirement to quarantine all mail that would otherwise
be blocked by both tools.  I've used a combination of recipient re-writing
and procmail to log and quarantine the messages on a reiserfs file system.
The method of centrally storing one copy of binary attachments described in
the documentation led me to an idea I'd like to broach.  
	If, upon classifying a message as "bulk", DCC (through dccm) were to
mark the headers with the actual hash that exceeded the threshold (not sure
that's feasible), the hash itself could be used as the filename in
quarantine.  This would have the advantage of continually overwriting a
single copy of the bulk message, rather than quarantining thousands of
near-identical copies.  Why would I go to these lengths?  If a message were
seen as bulk, yet was business critical, a single copy of it would exist in
the quarantine and could be searched for and retrieved using data in the
procmaillog file.  This occurs to me as one way to provide most of the
benefit of DCC to my network infrastructure with the assurance that no data
would be lost.  Messages that did not exceed any threshold would be stored
	Thoughts, anyone?  
