white lists

Earl Killian dcc@lists.killian.com
Sat Apr 20 16:40:05 UTC 2002


Vernon Schryver writes:
 > Date: Sat, 20 Apr 2002 10:21:08 -0600 (MDT)
 > From: Vernon Schryver <vjs@calcite.rhyolite.com>
 > 
 > > Seven rules (18%) consist of lists of domains using the pattern (not a
 > > regular expression):
 > >   *.DOMAINNAME0 *.DOMAINNAME1 *.DOMAINNAME2 ...
 > 
 > What is the difference between a "pattern" and a "regular expression"?
 > Perhaps an initial or terminal substring match ignoring case?

A pattern is either a string with "*"s in it for shell-like matching,
or /regularexpression/.  I never used the latter.
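
To make the distinction concrete, here is a minimal sketch in Python of how the two pattern forms could be told apart and applied. The function name `matches` is hypothetical, and ignoring case is an assumption; this is not the code of the filter being discussed. Note that Python's `fnmatch` also treats `?` and `[...]` as special, which may be more than the original "*" matching did.

```python
import fnmatch
import re


def matches(pattern, address):
    """Return True if `address` matches `pattern`.

    A pattern written as /.../ is treated as a regular expression;
    anything else is treated as a shell-style wildcard string.
    Case is ignored in both forms (an assumption for this sketch).
    """
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        # Strip the surrounding slashes and search with the regex.
        return re.search(pattern[1:-1], address, re.IGNORECASE) is not None
    # Shell-like matching: "*" matches any run of characters.
    return fnmatch.fnmatchcase(address.lower(), pattern.lower())


if __name__ == "__main__":
    print(matches("*.example.com", "mail.example.com"))
    print(matches("*.example.com", "example.com"))
    print(matches(r"/^mail[0-9]+\.example\.org$/", "mail12.example.org"))
```

With these semantics "*.example.com" matches hosts under example.com but not the bare name example.com, since the pattern requires the leading dot.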

 > that sounds quite similar to a sendmail access_db.
 > I've 1000's of IP addresses and domain names in mine.

Yes, except that most of these patterns were used in a whitelist (like
the sendmail access_db OK feature?).  I don't use access_db because I
don't use sendmail on port 25 -- I don't trust its security well
enough to do that.

 > I'd hope a system dealing with several 100,000 messages per day would
 > not need to do several 100 string comparisons on each at least two
 > and perhaps all headers in all messages.

Again, this was for the envelope of the message, not the header of the
message.

Also, I can think of very efficient ways to organize the wildcard
tests.
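
One such arrangement, sketched in Python under the assumption that every pattern has the form "*.DOMAIN": keep the domains in a hash set and probe it once per parent domain of the host being tested. The cost then depends on the number of labels in the host name, not on the length of the list. The names `build_suffix_set` and `is_whitelisted` are hypothetical, and this is only one possibility, not necessarily the scheme alluded to above.

```python
def build_suffix_set(patterns):
    """Turn patterns like '*.example.com' into a set of bare domains."""
    suffixes = set()
    for pattern in patterns:
        if not pattern.startswith("*."):
            raise ValueError("only '*.DOMAIN' patterns are handled: %r" % pattern)
        suffixes.add(pattern[2:].lower())
    return suffixes


def is_whitelisted(host, suffixes):
    """Return True if any proper parent domain of `host` is in `suffixes`.

    For a.b.example.com this probes b.example.com, example.com and com,
    so the work is one set lookup per label rather than one string
    comparison per pattern.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return True
    return False


if __name__ == "__main__":
    allowed = build_suffix_set(["*.example.com", "*.example.org"])
    print(is_whitelisted("mail.example.com", allowed))
    print(is_whitelisted("notexample.com", allowed))
```

Patterns with a "*" in other positions would need a different structure (or a fallback to one-by-one matching), which this sketch does not attempt.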

Finally, see my follow-up post about the caveat.  A lot of those
strings are there to prevent things from being RBL'd, not to prevent
them from being DCC'd.  I just thought the statistics on patterns were
interesting.

 > At least the sendmail
 > access_db scheme consists of parsing the header into parts and then
 > doing a handful of hash table (Berkeley DB) lookups on each part.

Not being a user of access_db, I can't comment on it very
intelligently, but it doesn't appear to support wildcards (true?),
which is an issue.

-Earl


