white lists

Vernon Schryver vjs@calcite.rhyolite.com
Sat Apr 20 16:21:08 UTC 2002


> From: "Earl Killian" <dcc@lists.killian.com>

> ...
> The problem with procmail is that you would have to run it before DCC
> instead of after.  The current order is typically DCC and then
> procmail.

Why would you have to run procmail before the DCC?  What's wrong with
running a procmail recipe like this on mail that has already been seen
by the DCC?

   :0
   * ^X-DCC.*(Body|Fuz[12])=([0-9]*[0-9][0-9][0-9]|[5-9][0-9]|many)
   # !(^X-DCC-.*OK)
   * !(^From:.*list1@example.com)
   * !(^From:.*list2@example.com)
   {
        EXITCODE=67
        :0
        /dev/null
   }
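
To see which counts the first condition of that recipe catches, here is
a minimal Python sketch that applies the same alternation to sample
header lines.  The function name looks_bulk and the sample headers in
the usage note are made up for illustration; only the regular
expression comes from the recipe.

```python
import re

# Same expression as the recipe's first condition.  procmail matches
# case-insensitively by default, hence re.IGNORECASE here.
DCC_BULK = re.compile(
    r"^X-DCC.*(Body|Fuz[12])=([0-9]*[0-9][0-9][0-9]|[5-9][0-9]|many)",
    re.IGNORECASE,
)


def looks_bulk(header_line):
    """Return True if a single X-DCC header line trips the condition.

    The alternation accepts any count of three or more digits, a
    two-digit count from 50 to 99, or the literal word "many".
    """
    return DCC_BULK.search(header_line) is not None
```

With made-up header lines, looks_bulk("X-DCC-example-Metrics: host 1; Body=many")
is true, as is one ending in "Body=1 Fuz1=50", while a line ending in
"Body=3 Fuz1=49 Fuz2=7" is not matched.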



> ...
> Seven rules (18%) consist of lists of domains using the pattern (not a
> regular expression):
>   *.DOMAINNAME0 *.DOMAINNAME1 *.DOMAINNAME2 ...

What is the difference between a "pattern" and a "regular expression"?
Perhaps an initial or terminal substring match ignoring case?
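
If that guess is right, the two notions are hard to tell apart.  The
sketch below assumes that "*" means "any run of characters" and that
matching ignores case; both are assumptions, and the function names are
hypothetical.  It shows a "*.DOMAINNAME" pattern written once as a
terminal substring test and once as a mechanically derived regular
expression.

```python
import re


def suffix_match(pattern, host):
    """Treat '*.DOMAIN' as a case-insensitive terminal substring test."""
    if not pattern.startswith("*"):
        return pattern.lower() == host.lower()
    return host.lower().endswith(pattern[1:].lower())


def glob_as_regex(pattern):
    """Translate the same pattern to a regular expression.

    Each '*' becomes '.*' and every other character is escaped.
    """
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body, re.IGNORECASE)


def regex_match(pattern, host):
    """Match the whole string against the translated pattern."""
    return glob_as_regex(pattern).fullmatch(host) is not None
```

Under those assumptions both functions give the same answer for a
"*.DOMAINNAME" pattern; the suffix test simply skips the regular
expression machinery.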

> ...
> Seven rules (18%) consist of lists of domains using the pattern (not a
> regular expression):
>   *.DOMAINNAME0 *.DOMAINNAME1 *.DOMAINNAME2 ...
> with a total of 211 DOMAINs listed.  These test are very efficient.
> The only use of the dotted IP matching pattern was in the single IP

That sounds quite similar to a sendmail access_db.
I have thousands of IP addresses and domain names in mine.

> ...
> had a DNS NS record.  There were 150 from patterns of the form
>   NAME@*
> 25 from patterns of the form
>   NAME*@*
> 5 patterns of the form
>   *NAME*@*
> 140 patterns of the form
>   *NAME@*
> 96 patterns of the form
>   *@DOMAINNAME
> 68 patterns of the form
>   *@*DOMAINNAME

Except when implemented by a system such as an MTA that already
needs to parse addresses into NAMEs and DOMAINNAMEs, those hundreds of
patterns sound like hundreds of regular expressions.


> Basically it appears that pattern matching on the leading, trailing,
> or whole strings that are the user and host portion of the mail from
> would capture almost everything I used in patterns, even though I had
> more flexibility than that.  Such tests are very efficient.
>
> I hope this helps.

Well, it seems quite eloquent to me, although perhaps not in the way
expected.

I'd hope a system dealing with several hundred thousand messages per day
would not need to do several hundred string comparisons on each of at
least two and perhaps all headers in every message.  At least the sendmail
access_db scheme consists of parsing the header into parts and then
doing a handful of hash table (Berkeley DB) lookups on each part.
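
The difference in cost can be sketched as follows.  This is only an
illustration in the spirit of the access_db scheme, not sendmail's
actual lookup order: a plain Python dict stands in for the Berkeley DB
file, and the function names, keys, and values are hypothetical.

```python
def candidate_keys(address):
    """Split an address into parts and list the keys worth looking up.

    The number of keys depends only on the address (two plus the number
    of labels in the domain), never on the size of the table.
    """
    user, _, domain = address.lower().rpartition("@")
    keys = [user + "@" + domain, user + "@"]
    labels = domain.split(".")
    # The domain itself, then each parent domain, most specific first.
    for i in range(len(labels)):
        keys.append(".".join(labels[i:]))
    return keys


def lookup(table, address):
    """Return the first table entry found for any candidate key."""
    for key in candidate_keys(address):
        if key in table:  # one hash lookup per key
            return table[key]
    return None
```

For an address such as spam@mail.example.com that is five lookups,
whether the table holds ten entries or ten thousand, whereas a list of
patterns has to be tried one by one.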


Vernon Schryver    vjs@rhyolite.com


