white lists

Earl Killian dcc@lists.killian.com
Sat Apr 20 15:58:13 UTC 2002


Vernon Schryver writes:
 > Date: Sat, 20 Apr 2002 08:40:00 -0600 (MDT)
 > From: Vernon Schryver <vjs@calcite.rhyolite.com>
 > 
 > You could be right, but I suspect substring matching would be the
 > camel's nose for full regular expressions.  People would probably want
 > to look for strings in only some headers, and so would want at least
 > "^from:.*example\.com$"

The whitelisting I use never uses the message body, so it doesn't
suffer from the above problem.

 > And then there are the problems of where to keep the substrings or
 > expressions in the DCC, what format to use, and the rest of the
 > problems that procmail has solved.

The problem with procmail is that you would have to run it before DCC
instead of after.  The current order is typically DCC and then
procmail.

To provide some data for this discussion, I analyzed my
smtpd_check_rules file.

Background: my modified smtpd_check_rules file consists of lines of
the form
  ACTION:IPMATCH:FROMMATCH:TOMATCH[:MESSAGE]
where ACTION is one of
  allow, allow-dcc, deny, noto, deny_delay, noto_delay
IPMATCH is a pattern match against either the dotted form of the IP
address, or its reverse translation.
FROMMATCH is a pattern match against the MAIL FROM: value.
TOMATCH is a pattern match against the RCPT TO: value.
MESSAGE is used on deny/noto rules for the rejection response.

ACTION is taken if IPMATCH, FROMMATCH, and TOMATCH are all true for a
given rule.
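
As an illustration of the line format above, here is a minimal sketch in C
of splitting one rule line into its fields.  The struct and function names
are hypothetical, and letting MESSAGE keep any further colons is an
assumption; the post does not say how the real parser treats them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one ACTION:IPMATCH:FROMMATCH:TOMATCH[:MESSAGE] line. */
struct rule {
    char *action;
    char *ipmatch;
    char *frommatch;
    char *tomatch;
    char *message;   /* NULL when the optional MESSAGE field is absent */
};

/* Split a writable line in place.  Returns false if fewer than the four
 * mandatory fields are present. */
static bool parse_rule(char *line, struct rule *r)
{
    char *fields[5] = { NULL, NULL, NULL, NULL, NULL };
    size_t n = 0;
    char *p = line;

    while (n < 5) {
        fields[n++] = p;
        if (n == 5)
            break;               /* assumption: MESSAGE keeps later colons */
        char *colon = strchr(p, ':');
        if (colon == NULL)
            break;
        *colon = '\0';
        p = colon + 1;
    }
    if (n < 4)
        return false;

    r->action    = fields[0];
    r->ipmatch   = fields[1];
    r->frommatch = fields[2];
    r->tomatch   = fields[3];
    r->message   = fields[4];
    return true;
}
```

The line is modified in place, so the caller must pass a writable buffer
rather than a string literal.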

The pattern matches are a list of space separated individual
patterns, with an optional EXCEPT keyword.  They are satisfied if any
of the individual patterns match and none of the EXCEPT patterns
match.
Thus we have rules of the form
    (IP0 | IP1 | ...) & ~(IPEX0 | IPEX1 | ...)
  & (FR0 | FR1 | ...) & ~(FREX0 | FREX1 | ...)
  & (TO0 | TO1 | ...) & ~(TOEX0 | TOEX1 | ...)
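
The any-match-unless-EXCEPT evaluation of a single field can be sketched in
C roughly as follows.  This is only an illustration: pattern_matches() is a
hypothetical stand-in that handles "ALL" and exact strings, not the real
per-pattern test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real per-pattern test:
 * "ALL" matches anything, otherwise require an exact match. */
static bool pattern_matches(const char *pattern, const char *value)
{
    return strcmp(pattern, "ALL") == 0 || strcmp(pattern, value) == 0;
}

/* (P0 | P1 | ...) & ~(EX0 | EX1 | ...) for one field of a rule. */
static bool field_matches(const char *const *include, size_t n_include,
                          const char *const *except, size_t n_except,
                          const char *value)
{
    bool hit = false;

    /* Satisfied only if at least one ordinary pattern matches... */
    for (size_t i = 0; i < n_include && !hit; i++)
        hit = pattern_matches(include[i], value);
    if (!hit)
        return false;

    /* ...and none of the EXCEPT patterns does. */
    for (size_t i = 0; i < n_except; i++)
        if (pattern_matches(except[i], value))
            return false;
    return true;
}
```

A rule's ACTION would then be taken when field_matches() is true for the
IP, MAIL FROM, and RCPT TO fields together.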

Regular expressions (while available) were never used in any of the
rules.

Looking at IPMATCH, I see 64% of the rules using "ALL" which matches
anything.  One rule uses the EXCEPT feature.  One rule uses the
UNKNOWN feature (nonexistent IP reverse address translation).
Five rules (13%) use the RBL IP matching feature (5 blacklists).
Seven rules (18%) consist of lists of domains using the pattern (not a
regular expression):
  *.DOMAINNAME0 *.DOMAINNAME1 *.DOMAINNAME2 ...
with a total of 211 DOMAINs listed.  These tests are very efficient.
The only use of the dotted IP matching pattern was in the single IP
rule using EXCEPT:
  allow:ALL EXCEPT 61.0.0.0/8 62.0.0.0/8 80.0.0.0/8 81.0.0.0/8 193.0.0.0/8 194.0.0.0/8 195.0.0.0/8 202.0.0.0/8 203.0.0.0/8 210.0.0.0/8 211.0.0.0/8 212.0.0.0/8 213.0.0.0/8 217.0.0.0/8 218.0.0.0/8 219.0.0.0/8 220.0.0.0/8:*@*.edu *@*.gov *@*.mil *@*.org:ALL
which allows MAIL FROMs with .edu, .gov, .mil, and .org domains except
from RIPE and APNIC IPs.
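
For reference, a test against a block such as 61.0.0.0/8 is a mask and a
compare.  The following is a minimal sketch in C, assuming IPv4 only; the
function names are hypothetical and this is not the code the rule file
actually uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse dotted-quad "a.b.c.d" into a host-order 32-bit value.
 * Returns false on malformed input or trailing characters. */
static bool parse_ipv4(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;

    if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return false;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
    return true;
}

/* True if addr lies inside net/prefix_len, e.g. 61.0.0.0/8. */
static bool in_cidr(uint32_t addr, uint32_t net, unsigned prefix_len)
{
    if (prefix_len == 0)
        return true;             /* a /0 covers every address */
    if (prefix_len > 32)
        return false;
    uint32_t mask = 0xFFFFFFFFu << (32 - prefix_len);
    return (addr & mask) == (net & mask);
}
```

With /8 blocks only the first octet is compared, so an EXCEPT list like the
one above costs a handful of integer comparisons per connection.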

Turning next to the FROM rules: four rules (10%) used EXCEPT,
primarily to allow through abuse@* and postmaster@*.  One rule used
the NS=UNKNOWN feature to test if the host portion of the MAIL FROM
had a DNS NS record.  There were 150 from patterns of the form
  NAME@*
25 from patterns of the form
  NAME*@*
5 patterns of the form
  *NAME*@*
140 patterns of the form
  *NAME@*
96 patterns of the form
  *@DOMAINNAME
68 patterns of the form
  *@*DOMAINNAME
EXCEPT was used in only a few patterns, again for "postmaster" and "abuse".
Comment: there is no particular reason that the latter two pattern forms
couldn't both have used *@*DOMAINNAME, i.e. no reason that a test of the form
  hl = strlen(HOST);
  tl = strlen(DOMAINNAME);
  hl >= tl && memcmp(HOST + hl - tl, DOMAINNAME, tl+1) == 0
wouldn't be sufficient.
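
Wrapped up as a function, the trailing-match test on the host might look
like the sketch below.  The function name is hypothetical, and it assumes
both strings have already been folded to the same case, since memcmp()
compares bytes exactly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if host ends with domain, i.e. the *@*DOMAINNAME form.
 * Assumes both strings are already in the same letter case. */
static bool host_ends_with(const char *host, const char *domain)
{
    size_t hl = strlen(host);
    size_t tl = strlen(domain);

    /* memcmp() returns 0 on equality, so compare against 0 explicitly. */
    return hl >= tl && memcmp(host + hl - tl, domain, tl) == 0;
}
```

Note that a pure suffix test also accepts hosts such as "notexample.com"
for the domain "example.com"; listing the domain with its leading dot
avoids that, at the cost of no longer matching the bare domain itself.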

The To rules were pretty uninteresting, and were used primarily to
prevent relaying, which isn't a DCC concern, so I won't compute
statistics on them.

Basically it appears that pattern matching on the leading, trailing,
or whole strings that are the user and host portion of the mail from
would capture almost everything I used in patterns, even though I had
more flexibility than that.  Such tests are very efficient.
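
To make that concrete, here is a minimal sketch in C of a matcher that
supports only an optional leading and/or trailing '*' on the user and host
parts of the MAIL FROM.  The function names and the 256-byte limit on the
user part are assumptions made for the illustration, and comparison is
case-sensitive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Match one part (user or host) against a pattern that may start and/or
 * end with '*':  NAME (whole), NAME* (leading), *NAME (trailing),
 * *NAME* (anywhere), * (anything). */
static bool part_matches(const char *pat, const char *s)
{
    size_t pl = strlen(pat);
    size_t sl = strlen(s);

    bool lead = pl > 0 && pat[0] == '*';
    if (lead) {
        pat++;
        pl--;
    }
    bool trail = pl > 0 && pat[pl - 1] == '*';
    if (trail)
        pl--;

    if (pl > sl)
        return false;
    if (lead && trail) {
        /* *NAME*: look for NAME at every offset. */
        for (size_t i = 0; i + pl <= sl; i++)
            if (memcmp(s + i, pat, pl) == 0)
                return true;
        return false;
    }
    if (lead)                    /* *NAME: compare the tail */
        return memcmp(s + sl - pl, pat, pl) == 0;
    if (trail)                   /* NAME*: compare the head */
        return memcmp(s, pat, pl) == 0;
    return pl == sl && memcmp(s, pat, pl) == 0;   /* NAME: whole string */
}

/* Match a USERPAT@HOSTPAT pattern against a user@host address. */
static bool address_matches(const char *pat, const char *addr)
{
    const char *pat_at = strchr(pat, '@');
    const char *addr_at = strrchr(addr, '@');
    char pat_user[256], addr_user[256];

    if (pat_at == NULL || addr_at == NULL)
        return false;
    size_t pu = (size_t)(pat_at - pat);
    size_t au = (size_t)(addr_at - addr);
    if (pu >= sizeof pat_user || au >= sizeof addr_user)
        return false;
    memcpy(pat_user, pat, pu);
    pat_user[pu] = '\0';
    memcpy(addr_user, addr, au);
    addr_user[au] = '\0';

    return part_matches(pat_user, addr_user)
        && part_matches(pat_at + 1, addr_at + 1);
}
```

Apart from the rare *NAME* case, each test is a single memcmp() over a
short string, with no backtracking and no compiled expression to store.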

I hope this helps.

-Earl


