Graylisting

chris albert christopher.albert@mcgill.ca
Fri Sep 24 23:08:04 UTC 2004


Hi,

Greylisting is a powerful tool, but perhaps a complicated one to
implement on a large scale.
I have a couple of questions related to a rollout of DCC greylisting
that I hope some list members can answer.

1. Call the time between the initial reception of a message (say, under
a dccm greylisting regimen) and the time it is accepted the
*retransmission time*. In the language of "man dcc", this would be
< wait - embargo, and the *retransmission time* would be a statistic
that applies only to 'familiar triples'.
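To make the definition concrete, here is a minimal Python sketch of a greylist that records the retransmission time of each triple the first time it is accepted. It is not how dccm or dccd work internally: the `Greylist` class, its `check` method and the illustrative 270-second embargo are assumptions, and real greylists also expire old entries, which this sketch omits.

```python
from dataclasses import dataclass, field

# Hypothetical greylist key: (client IP, envelope sender, envelope recipient).
Triple = tuple[str, str, str]


@dataclass
class Greylist:
    embargo: float  # seconds a new triple must wait before acceptance
    first_seen: dict[Triple, float] = field(default_factory=dict)
    familiar: set[Triple] = field(default_factory=set)
    retransmission_times: list[float] = field(default_factory=list)

    def check(self, triple: Triple, now: float) -> str:
        """Return 'accept' or 'tempfail' for a delivery attempt at time `now`."""
        if triple in self.familiar:
            return "accept"  # already passed the embargo once: no new delay
        start = self.first_seen.setdefault(triple, now)
        if now - start < self.embargo:
            return "tempfail"  # SMTP 4xx: the sender is expected to retry
        # First accepted retry: record its retransmission time exactly once.
        self.familiar.add(triple)
        self.retransmission_times.append(now - start)
        return "accept"


# Usage: first attempt at t=0, retries at 120 s and 600 s, a later message at 900 s.
g = Greylist(embargo=270)
t = ("192.0.2.1", "alice@example.com", "bob@example.org")
for when in (0, 120, 600, 900):
    g.check(t, when)
print(g.retransmission_times)  # [600]
```

Only the first accepted retry contributes a sample, which matches the point that the statistic applies only to triples that become familiar.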

Does anyone have statistics on retransmission times?
What are the mean, variance, interquartile range, ...?
What does the distribution of retransmission times look like?
Is it positively skewed, lognormal, Weibull, chi-squared, leptokurtic, ...?
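Given a list of retransmission times in seconds, however collected, the descriptive statistics asked about above can be computed with Python's standard `statistics` module. A sketch, assuming all times are positive; `summarize` and its result keys are hypothetical names. The skewness of the logged times is a rough check for lognormality: if the data are close to lognormal, the logs should be nearly symmetric.

```python
import math
import statistics


def _skew_kurt(xs: list[float]) -> tuple[float, float]:
    """Moment-based (population) skewness and excess kurtosis."""
    n = len(xs)
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        raise ValueError("all values identical: shape statistics undefined")
    skew = sum((x - m) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * sd ** 4) - 3.0
    return skew, kurt


def summarize(times: list[float]) -> dict[str, float]:
    """Descriptive statistics for retransmission times (seconds, all > 0)."""
    q1, median, q3 = statistics.quantiles(times, n=4)
    skew, kurt = _skew_kurt(times)
    log_skew, _ = _skew_kurt([math.log(t) for t in times])
    return {
        "mean": statistics.fmean(times),
        "variance": statistics.variance(times),  # sample variance
        "median": median,
        "iqr": q3 - q1,
        "skewness": skew,
        "excess_kurtosis": kurt,
        "log_skewness": log_skew,
    }
```

Positive `skewness` with `log_skewness` near zero would point towards a lognormal shape, and positive `excess_kurtosis` means heavier tails than a normal distribution (leptokurtic). Formal fits of Weibull or lognormal candidates would need something like the `fit` methods in `scipy.stats`.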

If someone has data and can make it available to me, I might be able
to find answers to some of these questions. (No, I am not a
statistician, so there may be people on this list better equipped
than me to analyze such data.)

Why would I ask such questions? Well, if I could associate costs with a
business impact analysis, having that statistical data would enable me
to produce the kind of graphical risk presentation beloved of my
manager's manager (hereafter boss^2). However, I work for a monopoly
inside a state institution, which you might think makes life easier,
but you would be wrong.

In my environment, there is a certain class of dedicated, determined,
irrational, powerful, kvetching, megalomaniacal user who thinks that
boss^2 is their service desk. With 50K users, even if that class is
only 1.5% (750 people), that means a lot of calls to boss^2 when I roll
out greylisting. Moreover, I have to assume that there is ongoing
research on V!agr4, and that 'AsianPornSlut' is part of the title of
someone's PhD thesis.

Thus, understanding the distribution of retransmission times would,
under some unrealistic assumptions, let me predict the kind of crap
that will arrive two levels above me in the organigramme, and thus,
under some realistic assumptions about the exponential growth rate of
the severity of crap under bureaucratic gravitation, estimate my choke
point in a potential rollout of greylisting to my user base.
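Once a distribution has been fitted, the prediction described above reduces to tail probabilities. A minimal sketch, assuming (hypothetically) that retransmission times are lognormal with log-mean `mu` and log-standard-deviation `sigma`, and that delays are independent; the tolerance, messages per user and parameter values are invented for illustration. The 1.5% of 50K users is the figure from the paragraph above.

```python
import math


def lognormal_sf(t: float, mu: float, sigma: float) -> float:
    """P(X > t) for lognormal X, where ln X has mean mu and std dev sigma."""
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def expected_complainers(users: int, complainer_share: float,
                         msgs_per_user: int, tolerance: float,
                         mu: float, sigma: float) -> float:
    """Expected number of complaint-prone users who see at least one
    greylisted message delayed by more than `tolerance` seconds.

    Assumes every delay is an independent draw from the same lognormal
    distribution: one of the "unrealistic assumptions".
    """
    p_late = lognormal_sf(tolerance, mu, sigma)
    p_user_hit = 1.0 - (1.0 - p_late) ** msgs_per_user
    return users * complainer_share * p_user_hit


# Illustrative parameters only; mu and sigma would come from fitting real data.
estimate = expected_complainers(users=50_000, complainer_share=0.015,
                                msgs_per_user=3, tolerance=3600,
                                mu=math.log(600), sigma=1.0)
print(f"expected unhappy power users: {estimate:.0f}")
```

The exponential growth of crap per organigramme level is left as an exercise for boss^2.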

2. Seeding, training -- rollout considerations.

Suppose that, despite my survival instincts, I am so sick of spam
that I want to stop it, but not to the point of suicide.
So I'd like to seed or train my greylisting regime before its global
implementation, so as to reduce the impact of stochastic
retransmission times.

Is there a way to do that?
Can I use historical data? (E.g., whitelist certain senders from the
previous 60 days.)
Can I run dccm greylisting for a period of time that just records
triples, to reduce the impact of retransmission delays later?
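Both ideas, seeding from history and a record-only phase, can be pictured with the hypothetical sketch below. `SeededGreylist`, its `learning` flag and the `(timestamp, triple)` history format are assumptions, not dccm features; an actual rollout would have to express the result in whatever whitelist mechanism the local DCC installation provides. It keys on full triples rather than senders alone, which is stricter: a whitelisted sender writing to a new recipient would still be greylisted.

```python
from dataclasses import dataclass, field
from typing import Iterable

# Hypothetical greylist key: (client IP, envelope sender, envelope recipient).
Triple = tuple[str, str, str]
DAY = 86_400.0  # seconds


@dataclass
class SeededGreylist:
    embargo: float
    learning: bool = True  # record-only phase: remember triples, never delay
    familiar: set[Triple] = field(default_factory=set)
    first_seen: dict[Triple, float] = field(default_factory=dict)

    def seed(self, history: Iterable[tuple[float, Triple]], now: float,
             days: float = 60) -> None:
        """Treat triples delivered within the last `days` days as familiar."""
        cutoff = now - days * DAY
        for ts, triple in history:
            if ts >= cutoff:
                self.familiar.add(triple)

    def check(self, triple: Triple, now: float) -> str:
        """Return 'accept' or 'tempfail' for a delivery attempt at `now`."""
        if triple in self.familiar:
            return "accept"
        if self.learning:
            self.familiar.add(triple)  # learn it, but deliver immediately
            return "accept"
        start = self.first_seen.setdefault(triple, now)
        if now - start < self.embargo:
            return "tempfail"
        self.familiar.add(triple)
        return "accept"


# Usage: seed from old logs, learn for a while, then start enforcing.
now = 100 * DAY
g = SeededGreylist(embargo=270)
g.seed([(now - 10 * DAY, ("192.0.2.1", "alice@example.com", "bob@example.org"))], now)
g.check(("198.51.100.7", "carol@example.net", "bob@example.org"), now)  # learned
g.learning = False  # from here on, only unseen triples are delayed
```

The longer the learning phase runs, the fewer legitimate correspondents ever meet the embargo, at the cost of also learning whatever spam triples arrive during that phase.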

Can I use a forward feedback mechanism like the one described in
http://www.hpl.hp.com/techreports/2004/HPL-2004-5R1.html ?
(For example, greylist only those messages whose checksums a DCC
server reports as MANY.)
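The MANY-based example could look roughly like this. A sketch, assuming the DCC counts are available as `name=value` pairs after the semicolon of an X-DCC-*-Metrics style header value (for example `Body=many Fuz1=3`); the format should be checked against real headers from the local installation, and `dcc_counts`, `should_greylist` and `bulk_threshold` are hypothetical names. This is not the method of the HP report, only the example given in the question.

```python
import math
import re

# Assumed shape: "<host> <id>; Body=many Fuz1=3 Fuz2=many"
_COUNT = re.compile(r"(\w+)=(many|\d+)\b", re.IGNORECASE)


def dcc_counts(header_value: str) -> dict[str, float]:
    """Map checksum type to its reported count; 'many' becomes infinity."""
    _, _, tail = header_value.partition(";")
    counts: dict[str, float] = {}
    for name, value in _COUNT.findall(tail):
        counts[name] = math.inf if value.lower() == "many" else float(value)
    return counts


def should_greylist(header_value: str, bulk_threshold: float = math.inf) -> bool:
    """Greylist only if some checksum count reaches the threshold
    (by default: only checksums reported as 'many')."""
    return any(c >= bulk_threshold for c in dcc_counts(header_value).values())
```

Messages with no parsable counts are not greylisted, so mail about which DCC has no opinion is never delayed; lowering `bulk_threshold` trades more delays for catching more bulk mail.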


Chris






More information about the DCC mailing list
