Newbie suffering...

Vernon Schryver vjs@calcite.rhyolite.com
Thu Feb 12 15:28:21 UTC 2004


> From: John Sutton <john@scl.co.uk>


> I need to introduce spam control on my server which is dealing with about 
> 30,000 emails per day for some hundreds of users.  How am I going to compile 
> appropriate whitelistings for the solicited bulk mail for my users?  I can 
> run with "dccm -a IGNORE" for a few weeks, but I don't really want to have to 
> trawl through 1000's of log files every day looking for likely candidates, 
> and nor can I expect my users to do that, or even to examine the source of 
> each email they get to read the X-DCC header line.  Remember, these are 
> users...;-)

The global network of about 200 DCC servers is handling about 100 million
mail messages per day.  I estimate more than 5 million but probably no
more than 10 million mailboxes are involved.  Some of the organizations
involved are retail ISPs with more than hundreds of end users.

> So I figure what I need to do is to run for a few weeks with -a IGNORE and 
> modify dccm so that bulk mail has the Subject: header changed to read (say) 
> "Subject: SPAM?: Re: Fwd: Gimme Viagra!".  Then I tell the users to look 
> through their inbox listings and let me know about anything which has been 
> marked "SPAM?:" but shouldn't have been.  Thus I compile the white listings 
> and then after a few weeks remove the -a IGNORE switch.

I think most other organizations with similar concerns tell their users
to watch for X-DCC headers that contain the string "bulk" or they use
CGI scripts similar to those in the cgi-bin directory in the DCC source
and tell their users to look at their per-user log files.  When a user
finds a message that would have been (or was) rejected, the user can
click on it to whitelist similar messages in the future.
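
For anyone who wants to automate the "watch for bulk" check instead of
reading headers by eye, here is a minimal sketch in C.  It assumes only
what is said above: the header name starts with "X-DCC" and the value
contains the string "bulk".  The function name is_dcc_bulk() is made up
for this example and is not part of the DCC source.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/*
 * Return 1 if the header is an X-DCC header whose value contains the
 * string "bulk", else 0.  Header names are compared case-insensitively.
 * is_dcc_bulk() is a hypothetical helper, not a DCC function.
 */
static int is_dcc_bulk(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);

    /* Only X-DCC headers are of interest. */
    if (strncasecmp(name, "X-DCC", 5) != 0)
        return 0;

    /* A plain substring search is all the advice above calls for. */
    return strstr(value, "bulk") != NULL;
}
```

A mail client filter or a small script could apply the same test to sort
flagged messages into a folder without touching the Subject line.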

I've not heard any other requests to modify Subject headers.  That is
rather intrusive for those users that don't want DCC spam filtering.


> ...
> 1) introduce a new switch, say, "-f optarg" meaning "flag bulk mail by 
> prepending the string optarg to the Subject header".  -f is valid ONLY with 
> -a IGNORE, otherwise it is itself ignored.

Since this would be a one-off temporary kludge, why bother with an argv flag? 
Why not simply make your version always do whatever you want?

> 2) introduce a lump of storage into the bottom half of the WORK struct 
> (underneath WORK_REZERO) so that in dccm_header() I can store the Subject.

I'd probably use malloc()/free() to avoid hassles with the varying
sizes of Subject: headers, particularly since this is for a small
installation so speed is not a major concern.
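
A minimal sketch of that malloc()/free() approach follows.  The struct
msg_state and both function names are hypothetical stand-ins for
whatever you add to the WORK struct; they are not taken from the dccm
source.  The one rule to keep in mind is that the copy must be freed
whenever the message ends or is aborted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-message state kept by the milter. */
struct msg_state {
    char *subject;      /* heap copy of the Subject value, NULL if none */
};

/*
 * Save a private copy of a Subject value, replacing any earlier copy.
 * Returns 0 on success or -1 if memory could not be allocated, in
 * which case the previous copy (if any) is left untouched.
 */
static int save_subject(struct msg_state *st, const char *value)
{
    size_t len;
    char *copy;

    assert(st != NULL && value != NULL);

    len = strlen(value);
    copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len + 1);   /* includes the trailing '\0' */

    free(st->subject);              /* free(NULL) is a no-op */
    st->subject = copy;
    return 0;
}

/* Release the saved Subject at end of message or on abort. */
static void clear_subject(struct msg_state *st)
{
    assert(st != NULL);
    free(st->subject);
    st->subject = NULL;
}
```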


> 3) at *the_appropriate_point* in dccm_eom() i.e. at that point at which it is 
> decided that this email WOULD have been rejected/discarded were it not for 
> the -a IGNORE setting, I can modify the Subject.
>
> And my problem is number 3!  Where *is* that point in dccm_eom()? 

I would change the code in dccm_eom() that calls smfi_chgheader() or
smfi_addheader() depending on whether there is already an X-DCC header
in the message.  I assume you'll want to add a Subject header if
none is present.
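
As a rough sketch of that step, the helper below builds the new Subject
value.  tag_subject() and the saw_subject flag mentioned in the comment
are hypothetical names for this example.  smfi_chgheader() and
smfi_addheader() are the real libmilter calls, but where exactly to
call them in dccm_eom() is left to you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<prefix> <subject>" in freshly allocated storage.  A NULL or
 * empty subject yields just the prefix.  The caller must free() the
 * result.  Returns NULL if memory could not be allocated.
 */
static char *tag_subject(const char *prefix, const char *subject)
{
    size_t plen, slen;
    char *out;

    assert(prefix != NULL);
    if (subject == NULL)
        subject = "";

    plen = strlen(prefix);
    slen = strlen(subject);
    out = malloc(plen + 1 + slen + 1);  /* prefix, ' ', subject, '\0' */
    if (out == NULL)
        return NULL;

    memcpy(out, prefix, plen);
    if (slen != 0) {
        out[plen] = ' ';
        memcpy(out + plen + 1, subject, slen + 1);
    } else {
        out[plen] = '\0';
    }
    return out;
}

/*
 * In the end-of-message callback the result would be applied roughly
 * like this, where ctx is the SMFICTX pointer and saw_subject is a
 * hypothetical flag set when a Subject header was seen:
 *
 *     if (saw_subject)
 *         smfi_chgheader(ctx, "Subject", 1, tagged);  // 1 = first one
 *     else
 *         smfi_addheader(ctx, "Subject", tagged);
 *     free(tagged);
 */
```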

>                                                               This block 
> of code misled me:

> ...
>     /* it is spam for at least some targets */

> ...
>         work_done(wp, "ignore and accept");
>         return SMFIS_ACCEPT;
>     }
> -----------------------------------
>
> because you'll surely have to agree, it is not a question of what *I* mean by 
> "ignore and accept", it is a question of what the author of this code means 
> by it!  This is what led me to think that "no message ever gets (potentially) 
> rejected/discarded".

I don't understand.  That code deals with the case when the fact
that a message is spam is ignored and the message is accepted.


> From your response (and that from John Doherty), it would appear that if I 
> remove the -a IGNORE flag then the bulk mail *will* get bounced, and so I 
> assume that the second block of code above is a bit of old stuff left in 
> there to trick the unwary! 

Because my first programming job was on a computer with acoustic delay
lines for memory that contained a total of about 4000 words, I tend
to avoid dead code.


> ...
> Anyway, if you could give me some feedback on the general idea and (if you 
> think it's worth doing) some ideas about point 3 above, I will be grateful.  I 
> will of course post a patch of my efforts in due course!

I would use the CGI scripts in the cgi-bin directory in the DCC source.
See also the portions of the dccm man page that discuss per-user
whitelists and log directories.

I think the spam filters that mess up Subject lines in users' mail are
badly designed.


Vernon Schryver    vjs@rhyolite.com


