FuzzyOcr 3.5.1 released

Vernon Schryver vjs@calcite.rhyolite.com
Mon Jan 8 06:33:25 UTC 2007

> From: "John Scully" 
> To: <users@spamassassin.apache.org>
> Cc: <dcc@rhyolite.com>

I'm not sending this to users@spamassassin.apache.org because I'm not
subscribed to that list.

> I wonder if Vernon Schryver at rhyolite could tie fuzzy OCR into the DCC 
> (distributed Checksum) project.  ...

Perhaps the easiest way to do that would be to pretend the OCR'ed mail
messages were plain text to start with, and feed them to dccifd or dccproc.
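
As a rough illustration of the mechanics only, here is a minimal Python
sketch that wraps OCR output in a plain-text message and pipes it to
dccproc. It assumes dccproc is installed and configured on the local
system. The function names, the placeholder From and Subject headers,
and the choice of -Q (query without reporting) and -H (print only the
X-DCC header) are illustrative choices, not a tested integration.

```python
import subprocess
from email.message import EmailMessage


def ocr_text_to_message(ocr_text, sender="ocr@example.invalid"):
    """Wrap OCR output in a minimal plain-text mail message (bytes).

    The sender and subject are placeholders invented for this sketch.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = "OCR text"
    msg.set_content(ocr_text)  # body becomes text/plain
    return msg.as_bytes()


def query_dcc(message_bytes):
    """Pipe a message through dccproc and return what it prints.

    Assumes a working local dccproc.  -Q only queries the server
    instead of reporting; -H outputs only the X-DCC header.
    """
    result = subprocess.run(
        ["dccproc", "-Q", "-H"],
        input=message_bytes,
        capture_output=True,
        check=False,
    )
    return result.stdout.decode("ascii", errors="replace")
```

Calling query_dcc(ocr_text_to_message(text)) would then return the
X-DCC header line for the OCR'ed text, if dccproc can reach a server.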

I'm not really suggesting that because I'm not sure whether it is a
good idea on various non-technical grounds.  If this OCR system is
a product sold in an appliance or as a managed service, its vendor would
need to buy a commercial license to use the DCC programs.
Besides the license issue on the source, it would simply be wrong to
take and sell the CPU cycles, bandwidth, disk space, and, most important,
the human system administration work of the public DCC server operators.

> To give you an idea, our DCC server currently has these stats:  The key 
> items - 22,057,457 checksums in memory, using a little over 1.1G of RAM.  We 
> receive about 4,000 reports per minute from the network and send about 200 
> per minute from emails we process.

I currently recommend more than 3 GBytes of RAM for a DCC server.  4 GBytes
is not too much, but 8 GBytes probably is unless you are handling several
tens of millions of mail messages/day.

There are private systems that stuff up to 10 million mail messages per
day through their DCC clients and servers.  That's about 7000 messages/minute
or about 115 msgs/second.  DCC servers do that with a database that is
a special purpose hash table.
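
The per-minute and per-second figures follow from plain unit conversion
of the daily volume. A small Python sketch of that arithmetic (the
function names are made up for illustration):

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def per_minute(per_day):
    """Average rate per minute for a given daily volume."""
    return per_day / MINUTES_PER_DAY


def per_second(per_day):
    """Average rate per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    daily = 10_000_000  # messages/day, the figure quoted above
    print(f"{per_minute(daily):.0f} msgs/minute")  # prints "6944 msgs/minute"
    print(f"{per_second(daily):.0f} msgs/second")  # prints "116 msgs/second"
```

These are averages over 24 hours; peak rates would be higher.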

The public DCC servers handle requests from perhaps 40,000 small,
anonymous DCC client installations.  Some of the public DCC servers
handle more than 20 million requests per day.  Each request involves
accumulating a total number of recipients for 3 message checksums and
sending the answer back to the DCC client.
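
To make the shape of one such request concrete, here is a toy Python
sketch of accumulating recipient totals per checksum and returning the
running totals. It is not the DCC protocol or the real server database;
the class name, method name, and use of strings as checksums are all
assumptions made for illustration.

```python
from collections import defaultdict


class ToyChecksumCounter:
    """Toy in-memory stand-in for a checksum-count database."""

    def __init__(self):
        # Maps a checksum (any hashable value here) to its running total.
        self.totals = defaultdict(int)

    def report(self, checksums, recipients):
        """Add `recipients` to each checksum's total; return the new totals."""
        answer = {}
        for checksum in checksums:
            self.totals[checksum] += recipients
            answer[checksum] = self.totals[checksum]
        return answer


if __name__ == "__main__":
    counter = ToyChecksumCounter()
    # One request: three checksums of a message sent to 1 recipient.
    print(counter.report(["c1", "c2", "c3"], 1))
    # A later message shares one checksum and went to 5 recipients.
    print(counter.report(["c1", "c4", "c5"], 5))
```

A checksum whose total keeps climbing across many reports is what marks
a message as bulk.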

Vernon Schryver    vjs@rhyolite.com
