DNSBL -Bset:URL mode

Vernon Schryver vjs@calcite.rhyolite.com
Mon Dec 20 16:15:40 UTC 2010


> From: Kostik 

> Yes, I'm talking about 8-bit encoded message:

> Content-Type: text/html;charset=koi8-r

> http://??????.??
> ---
> In the real world such messages exist. Is it possible to somehow encode
> such domains in Punycode and only then use DNSBL?

After I wrote a bunch of code and started to test it, I realized
this is an even bigger mess than I thought.  The trouble is in the
practically infinite number of Content-Type charsets.  Punycode is a
scheme to encode Unicode domain names in the RFC 1034 subset of ASCII.
How can dccproc/dccm/dccifd convert the many Cyrillic and other non-ASCII
character sets to Unicode?

> Now this situation in the logs looks like this:
> ---
> DNSBL helper URL \208\210\201\215\197\212.\210\198
> gethostbyname(\208\210\201\215\197\212.\210\198.dbl.spamhaus.org): Unknown
> host\n

That is a good example of the hopelessness of the situation, because
it is in koi8-r.  Unless I add code to recognize "charset=koi8-r",
koi8-u, cp866, Windows-1251, and perhaps 8859-5,
dccm/dccifd/dccproc cannot know how to convert that domain name to Punycode.

And that's only for Cyrillic.
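
To illustrate the charset problem, here is a minimal Python sketch (not DCC
code) that takes the bytes from the log line above, read as decimal escapes,
and decodes them under each of the charsets just listed before applying
Punycode. The names RAW_LABELS, CANDIDATES, to_ace and domain_under are
hypothetical, and the sketch assumes Python's standard codecs. It skips the
IDNA nameprep step (case folding and normalization), so it is not a complete
IDNA conversion.

```python
# The two labels from the log line, written there as decimal byte escapes.
RAW_LABELS = [bytes([208, 210, 201, 215, 197, 212]), bytes([210, 198])]

# The charsets named in the message, using Python's codec names.
CANDIDATES = ["koi8_r", "koi8_u", "cp866", "cp1251", "iso8859_5"]


def to_ace(label: bytes, charset: str) -> str:
    """Decode one 8-bit label with the given charset, then Punycode it."""
    text = label.decode(charset)
    if text.isascii():
        return text  # plain ASCII labels need no encoding
    # The "punycode" codec yields only the encoded part; "xn--" is the ACE prefix.
    return "xn--" + text.encode("punycode").decode("ascii")


def domain_under(charset: str) -> str:
    """Build the ASCII domain name that results from assuming one charset."""
    return ".".join(to_ace(label, charset) for label in RAW_LABELS)


for charset in CANDIDATES:
    print(charset, domain_under(charset))
```

Under koi8-r the bytes decode to привет.рф, which gives
xn--b1agh1afp.xn--p1ai. The same bytes under Windows-1251 decode to different
letters and so give a different ASCII name, which is why the conversion cannot
be done without knowing the charset.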

The best I could do is notice when a domain label looks like UTF-8, and
convert it to Unicode and then to Punycode.
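
The UTF-8 fallback could be sketched roughly as follows. This is a Python
illustration under stated assumptions, not the DCC implementation: the
function names utf8_label_to_ace and domain_to_ace are hypothetical, "looks
like UTF-8" is taken to mean "decodes as strict UTF-8", and nameprep is
skipped.

```python
def utf8_label_to_ace(label: bytes) -> bytes:
    """Return the xn-- form of a label that is non-ASCII, valid UTF-8.

    Labels that are pure ASCII, or that do not decode as UTF-8, are
    returned unchanged because their charset is unknown.
    """
    if label.isascii():
        return label
    try:
        text = label.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return label  # some other 8-bit charset; leave it alone
    return b"xn--" + text.encode("punycode")


def domain_to_ace(domain: bytes) -> bytes:
    """Apply the per-label conversion to a dotted domain name."""
    return b".".join(utf8_label_to_ace(label) for label in domain.split(b"."))


if __name__ == "__main__":
    print(domain_to_ace("привет.рф".encode("utf-8")))
    # The koi8-r bytes from the log are not valid UTF-8 and pass through as-is.
    print(domain_to_ace(bytes([208, 210, 201, 215, 197, 212, 46, 210, 198])))
```

A strict decode is only a heuristic. Text in another 8-bit charset can
occasionally form a valid UTF-8 sequence by accident, in which case this
sketch would produce a wrong name rather than leave the label alone.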


Vernon Schryver    vjs@rhyolite.com


