ICANN haz untypable URLs?

Forget the complications of extending the 26-letter Latin alphabet currently used for domain names to include foreign characters, kanji and the like. How will browsers be affected when you go from English, which is read left to right as in the URL example here, to something like Hebrew, which is read right to left?
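To see why a mixed-direction URL is tricky, it helps to look at the bidirectional class Unicode assigns to each character. The Python sketch below groups a URL into runs of strong left-to-right (L), strong right-to-left (R) and neutral or weak (N) characters using the standard `unicodedata.bidirectional` function. The Hebrew label "דוגמה" ("example") under `.com` is a hypothetical URL chosen for illustration, and this only classifies characters; it does not implement the full Unicode Bidirectional Algorithm (UAX #9), which decides how the neutral runs are actually displayed.

```python
import unicodedata

def direction_runs(text):
    """Group text into runs of strong left-to-right (L), strong
    right-to-left (R) and neutral/weak (N) characters."""
    runs = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            d = "L"
        elif bidi_class in ("R", "AL"):  # Hebrew is R, Arabic letters are AL
            d = "R"
        else:
            d = "N"  # ':', '/', '.', digits: direction resolved from context
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs

# Hypothetical URL: Hebrew "example" as the second-level label under .com
url = "http://\u05d3\u05d5\u05d2\u05de\u05d4.com/"
for direction, run in direction_runs(url):
    print(direction, repr(run))
```

The separators "://", "." and "/" all come out neutral, so their on-screen position depends on whether they sit next to the Hebrew run or the Latin one. That ambiguity is exactly what a URL bar has to resolve.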

This change will affect more than just servers around the world: entire browser architectures will need to be revamped.

In my experience, Firefox has better Hebrew support than Internet Explorer does, and Chrome works surprisingly well but gets caught up in translation. Support for bidirectional languages was implemented in Mozilla 3.0, and all the improvements and bug fixes made since then have been awesome and significant.

Time will tell how the URL bar is handled. How much of the code is capable of handling UTF-8, let alone UTF-16? A simple way to overcome this in browsers (at least in the URL bar) would be to color-code each character according to its script.
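The color-coding idea can be sketched in a few lines of Python. This is a rough approximation under stated assumptions: it guesses a character's script from the first word of its Unicode name (via `unicodedata.name`), because the standard library has no direct script lookup, and the palette and the helper names `script_of` and `color_runs` are hypothetical.

```python
import unicodedata

# Hypothetical palette; any set of distinct colors would do.
SCRIPT_COLORS = {
    "LATIN": "black",
    "CYRILLIC": "red",
    "GREEK": "blue",
    "HEBREW": "green",
    "ARABIC": "purple",
    "HIRAGANA": "orange",
    "KATAKANA": "orange",
    "CJK": "brown",
    "COMMON": "gray",  # punctuation, digits, hyphens and anything unrecognized
}

def script_of(ch):
    """Rough script guess from the first word of the character's Unicode name,
    e.g. 'CYRILLIC SMALL LETTER ER' -> 'CYRILLIC', 'FULL STOP' -> 'COMMON'."""
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return first_word if first_word in SCRIPT_COLORS else "COMMON"

def color_runs(domain):
    """Split a domain into same-script runs, each tagged with a display color."""
    runs = []
    for ch in domain:
        script = script_of(ch)
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return [(text, script, SCRIPT_COLORS[script]) for script, text in runs]

for text, script, color in color_runs("www.arussiansite.\u0440\u0444"):
    print(f"{text!r:22} {script:10} {color}")
```

One side effect of coloring by script is that a single Cyrillic letter hidden inside an otherwise Latin label (for example `p\u0430ypal.com`, where the second letter is Cyrillic) would show up as its own differently colored run.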

The DNS protocol has always been 8-bit clean, and UTF encodings aren't involved in internationalized domain names (IDNs) either. An IDN starts out as Unicode, which is normalized and then encoded as an ASCII string using Punycode, with the prefix "xn--". So テスト ("test" in Japanese) becomes “xn--zckzah” as an IDN. A web browser or mail program can then, in theory, take a domain name like fubar.xn--zckzah and display it in the appropriate script for a Japanese user.
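To make the encoding step concrete, here is a minimal Python sketch of the Punycode encoder described in RFC 3492, which produces the part that follows "xn--". It assumes each label has already been normalized: real IDNA processing also applies nameprep mapping and validity checks that this sketch skips, and the helper names `punycode_encode` and `to_ascii` are hypothetical.

```python
# Punycode parameters from RFC 3492
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def _adapt(delta, numpoints, first_time):
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)

def _digit(d):
    """Map 0-25 to 'a'-'z' and 26-35 to '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)

def punycode_encode(label):
    codepoints = [ord(c) for c in label]
    output = [c for c in label if ord(c) < 0x80]  # basic (ASCII) code points first
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic and encoded parts
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        m = min(c for c in codepoints if c >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in codepoints:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)

def to_ascii(domain):
    """Encode each non-ASCII label as 'xn--' + Punycode; leave ASCII labels alone."""
    return ".".join(
        label if all(ord(c) < 0x80 for c in label) else "xn--" + punycode_encode(label)
        for label in domain.split(".")
    )

print(to_ascii("fubar.テスト"))  # fubar.xn--zckzah
```

Python's built-in "idna" codec performs the full IDNA 2003 conversion, including normalization, so for an already-normalized label like テスト it should agree with this sketch: `"テスト".encode("idna")` gives `b'xn--zckzah'`.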

www.arussiansite.рф should be a joy to try and type in 😉
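In principle, the ASCII form of any IDN can be typed instead of the native script. A short Python sketch using the standard library's "idna" codec (which implements the older IDNA 2003 rules) shows what that form looks like for this example and converts it back for display:

```python
# Python's built-in "idna" codec converts each label to its
# ASCII-compatible form ("xn--" + Punycode) and back.
typed = "www.arussiansite.рф"
ascii_form = typed.encode("idna")
print(ascii_form)                  # b'www.arussiansite.xn--p1ai'
print(ascii_form.decode("idna"))   # www.arussiansite.рф
```

Whether "xn--p1ai" is any easier to remember than "рф" is another question.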

Ever since support for bidirectional languages was implemented in Mozilla by me and my colleagues at IBM, and through all the improvements and bug fixes that have been made since, one thing we never got quite right was text with diacritics (aka nikkud, aka harakat), especially in justified text. This was a real obstacle to my recommending Mozilla or Firefox to my friends, many of whom heavily use sites like Mechon Mamre that feature vocalized Hebrew.
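One reason diacritics complicate justification is that each Hebrew point is a separate combining code point attached to the letter before it, so anything that stretches text by adding space between letters has to work on whole base-plus-marks clusters rather than individual code points. The Python sketch below illustrates this under a stated simplification: it builds clusters from `unicodedata.combining` (the canonical combining class) instead of the full grapheme-cluster rules of Unicode UAX #29, and `clusters` and `letter_space` are hypothetical helpers.

```python
import unicodedata

def clusters(text):
    """Split text into base characters followed by their combining marks.

    Simplification: uses only the canonical combining class, not the
    full UAX #29 grapheme cluster rules."""
    result = []
    for ch in text:
        if result and unicodedata.combining(ch):
            result[-1] += ch  # attach the point to the preceding letter
        else:
            result.append(ch)
    return result

def letter_space(text, gap):
    """Insert `gap` between clusters only, never between a letter and its points."""
    return gap.join(clusters(text))

# "shalom" with vowel points: shin+qamats+shin dot, lamed, vav+holam, final mem
word = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
print(len(word), len(clusters(word)))  # 7 code points, 4 clusters

naive = "_".join(word)          # strands each point on a gap character
safe = letter_space(word, "_")  # keeps every point on its own letter
```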