Egy beteg srác naplója

domain

IDN ain’t a solved problem

As discussed earlier, Google had some bugs handling IDNs in its services, and things have hardly changed since. When you request an API key from Google for your site (let’s say you want to embed Maps), Google URL-encodes the entered domain incorrectly, and Yahoo’s freshly updated Site Explorer tool flunks IDNs altogether. After I added my site to it, it said my request was not accepted because the URL was malformed.

This whole fuss is because of a tiny accent, funny, isn’t it? Friends here are always kidding me that only dummies use IDNs (just because they cannot open the link with IE, lol), but it’s a good test anyway, and I run into a lot of shortcomings. In most cases web-based services are not prepared to handle IDNs yet (though it has been possible to register international domains since 2004); they only understand the ASCII-safe form of the domains (e.g. xn--gbor-5na.20y.hu for gábor.20y.hu).
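To make the two forms concrete, here is a minimal Python sketch that converts a host name between its Unicode form and its ASCII-safe form. It assumes Python’s built-in `idna` codec (which follows the older IDNA 2003 rules) is good enough for the name at hand; the helper names `to_ascii_host` and `to_unicode_host` are made up for this illustration.

```python
def to_ascii_host(host: str) -> str:
    """Return the ASCII-safe (Punycode) form of a host name."""
    # The built-in "idna" codec converts each dot-separated label on its own.
    return host.encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Return the human-readable Unicode form of an ASCII-safe host name."""
    return host.encode("ascii").decode("idna")


print(to_ascii_host("gábor.20y.hu"))
print(to_unicode_host("xn--gbor-5na.20y.hu"))
```

A service that only understands ASCII host names could run something like `to_ascii_host` on user input before validating it, instead of rejecting the URL as malformed.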

Here is the news

And then you come along saying you can’t view the photos. Just because I’m stubborn and spell it with a long ó. Yes, how wonderful technology is. It’s May 2006. Accented domains have been open for registration for two years. If you’re cruising with a browser that hasn’t managed to absorb support for them in all that time: c’est la vie. Safari, Konqueror, Firefox, Opera, Mozilla, Netscape: there are a few applications you can use. Browse happy! On top of all that, as far as I know, IE 6.0 on XP armed with SP2 handles accented domains. But you can wait for 7.0, it’s almost here; until then, go look at the photos of someone who isn’t so stubborn as to put that damn letter ó in the name. Bye.

Re: Google bites IDNs

Thanks to loads of feedback, it turned out that the Google bug mentioned earlier only comes up when you’re using Firefox and are logged into your Google Account. When you log out, or simply copy the link from the results, it works fine, ‘cos only the JS-based tracking (Personal Search) escapes it badly.

What’s more, Matt Cutts has already run into another IDN issue, as he wrote on his blog:

Q: “Any results on why IDN Domains don’t show pagerank?”
A: I’ve seen a couple that do, but I’ll check into why most don’t. My guess is that there’s a normalization issue somewhere in the toolbar PageRank pathway.

Google bites IDNs

Poor Google is a bit buggy. Its coders have already faced some character encoding issues, and now they have problems with domains containing international (non-ASCII) characters. The source of the bug I found recently is that, using some JavaScript magic, Google doesn’t really forward you directly to the given search result. It handles the hit itself for search analysis and user tracking, and then redirects you to the real target. Let’s search for gábor.20y.hu. The link Google returns for the result looks something like this:

http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A//g%E1bor.20y.hu/&ei=TXA1RMPxB7viwQHc-6WuAw&sig2=5dhrtGyojR_GPShMOKCdjg

Of course the guys at Google are smart, so they encoded the url parameter. Did they do it right? Not exactly. In internationalized domain names, special characters are not resolved the way URL parameters are. They have their own encoding (Punycode), e.g. gábor becomes xn--gbor-5na for nameservers. So when you try to reach a URL like the one above, Firefox kindly notifies you that “Firefox can’t find the server at g%c3%a1bor.20y.hu.” And it’s got a point: that host really doesn’t exist. So folks, remember to use URL encoding carefully: do not percent-encode domain names, only GET parameters.
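The rule can be sketched in Python. This is only an illustration, not how Google builds its links: the helper names `ascii_url` and `redirect_link` and the tracker address are placeholders, and it assumes the built-in `idna` codec covers the host name. The host is converted to its ASCII-safe form first, and only then is the whole target percent-encoded as a parameter value.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_url(url: str) -> str:
    """Return the URL with its host converted to the ASCII-safe IDN form."""
    parts = urlsplit(url)
    # Convert only the host; the "idna" codec handles each label separately.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Note: any user:password@ prefix is dropped in this simplified sketch.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def redirect_link(tracker: str, target: str) -> str:
    """Build a tracking link that carries the target as a GET parameter."""
    # Percent-encode the whole (already ASCII-safe) target as a parameter value.
    return f"{tracker}?url={quote(ascii_url(target), safe='')}"


# Hypothetical tracker address, used only for this example.
link = redirect_link("http://tracker.example/url", "http://gábor.20y.hu/")
print(link)

# For contrast: percent-encoding the raw host yields names no nameserver knows.
print(quote("gábor.20y.hu"))                      # UTF-8 bytes
print(quote("gábor.20y.hu", encoding="latin-1"))  # Latin-1 byte, as in the link above
```

Because the target is ASCII-only by the time it is percent-encoded, whatever decodes the parameter on the other side gets a host name that a resolver can actually look up.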