+++ This bug was initially created as a clone of Bug #987069 +++ If URLs contain non-ASCII characters, they are not correctly Unicode encoded within the Network panel. STR: 1. Open a new tab 2. Open the Network panel (Ctrl+Shift+Q) 3. Go to http://шомбург.рф/ => The Network panel displays http://H><1C@3.@D/ as URL for the requests instead of http://шомбург.рф/. It looks like it is not a Necko bug, because Firebug displays Punycode. Sebastian
This string seems to come from this code in netmonitor-vew.js: NetworkHelper.convertToUnicode(unescape(aData.url)) which is calling a function in network-helper.js. This function takes the given string and hands it off to nsIScriptableUnicodeConverter, with a charset of UTF-8 in this case. What this last will do is treat each char of the JSString as a single byte (dropping its high byte) and then treat those bytes as a UTF-8 string. Now the actual string that comes along in aData.url in this case is "http://шомбург.рф/" which is already Unicode and in particular is decidedly NOT byte-inflated UTF-8. So treating it as byte-inflated UTF-8 will cause obvious problems. Victor, you seem to have blame for this code. Where is that string coming from?
I don't particularly remember the code paths in there, but I believe the url is handed over by the webconsole backend actor, during a "networkEvent" event emitted by the client. If it's always the case that it is already unicode, then that conversion is not necessary. I don't see why it would be unicode in some cases and not in others.
I tried to find where the networkEvent events are actually created, but I can't find it... nor can I find any documentation for the expected format of the packets going back and forth between server and client for the networkEvent case. :( Is there someone who knows the devtools better than I do who can take this on?
(In reply to Boris Zbarsky [:bz] from comment #3) > I tried to find where the networkEvent events are actually created, but I > can't find it... nor can I find any documentation for the expected format > of the packets going back and forth between server and client for the > networkEvent case. :( I believe the url property is taken from the HTTP channel. Does that mean it should be be safe to display directly? (Seems like we could just try it an see...) : https://dxr.mozilla.org/mozilla-central/rev/4ea7408b3eef059aa248f4b00328f8fdb4475112/devtools/shared/webconsole/network-monitor.js#863
Created attachment 8726282 [details] [diff] [review] bug1250397-unicode.patch (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #4) > I believe the url property is taken from the HTTP channel. Does that > mean it should be be safe to display directly? (Seems like we could just > try it an see...) Yes, exactly. I looked at the packet using the RDPi  and its data looks just fine. Here is a quick patch, not sure if that's enough, but the primary issue is fixed and I am curious what the try says. Try https://treeherder.mozilla.org/#/jobs?repo=try&revision=678b57aa5ce6 Honza
Argh, here is correct try-push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=14c11dacd487 Honza
> Does that mean it should be be safe to display directly? It... depends. The .spec of a URI, in its ACString form, may contain unescaped UTF-8 characters (and clearly documents that). Or it may contain escaped UTF-8. It's actually totally up to whoever created the URI. Now when you do .spec in JS, we will do UTF-8 to UTF-16 conversion, so if the URI had unescaped UTF-8 you will now have nontrivial (as in, high bytes not 0) UTF-16. If you want to make sure things really get unescaped consistently here (e.g. if someone happens to use escaped UTF-8, not raw UTF-8, in a file path), you probably want nsITextToSubURI.unEscapeURIForUI.
You need to log in before you can comment on or make changes to this bug.