Closed Bug 945240 Opened 11 years ago Closed 7 years ago

Change nsStandardURL's internal hostname representation to punycode

Categories

(Core :: Networking, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: bzbarsky, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Keywords: addon-compat, dev-doc-complete, Whiteboard: [necko-next] btpp-active)

Attachments

(5 files, 10 obsolete files)

59 bytes, text/x-review-board-request
mcmanus
: review+
Details
59 bytes, text/x-review-board-request
mcmanus
: review+
Details
59 bytes, text/x-review-board-request
mcmanus
: review+
Details
59 bytes, text/x-review-board-request
mcmanus
: review+
Details
59 bytes, text/x-review-board-request
mcmanus
: review+
Details
See discussion in bug 942074.

Gavin, how much would this break Firefox/extensions?
I don't have a good sense offhand. nsLocation is not as often accessed as e.g. docShell.currentURI, for most frontend code, but it is used in a bunch of tests, and addons probably use it more. I would assume most don't care about punycode vs. not, though, so maybe it wouldn't be a big deal?
Addons are my main worry, honestly, followed by our UI.  I can obviously fix whatever tests fail, since I'll know about them!

If you think the UI will be OK, chances are we can just make that change and addons will end up dealing as needed...
Keywords: addon-compat
Jason, could we add an nsIURI API like GetASCIISpec but that only converts the hostname to ASCII?
Flags: needinfo?(jduell.mcbugs)
It seems unlikely that this will affect many add-ons, and I can't think of a reliable way to check for this in the Add-ons MXR. I think it should be okay to go forward with this change.
bz: sure, you can add a a new field/method to nsIURI.  I don't see why that would be a problem.
Flags: needinfo?(jduell.mcbugs)
Blocks: 1273168
Andrew, this seems to be breaking more sites.  Do we have someone who can take this, please?  I'm trying to avoid piling more stuff on my plate...

Note that we'd probably want to us the same punycoding setup in document.URL as well, at least.

Maybe the right answer here really is to store nsStandardURL in punycode and only convert to IDN in things like unescapeURIForUI and whatnot?  That will probably incidentally fix things like bug 1224225 and whatnot.  :(
Flags: needinfo?(jduell.mcbugs)
Andrew, see comment 6.
Flags: needinfo?(overholt)
> Maybe the right answer here really is to store nsStandardURL in punycode and 
> only convert to IDN in things like unescapeURIForUI and whatnot?

Valentin, do you have an opinion here?  Can you take the URI part of this bug? (I'm not sure what's involved on the DOM side).
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
Sure I'll take it. This seems like a major issue for web-compat.
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-active]
The DOM side mostly depends on what exactly we do on the URI side.
OK, once we've got the URI side a bit further along I will find someone to take the DOM side. Please needinfo me when it's time, Valentin.
Flags: needinfo?(overholt)
Whiteboard: [necko-active] → [necko-active] btpp-active
So, I've started looking into this, but I have a question. Why can't we just use GetAsciiSpec and GetAsciiHost ?
From what I can tell, the non-ascii characters in the path, query and hash ( and user/pass) are percent encoded anyway. So the only difference between GetSpec and GetAsciiSpec would be the punycode encoding of the host - am I mistaken about this?
Flags: needinfo?(bzbarsky)
Hmm.  Non-ascii stuff in the path and so forth isn't guaranteed to be %-encoded last I checked.  Certainly the IDL for nsIURI and company claims that it's not guaranteed.  If something _does_ guarantee that now, then yeah, GetAsciiSpec might do the right thing and we should update the documentation.  We'd just need to audit all callers of GetSpec and convert them to GetAsciiSpec as needed, then.

Anyway, looks like we maybe have a "network.standard-url.escape-utf8" preference that controls the escaping behavior.  Defaults to true.
Flags: needinfo?(bzbarsky)
Andrew, can we get someone to take this?
From what I can tell no other work needs to be done necko-side.
GetAsciiSpec should work in all cases. The standard-url.escape-utf8 pref is set to be removed in bug 1099194.
It shouldn't be too difficult to get nsContentUtils::GetHostOrIPv6WithBrackets to use GetAsciiHost, and URL.cpp and Link.cpp to use the GetAscii* variants of the nsIURI methods.
We also need to make sure the security checks do the same.
Assignee: valentin.gosu → nobody
Flags: needinfo?(overholt)
baku's been doing some URL stuff recently (IIRC) ...
Flags: needinfo?(overholt) → needinfo?(amarchesini)
Valentin, does this mean that we can use nsIURI::GetAsciiHost everywhere when we use nsIURI::GetHost?
Or do we want to do it only to what is exposed to content?
Flags: needinfo?(amarchesini) → needinfo?(valentin.gosu)
Location.href and URL.href/hostname etc - should definitely be using GetAsciiSpec/GetAsciiHost.
For internal code, it may vary for each callsite, but IMO we would prefer using the Ascii variants for most things.
Flags: needinfo?(valentin.gosu)
Anything that goes to the web needs to be consistent.  If we switch to GetAsciiHost we need to use GetAsciiSpec instead of GetSpec as well, as Valentin says.

I suspect we never want to use GetHost and GetSpec in practice except maybe for logging code or something.
Attached patch url.patch (obsolete) — Splinter Review
This patch almost green on try... it breaks a IDN test because it wants to have 'domain' with unicode chars, instead having the ascii version of it. I'm fixing it together with other tests when treeherder finishes to run them all.

Mainly I replaced:

GetHost => GetAsciiHost
GetSpec => GetAsciiSpec
GetHostPort => GetAsciiHostPort

everywhere except for logging.

We need an ascii version of nsIURI::specIgnoringRef.
Assignee: nobody → amarchesini
Attachment #8770451 - Flags: review?(bzbarsky)
So here's another question... What would break if we just made GetHost/Spec/HostPort aliases for the ASCII versions?  If we only use the other in logging, it might not be that big a deal to just log stuff in punycode....
With my patch (that is more or less what you propose but without make those function alisas of their ASCII versions) we unexpectedly pass some WPTs and we break only 1 test: dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html (and this is going to be fixed soon).
But I don't know if we have enough tests for punycode.
valentin, thoughts on comment 20?
Flags: needinfo?(valentin.gosu)
Theoretically it shouldn't break too many things. Indeed, a lot of WPT tests will actually start passing. The only things I expect would break would be the location bar and dev tools handling of URLs (they prefer showing unicode domain names, instead of punycode).

But I think this could be a good way forward.
1. Forward the GetHost/Spec/HostPort to the ASCII versions.
2. Add GetUnicodeHost/Spec/HostPort to have non-ascii alternatives that we can use
3. Optimize the ASCII methods in nsStandardURL - currently they are a bit slower.

CCing Patrick and Honza in case they have any opinion on this matter.
Flags: needinfo?(valentin.gosu)
Location bar doesn't use GetSpec directly anyway.  It uses UnEscapeURIForUI.

So I guess we may still want some accessor (the GetUnicodeSpec thing seems reasonable) that gets the IDN domain name for the few cases that then proceed to call UnEscapeURIForUI.
Comment on attachment 8770451 [details] [diff] [review]
url.patch

This is almost certainly not sufficient; I don't see any changes to JS .spec users here... :(
Attachment #8770451 - Flags: review?(bzbarsky) → review-
valentin, are you go do implement what you suggested in comment 23?
If you do the necko part, we can work together on the broken mochitests.
Yes, I'll do a try push in a few minutes, and we'll see what breaks.
Thanks for offering to help with that :)
Attached patch bug945240-use-ascii.patch (obsolete) — Splinter Review
WIP patch. Needs comments, cleaning up
Assignee: amarchesini → valentin.gosu
Status: NEW → ASSIGNED
Attached patch bug945240-fix-tests.patch (obsolete) — Splinter Review
:baku, can you help with also fixing:
test_setting_document.domain_idn.html
test_postMessage_idn.xhtml
test_postMessage_origin.xhtml
browser_identity_UI.js

Thanks
Valentin, I have some sec-critical bugs to fix before helping with this. How urgent is this?
Not very urgent. I'm almost done with the main patch, and I don't think we can land it without fixing the display of URIs in the location bar as well. Some of these tests I think we'll disable anyway, since they're no longer applicable.
* nsStandardURL::GetHost/GetHostPort/GetSpec contain an punycode encoded hostname.
* Added nsIURI::GetDisplayHost/GetDisplayHostPort/GetDisplaySpec which have unicode hostnames, depending on the hostname, character blacklist and the network.IDN_show_punycode pref
* remove mHostEncoding since it's not needed anymore (the hostname is always ASCII encoded)
* Add mCheckedIfHostA to know when GetDisplayHost can return the regular host, or when we need to use the cached mDisplayHost

MozReview-Commit-ID: 4qV9Ynhr2Jl
Attachment #8772848 - Flags: review?(mcmanus)
Attachment #8770883 - Attachment is obsolete: true
Attachment #8772848 - Attachment is obsolete: true
Attachment #8772848 - Flags: review?(mcmanus)
Whiteboard: [necko-active] btpp-active → [necko-next] btpp-active
Attached patch test2.patch (obsolete) — Splinter Review
Attachment #8799363 - Flags: review?(valentin.gosu)
Comment on attachment 8799363 [details] [diff] [review]
test2.patch

Review of attachment 8799363 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. You might want to run it by a dom peer as well.
Attachment #8799363 - Flags: review?(valentin.gosu) → review+
Attachment #8794637 - Flags: review?(mcmanus)
MozReview-Commit-ID: GKnbpc8GMb2
Attachment #8770884 - Attachment is obsolete: true
Comment on attachment 8799449 [details] [diff] [review]
Fix tests that use nsIURI.host expecting unicode domain name

Review of attachment 8799449 [details] [diff] [review]:
-----------------------------------------------------------------

r? Olli for dom & docshell, Honza for netwerk and PSM.
Honza, is the PSM test still checking for the right thing? nsIURI.asciiHost and nsIURI.host will return the same thing now, the ASCII domain name, and .displayHost/Port/Spec the unicode variant.
Attachment #8799449 - Flags: review?(honzab.moz)
Attachment #8799449 - Flags: review?(bugs)
Attachment #8770451 - Attachment is obsolete: true
Attachment #8799363 - Flags: review?(ehsan)
Comment on attachment 8799449 [details] [diff] [review]
Fix tests that use nsIURI.host expecting unicode domain name

>   else
>   {
>     // We're receiving data from the Greek IDN name; since that TLD is
>     // whitelisted for now, the domain we get isn't going to be punycoded.
>-    is(evt.origin, "http://sub1.ÏαÏάδειγμα.δοκιμή", "wrong sender");
>+    is(evt.origin, "http://sub1.xn--hxajbheg2az3al.xn--jxalpdlp", "wrong sender");
Could you fix also the comment.
Attachment #8799449 - Flags: review?(bugs) → review+
Making a note to look into this.
(00:51:25) MattN: going back to test_disabled_hosts.js… the main thing is that we're doing normalization to be backwards compatible with extensions and other callers
(00:52:32) MattN: and we should preserve that compat.
Theoretically shouldn't be a problem since all URIs will have the same behaviour. Should probably add a pref to revert to the old behaviour just in case we get a regression from this.
Flags: needinfo?(valentin.gosu)
Attachment #8799363 - Flags: review?(ehsan) → review+
Attachment #8794637 - Flags: review?(mcmanus) → review+
Valentin,  can you land my patch together with yours?
Dao, can we get your help with the URL bar?
The patches change nsIURI.host .hostPort and .spec to return the ASCII version.
The URL bar should use .displayHost .displayHostPort and .displaySpec instead.
Flags: needinfo?(valentin.gosu) → needinfo?(dao+bmo)
The URL bar is using nsTextToSubURI::UnEscapeURIForUI, no?  Similar for other things that do UI display of URIs.

I don't see your patches fixing that function; they should.
Oh, and I did say that in comment 6...
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #45)
> The URL bar is using nsTextToSubURI::UnEscapeURIForUI, no?  Similar for
> other things that do UI display of URIs.
> I don't see your patches fixing that function; they should.

AFAICT UnEscapeURIForUI doesn't know anything about punycode vs unicode, or even which segment of the URI it's unescaping, it just makes sure no blacklisted characters are present. The important thing is that we call it for strings that come from .displaySpec/Host.
I did a mass replace in browser.js, and that seems to work, but we need to audit all the callsites, and maybe fix the extra UI tests that would be failing.
Comment on attachment 8799449 [details] [diff] [review]
Fix tests that use nsIURI.host expecting unicode domain name

Review of attachment 8799449 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html
@@ +106,5 @@
>    }
>    else
>    {
>      // We're receiving data from the Greek IDN name; since that TLD is
>      // whitelisted for now, the domain we get isn't going to be punycoded.

does this comment need an update?
Attachment #8799449 - Flags: review?(honzab.moz) → review+
Updates on landing this?
Blocks: 414090
Note also https://github.com/whatwg/html/issues/2568 and whether location should match that...
Attachment #8866757 - Flags: review?(mcmanus)
Comment on attachment 8794637 [details] [diff] [review]
Make nsIURI.host & variants return ASCII strings

Obsoleting as rebased patch is now in reviewboard.
Attachment #8794637 - Attachment is obsolete: true
Attachment #8799363 - Attachment is obsolete: true
Attachment #8799449 - Attachment is obsolete: true
Flags: needinfo?(dao+bmo)
Comment on attachment 8866761 [details]
Bug 945240 - Fix web platform tests meta to expect punycode encoding

https://reviewboard.mozilla.org/r/138372/#review141560

This looks good. I would have expected more results to change, but I guess IDNA testing is still spotty since it's unclear how the details should work still.
Attachment #8866761 - Flags: review?(annevk) → review+
Comment on attachment 8866757 [details]
Bug 945240 - Add unimplemented GetDisplay* methods to RustURL

https://reviewboard.mozilla.org/r/138364/#review141576
Attachment #8866757 - Flags: review?(mcmanus) → review+
Comment on attachment 8866756 [details]
Bug 945240 - Make nsIURI.host & variants return ASCII strings

https://reviewboard.mozilla.org/r/138362/#review141578

lgtm - the tests help :)
Attachment #8866756 - Flags: review?(mcmanus) → review+
Comment on attachment 8866761 [details]
Bug 945240 - Fix web platform tests meta to expect punycode encoding

https://reviewboard.mozilla.org/r/138372/#review141866

You probably need to fix the html/dom/self-origin.sub.html web platform test too.  The only reason it didn't fail for you is that we haven't synced wpt in a while and hence haven't picked up https://github.com/w3c/web-platform-tests/commit/2ea51e5825a5d237f49e36f205bee17f897d66fa (which you may want to just include).
Comment on attachment 8866758 [details]
Bug 945240 - Temporarily set the network.standard-url.punycode-host pref to false

https://reviewboard.mozilla.org/r/138366/#review141982
Attachment #8866758 - Flags: review?(bugs) → review+
Comment on attachment 8866759 [details]
Bug 945240 - Add test checking that URL.origin returns punycode

https://reviewboard.mozilla.org/r/138368/#review141984

I should not review this.
Attachment #8866759 - Flags: review?(bugs)
Comment on attachment 8866760 [details]
Bug 945240 - Add unit test for new behaviour

https://reviewboard.mozilla.org/r/138370/#review141986
Attachment #8866760 - Flags: review?(bugs) → review+
Attachment #8866759 - Flags: review?(gijskruitbosch+bugs)
I won't be able to properly (mostly: non-sleepily) review the effect this stuff has on browser/ until Monday. One thing I can think of offhand: do we need to update createExposableURI, and/or what happens with these patches when using reader mode (particularly, we use documentURI (which I guess uses documentURIObject.spec!) in https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#478 to display a domain name in the content). I expect there may be other places...

If there's urgency, looks like Matt might have more context, or Dão or Marco might be quicker.
(In reply to :Gijs from comment #69)
> I won't be able to properly (mostly: non-sleepily) review the effect this
> stuff has on browser/ until Monday. One thing I can think of offhand: do we
> need to update createExposableURI, and/or what happens with these patches
> when using reader mode (particularly, we use documentURI (which I guess uses

It didn't seem to hit trigger any tests, but it's definitely possible.
Anyway, we can easily fix it by replacing .spec/host with .displaySpec/displayHost where needed.

> If there's urgency, looks like Matt might have more context, or Dão or Marco
> might be quicker.
I don't think there's any urgency, except for bitrot. Feel free to CC them on the patch if they can any add anything to the conversation.
Comment on attachment 8866759 [details]
Bug 945240 - Add test checking that URL.origin returns punycode

https://reviewboard.mozilla.org/r/138368/#review142584

We should also update:

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/tabbrowser.xml#1440-1447

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/tabbrowser.xml#7668

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/browser-places.js#415

everything in https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/urlbarBindings.xml#804 (did you test copy/paste ? Does this pass try with mochitest-bc ? That would be surprising...)

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/utilityOverlay.js#928

This also produces punycode in the bookmarks popup and other bookmarks UI, so presumably those need updating too. If you need more details about what to do there, I would suggest asking ::mak - he should probably also review that patch separately.


and maybe:

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/browser.js#6584

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/browser.js#l7635 (splitting out the 'real' href and the displayed one)

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/browser.js#8027-8028,8032-8033

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/urlbarBindings.xml#211

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/urlbarBindings.xml#1291-1293 (also splitting out href vs. what is displayed, at a minimum)

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/components/downloads/content/allDownloadsViewOverlay.js#1369

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/components/downloads/content/downloads.js#1069


and then here we need to check what IE/Edge do:

https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/components/migration/IEProfileMigrator.js#68
because of:
https://dxr.mozilla.org/mozilla-central/rev/e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/components/migration/MSMigrationUtils.jsm#674


and that's what I found when looking only in non-test browser/ paths. I'm fairly sure there's more lurking, but I have a lot of other reviews to get to and this is already so much that I think this shouldn't land as-is. :-(
Attachment #8866759 - Flags: review?(gijskruitbosch+bugs) → review-
It seems you did not change all origin getters for this. E.g., WPT html/dom/self-origin.sub.html was not affected: https://github.com/w3c/web-platform-tests/issues/5955.

I'm not sure if we want to change all of them in this bug, but we should change all of them so a follow-up might be warranted depending on how you want to tackle things.
> E.g., WPT html/dom/self-origin.sub.html was not affected

It's affected.  The patches here were just not tested against a version of wpt that would detect the difference: see comment 65.

As in, if these patches land in their current form we will start failing html/dom/self-origin.sub.html as it currently exists upstream, but not as it currently exists in our tree, because the latter doesn't test what it thinks it's testing.
Great! Hopefully me changing it upstream is sufficient then, but if you want to take that change here that's also fine with me.
(In reply to :Gijs from comment #71)
> (did you test copy/paste ? Does this pass try with mochitest-bc ?
> That would be surprising...)

We don't have great test coverage for unicode URLs. I added one.

> https://dxr.mozilla.org/mozilla-central/rev/
> e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/utilityOverlay.
> js#928
> 
> This also produces punycode in the bookmarks popup and other bookmarks UI,
> so presumably those need updating too. If you need more details about what
> to do there, I would suggest asking ::mak - he should probably also review
> that patch separately.

Indeed, there are a few other UI issues I'm missing. I'll submit a separate patch for this.

> https://dxr.mozilla.org/mozilla-central/rev/
> e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/urlbarBindings.
> xml#1291-1293 (also splitting out href vs. what is displayed, at a minimum)

Not exactly sure what you meant by slplitting out - I changed the instances to use displaySpec.

> content/allDownloadsViewOverlay.js#1369
> content/downloads.js#1069

I didn't change these two since they deal with file URIs.

> 
> and then here we need to check what IE/Edge do:
> 
> IEProfileMigrator.js#68
> MSMigrationUtils.jsm#674

I don't know the code too well, but it seems to me that it doesn't matter if the URLs are unicode or punycode in this case.
So, I originally assumed, based on the first patch and the bug summary, that this affected nsIURI.spec and location.href, but not location.hostname / nsIURI.host. However, the first patch sure looks like this also affects nsIURI.host. In which case, this is going to affect a lot more frontend code. :-(

Am I just misreading things? What about principal origins and nsIURI.prePath and such?

Because the browser frontend is generally what displays stuff, I expect there'll be a long tail here, from the preferences / permissions / page info style stuff to notification bars ("we searched for föo, did you mean to go to http://föo ?"), the identity popup, and all kinds of other browser UI...


Would it be possible to split the frontend patches out more and find appropriate reviewers?
Flags: needinfo?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] from comment #84)
> > https://dxr.mozilla.org/mozilla-central/rev/
> > e66dedabe582ba7b394aee4f89ed70fe389b3c46/browser/base/content/urlbarBindings.
> > xml#1291-1293 (also splitting out href vs. what is displayed, at a minimum)
> 
> Not exactly sure what you meant by slplitting out - I changed the instances
> to use displaySpec.

I meant that we're generating HTML, and so I kind of assumed that maybe you want to generate:

<a href="punycode-href">pretty-href</a>

But if we're confident the unicode href works x-platform then we can just use displaySpec throughout.

> > and then here we need to check what IE/Edge do:
> > 
> > IEProfileMigrator.js#68
> > MSMigrationUtils.jsm#674
> 
> I don't know the code too well, but it seems to me that it doesn't matter if
> the URLs are unicode or punycode in this case.

Basically we're doing string compares with stuff IE/Edge have plugged in the registry to see if the user has typed a given URL in the IE/Edge location bar (vs. just visited it). The string compares will obviously fail if one is punycode and the other is unicode. I don't know what IE/Edge do. Status quo would be switching to displaySpec (though a followup wouldn't hurt).
(In reply to :Gijs from comment #85)
> Am I just misreading things? What about principal origins and nsIURI.prePath
> and such?

Principals are not affected, as they already use GetAsciiHost and GetAsciiSpec.

All nsIURI methods that contain the hostname - including prePath - are affected.
If needed, I can add a .displayPrePath as well.

> Would it be possible to split the frontend patches out more and find
> appropriate reviewers?

Yes, that's probably necessary. But I could use some help finding the places in the UI that are affected by this change.
Comment on attachment 8866759 [details]
Bug 945240 - Add test checking that URL.origin returns punycode

https://reviewboard.mozilla.org/r/138368/#review143534

Do we have similar wpt tests?
Attachment #8866759 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #88)
> Comment on attachment 8866759 [details]
> Bug 945240 - Add test checking that URL.origin returns punycode
> 
> https://reviewboard.mozilla.org/r/138368/#review143534
> 
> Do we have similar wpt tests?

There are a few, and Anne is working on adding more.
Flags: needinfo?(valentin.gosu)
See https://github.com/whatwg/html/pull/2689#issuecomment-302074202 for a list of WPT pull requests that add tests for the various cases affected. Reviews appreciated.
(In reply to Valentin Gosu [:valentin] from comment #87)
> (In reply to :Gijs from comment #85)
> > Would it be possible to split the frontend patches out more and find
> > appropriate reviewers?
> 
> Yes, that's probably necessary. But I could use some help finding the places
> in the UI that are affected by this change.

What I did for the list in comment #71 was to use DXR and look for '.spec', excluding things with path:test , and then auditing manually. This is painful and takes a long time, but I don't really have a better idea. You'd have to do the same for .host and .prePath and any others ('.host' already finds '.hostPort', and '.spec' finds '.specIgnoringRef' so not sure there are any).

In a general sense, places that are affected is going to be "anything that displays a URL or hostname" and there's quite a lot of those places...

To start with, I expect we'd want to update the identity panel, permissions doorhangers, the popup blocker code, the "did you mean to go to 'foo'?" notification bar, the various dialogs in about:preferences that display cookies/indexeddb/localstorage/appcache/whatever data by host, probably the page info dialog, and then bookmarks/history/synced tabs. Devtools probably will also want changes. But all that's just off the top of my head, and I expect there's more that I've forgotten about.
Comment on attachment 8868616 [details]
Bug 945240 - Change firefox code to use uri.displaySpec when unicode URLs are wanted

https://reviewboard.mozilla.org/r/140202/#review143526

r=me in the sense that these changes themselves look OK, but I think a lot more stuff will want changing.

::: browser/base/content/test/urlbar/browser_urlbarCopying.js:140
(Diff revision 1)
>    },
>    {
>      copyVal: "<example.com/\xe9>\xe9",
>      copyExpected: "http://example.com/\xe9"
>    },
> +  { // Note: it seems BrowserTestUtils.loadURI fails for unicode domains

Can you elaborate on what 'fails' means here? I know that there's URI spec checking in some places in BrowserTestUtils - perhaps some of those need updating?

::: browser/base/content/test/urlbar/browser_urlbarCopying.js:227
(Diff revision 1)
>    if (testCase.setup) {
>      testCase.setup();
>    }
>  
>    if (testCase.loadURL) {
> +    info(`Loading : ${testCase.loadURL}\n`);

Shouldn't need an extra newline in `info`.
Attachment #8868616 - Flags: review?(gijskruitbosch+bugs) → review+
(In reply to :Gijs from comment #92)
> > +  { // Note: it seems BrowserTestUtils.loadURI fails for unicode domains
> 
> Can you elaborate on what 'fails' means here? I know that there's URI spec
> checking in some places in BrowserTestUtils - perhaps some of those need
> updating?

I didn't look to much into it. The issue is definitely in BrowserTestUtils.
A few times, when I tried to load a unicode URL, it actually loaded a different one. I think some UTF8->UTF16 transformations are failing.
(In reply to Valentin Gosu [:valentin] from comment #93)
> (In reply to :Gijs from comment #92)
> > > +  { // Note: it seems BrowserTestUtils.loadURI fails for unicode domains
> > 
> > Can you elaborate on what 'fails' means here? I know that there's URI spec
> > checking in some places in BrowserTestUtils - perhaps some of those need
> > updating?
> 
> I didn't look to much into it. The issue is definitely in BrowserTestUtils.
> A few times, when I tried to load a unicode URL, it actually loaded a
> different one. I think some UTF8->UTF16 transformations are failing.

This is confusing - BTU is a simple JSM that calls "normal" browser/docshell/webnavigation APIs. I don't think it does any character encoding/transformations itself: https://dxr.mozilla.org/mozilla-central/rev/85e5d15c31691c89b82d6068c26260416493071f/testing/mochitest/BrowserTestUtils/BrowserTestUtils.jsm#478
Speaking of devtools, what do we plan to have Error.stack and stacks in devtools showing?  That is, things that come from the JS script "filename"?  I assume we're changing its behavior as well?
Comment on attachment 8868617 [details]
Bug 945240 - Add a pref to be able to restore previous behaviour where nsIURI.host/.spec returned unicode instead of punycode

https://reviewboard.mozilla.org/r/140204/#review144640
Attachment #8868617 - Flags: review?(mcmanus) → review+
Blocks: 1361135
Hi Valentin,

I looks like all the reviews are in.  Is there something blocking landing the fixes?  This resolution would go a long way towards uBlock Origyn being upgraded to a WebExtension.
Flags: needinfo?(valentin.gosu)
(In reply to :shell escalante from comment #97)
> Hi Valentin,
> 
> I looks like all the reviews are in.  Is there something blocking landing
> the fixes?  This resolution would go a long way towards uBlock Origyn being
> upgraded to a WebExtension.

See comment #92 / comment 85.
I could land it as it is, but there is still a lot of code in the Firefox frontend that hasn't been fixed. I'm considering doing a mass-replace of .host/.spec/etc to .displayHost/.displaySpec/etc (and need to add .displayPrePath)
Ok, I've tried doing the mass replace thing, but it may introduce further regressions if for example we key something in C++ code using the asciiSpec, and the JS code is using displaySpec to search for it. Also, the patches tend to bitrot quite quickly.

My new approach is to land these changes preffed off, then a few days later land the pref change along with the tests, and handle each regression individually (either by opening bugs or by fixing them myself). For each one of these, I think we should also add unit tests, as IDNA is severely under-tested in our code-base.
Flags: needinfo?(valentin.gosu)
Attachment #8866761 - Attachment is obsolete: true
Attachment #8866759 - Attachment is obsolete: true
Attachment #8868616 - Attachment is obsolete: true
Comment on attachment 8866760 [details]
Bug 945240 - Add unit test for new behaviour

https://reviewboard.mozilla.org/r/138370/#review161694
Attachment #8866760 - Flags: review?(mcmanus) → review+
Comment on attachment 8866758 [details]
Bug 945240 - Temporarily set the network.standard-url.punycode-host pref to false

https://reviewboard.mozilla.org/r/138366/#review161698
Attachment #8866758 - Flags: review?(mcmanus) → review+
I'm not sure I'm on board with the "turn it on and fix things that are reported.. plan" but landing it off is certainly ok.
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/35d7c15f5df2
Make nsIURI.host & variants return ASCII strings r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/abd19b1b5859
Add unimplemented GetDisplay* methods to RustURL r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/905c1230c0a9
Add a pref to be able to restore previous behaviour where nsIURI.host/.spec returned unicode instead of punycode r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/2dca6b785721
Temporarily set the network.standard-url.punycode-host pref to false r=mcmanus,smaug
https://hg.mozilla.org/integration/autoland/rev/82e7b5e70e23
Add unit test for new behaviour r=mcmanus,smaug
(In reply to Patrick McManus [:mcmanus] from comment #112)
> I'm not sure I'm on board with the "turn it on and fix things that are
> reported.. plan" but landing it off is certainly ok.

Thanks for the review!
Changing the title of this bug to better reflect what landed.
Will open new bug for the extra work to expose this changes to outside code.
Component: DOM → Networking
Summary: Consider making Location.href punycode non-ASCII hostnames → Change nsStandardURL's internal hostname representation to punycode
The various dependencies of this bug should probably move to the new bug...
Blocks: 1380617
I've had a go at writing a note about this in the Fx56 rel notes:

https://developer.mozilla.org/en-US/Firefox/Releases/56#Other

I wasn't sure where else to mention this - it's a bug fix really, and there are so many functions that use URLs where this could be mentioned...

Let me know if you think anything else is desperately needed.
Depends on: 1385883
Depends on: 1386195
Depends on: 1386213
Depends on: 1391421
No longer depends on: 1391421
Depends on: 1450538
See Also: → 1714021
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: