URL values not normalized to ASCII in WebExtensions API

UNCONFIRMED
Unassigned

Status

()

Toolkit
WebExtensions: Request Handling
P5
normal
UNCONFIRMED
2 months ago
12 days ago

People

(Reporter: R. Hill, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

55 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: triaged)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments)

(Reporter)

Description

2 months ago
Created attachment 8862520 [details]
ff-webext-idn-urls.zip

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170427100655

Steps to reproduce:

1. Install the attached minimalist extension -- this extension installs a webRequest.onBeforeRequest listener, and it will output to the browser console the hostname extracted from the details.url of whatever request it receives.

2. Temporarily load this extension using about:debugging.

3. Open browser console.

4. Open http://raymondhill.net/ublock/idn-tricks.html in a new tab -- the purpose of this page is to make image requests to various urls which are IDNs (some of these URLs are not referring to existing servers).

5. Observe in the browser console the reported hostnames.



Actual results:

Results:

The reported IDN hostnames are seemingly randomly reported as ASCII strings (punycoded) or as Unicode strings.



Expected results:

I can't find nowhere in the chrome API documentation that all URL values are guarantee to always be plain ASCII internally throughout the chrome API -- so at this point this bug report is merely my opinion of what should happen.

However I have observed that there never has been a case where the chrome webRequest, webNavigation, etc. APIs were URL values passed with Unicode-based hostnames, they always have been ASCII-based.

(This currently breaks the webext version of uBlock Origin when such URL values with Unicode character in it are passed to its listeners, or other API entry points.)

I do believe that all URL values in the webext API should be normalized to ASCII strings -- as otherwise this would force extension authors to normalize on their side using javascript code, an avoidable overhead knowing that the ASCII-based values of URL is readily available to the Firefox's webext framework (the legacy version of uBlock Origin uses these readily available ASCII URL values).

Unicode-encoded URLs should be only dealt with to interface with user input/output.

Updated

2 months ago
Component: Untriaged → WebExtensions: Request Handling
Product: Firefox → Toolkit

Updated

2 months ago
Blocks: 1309926

Updated

2 months ago
Priority: -- → P5
Whiteboard: triaged

Updated

a month ago
webextensions: --- → ?

Comment 1

a month ago
I gave a deeper look into this and it looks like there is no easy way to retrieve the "not normalized ascii url" (the one in the original IDNA format), e.g. by using URL (https://developer.mozilla.org/en-US/docs/Web/API/URL) we can easily convert the IDNA format into the normalized url:

    `(new URL("http://xn--80aj.com")).href` => "http://еа.com"

but no easy way to do the opposite and convert "http://еа.com" into the more explicit "http://xn--80aj.com".

After giving the attached example a try and explored it by making some experiments on top of it, I think that I agree with Robert. 

Besides the performance impact that converting the url back to the original IDNA format can introduce in an extension, the current behavior can be misleading because if the url is already encoded, any equality test needs to use the same encoded string, which is going to make the code harder to read (e.g. how can you easily know, by looking at the code and/or the manifest.json file of a webextention, if the extension is targeting the url "http://еа.com" (the encoded version of "http://xn--80aj.com") or the plain "http://ea.com"?

On the bright side in our webRequest internals we have access to both the encodings as part of the nsIURI properties, in particular:

- uri.asciiSpec is the url in the original IDNA format ("http://xn--80aj.com")
- uri.spec is the encoded url ("http://еа.com")

we are currently using `uri.spec` everywhere, e.g. the one related to the onBeforeRequest:

- http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/modules/addons/WebRequest.jsm#735

but there are also the originUrl and the documentUrl:

-  http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/modules/addons/WebRequest.jsm#747,751

If we opt to change the url format on webRequest.onBeforeRequest,  we should also do the same for the other webRequest events, and we should also evaluate what this means for the host permissions, the url filters and for the other API that can have similar issues (e.g. the webNavigation and tabs details).

I'd like to hear Shane opinion on what described above.
Flags: needinfo?(mixedpuppy)
Maybe related with bug 945240.
IIUC, WebRequest in Chrome is providing the equivalent of URI.asciiSpec and we are providing unicode.  

I think it is likely correct to switch to using asciiSpec (and document the conversion back for those instances it's necessary).

I'm not clear if this is an issue in WebNavigation as well, or what the implications for the new match code might be.  Lets see what Kris thinks on those.
Flags: needinfo?(mixedpuppy) → needinfo?(kmaglione+bmo)
This is being fixed in platform in bug 945240, spec will return punycode.  That is close to landing, so the question for us is whether we wait (possibly Fx56) or fix it ourselves now.  Chatted with valentin, asciiSpec should be fine but if addon authors compare a punycode against something like location.origin they'll need to do the conversion.
Depends on: 945240
We've decided to wait for the fix in bug 945240.  That should also address any other potential issues (e.g. webnavigation).
Flags: needinfo?(kmaglione+bmo)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=45190fa85b7f7ae83b3b023d8602c1489fd0943d
Comment hidden (mozreview-request)
bug 945240 is pretty big, and may have yet to fix area's.  The patch I attached here simply fixes our use in WebRequest and WebNavigation which I think should be enough for OP.
You need to log in before you can comment on or make changes to this bug.