URL values not normalized to ASCII in WebExtensions API

UNCONFIRMED
Unassigned

Status

()

Toolkit
WebExtensions: Request Handling
P5
normal
UNCONFIRMED
27 days ago
21 hours ago

People

(Reporter: R. Hill, Unassigned, NeedInfo)

Tracking

(Blocks: 1 bug)

55 Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: triaged)

Attachments

(1 attachment)

(Reporter)

Description

27 days ago
Created attachment 8862520 [details]
ff-webext-idn-urls.zip

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170427100655

Steps to reproduce:

1. Install the attached minimalist extension -- this extension installs a webRequest.onBeforeRequest listener, and it will output to the browser console the hostname extracted from the details.url of whatever request it receives.

2. Temporarily load this extension using about:debugging.

3. Open browser console.

4. Open http://raymondhill.net/ublock/idn-tricks.html in a new tab -- the purpose of this page is to make image requests to various urls which are IDNs (some of these URLs are not referring to existing servers).

5. Observe in the browser console the reported hostnames.



Actual results:

Results:

The reported IDN hostnames are seemingly randomly reported as ASCII strings (punycoded) or as Unicode strings.



Expected results:

I can't find nowhere in the chrome API documentation that all URL values are guarantee to always be plain ASCII internally throughout the chrome API -- so at this point this bug report is merely my opinion of what should happen.

However I have observed that there never has been a case where the chrome webRequest, webNavigation, etc. APIs were URL values passed with Unicode-based hostnames, they always have been ASCII-based.

(This currently breaks the webext version of uBlock Origin when such URL values with Unicode character in it are passed to its listeners, or other API entry points.)

I do believe that all URL values in the webext API should be normalized to ASCII strings -- as otherwise this would force extension authors to normalize on their side using javascript code, an avoidable overhead knowing that the ASCII-based values of URL is readily available to the Firefox's webext framework (the legacy version of uBlock Origin uses these readily available ASCII URL values).

Unicode-encoded URLs should be only dealt with to interface with user input/output.

Updated

27 days ago
Component: Untriaged → WebExtensions: Request Handling
Product: Firefox → Toolkit

Updated

27 days ago
Blocks: 1309926

Updated

23 days ago
Priority: -- → P5
Whiteboard: triaged

Updated

7 days ago
webextensions: --- → ?

Comment 1

5 days ago
I gave a deeper look into this and it looks like there is no easy way to retrieve the "not normalized ascii url" (the one in the original IDNA format), e.g. by using URL (https://developer.mozilla.org/en-US/docs/Web/API/URL) we can easily convert the IDNA format into the normalized url:

    `(new URL("http://xn--80aj.com")).href` => "http://еа.com"

but no easy way to do the opposite and convert "http://еа.com" into the more explicit "http://xn--80aj.com".

After giving the attached example a try and explored it by making some experiments on top of it, I think that I agree with Robert. 

Besides the performance impact that converting the url back to the original IDNA format can introduce in an extension, the current behavior can be misleading because if the url is already encoded, any equality test needs to use the same encoded string, which is going to make the code harder to read (e.g. how can you easily know, by looking at the code and/or the manifest.json file of a webextention, if the extension is targeting the url "http://еа.com" (the encoded version of "http://xn--80aj.com") or the plain "http://ea.com"?

On the bright side in our webRequest internals we have access to both the encodings as part of the nsIURI properties, in particular:

- uri.asciiSpec is the url in the original IDNA format ("http://xn--80aj.com")
- uri.spec is the encoded url ("http://еа.com")

we are currently using `uri.spec` everywhere, e.g. the one related to the onBeforeRequest:

- http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/modules/addons/WebRequest.jsm#735

but there are also the originUrl and the documentUrl:

-  http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/toolkit/modules/addons/WebRequest.jsm#747,751

If we opt to change the url format on webRequest.onBeforeRequest,  we should also do the same for the other webRequest events, and we should also evaluate what this means for the host permissions, the url filters and for the other API that can have similar issues (e.g. the webNavigation and tabs details).

I'd like to hear Shane opinion on what described above.
Flags: needinfo?(mixedpuppy)
Maybe related with bug 945240.
You need to log in before you can comment on or make changes to this bug.