Closed Bug 1360285 Opened 7 years ago Closed 7 years ago

URL values not normalized to ASCII in WebExtensions API


(WebExtensions :: Request Handling, defect, P2)

55 Branch


(Not tracked)

webextensions ?


(Reporter: u287251, Assigned: mixedpuppy)



(Whiteboard: triaged)


(2 files)

Attached file
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170427100655

Steps to reproduce:

1. Install the attached minimalist extension -- this extension installs a webRequest.onBeforeRequest listener, and it will output to the browser console the hostname extracted from the details.url of whatever request it receives.

2. Temporarily load this extension using about:debugging.

3. Open browser console.

4. Open in a new tab -- the purpose of this page is to make image requests to various urls which are IDNs (some of these URLs are not referring to existing servers).

5. Observe in the browser console the reported hostnames.

Actual results:


The reported IDN hostnames are seemingly randomly reported as ASCII strings (punycoded) or as Unicode strings.

Expected results:

I can't find nowhere in the chrome API documentation that all URL values are guarantee to always be plain ASCII internally throughout the chrome API -- so at this point this bug report is merely my opinion of what should happen.

However I have observed that there never has been a case where the chrome webRequest, webNavigation, etc. APIs were URL values passed with Unicode-based hostnames, they always have been ASCII-based.

(This currently breaks the webext version of uBlock Origin when such URL values with Unicode character in it are passed to its listeners, or other API entry points.)

I do believe that all URL values in the webext API should be normalized to ASCII strings -- as otherwise this would force extension authors to normalize on their side using javascript code, an avoidable overhead knowing that the ASCII-based values of URL is readily available to the Firefox's webext framework (the legacy version of uBlock Origin uses these readily available ASCII URL values).

Unicode-encoded URLs should be only dealt with to interface with user input/output.
Component: Untriaged → WebExtensions: Request Handling
Product: Firefox → Toolkit
Blocks: 1309926
Priority: -- → P5
Whiteboard: triaged
webextensions: --- → ?
I gave a deeper look into this and it looks like there is no easy way to retrieve the "not normalized ascii url" (the one in the original IDNA format), e.g. by using URL ( we can easily convert the IDNA format into the normalized url:

    `(new URL("")).href` => "http://еа.com"

but no easy way to do the opposite and convert "http://еа.com" into the more explicit "".

After giving the attached example a try and explored it by making some experiments on top of it, I think that I agree with Robert. 

Besides the performance impact that converting the url back to the original IDNA format can introduce in an extension, the current behavior can be misleading because if the url is already encoded, any equality test needs to use the same encoded string, which is going to make the code harder to read (e.g. how can you easily know, by looking at the code and/or the manifest.json file of a webextention, if the extension is targeting the url "http://еа.com" (the encoded version of "") or the plain ""?

On the bright side in our webRequest internals we have access to both the encodings as part of the nsIURI properties, in particular:

- uri.asciiSpec is the url in the original IDNA format ("")
- uri.spec is the encoded url ("http://еа.com")

we are currently using `uri.spec` everywhere, e.g. the one related to the onBeforeRequest:


but there are also the originUrl and the documentUrl:


If we opt to change the url format on webRequest.onBeforeRequest,  we should also do the same for the other webRequest events, and we should also evaluate what this means for the host permissions, the url filters and for the other API that can have similar issues (e.g. the webNavigation and tabs details).

I'd like to hear Shane opinion on what described above.
Flags: needinfo?(mixedpuppy)
Maybe related with bug 945240.
IIUC, WebRequest in Chrome is providing the equivalent of URI.asciiSpec and we are providing unicode.  

I think it is likely correct to switch to using asciiSpec (and document the conversion back for those instances it's necessary).

I'm not clear if this is an issue in WebNavigation as well, or what the implications for the new match code might be.  Lets see what Kris thinks on those.
Flags: needinfo?(mixedpuppy) → needinfo?(kmaglione+bmo)
This is being fixed in platform in bug 945240, spec will return punycode.  That is close to landing, so the question for us is whether we wait (possibly Fx56) or fix it ourselves now.  Chatted with valentin, asciiSpec should be fine but if addon authors compare a punycode against something like location.origin they'll need to do the conversion.
Depends on: 945240
We've decided to wait for the fix in bug 945240.  That should also address any other potential issues (e.g. webnavigation).
Flags: needinfo?(kmaglione+bmo)
bug 945240 is pretty big, and may have yet to fix area's.  The patch I attached here simply fixes our use in WebRequest and WebNavigation which I think should be enough for OP.
Assignee: nobody → mixedpuppy
Priority: P5 → P2
Bug 945240 has landed, based on the test I am closing this as unnecessary effort.
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.