Closed Bug 1254098 Opened 4 years ago Closed 4 years ago

Investigate testing/web-platform/tests/workers/semantics/encodings/003.html and 004.html

Categories

(Core :: DOM: Workers, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: khuey, Assigned: stone, Mentored)

References

(Depends on 1 open bug)

Details

(Whiteboard: [tw-dom] btpp-active)

Attachments

(2 files, 5 obsolete files)

The test appears to be testing character set handling of either XHR or script loads on worker threads.  Wouldn't surprise me if we're doing something wrong there.
I would like to try on this bug.
Assignee: nobody → sshih
Whiteboard: [tw-dom] → [tw-dom] btpp-active
Attached patch bug-1254098-wip-part1.patch (obsolete) — Splinter Review
Attached patch bug-1254098-wip-part2.patch (obsolete) — Splinter Review
I found two suspicious problems from this bug.

1. When URL encoding override is absent, the base URL encoding is selected to encode the new URL, which leads some test cases failed. I can't find the spec to define thus behavior so made a WIP and tried it (bug-1254098-wip-part1.patch and bug-1254098-wip-part2.patch).

Code in https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsStandardURL.cpp?from=nsstandardurl.cpp#2930
The tried result is in https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6b39df64714.

2. Worker uses parent document's charset encoding to encode the URL. That means the URL may be encoded as windows-1252 or other encodings. 
(https://dxr.mozilla.org/mozilla-central/source/dom/workers/ScriptLoader.cpp?from=ScriptLoader.cpp#120)

I found https://html.spec.whatwg.org/multipage/workers.html 10.2.6.2 defines the 'The API URL character encoding' is 'Return UTF-8'. Does it mean the worker URL should be encoded as UTF-8?

I tried to encode worker URL with UTF-8 but failed some test cases of web-platform-tests
('Worker constructor' and 'SharedWorker constructor' in html/infrastructure/urls/resolving-urls/query-encoding/windows-1252.html). 

I am not sure whether my miss any spec to define the encoding of worker URL? Could you kindly give me some suggestions? Thanks.
Flags: needinfo?(khuey)
Maybe annevk knows the answer to these questions.
Flags: needinfo?(khuey) → needinfo?(annevk)
Okay, so the testcases mentioned in comment 0 are about XMLHttpRequest. XMLHttpRequest always uses UTF-8 for its URLs. That follows from https://xhr.spec.whatwg.org/#dom-xmlhttprequest-open which in step 6 directly invokes the URL parser without an encoding override (which therefore defaults to UTF-8).

Now as for workers, it's complicated. new Worker() is defined in https://html.spec.whatwg.org/multipage/workers.html#dom-worker and per step 2 uses https://html.spec.whatwg.org/multipage/infrastructure.html#parse-a-url to parse a URL. That grabs the encoding from a settings object (which is 1:1 with the global object), so for Window objects that will be the encoding of the associated document and for workers that will be UTF-8. So if you invoke new Worker() when Window is the global object you'll use the document encoding (as that test seems to expect) and when WorkerGlobalScope is the global object you'll use UTF-8.

Therefore, importScripts() will always use UTF-8, since it can only be used inside workers.

(Having said all that, I think we should maybe change new Worker and new SharedWorker to also always use UTF-8, but this bug is mostly about XMLHttpRequest I think, unless I missed something.)
Flags: needinfo?(annevk)
Thanks to Annevk and Kyle. I think the causes of this bug are
1. The document is detected as windows-1252 encoding.
2. A worker is created with relative URL '#', so the worker script URL encoding is windows-1252 (followed the encoding of associated document).
3. The worker script creates XHR to open a URL '003-1.py?x=疇' then nsStandardURL uses base URL's encoding (windows-1252) to encode it and get '003-1.py?x=%E5' (expected '003-1.py?x=%C3%A5' which is encoded with UTF-8).
Okay, so 2 and are 3 are the bugs. A worker script URL encoding must always be UTF-8. XMLHttpRequest must always use UTF-8 (overriding the base URL's encoding).
Attachment #8736282 - Attachment is obsolete: true
Attachment #8736286 - Attachment is obsolete: true
Attachment #8737118 - Flags: review?(mcmanus)
Attachment #8737118 - Attachment is obsolete: true
Attachment #8737118 - Flags: review?(mcmanus)
Attachment #8737120 - Flags: review?(mcmanus)
Attachment #8737120 - Flags: review?(mcmanus) → review?(valentin.gosu)
Attachment #8737119 - Flags: review?(mcmanus) → review?(valentin.gosu)
Attachment #8737119 - Flags: review?(valentin.gosu) → review+
Comment on attachment 8737120 [details] [diff] [review]
Bug 1254098 Part1: Don't use base url encoding to encode url when charset override is absent.

Review of attachment 8737120 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking this.
The charset handling fix is fine, but don't make any whitespace changes to the file.
Attachment #8737120 - Flags: review?(valentin.gosu) → review+
Follow reviewer's feedbacks to remove trailing space modifications. Change reviewer in the patch summary.
Attachment #8737119 - Attachment is obsolete: true
Attachment #8737120 - Attachment is obsolete: true
Attachment #8739273 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b80531102366
https://hg.mozilla.org/mozilla-central/rev/7056c33bc38e
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Depends on: 1320925
You need to log in before you can comment on or make changes to this bug.