Closed
Bug 1240436
Opened 8 years ago
Closed 8 years ago
Can't load a service worker script containing an ZERO WIDTH JOINER unicode character
Categories
(Core :: DOM: Service Workers, defect)
Tracking
()
RESOLVED
FIXED
mozilla48
Tracking | Status | |
---|---|---|
firefox48 | --- | fixed |
People
(Reporter: NiKo, Assigned: stone, Mentored)
References
Details
(Whiteboard: [tw-dom] btpp-active)
Attachments
(3 files, 4 obsolete files)
2.05 KB,
patch
|
stone
:
review+
|
Details | Diff | Splinter Review |
2.86 KB,
patch
|
stone
:
review+
|
Details | Diff | Splinter Review |
2.36 KB,
patch
|
stone
:
review+
|
Details | Diff | Splinter Review |
Reference real world issue: https://github.com/bahmutov/babel-service/issues/7 It seems Firefox 43.0.4 Ubuntu can't deal with any service worker script containing a ZERO WIDTH JOINER character. I didn't test with more recent versions nor with other platforms. Sample minimal reproducible test case: 1. index.html <!DOCTYPE html> <html><meta charset="utf-8"></head> <body> <script> window.navigator.serviceWorker.register("worker.js", {scope: "/"}) .then(reg => console.log("registered", reg)) .catch(err => console.error("service worker error", err)) </script> </body> </html> 2. worker.js var a = ""; Notice variable a contains an invisible character, ZERO WIDTH JOINER (0x200D) I used Python SimpleHTTPServer to serve the page over http: $ python -m SimpleHTTPServer 8000 And opened http://localhost:8000/ Expected result: The worker script should at least be successfully registered. Actual result: A registration error occurs: SyntaxError: SyntaxError: unterminated string literal I've also created a gist if that's useful: https://gist.github.com/n1k0/6f925c655798b9a1abeb
Comment 1•8 years ago
|
||
what does the python http server return as the http content-type?
Flags: needinfo?(nperriault)
Reporter | ||
Comment 2•8 years ago
|
||
curl -I http://localhost:8000 HTTP/1.0 200 OK Server: SimpleHTTP/0.6 Python/2.7.10 Date: Mon, 18 Jan 2016 08:24:55 GMT Content-type: text/html Content-Length: 269 Last-Modified: Mon, 18 Jan 2016 08:00:20 GMT
Flags: needinfo?(nperriault)
Comment 4•8 years ago
|
||
Can you attach the problem worker script from comment 0? Or give me instructions on how to craft one? Its unclear to me how to generate such a script.
Flags: needinfo?(nperriault)
Reporter | ||
Comment 5•8 years ago
|
||
You can either use the real world script failing here: https://babel-service-demo.herokuapp.com/babel-service.js (offending char is at line 10348), or create a file and paste the content I suggested in comment 0; to ensure it contains the 0x200D char, paste the copied text into http://www.unicodemap.org/search.asp?search= to ensure it's actually there; for example if you select and copy the var declaration stuff and paste it there, you'll get something like in this capture http://i.imgur.com/QO3Auhr.png Then just load that script as a service worker and you're done.
Flags: needinfo?(nperriault)
Whiteboard: [tw-dom]
Updated•8 years ago
|
Assignee: bkelly → nobody
Status: ASSIGNED → NEW
Mentor: khuey
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → sshih
Assignee | ||
Comment 6•8 years ago
|
||
After a short study, according to [1], the promise which returned by |register(...)| is rejected when an uncaught exception occurred during 'invoking service worker algorithm', while [2] shows a corresponding uncaught script error had occurred when the target script is not parse/compile/initialize successfully. I think the behavior works as spec defined. Is there anything I misunderstand? BTW, it seems chrome register the service worker successfully because it can handle the special character 0x200D. [1] https://www.w3.org/TR/service-workers/#update-algorithm (step 4.22.3 and 4.22.4) [2] https://www.w3.org/TR/service-workers/#run-service-worker-algorithm (step 12)
Flags: needinfo?(nperriault)
Reporter | ||
Comment 7•8 years ago
|
||
Maybe what we want is being eventually able to handle the special character 0x200D as well? I'm not too much into character support, but would there be any reason to keep things being the way they are atm? Can't we just support this character? Also, why is this character supported in a page context and not in a service worker one? If we plan on claiming this to be a feature, where should we document behavioral differences between service worker JavaScript vs. regular page one?
Flags: needinfo?(nperriault)
Comment 8•8 years ago
|
||
Are these characters usable in dedicated workers? We must be coercing a string badly in workers/ScriptLoader.cpp.
Assignee | ||
Comment 9•8 years ago
|
||
Thanks for the feedback. Dedicated workers work fine. After investigations, the JavaScript source codes are truncated in https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#540. They are encoded as UTF-16 and truncated by NS_NewStringInputStream. I'm trying to find a solution for it.
Comment 10•8 years ago
|
||
Is the UTF-8 hint being passed here correct? https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#783
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #10) > Is the UTF-8 hint being passed here correct? > > > https://dxr.mozilla.org/mozilla-central/source/dom/workers/ > ServiceWorkerScriptCache.cpp#783 I think the answer is yes because 1. The file 'worker.js' downloaded from [1] are encoded as UTF8 (U+200D is encoded as 0xe2 0x80 0x8d). 2. According to [2], the script should be encoded as UTF8. And the cause of this bug is In [3], a nsString with UTF16 encoding are converted by a lossy utility to a single byte buffer when writing to cache. While in [4], the string buffer is converted from UTF8 to UTF16 when retrieving from cache. [1] https://gist.github.com/n1k0/6f925c655798b9a1abeb [2] https://www.w3.org/TR/service-workers/#run-service-worker-algorithm step 7 [3] https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#540 [4] https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#836
Assignee | ||
Comment 12•8 years ago
|
||
Attachment #8739823 -
Flags: review?(khuey)
Comment on attachment 8739823 [details] [diff] [review] Convert the service worker script to be registered to utf8 before writing to cache. r?khuey Review of attachment 8739823 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/workers/ServiceWorkerScriptCache.cpp @@ +537,5 @@ > > ErrorResult result; > nsCOMPtr<nsIInputStream> body; > + result = NS_NewCStringInputStream(getter_AddRefs(body), > + NS_ConvertUTF16toUTF8(mCN->Buffer())); http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsStringStream.cpp#410 Doh!!!
Attachment #8739823 -
Flags: review?(khuey) → review+
Nathan, if the code linked in comment 13 is expected behavior, can we get a big warning comment somewhere about that? Or maybe an assertion? Ben, does ExtractFromURLSearchParams in fetch need to do something different here too?
Flags: needinfo?(nfroyd)
Flags: needinfo?(bkelly)
Comment 15•8 years ago
|
||
It certainly seems like this code would have the exact same problem for a URL search param containing one of these UTF-16 characters: https://dxr.mozilla.org/mozilla-central/source/dom/fetch/Fetch.cpp#454 Ming-Chou, can you fix this case as well using the same solution as here? Thanks!
Flags: needinfo?(bkelly) → needinfo?(sshih)
I wonder if we should just fix this in xpcom/ ...
Comment 17•8 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16) > I wonder if we should just fix this in xpcom/ ... Yes, we should. It's not clear to me if the uses in netwerk/streamconv/converters/ are bogus or not: https://dxr.mozilla.org/mozilla-central/search?q=NS_NewStringInputStream&redirect=false&case=false though there are various XXX comments about Unicode and such, so maybe the current behavior is the intended behavior? We could have NS_NewStringInputStream that does the right thing and NS_NewLossyStringInputStream that retains the current behavior, using the former in dom/ and the latter in netwerk/. Or we could ask somebody about the netwerk/ ones.
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #15) > It certainly seems like this code would have the exact same problem for a > URL search param containing one of these UTF-16 characters: > > https://dxr.mozilla.org/mozilla-central/source/dom/fetch/Fetch.cpp#454 > > Ming-Chou, can you fix this case as well using the same solution as here? > Thanks! No problems. I will handle it if our conclusion is fixing it in callers.
Flags: needinfo?(sshih)
Updated•8 years ago
|
Whiteboard: [tw-dom] → [tw-dom] btpp-active
Assignee | ||
Comment 19•8 years ago
|
||
Hi Honza, Would you mind helping me confirm whether the behaviors in nsTXTToHTMLConv which uses NS_NewStringInputStream to generate nsStringInputStream are intended? This API truncates the high bytes of nsAString to a single-byte data buffer and leads to this bug. I plan to fix it by converting UTF16 to UTF8 in NS_NewStringInputStream, but need confirmation whether it's ok to nsTXTToHTMLConv.
Flags: needinfo?(honzab.moz)
Comment 20•8 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16) > I wonder if we should just fix this in xpcom/ ... Could we have something like: template <class T> nsresult NS_NewStringInputStream(nsIInputStream** aStreamResult, const nsAString& aStringToRead) { T data(aStringToRead); return NS_NewCStringInputStream(aStreamResult, data); } ? (In reply to Ming-Chou Shih [:stone] from comment #19) > Hi Honza, > Would you mind helping me confirm whether the behaviors in nsTXTToHTMLConv > which uses NS_NewStringInputStream to generate nsStringInputStream are > intended? This API truncates the high bytes of nsAString to a single-byte > data buffer and leads to this bug. I plan to fix it by converting UTF16 to > UTF8 in NS_NewStringInputStream, but need confirmation whether it's ok to > nsTXTToHTMLConv. The code has been introduced in bug 318193 and loosy conversion is there since ever (2006). I think there originally was only a loosy conv because we didn't have any UTF-16 -> UTF-8 conversion, only this function https://github.com/mozilla/gecko-dev/blob/eb3847a802f725d0a1eeaf94305f5b70a55fee9f/xpcom/string/public/nsStringAPI.h#L784 which was loosy too. Hence, I think to have a template function that on places you KNOW the target can be read as UTF-8 should use NS_ConvertUTF16toUTF8. See https://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsTXTToHTMLConv.cpp#150. It's already destroying the input with LossyConvertEncoding8to16. I'd rather leave it, so use the template function with loosy conv as the template arg.
Flags: needinfo?(honzab.moz)
It would be better to make NS_NewStringInputStream do the correct thing, and make any legacy Necko consumers do the lossy conversion and then call NS_NewCStringInputStream themselves, rather than templating NS_NewStringInputStream or something crazy.
Assignee | ||
Comment 22•8 years ago
|
||
Hi Kyle and Honza, I personally prefer Kyle's solution mentioned in comment #21. In fact, I think we can be a little more aggresive, that is we can remove both of the two functions. Since the InputStream has no idea about the format of its content, the caller should explicitly decide the encoding method when creating it and the decoding method when reading from it. Thus, the caller can write something like the following snippets. ``` NS_ConvertUTF16toUTF8 data(aStringToRead); NS_NewCStringInputStream(aStreamResult, data); ``` or ``` NS_LossyConvertUTF16toASCII data(aStringToRead); NS_NewCStringInputStream(aStreamResult, data); ``` How do you think about it?
Flags: needinfo?(khuey)
Flags: needinfo?(honzab.moz)
That sounds fine.
Flags: needinfo?(khuey)
Comment 24•8 years ago
|
||
That is kinda more flexible than a template function, tho caller is not notified during compile time that there is a need for a conversion to be made, so it may be a foot gun. Up to you tho, I'm fine with both of the approaches.
Flags: needinfo?(honzab.moz)
Comment 25•8 years ago
|
||
Is this something exposed to addons or chrome script? Is it possible those depend on lossy conversions for some reason?
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #25) > Is this something exposed to addons or chrome script? Is it possible those > depend on lossy conversions for some reason? If I understand Gecko correctly, addons and Chrome scripts can only access methods exposed by XPCOM interfaces. Thus, the method (NS_NewStringInputStream) only can be used internally. The other possibility that addons can use those methods is that the method is exposed to the addon development environment. I've tried searching it with DXR, and I didn't find anything alike. Is there any case should be considered?
No, you're correct.
Assignee | ||
Comment 28•8 years ago
|
||
Attachment #8739823 -
Attachment is obsolete: true
Attachment #8742045 -
Flags: review?(khuey)
Assignee | ||
Comment 29•8 years ago
|
||
Attachment #8742046 -
Flags: review?(honzab.moz)
Assignee | ||
Comment 30•8 years ago
|
||
Attachment #8742047 -
Flags: review?(khuey)
Assignee | ||
Comment 31•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d447e3c4eb9a1bbf87fc4054564e9b7524770f6f
Attachment #8742045 -
Flags: review?(khuey) → review+
Attachment #8742047 -
Flags: review?(khuey) → review?(nfroyd)
Updated•8 years ago
|
Attachment #8742047 -
Flags: review?(nfroyd) → review+
Updated•8 years ago
|
Attachment #8742046 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Comment 32•8 years ago
|
||
Changed patch summary
Attachment #8742045 -
Attachment is obsolete: true
Attachment #8742687 -
Flags: review+
Assignee | ||
Comment 34•8 years ago
|
||
Changed patch summary
Attachment #8742046 -
Attachment is obsolete: true
Attachment #8742047 -
Attachment is obsolete: true
Attachment #8742689 -
Flags: review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 35•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/0abedbe00975 https://hg.mozilla.org/integration/mozilla-inbound/rev/e9cdedc9c456 https://hg.mozilla.org/integration/mozilla-inbound/rev/f851d6d56143
Keywords: checkin-needed
A test would be nice too.
Flags: needinfo?(sshih)
Comment 37•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0abedbe00975 https://hg.mozilla.org/mozilla-central/rev/e9cdedc9c456 https://hg.mozilla.org/mozilla-central/rev/f851d6d56143
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Assignee | ||
Comment 38•8 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #36) > A test would be nice too. OK, I will add it.
Flags: needinfo?(sshih)
You need to log in
before you can comment on or make changes to this bug.
Description
•