Closed Bug 1492937 Opened 3 years ago Closed 3 years ago
The JS subscript loader should load scripts as UTF-8 only, never as Latin-1
There are only a few callers of JS::CompileLatin1 and JS::CompileLatin1ForNonSyntacticScope now. Two of the calls are in PrepareScript in mozJSSubScriptLoader.cpp, dealing (I presume) with loading JS subscripts from trusted code. https://searchfox.org/mozilla-central/search?q=symbol:_ZN2JS13CompileLatin1EP9JSContextRKNS_22ReadOnlyCompileOptionsEPKcmNS_13MutableHandleIP8JSScriptEE&redirect=false https://searchfox.org/mozilla-central/search?q=symbol:_ZN2JS33CompileLatin1ForNonSyntacticScopeEP9JSContextRKNS_22ReadOnlyCompileOptionsEPKcmNS_13MutableHandleIP8JSScriptEE&redirect=false Last I knew, JS subscript loading was an XPCOM/XPConnect API not directly exposed to Web Extensions, so we internally are the only users of it. Therefore we might be able to just unilaterally change to compiling subscripts as UTF-8. I haven't attempted this to see whether or not it would work. Anyone want to? Note that changing to UTF-8 probably does not entail any different performance. Latin-1 source text currently gets inflated to UTF-16 before it's compiled, and UTF-8 undergoes essentially the same inflation, so it's six of one, half a dozen of the other now. But direct parsing, compilation, and evaluation of UTF-8 source text is coming soon via bug 1351107 and other bugs, and in that future world using UTF-8 instead of Latin-1 won't require inflation and likely will be some sort of perf win.
There are super-few uses of this, and they all pass in ASCII scripts, if you feel like verifying yourself. (Future patches here may not be so verifiable. :-\ I don't have an answer to that problem, save to say positive treeherder results are probably the best we can do for the time/effort/reward tradeoff.) https://searchfox.org/mozilla-central/search?q=loadSubScriptWithOptions&case=true®exp=false&path=
Attachment #9030383 - Flags: review?(kmaglione+bmo)
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
Attachment #9030383 - Flags: review?(kmaglione+bmo) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/31eb07ce57ae Make mozIJSSubScriptLoader.loadSubScriptWithOptions interpret script data only as UTF-8, without any way to use another charset. r=kmag
Hi all! I would appreciate moderately fast reviews on this, because loadSubScript calls and fresh mochitest-browser tests spring up often, and any one of them can bust this patch. I audited all loadSubScript calls (and all mochitest-browser tests, because one call handles all of them) that load non-ASCII UTF-8 as Latin-1, and I fixed the ones that would break if UTF-8 were used. First, I added a runUtf8Script identical to loadSubScript, save for using only UTF-8. Second, I laboriously swapped over callers that processed only ASCII, and callers that already demanded UTF-8, to this new function. Third, I adjusted callers that passed in UTF-8 but would break if the script were actually interpreted as UTF-8. And fourth, I changed a bunch of subscripts that clearly were dancing around this bug, to use UTF-8 directly where they really wanted to -- put this fix to immediate use. A try-push of the full stack of patches as I worked on them is available here. The full stack makes a lot of runUtf8Script-related changes that are erased in this rollup patch. https://treeherder.mozilla.org/#/jobs?repo=try&revision=6ce6c31b86ea09b61e68920bab99f6ac20ed20dd The first revision in the full patch stack is 4ea4ca0687de5e413b1ae864c9afd052472c0e8b, and the revision preceding it is e305cb6319226b712e95315eeaf95cc0eddeae56. (The try-push depends on a few patches from a previous try-push, hence they don't show up just looking at that single push.) I would provide a nice clean pushloghtml-like view of exactly these changes, but I don't know how to get one out of hgweb. For who reviews what: * accessible/: surkov This change fixes accessible/tests/browser/bounds/browser_test_zoom_text.js and accessible/tests/browser/scroll/browser_test_zoom_text.js which passed non-ASCII to addAccessibleTask, which calls snippetToURL, which passes in non-8-bit text to (probably) nsContentUtils::Btoa which promptly throws. (It does not appear going to one code point from several Latin-1 code points in these snippets changes test behavior, but you tell me, I'm leaning on try. :-) ) Encode/escape the document into a data: URL in a more webby way that doesn't have btoa restrictions. * browser/, toolkit/: MattN The URL bar copying goo has a toUnicode that converts strings containing UTF-8 spewed into separate characters in a string, to strings containing the corresponding code points in UTF-8. When browser scripts are UTF-8, we can just use the strings directly without going through toUnicode. Incidentally, I haven't hit content-task.js-related issues of the sort you mentioned yesterday in any of this bug's patching. (The one failure of that nature I *currently* see in the push, was not in a prior push that didn't include the rename-backward-to-loadSubScript: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b68e4e6e5f9725283c7e9b708870b49583470566&group_state=expanded but is otherwise the same, up to a couple tweaks, to the try-push I noted earlier.) * devtools/: jimb The lambda character isn't in that element any more for browser_dbg-outline.js, so the replacing is just not necessary at all. https://searchfox.org/mozilla-central/rev/3160ddc1f0ab55d230c595366662c62950e5c785/devtools/client/debugger/new/src/components/PrimaryPanes/Outline.js#152 https://searchfox.org/mozilla-central/rev/3160ddc1f0ab55d230c595366662c62950e5c785/devtools/client/debugger/new/src/components/shared/PreviewFunction.js#52 I think devtools/client/netmonitor/test/browser_net_curl-utils.js may be buggy. If I use the code points from UTF-8 directly, the resulting escaped string will contain \x-escapes of those extended-ASCII code points. But as this test is written, it seems to think UTF-8 encodings of stuff are what curl uses! I can't find good curl docs on what is right for this, so I'd like to do the stupid thing of exactly replicating the Latin-1 interpretation of the UTF-8 bytes into that string, then file a followup bug to determine what ought be done here. It looks like *possibly* CurlUtils.escapeStringPosix may just be buggy, but I don't know. devtools/client/performance/test/browser_timeline-waterfall-sidebar.js could stand to directly use UTF-8 in a comparison string, judging by the comment there. But I don't have the desired comparison string on speed-dial, so I'd rather devtools people make that change. * js/xpconnect, netwerk/, security/, testing/mochitest: kmag LoadSubScriptOptions::charset as a field, and charset conversion code related to it, will die in a subsequent patch I haven't written yet. I'd like to pocket the gains of the bug summary before doing that cleanup. I guess if the Android xpcshell needs bumping to pick up server changes, as you hypothesized in the other bug, this change will require such. Let me know where/how to file a bug to make that happen? Shouldn't leave the footgun around *even* *more* when this is in. Thanks in advance to everyone!
(In reply to Jeff Walden [:Waldo] from comment #4) > I guess if the Android xpcshell needs bumping to pick up server changes, as > you hypothesized in the other bug, this change will require such. ...as soon as there's a .sjs in the tree, containing UTF-8 in a semantically important manner, that runs on Android. We do not have one now, of course. And as of this bug, merely picking up a new httpd.js (if we guessed at the cause of the problem correctly) will no longer be enough to fix UTF-8 .sjs on Android.
Comment on attachment 9031072 [details] [diff] [review] Make loadSubScript always interpret script data as UTF-8, and fix existing subscripts for this where necessary Review of attachment 9031072 [details] [diff] [review]: ----------------------------------------------------------------- Thanks
Attachment #9031072 - Flags: review?(MattN+bmo) → review+
Attachment #9031072 - Flags: review?(kmaglione+bmo) → review+
(In reply to Jeff Walden [:Waldo] from comment #4) > I guess if the Android xpcshell needs bumping to pick up server changes, as > you hypothesized in the other bug, this change will require such. Let me > know where/how to file a bug to make that happen? Shouldn't leave the > footgun around *even* *more* when this is in. Just file a dupe of bug 1457012, and make sure to ask that it includes these changes. Filing sooner is probably better than later, since it can take them a while, and who knows what unexpected problems will show up when you try to land...
Comment on attachment 9031072 [details] [diff] [review] Make loadSubScript always interpret script data as UTF-8, and fix existing subscripts for this where necessary Saving MattN and kmag's r+ marks, wiped out by bugzilla with my blind "Okay" button mashing
Attachment #9031072 - Flags: review?(surkov.alexander) → review?(yzenevich)
Comment on attachment 9031072 [details] [diff] [review] Make loadSubScript always interpret script data as UTF-8, and fix existing subscripts for this where necessary Review of attachment 9031072 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks good.
Attachment #9031072 - Flags: review?(yzenevich) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c3e4039f9d1e Make the JS subscript loader load scripts exclusively as UTF-8, with no way to specify any other encoding, and adjust a bunch of existing tests to use UTF-8 directly, rather than Unicode escape sequences or similar. (This also changes the encoding of .sjs scripts and all mochitest-browser tests in the tree from Latin-1 to UTF-8.) r=yzen, r=MattN, r=jimb, r=kmag
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Commit pushed to master at https://github.com/mozilla/activity-stream https://github.com/mozilla/activity-stream/commit/f7dfbf28998d8697b6099a88077b577f0683ef01 Port Bug 1492937 - Make the JS subscript loader load scripts exclusively as UTF-8 (#4600)
Summary: Should the JS subscript loader load scripts as UTF-8, not as Latin-1? → The JS subscript loader should load scripts as UTF-8 only, never as Latin-1
You need to log in before you can comment on or make changes to this bug.