Make TokenStream parse both single-byte UTF-8 and two-byte source text
Categories: Core :: JavaScript Engine, enhancement, P2
People: Reporter: Waldo, Assigned: Waldo
References: Blocks 1 open bug
Details: Keywords: perf
Attachments: 32 files, 7 obsolete files
Comment 81 (Assignee) • 6 years ago
The bug and dependency trees here are a bit of a mess. By this bug's summary (whether pre- or post-revision made here to eliminate the Latin-1 possibility), this bug is complete: TokenStream is perfectly capable of handling UTF-8 source text. Going further to expose and use this is not really this bug's ambit, notwithstanding that any number of bugs ostensibly blocking this actually do that task (and so really should themselves be blocked by this bug).
It seems like a waste of time to push bugs here and there and make an accurate dependency tree, so I'm just going to close this. But for the sake of people wondering what remains to be done here to actually be able to parse/compile UTF-8 source text:
- Bug 1504947 fixes column numbers to be counts of code points, not code units as they are now. For UTF-16 we could fake it and mostly be okay; for UTF-8, the error quickly becomes glaring. This is the only bug remaining to be fixed, IMO, before we can start doing direct compilation of UTF-8 somewhere in the browser. The fix for this is "one line" to flip an #ifdef -- but if you flip it now, we consistently leak windows in several devtools test runs. Not good.
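To make the code-unit versus code-point distinction concrete, here is a minimal standalone sketch. It is not SpiderMonkey code: the function names are hypothetical, columns are assumed to be zero-based, and the input is assumed to be well-formed. It shows why counting units is "mostly okay" for UTF-16 (only astral characters take two units) but drifts quickly for UTF-8 (every non-ASCII character takes two to four units).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; none of these names exist in
// SpiderMonkey. `offset` is an index in code units from the start of the line.

// Column measured in code units: simply the offset itself.
template <typename Unit>
size_t ColumnInCodeUnits(const std::basic_string<Unit>& line, size_t offset) {
  assert(offset <= line.size());
  return offset;
}

// Column measured in code points, UTF-8 input (assumed well-formed):
// count every byte that is not a continuation byte (10xxxxxx).
size_t ColumnInCodePoints(const std::string& line, size_t offset) {
  assert(offset <= line.size());
  size_t column = 0;
  for (size_t i = 0; i < offset; i++) {
    unsigned char byte = static_cast<unsigned char>(line[i]);
    if ((byte & 0xC0) != 0x80) {
      column++;
    }
  }
  return column;
}

// Column measured in code points, UTF-16 input (assumed well-formed):
// count every unit that is not a trail surrogate (0xDC00..0xDFFF).
size_t ColumnInCodePoints(const std::u16string& line, size_t offset) {
  assert(offset <= line.size());
  size_t column = 0;
  for (size_t i = 0; i < offset; i++) {
    char16_t unit = line[i];
    if (unit < 0xDC00 || unit > 0xDFFF) {
      column++;
    }
  }
  return column;
}
```

For the line `var π = x;`, the `x` sits at code-unit offset 9 in UTF-8 but at code-point column 8; in UTF-16 both counts are 8, which is why the discrepancy was easy to overlook there.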
If you just want to play around in advance of that fix, and our using this stuff in any sort of "production", the JS shell will compile from UTF-8 if you pass in files with '-u' instead of with the traditional '-f', and things generally work pretty well. (At most we have a handful of jstests/jit-tests that will fall over for the preceding problem.)
Once that's fixed,
- Bug 1506902 will make our self-hosted code be directly parsed from UTF-8.
- The JS::Compile* functions that claim to take UTF-8 should be augmented by versions that explicitly don't inflate.
- We need to add prefs to control whether to use the "normal" or "do not inflate" APIs.
- We'll need to start hacking on the various ScriptLoader.cpp files to convert to UTF-8 and not UTF-16. (We may need to specially handle the UTF-8-to-UTF-8 case to avoid copying, in the normal case of no UTF-8 encoding errors.)
- We need to change places that call "normal" APIs, to call "do not inflate" APIs if so requested by pref. (And we may want some things to move in lockstep, e.g. possibly <script> elements and workers should be controlled by one pref.)
- Ultimately, the "normal" APIs will all be unused, we can remove them, and the "do not inflate" APIs can be renamed to the "normal" names.
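As a rough illustration of the "normal" versus "do not inflate" split in the steps above, the sketch below gates the choice on a pref. Everything here is hypothetical: `SketchPrefs`, `InflateUtf8`, `CountUnitsTokenized`, and `CompileSketch` are stand-ins invented for this example, not JSAPI functions or real pref names, and the decoder assumes well-formed UTF-8. "Inflating" means converting the UTF-8 bytes to UTF-16 before the tokenizer sees them; the non-inflating path hands the bytes over as they are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a pref; not a real Gecko pref.
struct SketchPrefs {
  bool compileUtf8WithoutInflating = false;
};

// Inflate UTF-8 to UTF-16. Assumes well-formed input; a real implementation
// would have to detect and report encoding errors.
std::u16string InflateUtf8(const std::string& utf8) {
  std::u16string out;
  size_t i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    uint32_t cp;
    size_t len;
    if (lead < 0x80) {
      cp = lead; len = 1;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; len = 2;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; len = 3;
    } else {
      cp = lead & 0x07; len = 4;
    }
    assert(i + len <= utf8.size());
    for (size_t k = 1; k < len; k++) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    i += len;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Encode astral code points as a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}

// Stand-in for a tokenizer templated on the code-unit type: it only reports
// how many code units it was handed.
template <typename Unit>
size_t CountUnitsTokenized(const std::basic_string<Unit>& source) {
  return source.size();
}

struct CompileResult {
  bool inflated;
  size_t unitsTokenized;
};

// Choose between the "normal" (inflating) path and the "do not inflate" path
// according to the pref.
CompileResult CompileSketch(const std::string& utf8, const SketchPrefs& prefs) {
  if (prefs.compileUtf8WithoutInflating) {
    return {false, CountUnitsTokenized(utf8)};
  }
  std::u16string inflated = InflateUtf8(utf8);
  return {true, CountUnitsTokenized(inflated)};
}
```

Once every caller takes the non-inflating branch, the inflating branch and the pref can be deleted, which corresponds to the final step in the list.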
Also, we probably ought to clean up the JSAPI compilation APIs to all take JS::SourceText<Unit>& only -- no accepting raw pointers and lengths -- but that's driveby cosmetic API simplification and not functionality improvement.
Anyway: let's handle the rest of this in other bugs. TokenStream, Parser, and BytecodeCompiler all handle UTF-8 just fine if you let them, and the only lapse right now (columns) is largely distinct from that achievement.