Closed
Bug 1434429
Opened 7 years ago
Closed 7 years ago
More rounds of TokenStream adjustments for UTF-8 tokenizing/parsing
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla60
Tracking | Status | |
---|---|---|
firefox60 | --- | fixed |
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(7 files)
11.38 KB,
patch
|
till
:
review+
|
Details | Diff | Splinter Review |
17.53 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
6.00 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
8.48 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
3.55 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
9.75 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
4.45 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•7 years ago
|
||
I find myself wanting TokenStream::error to actually use the current immediate offset soon, and changing how these places work is necessary to have that. Plus, these error messages are mildly improvable.
Attachment #8946815 -
Flags: review?(till)
Assignee | ||
Comment 2•7 years ago
|
||
The inheritance structure of TokenStream is coming increasingly to roughly mirror that of Parser. Unsurprisingly, given the two need to be merged pretty much (but not now).
Attachment #8946817 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 3•7 years ago
|
||
Attachment #8946821 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 4•7 years ago
|
||
There are no tests for this because there are no behavior changes. It *happens* that right now, badchar only happens for something that's wrong at the *start* of a token, so the ungets here just revert to exactly the same position. But, this is clearer, uses the instantaneous offset (as one would expect), and it provides a new primitive that will be useful as other places come to use better report locations as they shift away from reportError.
Attachment #8946824 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 5•7 years ago
|
||
Attachment #8946825 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 6•7 years ago
|
||
Ostensible UTF-16 source text in JS *can* have mispaired surrogates and other junk in it, per ECMAScript rules. But ECMAScript doesn't directly process UTF-8, only UTF-16 -- so the implicit process of conversion from malformed UTF-8 to this UTF-16-alike input should produce an error earlier in the compilation chain.
So, when tokenizing UTF-8 and we get something malformed, technically we're in a place that ECMAScript says nothing about. So *not* allowing malformed code point sequences and goo is the most desirable thing to do. That requires changing the existing multi-unit code point consumption signature to allow errors, even if no errors are reported in the case of processing UTF-16.
Note that bz says that Gecko will only hand the JS engine properly encoded data (because it does that validation step internally, as it *should*). So none of this should be web-observable, once the browser starts invoking UTF-8-specific parsing entrypoints.
Attachment #8946830 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 7•7 years ago
|
||
It doesn't seem impossible to me that in the long run, we can write fully code-unit-length-generic code and never have to think about this goo anywhere in tokenizing. But we'll see.
Attachment #8946832 -
Flags: review?(arai.unmht)
Updated•7 years ago
|
Attachment #8946817 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8946821 -
Flags: review?(arai.unmht) → review+
Comment 8•7 years ago
|
||
Comment on attachment 8946825 [details] [diff] [review]
Use MakeScopeExit to handle resetting userbuf offset in TokenStreamSpecific::putIdentInTokenbuf
Review of attachment 8946825 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/frontend/TokenStream.cpp
@@ +1325,5 @@
>
> + auto restoreNextRawCharAddress =
> + MakeScopeExit([this, originalAddress]() {
> + this->userbuf.setAddressOfNextRawChar(originalAddress);
> + });
oh, this class is nice :)
I think we have some other places that this can simplify.
Attachment #8946825 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8946824 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8946830 -
Flags: review?(arai.unmht) → review+
Comment 9•7 years ago
|
||
Comment on attachment 8946832 [details] [diff] [review]
Change TokenStreamCharsBase::appendMultiUnitCodepointToTokenbuf to TokenStreamCharsBase::appendCodePointToTokenbuf so that the idea doesn't require knowledge of multi-unitness
Review of attachment 8946832 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/frontend/TokenStream.cpp
@@ +1304,5 @@
> +
> + if (!tokenbuf.append(units[0]))
> + return false;
> +
> + return numUnits == 1 || tokenbuf.append(units[1]);
I prefer separating `numUnits == 1` and `tokenbuf.append(units[1])` into 2 statements.
since "true" has different meaning for them.
Attachment #8946832 -
Flags: review?(arai.unmht) → review+
Comment 10•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/024f70aeba70
Use MakeScopeExit to handle resetting userbuf offset in TokenStreamSpecific::putIdentInTokenbuf. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/112eaf00632c
Move TokenStreamSpecific::ungetChar into a new GeneralTokenStreamChars<CharT, AnyCharsAccess> inserted between TokenStreamCharsBase<CharT> and TokenStreamChars<CharT, AnyCharsAccess> in the token stream inheritance hierarchy. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/45c9102825ab
Move TokenStreamSpecific::ungetCharIgnoreEOL into TokenStreamCharsBase. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/21c79fd76133
Implement a TokenStreamChars::ungetCodePointIgnoreEOL and use it to report errors at precise locations, rather than blindly at the beginning of the token (which happens to be the same thing, just not nearly as clear about it). r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/95e07a79f4b2
Implement TokenStreamChars::matchMultiUnitCodePoint as a better nailing-down of behavior when processing a multi-code unit code point. r=arai
Assignee | ||
Comment 11•7 years ago
|
||
Extricated several of these for landing, still a couple to go after the last review.
Keywords: leave-open
Comment 12•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b190c5a3dab2
Followup bustage fix (?) for gcc (and maybe other?) compiler bustage. Worked in recent clang... r=boogstage in a CLOSED TREE
Comment 13•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e0faca62e86
Comment out an assertion whose condition is disliked by certain versions of gcc for entirely mysterious reasons, to be debugged later. r=bustage in a still-CLOSED TREE
Comment 14•7 years ago
|
||
Backout by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/63fe40a76a25
Backed out 7 changesets for build bustages on regress-618572.js and TokenStream.h on a CLOSED TREE
Comment 15•7 years ago
|
||
Comment on attachment 8946815 [details] [diff] [review]
Give better errors for unterminated/EOF'd string/template literals
Review of attachment 8946815 [details] [diff] [review]:
-----------------------------------------------------------------
r=me
Attachment #8946815 -
Flags: review?(till) → review+
Comment 16•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/049cc7799b53
Use MakeScopeExit to reset userbuf offset in TokenStreamSpecific::putIdentInTokenbuf. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/a9cb684274da
Move TokenStreamSpecific::ungetChar into a new GeneralTokenStreamChars<CharT, AnyCharsAccess> inserted between TokenStreamCharsBase<CharT> and TokenStreamChars<CharT, AnyCharsAccess> in the token stream inheritance hierarchy. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d353d4a61cb0
Move TokenStreamSpecific::ungetCharIgnoreEOL into TokenStreamCharsBase. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bf0b12b992f
Implement a TokenStreamChars::ungetCodePointIgnoreEOL and use it to report errors at precise locations, rather than blindly at the beginning of the token (which happens to be the same thing, just not nearly as clear about it). r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/229b3ea6530f
Implement TokenStreamChars::matchMultiUnitCodePoint as a better nailing-down of behavior when processing a multi-code unit code point. r=arai
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to Tooru Fujisawa [:arai] from comment #9)
> I prefer separating `numUnits == 1` and `tokenbuf.append(units[1])` into 2
> statements.
> since "true" has different meaning for them.
Yeah, I dithered on that before posting. Will change when landing.
Comment 18•7 years ago
|
||
bugherder |
Comment 19•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b03e86fb95a
Use the current offset, not the offset of the start of the current token, when reporting errors for unterminated string/template literals. r=till
https://hg.mozilla.org/integration/mozilla-inbound/rev/dcde99a8fa6c
Change TokenStreamCharsBase::appendMultiUnitCodepointToTokenbuf to TokenStreamCharsBase::appendCodePointToTokenbuf so that the idea doesn't require knowledge of multi-unitness. r=arai
Comment 20•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/835b96c27ded
Another followup test-fix. r=red
Updated•7 years ago
|
status-firefox60:
--- → fix-optional
Priority: -- → P2
Updated•7 years ago
|
Priority: P2 → P1
Comment 21•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/11ead800137e
Use the current offset, not the offset of the start of the current token, when reporting errors for unterminated string/template literals. r=till
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4213d1706e9
Change TokenStreamCharsBase::appendMultiUnitCodepointToTokenbuf to TokenStreamCharsBase::appendCodePointToTokenbuf so that the idea doesn't require knowledge of multi-unitness. r=arai
Comment 22•7 years ago
|
||
bugherder |
Comment 23•7 years ago
|
||
Did all patches for this issue land? (I'm waiting for this bug to be closed so I can proceed with bug 1433909).
Flags: needinfo?(jwalden+bmo)
Assignee | ||
Comment 24•7 years ago
|
||
Oh, whoops. Yes, it all landed.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jwalden+bmo)
Keywords: leave-open
Resolution: --- → FIXED
Comment 25•7 years ago
|
||
(In reply to Jeff Walden [:Waldo] from comment #24)
> Oh, whoops. Yes, it all landed.
Thanks! :-)
Updated•7 years ago
|
Target Milestone: --- → mozilla60
You need to log in
before you can comment on or make changes to this bug.
Description
•