Bug 1461481 (Closed): Opened 7 years ago, Closed 6 years ago
Move various tokenstream functions, that will require specialized definition for single-byte tokenizing, into classes that will permit this
Categories: Core :: JavaScript Engine, enhancement, P2
Status: RESOLVED FIXED

Tracking | Status
---|---
firefox62 | fix-optional
People
(Reporter: Waldo, Assigned: Waldo)
Details
Attachments (2 files)

8.01 KB, patch, arai: review+
3.67 KB, patch, arai: review+
Some functions like "unget a char" have to know about the single/double-byte encoding distinction, so they will require specialized template algorithms. Because C++ doesn't allow function template partial specialization, the tokenstream class hierarchy mandates that these functions live in TokenStreamChars, where the *class* can be partially specialized on CharT while AnyCharsAccess, the sole remaining template parameter, varies.
This bug will move various of those functions demanding specialization into the right locations. (Some functions will additionally have to move upward in the class hierarchy, so that TokenStreamChars can depend on CharT-sensitive behavior that doesn't require different definitions of the functions.) This is one of the last bits of change I can do without specifically writing out UTF-8-centric versions of these functions (which requires resolving the snarl in bug 1426909).
I only have a couple of these patches written right now, so I'll post them and probably keep piling on this bug with more and more functions' moves.
Assignee
Comment 1•7 years ago
Attachment #8975638 -
Flags: review?(arai.unmht)
Assignee
Comment 2•7 years ago
This function will require 1/2-byte different definitions (because the UTF-8 case has to check encoding validity), but it doesn't care about AnyCharsAccess, so we can specialize it directly within TokenStreamCharsBase. This move also cuts on template-instantiation heaviness, because we only have two specializations (1/2) this way versus six ({1,2}{TokenStream,Parser<{Full,Syntax}ParseHandler>}) if we built out the current thing.
Attachment #8975641 -
Flags: review?(arai.unmht)
Updated•7 years ago
Attachment #8975638 -
Flags: review?(arai.unmht) → review+
Comment 3•7 years ago
Comment on attachment 8975641 [details] [diff] [review]
Move filling up a CharBuffer with the characters in a sequence of template literal characters into TokenStreamCharsBase
Review of attachment 8975641 [details] [diff] [review]:
-----------------------------------------------------------------
I have one question.
(In reply to Jeff Walden [:Waldo] from comment #2)
> This function will require 1/2-byte different definitions (because the UTF-8
> case has to check encoding validity)
Can you remind me of the plan for where we're going to validate UTF-8 encoding?
I thought it's already validated outside of the JS engine?
Or is that just the current situation due to the char16_t-only frontend, and we're going to validate inside TokenStream for the UTF-8 case?
Attachment #8975641 -
Flags: review?(arai.unmht) → review+
Assignee
Comment 4•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #3)
> Can you remind me the plan about where we're going to validate UTF-8
> encoding?
> I thought it's already validated outside of JS engine?
> or is that just the current situation due to the char16_t-only frontend, and
> we're going to validate inside TokenStream for the UTF-8 case?
The plan is slightly nascent, but: Gecko is only going to hand us validated UTF-8, so we can sort of ignore it for error purposes.
JSAPI clients generally, however, probably will get validation as we go. For anything with lots of ASCII, you get the validation for nearly free in getTokenInternal's < 128 check. Otherwise, and in contexts where arbitrary text can appear (string contents, template literal contents as they're accumulated into CharBuffer or so, identifiers, comment contents, &c.), we'll have to do a little extra work *when we encounter overlarge code units only*.
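The "validation for nearly free" idea can be sketched like this (a simplified illustration, not SpiderMonkey's getTokenInternal; overlong-sequence and surrogate checks are omitted for brevity): code units below 128 are ASCII and need no check at all, so validation cost is paid only when an overlarge lead unit appears.

```cpp
#include <cstddef>

// Returns the length of a structurally valid UTF-8 sequence starting at p
// (bounded by end), or 0 if the sequence is invalid or truncated.
inline size_t utf8SequenceLength(const unsigned char* p, const unsigned char* end)
{
    unsigned char lead = *p;
    if (lead < 0x80)
        return 1;  // ASCII fast path: always valid, no extra work

    size_t len;
    if ((lead & 0xE0) == 0xC0)      len = 2;
    else if ((lead & 0xF0) == 0xE0) len = 3;
    else if ((lead & 0xF8) == 0xF0) len = 4;
    else                            return 0;  // bare continuation or bad lead

    if (end - p < static_cast<ptrdiff_t>(len))
        return 0;  // sequence runs past end of input

    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;  // trailing unit isn't a continuation byte
    }
    return len;
}
```

Mostly-ASCII input spends almost all its time in the first branch, which a tokenizer's existing `< 128` dispatch already performs.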
The extra safety of extra checking seems desirable to me at least short term (even for Gecko use where only-valid UTF-8 should be seen). Longer term, and if/when we get validated-UTF-8 types to rely on, maybe we can undo some of that. But IMO not (safely) any sooner.
But this is a somewhat active point of contention in bug 1426909 which I have been avoiding touching as long as I can partly for this reason, so there is some amount of this subject to change.
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/70d1589b5df5
Move getChar and updateLineInfoForEOL upward into TokenStreamChars and GeneralTokenStreamChars. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/43b6fedcb9af
Move filling up a CharBuffer with the characters in a sequence of template literal characters into TokenStreamCharsBase. r=arai
Assignee
Updated•7 years ago
Keywords: leave-open
Comment 6•7 years ago
bugherder
Updated•7 years ago
status-firefox62:
--- → fix-optional
Priority: -- → P2
Comment 7•6 years ago
The leave-open keyword is set and there has been no activity for 6 months.
:Waldo, maybe it's time to close this bug?
Flags: needinfo?(jwalden+bmo)
Assignee
Comment 8•6 years ago
I'm not sure why I left this open. We are well past the point where anything has to be moved around for 1/2-byte-specific tokenizing behavior, though some things beyond what I'd done here probably did require moving around. (With local patches in place I can run all of jstests and jit-tests tokenizing directly from UTF-8, and the "-u <path>" argument to the JS shell works just fine to execute directly from UTF-8 right now.) So yeah, this can be closed.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(jwalden+bmo)
Resolution: --- → FIXED
Updated•6 years ago
Keywords: leave-open