Closed Bug 1459382 Opened 7 years ago Closed 7 years ago

Various tokenstream class/function renames for 1/2-byte parsing

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(4 files)

The continued progress of better names for stuff must go on! Makes clearer what things need change, what do not for single-byte parsing. Anything "char" is a flat misnomer in that world, for example, and we'll want to change it to "code unit" or make a different function that talks about "code points". Nothing too interesting in the few patches I have here, mostly just renames.
Certain functions in this will be specialized for single/double-byte parsing, namely the line break stuff. It is at least *easier* to do this using a class at namespace scope. Might even only be possible at all if you do this. Also TokenBuf was a horrible name for this when the field was "userbuf" and there was a separate getTokenbuf function that *doesn't* return userbuf.
Attachment #8973396 - Flags: review?(arai.unmht)
More functions to rename in subsequent patches, just splitting it up for readability.
Attachment #8973397 - Flags: review?(arai.unmht)
I expect to review *uses* of these various functions at a somewhat later time, to be clear.
Attachment #8973398 - Flags: review?(arai.unmht)
This is a little bit of a grotesque change, but it's better than making this fully generic for UTF-8 code point encodings. '\r' is simple to one-off, so that seems better. The name is not clearly the best, but it worked. If you've got a better idea, fine with me.
Attachment #8973399 - Flags: review?(arai.unmht)
Blocks: 1459384
Comment on attachment 8973396 [details] [diff] [review] Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> Review of attachment 8973396 [details] [diff] [review]: ----------------------------------------------------------------- r+ with or without the comments addressed. I'll do the remaining review Tuesday. ::: js/src/frontend/TokenStream.cpp @@ +2110,5 @@ > + // Poisoning sourceUnits on error establishes an invariant: once an > + // erroneous token has been seen, sourceUnits will not be consulted again. > + // This is true because the parser will deal with the illegal token by > + // aborting parsing immediately. > + sourceUnits.poison(); I'm not sure if removing #ifdef DEBUG around this is good. this looks misleading that poisoning is performed regardless of build options. I think enclosing the caller is much clearer that poisoning is performed only in debug build. ::: js/src/frontend/TokenStream.h @@ +854,5 @@ > +// raw Unicode code units -- 16-bit char16_t units of source text that are not > +// (always) full code points, and 8-bit units of UTF-8 source text soon. > +// TokenStreams functions are layered on top and do some extra stuff like > +// converting all EOL sequences to '\n', tracking the line number, and setting > +// |flags.isEOF|. (The "raw" in "raw chars" refers to the lack of EOL sequence s/raw chars/raw Unicode code units/
Attachment #8973396 - Flags: review?(arai.unmht) → review+
What about poisonIfDebug? Spewing #ifdefs everywhere is what I want to avoid.
Or poisonInDebug, maybe more precisely.
(In reply to Jeff Walden [:Waldo] from comment #7) > Or poisonInDebug, maybe more precisely. yes, that looks clearer :)
Attachment #8973397 - Flags: review?(arai.unmht) → review+
Attachment #8973398 - Flags: review?(arai.unmht) → review+
Comment on attachment 8973399 [details] [diff] [review] Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBackwards to reflect narrower scope/be character-type-agnostic Review of attachment 8973399 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +914,5 @@ > } > return false; > } > > + void ungetOptionalCRBackwards() { it would be nice to add some comment that says the optional cr is internally handled inside getChar method. then, how about ungetOptionalCRBeforeLF ? anycase "backwards" is already sufficiently implied by "unget" I think. (compared to the current "match") @@ +915,5 @@ > return false; > } > > + void ungetOptionalCRBackwards() { > + MOZ_ASSERT(ptr, "shouldn't unget a '\\r' from poisoned SourceUnits"); also, it would be nice to assert that the current character is '\n', so that it's clear when this can be called.
Attachment #8973399 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/ab024bcaf8c1 Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> preparing for its being specialized for single/double-byte source code units. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/97ede7464228 Rename various SourceUnits member functions to refer to code units, not "raw" chars. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/a193e16a26c0 Rename yet more SourceUnits member functions to refer to code units. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/e6431d75a9f2 Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBeforeLF to reflect narrower scope/be character-type-agnostic. r=arai
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: