Closed Bug 1459382 Opened 6 years ago Closed 6 years ago

Various tokenstream class/function renames for 1/2-byte parsing

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(4 files)

The continued progress of better names for stuff must go on!  Makes clearer what things need change, what do not for single-byte parsing.  Anything "char" is a flat misnomer in that world, for example, and we'll want to change it to "code unit" or make a different function that talks about "code points".

Nothing too interesting in the few patches I have here, mostly just renames.
Certain functions in this will be specialized for single/double-byte parsing, namely the line break stuff.  It is at least *easier* to do this using a class at namespace scope.  Might even only be possible at all if you do this.

Also TokenBuf was a horrible name for this when the field was "userbuf" and there was a separate getTokenbuf function that *doesn't* return userbuf.
Attachment #8973396 - Flags: review?(arai.unmht)
More functions to rename in subsequent patches, just splitting it up for readability.
Attachment #8973397 - Flags: review?(arai.unmht)
I expect to review *uses* of these various functions at a somewhat later time, to be clear.
Attachment #8973398 - Flags: review?(arai.unmht)
This is a little bit of a grotesque change, but it's better than making this fully generic for UTF-8 code point encodings.  '\r' is simple to one-off, so that seems better.

The name is not clearly the best, but it worked.  If you've got a better idea, fine with me.
Attachment #8973399 - Flags: review?(arai.unmht)
Blocks: 1459384
Comment on attachment 8973396 [details] [diff] [review]
Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT>

Review of attachment 8973396 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with or without the comments addressed.
I'll do the remaining review Tuesday.

::: js/src/frontend/TokenStream.cpp
@@ +2110,5 @@
> +    // Poisoning sourceUnits on error establishes an invariant: once an
> +    // erroneous token has been seen, sourceUnits will not be consulted again.
> +    // This is true because the parser will deal with the illegal token by
> +    // aborting parsing immediately.
> +    sourceUnits.poison();

I'm not sure if removing #ifdef DEBUG around this is good.
this looks misleading that poisoning is performed regardless of build options.
I think enclosing the caller is much clearer that poisoning is performed only in debug build.

::: js/src/frontend/TokenStream.h
@@ +854,5 @@
> +// raw Unicode code units -- 16-bit char16_t units of source text that are not
> +// (always) full code points, and 8-bit units of UTF-8 source text soon.
> +// TokenStreams functions are layered on top and do some extra stuff like
> +// converting all EOL sequences to '\n', tracking the line number, and setting
> +// |flags.isEOF|.  (The "raw" in "raw chars" refers to the lack of EOL sequence

s/raw chars/raw Unicode code units/
Attachment #8973396 - Flags: review?(arai.unmht) → review+
What about poisonIfDebug?  Spewing #ifdefs everywhere is what I want to avoid.
Or poisonInDebug, maybe more precisely.
(In reply to Jeff Walden [:Waldo] from comment #7)
> Or poisonInDebug, maybe more precisely.

yes, that looks clearer :)
Attachment #8973397 - Flags: review?(arai.unmht) → review+
Attachment #8973398 - Flags: review?(arai.unmht) → review+
Comment on attachment 8973399 [details] [diff] [review]
Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBackwards to reflect narrower scope/be character-type-agnostic

Review of attachment 8973399 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.h
@@ +914,5 @@
>          }
>          return false;
>      }
>  
> +    void ungetOptionalCRBackwards() {

it would be nice to add some comment that says the optional cr is internally handled inside getChar method.

then, how about ungetOptionalCRBeforeLF ?
anycase "backwards" is already sufficiently implied by "unget" I think.
(compared to the current "match")

@@ +915,5 @@
>          return false;
>      }
>  
> +    void ungetOptionalCRBackwards() {
> +        MOZ_ASSERT(ptr, "shouldn't unget a '\\r' from poisoned SourceUnits");

also, it would be nice to assert that the current character is '\n', so that it's clear when this can be called.
Attachment #8973399 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab024bcaf8c1
Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> preparing for its being specialized for single/double-byte source code units.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/97ede7464228
Rename various SourceUnits member functions to refer to code units, not "raw" chars.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/a193e16a26c0
Rename yet more SourceUnits member functions to refer to code units.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6431d75a9f2
Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBeforeLF to reflect narrower scope/be character-type-agnostic.  r=arai
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: