Closed Bug 1459382 Opened 7 years ago Closed 7 years ago

Various tokenstream class/function renames for 1/2-byte parsing

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla62

Tracking Flags:

Tracking

Status

firefox62

---

fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(4 files)

Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> 7 years ago Jeff Walden [:Waldo] 46.41 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Rename various SourceUnits member functions to refer to code units, not "raw" chars 7 years ago Jeff Walden [:Waldo] 12.35 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Rename yet more SourceUnits member functions to refer to code units 7 years ago Jeff Walden [:Waldo] 17.27 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBackwards to reflect narrower scope/be character-type-agnostic 7 years ago Jeff Walden [:Waldo] 2.04 KB, patch	arai : review+	Details \| Diff \| Splinter Review

Jeff Walden [:Waldo]

Assignee

Description

•

7 years ago

The continued progress of better names for stuff must go on! Makes clearer what things need change, what do not for single-byte parsing. Anything "char" is a flat misnomer in that world, for example, and we'll want to change it to "code unit" or make a different function that talks about "code points". Nothing too interesting in the few patches I have here, mostly just renames.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

7 years ago

Attached patch Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> — Details — Splinter Review

Certain functions in this will be specialized for single/double-byte parsing, namely the line break stuff. It is at least *easier* to do this using a class at namespace scope. Might even only be possible at all if you do this. Also TokenBuf was a horrible name for this when the field was "userbuf" and there was a separate getTokenbuf function that *doesn't* return userbuf.

Attachment #8973396 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 2

•

7 years ago

Attached patch Rename various SourceUnits member functions to refer to code units, not "raw" chars — Details — Splinter Review

More functions to rename in subsequent patches, just splitting it up for readability.

Attachment #8973397 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 3

•

7 years ago

Attached patch Rename yet more SourceUnits member functions to refer to code units — Details — Splinter Review

I expect to review *uses* of these various functions at a somewhat later time, to be clear.

Attachment #8973398 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 4

•

7 years ago

Attached patch Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBackwards to reflect narrower scope/be character-type-agnostic — Details — Splinter Review

This is a little bit of a grotesque change, but it's better than making this fully generic for UTF-8 code point encodings. '\r' is simple to one-off, so that seems better. The name is not clearly the best, but it worked. If you've got a better idea, fine with me.

Attachment #8973399 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Updated

•

7 years ago

Blocks: 1459384

Tooru Fujisawa [:arai]

Comment 5

•

7 years ago

Comment on attachment 8973396 [details] [diff] [review] Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> Review of attachment 8973396 [details] [diff] [review]: ----------------------------------------------------------------- r+ with or without the comments addressed. I'll do the remaining review Tuesday. ::: js/src/frontend/TokenStream.cpp @@ +2110,5 @@ > + // Poisoning sourceUnits on error establishes an invariant: once an > + // erroneous token has been seen, sourceUnits will not be consulted again. > + // This is true because the parser will deal with the illegal token by > + // aborting parsing immediately. > + sourceUnits.poison(); I'm not sure if removing #ifdef DEBUG around this is good. this looks misleading that poisoning is performed regardless of build options. I think enclosing the caller is much clearer that poisoning is performed only in debug build. ::: js/src/frontend/TokenStream.h @@ +854,5 @@ > +// raw Unicode code units -- 16-bit char16_t units of source text that are not > +// (always) full code points, and 8-bit units of UTF-8 source text soon. > +// TokenStreams functions are layered on top and do some extra stuff like > +// converting all EOL sequences to '\n', tracking the line number, and setting > +// |flags.isEOF|. (The "raw" in "raw chars" refers to the lack of EOL sequence s/raw chars/raw Unicode code units/

Attachment #8973396 - Flags: review?(arai.unmht) → review+

Jeff Walden [:Waldo]

Assignee

Comment 6

•

7 years ago

What about poisonIfDebug? Spewing #ifdefs everywhere is what I want to avoid.

Jeff Walden [:Waldo]

Assignee

Comment 7

•

7 years ago

Or poisonInDebug, maybe more precisely.

Tooru Fujisawa [:arai]

Comment 8

•

7 years ago

(In reply to Jeff Walden [:Waldo] from comment #7) > Or poisonInDebug, maybe more precisely. yes, that looks clearer :)

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973397 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973398 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 9

•

7 years ago

Comment on attachment 8973399 [details] [diff] [review] Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBackwards to reflect narrower scope/be character-type-agnostic Review of attachment 8973399 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +914,5 @@ > } > return false; > } > > + void ungetOptionalCRBackwards() { it would be nice to add some comment that says the optional cr is internally handled inside getChar method. then, how about ungetOptionalCRBeforeLF ? anycase "backwards" is already sufficiently implied by "unget" I think. (compared to the current "match") @@ +915,5 @@ > return false; > } > > + void ungetOptionalCRBackwards() { > + MOZ_ASSERT(ptr, "shouldn't unget a '\\r' from poisoned SourceUnits"); also, it would be nice to assert that the current character is '\n', so that it's clear when this can be called.

Attachment #8973399 - Flags: review?(arai.unmht) → review+

Pulsebot

Comment 10

•

7 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/ab024bcaf8c1 Rename TokenStreamCharsBase<CharT>::TokenBuf to SourceUnits<CharT> preparing for its being specialized for single/double-byte source code units. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/97ede7464228 Rename various SourceUnits member functions to refer to code units, not "raw" chars. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/a193e16a26c0 Rename yet more SourceUnits member functions to refer to code units. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/e6431d75a9f2 Change SourceUnits::matchRawCharBackwards to SourceUnits::ungetOptionalCRBeforeLF to reflect narrower scope/be character-type-agnostic. r=arai

Natalia Csoregi [:nataliaCs]

Comment 11

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ab024bcaf8c1 https://hg.mozilla.org/mozilla-central/rev/97ede7464228 https://hg.mozilla.org/mozilla-central/rev/a193e16a26c0 https://hg.mozilla.org/mozilla-central/rev/e6431d75a9f2

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Various tokenstream class/function renames for 1/2-byte parsing

Categories

(Core :: JavaScript Engine, enhancement)

Tracking

()

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(4 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type