Closed Bug 1478045 Opened 4 years ago Closed 4 years ago

Replace ungetCodePointIgnoreEOL/ungetNonAsciiNormalizedCodePoint with SourceUnits::{peek,consumeKnown}CodePoint

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(6 files, 2 obsolete files)

1.29 KB, patch
arai
: review+
Details | Diff | Splinter Review
1.56 KB, patch
arai
: review+
Details | Diff | Splinter Review
2.10 KB, patch
arai
: review+
Details | Diff | Splinter Review
2.54 KB, patch
arai
: review+
Details | Diff | Splinter Review
3.45 KB, patch
arai
: review+
Details | Diff | Splinter Review
8.95 KB, patch
arai
: review+
Details | Diff | Splinter Review
I added these interfaces earlier, but upon further review they're not really the right thing.  Most cases where they're used, it's super-obnoxious to have to unget-and-undo-EOL-updating.  It's also messy to have to unget a variable number of code units -- a variable count that will be a bit more complicated in a UTF-8 world (and will involve us computing length-counts two separate places).  Much better to peek at the next code point -- the actual operation we usually want -- and act based on that, sometimes never getting the code point, sometimes consuming it based on that already-computed info.  It's maybe a bit subtle, but it's clearer than the alternative.
The first patch adds these functions and introduces one use of them -- incidentally, the most complicated use, because it must manually address updates for EOL (no other users will).  Then I have patches replacing the other users of the existing functions, then a final patch to remove the old functions.
Attachment #8994549 - Flags: review?(arai.unmht)
Oh -- in the previous patch, note that PeekedCodePoint being CharT-parametrized is only to be able to assert proper length-in-units.  I assume compilers can boil away the multiple class instantiations in practice.
Attachment #8994550 - Flags: review?(arai.unmht)
I suppose I should note that the existing users are all known-non-ASCII, but {peek,consumeKnown}CodePoint handle ASCII as well as non-ASCII.  In principle I could exclude the ASCII case, maybe, but I'm not sure it buys anything, and I kinda feel like handling everything may come in handy if we use this peek-code-point functionality in any other places in the future.
Attachment #8994556 - Flags: review?(arai.unmht)
Turns out this algorithm comes in handy for SourceUnits::findWindowEnd, so it makes sense to define it such that SourceUnits::peekCodePoint *and* that function can use it.
Attachment #8994687 - Flags: review?(arai.unmht)
Attachment #8994549 - Attachment is obsolete: true
Attachment #8994549 - Flags: review?(arai.unmht)
Blocks: 1478170
Attachment #8994704 - Flags: review?(arai.unmht)
Attachment #8994687 - Attachment is obsolete: true
Attachment #8994687 - Flags: review?(arai.unmht)
Comment on attachment 8994704 [details] [diff] [review]
One last bit of tweaking to this, for realz

Review of attachment 8994704 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.h
@@ +986,5 @@
> +template<>
> +struct SourceUnitTraits<mozilla::Utf8Unit>
> +{
> +  public:
> +    static constexpr uint8_t maxUnitsLength = 2;

4

@@ +993,5 @@
> +        return codePoint < 0x80
> +               ? 1
> +               : codePoint < 0x800
> +               ? 2
> +               : codePoint < 0x10000

might be nice to add constants for those boundaries in Unicode.h or somewhere, maybe in followup bug/patch.

@@ +1016,5 @@
> +        MOZ_ASSERT(lengthInUnits != 0, "bad code point length");
> +        MOZ_ASSERT(lengthInUnits == SourceUnitTraits::lengthInUnits(codePoint));
> +    }
> +
> +    static PeekedCodePoint none() {

can you add some comment at the top of this class that clarifies this class represents either a single code point or "none", and what the "none" actually means?
(basically similar thing as the comment for `SourceUnits::peekCodePoint()`)
without that it's surprising to see this method here.
Attachment #8994704 - Flags: review?(arai.unmht) → review+
Attachment #8994550 - Flags: review?(arai.unmht) → review+
Attachment #8994551 - Flags: review?(arai.unmht) → review+
Attachment #8994552 - Flags: review?(arai.unmht) → review+
Attachment #8994553 - Flags: review?(arai.unmht) → review+
Attachment #8994556 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4a63f805c03
Implement SourceUnits::{peek,consumeKnown}CodePoint for the uncommon cases where a code point must be gotten, tested against one or more predicates, then sometimes ungotten based on those predicates.  (This is unfortunately a bit subtle, but getting and ungetting is arguably worse, because ungetting has to unget a variable number of code units -- whereas peeking can compute that number of code units and then use it directly when the peeked code point is consumed, avoiding double-computation and increased potential for error.)  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/c79e79e350bf
Use SourceUnits::peekCodePoint to look for an IdentifierStart after most decimal numeric literals, rather than getting and ungetting.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/51e2ba3c67d6
Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate a directive in a comment.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/466d4573b39a
Use SourceUnits::peekCodePoint when examining a non-ASCII code point that might terminate an IdentifierName.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1d2c7b998ef
Use SourceUnits::peekCodePoint when examining a non-ASCII code point after non-decimal numeric literals and a subset of decimal numeric literals.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d93c6f631ca7
Remove TokenStreamChars::{ungetCodePointIgnoreEOL,ungetNonAsciiNormalizedCodePoint} as unused now that SourceUnits::{peek,consumeKnown}CodePoint have replaced them.  r=arai
You need to log in before you can comment on or make changes to this bug.