Closed Bug 1459384 Opened 6 years ago Closed 6 years ago

Fix TokenStreamSpecific::getTokenInternal to be amenable to template-specializing narrow parts of it for 1/2-byte parsing

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(21 files)

10.55 KB, patch
arai
: review+
Details | Diff | Splinter Review
14.01 KB, patch
arai
: review+
Details | Diff | Splinter Review
21.43 KB, patch
arai
: review+
Details | Diff | Splinter Review
44.17 KB, patch
arai
: review+
Details | Diff | Splinter Review
16.40 KB, patch
Details | Diff | Splinter Review
1.72 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.92 KB, patch
arai
: review+
Details | Diff | Splinter Review
15.27 KB, patch
arai
: review+
Details | Diff | Splinter Review
2.28 KB, patch
arai
: review+
Details | Diff | Splinter Review
16.82 KB, patch
arai
: review+
Details | Diff | Splinter Review
8.41 KB, patch
arai
: review+
Details | Diff | Splinter Review
23.60 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.46 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.67 KB, patch
arai
: review+
Details | Diff | Splinter Review
4.26 KB, patch
arai
: review+
Details | Diff | Splinter Review
2.85 KB, patch
arai
: review+
Details | Diff | Splinter Review
14.86 KB, patch
arai
: review+
Details | Diff | Splinter Review
17.46 KB, patch
arai
: review+
Details | Diff | Splinter Review
5.54 KB, patch
arai
: review+
Details | Diff | Splinter Review
10.57 KB, patch
arai
: review+
Details | Diff | Splinter Review
39.37 KB, patch
arai
: review+
Details | Diff | Splinter Review
It is clearly unworkable to specialize getTokenInternal for different code unit sizes.  So we need to be able to have smaller parts of it specializable.

This basically requires that the function's labels, gotos, and other awful hacks be cleaned up.  This function can be normal C++.  Let's do that, then everything will be far simpler to specialize individually.  (This also has some optimization benefits -- no allocating a Token, then having to explicitly deallocate it when it turns out we don't actually need it.)

Long patch series to follow.  Fair warning: some changes I make, then they end up disappearing by the end of the sequence as the final product is IMO more readable.  Extra work, but hopefully it'll engender more confidence.
Arguably FinishToken could return true and then it could be |return FinishToken();| everywhere, but I don't have finishToken *at all* in the final picture, so for now it's two separate lines.
Attachment #8973404 - Flags: review?(arai.unmht)
Sorry 'bout this requiring touching the whole function body basically.  I'll post a -w version in a sec too.
Attachment #8973406 - Flags: review?(arai.unmht)
This is serious gnarl, but the labels and gotos here were even seriouser gnarl.  I hope the comment by decimalNumber is mildly helpful in understanding just how each path through the function is safe.  I suppose it's *very mildly* less efficient to parse .13893-style decimal numbers now, because you have an |IsAsciiDigit(c)| test and a |c == '.'| test to step through to get there, but this is kinda silly IMO.  Dot-leading decimal numbers are not horrifically common, and not jumping to the exactest starting place is not going to kill anyone.
Attachment #8973410 - Flags: review?(arai.unmht)
Ultimately the Token* will be allocated at the point at which we can fill in all parts of it, so pushing |tp| arguments everywhere is somewhat temporary.
Attachment #8973412 - Flags: review?(arai.unmht)
The start offset is the only real thing we need to get early on.  Allocating a Token early on is not necessary.  So separate out that computation, so that removing the Token computation later is simpler.
Attachment #8973413 - Flags: review?(arai.unmht)
The ultimate goal is for callout functions to just be |return decimalNumber(...)| or |return identifierName(...)| or similar.  This step is in pursuit of that.
Attachment #8973415 - Flags: review?(arai.unmht)
I'm hoping the compiler can optimize and recognize the embedded bool is always true *up til* the last step.  But honestly, if it isn't, it's probably micro-optimization territory to be worried.
Attachment #8973417 - Flags: review?(arai.unmht)
Prep for making finish-token be folded into token allocation.
Attachment #8973419 - Flags: review?(arai.unmht)
This is kind of the big kahuna.

Note that IsTokenSane's tests are folded into token-creation, so no need for the function any more.

It's debatable whether the new*Token functions should return true unconditionally or not.  I went with making the return-true explicit in source, but I could go either way.

It's really a joy to not have to deallocate already-allocated tokens in some of the various places in the prior code, after this change.

The simplification in this change is so good, like -40 lines or so.  *Overall*, this patch stack adds probably 150+ lines.  Some of that is taken up by more comments, tho.  I think the end state is a lot better, tho -- and some of this *has* to happen for single-byte parsing, e.g. consume-rest-of-single-line-comment must be CharT-aware because not all line terminators are single code units.

(I see the trailing-WS on tthat one line, will fix before landing.)

There is somewhat sort of one thing with all this that I haven't done: done stupidly low-level benchmarking.  If it comes to it, some of this we can surely fix by adding forced-inline versions of relevant functions, then calling the forced-inline functions in those places that need it.  But as a first pass at least, I am disinclined to spend a bunch of time investigating, versus landing this and seeing what complaints actually manifest.
Attachment #8973423 - Flags: review?(arai.unmht)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6b5c40c3d6696c625d664adbf51721f5cd7f451 is a nice greenish tree with all these patches applied, maybe with mild bits of fuzz for one or two post-push tweakings.
Attachment #8973402 - Flags: review?(arai.unmht) → review+
Attachment #8973404 - Flags: review?(arai.unmht) → review+
Attachment #8973405 - Flags: review?(arai.unmht) → review+
Attachment #8973406 - Flags: review?(arai.unmht) → review+
Attachment #8973408 - Flags: review?(arai.unmht) → review+
Attachment #8973409 - Flags: review?(arai.unmht) → review+
Comment on attachment 8973410 [details] [diff] [review]
Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal

Review of attachment 8973410 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the comments addressed.

::: js/src/frontend/TokenStream.cpp
@@ +1589,5 @@
> +            error(JSMSG_IDSTART_AFTER_NUMBER);
> +            return false;
> +        }
> +
> +        ungetCharIgnoreEOL(c);

if `codePoint != 0`, we should unget trail surrogate before `c`, which is lead surrogate.

@@ -1742,5 @@
> -                    return false;
> -                }
> -
> -                uint32_t codePoint;
> -                if (!matchMultiUnitCodePoint(c, &codePoint)) {

apparently this wasn't working properly.
consumeKnownCharIgnoreEOL which you've added in this patch fixes.

(I think matchMultiUnitCodePoint should be renamed to clarify it matches from current code unit, not next...  but this is for another bug)

::: js/src/frontend/TokenStream.h
@@ +1391,5 @@
> +     *      integer part so is interpreted as decimal, e.g. '9' in "09.28" or
> +     *      '8' in "0386" or '9' in "09+7" (three separate tokens").
> +     *
> +     * In this case, the next |getCharIgnoreEOL()| returns the code unit after
> +     * |c|: '.', '6', or '+' in the examples above.

2 more spaces indentation.
Attachment #8973410 - Flags: review?(arai.unmht) → review+
Attachment #8973411 - Flags: review?(arai.unmht) → review+
Attachment #8973412 - Flags: review?(arai.unmht) → review+
Comment on attachment 8973413 [details] [diff] [review]
Make |newToken()| only get a Token*, and have it accept a start offset, not compute one

Review of attachment 8973413 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with or without the comment addressed.

::: js/src/frontend/TokenStream.h
@@ +1031,5 @@
> +    uint32_t startOffset_;
> +
> +  public:
> +    template<class SourceUnits>
> +    TokenStart(const SourceUnits& sourceUnits, ptrdiff_t adjust)

might be nice to add comment that says what `adjust` is for?
(it's almost clear from the code just below tho...)

or maybe negate the value and rename it to "alreadyReadCodeUnitsCount" or something that clarifies the purpose?
Attachment #8973413 - Flags: review?(arai.unmht) → review+
Comment on attachment 8973414 [details] [diff] [review]
Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code

Review of attachment 8973414 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/TokenStream.h
@@ +1078,5 @@
>       * buffer of Tokens in |anyCharsAccess()|, that begins at |start.offset()|.
>       */
>      Token* newToken(TokenStart start);
>  
> +    MOZ_COLD bool badToken();

TIL MOZ_COLD :)
Attachment #8973414 - Flags: review?(arai.unmht) → review+
Attachment #8973415 - Flags: review?(arai.unmht) → review+
Attachment #8973416 - Flags: review?(arai.unmht) → review+
Attachment #8973417 - Flags: review?(arai.unmht) → review+
Attachment #8973418 - Flags: review?(arai.unmht) → review+
Attachment #8973419 - Flags: review?(arai.unmht) → review+
Attachment #8973420 - Flags: review?(arai.unmht) → review+
Attachment #8973421 - Flags: review?(arai.unmht) → review+
Attachment #8973422 - Flags: review?(arai.unmht) → review+
Attachment #8973423 - Flags: review?(arai.unmht) → review+
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3a487410d5
Eliminate the skipline label in TokenStreamSpecific::getTokenInternal in favor of a separate function.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/5edb44d0f9b8
Eliminate the |out| label from |TokenStreamSpecific::getTokenInternal|.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a72d92a62f6
Eliminate the |error| label from |TokenStreamSpecific::getTokenInternal|.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce75b53f42af
Eliminate the |retry| label in |TokenStreamSpecific::getTokenInternal| by converting the surrounding code to a loop and using |continue| instead.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b04f751a36e
Move some overscoped variables inside the retry-for-comment loop.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b8dd9e4d9b2
Rename FirstCharKind's BasePrefix initializer to ZeroDigit for better understandability in the place where behavior diverges when that value is found.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/66dd3e49d19e
Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/102af98a466e
Merge a few declarations into their first assignments, now that that's possible.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/6eb5216cdcf2
Pass |Token* tp| directly to |FinishToken| to eliminate another overshared local.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8c1ab8b7bd0
Make |newToken()| only get a Token*, and have it accept a start offset, not compute one.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1eb9f57d3bf8
Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a6282d1c64e
Make getStringOrTemplateToken handle calling badToken() itself if needed.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d16428bd882
Make identifierName handle calling badToken() itself if needed.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/034fcfc536de
Make decimalNumber handle calling badToken() itself if needed.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/db1d9e0147f7
Make getDirectives handle calling badToken() itself if needed.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/6393d31a33bc
Explicitly pass to FinishToken the two non-|this| captures it currently has.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/a959c06847ba
Make FinishToken into a member function, not a lambda.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/33ec7c8c85f4
Move the finishToken call into getStringOrTemplateToken.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7a60d080f6c
Make TokenStreamAnyChars::cursor private, and interact with it only through accessors and mutator functions.  r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/91303e1d488f
Allocate tokens only when the token's contents are fully determined, rather than early and sometimes over-eagerly.  r=arai
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: