Closed
Bug 1459384
Opened 7 years ago
Closed 7 years ago
Fix TokenStreamSpecific::getTokenInternal to be amenable to template-specializing narrow parts of it for 1/2-byte parsing
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox62 | --- | fixed |
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(21 files)
It is clearly unworkable to specialize getTokenInternal for different code unit sizes. So we need to be able to have smaller parts of it specializable.
This basically requires that the function's labels, gotos, and other awful hacks be cleaned up. This function can be normal C++. Let's do that, then everything will be far simpler to specialize individually. (This also has some optimization benefits -- no allocating a Token, then having to explicitly deallocate it when it turns out we don't actually need it.)
Long patch series to follow. Fair warning: some changes I make, then they end up disappearing by the end of the sequence as the final product is IMO more readable. Extra work, but hopefully it'll engender more confidence.
Assignee | ||
Comment 1•7 years ago
|
||
Attachment #8973402 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 2•7 years ago
|
||
Arguably FinishToken could return true and then it could be |return FinishToken();| everywhere, but I don't have finishToken *at all* in the final picture, so for now it's two separate lines.
Attachment #8973404 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 3•7 years ago
|
||
Attachment #8973405 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 4•7 years ago
|
||
Sorry 'bout this requiring touching the whole function body basically. I'll post a -w version in a sec too.
Attachment #8973406 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 5•7 years ago
|
||
Assignee | ||
Comment 6•7 years ago
|
||
Attachment #8973408 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 7•7 years ago
|
||
Attachment #8973409 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 8•7 years ago
|
||
This is serious gnarl, but the labels and gotos here were even seriouser gnarl. I hope the comment by decimalNumber is mildly helpful in understanding just how each path through the function is safe. I suppose it's *very mildly* less efficient to parse .13893-style decimal numbers now, because you have an |IsAsciiDigit(c)| test and a |c == '.'| test to step through to get there, but this is kinda silly IMO. Dot-leading decimal numbers are not horrifically common, and not jumping to the exactest starting place is not going to kill anyone.
Attachment #8973410 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 9•7 years ago
|
||
Attachment #8973411 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 10•7 years ago
|
||
Ultimately the Token* will be allocated at the point at which we can fill in all parts of it, so pushing |tp| arguments everywhere is somewhat temporary.
Attachment #8973412 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 11•7 years ago
|
||
The start offset is the only real thing we need to get early on. Allocating a Token early on is not necessary. So separate out that computation, so that removing the Token computation later is simpler.
Attachment #8973413 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 12•7 years ago
|
||
Attachment #8973414 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 13•7 years ago
|
||
The ultimate goal is for callout functions to just be |return decimalNumber(...)| or |return identifierName(...)| or similar. This step is in pursuit of that.
Attachment #8973415 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 14•7 years ago
|
||
Attachment #8973416 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 15•7 years ago
|
||
I'm hoping the compiler can optimize and recognize the embedded bool is always true *up til* the last step. But honestly, if it isn't, it's probably micro-optimization territory to be worried.
Attachment #8973417 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 16•7 years ago
|
||
Attachment #8973418 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 17•7 years ago
|
||
Prep for making finish-token be folded into token allocation.
Attachment #8973419 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 18•7 years ago
|
||
Attachment #8973420 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 19•7 years ago
|
||
Attachment #8973421 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 20•7 years ago
|
||
Attachment #8973422 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 21•7 years ago
|
||
This is kind of the big kahuna.
Note that IsTokenSane's tests are folded into token-creation, so no need for the function any more.
It's debatable whether the new*Token functions should return true unconditionally or not. I went with making the return-true explicit in source, but I could go either way.
It's really a joy to not have to deallocate already-allocated tokens in some of the various places in the prior code, after this change.
The simplification in this change is so good, like -40 lines or so. *Overall*, this patch stack adds probably 150+ lines. Some of that is taken up by more comments, tho. I think the end state is a lot better, tho -- and some of this *has* to happen for single-byte parsing, e.g. consume-rest-of-single-line-comment must be CharT-aware because not all line terminators are single code units.
(I see the trailing-WS on tthat one line, will fix before landing.)
There is somewhat sort of one thing with all this that I haven't done: done stupidly low-level benchmarking. If it comes to it, some of this we can surely fix by adding forced-inline versions of relevant functions, then calling the forced-inline functions in those places that need it. But as a first pass at least, I am disinclined to spend a bunch of time investigating, versus landing this and seeing what complaints actually manifest.
Attachment #8973423 -
Flags: review?(arai.unmht)
Assignee | ||
Comment 22•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6b5c40c3d6696c625d664adbf51721f5cd7f451 is a nice greenish tree with all these patches applied, maybe with mild bits of fuzz for one or two post-push tweakings.
Updated•7 years ago
|
Attachment #8973402 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973404 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973405 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973406 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973408 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973409 -
Flags: review?(arai.unmht) → review+
Comment 23•7 years ago
|
||
Comment on attachment 8973410 [details] [diff] [review]
Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal
Review of attachment 8973410 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with the comments addressed.
::: js/src/frontend/TokenStream.cpp
@@ +1589,5 @@
> + error(JSMSG_IDSTART_AFTER_NUMBER);
> + return false;
> + }
> +
> + ungetCharIgnoreEOL(c);
if `codePoint != 0`, we should unget trail surrogate before `c`, which is lead surrogate.
@@ -1742,5 @@
> - return false;
> - }
> -
> - uint32_t codePoint;
> - if (!matchMultiUnitCodePoint(c, &codePoint)) {
apparently this wasn't working properly.
consumeKnownCharIgnoreEOL which you've added in this patch fixes.
(I think matchMultiUnitCodePoint should be renamed to clarify it matches from current code unit, not next... but this is for another bug)
::: js/src/frontend/TokenStream.h
@@ +1391,5 @@
> + * integer part so is interpreted as decimal, e.g. '9' in "09.28" or
> + * '8' in "0386" or '9' in "09+7" (three separate tokens").
> + *
> + * In this case, the next |getCharIgnoreEOL()| returns the code unit after
> + * |c|: '.', '6', or '+' in the examples above.
2 more spaces indentation.
Attachment #8973410 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973411 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973412 -
Flags: review?(arai.unmht) → review+
Comment 24•7 years ago
|
||
Comment on attachment 8973413 [details] [diff] [review]
Make |newToken()| only get a Token*, and have it accept a start offset, not compute one
Review of attachment 8973413 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with or without the comment addressed.
::: js/src/frontend/TokenStream.h
@@ +1031,5 @@
> + uint32_t startOffset_;
> +
> + public:
> + template<class SourceUnits>
> + TokenStart(const SourceUnits& sourceUnits, ptrdiff_t adjust)
might be nice to add comment that says what `adjust` is for?
(it's almost clear from the code just below tho...)
or maybe negate the value and rename it to "alreadyReadCodeUnitsCount" or something that clarifies the purpose?
Attachment #8973413 -
Flags: review?(arai.unmht) → review+
Comment 25•7 years ago
|
||
Comment on attachment 8973414 [details] [diff] [review]
Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code
Review of attachment 8973414 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/frontend/TokenStream.h
@@ +1078,5 @@
> * buffer of Tokens in |anyCharsAccess()|, that begins at |start.offset()|.
> */
> Token* newToken(TokenStart start);
>
> + MOZ_COLD bool badToken();
TIL MOZ_COLD :)
Attachment #8973414 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973415 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973416 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973417 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973418 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973419 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973420 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973421 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973422 -
Flags: review?(arai.unmht) → review+
Updated•7 years ago
|
Attachment #8973423 -
Flags: review?(arai.unmht) → review+
Comment 26•7 years ago
|
||
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3a487410d5
Eliminate the skipline label in TokenStreamSpecific::getTokenInternal in favor of a separate function. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/5edb44d0f9b8
Eliminate the |out| label from |TokenStreamSpecific::getTokenInternal|. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a72d92a62f6
Eliminate the |error| label from |TokenStreamSpecific::getTokenInternal|. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce75b53f42af
Eliminate the |retry| label in |TokenStreamSpecific::getTokenInternal| by converting the surrounding code to a loop and using |continue| instead. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b04f751a36e
Move some overscoped variables inside the retry-for-comment loop. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b8dd9e4d9b2
Rename FirstCharKind's BasePrefix initializer to ZeroDigit for better understandability in the place where behavior diverges when that value is found. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/66dd3e49d19e
Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/102af98a466e
Merge a few declarations into their first assignments, now that that's possible. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/6eb5216cdcf2
Pass |Token* tp| directly to |FinishToken| to eliminate another overshared local. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8c1ab8b7bd0
Make |newToken()| only get a Token*, and have it accept a start offset, not compute one. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1eb9f57d3bf8
Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a6282d1c64e
Make getStringOrTemplateToken handle calling badToken() itself if needed. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d16428bd882
Make identifierName handle calling badToken() itself if needed. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/034fcfc536de
Make decimalNumber handle calling badToken() itself if needed. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/db1d9e0147f7
Make getDirectives handle calling badToken() itself if needed. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/6393d31a33bc
Explicitly pass to FinishToken the two non-|this| captures it currently has. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/a959c06847ba
Make FinishToken into a member function, not a lambda. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/33ec7c8c85f4
Move the finishToken call into getStringOrTemplateToken. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7a60d080f6c
Make TokenStreamAnyChars::cursor private, and interact with it only through accessors and mutator functions. r=arai
https://hg.mozilla.org/integration/mozilla-inbound/rev/91303e1d488f
Allocate tokens only when the token's contents are fully determined, rather than early and sometimes over-eagerly. r=arai
Comment 27•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8f3a487410d5
https://hg.mozilla.org/mozilla-central/rev/5edb44d0f9b8
https://hg.mozilla.org/mozilla-central/rev/3a72d92a62f6
https://hg.mozilla.org/mozilla-central/rev/ce75b53f42af
https://hg.mozilla.org/mozilla-central/rev/4b04f751a36e
https://hg.mozilla.org/mozilla-central/rev/5b8dd9e4d9b2
https://hg.mozilla.org/mozilla-central/rev/66dd3e49d19e
https://hg.mozilla.org/mozilla-central/rev/102af98a466e
https://hg.mozilla.org/mozilla-central/rev/6eb5216cdcf2
https://hg.mozilla.org/mozilla-central/rev/d8c1ab8b7bd0
https://hg.mozilla.org/mozilla-central/rev/1eb9f57d3bf8
https://hg.mozilla.org/mozilla-central/rev/1a6282d1c64e
https://hg.mozilla.org/mozilla-central/rev/1d16428bd882
https://hg.mozilla.org/mozilla-central/rev/034fcfc536de
https://hg.mozilla.org/mozilla-central/rev/db1d9e0147f7
https://hg.mozilla.org/mozilla-central/rev/6393d31a33bc
https://hg.mozilla.org/mozilla-central/rev/a959c06847ba
https://hg.mozilla.org/mozilla-central/rev/33ec7c8c85f4
https://hg.mozilla.org/mozilla-central/rev/d7a60d080f6c
https://hg.mozilla.org/mozilla-central/rev/91303e1d488f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox62:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in
before you can comment on or make changes to this bug.
Description
•