Closed Bug 1459384 Opened 7 years ago Closed 7 years ago

Fix TokenStreamSpecific::getTokenInternal to be amenable to template-specializing narrow parts of it for 1/2-byte parsing

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla62

Tracking Flags:

Tracking

Status

firefox62

---

fixed

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(21 files)

Eliminate the skipline label in TokenStreamSpecific::getTokenInternal in favor of a separate function 7 years ago Jeff Walden [:Waldo] 10.55 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Eliminate the \|out\| label from \|TokenStreamSpecific::getTokenInternal\| 7 years ago Jeff Walden [:Waldo] 14.01 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Eliminate the \|error\| label from \|TokenStreamSpecific::getTokenInternal\| 7 years ago Jeff Walden [:Waldo] 21.43 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Eliminate the \|retry\| label in \|TokenStreamSpecific::getTokenInternal\| by converting the surrounding code to a loop and using \|continue\| instead 7 years ago Jeff Walden [:Waldo] 44.17 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Eliminate the \|retry\| label, no-whitespace-changes version 7 years ago Jeff Walden [:Waldo] 16.40 KB, patch		Details \| Diff \| Splinter Review
Move some overscoped variables inside the retry-for-comment loop 7 years ago Jeff Walden [:Waldo] 1.72 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Rename FirstCharKind's BasePrefix initializer to ZeroDigit for better understandability in the place where behavior diverges when that value is found 7 years ago Jeff Walden [:Waldo] 4.92 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal 7 years ago Jeff Walden [:Waldo] 15.27 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Merge a few declarations into their first assignments, now that that's possible 7 years ago Jeff Walden [:Waldo] 2.28 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Pass \|Token* tp\| directly to \|FinishToken\| to eliminate another overshared local 7 years ago Jeff Walden [:Waldo] 16.82 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make \|newToken()\| only get a Token*, and have it accept a start offset, not compute one 7 years ago Jeff Walden [:Waldo] 8.41 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code 7 years ago Jeff Walden [:Waldo] 23.60 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make getStringOrTemplateToken handle calling badToken() itself if needed 7 years ago Jeff Walden [:Waldo] 4.46 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make identifierName handle calling badToken() itself if needed 7 years ago Jeff Walden [:Waldo] 4.67 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make decimalNumber handle calling badToken() itself if needed 7 years ago Jeff Walden [:Waldo] 4.26 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make getDirectives handle calling badToken() itself if needed 7 years ago Jeff Walden [:Waldo] 2.85 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Explicitly pass to FinishToken the two non-\|this\| captures it currently has 7 years ago Jeff Walden [:Waldo] 14.86 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make FinishToken into a member function, not a lambda 7 years ago Jeff Walden [:Waldo] 17.46 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Move the finishToken call into getStringOrTemplateToken 7 years ago Jeff Walden [:Waldo] 5.54 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Make TokenStreamAnyChars::cursor private, and interact with it only through accessors and mutator functions 7 years ago Jeff Walden [:Waldo] 10.57 KB, patch	arai : review+	Details \| Diff \| Splinter Review
Allocate tokens only when the token's contents are fully determined, rather than early and sometimes over-eagerly 7 years ago Jeff Walden [:Waldo] 39.37 KB, patch	arai : review+	Details \| Diff \| Splinter Review

Jeff Walden [:Waldo]

Assignee

Description

•

7 years ago

It is clearly unworkable to specialize getTokenInternal for different code unit sizes. So we need to be able to have smaller parts of it specializable. This basically requires that the function's labels, gotos, and other awful hacks be cleaned up. This function can be normal C++. Let's do that, then everything will be far simpler to specialize individually. (This also has some optimization benefits -- no allocating a Token, then having to explicitly deallocate it when it turns out we don't actually need it.) Long patch series to follow. Fair warning: some changes I make, then they end up disappearing by the end of the sequence as the final product is IMO more readable. Extra work, but hopefully it'll engender more confidence.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

7 years ago

Attached patch Eliminate the skipline label in TokenStreamSpecific::getTokenInternal in favor of a separate function — Details — Splinter Review

Attachment #8973402 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 2

•

7 years ago

Attached patch Eliminate the |out| label from |TokenStreamSpecific::getTokenInternal| — Details — Splinter Review

Arguably FinishToken could return true and then it could be |return FinishToken();| everywhere, but I don't have finishToken *at all* in the final picture, so for now it's two separate lines.

Attachment #8973404 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 3

•

7 years ago

Attached patch Eliminate the |error| label from |TokenStreamSpecific::getTokenInternal| — Details — Splinter Review

Attachment #8973405 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 4

•

7 years ago

Sorry 'bout this requiring touching the whole function body basically. I'll post a -w version in a sec too.

Attachment #8973406 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 5

•

7 years ago

Attached patch Eliminate the |retry| label, no-whitespace-changes version — Details — Splinter Review

Jeff Walden [:Waldo]

Assignee

Comment 6

•

7 years ago

Attached patch Move some overscoped variables inside the retry-for-comment loop — Details — Splinter Review

Attachment #8973408 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 7

•

7 years ago

Attached patch Rename FirstCharKind's BasePrefix initializer to ZeroDigit for better understandability in the place where behavior diverges when that value is found — Details — Splinter Review

Attachment #8973409 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 8

•

7 years ago

Attached patch Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal — Details — Splinter Review

This is serious gnarl, but the labels and gotos here were even seriouser gnarl. I hope the comment by decimalNumber is mildly helpful in understanding just how each path through the function is safe. I suppose it's *very mildly* less efficient to parse .13893-style decimal numbers now, because you have an |IsAsciiDigit(c)| test and a |c == '.'| test to step through to get there, but this is kinda silly IMO. Dot-leading decimal numbers are not horrifically common, and not jumping to the exactest starting place is not going to kill anyone.

Attachment #8973410 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 9

•

7 years ago

Attached patch Merge a few declarations into their first assignments, now that that's possible — Details — Splinter Review

Attachment #8973411 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 10

•

7 years ago

Attached patch Pass |Token* tp| directly to |FinishToken| to eliminate another overshared local — Details — Splinter Review

Ultimately the Token* will be allocated at the point at which we can fill in all parts of it, so pushing |tp| arguments everywhere is somewhat temporary.

Attachment #8973412 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 11

•

7 years ago

Attached patch Make |newToken()| only get a Token*, and have it accept a start offset, not compute one — Details — Splinter Review

The start offset is the only real thing we need to get early on. Allocating a Token early on is not necessary. So separate out that computation, so that removing the Token computation later is simpler.

Attachment #8973413 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 12

•

7 years ago

Attached patch Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code — Details — Splinter Review

Attachment #8973414 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 13

•

7 years ago

Attached patch Make getStringOrTemplateToken handle calling badToken() itself if needed — Details — Splinter Review

The ultimate goal is for callout functions to just be |return decimalNumber(...)| or |return identifierName(...)| or similar. This step is in pursuit of that.

Attachment #8973415 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 14

•

7 years ago

Attached patch Make identifierName handle calling badToken() itself if needed — Details — Splinter Review

Attachment #8973416 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 15

•

7 years ago

Attached patch Make decimalNumber handle calling badToken() itself if needed — Details — Splinter Review

I'm hoping the compiler can optimize and recognize the embedded bool is always true *up til* the last step. But honestly, if it isn't, it's probably micro-optimization territory to be worried.

Attachment #8973417 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 16

•

7 years ago

Attached patch Make getDirectives handle calling badToken() itself if needed — Details — Splinter Review

Attachment #8973418 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 17

•

7 years ago

Attached patch Explicitly pass to FinishToken the two non-|this| captures it currently has — Details — Splinter Review

Prep for making finish-token be folded into token allocation.

Attachment #8973419 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 18

•

7 years ago

Attached patch Make FinishToken into a member function, not a lambda — Details — Splinter Review

Attachment #8973420 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 19

•

7 years ago

Attached patch Move the finishToken call into getStringOrTemplateToken — Details — Splinter Review

Attachment #8973421 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 20

•

7 years ago

Attached patch Make TokenStreamAnyChars::cursor private, and interact with it only through accessors and mutator functions — Details — Splinter Review

Attachment #8973422 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 21

•

7 years ago

Attached patch Allocate tokens only when the token's contents are fully determined, rather than early and sometimes over-eagerly — Details — Splinter Review

This is kind of the big kahuna. Note that IsTokenSane's tests are folded into token-creation, so no need for the function any more. It's debatable whether the new*Token functions should return true unconditionally or not. I went with making the return-true explicit in source, but I could go either way. It's really a joy to not have to deallocate already-allocated tokens in some of the various places in the prior code, after this change. The simplification in this change is so good, like -40 lines or so. *Overall*, this patch stack adds probably 150+ lines. Some of that is taken up by more comments, tho. I think the end state is a lot better, tho -- and some of this *has* to happen for single-byte parsing, e.g. consume-rest-of-single-line-comment must be CharT-aware because not all line terminators are single code units. (I see the trailing-WS on tthat one line, will fix before landing.) There is somewhat sort of one thing with all this that I haven't done: done stupidly low-level benchmarking. If it comes to it, some of this we can surely fix by adding forced-inline versions of relevant functions, then calling the forced-inline functions in those places that need it. But as a first pass at least, I am disinclined to spend a bunch of time investigating, versus landing this and seeing what complaints actually manifest.

Attachment #8973423 - Flags: review?(arai.unmht)

Jeff Walden [:Waldo]

Assignee

Comment 22

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6b5c40c3d6696c625d664adbf51721f5cd7f451 is a nice greenish tree with all these patches applied, maybe with mild bits of fuzz for one or two post-push tweakings.

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973402 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973404 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973405 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973406 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973408 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973409 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 23

•

7 years ago

Comment on attachment 8973410 [details] [diff] [review] Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal Review of attachment 8973410 [details] [diff] [review]: ----------------------------------------------------------------- r+ with the comments addressed. ::: js/src/frontend/TokenStream.cpp @@ +1589,5 @@ > + error(JSMSG_IDSTART_AFTER_NUMBER); > + return false; > + } > + > + ungetCharIgnoreEOL(c); if `codePoint != 0`, we should unget trail surrogate before `c`, which is lead surrogate. @@ -1742,5 @@ > - return false; > - } > - > - uint32_t codePoint; > - if (!matchMultiUnitCodePoint(c, &codePoint)) { apparently this wasn't working properly. consumeKnownCharIgnoreEOL which you've added in this patch fixes. (I think matchMultiUnitCodePoint should be renamed to clarify it matches from current code unit, not next... but this is for another bug) ::: js/src/frontend/TokenStream.h @@ +1391,5 @@ > + * integer part so is interpreted as decimal, e.g. '9' in "09.28" or > + * '8' in "0386" or '9' in "09+7" (three separate tokens"). > + * > + * In this case, the next |getCharIgnoreEOL()| returns the code unit after > + * |c|: '.', '6', or '+' in the examples above. 2 more spaces indentation.

Attachment #8973410 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973411 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973412 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 24

•

7 years ago

Comment on attachment 8973413 [details] [diff] [review] Make |newToken()| only get a Token*, and have it accept a start offset, not compute one Review of attachment 8973413 [details] [diff] [review]: ----------------------------------------------------------------- r+ with or without the comment addressed. ::: js/src/frontend/TokenStream.h @@ +1031,5 @@ > + uint32_t startOffset_; > + > + public: > + template<class SourceUnits> > + TokenStart(const SourceUnits& sourceUnits, ptrdiff_t adjust) might be nice to add comment that says what `adjust` is for? (it's almost clear from the code just below tho...) or maybe negate the value and rename it to "alreadyReadCodeUnitsCount" or something that clarifies the purpose?

Attachment #8973413 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Comment 25

•

7 years ago

Comment on attachment 8973414 [details] [diff] [review] Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code Review of attachment 8973414 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/TokenStream.h @@ +1078,5 @@ > * buffer of Tokens in |anyCharsAccess()|, that begins at |start.offset()|. > */ > Token* newToken(TokenStart start); > > + MOZ_COLD bool badToken(); TIL MOZ_COLD :)

Attachment #8973414 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973415 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973416 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973417 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973418 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973419 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973420 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973421 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973422 - Flags: review?(arai.unmht) → review+

Tooru Fujisawa [:arai]

Updated

•

7 years ago

Attachment #8973423 - Flags: review?(arai.unmht) → review+

Pulsebot

Comment 26

•

7 years ago

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/mozilla-inbound/rev/8f3a487410d5 Eliminate the skipline label in TokenStreamSpecific::getTokenInternal in favor of a separate function. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/5edb44d0f9b8 Eliminate the |out| label from |TokenStreamSpecific::getTokenInternal|. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/3a72d92a62f6 Eliminate the |error| label from |TokenStreamSpecific::getTokenInternal|. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/ce75b53f42af Eliminate the |retry| label in |TokenStreamSpecific::getTokenInternal| by converting the surrounding code to a loop and using |continue| instead. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/4b04f751a36e Move some overscoped variables inside the retry-for-comment loop. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/5b8dd9e4d9b2 Rename FirstCharKind's BasePrefix initializer to ZeroDigit for better understandability in the place where behavior diverges when that value is found. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/66dd3e49d19e Move decimal number tokenizing into a separate function and eliminate the final two labels from TokenStreamSpecific::getTokenInternal. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/102af98a466e Merge a few declarations into their first assignments, now that that's possible. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/6eb5216cdcf2 Pass |Token* tp| directly to |FinishToken| to eliminate another overshared local. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/d8c1ab8b7bd0 Make |newToken()| only get a Token*, and have it accept a start offset, not compute one. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/1eb9f57d3bf8 Eliminate the BadToken lambda in favor of pessimistically marking the outparam undefined to start, then using a MOZ_COLD badToken() function to handle failure code. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/1a6282d1c64e Make getStringOrTemplateToken handle calling badToken() itself if needed. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/1d16428bd882 Make identifierName handle calling badToken() itself if needed. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/034fcfc536de Make decimalNumber handle calling badToken() itself if needed. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/db1d9e0147f7 Make getDirectives handle calling badToken() itself if needed. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/6393d31a33bc Explicitly pass to FinishToken the two non-|this| captures it currently has. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/a959c06847ba Make FinishToken into a member function, not a lambda. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/33ec7c8c85f4 Move the finishToken call into getStringOrTemplateToken. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/d7a60d080f6c Make TokenStreamAnyChars::cursor private, and interact with it only through accessors and mutator functions. r=arai https://hg.mozilla.org/integration/mozilla-inbound/rev/91303e1d488f Allocate tokens only when the token's contents are fully determined, rather than early and sometimes over-eagerly. r=arai

Natalia Csoregi [:nataliaCs]

Comment 27

•

7 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

You need to log in before you can comment on or make changes to this bug.