Closed
Bug 747831
Opened 12 years ago
Closed 11 years ago
Record buffer index in ParseNodes
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla22
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Whiteboard: [MemShrink:P2])
Attachments
(2 files, 2 obsolete files)
2.86 KB,
patch
|
Details | Diff | Splinter Review | |
62.94 KB,
patch
|
jorendorff
:
review+
|
Details | Diff | Splinter Review |
In order to implement decompilation and lazy bytecode compilation, we need to record the source code for functions. I.e. given a ParseNode for a function, we need to be able to copy the source code chars from the original buffer. That doesn't sound difficult. However, currently the beginning and end of each ParseNode is recorded with a {lineno, index} pair, where |index| is the char index in the line specified by |lineno|. This is fine for error message but no good for copying chars from the original buffer. There's a partial patch in bug 626932 that started towards replacing |lineno| and |index| with a single offset into the original buffer, which will make this easy. (It will also shrink ParseNode if I play my cards right, which is a good thing because we can have 10s of MBs of them, albeit short-lived.) I will try to resurrect this patch.
Assignee | ||
Comment 1•12 years ago
|
||
This patch: - Changes TokenPtr to just be a uint32_t, which is an index into the source code buffer. - Added TokenStream::SourceCoords, a class for mapping between buffer indices and line/column numbers. Consequences: - Having to look up a table to get a line/column number is obviously more complex than just having it attached to the Token/ParseNode. But there is a complexity win, which is that we don't have the possibility of inconsistent lineno/index values in a TokenPtr; the existing code sets these in different places (e.g. for multi-line tokens). This simplifies things in a couple of places. - I carefully optimized the lookups. Cachegrind tells me the increase in instructions executed on SunSpider/V8/Kraken is negligible (i.e. less than 1 part in 1000). When parsing and byte-code compiling parsemark the increase is about 1%, which is higher than I'd like. Things I'm uncertain about: - I haven't handled //@line comments. For SourceCoords lookups to be fast it relies heavily on line numbers increasing monotonically, and //@line comments can break that. (There's an "njn" comment in the relevant part of the patch about this.) - I haven't done anything w.r.t. source maps. They're not implemented fully enough for me to tell how my changes interact with them. - I used 1-indexing for the column numbers, because that seemed obviously right, and it's what vim uses, GCC uses, etc. But yesterday I learnt that Reflect.parse and emacs both use 0-indexed column numbers (but 1-indexed line numbers, of course). Sigh.
Attachment #618894 -
Flags: feedback?(jimb)
Assignee | ||
Comment 2•12 years ago
|
||
This version inlines the crucial isOnThisLine() function. Perf on parsemark -- a much tougher test than SS/V8/Kraken because there's no code execution time to dilute costs added during parsing/bytecode-compilation -- is now unchanged on average: --------------------------------------------------------------- | millions of instructions executed | | total | compiled (may overestimate) | --------------------------------------------------------------- | 16.534 16.614 (0.995x) | 0 0 (------) | 280slides-comb | 65.168 65.131 (1.001x) | 0 0 (------) | ball-pool-comb | 33.581 33.457 (1.004x) | 0 0 (------) | dojo | 60.508 60.720 (0.997x) | 0 0 (------) | effectgames-ga | 15.534 15.495 (1.003x) | 0 0 (------) | ga | 354.401 353.013 (1.004x) | 0 0 (------) | gmail-combined | 100.657 100.465 (1.002x) | 0 0 (------) | gravity-combin | 22.697 22.634 (1.003x) | 0 0 (------) | jquery-min | 43.582 43.830 (0.994x) | 0 0 (------) | jsgb-combined | 62.129 62.353 (0.996x) | 0 0 (------) | mochikit | 310.067 308.866 (1.004x) | 0 0 (------) | pipio-combined | 12.622 12.778 (0.988x) | 0 0 (------) | processing-twi | 54.844 55.115 (0.995x) | 0 0 (------) | sunspider-comb | 19.806 19.835 (0.999x) | 0 0 (------) | tetris-combine | 76.899 76.980 (0.999x) | 0 0 (------) | twitter-combin | 81.912 82.916 (0.988x) | 0 0 (------) | v8-combined | 9.928 9.904 (1.002x) | 0 0 (------) | yui-min | 466.954 467.718 (0.998x) | 0 0 (------) | zimbra-combine ------- | 1807.832 1807.831 (------) | 0 0 (------) | all So that neutralizes the performance concerns, which is good. The //@line comments are still a problem.
Attachment #618894 -
Attachment is obsolete: true
Attachment #618894 -
Flags: feedback?(jimb)
Attachment #618926 -
Flags: feedback?(jimb)
Assignee | ||
Comment 3•12 years ago
|
||
jimb: 19-day feedback ping. This is blocking lazy bytecode generation and decompiler removal and so I'd like to keep it moving. Thanks.
Assignee | ||
Comment 4•12 years ago
|
||
Looks like I missed this feedback request's 1 month birthday, which was two days ago. Sorry kiddo, I'm a bad father.
Comment 6•12 years ago
|
||
Wouldn't it be easiest to just record the original buffer index at Parser::functionDef, stick it in funbox, and be done with it?
Assignee | ||
Updated•12 years ago
|
Blocks: savesource
Assignee | ||
Comment 7•12 years ago
|
||
(In reply to Benjamin Peterson from comment #6) > Wouldn't it be easiest to just record the original buffer index at > Parser::functionDef, stick it in funbox, and be done with it? Hmm, good question. Quite possibly.
Comment 8•12 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #7) > (In reply to Benjamin Peterson from comment #6) > > Wouldn't it be easiest to just record the original buffer index at > > Parser::functionDef, stick it in funbox, and be done with it? > > Hmm, good question. Quite possibly. FWIW, I'm taking this approach for bug 791723.
Comment 9•12 years ago
|
||
Uh, I meant 761723.
Assignee | ||
Updated•12 years ago
|
Attachment #618926 -
Flags: feedback?(jimb)
Assignee | ||
Comment 10•12 years ago
|
||
> > Wouldn't it be easiest to just record the original buffer index at
> > Parser::functionDef, stick it in funbox, and be done with it?
>
> Hmm, good question. Quite possibly.
Yeah, that's much simpler and avoids all the difficulty with //@line comments and source maps.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Assignee | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 11•11 years ago
|
||
As I said in bug 626932 comment 1: I love this idea.
Assignee | ||
Comment 12•11 years ago
|
||
Not yet ready for review, but here's a version that passes jit-tests and jstests for Luke to play with.
Assignee | ||
Comment 13•11 years ago
|
||
Luke, are you happy to review this? If not, I'll ask jorendorff. This patch replaces Token's begin/end line number and column index with a begin/end buffer index, and adds a SourceCoords table on the side that allows a line/column lookup from the buffer index. On Parsemark it increase the number of instructions executed by 1.002x, but on a big asm.js file it reduces it by 1.005x. But I have some good parsing speed-ups forthcoming in bug 849367 anyway. Note that I fixed an incorrect |end| in the TOK_CONTINUE case in Parser.cpp. This was filed as bug 748201, but it was easy to do it here. One thing I found was that |tp->pos.end| was sometimes being read (in reportCompileErrorNumberVA) before being set in getTokenInternal(). It was just for an assertion, but I fixed it nonetheless, and added a Valgrind annotation to catch similar problems in the future.
Attachment #723735 -
Flags: review?(luke)
Assignee | ||
Updated•11 years ago
|
Attachment #723056 -
Attachment is obsolete: true
Comment 14•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #13) > Luke, are you happy to review this? If not, I'll ask jorendorff. I'm pretty swamped in the short-term, so better to not.
Assignee | ||
Updated•11 years ago
|
Attachment #723735 -
Flags: review?(luke) → review?(jorendorff)
Comment 16•11 years ago
|
||
Comment on attachment 723735 [details] [diff] [review] Record buffer index in ParseNodes instead of line number and column index. Review of attachment 723735 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/frontend/BytecodeEmitter.cpp @@ +88,5 @@ > > BytecodeEmitter::BytecodeEmitter(BytecodeEmitter *parent, > Parser<FullParseHandler> *parser, SharedContext *sc, > HandleScript script, HandleScript evalCaller, bool hasGlobalScope, > + unsigned lineNum, bool selfHostingMode) uint32_t here too? ::: js/src/frontend/FullParseHandler.h @@ +142,3 @@ > return new_<ContinueStatement>(label, begin, end); > } > ParseNode *newDebuggerStatement(const TokenPos &pos) { Hmm. You changed a bunch of places that take TokenPos arguments from const references to values, but not others (including this one). What was the rule? ::: js/src/frontend/TokenStream.cpp @@ +115,5 @@ > + // The first line begins at index 0. MAX_PTR is the sentinel. > + // The appends cannot fail because |lineStartPtrs_| has > + // statically-allocated elements. > + JS_ASSERT(lineStartPtrs_.capacity() >= 2); > + (void)lineStartPtrs_.reserve(2); Is this reserve() call necessary? @@ +152,5 @@ > + > + if (lineStartPtrs_[lastLineIndex_] <= tptr) { > + // If we reach here, tptr is on a line the same as or higher than last > + // time. Check first for the +0, +1, +2 cases, because they typically > + // cover 85--98% of cases. Wow. This really improves performance? Huh. @@ -410,4 @@ > > // Poison other members. > cur->type = TOK_ERROR; > - cur->ptr = NULL; Micro-nit: Plural "members" in the comment isn't true anymore. @@ +622,5 @@ > /* > * Given a token, T, that we want to complain about: if T's (starting) > * lineno doesn't match TokenStream's lineno, that means we've scanned past > * the line that T starts on, which makes it hard to print some or all of > * T's (starting) line for context. Um. Is it still hard? ::: js/src/frontend/TokenStream.h @@ +176,5 @@ > return tt == TOK_VAR; > #endif > } > > +typedef uint32_t TokenPtr; Can we just replace this with uint32_t everywhere? That'll be less confusing to people reading the code later. (Especially anyone who doesn't see this patch go in and would assume that TokenPtr is still a struct.) The things called "tptr" and such could just be called "offset". @@ +622,5 @@ > */ > bool checkForKeyword(const jschar *s, size_t length, TokenKind *ttp, JSOp *topp); > > + // This class maps a userbuf index (which is 0-indexed) to a line number > + // (which is 1-indexed) and a column index (which is 0-indexed). Great comment.
Attachment #723735 -
Flags: review?(jorendorff) → review+
Assignee | ||
Comment 17•11 years ago
|
||
> Hmm. You changed a bunch of places that take TokenPos arguments from const > references to values, but not others (including this one). What was the rule? Hmm... I didn't mean to change those. I must have mixed them up with TokenPtr arguments. I'll revert the changes. > Is this reserve() call necessary? Yep. If you call infallibleAppend() without a prior reserve() you get assertion failures. > > + if (lineStartPtrs_[lastLineIndex_] <= tptr) { > > + // If we reach here, tptr is on a line the same as or higher than last > > + // time. Check first for the +0, +1, +2 cases, because they typically > > + // cover 85--98% of cases. > > Wow. This really improves performance? Huh. Not by a huge amount, but it does -- I just remeasured several configurations, and checking for +0, +1, +2 was the best. > @@ +622,5 @@ > > /* > > * Given a token, T, that we want to complain about: if T's (starting) > > * lineno doesn't match TokenStream's lineno, that means we've scanned past > > * the line that T starts on, which makes it hard to print some or all of > > * T's (starting) line for context. > > Um. Is it still hard? Probably not. Want to file a follow-up? > Can we just replace this with uint32_t everywhere? That'll be less confusing > to people reading the code later. (Especially anyone who doesn't see this > patch go in and would assume that TokenPtr is still a struct.) > > The things called "tptr" and such could just be called "offset". Ok.
Assignee | ||
Comment 18•11 years ago
|
||
Landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/d1b71de5bbc1 jorendorff, in bug 846406 you modified peekTokenSameLine() in a way that made things difficult for this patch. I changed it back to what it was previously. The tests still pass (https://tbpl.mozilla.org/?tree=Try&rev=fcf77bb5b425) so I figured it was ok.
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink]
Comment 19•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #18) > Landed: > https://hg.mozilla.org/integration/mozilla-inbound/rev/d1b71de5bbc1 Backed out for Windows bustage. https://hg.mozilla.org/integration/mozilla-inbound/rev/cad5306d569e https://tbpl.mozilla.org/php/getParsedLog.php?id=20839159&tree=Mozilla-Inbound TokenStream.obj : error LNK2005: "private: static unsigned int const js::frontend::TokenStream::SourceCoords::MAX_PTR" (?MAX_PTR@SourceCoords@TokenStream@frontend@js@@0IB) already defined in jsapi.obj
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P2]
Assignee | ||
Comment 20•11 years ago
|
||
Second attempt. My build-on-all-platforms try run was green. https://hg.mozilla.org/integration/mozilla-inbound/rev/1f3587e02361
Comment 21•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1f3587e02361
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Updated•11 years ago
|
Depends on: CVE-2013-5619
You need to log in
before you can comment on or make changes to this bug.
Description
•