Closed
Bug 591836
Opened 14 years ago
Closed 14 years ago
JM: Make FrameState work for large, linear scripts
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta7+ |
People
(Reporter: dvander, Assigned: dvander)
References
Details
(Whiteboard: [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med][suspect-regress-dromaeo_css][suspect-regress-dromaeo_jslib])
Attachments
(10 files)
8.71 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
28.98 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
4.97 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
3.33 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
10.51 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
2.16 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
13.81 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
4.27 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
6.36 KB,
application/javascript
|
Details | |
2.31 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
The test case in bug 591631 reveals that FrameState can exhibit quadratic behavior with large, linear scripts. This was originally required to safely sync for GC points. Now that we have conservative GC, and precise stack depth syncing at ops, we can make this algorithm much simpler.
Updated•14 years ago
|
blocking2.0: --- → beta6+
Assignee | ||
Comment 1•14 years ago
|
||
This patch removes the "base" array in FrameState. "base" was a vector of FrameEntry pointers, and was used to tell whether a FrameEntry was tracked. Now, the "tracked" bit is stored directly in the FrameEntry. The purpose of this patch is to ween FrameState off the tracker. At this patch (baseline): interations: 100; total dur: 10073ms; per item: 100.73ms
Attachment #470369 -
Flags: review?(dmandelin)
Assignee | ||
Comment 2•14 years ago
|
||
This patch changes the register pinning invariant such that RegisterState::save is guaranteed NULL if the register is not pinned. The purpose of this patch is to make it possible to walk FrameState::regstate and correctly determine register assignment. A side effect is that all uses of pinReg must be matched by either unpinReg, or unpinKilledReg. It's no longer acceptable to rely on sync-and-kill implicitly unpinning registers. To make some of this clearer there are also some function renamings.
Attachment #470371 -
Flags: review?(dmandelin)
Assignee | ||
Comment 3•14 years ago
|
||
Get rid of a function that's not needed after part 1.
Attachment #470372 -
Flags: review?(dmandelin)
Assignee | ||
Comment 4•14 years ago
|
||
Right now we're guaranteed that by walking the tracker backwards, we observe every copy before its backing store. This patch guarantees the same invariant if we want to walk the stack backwards.
Attachment #470373 -
Flags: review?(dmandelin)
Assignee | ||
Comment 5•14 years ago
|
||
This patch rewrites the constant-sync algorithm, and ImmutableSync, in two ways: * First, all registers are synced up-front. This removes some complexity, and guarantees that we never have to spill. * Next, only |Uses| is traversed, instead of the entire tracker. This works because registers have already been synced. interations: 100; total dur: 2466ms; per item: 24.66ms
Attachment #470374 -
Flags: review?(dmandelin)
Assignee | ||
Comment 6•14 years ago
|
||
This patch changes FrameState::merge() to only look at the register state, rather than the tracker. interations: 100; total dur: 961ms; per item: 9.61ms
Attachment #470375 -
Flags: review?(dmandelin)
Assignee | ||
Comment 7•14 years ago
|
||
This patch tries to decide whether it's more advantageous to use the tracker to find copies, or the stack. The tracker is better when there are many slots, but few items tracked (lots of control flow). The stack is better when the code is very linear, since the tracker must be swept twice: Once to find the best candidate, and another time to fix up references. Unfortunately this patch doesn't help a whole lot because we create lots of copies for arithmetic, which sets the "isCopied" bit, even though a pop is imminent. That means we perform a very wide search for copies and come up with nothing. We could improve this by precisely tracking how many times each FE is copied. The bookkeeping overhead may be worth it. In this case, it could save ~130ms. interations: 100; total dur: 894ms; per item: 8.94ms
Attachment #470377 -
Flags: review?(dmandelin)
Assignee | ||
Comment 8•14 years ago
|
||
This patch makes syncAndKill() simpler and faster: * Registers are synced up-front. * Only stack uses are scanned. * Any live registers are re-scanned at the end and evicted. This is cheap because they are guaranteed to have been synced. interations: 100; total dur: 843ms; per item: 8.43ms
Attachment #470378 -
Flags: review?(dmandelin)
Comment 9•14 years ago
|
||
Out of curiousity... how does interp do on the same hardware for the relevant testcase?
Updated•14 years ago
|
Attachment #470369 -
Flags: review?(dmandelin) → review+
Comment 10•14 years ago
|
||
Comment on attachment 470371 [details] [diff] [review] part 2: new register pinning invariants Looks good. Pinning seems much improved. This comment has a lot of redundancy: >+ /* >+ * Some RegisterState invariants. >+ * >+ * If |fe| is non-NULL, |save| is NULL. >+ * If |save| is non-NULL, |fe| is NULL. >+ * That is, both |fe| and |save| cannot be non-NULL. >+ * >+ * If either |fe| or |save| is non-NULL, the register is not in freeRegs. >+ * If both |fe| and |save| are NULL, the register is either in freeRegs, >+ * or owned by the compiler. >+ */ > struct RegisterState { The first 3 are all equivalent. I'd pick just one, probably the last. The last section is good. Except you might want to gloss on "owned by compiler". I think this expands to "the compiler code for the current op is managing the register out of the FrameState" or something like that. Otherwise, r+.
Attachment #470371 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470372 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470373 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470374 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470375 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470377 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Attachment #470378 -
Flags: review?(dmandelin) → review+
Comment 11•14 years ago
|
||
(In reply to comment #7) > Unfortunately this patch doesn't help a whole lot because we create lots of > copies for arithmetic, which sets the "isCopied" bit, even though a pop is > imminent. That means we perform a very wide search for copies and come up with > nothing. I was idly taking a look at the compiler, and there are several places (including very common ops) where the following code pattern is used: > /* Perform a swap. */ > frame.dup2(); > frame.shift(-3); > frame.shift(-1); Would it be worth introducing a function frame.swap() which swaps the top two entries without unnecessary copying?
Comment 12•14 years ago
|
||
(In reply to comment #11) > (In reply to comment #7) > > /* Perform a swap. */ > > frame.dup2(); > > frame.shift(-3); > > frame.shift(-1); > > Would it be worth introducing a function frame.swap() which swaps the top two > entries without unnecessary copying? It probably doesn't matter too much, because the copying should only occur at compile time--the register allocator should be able to avoid generating actual copies. dvander, is that right, or is there a chance this could create more memory moves? Adding a swap() function to FrameState would make the API a little nicer, though.
Assignee | ||
Comment 13•14 years ago
|
||
(In reply to comment #12) Yes, compile-time only, but comment #11 might have been concerned about the compile-time cost of uncopy(). The copy created by the swap will live at the end of the stack, so the uncopy() operation, if it needs to happen, should be very fast. But we should verify this. The big problem is code like: GETLOCAL 2 GETLOCAL 4000 ADD SETLOCAL 2 The GETLOCAL 2 creates a copy on the stack, and the distance between slot 2 and the copy is guaranteed to be at least (4000 - 2 + 1). The ADD pops the copy, but slot 2 is still marked as being copied. When SETLOCAL 2 comes into play, uncopy() has to sweep for any copies, but there are none.
Comment 14•14 years ago
|
||
Yes, I was worried about compile-time cost. I was having trouble convincing myself that the uncopy() wouldn't be expensive if there were lots of variables around for the frame to track. Oddly enough, when I try to benchmark it, it goes the opposite direction: compilation time for my "lots of variables" version is about 1/3 of that for a similar test without lots of variables. Hmmm... Test case 1: lots of variables: d1=(new Date).getTime(); a="da=(new Date).getTime();"; for (i=0;i<100000;i++){a+="v"+i+"=0;";} for (i=0;i<100000;i++){a+="v0+=v"+i+";"} d2=(new Date).getTime(); eval(a); d3=da; d4=(new Date).getTime(); eval(a); d5=da; d6=(new Date).getTime(); print("Generate:",d2-d1,"Compile:",d3-d2,"(",d5-d4,") Execute:",d4-d3,"(",d6-d5,")"); Test case 2: not so many variables: d1=(new Date).getTime(); a="da=(new Date).getTime();"; for (i=0;i<100000;i++){a+="v0=0;";} for (i=0;i<100000;i++){a+="v0+=v0;"} d2=(new Date).getTime(); eval(a); d3=da; d4=(new Date).getTime(); eval(a); d5=da; d6=(new Date).getTime(); print("Generate:",d2-d1,"Compile:",d3-d2,"(",d5-d4,") Execute:",d4-d3,"(",d6-d5,")"); Each test case builds a big string and then calls eval() on it twice, and reports timing results. For the first "lots of variables" version, my aged laptop gets: /c/mozilla-build/tracemonkey-802d34381fe4/js/src/build_opt $ js -m twftest3a.js Generate: 209 Compile: 1138 ( 1086 ) Execute: 199 ( 165 ) For the second "not so many variables" version, I get: /c/mozilla-build/tracemonkey-802d34381fe4/js/src/build_opt $ js -m twftest4a.js Generate: 36 Compile: 2858 ( 3153 ) Execute: 133 ( 133 )
Assignee | ||
Comment 15•14 years ago
|
||
The first script is so long it doesn't JIT compile. We don't yet support some of the more obscure opcodes that occur in extremely long scripts (in this case, JSOP_INDEXBASE). Neither test uses the "var" keyword, so the variables will not be tracked as local. Instead they'll be name lookups. Wrapping the code inside a function and using the var keyword should let you test uncopy() performance. Keep in mind that the number of local variables is capped at 65,535 or something, and that uncopy() is more likely to be slow if it had an early declaration and late use.
Comment 16•14 years ago
|
||
Thanks! Update testcases which maybe look at the right thing: 1. Creates lots of local variables, then repeatedly uses the earliest-declared of these. d1=(new Date).getTime(); a="da=(new Date).getTime(); function f() {"; for (i=0;i<10000;i++){a+="var v"+i+"=0;";} for (i=0;i<10000;i++){a+="v0+=v"+i+";"} a+="} f();"; d2=(new Date).getTime(); eval(a); d3=da; d4=(new Date).getTime(); eval(a); d5=da; d6=(new Date).getTime(); print("Generate:",d2-d1,"Compile:",d3-d2,"(",d5-d4,") Execute:",d4-d3,"(",d6-d5,")"); vs. 2. similar version using only one variable: d1=(new Date).getTime(); a="da=(new Date).getTime(); function f() {"; for (i=0;i<10000;i++){a+="var v0=0;";} for (i=0;i<10000;i++){a+="v0+=v0;"} a+="} f();"; d2=(new Date).getTime(); eval(a); d3=da; d4=(new Date).getTime(); eval(a); d5=da; d6=(new Date).getTime(); print("Generate:",d2-d1,"Compile:",d3-d2,"(",d5-d4,") Execute:",d4-d3,"(",d6-d5,")"); Results: 1. js -m twftest3b.js Generate: 23 Compile: 364 ( 331 ) Execute: 29 ( 29 ) 2. js -m twftest4b.js Generate: 2 Compile: 59 ( 59 ) Execute: 22 ( 23 ) This time the result is more as expected, at least in terms of more variables being more expensive.
Comment 17•14 years ago
|
||
Playing with this kind of testcase further, it doesn't look like there is anything pathological -- the expense seems to be just the overhead for adding those variables. So sorry for the distraction...
Comment 18•14 years ago
|
||
For what its worth, the "lots of local variables" testcase I was playing with spends most of its time in the parser -- it triggers quadratic-in-the-number-of-local-variables behavior via a linear lookup for adding new variables. The stack trace to the hot routine is like:
js.exe!js::Shape::search(js::Shape * * startp=0x01507a98, jsid id={...}, bool adding=false) Line 788 + 0x7 bytes C++
js.exe!JSFunction::lookupLocal(JSContext * cx=0x01271a18, JSAtom * atom=0x01f8d6e0, unsigned int * indexp=0x00000000) Line 3236 + 0x21 bytes C++
> js.exe!BindVarOrConst(JSContext * cx=0x01271a18, BindData * data=0x0013f0b4, JSAtom * atom=0x01f8d6e0, JSTreeContext * tc=0x0013f2c8) Line 3668 + 0x15 bytes C++
js.exe!js::Parser::variables(bool inLetHead=false) Line 5992 + 0x1c bytes C++
js.exe!js::Parser::statement() Line 5578 + 0xd bytes C++
js.exe!js::Parser::statements() Line 3189 + 0x8 bytes C++
js.exe!js::Parser::functionBody() Line 1309 + 0x8 bytes C++
js.exe!js::Parser::functionDef(JSAtom * funAtom=0x007f9e50, js::Parser::FunctionType type=GENERAL, unsigned int lambda=0) Line 2933 + 0xb bytes C++
js.exe!js::Parser::functionStmt() Line 3110 C++
js.exe!js::Parser::statement() Line 4732 + 0xb bytes C++
js.exe!js::Compiler::compileScript(JSContext * cx=0x01271a18, JSObject * scopeChain=0x015021f8, JSStackFrame * callerFrame=0x00e30108, JSPrincipals * principals=0x00000000, unsigned long tcflags=33693696, const wchar_t * chars=0x02b00040, unsigned int length=573934, _iobuf * file=0x00000000, const char * filename=0x01280471, unsigned int lineno=8, JSString * source=0x01f3cf00, unsigned int staticLevel=2) Line 864 + 0x8 bytes C++
js.exe!obj_eval(JSContext * cx=0x01271a18, unsigned int argc=1, js::Value * vp=0x00e30160) Line 1242 + 0x33 bytes C++
Comment 19•14 years ago
|
||
(In reply to comment #18) > For what its worth, the "lots of local variables" testcase I was playing with > spends most of its time in the parser -- it triggers > quadratic-in-the-number-of-local-variables behavior via a linear lookup for > adding new variables. The stack trace to the hot routine is like: > > js.exe!js::Shape::search(js::Shape * * startp=0x01507a98, jsid id={...}, > bool adding=false) Line 788 + 0x7 bytes C++ Whoops -- need some maybeHash love there, but it's not on the path specialized for JSFunction::addLocal. Filed bug 595918 on this, must fix. /be
Assignee | ||
Comment 20•14 years ago
|
||
This test case also exhibits pathological behaviour (10 seconds to compile), because the compile-time algorithms deoptimize inside |try| blocks. Our method of handling |try| is to sync the entire frame on every slow exit, so when jumping to the exception handler, live variables are readable. That's crap both for test cases like this, and in general it causes code explosion. An alternate solution is to sync immediately when setting local variables/args inside try blocks. It'll slightly slow down code inside |try|. v8 seems to do something similar, at a slightly greater expense (it seems to require a load before the store - maybe storing to a scope object).
Assignee | ||
Comment 21•14 years ago
|
||
Amusingly, with this patch, we run the |try| test case 1% faster than the non-|try| version. v8 is 20% slower on the |try| version. (Either way, we're 2X slower than v8.)
Attachment #474935 -
Flags: review?(dmandelin)
Updated•14 years ago
|
Attachment #474935 -
Flags: review?(dmandelin) → review+
Updated•14 years ago
|
Whiteboard: [ETA: 9/24]
Assignee | ||
Comment 22•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/1af8f0c895bc
Whiteboard: [ETA: 9/24] → [ETA: 9/24] fixed-in-tracemonkey
Assignee | ||
Comment 23•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/c6000ca7e7fd Backed out from PGO bustage.
Whiteboard: [ETA: 9/24] fixed-in-tracemonkey → [ETA: 9/24]
Switching to --enable-shared-js let PGO complete for me on try, so I recommend that as a way to land this hotly-desired patch!
Comment 25•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/1bbc0fc10747
Whiteboard: [ETA: 9/24] → [ETA: 9/24] fixed-in-tracemonkey
Comment 26•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/0ff917ab21e4
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 27•14 years ago
|
||
Extrapolating from mass-orange on the tinderbox and from IRC remarks, this probably caused mozjs.dll not to get properly packaged with Windows builds.
Comment 28•14 years ago
|
||
(In reply to comment #27) > Extrapolating from mass-orange on the tinderbox and from IRC remarks, this > probably caused mozjs.dll not to get properly packaged with Windows builds. Bug 600493.
Depends on: 600493
Updated•14 years ago
|
Whiteboard: [ETA: 9/24] fixed-in-tracemonkey → [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med]
Comment 29•14 years ago
|
||
A changeset from this bug was associated with a XP Ts, Cold MED Dirty Profile regression on Firefox. boo-urns :( Previous: avg 420.356 stddev 3.679 of 30 runs up to a9d1ad0bc386 New : avg 430.832 stddev 1.905 of 5 runs since a60414d076b5 Change : +10.475 (2.49% / z=2.847) Graph : http://mzl.la/bZFaB3 The regression occurred from changesets in the following range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9d1ad0bc386&tochange=a60414d076b5 The tag [suspect-regress-ts_cold_generated_med] has been added to the status whiteboard; please remove it only once you have confirmed this bug is not the cause of the regression.
Updated•14 years ago
|
Whiteboard: [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med] → [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med][suspect-regress-dromaeo_css]
Comment 30•14 years ago
|
||
A changeset from this bug was associated with a Win7 Dromaeo (CSS) regression on Firefox. boo-urns :( Previous: avg 2014.419 stddev 40.480 of 30 runs up to a9d1ad0bc386 New : avg 1901.610 stddev 12.432 of 5 runs since a60414d076b5 Change : -112.809 (-5.6% / z=2.787) Graph : http://mzl.la/9gFu4n The regression occurred from changesets in the following range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9d1ad0bc386&tochange=a60414d076b5 The tag [suspect-regress-dromaeo_css] has been added to the status whiteboard; please remove it only once you have confirmed this bug is not the cause of the regression.
Updated•14 years ago
|
Whiteboard: [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med][suspect-regress-dromaeo_css] → [ETA: 9/24] fixed-in-tracemonkey[suspect-regress-ts_cold_generated_med][suspect-regress-dromaeo_css][suspect-regress-dromaeo_jslib]
Comment 31•14 years ago
|
||
A changeset from this bug was associated with a Win7 Dromaeo (jslib) regression on Firefox. boo-urns :( Previous: avg 127.610 stddev 4.222 of 30 runs up to a9d1ad0bc386 New : avg 116.384 stddev 0.751 of 5 runs since a60414d076b5 Change : -11.226 (-8.8% / z=2.659) Graph : http://mzl.la/bZu5UP The regression occurred from changesets in the following range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9d1ad0bc386&tochange=a60414d076b5 The tag [suspect-regress-dromaeo_jslib] has been added to the status whiteboard; please remove it only once you have confirmed this bug is not the cause of the regression.
Comment 32•14 years ago
|
||
A changeset from this bug was associated with a XP Dromaeo (CSS) regression on Firefox. boo-urns :( Previous: avg 2045.275 stddev 49.676 of 30 runs up to a9d1ad0bc386 New : avg 1936.120 stddev 13.937 of 5 runs since a60414d076b5 Change : -109.155 (-5.34% / z=2.197) Graph : http://mzl.la/b0dlou The regression occurred from changesets in the following range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9d1ad0bc386&tochange=a60414d076b5 The tag [suspect-regress-dromaeo_css] has been added to the status whiteboard; please remove it only once you have confirmed this bug is not the cause of the regression.
Comment 33•14 years ago
|
||
A changeset from this bug was associated with a XP Dromaeo (jslib) regression on Firefox. boo-urns :( Previous: avg 129.703 stddev 4.099 of 30 runs up to a9d1ad0bc386 New : avg 117.954 stddev 0.660 of 5 runs since a60414d076b5 Change : -11.749 (-9.06% / z=2.866) Graph : http://mzl.la/avZij4 The regression occurred from changesets in the following range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9d1ad0bc386&tochange=a60414d076b5 The tag [suspect-regress-dromaeo_jslib] has been added to the status whiteboard; please remove it only once you have confirmed this bug is not the cause of the regression.
You need to log in
before you can comment on or make changes to this bug.
Description
•