Closed
Bug 450833
Opened 16 years ago
Closed 16 years ago
TM: Multiple trees per entry point
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla1.9.1
People
(Reporter: dvander, Assigned: dvander)
References
Details
(Keywords: testcase)
Attachments
(1 file, 16 obsolete files)
65.99 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
Quick sketch just to test the concept. I tried to limit changes to nanojit by using a linked list in TreeInfo. Some problems so far:
1) Fail four NaN related cases in trace-tests.js -- this scares me, maybe a latent bug somewhere or maybe my code is wrong.
2) Deallocation isn't hooked up properly yet.
Assignee | ||
Comment 1•16 years ago
|
||
Comment 2•16 years ago
|
||
Updated•16 years ago
|
Attachment #334567 -
Attachment is patch: true
Attachment #334567 -
Attachment mime type: application/octet-stream → text/plain
Comment 3•16 years ago
|
||
This test case calculates an incorrect result:
function f(i) {
for (var m = 0; m < 20; ++m)
for (var n = 0; n < 100; n += i)
;
return n;
}
print(f(1));
print(f(.5));
Comment 4•16 years ago
|
||
Attachment #334567 -
Attachment is obsolete: true
Updated•16 years ago
|
Priority: -- → P3
Target Milestone: --- → mozilla1.9.1
Assignee | ||
Comment 5•16 years ago
|
||
WIP. Major diversions from the original patch:
1. ValueToNative no longer attempts to coerce void/undefined to non-boolean types
2. Type unstable loops (usually undefines from hoisted values) now record a straight-line trace that connects to a type stable loop.
Assignee: gal → danderson
Attachment #334059 -
Attachment is obsolete: true
Attachment #334599 -
Attachment is obsolete: true
Assignee | ||
Comment 6•16 years ago
|
||
Remaining issues: there are some regressions in SunSpider that need to be analyzed. There are a few cases where we have more exits because we build more traces out of the same code, I'd like to make sure that's out of necessity rather than a regression.
Examples:
string-validate-input has 5X as many exits.
crypto-aes gives incorrect results.
Assignee | ||
Comment 7•16 years ago
|
||
Big update, I split out the peer logic from js_ExecuteTree. It now executes only the fragment you give it, and it requires the fragment to have code and a matching type map.
Callers of js_ExecuteTree must now call js_PrepareForExecute and js_FindEntryTree, which respectively check the global object/double pool and find a suitable peer fragment that matches in the incoming types.
This meant moving a lot of stuff around, js_RecordLoopEdge/js_MonitorLoopEdge got a little cleaner. There's still some bugs lying around, string-validate-input is giving me problems but things are looking better.
Attachment #342128 -
Attachment is obsolete: true
Assignee | ||
Comment 8•16 years ago
|
||
Getting closer, bench.sh shows 80ms loss now. The three outliers:
3d-raytrace gains 2K exits and loses a lot of perf (2.3X -> 1.1X)
access-fannkuch gains 50K exits and gains perf (1.65X -> 1.9X)
string-tagcloud goes from 3K exits to 70, no perf change
(correct.sh passes)
Once I fully understand 3d-raytrace's problems I'll start cleaning this patch up for review. Hopefully there is something small I'm missing that can win the performance back.
Attachment #342174 -
Attachment is obsolete: true
Assignee | ||
Comment 9•16 years ago
|
||
Now that thin loops has landed, I've adjusted the multi-trees patch a bit.
1) If there is no type-stable peer for a type-unstable trace, we now compile a straight-line (LIR_x) trace anyway. This requires a little hack to work with thin loops.
2) If we compile a type-stable loop, we check for type-unstable peers. If we find any, we join them to the type-stable loop.
Unfortunately this is hurting perf on raytrace, and now fannkuch as well. I'm looking into this now as the ideas behind the patch are mostly stable.
Attachment #342338 -
Attachment is obsolete: true
Assignee | ||
Comment 10•16 years ago
|
||
I lied - another big change, I removed the Oracle for deducing stack slot instability. It now tries to infer demotions from the typemaps of peers instead, by looking for peers with demoted entries or demoted exits. Nested tree calls explicitly record a demoted tree, so the original trace doesn't need to be trashed.
correct.sh passes and crypto-aes no longer chokes. I'm still losing 60ms here and there, mostly from 3d-raytrace which I'm looking into.
Attachment #342524 -
Attachment is obsolete: true
Assignee | ||
Comment 11•16 years ago
|
||
Ready for review. I made some final changes, I re-introduced the Oracle for two scenarios:
- Global slot demotions
- Stack slot demotions on inner trees that we want to remember past GCs
I also decided to stop build int->double unstable traces. If we record these on non-thin loop traces, we instead immediately start re-recording with demotions. This saves us from spending time compiling a useless trace.
3d-raytrace is still not great on perf, about 1.05X on my mac and 1.3X on my linux box. This down from 2X, about 60ms loss on my linux box. 80% of its side exits are MISMATCH_EXITs.
As for the rest of the tests, some gain perf and others lose marginal perf. fannkuch gains a lot (1X to 2X), the rest are +/- ~.1-.2X. The exceptions are crypto-sha1 and access-nbody, which lose about 3ms and 5ms respectively (6.1X -> 5.5X, 5.1X -> 4.8X).
The main reason for lost perf is that loop hoisted undefined variables are poison to nested loops -- access-nobody and crypto-sha1 suffer from having to stabilize inner loops before finally compiling an outer tree. The heuristics for finally deciding when to blacklist an outer tree are important. If tweaked some tests get slower (fannkuch) and others get faster (sha1). It feels like these should be studied more in depth sometime. Blacklisting an outer tree can be deadly (especially ror fannkuch), and we should account for that more precisely.
In the meantime this patch is pretty much at unity with the current tip, minus raytrace which was wrong from the start. Including raytrace it loses about 60ms in bench.sh for me.
Attachment #343469 -
Attachment is obsolete: true
Attachment #343690 -
Flags: review?(gal)
Comment 12•16 years ago
|
||
Great work. I will post detailed review comments tomorrow. I am pretty confident my patch will allow us to make MISMATCH_EXIT => BRANCH_EXIT and with that we should gain a lot in raytrace.
Assignee | ||
Comment 13•16 years ago
|
||
WIP, needs to be cleaned up and it does not work 100% yet -- does a better job at joining unstable fragments together, cuts down aborts+started trees on raytrace by a decent factor.
Attachment #343690 -
Attachment is obsolete: true
Attachment #343690 -
Flags: review?(gal)
Assignee | ||
Comment 14•16 years ago
|
||
patch for andreas to shark, modified raytrace for testing
Attachment #344415 -
Attachment is obsolete: true
Assignee | ||
Comment 15•16 years ago
|
||
Difference between tip and mtrees is about 80ms right now. This patch fixes a lot of bugs in the trace joining heuristics.
raytrace is about a 60ms loss, which I'm discounting from incorrectness. That leaves ~20ms perf loss, which comes from:
math-cordic 6ms
date-format-xparb 10ms
date-format-tofte 10ms
crypto-aes 7ms
I cannot see why crypto-aes gets slower yet (JIT stats are the same). math-cordic takes a little longer to stabilize (1 aborted, 6 exits versus 0/4 on tip).
date-format-* both have very thin loops which when compiled don't gain us anything, entering and leaving the JIT is more expensive than staying in the interpreter. mtrees makes us execute these loops more than we used to. possibly, heuristics need to change there.
Attachment #344435 -
Attachment is obsolete: true
Attachment #344542 -
Flags: review?(gal)
Comment 16•16 years ago
|
||
Comment on attachment 344542 [details] [diff] [review]
proposed patch, v2
- I would make dupEntryMaps a hard assert.
- We should re-visit our blacklisting and loop-trigger heuristics in a separate bug.
- Could you add some macrology for the demote_slot lengths in offset 0 hack? Something like ADD_SLOT_TO_LIST or something. Can't we use Queue or some of our existing data structures?
- Can the list of tails/unstable exits be a Queue?
Attachment #344542 -
Flags: review?(gal) → review+
Assignee | ||
Comment 17•16 years ago
|
||
1. changes dupEntryMaps to a hard assert
2. macroizes usage of the demote array - i prefer stack allocations here but it's not a big deal to use Queue either
3. our Queue doesn't have a remove - avmplus::List does (though it's really a vector). seemed easier to just use a manual linked list for now, might be worth changing if we have nanojit keep more information about loop edges
Attachment #344542 -
Attachment is obsolete: true
Assignee | ||
Comment 18•16 years ago
|
||
patches against tip again
Attachment #344550 -
Attachment is obsolete: true
Comment 19•16 years ago
|
||
Attachment #344703 -
Attachment is obsolete: true
Assignee | ||
Comment 20•16 years ago
|
||
unchanged from previous patch, rebased against tip
bug in mandlebrot was fixed on tip earlier
Attachment #344711 -
Attachment is obsolete: true
Attachment #345042 -
Flags: review?(gal)
Updated•16 years ago
|
Attachment #345042 -
Flags: review?(gal) → review+
Assignee | ||
Comment 21•16 years ago
|
||
Pushed: http://hg.mozilla.org/tracemonkey/rev/74dd5792d4de
Will use new bugs for follow-ups.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 22•16 years ago
|
||
test landed http://hg.mozilla.org/mozilla-central/rev/6a732748843a and cvs
Flags: in-testsuite+
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•