Closed Bug 450833 Opened 16 years ago Closed 16 years ago

TM: Multiple trees per entry point

Categories

(Core :: JavaScript Engine, defect, P3)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1

People

(Reporter: dvander, Assigned: dvander)

References

Details

(Keywords: testcase)

Attachments

(1 file, 16 obsolete files)

65.99 KB, patch
gal
: review+
Details | Diff | Splinter Review
Attached patch simple multiple trees (obsolete) — Splinter Review
Quick sketch just to test the concept. I tried to limit changes to nanojit by using a linked list in TreeInfo. Some problems so far: 1) Fail four NaN related cases in trace-tests.js -- this scares me, maybe a latent bug somewhere or maybe my code is wrong. 2) Deallocation isn't hooked up properly yet.
Blocks: landtm
Assignee: general → gal
Attachment #334058 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #334567 - Attachment is patch: true
Attachment #334567 - Attachment mime type: application/octet-stream → text/plain
This test case calculates an incorrect result: function f(i) { for (var m = 0; m < 20; ++m) for (var n = 0; n < 100; n += i) ; return n; } print(f(1)); print(f(.5));
Attached patch New patch, still broken. (obsolete) — Splinter Review
Attachment #334567 - Attachment is obsolete: true
Priority: -- → P3
Target Milestone: --- → mozilla1.9.1
Blocks: 454353
Attached patch work in progress (obsolete) — Splinter Review
WIP. Major diversions from the original patch: 1. ValueToNative no longer attempts to coerce void/undefined to non-boolean types 2. Type unstable loops (usually undefines from hoisted values) now record a straight-line trace that connects to a type stable loop.
Assignee: gal → danderson
Attachment #334059 - Attachment is obsolete: true
Attachment #334599 - Attachment is obsolete: true
Remaining issues: there are some regressions in SunSpider that need to be analyzed. There are a few cases where we have more exits because we build more traces out of the same code, I'd like to make sure that's out of necessity rather than a regression. Examples: string-validate-input has 5X as many exits. crypto-aes gives incorrect results.
Attached patch work in progress, v2 (obsolete) — Splinter Review
Big update, I split out the peer logic from js_ExecuteTree. It now executes only the fragment you give it, and it requires the fragment to have code and a matching type map. Callers of js_ExecuteTree must now call js_PrepareForExecute and js_FindEntryTree, which respectively check the global object/double pool and find a suitable peer fragment that matches in the incoming types. This meant moving a lot of stuff around, js_RecordLoopEdge/js_MonitorLoopEdge got a little cleaner. There's still some bugs lying around, string-validate-input is giving me problems but things are looking better.
Attachment #342128 - Attachment is obsolete: true
Attached patch work in progress, v3 (obsolete) — Splinter Review
Getting closer, bench.sh shows 80ms loss now. The three outliers: 3d-raytrace gains 2K exits and loses a lot of perf (2.3X -> 1.1X) access-fannkuch gains 50K exits and gains perf (1.65X -> 1.9X) string-tagcloud goes from 3K exits to 70, no perf change (correct.sh passes) Once I fully understand 3d-raytrace's problems I'll start cleaning this patch up for review. Hopefully there is something small I'm missing that can win the performance back.
Attachment #342174 - Attachment is obsolete: true
Attached patch work in progress, v4 (obsolete) — Splinter Review
Now that thin loops has landed, I've adjusted the multi-trees patch a bit. 1) If there is no type-stable peer for a type-unstable trace, we now compile a straight-line (LIR_x) trace anyway. This requires a little hack to work with thin loops. 2) If we compile a type-stable loop, we check for type-unstable peers. If we find any, we join them to the type-stable loop. Unfortunately this is hurting perf on raytrace, and now fannkuch as well. I'm looking into this now as the ideas behind the patch are mostly stable.
Attachment #342338 - Attachment is obsolete: true
Attached patch work in progress, v5 (obsolete) — Splinter Review
I lied - another big change, I removed the Oracle for deducing stack slot instability. It now tries to infer demotions from the typemaps of peers instead, by looking for peers with demoted entries or demoted exits. Nested tree calls explicitly record a demoted tree, so the original trace doesn't need to be trashed. correct.sh passes and crypto-aes no longer chokes. I'm still losing 60ms here and there, mostly from 3d-raytrace which I'm looking into.
Attachment #342524 - Attachment is obsolete: true
Attached patch proposed patch, v1 (obsolete) — Splinter Review
Ready for review. I made some final changes, I re-introduced the Oracle for two scenarios: - Global slot demotions - Stack slot demotions on inner trees that we want to remember past GCs I also decided to stop build int->double unstable traces. If we record these on non-thin loop traces, we instead immediately start re-recording with demotions. This saves us from spending time compiling a useless trace. 3d-raytrace is still not great on perf, about 1.05X on my mac and 1.3X on my linux box. This down from 2X, about 60ms loss on my linux box. 80% of its side exits are MISMATCH_EXITs. As for the rest of the tests, some gain perf and others lose marginal perf. fannkuch gains a lot (1X to 2X), the rest are +/- ~.1-.2X. The exceptions are crypto-sha1 and access-nbody, which lose about 3ms and 5ms respectively (6.1X -> 5.5X, 5.1X -> 4.8X). The main reason for lost perf is that loop hoisted undefined variables are poison to nested loops -- access-nobody and crypto-sha1 suffer from having to stabilize inner loops before finally compiling an outer tree. The heuristics for finally deciding when to blacklist an outer tree are important. If tweaked some tests get slower (fannkuch) and others get faster (sha1). It feels like these should be studied more in depth sometime. Blacklisting an outer tree can be deadly (especially ror fannkuch), and we should account for that more precisely. In the meantime this patch is pretty much at unity with the current tip, minus raytrace which was wrong from the start. Including raytrace it loses about 60ms in bench.sh for me.
Attachment #343469 - Attachment is obsolete: true
Attachment #343690 - Flags: review?(gal)
Great work. I will post detailed review comments tomorrow. I am pretty confident my patch will allow us to make MISMATCH_EXIT => BRANCH_EXIT and with that we should gain a lot in raytrace.
Depends on: 461076
Attached patch new heuristics (obsolete) — Splinter Review
WIP, needs to be cleaned up and it does not work 100% yet -- does a better job at joining unstable fragments together, cuts down aborts+started trees on raytrace by a decent factor.
Attachment #343690 - Attachment is obsolete: true
Attachment #343690 - Flags: review?(gal)
Attached patch dont use (obsolete) — Splinter Review
patch for andreas to shark, modified raytrace for testing
Attachment #344415 - Attachment is obsolete: true
Attached patch proposed patch, v2 (obsolete) — Splinter Review
Difference between tip and mtrees is about 80ms right now. This patch fixes a lot of bugs in the trace joining heuristics. raytrace is about a 60ms loss, which I'm discounting from incorrectness. That leaves ~20ms perf loss, which comes from: math-cordic 6ms date-format-xparb 10ms date-format-tofte 10ms crypto-aes 7ms I cannot see why crypto-aes gets slower yet (JIT stats are the same). math-cordic takes a little longer to stabilize (1 aborted, 6 exits versus 0/4 on tip). date-format-* both have very thin loops which when compiled don't gain us anything, entering and leaving the JIT is more expensive than staying in the interpreter. mtrees makes us execute these loops more than we used to. possibly, heuristics need to change there.
Attachment #344435 - Attachment is obsolete: true
Attachment #344542 - Flags: review?(gal)
Comment on attachment 344542 [details] [diff] [review] proposed patch, v2 - I would make dupEntryMaps a hard assert. - We should re-visit our blacklisting and loop-trigger heuristics in a separate bug. - Could you add some macrology for the demote_slot lengths in offset 0 hack? Something like ADD_SLOT_TO_LIST or something. Can't we use Queue or some of our existing data structures? - Can the list of tails/unstable exits be a Queue?
Attachment #344542 - Flags: review?(gal) → review+
Attached patch proposed patch, v3 (obsolete) — Splinter Review
1. changes dupEntryMaps to a hard assert 2. macroizes usage of the demote array - i prefer stack allocations here but it's not a big deal to use Queue either 3. our Queue doesn't have a remove - avmplus::List does (though it's really a vector). seemed easier to just use a manual linked list for now, might be worth changing if we have nanojit keep more information about loop edges
Attachment #344542 - Attachment is obsolete: true
Attached patch updated v3 patch (obsolete) — Splinter Review
patches against tip again
Attachment #344550 - Attachment is obsolete: true
unchanged from previous patch, rebased against tip bug in mandlebrot was fixed on tip earlier
Attachment #344711 - Attachment is obsolete: true
Attachment #345042 - Flags: review?(gal)
Attachment #345042 - Flags: review?(gal) → review+
Pushed: http://hg.mozilla.org/tracemonkey/rev/74dd5792d4de Will use new bugs for follow-ups.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Depends on: 461945
Depends on: 462504
Keywords: testcase
Flags: in-testsuite+
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: