Open Bug 824249 Opened 12 years ago Updated 2 years ago

IonMonkey: inline v8-richards run() call

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect

Tracking

()

People

(Reporter: bhackett1024, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [leave open])

Attachments

(1 file, 1 obsolete file)

Attached patch partial patch (obsolete) — Splinter Review
The v8-richards benchmark is a task scheduler that loops over task blocks and calls the run() method on them. This run() call is a polymorphic dispatch about which we have perfect type information, but the call is currently not inlined and we do a CallGeneric which hurts performance tremendously. If I hack the code to allow inlining this call (js_new below), I get these scores (average of 6 runs): js_old: 10006 js_new: 11621 d8: 12919 So, fixing this closes more than half the gap with v8. There are three things preventing inlining of this call: - One of the inlined functions contains a loop (which doesn't actually run that much). JM+TI can't inline calls with loops because of its overspecialized loop analysis code. This restriction has carried over to Ion, but I don't see any reason why this would actually cause problems. Attached patch removes this restriction. - Inlining the call fails on some heuristics for inlined script length and use counts. Now that we compile off thread I don't think the complexity here is warranted, the attached patch makes simplifications and tweaks to allow this call to be inlined. - There is a type barrier on the call site (one of the targets, IdleTask, is only ever called with a null packet due to invariants in the program beyond TI's purview). This should be fixed by bug 796114, which has had a patch waiting for review the last two months :(
Attachment #695179 - Flags: review?(dvander)
Depends on: 768288
Blocks: 824257
Attached patch partial patchSplinter Review
Alas, the measurements from the previous patch were with the use count checks totally removed, and that patch gives scores a few hundred points less than was reported due to other call sites not inlining as much as they could. This patch relaxes usesBeforeInlining even further to get the reported scores. It might be worth killing this field entirely, as it is partially redundant with inlineUseCountRatio.
Attachment #695179 - Attachment is obsolete: true
Attachment #695179 - Flags: review?(dvander)
Attachment #695194 - Flags: review?(dvander)
Comment on attachment 695194 [details] [diff] [review] partial patch Review of attachment 695194 [details] [diff] [review]: ----------------------------------------------------------------- Cool, glad to see that restriction removed. re: bug 796114, we couldn't measure any performance improvement at the time, so we decided not to take it. If it will help now that other things are in place, I'll review it.
Attachment #695194 - Flags: review?(dvander) → review+
Please don't commit the "enable inlining loop" part. This will cause a huge regression on earley-boyer. Details are in bug 768288 . Also bug 768288 is about inlining the loops ... I will have a proper patch for that before the end of the week.
Hum, I think we want to keep these checks for mono-threaded CPUs, such as mobile phones targeted by b2g. Unless we don't want IonMonkey with B2G at-all ?!
Bug 823884 has landed. Earley-boyer shouldn't cause any problems now when inlining. Now for bug 768288, that bug will introduce one extra heuristic, to not inline functions that invalidate a lot (because the impact will be bigger as the caller will be invalidated too). Now because this bug is about removing these heuristics, I'm not sure, especially because we don't want to decrease performance also on single core computers. Can you make sure this isn't the case? In that case you can just go ahead an push this... because I think it is better than adding more heuristics, like in 768288
I locally profiled after a rebase. In summary: Richards gets better, nothing gets hurt. SM without patch: Richards: 17164 DeltaBlue: 18559 Crypto: 22930 RayTrace: 19163 EarleyBoyer: 21540 RegExp: 3458 Splay: 16810 NavierStokes: 27251 PdfJS: 10986 Mandreel: 17791 Gameboy: 15713 CodeLoad: 14926 Box2D: 13883 SM with patch: Richards: 18541 DeltaBlue: 18546 Crypto: 22580 RayTrace: 17355 EarleyBoyer: 22609 RegExp: 3546 Splay: 15879 NavierStokes: 27278 PdfJS: 10965 Mandreel: 17722 Gameboy: 15532 CodeLoad: 15056 Box2D: 12334 d8: Richards: 17312 DeltaBlue: 23174 Crypto: 21932 RayTrace: 26270 EarleyBoyer: 42457 RegExp: 5722 Splay: 8076 NavierStokes: 26484 PdfJS: 18279 Mandreel: 20588 Gameboy: 20056 CodeLoad: 21456 Box2D: 20621
Note that Raytrace may be hurt, but it fluctuates too wildly to tell conclusively.
(In reply to Sean Stangl from comment #6) > I locally profiled after a rebase. In summary: Richards gets better, nothing > gets hurt. This bug is waiting on bug 796114 before we'll be able to actually inline the main call in richards' critical loop. Did you work around that, or are these numbers still making a slow call there?
(In reply to Brian Hackett (:bhackett) from comment #8) > This bug is waiting on bug 796114 before we'll be able to actually inline > the main call in richards' critical loop. Did you work around that, or are > these numbers still making a slow call there? The numbers are just from applying the patch from Comment 1.
Tuning patch. While this still needs bug 796114 to land in order to inline the run() call, this patch seems to help now and is good to get in before doing baseline tuning. https://hg.mozilla.org/integration/mozilla-inbound/rev/335f8cd13b68
Whiteboard: [leave open]
Looks like bug 796114 landed, but has a few "issues" indicated by some new dependent bugs. Can this bug still move forward?
This still needs bug 768288 since this patch didn't enable inlining functions with loops. Doing that messes up several other v8 benchmarks, and moreover the current heuristics don't normally trigger inlining when used with parallel compilation because the function with a loop is called infrequently in the main loop. I think now that bug 837312 is the better strategy for this, though the performance improvement sstangl reported in bug 837312 comment 3 is much less than what I got with full inlining in bug 837312 comment 1. So, still uncertain.
Is this worth revisiting now that bug 768288 is fixed?
Assignee: general → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: