Open Bug 824249 Opened 12 years ago Updated 2 years ago

IonMonkey: inline v8-richards run() call

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect

Tracking

()

People

(Reporter: bhackett1024, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [leave open])

Attachments

(1 file, 1 obsolete file)

Attached patch partial patch (obsolete) — Splinter Review
The v8-richards benchmark is a task scheduler that loops over task blocks and calls the run() method on them.  This run() call is a polymorphic dispatch about which we have perfect type information, but the call is currently not inlined and we do a CallGeneric which hurts performance tremendously.  If I hack the code to allow inlining this call (js_new below), I get these scores (average of 6 runs):

js_old: 10006
js_new: 11621 
d8:     12919

So, fixing this closes more than half the gap with v8.  There are three things preventing inlining of this call:

- One of the inlined functions contains a loop (which doesn't actually run that much).  JM+TI can't inline calls with loops because of its overspecialized loop analysis code.  This restriction has carried over to Ion, but I don't see any reason why this would actually cause problems.  Attached patch removes this restriction.

- Inlining the call fails on some heuristics for inlined script length and use counts.  Now that we compile off thread I don't think the complexity here is warranted, the attached patch makes simplifications and tweaks to allow this call to be inlined.

- There is a type barrier on the call site (one of the targets, IdleTask, is only ever called with a null packet due to invariants in the program beyond TI's purview).  This should be fixed by bug 796114, which has had a patch waiting for review the last two months :(
Attachment #695179 - Flags: review?(dvander)
Depends on: 768288
Blocks: 824257
Attached patch partial patchSplinter Review
Alas, the measurements from the previous patch were with the use count checks totally removed, and that patch gives scores a few hundred points less than was reported due to other call sites not inlining as much as they could.  This patch relaxes usesBeforeInlining even further to get the reported scores.  It might be worth killing this field entirely, as it is partially redundant with inlineUseCountRatio.
Attachment #695179 - Attachment is obsolete: true
Attachment #695179 - Flags: review?(dvander)
Attachment #695194 - Flags: review?(dvander)
Comment on attachment 695194 [details] [diff] [review]
partial patch

Review of attachment 695194 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, glad to see that restriction removed.

re: bug 796114, we couldn't measure any performance improvement at the time, so we decided not to take it. If it will help now that other things are in place, I'll review it.
Attachment #695194 - Flags: review?(dvander) → review+
Please don't commit the "enable inlining loop" part. This will cause a huge regression on earley-boyer. Details are in bug 768288 . Also bug 768288 is about inlining the loops ... I will have a proper patch for that before the end of the week.
Hum, I think we want to keep these checks for mono-threaded CPUs, such as mobile phones targeted by b2g.  Unless we don't want IonMonkey with B2G at-all ?!
Bug 823884 has landed. Earley-boyer shouldn't cause any problems now when inlining. Now for bug 768288, that bug will introduce one extra heuristic, to not inline functions that invalidate a lot (because the impact will be bigger as the caller will be invalidated too). Now because this bug is about removing these heuristics, I'm not sure, especially because we don't want to decrease performance also on single core computers.
Can you make sure this isn't the case? In that case you can just go ahead an push this... because I think it is better than adding more heuristics, like in 768288
I locally profiled after a rebase. In summary: Richards gets better, nothing gets hurt.

SM without patch:

Richards: 17164
DeltaBlue: 18559
Crypto: 22930
RayTrace: 19163
EarleyBoyer: 21540
RegExp: 3458
Splay: 16810
NavierStokes: 27251
PdfJS: 10986
Mandreel: 17791
Gameboy: 15713
CodeLoad: 14926
Box2D: 13883

SM with patch:

Richards: 18541
DeltaBlue: 18546
Crypto: 22580
RayTrace: 17355
EarleyBoyer: 22609
RegExp: 3546
Splay: 15879
NavierStokes: 27278
PdfJS: 10965
Mandreel: 17722
Gameboy: 15532
CodeLoad: 15056
Box2D: 12334

d8:

Richards: 17312
DeltaBlue: 23174
Crypto: 21932
RayTrace: 26270
EarleyBoyer: 42457
RegExp: 5722
Splay: 8076
NavierStokes: 26484
PdfJS: 18279
Mandreel: 20588
Gameboy: 20056
CodeLoad: 21456
Box2D: 20621
Note that Raytrace may be hurt, but it fluctuates too wildly to tell conclusively.
(In reply to Sean Stangl from comment #6)
> I locally profiled after a rebase. In summary: Richards gets better, nothing
> gets hurt.

This bug is waiting on bug 796114 before we'll be able to actually inline the main call in richards' critical loop.  Did you work around that, or are these numbers still making a slow call there?
(In reply to Brian Hackett (:bhackett) from comment #8)
> This bug is waiting on bug 796114 before we'll be able to actually inline
> the main call in richards' critical loop.  Did you work around that, or are
> these numbers still making a slow call there?

The numbers are just from applying the patch from Comment 1.
Tuning patch.  While this still needs bug 796114 to land in order to inline the run() call, this patch seems to help now and is good to get in before doing baseline tuning.

https://hg.mozilla.org/integration/mozilla-inbound/rev/335f8cd13b68
Whiteboard: [leave open]
Looks like bug 796114 landed, but has a few "issues" indicated by some new dependent bugs.  Can this bug still move forward?
This still needs bug 768288 since this patch didn't enable inlining functions with loops.  Doing that messes up several other v8 benchmarks, and moreover the current heuristics don't normally trigger inlining when used with parallel compilation because the function with a loop is called infrequently in the main loop.  I think now that bug 837312 is the better strategy for this, though the performance improvement sstangl reported in bug 837312 comment 3 is much less than what I got with full inlining in bug 837312 comment 1.  So, still uncertain.
Is this worth revisiting now that bug 768288 is fixed?
Assignee: general → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: