Closed Bug 831268 Opened 12 years ago Closed 12 years ago

IonMonkey: Investigate V8's --inline_new optimization

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: h4writer, Unassigned)

References

(Blocks 1 open bug)

Details

Like I found out yesterday, V8 has a mode called "--inline_new". js run-raytrace.js RayTrace: 11248 d8 --inline_new run-raytrace.js RayTrace: 17207 d8 --noinline_new run-raytrace.js RayTrace: 9719 Disabling this feature makes V8 slower than IM. Therefore we need this optimization to be faster than V8 on raytrace. Dvander told me and as far as I understand this optimization creates a stub for the constructing call, therefore removing the need to call the constructing call. From the tracelogging profile, this is the time spend in the constructing calls (initialize) and they remove them. ion_cannon,raytrace.js:84: 212292605 cycles (7%), called: 1726808, 122 cycles/call ion_cannon,raytrace.js:223: 211970214 cycles (7%), called: 1820609, 116 cycles/call ion_cannon,raytrace.js:531: 168080960 cycles (5%), called: 859969, 195 cycles/call ion_cannon,raytrace.js:285: 33788262 cycles (1%), called: 180317, 187 cycles/call So this optimization would remove 20% of our calling time. As the tracelogger profiler takes some overhead when also reporting ion->ion calls I actually assume if we implement this we would gain the same improvement as V8: 77%. Improving our score to 19900
Blocks: 768745
Why aren't we inlining the 'new' calls fully in v8-raytrace already? (Or is that what this bug is about, I'm confused by comment 0.) That doesn't seem like it needs any fancy new optimizations, and it seems like the only additional thing such a special optimization mode would add would be to avoid using ICs for adding properties in the constructor. That latter bit could be a significant cost, but there are several good ways of fixing that, including improving the definite properties analysis to handle Class.create (probably simple to do) or using JM IC info to handle property adds with inline MIR nodes as we do for normal get/set.
We do inline constructing calls. On V8 that would be --inline_construct. Note that on raytrace we now also inlining the constructing calls since bug 813784. This bug isn't about that. --inline_new is something totally different. They have a special constructor scheme: <dvander`home> I just meant, v8 takes function slike <dvander`home> function f(x) { this.x = x; } <dvander`home> if you do |new f(5)| they don't use either compiler <dvander`home> they generate a one-off stub for just that closure <dvander`home> it might be shared but it's definitely specially generated
OK, but it doesn't follow at all that exactly copying v8's optimization is what we should be doing too, nor that copying the optimization will give us the performance gains you discuss in comment 0 (there could be some perf fault in v8 that bites them on v8-raytrace without that optimization, because they didn't optimize for that case and the constructing stubs are a holdover from pre-crankshaft, which I suspect). Put another way, there isn't any reason this can't be fixed by staying in the SSA based compiler. Can you write a microbenchmark which illustrates the performance flaw you're trying to fix?
I don't say we need to copy that optimization word by word, but we should do something similar. That fits our system ;). At least it should get investigated ... I'm not starting to implement yet, as I don't know how it works exactly and what we need exactly to fix this issue ;). More a reminder there are some gains here
To have more reason to look into this: DeltaBlue: 17052 to 19221 (12%) RayTrace: 9990 to 17890 (79%) EarleyBoyer: 17857 to 30124 (68%) ---- Score (version 7): 11998 to 14447 (20%)
Summary: IonMonkey: Create constructing stubs like v8 → IonMonkey: Investigate V8's --inline_new optimization
Blocks: 765980
Blocks: 768739
(In reply to Hannes Verschore [:h4writer] from comment #5) > To have more reason to look into this: > > DeltaBlue: 17052 to 19221 (12%) > RayTrace: 9990 to 17890 (79%) > EarleyBoyer: 17857 to 30124 (68%) > ---- > Score (version 7): 11998 to 14447 (20%) that's v8 difference with/without --inline_new
Looking into this a little more, I don't find this bug credible at all. On this microbenchmark: function A() { this.f = 0; this.g = 1; } function foo() { for (var i = 0; i < 10000000; i++) var x = new A(); } foo(); js: 312 d8: 57 d8 --noinline_new: 436 The reason that d8 is so much faster than us at this sort of test is that it uses a generational GC. The reason it is so much slower with --noinline_new is that it is doing something hilariously slow instead of generating its normal optimized code for constructing the A objects. We are already generating basically optimal code for this test. The only way we can catch v8 is to build a generational GC or (in addition, really, since GGC will happen) add an escape analysis and stack allocate the object.
"inline_new" is indeed more than creating a stub for constructing calls. I've now measured how much the stub creating alone accounts for: // V8, with stubs Richards: 3657 DeltaBlue: 4465 Crypto: 4586 RayTrace: 4452 EarleyBoyer: 7547 RegExp: 1199 Splay: 1509 NavierStokes: 5238 ---- Score (version 7): 3525 // V8 without stubs Richards: 3633 DeltaBlue: 4423 Crypto: 4548 RayTrace: 4168 EarleyBoyer: 6692 RegExp: 1201 Splay: 1483 NavierStokes: 5243 ---- Score (version 7): 3427 Raytrace and earleyboyer both take a hit of 12%. I was maybe to eagerly to say we could maybe hit a 70% increase. But I think it should be doable to get the 12%. I already explained that we spend around 20% in the constructing functions. I don't know how long the constructing stub in V8 takes, but I assume it will be much lower than the 20%. I don't see a reason that we can't get the 12% V8 get's on it. But to be sure, I'll try to make a quick and dirty prototype and see what it gives us. (It will also give me ideas on how to do this properly) @Brain, the testcase provided doesn't use the generate stubs. There is no speed difference with enabling/disabling it. d8 --inline_new /tmp/test.js // Without stubs 0m0.323s d8 --inline_new /tmp/test.js // With stubs 0m0.326s d8 --noinline_new /tmp/test.js // Inline new totally disabled 0m1.100s js /tmp/test.js 0m0.538s
With "stubs", I mean the optimization with specialized constructor stubs only. Not stubs in general.
OK, not trying to hound on this, but can you write a microbenchmark which illustrates the performance flaw you're trying to fix? Basically, what we do and what v8 does are different in these constructors because of TI's definite properties analysis. If we can determine a constructor will be creating an object with specific properties in specific slots, we pregenerate its shape and do direct writes to the object's properties during the constructor. v8 doesn't have this, and I believe uses these constructor stubs for the same purpose. So adding constructor stubs ourselves would just be a second way to do the same thing, and I'd like to see why we don't have definite property info on v8-raytrace because I think this is an easy fix now that bug 638794 is done. Having definite properties would also improve speed for later property accesses as well, unlike constructor stubs. Having a microbenchmark to look at would make this job easier, both to make sure we are talking about the same performance flaw, and to ensure the fix is addressing that flaw.
Ah sorry I didn't report this earlier, but I've discarded the idea the "constructors stubs" are the observed performance difference. I made some wrong assumptions when going through the V8 code base. Looks like the "constructors stubs" alone have no performance impact whatsoever on raytrace :(. I think it's best to just close the bug, as too much wrong information is in here!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.