Bug 1161576 (Open) — Opened 8 years ago, Updated 2 months ago
performance of dart2js output for RSA benchmark fell off a cliff
We recently updated to jsshell C36.0.1 (from C28), which slowed down the dart2js RSA benchmark (attached) by 70%. jsshell C38 improved things a bit, but performance is still below what we saw previously. Currently, jsshell is about 6x slower than v8. Is there something in the generated code that is **** jsshell, or is this a bug?
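For context, the hot inner loop in jsbn-style RSA code (which dart2js BigInteger output closely resembles) is the `am` multiply-and-accumulate function. Below is a minimal sketch based on the classic jsbn `am3` variant; the 28-bit digit layout and the 14/14 split are assumptions drawn from jsbn, not taken from the attached benchmark:

```javascript
// Multiply-and-accumulate over 28-bit digits:
// w[j..j+n-1] += this_[i..i+n-1] * x + c, returning the final carry.
// Each 28-bit digit is split into 14-bit halves so the partial products
// stay within the range where JS doubles are exact.
function am(this_, i, x, w, j, c, n) {
  var xl = x & 0x3fff, xh = x >> 14;
  while (--n >= 0) {
    var l = this_[i] & 0x3fff;      // low 14 bits of the digit
    var h = this_[i++] >> 14;       // high 14 bits
    var m = xh * l + h * xl;        // cross terms
    l = xl * l + ((m & 0x3fff) << 14) + w[j] + c;
    c = (l >> 28) + (m >> 14) + xh * h;
    w[j++] = l & 0xfffffff;         // keep 28 bits, carry the rest
  }
  return c;
}
```

A loop like this is exactly the kind of tight, monomorphic array read/write code where falling off the dense-element fast path is catastrophic.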
Regression range on nightlies: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2114ef80f6ae&tochange=17e190839086 Various JS changes in here, but the one looking most suspicious to me is bug 914255.
Yeah, confirmed. Bumping TYPE_FLAG_OBJECT_COUNT_LIMIT to be the same as TYPE_FLAG_DOMOBJECT_COUNT_LIMIT (or indeed just to 11) makes trunk about 5x faster on this testcase. 10 doesn't seem to be enough. In the slow case, there's a ton of time spent in jitcode proper, and about 50% under SetObjectElement, calling NativeSetProperty, calling setDenseElementWithType. Not sure whether that's happening for arrays that we failed to fast-path because of the limited typeset size or whether we're dealing with normal objects being used as arrays.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Oh, profiling just the JS, a bunch of time is claimed to be spent in dart.BigInteger.am$6. But that's also true in the case with bigger typesets... in fact the fraction of time spent there is _larger_ with bigger typesets. I did confirm that with bigger typesets the SetObjectElement vmcalls are gone.
We should probably make the SETELEM unknown-object path faster. It's possible we're currently not using an IC in that case. Even if we bump TYPE_FLAG_OBJECT_COUNT_LIMIT up again, the next benchmark will have 12 different ObjectGroups and we have the same perf cliff...
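To make the cliff concrete, here is a hedged sketch of a single SETELEM site that sees more distinct ObjectGroups than the typeset limit tracks; the constructor count (12) and the `new Function` trick for producing distinct allocation sites are illustrative, not from the benchmark. Once the site's typeset overflows to "unknown object", the store can no longer be proven dense and (absent an IC) falls back to the slow SetObjectElement path:

```javascript
// One SETELEM site shared by many distinct ObjectGroups.
function fill(a, v) {
  for (var i = 0; i < 8; i++)
    a[i] = v + i;   // the SETELEM whose typeset overflows
  return a;
}

// Each separately compiled Function gives its object literal a distinct
// allocation site, and hence (in SpiderMonkey) a distinct ObjectGroup.
// With more groups than TYPE_FLAG_OBJECT_COUNT_LIMIT flowing into fill(),
// the typeset degrades to "unknown object".
var makers = [];
for (var k = 0; k < 12; k++)
  makers.push(new Function("return { length: 8 };"));

var out = [];
for (var k = 0; k < 12; k++)
  out.push(fill(makers[k](), k));
```

An IC on the SETELEM would handle each incoming shape after one slow lookup, which is why emitting the IC recovers most of the lost performance even without widening the typeset.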
(In reply to Jan de Mooij [:jandem] from comment #4)
> We should probably make the SETELEM unknown-object path faster. It's
> possible we're currently not using an IC in that case.
>
> Even if we bump TYPE_FLAG_OBJECT_COUNT_LIMIT up again, the next benchmark
> will have 12 different ObjectGroups and we have the same perf cliff...

Agreed. The reason to decrease it was to reduce the need for recompilations, so bumping it would reintroduce that problem.
Oh, I agree we should not be bumping the object count limit. Sorry if you thought I was suggesting we should do that.

So I instrumented IonBuilder::setElemTryCache. With the unknown-object typeset, we fail to emit the IC because PropertyWriteNeedsTypeBarrier returns true, which it always does if types->unknownObject(). I tried just disabling this check (yes, I know, not safe, but I wanted to get an idea of how much of the perf problem was coming from here) and it helps a good bit.

To put numbers to this, on my machine:

Current trunk: ~450ms
With larger typesets: ~110ms
Current trunk with setElemTryCache hacked to emit an IC: ~170ms
Current trunk with setElemTryCache hacked and the ElementAccessIsDenseNative check in getElemTryDense hacked to pass (as it does with larger typesets): ~130ms
Oh, also it's possible that with larger typesets we manage to do the set via setElemTryDense, not setElemTryCache. But if I hack things to allow setElemTryDense to work (by disabling both the ElementAccessIsDenseNative and PropertyWriteNeedsTypeBarrier checks in there), that actually makes things slower than when I just forced the IC to work. Presumably because we're getting bad type info elsewhere or something.