Bug 1161576 (Open) — Opened 8 years ago, Updated 2 months ago
performance of dart2js output for RSA benchmark fell off a cliff
We recently updated to jsshell C36.0.1 (from C28), which slowed down the dart2js RSA benchmark (attached) by 70%. jsshell C38 improved things a bit, but performance is still below what we saw previously. Currently, jsshell is about 6x slower than v8. Is there something in the generated code that is **** jsshell, or is this a bug?
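For context, the hot inner loop in jsbn-style RSA code (which dart2js BigInteger output closely resembles) is the `am` multiply-and-accumulate function. Below is a minimal sketch based on the classic jsbn `am3` variant; the 28-bit digit layout and the 14/14 split are assumptions drawn from jsbn, not taken from the attached benchmark:

```javascript
// Multiply-and-accumulate over 28-bit digits:
// w[j..j+n-1] += this_[i..i+n-1] * x + c, returning the final carry.
// Each 28-bit digit is split into 14-bit halves so the partial products
// stay within the range where JS doubles are exact.
function am(this_, i, x, w, j, c, n) {
  var xl = x & 0x3fff, xh = x >> 14;
  while (--n >= 0) {
    var l = this_[i] & 0x3fff;      // low 14 bits of the digit
    var h = this_[i++] >> 14;       // high 14 bits
    var m = xh * l + h * xl;        // cross terms
    l = xl * l + ((m & 0x3fff) << 14) + w[j] + c;
    c = (l >> 28) + (m >> 14) + xh * h;
    w[j++] = l & 0xfffffff;         // keep 28 bits, carry the rest
  }
  return c;
}
```

A loop like this is exactly the kind of tight, monomorphic array read/write code where falling off the dense-element fast path is catastrophic.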
Regression range on nightlies: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2114ef80f6ae&tochange=17e190839086 Various JS changes in here, but the one looking most suspicious to me is bug 914255.
Yeah, confirmed. Bumping TYPE_FLAG_OBJECT_COUNT_LIMIT to be the same as TYPE_FLAG_DOMOBJECT_COUNT_LIMIT (or indeed just to 11) makes trunk about 5x faster on this testcase. 10 doesn't seem to be enough. In the slow case, there's a ton of time spent in jitcode proper, and about 50% under SetObjectElement, calling NativeSetProperty, calling setDenseElementWithType. Not sure whether that's happening for arrays that we failed to fast-path because of the limited typeset size or whether we're dealing with normal objects being used as arrays.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Oh, profiling just the JS, a bunch of time is claimed to be spent in dart.BigInteger.am$6. But that's also true in the case with bigger typesets... in fact the fraction of time spent there is _larger_ with bigger typesets. I did confirm that with bigger typesets the SetObjectElement vmcalls are gone.
We should probably make the SETELEM unknown-object path faster. It's possible we're currently not using an IC in that case. Even if we bump TYPE_FLAG_OBJECT_COUNT_LIMIT up again, the next benchmark will have 12 different ObjectGroups and we have the same perf cliff...
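To make the cliff concrete, here is a hedged sketch of a single SETELEM site that sees more distinct ObjectGroups than the typeset limit tracks; the constructor count (12) and the `new Function` trick for producing distinct allocation sites are illustrative, not from the benchmark. Once the site's typeset overflows to "unknown object", the store can no longer be proven dense and (absent an IC) falls back to the slow SetObjectElement path:

```javascript
// One SETELEM site shared by many distinct ObjectGroups.
function fill(a, v) {
  for (var i = 0; i < 8; i++)
    a[i] = v + i;   // the SETELEM whose typeset overflows
  return a;
}

// Each separately compiled Function gives its object literal a distinct
// allocation site, and hence (in SpiderMonkey) a distinct ObjectGroup.
// With more groups than TYPE_FLAG_OBJECT_COUNT_LIMIT flowing into fill(),
// the typeset degrades to "unknown object".
var makers = [];
for (var k = 0; k < 12; k++)
  makers.push(new Function("return { length: 8 };"));

var out = [];
for (var k = 0; k < 12; k++)
  out.push(fill(makers[k](), k));
```

An IC on the SETELEM would handle each incoming shape after one slow lookup, which is why emitting the IC recovers most of the lost performance even without widening the typeset.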
(In reply to Jan de Mooij [:jandem] from comment #4)
> We should probably make the SETELEM unknown-object path faster. It's
> possible we're currently not using an IC in that case.
>
> Even if we bump TYPE_FLAG_OBJECT_COUNT_LIMIT up again, the next benchmark
> will have 12 different ObjectGroups and we have the same perf cliff...

Agreed. The reason to decrease it was to reduce the need for recompilations, so bumping it would reintroduce that problem.
Oh, I agree we should not be bumping the object count limit. Sorry if you thought I was suggesting we should do that.

So I instrumented IonBuilder::setElemTryCache. With the unknown-object typeset, we fail to emit the IC because PropertyWriteNeedsTypeBarrier returns true, which it always does if types->unknownObject(). I tried just disabling this check (yes, I know, not safe, but I wanted to get an idea of how much of the perf problem was coming from here) and it helps a good bit.

To put numbers to this, on my machine:

Current trunk: ~450ms
With larger typesets: ~110ms
Current trunk with setElemTryCache hacked to emit an IC: ~170ms
Current trunk with setElemTryCache hacked and the ElementAccessIsDenseNative check in getElemTryDense hacked to pass (as it does with larger typesets): ~130ms
Oh, also it's possible that with larger typesets we manage to do the set via setElemTryDense, not setElemTryCache. But if I hack things to allow setElemTryDense to work (by disabling both the ElementAccessIsDenseNative and PropertyWriteNeedsTypeBarrier checks in there), that actually makes things slower than when I just forced the IC to work. Presumably because we're getting bad type info elsewhere or something.