Closed Bug 1054972 Opened 10 years ago Closed 10 years ago

90% performance regression on The Stanford Javascript Crypto Library/SHA-512

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla35

Tracking Flags:

Tracking

Status

firefox31

---

unaffected

firefox32

wontfix

firefox33

verified

firefox34

verified

firefox35

---

verified

People

(Reporter: alexander.kjeldaas, Assigned: sunfish)

References

Details

(Keywords: perf, regression)

Attachments

(4 files)

All the code needed to run the SJCL benchmark in a standalone script 10 years ago Dan Gohman [:sunfish] 22.69 KB, application/javascript		Details
truncate-phis.patch 10 years ago Dan Gohman [:sunfish] 10.36 KB, patch	nbp : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review
gvn-more-misc.patch 10 years ago Dan Gohman [:sunfish] 3.50 KB, patch	nbp : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review
gvn-dont-set-use-removed.patch 10 years ago Dan Gohman [:sunfish] 9.62 KB, patch	nbp : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review

alexander.kjeldaas

Reporter

Description

•

10 years ago

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715214327

Steps to reproduce:

Run the jsperf tests on a mobile device using the FF and FF Beta versions.


Actual results:

I see a performance degradation of around 70% for Crypto-JS and around 80-90% for SJCL.


Expected results:

The performance should not degrade significantly.

alexander.kjeldaas

Reporter

Comment 1

•

10 years ago

I forgot to point to the test:  http://jsperf.com/sha-512/3

alexander.kjeldaas

Reporter

Updated

•

10 years ago

OS: Linux → Android

Hardware: x86_64 → ARM

Gian-Carlo Pascutto [:gcp]

Comment 2

•

10 years ago

I don't see any regression for Crypto-JS on desktop nor mobile, but I do see a 90% performance regression on SJCL on all platforms, so this isn't mobile specific.

Component: General → JavaScript Engine

OS: Android → All

Product: Firefox for Android → Core

Hardware: ARM → All

Summary: Crypto-JS and SJCL performance regressions on SHA-512 implementations → 90% performance regression on The Stanford Javascript Crypto Library/SHA-512

Version: Firefox 32 → 32 Branch

Gian-Carlo Pascutto [:gcp]

Updated

•

10 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Gian-Carlo Pascutto [:gcp]

Updated

•

10 years ago

Keywords: regression, regressionwindow-wanted

Till Schneidereit [:till]

Comment 3

•

10 years ago

My money is on bug 976446, so testing builds immediately before and after that landed will probably speed up regression hunting.

Jan de Mooij [:jandem]

Updated

•

10 years ago

Flags: needinfo?(jdemooij)

Alice0775 White

Comment 4

•

10 years ago

Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/46d57b8f48ce
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505141944
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/695049af7654
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505145245
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=46d57b8f48ce&tochange=695049af7654

Regressed by: 
695049af7654	Dan Gohman — Bug 999580 - IonMonkey: Generalize RangeAnalysis truncation to handle other kinds of paths to integer types. r=nbp

Blocks: 999580

Alice0775 White

Comment 5

•

10 years ago

The bug# should be Bug998580 instead of 999580

Blocks: 998580
No longer blocks: 999580

Gian-Carlo Pascutto [:gcp]

Updated

•

10 years ago

status-firefox31: --- → unaffected

status-firefox32: --- → affected

status-firefox33: --- → affected

status-firefox34: --- → affected

Gian-Carlo Pascutto [:gcp]

Updated

•

10 years ago

Keywords: regressionwindow-wanted

Aaron Train [:aaronmt]

Updated

•

10 years ago

Keywords: perf

alexander.kjeldaas

Reporter

Comment 6

•

10 years ago

OP here, I'm testing Crypto-JS again on Android and I get:

Firefox Mobile (31?): 118 +- 4.30

Firefox Mobile Beta (32?): 63.36 +- 38.78%

For me there is a significant regression also on the Crypto-JS benchmark.  Notice that the uncertainty is much larger on the 32 version, so the JIT might be a bit on and off.

alexander.kjeldaas

Reporter

Comment 7

•

10 years ago

Changing my mind.  The Crypto-JS results have a long "warmup period" on android, especially when my phone is low on battery.  I get anywhere from 30, to 60 in the beginning and then it tops out at 118 on version 31 and 108 on version 32, so there is no major perf regression.  Maybe a small one though.

Gian-Carlo Pascutto [:gcp]

Comment 8

•

10 years ago

You probably want to test performance with a fully charged battery and plugged into the power outlet, otherwise whatever power savings stuff Android and your CPU have will totally throw the results off - which is what that large error margin is trying to tell you.

I tested this and there is no performance difference between 31 and 32 on Crypto-JS, desktop nor mobile.

Jan de Mooij [:jandem]

Comment 9

•

10 years ago

NI from sunfish based on comment 4.

Flags: needinfo?(jdemooij) → needinfo?(sunfish)

Dan Gohman [:sunfish]

Assignee

Comment 10

•

10 years ago

Attached file All the code needed to run the SJCL benchmark in a standalone script — Details

Dan Gohman [:sunfish]

Assignee

Comment 11

•

10 years ago

I can reproduce the slowdown at the revision isolated in comment 4 with the standalone script in comment 10.

Assignee: nobody → sunfish

Flags: needinfo?(sunfish)

Dan Gohman [:sunfish]

Assignee

Comment 12

•

10 years ago

I see some effects which appear to be related to bailouts.

In the fast version, IONFLAGS=bailouts prints just this:

[Bailouts] Took bailout! Snapshot offset: 919

In the slow version, it prints this:

[Bailouts] Took bailout! Snapshot offset: 919
[Bailouts] Took bailout! Snapshot offset: 1956
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593
[Bailouts] Took bailout! Snapshot offset: 2033
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593

Annoyingly, running the slow version with --ion-parallel-compile=off produces the same output same the fast version.

Dan Gohman [:sunfish]

Assignee

Comment 13

•

10 years ago

With this diff to the attached testcase,

--- all-sjcl.js.orig 2014-08-22 14:12:32.739372163 -0700
+++ all-sjcl.js 2014-08-22 14:58:23.007290901 -0700
@@ -682,7 +682,7 @@
       t1h += chh + ((t1l >>> 0) < (chl >>> 0) ? 1 : 0);
       t1l += krl;
       t1h += krh + ((t1l >>> 0) < (krl >>> 0) ? 1 : 0);
-      t1l += wrl;
+      t1l = t1l + wrl|0;
       t1h += wrh + ((t1l >>> 0) < (wrl >>> 0) ? 1 : 0);
 
       // t2 = sigma0 + maj

it goes fast.

It appears that the problem here is that we're speculating more of the types to be int32, which is generally a good thing for code like this, but we're still using bailout checks, and they're needlessly triggering. In the above diff, after the add, t1l temporarily has a value which is beyond the int32 range, but it is truncated at every use (sometimes indirectly), so it shouldn't need a bailout check.

It looks like there are two things in the way here. One is phi nodes. It's probably possible to propagate truncation through phi nodes, though I'll have to think about it. The other is resume points. replaceAllUsesWith is setting UseRemoved on the original instruction's operands, which makes range analysis be conservative about resume points. I think we can do something better there. I have a mock-up patch simulating both of these changes in my tree, and with this patch the unmodified benchmark goes fast, which is encouraging.

Boris Zbarsky [:bzbarsky]

Updated

•

10 years ago

tracking-firefox32: --- → ?

tracking-firefox33: --- → ?

tracking-firefox34: --- → ?

Benjamin Kerensa [:bkerensa]

Updated

•

10 years ago

tracking-firefox32: ? → +

tracking-firefox33: ? → +

tracking-firefox34: ? → +

Dan Gohman [:sunfish]

Assignee

Comment 14

•

10 years ago

Attached patch truncate-phis.patch — Details — Splinter Review

Here are the patches.

This first patch implements the truncation analysis for phis. In the SJCL code base, some of the important adds are used by phis that are then truncated. This patch allows the truncation to flow through phis to allow the adds to be truncated.

Attachment #8477921 - Flags: review?(nicolas.b.pierron)

Dan Gohman [:sunfish]

Assignee

Comment 15

•

10 years ago

Attached patch gvn-more-misc.patch — Details — Splinter Review

This patch has some minor cleanups split out from the following patch.

Attachment #8477922 - Flags: review?(nicolas.b.pierron)

Dan Gohman [:sunfish]

Assignee

Comment 16

•

10 years ago

Attached patch gvn-dont-set-use-removed.patch — Details — Splinter Review

This is the patch reducing the usage of the UseRemoved flag.

With these three patches applied, the attached SJCL benchmark goes fast.

While looking at this benchmark, I also noticed that there were still some floating-point operations in what otherwise should have been purely integer code in the hot loop even after this patch series. I filed bug 1057779 to track this.

Attachment #8477923 - Flags: review?(nicolas.b.pierron)

Lawrence Mandel [:lmandel] (use needinfo)

Comment 17

•

10 years ago

Marking this as won't fix for 32 as it's too late to take a fix for a small perf regression (comment 7).

status-firefox32: affected → wontfix

Dan Gohman [:sunfish]

Assignee

Comment 18

•

10 years ago

To clarify, comment 7 is about Crypto-JS. The main regression here is on SJCL (comment 2).

Gian-Carlo Pascutto [:gcp]

Comment 19

•

10 years ago

It's not a small regression (90%!) but we're already at the RC stage and the code hasn't even been in Nightly yet, so unfortunately this got reported a little too late to still be fixable for 32.

Emanuel Hoogeveen [:ehoogeveen]

Comment 20

•

10 years ago

IIUC this regressed because of a correctness fix (bug 998580), and the improvements needed to get the desired performance legitimately (the patches here) are fairly involved. I doubt this would land on beta even if we were early in the cycle.

Nicolas B. Pierron [:nbp]

Comment 21

•

10 years ago

Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch

Review of attachment 8477921 [details] [diff] [review]:
-----------------------------------------------------------------

Nice :)

::: js/src/jit/RangeAnalysis.cpp
@@ +2579,5 @@
> +            if (kind == MDefinition::TruncateAfterBailouts)
> +                op = MToInt32::New(alloc, truncated->getOperand(i));
> +            else
> +                op = MTruncateToInt32::New(alloc, truncated->getOperand(i));
> +            if (truncated->isPhi()) {

style-nit: add a new line above.

Attachment #8477921 - Flags: review?(nicolas.b.pierron) → review+

Nicolas B. Pierron [:nbp]

Comment 22

•

10 years ago

Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch

Review of attachment 8477922 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/ValueNumbering.cpp
@@ +501,5 @@
> +                mozilla::DebugOnly<bool> r = deleteDef(def);
> +                MOZ_ASSERT(r, "deleteDef shouldn't have tried to add anything to the worklist, "
> +                              "so it shouldn't have failed");
> +                MOZ_ASSERT(deadDefs_.empty(),
> +                           "deleteDef shouldn't have added anything to the worklist");

I think the following comment makes it easier to understand why these assertions hold:

  deleteDef should not add anything to the deadDefs, as the redundant operation should have the same input operands.

Attachment #8477922 - Flags: review?(nicolas.b.pierron) → review+

Nicolas B. Pierron [:nbp]

Comment 23

•

10 years ago

Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch

Review of attachment 8477923 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +467,5 @@
>  void
>  MDefinition::replaceAllUsesWith(MDefinition *dom)
>  {
> +    for (size_t i = 0, e = numOperands(); i < e; i++)
> +        getOperand(i)->setUseRemovedUnchecked();

The flag setUseRemovedUnchecked is used to prevent optimizing code which might have been captured under a certain form, but due to the removal of one of the use cases, this form is no longer needed and could be optimized.

One of the example of such optimization is Range Analysis, where the removal of a path which does not expect a truncation can be removed, and a truncation can bubble up to an MDefinition which is captured by a ResumePoint.

The only case where we can do this kind of optimization is if the removed definition had no additional constraints compared to the remaining use cases.

::: js/src/jit/MIR.h
@@ +667,5 @@
>      }
>      void replaceAllUsesWith(MDefinition *dom);
>  
> +    // Like replaceAllUsesWith, but doesn't set UseRemoved on |this|'s operands.
> +    void justReplaceAllUsesWith(MDefinition *dom);

"just" reflects the lack of additional constraints added by |this|.  Maybe a better name could be something which express the fact that |this| is redundant or accept anything as operands.

Attachment #8477923 - Flags: review?(nicolas.b.pierron) → review+

Dan Gohman [:sunfish]

Assignee

Comment 24

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/61f05ae95aa4
https://hg.mozilla.org/integration/mozilla-inbound/rev/afac7b1435bc
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cf223c85b3b

I left the name justReplaceAllUsesWith for now. It's actually not just about |this| being redundant or accepting anything as operands. In GVN's case, GVN is setting UseRemoved manually when it needs to, so it doesn't need or want replaceAllUsesWith to do it. I'm open to renaming it though, if we can think of a better name.

In the long term, I'd like to get replaceAllUsesWith out of the UseRemoved business altogether, since it doesn't have enough information to really get it right. For now, justReplaceAllUsesWith marks those places which are already ok with this.

Ryan VanderMeulen [:RyanVM]

Comment 25

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/61f05ae95aa4
https://hg.mozilla.org/mozilla-central/rev/afac7b1435bc
https://hg.mozilla.org/mozilla-central/rev/0cf223c85b3b

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla35

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

10 years ago

Depends on: 1062612

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

status-firefox34: affected → fixed

Gian-Carlo Pascutto [:gcp]

Comment 27

•

10 years ago

I'm fairly sure the flags change was wrong here.

status-firefox34: fixed → affected

status-firefox35: --- → fixed

Sylvestre Ledru [:Sylvestre]

Comment 28

•

10 years ago

Dan, 33 (beta) and 34 (aurora) are affected, if you don't think it is too risky, could you fill an uplift request? Thanks

Flags: needinfo?(sunfish)

Sylvestre Ledru [:Sylvestre]

Comment 29

•

10 years ago

Gian-Carlo, yes, I made a mistake, sorry :)

Dan Gohman [:sunfish]

Assignee

Comment 30

•

10 years ago

There was a bug in one of my patches here which is bug 1062612. The fix is simple, and I just checked it into mozilla-inbound. Once that's landed on mozilla-central, I'll nominate the patches for uplift.

Alice0775 White

Updated

•

10 years ago

Depends on: 1063064

Alice0775 White

Updated

•

10 years ago

Depends on: 1063379

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

10 years ago

Depends on: 1063653

Dan Gohman [:sunfish]

Assignee

Comment 31

•

10 years ago

Also waiting for resolution on bug 1063653 before proceeding.

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Updated

•

10 years ago

No longer depends on: 1063064

Sylvestre Ledru [:Sylvestre]

Comment 32

•

10 years ago

Dan, seems ok now. Could you fill the uplift request this (Monday) morning to make sure the fix arrive in beta4?
thanks

Dan Gohman [:sunfish]

Assignee

Comment 33

•

10 years ago

Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

The patch itself is fairly straightforward, but though it did cause bug 1062612, which while trivial to fix, was surprising for having passed through testing.

[String/UUID change made/needed]:

None.

Attachment #8477921 - Flags: approval-mozilla-beta?

Attachment #8477921 - Flags: approval-mozilla-aurora?

Dan Gohman [:sunfish]

Assignee

Comment 34

•

10 years ago

Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

This patch is just a cleanup in preparation for the next patch. It is low-risk.

[String/UUID change made/needed]:

None.

Attachment #8477922 - Flags: approval-mozilla-beta?

Attachment #8477922 - Flags: approval-mozilla-aurora?

Dan Gohman [:sunfish]

Assignee

Comment 35

•

10 years ago

Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

This patch interacts with some complex code, and did participate in bug 1063653. That said, the patch itself is fairly straightforward.

[String/UUID change made/needed]:

None.

Attachment #8477923 - Flags: approval-mozilla-beta?

Attachment #8477923 - Flags: approval-mozilla-aurora?

Flags: needinfo?(sunfish)

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

Attachment #8477921 - Flags: approval-mozilla-beta?

Attachment #8477921 - Flags: approval-mozilla-beta+

Attachment #8477921 - Flags: approval-mozilla-aurora?

Attachment #8477921 - Flags: approval-mozilla-aurora+

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

Attachment #8477922 - Flags: approval-mozilla-beta?

Attachment #8477922 - Flags: approval-mozilla-beta+

Attachment #8477922 - Flags: approval-mozilla-aurora?

Attachment #8477922 - Flags: approval-mozilla-aurora+

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

Attachment #8477923 - Flags: approval-mozilla-beta?

Attachment #8477923 - Flags: approval-mozilla-beta+

Attachment #8477923 - Flags: approval-mozilla-aurora?

Attachment #8477923 - Flags: approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 36

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/d7245149a156
https://hg.mozilla.org/releases/mozilla-aurora/rev/2387457e8c0b
https://hg.mozilla.org/releases/mozilla-aurora/rev/22ac2e65664b

https://hg.mozilla.org/releases/mozilla-beta/rev/94dc71a06159
https://hg.mozilla.org/releases/mozilla-beta/rev/c0d46e44a6cb
https://hg.mozilla.org/releases/mozilla-beta/rev/316374007734

status-firefox33: affected → fixed

status-firefox34: affected → fixed

Andrei Vaida [:avaida]

Updated

•

10 years ago

Flags: qe-verify+

Andrei Vaida [:avaida]

Comment 37

•

10 years ago

Testing performed on Windows 7 64-bit (Intel Core 2 Duo 2.93Ghz) by running the benchmark [1] on:
 * Nightly 35.0a1 (2014-09-25) - sjcl: 847, ±4.97%, 62% slower
 * Aurora 34.0a2 (2014-09-25) - sjcl: 903, ±5.96%, 59% slower
 * Beta 33.0b7 - sjcl: 742, ±4.96%, 66% slower

I believe that the lower values are caused by my hardware/platform, as the benchmark displayed a higher ops/sec for an Ubuntu x86 machine with AMD FX(tm)-8320 Eight-Core Processor: e.g. Nightly 35 - 1,062,  ±6.26%, 57% slower.

Marking this fix as verified. Dan, please let me know if you think otherwise regarding this data.


[1] http://jsperf.com/sha-512/3

Status: RESOLVED → VERIFIED

status-firefox33: fixed → verified

status-firefox34: fixed → verified

status-firefox35: fixed → verified

Sylvestre Ledru [:Sylvestre]

Comment 38

•

10 years ago

I am not sure we can consider that a going from 90 => 60 % slower can qualify this bug as closed.

Till Schneidereit [:till]

Comment 39

•

10 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #38)
> I am not sure we can consider that a going from 90 => 60 % slower can
> qualify this bug as closed.

I get numbers that warrant closing as fixed (all numbers with current Nightly versions):

On my 1st-gen 2.7Ghz 15" rMBP, I get an sjcl score of 2006 (with TweetNaCl.js at 4301).

On my Galaxy S4 I get an sjcl score of 161 (with TweetNaCl.js at 201).

These scores are (very roughly) on par with Chrome's on the same machines.

I think the reason for the numbers is that they're done on a different machine. Ideally, a comparison would be made on the same machine between current builds and old builds without the fixes.

You need to log in before you can comment on or make changes to this bug.