Closed Bug 1054972 Opened 5 years ago Closed 5 years ago

90% performance regression on The Stanford Javascript Crypto Library/SHA-512

Categories

(Core :: JavaScript Engine, defect)

32 Branch
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla35
Tracking Status
firefox31 --- unaffected
firefox32 + wontfix
firefox33 + verified
firefox34 + verified
firefox35 --- verified

People

(Reporter: alexander.kjeldaas, Assigned: sunfish)

References

Details

(Keywords: perf, regression)

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715214327

Steps to reproduce:

Run the jsperf tests on a mobile device using the FF and FF Beta versions.


Actual results:

I see a performance degradation of around 70% for Crypto-JS and around 80-90% for SJCL.


Expected results:

The performance should not degrade significantly.
I forgot to point to the test:  http://jsperf.com/sha-512/3
OS: Linux → Android
Hardware: x86_64 → ARM
I don't see any regression for Crypto-JS on desktop nor mobile, but I do see a 90% performance regression on SJCL on all platforms, so this isn't mobile specific.
Component: General → JavaScript Engine
OS: Android → All
Product: Firefox for Android → Core
Hardware: ARM → All
Summary: Crypto-JS and SJCL performance regressions on SHA-512 implementations → 90% performance regression on The Stanford Javascript Crypto Library/SHA-512
Version: Firefox 32 → 32 Branch
Status: UNCONFIRMED → NEW
Ever confirmed: true
My money is on bug 976446, so testing builds immediately before and after that landed will probably speed up regression hunting.
Flags: needinfo?(jdemooij)
Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/46d57b8f48ce
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505141944
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/695049af7654
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505145245
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=46d57b8f48ce&tochange=695049af7654

Regressed by: 
695049af7654	Dan Gohman — Bug 999580 - IonMonkey: Generalize RangeAnalysis truncation to handle other kinds of paths to integer types. r=nbp
Blocks: 999580
The bug# should be Bug998580 instead of 999580
Blocks: 998580
No longer blocks: 999580
Keywords: perf
OP here, I'm testing Crypto-JS again on Android and I get:

Firefox Mobile (31?): 118 +- 4.30

Firefox Mobile Beta (32?): 63.36 +- 38.78%

For me there is a significant regression also on the Crypto-JS benchmark.  Notice that the uncertainty is much larger on the 32 version, so the JIT might be a bit on and off.
Changing my mind.  The Crypto-JS results have a long "warmup period" on android, especially when my phone is low on battery.  I get anywhere from 30, to 60 in the beginning and then it tops out at 118 on version 31 and 108 on version 32, so there is no major perf regression.  Maybe a small one though.
You probably want to test performance with a fully charged battery and plugged into the power outlet, otherwise whatever power savings stuff Android and your CPU have will totally throw the results off - which is what that large error margin is trying to tell you.

I tested this and there is no performance difference between 31 and 32 on Crypto-JS, desktop nor mobile.
NI from sunfish based on comment 4.
Flags: needinfo?(jdemooij) → needinfo?(sunfish)
I can reproduce the slowdown at the revision isolated in comment 4 with the standalone script in comment 10.
Assignee: nobody → sunfish
Flags: needinfo?(sunfish)
I see some effects which appear to be related to bailouts.

In the fast version, IONFLAGS=bailouts prints just this:

[Bailouts] Took bailout! Snapshot offset: 919

In the slow version, it prints this:

[Bailouts] Took bailout! Snapshot offset: 919
[Bailouts] Took bailout! Snapshot offset: 1956
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593
[Bailouts] Took bailout! Snapshot offset: 2033
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593

Annoyingly, running the slow version with --ion-parallel-compile=off produces the same output same the fast version.
With this diff to the attached testcase,

--- all-sjcl.js.orig 2014-08-22 14:12:32.739372163 -0700
+++ all-sjcl.js 2014-08-22 14:58:23.007290901 -0700
@@ -682,7 +682,7 @@
       t1h += chh + ((t1l >>> 0) < (chl >>> 0) ? 1 : 0);
       t1l += krl;
       t1h += krh + ((t1l >>> 0) < (krl >>> 0) ? 1 : 0);
-      t1l += wrl;
+      t1l = t1l + wrl|0;
       t1h += wrh + ((t1l >>> 0) < (wrl >>> 0) ? 1 : 0);
 
       // t2 = sigma0 + maj

it goes fast.

It appears that the problem here is that we're speculating more of the types to be int32, which is generally a good thing for code like this, but we're still using bailout checks, and they're needlessly triggering. In the above diff, after the add, t1l temporarily has a value which is beyond the int32 range, but it is truncated at every use (sometimes indirectly), so it shouldn't need a bailout check.

It looks like there are two things in the way here. One is phi nodes. It's probably possible to propagate truncation through phi nodes, though I'll have to think about it. The other is resume points. replaceAllUsesWith is setting UseRemoved on the original instruction's operands, which makes range analysis be conservative about resume points. I think we can do something better there. I have a mock-up patch simulating both of these changes in my tree, and with this patch the unmodified benchmark goes fast, which is encouraging.
Here are the patches.

This first patch implements the truncation analysis for phis. In the SJCL code base, some of the important adds are used by phis that are then truncated. This patch allows the truncation to flow through phis to allow the adds to be truncated.
Attachment #8477921 - Flags: review?(nicolas.b.pierron)
This patch has some minor cleanups split out from the following patch.
Attachment #8477922 - Flags: review?(nicolas.b.pierron)
This is the patch reducing the usage of the UseRemoved flag.

With these three patches applied, the attached SJCL benchmark goes fast.

While looking at this benchmark, I also noticed that there were still some floating-point operations in what otherwise should have been purely integer code in the hot loop even after this patch series. I filed bug 1057779 to track this.
Attachment #8477923 - Flags: review?(nicolas.b.pierron)
Marking this as won't fix for 32 as it's too late to take a fix for a small perf regression (comment 7).
To clarify, comment 7 is about Crypto-JS. The main regression here is on SJCL (comment 2).
It's not a small regression (90%!) but we're already at the RC stage and the code hasn't even been in Nightly yet, so unfortunately this got reported a little too late to still be fixable for 32.
IIUC this regressed because of a correctness fix (bug 998580), and the improvements needed to get the desired performance legitimately (the patches here) are fairly involved. I doubt this would land on beta even if we were early in the cycle.
Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch

Review of attachment 8477921 [details] [diff] [review]:
-----------------------------------------------------------------

Nice :)

::: js/src/jit/RangeAnalysis.cpp
@@ +2579,5 @@
> +            if (kind == MDefinition::TruncateAfterBailouts)
> +                op = MToInt32::New(alloc, truncated->getOperand(i));
> +            else
> +                op = MTruncateToInt32::New(alloc, truncated->getOperand(i));
> +            if (truncated->isPhi()) {

style-nit: add a new line above.
Attachment #8477921 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch

Review of attachment 8477922 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/ValueNumbering.cpp
@@ +501,5 @@
> +                mozilla::DebugOnly<bool> r = deleteDef(def);
> +                MOZ_ASSERT(r, "deleteDef shouldn't have tried to add anything to the worklist, "
> +                              "so it shouldn't have failed");
> +                MOZ_ASSERT(deadDefs_.empty(),
> +                           "deleteDef shouldn't have added anything to the worklist");

I think the following comment makes it easier to understand why these assertions hold:

  deleteDef should not add anything to the deadDefs, as the redundant operation should have the same input operands.
Attachment #8477922 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch

Review of attachment 8477923 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +467,5 @@
>  void
>  MDefinition::replaceAllUsesWith(MDefinition *dom)
>  {
> +    for (size_t i = 0, e = numOperands(); i < e; i++)
> +        getOperand(i)->setUseRemovedUnchecked();

The flag setUseRemovedUnchecked is used to prevent optimizing code which might have been captured under a certain form, but due to the removal of one of the use cases, this form is no longer needed and could be optimized.

One of the example of such optimization is Range Analysis, where the removal of a path which does not expect a truncation can be removed, and a truncation can bubble up to an MDefinition which is captured by a ResumePoint.

The only case where we can do this kind of optimization is if the removed definition had no additional constraints compared to the remaining use cases.

::: js/src/jit/MIR.h
@@ +667,5 @@
>      }
>      void replaceAllUsesWith(MDefinition *dom);
>  
> +    // Like replaceAllUsesWith, but doesn't set UseRemoved on |this|'s operands.
> +    void justReplaceAllUsesWith(MDefinition *dom);

"just" reflects the lack of additional constraints added by |this|.  Maybe a better name could be something which express the fact that |this| is redundant or accept anything as operands.
Attachment #8477923 - Flags: review?(nicolas.b.pierron) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/61f05ae95aa4
https://hg.mozilla.org/integration/mozilla-inbound/rev/afac7b1435bc
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cf223c85b3b

I left the name justReplaceAllUsesWith for now. It's actually not just about |this| being redundant or accepting anything as operands. In GVN's case, GVN is setting UseRemoved manually when it needs to, so it doesn't need or want replaceAllUsesWith to do it. I'm open to renaming it though, if we can think of a better name.

In the long term, I'd like to get replaceAllUsesWith out of the UseRemoved business altogether, since it doesn't have enough information to really get it right. For now, justReplaceAllUsesWith marks those places which are already ok with this.
Depends on: 1062612
Duplicate of this bug: 1061428
I'm fairly sure the flags change was wrong here.
Dan, 33 (beta) and 34 (aurora) are affected, if you don't think it is too risky, could you fill an uplift request? Thanks
Flags: needinfo?(sunfish)
Gian-Carlo, yes, I made a mistake, sorry :)
There was a bug in one of my patches here which is bug 1062612. The fix is simple, and I just checked it into mozilla-inbound. Once that's landed on mozilla-central, I'll nominate the patches for uplift.
Depends on: 1063064
Depends on: 1063379
Depends on: 1063653
Also waiting for resolution on bug 1063653 before proceeding.
Dan, seems ok now. Could you fill the uplift request this (Monday) morning to make sure the fix arrive in beta4?
thanks
Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

The patch itself is fairly straightforward, but though it did cause bug 1062612, which while trivial to fix, was surprising for having passed through testing.

[String/UUID change made/needed]:

None.
Attachment #8477921 - Flags: approval-mozilla-beta?
Attachment #8477921 - Flags: approval-mozilla-aurora?
Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

This patch is just a cleanup in preparation for the next patch. It is low-risk.

[String/UUID change made/needed]:

None.
Attachment #8477922 - Flags: approval-mozilla-beta?
Attachment #8477922 - Flags: approval-mozilla-aurora?
Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch

Approval Request Comment
[Feature/regressing bug #]:

bug 998580

[User impact if declined]:

90% slowdown on SJCL SHA-512

[Describe test coverage new/current, TBPL]:

TBPL on mozilla-central

[Risks and why]: 

This patch interacts with some complex code, and did participate in bug 1063653. That said, the patch itself is fairly straightforward.

[String/UUID change made/needed]:

None.
Attachment #8477923 - Flags: approval-mozilla-beta?
Attachment #8477923 - Flags: approval-mozilla-aurora?
Flags: needinfo?(sunfish)
Attachment #8477921 - Flags: approval-mozilla-beta?
Attachment #8477921 - Flags: approval-mozilla-beta+
Attachment #8477921 - Flags: approval-mozilla-aurora?
Attachment #8477921 - Flags: approval-mozilla-aurora+
Attachment #8477922 - Flags: approval-mozilla-beta?
Attachment #8477922 - Flags: approval-mozilla-beta+
Attachment #8477922 - Flags: approval-mozilla-aurora?
Attachment #8477922 - Flags: approval-mozilla-aurora+
Attachment #8477923 - Flags: approval-mozilla-beta?
Attachment #8477923 - Flags: approval-mozilla-beta+
Attachment #8477923 - Flags: approval-mozilla-aurora?
Attachment #8477923 - Flags: approval-mozilla-aurora+
Flags: qe-verify+
Testing performed on Windows 7 64-bit (Intel Core 2 Duo 2.93Ghz) by running the benchmark [1] on:
 * Nightly 35.0a1 (2014-09-25) - sjcl: 847, ±4.97%, 62% slower
 * Aurora 34.0a2 (2014-09-25) - sjcl: 903, ±5.96%, 59% slower
 * Beta 33.0b7 - sjcl: 742, ±4.96%, 66% slower

I believe that the lower values are caused by my hardware/platform, as the benchmark displayed a higher ops/sec for an Ubuntu x86 machine with AMD FX(tm)-8320 Eight-Core Processor: e.g. Nightly 35 - 1,062,  ±6.26%, 57% slower.

Marking this fix as verified. Dan, please let me know if you think otherwise regarding this data.


[1] http://jsperf.com/sha-512/3
I am not sure we can consider that a going from 90 => 60 % slower can qualify this bug as closed.
(In reply to Sylvestre Ledru [:sylvestre] from comment #38)
> I am not sure we can consider that a going from 90 => 60 % slower can
> qualify this bug as closed.

I get numbers that warrant closing as fixed (all numbers with current Nightly versions):

On my 1st-gen 2.7Ghz 15" rMBP, I get an sjcl score of 2006 (with TweetNaCl.js at 4301).

On my Galaxy S4 I get an sjcl score of 161 (with TweetNaCl.js at 201).

These scores are (very roughly) on par with Chrome's on the same machines.

I think the reason for the numbers is that they're done on a different machine. Ideally, a comparison would be made on the same machine between current builds and old builds without the fixes.
You need to log in before you can comment on or make changes to this bug.