Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Multiply instructions are not generated on Arm

RESOLVED FIXED in mozilla8

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: mjrosenb, Assigned: mjrosenb)

Tracking

(Blocks: 1 bug, {mobile, perf})

unspecified
mozilla8
ARM
Linux
mobile, perf
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [inbound])

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

6 years ago
Created attachment 547577 [details]
simple test that runs 60 times slower on ARM than on x86

While looking into why this test case is 60 times slower on arm than on x86, I discovered that the jitted code does not contain any floating point multiply instructions (I suspect it would not contain integer multiply either).
If you want to run this test case, you probably want to cut down on the number of iterations that it performs, since currently, depending on the flags that are passed to the shell, it takes between 6 and 30 seconds on x86.
There is a comment in the relevant function (mjit::Compiler::jsop_binary), which states:

 /* ARM cannot detect integer overflow with multiplication. */

We should a) implement this for fp only multiplies, and b) attempt to handle this for integers as well.
my proposal is (approximately) for 
c <- a * b
we do
SMULL tmp,c, a, b
EORS tmp, tmp, c ASR 31

For those people that don't know ARM assembly, SMULL is a 32*32 -> 64 bit multiply.  In this case, it stores the lower 32 bits into the register "c", and the upper 32 bits into the register "tmp".  For multiplies that don't overflow, tmp should just be a sign extension of c.  The second instruction shifts c right by 31, so it will be the sign bit of c repeated 32 times, exactly what will be in tmp if the multiply did not overflow.  The exclusive or will then leave tmp 0 if there was no overflow, and non-zero in all other cases.
EORS can also be replaced with TEQ, which sets the condition codes as if an exclusive or had been performed, but not actually storing the result anywhere.
Could be a duplicate of bug 586267, though that one doesn't cover floating-point.
(Assignee)

Comment 2

6 years ago
Created attachment 548313 [details] [diff] [review]
remove preprocessor directives that disable mul generation on arm

It looks like all of the code to generate the SMULL was already there, but was just not turned on.  We pass the basic tests with this patch applied, and the original test drops from 60x slower to 45x slower.  Not ideal, but it is a start.
Blocks: 674274
No longer blocks: 673000
(Assignee)

Comment 3

6 years ago
Created attachment 549003 [details] [diff] [review]
Slightly prettier patch (does not have #if foo || 1)

Since I don't have commit access yet, if you could push this to the try-server, and then to a repo, that would be incredibly useful
Attachment #548313 - Attachment is obsolete: true
Attachment #549003 - Flags: review?(adrake)
Comment on attachment 549003 [details] [diff] [review]
Slightly prettier patch (does not have #if foo || 1)

Review of attachment 549003 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! One tiny style nit:

::: js/src/methodjit/FastArithmetic.cpp
@@ +246,5 @@
>       * This is temporary while ops are still being implemented.
>       */
>      if ((op == JSOP_MOD) ||
>          (lhs->isTypeKnown() && (lhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET)) ||
> +        (rhs->isTypeKnown() && (rhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET))) {

Nit: Curly braces on multi-line ifs should get their own line.
Attachment #549003 - Flags: review?(adrake) → review+
To save time, I corrected that nit and pushed to try. I'll push to inbound as soon as the tryserver looks good.

Comment 6

6 years ago
Try run for 3b2166d7015e is complete.
Detailed breakdown of the results available here:
    http://tbpl.mozilla.org/?tree=Try&rev=3b2166d7015e
Results:
    success: 12
Total buildrequests: 12
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/drakedevel@gmail.com-3b2166d7015e
http://hg.mozilla.org/integration/mozilla-inbound/rev/6591070ab016
Whiteboard: [inbound]
Keywords: mobile, perf
http://hg.mozilla.org/mozilla-central/rev/6591070ab016
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Did this have any effect on benchmarks?
(In reply to comment #9)
> Did this have any effect on benchmarks?

Yes! We saw a 3.36% improvement in Talos Sunspider NoChrome on N900 when this landed.
Duplicate of this bug: 586267
Assignee: general → mrosenberg
You need to log in before you can comment on or make changes to this bug.