Last Comment Bug 673314 - Multiply instructions are not generated on Arm
: Multiply instructions are not generated on Arm
Status: RESOLVED FIXED
[inbound]
: mobile, perf
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: ARM Linux
: -- normal (vote)
: mozilla8
Assigned To: Marty Rosenberg [:mjrosenb]
:
Mentors:
: 586267 (view as bug list)
Depends on:
Blocks: ARMJSPerf
  Show dependency treegraph
 
Reported: 2011-07-21 17:40 PDT by Marty Rosenberg [:mjrosenb]
Modified: 2011-08-02 11:06 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
simple test that runs 60 times slower on ARM than on x86 (544 bytes, application/javascript)
2011-07-21 17:40 PDT, Marty Rosenberg [:mjrosenb]
no flags Details
remove preprocessor directives that disable mul generation on arm (2.16 KB, patch)
2011-07-25 15:48 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
Slightly prettier patch (does not have #if foo || 1) (2.79 KB, patch)
2011-07-27 18:55 PDT, Marty Rosenberg [:mjrosenb]
adrake: review+
Details | Diff | Splinter Review

Description Marty Rosenberg [:mjrosenb] 2011-07-21 17:40:44 PDT
Created attachment 547577 [details]
simple test that runs 60 times slower on ARM than on x86

While looking into why this test case is 60 times slower on arm than on x86, I discovered that the jitted code does not contain any floating point multiply instructions (I suspect it would not contain integer multiply either).
If you want to run this test case, you probably want to cut down on the number of iterations that it performs, since currently, depending on the flags that are passed to the shell, it takes between 6 and 30 seconds on x86.
There is a comment in the relevant function (mjit::Compiler::jsop_binary), which states:

 /* ARM cannot detect integer overflow with multiplication. */

We should a) implement this for fp only multiplies, and b) attempt to handle this for integers as well.
my proposal is (approximately) for 
c <- a * b
we do
SMULL tmp,c, a, b
EORS tmp, tmp, c ASR 31

For those people that don't know ARM assembly, SMULL is a 32*32 -> 64 bit multiply.  In this case, it stores the lower 32 bits into the register "c", and the upper 32 bits into the register "tmp".  For multiplies that don't overflow, tmp should just be a sign extension of c.  The second instruction shifts c right by 31, so it will be the sign bit of c repeated 32 times, exactly what will be in tmp if the multiply did not overflow.  The exclusive or will then leave tmp 0 if there was no overflow, and non-zero in all other cases.
EORS can also be replaced with TEQ, which sets the condition codes as if an exclusive or had been performed, but not actually storing the result anywhere.
Comment 1 Jacob Bramley [:jbramley] 2011-07-22 00:18:02 PDT
Could be a duplicate of bug 586267, though that one doesn't cover floating-point.
Comment 2 Marty Rosenberg [:mjrosenb] 2011-07-25 15:48:02 PDT
Created attachment 548313 [details] [diff] [review]
remove preprocessor directives that disable mul generation on arm

It looks like all of the code to generate the SMULL was already there, but was just not turned on.  We pass the basic tests with this patch applied, and the original test drops from 60x slower to 45x slower.  Not ideal, but it is a start.
Comment 3 Marty Rosenberg [:mjrosenb] 2011-07-27 18:55:27 PDT
Created attachment 549003 [details] [diff] [review]
Slightly prettier patch (does not have #if foo || 1)

Since I don't have commit access yet, if you could push this to the try-server, and then to a repo, that would be incredibly useful
Comment 4 Andrew Drake [:adrake] 2011-07-28 09:10:51 PDT
Comment on attachment 549003 [details] [diff] [review]
Slightly prettier patch (does not have #if foo || 1)

Review of attachment 549003 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! One tiny style nit:

::: js/src/methodjit/FastArithmetic.cpp
@@ +246,5 @@
>       * This is temporary while ops are still being implemented.
>       */
>      if ((op == JSOP_MOD) ||
>          (lhs->isTypeKnown() && (lhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET)) ||
> +        (rhs->isTypeKnown() && (rhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET))) {

Nit: Curly braces on multi-line ifs should get their own line.
Comment 5 Andrew Drake [:adrake] 2011-07-28 09:35:43 PDT
To save time, I corrected that nit and pushed to try. I'll push to inbound as soon as the tryserver looks good.
Comment 6 Mozilla RelEng Bot 2011-07-28 12:00:32 PDT
Try run for 3b2166d7015e is complete.
Detailed breakdown of the results available here:
    http://tbpl.mozilla.org/?tree=Try&rev=3b2166d7015e
Results:
    success: 12
Total buildrequests: 12
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/drakedevel@gmail.com-3b2166d7015e
Comment 8 Marco Bonardo [::mak] (Away 6-20 Aug) 2011-07-29 03:08:51 PDT
http://hg.mozilla.org/mozilla-central/rev/6591070ab016
Comment 9 David Mandelin [:dmandelin] 2011-07-29 10:34:05 PDT
Did this have any effect on benchmarks?
Comment 10 Matt Brubeck (:mbrubeck) 2011-07-29 10:41:51 PDT
(In reply to comment #9)
> Did this have any effect on benchmarks?

Yes! We saw a 3.36% improvement in Talos Sunspider NoChrome on N900 when this landed.
Comment 11 Jacob Bramley [:jbramley] 2011-08-01 02:20:00 PDT
*** Bug 586267 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.