Closed Bug 673314 Opened 13 years ago Closed 13 years ago

Multiply instructions are not generated on Arm

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: mjrosenb, Assigned: mjrosenb)

References

(Blocks 1 open bug)

Details

(Keywords: mobile, perf, Whiteboard: [inbound])

Attachments

(2 files, 1 obsolete file)

While looking into why this test case is 60 times slower on arm than on x86, I discovered that the jitted code does not contain any floating point multiply instructions (I suspect it would not contain integer multiply either). If you want to run this test case, you probably want to cut down on the number of iterations that it performs, since currently, depending on the flags that are passed to the shell, it takes between 6 and 30 seconds on x86. There is a comment in the relevant function (mjit::Compiler::jsop_binary), which states: /* ARM cannot detect integer overflow with multiplication. */ We should a) implement this for fp only multiplies, and b) attempt to handle this for integers as well. my proposal is (approximately) for c <- a * b we do SMULL tmp,c, a, b EORS tmp, tmp, c ASR 31 For those people that don't know ARM assembly, SMULL is a 32*32 -> 64 bit multiply. In this case, it stores the lower 32 bits into the register "c", and the upper 32 bits into the register "tmp". For multiplies that don't overflow, tmp should just be a sign extension of c. The second instruction shifts c right by 31, so it will be the sign bit of c repeated 32 times, exactly what will be in tmp if the multiply did not overflow. The exclusive or will then leave tmp 0 if there was no overflow, and non-zero in all other cases. EORS can also be replaced with TEQ, which sets the condition codes as if an exclusive or had been performed, but not actually storing the result anywhere.
Could be a duplicate of bug 586267, though that one doesn't cover floating-point.
It looks like all of the code to generate the SMULL was already there, but was just not turned on. We pass the basic tests with this patch applied, and the original test drops from 60x slower to 45x slower. Not ideal, but it is a start.
No longer blocks: 673000
Since I don't have commit access yet, if you could push this to the try-server, and then to a repo, that would be incredibly useful
Attachment #548313 - Attachment is obsolete: true
Attachment #549003 - Flags: review?(adrake)
Comment on attachment 549003 [details] [diff] [review] Slightly prettier patch (does not have #if foo || 1) Review of attachment 549003 [details] [diff] [review]: ----------------------------------------------------------------- Looks good! One tiny style nit: ::: js/src/methodjit/FastArithmetic.cpp @@ +246,5 @@ > * This is temporary while ops are still being implemented. > */ > if ((op == JSOP_MOD) || > (lhs->isTypeKnown() && (lhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET)) || > + (rhs->isTypeKnown() && (rhs->getKnownType() > JSVAL_UPPER_INCL_TYPE_OF_NUMBER_SET))) { Nit: Curly braces on multi-line ifs should get their own line.
Attachment #549003 - Flags: review?(adrake) → review+
To save time, I corrected that nit and pushed to try. I'll push to inbound as soon as the tryserver looks good.
Try run for 3b2166d7015e is complete. Detailed breakdown of the results available here: http://tbpl.mozilla.org/?tree=Try&rev=3b2166d7015e Results: success: 12 Total buildrequests: 12 Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/drakedevel@gmail.com-3b2166d7015e
Keywords: mobile, perf
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Did this have any effect on benchmarks?
(In reply to comment #9) > Did this have any effect on benchmarks? Yes! We saw a 3.36% improvement in Talos Sunspider NoChrome on N900 when this landed.
Assignee: general → mrosenberg
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: