Closed Bug 551837 Opened 14 years ago Closed 14 years ago

Improve js_DoubleToECMAInt32 performance (on ARM).

Categories

(Core :: JavaScript Engine, defect)

ARM
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jbramley, Assigned: jbramley)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files)

Just by adding "__arm__" to the i386 #ifdef in js_DoubleToECMAInt32 seems to get us a 5% performance improvement on Cortex-A8. The i386 code in the block looks fairly generic so I don't see any harm in doing this (though it does require IEEE format doubles). It may also benefit x64, but I have no idea about how x64 handles floating-point and there may be some reason why it won't work.

5% seems surprisingly large (and may just indicate an anomalous result), so I want to investigate further before filing a patch, but there's certainly something we can do here. Also, if the function really does take as much profile time as the improvement would suggest, it may be worth hand-writing some assembly to do this (unless the compiler output is already good).

Also, js_DoubleToECMAUint32 is a potential target, so I'll have a look at that one too.
I should add that the 5% improvement is on SunSpider on the JS shell, using Trace Monkey.
No such luck; I get some failing trace-tests with this, so there must be something non-portable in the i386 code. I will investigate and provide something similar for ARM.
* I don't like using assembly in this way, particularly outside the back-ends, but I couldn't get close to the performance of this routine using pure C. I do have a C version which still provides an excellent improvement over the slow path, and it should also work for any little-endian IEEE-754 system.

 * I've added many comments to my assembly to try to explain what's going on, but this does make the code look somewhat long (even though the output is tiny). Suggestions are welcome! The comments are important; I don't want this to be a write-only routine, and this kind of assembly algorithm can get unreadable pretty quickly.

 * I get a 5.5% performance improvement on A8 with this patch. That's rather surprising, considering that it's only one function, but I won't complain! (My pure C implementation is 3.9% faster than the slow-path on A8.)

 * I tested the routine in a standalone harness to ensure that all the code-paths were exercised. It appears functionally equivalent to the slow path.
Attachment #441050 - Flags: review?(vladimir)
Very nice. Do you think we should use your C version on Intel as well?
(In reply to comment #4)
> Very nice. Do you think we should use your C version on Intel as well?

It's unlikely to outperform the optimized i386 version, but it might be worth a try. I've attached it for reference, but note that it isn't at all polished in its current state, and it's in the form of a C function, not a patch. (I didn't generate a patch when I tested it in the JS engine, but the code is the same.)
Comment on attachment 441050 [details] [diff] [review]
Optimized DoubleToECMAInt32 for ARM.

Looks great to me -- should figure out what to do with the C version, since it sounds like that could be useful on non-x86 platforms where we don't have an optimized version.
Attachment #441050 - Flags: review?(vladimir) → review+
(In reply to comment #6)
> [...] should figure out what to do with the C version, since it
> sounds like that could be useful on non-x86 platforms where we don't have an
> optimized version.

Yep, indeed, though I'm concerned about introducing many code paths to maintain and test. We now have i386, ARM, generic, and potentially a IEEE-754 routine, and if we end up optimizing other routines we will quickly bloat the source files with #if blocks.

I think the best way to handle this, if it becomes a common optimization trick, would be to have something equivalent to the NativeARCH files so that fast-paths can be implemented for each platform. We can combine that with macros to allow tidy fall-backs to slow, but platform-agnostic paths. In any case, that's a subject for another bug, so I'll get this pushed as-is for now.
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/e8dfe85fca00
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: