Investigate 64-bit optimizations

NEW
Unassigned

Status

()

Core
JavaScript Engine
--
enhancement
5 years ago
22 days ago

People

(Reporter: Yoric, Unassigned, NeedInfo)

Tracking

(Depends on: 2 bugs, Blocks: 1 bug, {perf})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [js:p2])

According to Are We Fast Yet, it seems that our performance goes down when executed in 64-bit mode. Let's make it go up, rather :)
Source: as of today, we are slower in 64-bit mode on v8bench and kraken. Slightly faster on spidermonkey bench, but outperformed by other engines.
Blocks: 705294
Random idea: since we have 64 bits and only really require 32 bits for arithmetic, we could use the remaining 32 bits for storing the tag. This will surely require a custom allocator.
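To make the idea concrete, here is a minimal sketch of the packing, assuming a 64-bit word whose low 32 bits hold an int32 payload and whose high 32 bits hold a type tag. The tag values and the helper names (`box`, `tag_of`, `int32_of`) are hypothetical and chosen only for illustration; this is not a description of the engine's actual value representation.

```python
# Hypothetical tag values, chosen only for illustration.
TAG_INT32 = 0x1
TAG_BOOLEAN = 0x2

MASK32 = 0xFFFFFFFF


def box(tag, payload):
    """Pack a 32-bit tag (high half) and a 32-bit payload (low half) into one 64-bit word."""
    return ((tag & MASK32) << 32) | (payload & MASK32)


def tag_of(word):
    """Extract the tag from the high 32 bits."""
    return (word >> 32) & MASK32


def int32_of(word):
    """Recover the payload from the low 32 bits as a signed 32-bit integer."""
    raw = word & MASK32
    return raw - (1 << 32) if raw & 0x80000000 else raw


word = box(TAG_INT32, -7)
print(hex(word))                     # 0x1fffffff9
print(tag_of(word), int32_of(word))  # 1 -7
```

With this layout a type check is a shift and compare, and the payload never needs unmasking for 32-bit arithmetic. Pointers are the hard part, which is presumably where the custom allocator would come in.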
Depends on: 786294
Depends on: 787292
Whiteboard: [js:p2]
Blocks: 702968
64-bit looks faster on AWFY now:

Mac OS X 64-bit (Mac Pro):
kraken: 1540
sunspider: 180
octane: 15565

Mac OS X 32-bit (Mac Pro):
kraken: 1654
sunspider: 189
octane: 17033
Till pointed out that for octane, higher scores are better, so 64-bit is still slower on octane.
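For reference, a small sketch of the comparison with the metric direction taken into account, using the Mac Pro numbers above. Kraken and SunSpider report times (lower is better) while Octane reports a score (higher is better); the dictionary and function names are hypothetical.

```python
# Direction of each benchmark: True means a higher number is better.
HIGHER_IS_BETTER = {"kraken": False, "sunspider": False, "octane": True}

# Mac Pro numbers quoted in the comment above.
RESULTS_64 = {"kraken": 1540, "sunspider": 180, "octane": 15565}
RESULTS_32 = {"kraken": 1654, "sunspider": 189, "octane": 17033}


def gain_64_over_32(name):
    """Percentage by which 64-bit beats 32-bit; negative means 64-bit is slower."""
    v64, v32 = RESULTS_64[name], RESULTS_32[name]
    if HIGHER_IS_BETTER[name]:
        return (v64 - v32) / v32 * 100.0
    return (v32 - v64) / v32 * 100.0


for name in RESULTS_64:
    print(f"{name}: {gain_64_over_32(name):+.1f}%")
# kraken: +6.9%
# sunspider: +4.8%
# octane: -8.6%
```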
It would be interesting to know why JSC performance improves so much with 64bit while ours doesn't change much (and on Octane is even worse). For example with Octane-Box2D we worsen by 5000 points while JSC improves by 5000 points (but before a regression around Jun 6 our performance was almost the same between 32bit and 64bit).

There are some tests where we lose to JSC but would win if our performance merely stayed the same between 32bit and 64bit.
(Assignee)

Updated

3 years ago
Assignee: general → nobody

Comment 6

3 years ago
Windows 64-bit seems to have been prioritized. It would be important (I guess) to show improvements when changing from 32-bit to 64-bit, if possible. Deltablue, Earley-Boyer and Splay are clearly worse on 64-bit, at least on Mac on AWFY. Zlib, on the other hand, is a great example of improvement.
Hannes: Are the two computers comparable?  Is that the same hardware where one is running with x64 binaries and the other is running x86 binaries?
Flags: needinfo?(hv1989)
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> Hannes: Are the two computers comparable?  Is that the same hardware where
> one is running with x64 binaries and the other is running x86 binaries?

Yes. They run on the same computer on a 64bit system. One is compiled normally, the other runs cross-compiled x86 binaries.

(In reply to Guilherme Lima from comment #6)
> Windows 64-bit seems to have been prioritized. It would be important (I
> guess) to show improvements when changing from 32-bit to 64-bit, if
> possible. Deltablue, Earley-Boyer and Splay are clearly worse on 64-bit, a
> least on Mac on AWFY. Zlib, on the other hand, is a great example of
> improvement.

One of the things that is quite severely worse on 64bit is the saving of registers (during calls in IM or when using PushRegsInMask). There are way more registers we need to save. I vaguely remember it being about 4x more when I looked at it a long time ago. That could explain e.g. Earley-Boyer.
Flags: needinfo?(hv1989)
Marty, does your work to save only needed registers on ARM apply to all registers, or only the floating point ones? Is this something that could be used to improve x64 (when register pressure isn't high)?
Flags: needinfo?(mrosenberg)
It should be applicable on all registers, but I don't remember which registers it presently affects.
Flags: needinfo?(mrosenberg)
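A back-of-the-envelope model of the register-saving cost discussed above, assuming the cost is proportional to the number of bytes pushed and looking only at the general-purpose registers (8 registers of 4 bytes on x86, 16 registers of 8 bytes on x64). Real code also deals with floating-point registers and does not save every register, so this is only a sketch; `save_cost_bytes` and `live_mask` are hypothetical and do not describe how PushRegsInMask is implemented.

```python
def save_cost_bytes(reg_mask, word_size):
    """Bytes pushed when saving every register whose bit is set in reg_mask."""
    return bin(reg_mask).count("1") * word_size


# Full general-purpose sets: 8 registers of 4 bytes on x86, 16 of 8 bytes on x64.
X86_ALL = (1 << 8) - 1
X64_ALL = (1 << 16) - 1

full_x86 = save_cost_bytes(X86_ALL, 4)    # 32 bytes
full_x64 = save_cost_bytes(X64_ALL, 8)    # 128 bytes

# Hypothetical liveness result: only three registers hold live values at the call.
live_mask = 0b0000_0000_0100_1001
live_x64 = save_cost_bytes(live_mask, 8)  # 24 bytes

print(full_x86, full_x64, live_x64)       # 32 128 24
```

Under these assumptions, saving the full set costs four times as many bytes on x64 as on x86, which is at least consistent with the rough 4x figure, while saving only the live registers would bring the x64 cost below the full x86 cost in this example.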

Comment 11

3 years ago
Bug 802830 also discusses the performance difference between x64 and x86.

Comment 12

2 years ago
Analyzing AWFY, we can see a few benchmarks where Firefox 64-bit is behind the 32-bit one:

Benchmark - 32-bit / 64-bit

Kraken:
Astar - 57ms / 67ms (17% worse)

Octane:
Box2D - 55600 / 53000 (5% worse)
EarleyBoyer - 43000 / 36000 (16% worse)
MandreelLatency - 37000 / 32000 (14% worse)
Raytrace - 133000 / 123000 (7% worse)
Richards - 34000 / 30500 (10% worse)
Splay - 23500 / 20500 (13% worse)
SplayLatency - 25500 / 22600 (11% worse)
Typescript - 33300 / 30000 (10% worse)

Misc:
basic-array-forof - 43ms / 61ms (42% worse)
basic-closures - 63ms / 198ms (214% worse) - bug 1147430
bugs-1131099-lodash1 - 60ms / 67ms (12% worse)
bugs-652377-jslint-on-jslint - 340ms / 378ms (11% worse)
typedobj-simple-struct-standard - 27ms / 39ms (44% worse)
typedobj-simple-struct-typedobj - 51ms / 62ms (21% worse)
typedobj-splay-standard - 580ms / 660ms (14% worse)
typedobj-splay-typedobj - 460ms / 540ms (17% worse)
typedobj-write-struct-field-standard - 35ms / 40ms (14% worse)

dart:
Richards - 1.35ms / 1.67ms (24% worse)

asmjs-ubench:
fasta - 8500ms / 9450ms (11% worse) - no asmjs
fbirds-polyfill - 260ms / 300ms (15% worse) - asmjs and no asmjs
mandelbrot-native - 290ms / 310ms (7% worse) - no asmjs
mandelbrot-polyfill - 366ms / 430ms (17% worse) - asmjs and no asmjs
memops - 5000ms / 6500ms (30% worse) - no asmjs

asmjs-apps:
box2d-loadtime - 220ms / 290ms (32% worse) - asmjs
luabinarytrees-loadtime - 160ms / 260ms (62% worse) - asmjs
luabinarytrees-loadtime - 270ms / 295ms (9% worse) - no asmjs
luabinarytrees-throughput - 1780ms / 1860ms (4% worse) - no asmjs
zlib-loadtime - 240ms / 300ms (25% worse) - asmjs

I didn't check Sunspider, out of laziness and because it may not matter that much and/or may be the same problem as the benchmarks above.

Firefox 40 will be the first release in which Windows users are officially offered a 64-bit version of Firefox.
See Also: → bug 1308506
Blocks: 558448
Depends on: 802830

Comment 13

5 months ago
Someone please add the "PERF" keyword. Thank you.

Updated

5 months ago
Keywords: perf

Comment 14

4 months ago
Didn't Michael Moy use to do a lot of work on 64-bit optimizations?
Flags: needinfo?(moy)
Depends on: 1308506