Closed Bug 783578 Opened 9 years ago Closed 3 months ago
Investigate 64-bit optimizations
According to Are We Fast Yet, it seems that our performance goes down when executed in 64-bit mode. Let's make it go up, rather :)
Source: as of today, we are slower in 64-bit on v8bench and kraken. We are slightly faster on the spidermonkey bench, but still outperformed by other engines.
Random idea: since we have 64 bits and only really require 32 bits for arithmetic, we could use the remaining 32 bits for storing the tag. This will surely require a custom allocator.
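To make the idea above concrete, here is a minimal sketch of packing a 32-bit integer payload and a 32-bit tag into one 64-bit word. It is only an illustration, not how SpiderMonkey actually represents values: the layout, the tag value, and the names box_int32, tag_of and unbox_int32 are all assumptions. It also says nothing about pointers, which is where the custom allocator would come in.

```python
# Hypothetical layout: high 32 bits hold a type tag, low 32 bits hold the payload.
TAG_SHIFT = 32
PAYLOAD_MASK = 0xFFFFFFFF
TAG_INT32 = 0x1  # made-up tag value, for illustration only


def box_int32(value: int) -> int:
    """Pack a signed 32-bit integer and its tag into one 64-bit word."""
    if not -2**31 <= value < 2**31:
        raise ValueError("payload does not fit in 32 bits")
    return (TAG_INT32 << TAG_SHIFT) | (value & PAYLOAD_MASK)


def tag_of(word: int) -> int:
    """Read the tag back from the high 32 bits."""
    return (word >> TAG_SHIFT) & PAYLOAD_MASK


def unbox_int32(word: int) -> int:
    """Recover the signed 32-bit payload from the low 32 bits."""
    payload = word & PAYLOAD_MASK
    # Sign-extend from bit 31 so negative values round-trip.
    return payload - (1 << 32) if payload & 0x80000000 else payload
```

With this layout, checking the type is a shift and a compare, and the integer itself can be used with a plain 32-bit register access on the low half.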
64-bit looks faster on AWFY now:

Mac OS X 64-bit (Mac Pro):
kraken: 1540
sunspider: 180
octane: 15565

Mac OS X 32-bit (Mac Pro):
kraken: 1654
sunspider: 189
octane: 17033
Till pointed out that for octane, higher scores are better, so 64-bit is still slower on octane.
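To keep the direction of each benchmark straight, the comparison can be sketched as below, using the Mac Pro numbers quoted above. The helper name percent_change is made up for this example, and it assumes kraken and sunspider report times (lower is better) while octane reports a score (higher is better).

```python
def percent_change(x86: float, x64: float, higher_is_better: bool) -> float:
    """Positive result: 64-bit is better by that percentage; negative: worse."""
    delta = (x64 - x86) / x86 * 100.0
    return delta if higher_is_better else -delta


# AWFY numbers from the comment above (32-bit first, then 64-bit).
kraken = percent_change(1654, 1540, higher_is_better=False)
sunspider = percent_change(189, 180, higher_is_better=False)
octane = percent_change(17033, 15565, higher_is_better=True)

for name, change in (("kraken", kraken), ("sunspider", sunspider), ("octane", octane)):
    print(f"{name}: {change:+.1f}%")
```

Under those assumptions 64-bit comes out ahead on kraken and sunspider and behind on octane.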
It would be interesting to know why JSC performance improves so much with 64-bit while ours doesn't change much (and on Octane is even worse). For example, on Octane-Box2D we get worse by 5000 points while JSC improves by 5000 points (but before a regression around Jun 6 our performance was almost the same between 32-bit and 64-bit). There are some tests where we lose to JSC but would win if our performance merely stayed the same between 32-bit and 64-bit.
Windows 64-bit seems to have been prioritized. It would be important (I guess) to show improvements when changing from 32-bit to 64-bit, if possible. Deltablue, Earley-Boyer and Splay are clearly worse on 64-bit, at least on Mac on AWFY. Zlib, on the other hand, is a great example of improvement.
Hannes: Are the two computers comparable? Is that the same hardware where one is running with x64 binaries and the other is running x86 binaries?
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> Hannes: Are the two computers comparable? Is that the same hardware where
> one is running with x64 binaries and the other is running x86 binaries?

Yes. Both run on the same computer, on a 64-bit system. One is compiled normally, the other runs cross-compiled x86 binaries.

(In reply to Guilherme Lima from comment #6)
> Windows 64-bit seems to have been prioritized. It would be important (I
> guess) to show improvements when changing from 32-bit to 64-bit, if
> possible. Deltablue, Earley-Boyer and Splay are clearly worse on 64-bit, at
> least on Mac on AWFY. Zlib, on the other hand, is a great example of
> improvement.

One of the things that is quite severely worse on 64-bit is the saving of registers (during calls in IM or when using PushRegsInMask). There are far more registers we need to save. I vaguely remember it being 4x more when I looked at it a long time ago. That could explain e.g. Earley-Boyer.
Marty, does your work to save only needed registers on ARM apply to all registers, or only the floating point ones? Is this something that could be used to improve x64 (when register pressure isn't high)?
It should be applicable on all registers, but I don't remember which registers it presently affects.
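As a rough illustration of "save only needed registers", the set to spill around a call can be modelled as the intersection of the live registers and the registers the callee may clobber. This is a toy model, not the actual Ion or PushRegsInMask code: the register numbering, the masks, and the helper names are invented for the example.

```python
# Hypothetical numbering: bit i of a mask stands for register i.
def registers_to_save(live_mask: int, volatile_mask: int) -> int:
    """Registers that must be spilled around a call: live AND caller-saved."""
    return live_mask & volatile_mask


def count_registers(mask: int) -> int:
    """Number of registers present in a mask."""
    return bin(mask).count("1")


# Illustrative numbers only: 16 registers, the low 9 treated as volatile.
ALL_REGS = (1 << 16) - 1
VOLATILE = (1 << 9) - 1
live = 0b0000_0001_0000_0101  # registers 0, 2 and 8 are live across the call

save_everything = count_registers(ALL_REGS & VOLATILE)
save_needed = count_registers(registers_to_save(live, VOLATILE))
print(save_everything, save_needed)
```

When register pressure is low, the live set is small and the number of pushes and pops drops accordingly, regardless of how many registers the architecture exposes.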
Another bug talking about the difference of performance between x64 and x86 is bug 802830.
Analyzing AWFY, we can see a few benchmarks where Firefox 64-bit is behind the 32-bit one:

Benchmark - 32-bit / 64-bit

Kraken:
Astar - 57ms / 67ms (17% worse)

Octane:
Box2D - 55600 / 53000 (5% worse)
EarleyBoyer - 43000 / 36000 (16% worse)
MandreelLatency - 37000 / 32000 (14% worse)
Raytrace - 133000 / 123000 (7% worse)
Richards - 34000 / 30500 (10% worse)
Splay - 23500 / 20500 (13% worse)
SplayLatency - 25500 / 22600 (11% worse)
Typescript - 33300 / 30000 (10% worse)

Misc:
basic-array-forof - 43ms / 61ms (42% worse)
basic-closures - 63ms / 198ms (214% worse) - bug 1147430
bugs-1131099-lodash1 - 60ms / 67ms (12% worse)
bugs-652377-jslint-on-jslint - 340ms / 378ms (11% worse)
typedobj-simple-struct-standard - 27ms / 39ms (44% worse)
typedobj-simple-struct-typedobj - 51ms / 62ms (21% worse)
typedobj-splay-standard - 580ms / 660ms (14% worse)
typedobj-splay-typedobj - 460ms / 540ms (17% worse)
typedobj-write-struct-field-standard - 35ms / 40ms (14% worse)

dart:
Richards - 1.35ms / 1.67ms (24% worse)

asmjs-ubench:
fasta - 8500ms / 9450ms (11% worse) - no asmjs
fbirds-polyfill - 260ms / 300ms (15% worse) - asmjs and no asmjs
mandelbrot-native - 290ms / 310ms (7% worse) - no asmjs
mandelbrot-polyfill - 366ms / 430ms (17% worse) - asmjs and no asmjs
memops - 5000ms / 6500ms (30% worse) - no asmjs

asmjs-apps:
box2d-loadtime - 220ms / 290ms (32% worse) - asmjs
luabinarytrees-loadtime - 160ms / 260ms (62% worse) - asmjs
luabinarytrees-loadtime - 270ms / 295ms (9% worse) - no asmjs
luabinarytrees-throughput - 1780ms / 1860ms (4% worse) - no asmjs
zlib-loadtime - 240ms / 300ms (25% worse) - asmjs

I didn't check Sunspider, out of laziness: maybe it doesn't matter that much and/or it has the same problem as the benchmarks above.

Firefox 40 will be the first release in which Windows users are officially offered a 64-bit version of Firefox.
Someone please add the "PERF" keyword. Thank you.
Didn't Michael Moy used to do a lot of work on 64 bit optimizations?
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → INCOMPLETE