Open
Bug 624299
Opened 14 years ago
Updated 2 years ago
2x slower than v8 on recursion+scope chain testcase
Categories
(Core :: JavaScript Engine, defect)
NEW
People
(Reporter: bzbarsky, Unassigned)
Details
(Whiteboard: [js:t] [js:perf])
Attachments
(2 files, 2 obsolete files)
See bug 614834 comment 27. The testcase in question is in the url field.
Comment 1•11 years ago
The ratio improved, but we're still slowest here (all numbers on my rMBP @ 2.7GHz):

SpiderMonkey: 0.58 0.56 0.565 0.56 0.5575
JSC: 0.34 0.33 0.32 0.3175 0.3175
d8: 0.3 0.29 0.295 0.3325 0.29625
OS: Mac OS X → All
Hardware: x86 → All
Summary: 4x slower than v8 on recursion+scope chain testcase → 2x slower than v8 on recursion+scope chain testcase
Whiteboard: [js:t] [js:perf]
Updated•10 years ago
Assignee: general → nobody
Comment 2•10 years ago
Firefox 33 is faster than Chrome 39 for me. Firefox goes from 0.60 to 0.45 and Chrome goes from 0.70 to 0.55.
Comment 3•10 years ago
For me (same setup as in comment 1), we're still slowest (and note the progress JSC has made):

SpiderMonkey: 0.46 0.44 0.45 0.4525 0.4575
JSC: 0.22 0.2 0.21 0.2175 0.215
d8: 0.26 0.31 0.28 0.2775 0.2625

Current Nightly and Canary also reflect this. Safari is about 50% slower than JSC, but still faster than us.
Comment 4•10 years ago
This is a lot faster on 32-bit. On OS X I get 0.23-0.26 ms with an x86 build and 0.39-0.42 ms with an x64 build. It could be our boxing format, or we could be spilling more registers somewhere; we should investigate.
Comment 5•10 years ago
Comment 6•10 years ago
Comment 7•10 years ago
Attachment #8527735 -
Attachment is obsolete: true
Comment 8•10 years ago
Attachment #8527736 -
Attachment is obsolete: true
Comment 9•10 years ago
Some thoughts in no particular order:

1) The overall time of the testcase on 32-bit is about 0.25 * (50 + 100 + 200 + 400 + 800) = 387.5ms. The x86-64 times are about 2x that, in the 800-900ms range. So we need to account for about 400-500ms of runtime.

2) The testcase executes about 300e6 Unbox:Int32 instructions. On x86, there's nothing to do for these if we know we have an int. On x86-64, these correspond to a single movl. What this means on the hardware, I don't know, but if we assume that takes one cycle, that's 300e6 cycles; the CPU is at 2.6GHz, so about 115ms.

But worse yet, in some of these cases we don't know we have an int. In that case, on 32-bit we get things like:

  [MoveGroup]
  movl %edx, %eax
  [Unbox:Int32]
  cmpl $0xffffff81, %ecx
  jne ((366))

And on 64-bit we get:

  [Unbox:Int32]
  movq %rcx, %r11
  shrq $47, %r11
  cmpl $0x1fff1, %r11d
  jne ((383))
  movl %ecx, %eax

So that's an extra move and shift, though on 32-bit presumably we paid part of that cost when we initially placed the high 32 bits of the Value in ecx.

3) On x86-64 there's an extra MoveGroup before the first CallKnown. But the actual call is cheaper, and in any case there aren't _that_ many CallKnowns here (about 75e6).

So my money is that the main culprit here is the Unbox:Int32 bits.
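To make the comparison in comment 9 easier to follow, here is a rough Python model of the two fallible Unbox:Int32 sequences and of the back-of-the-envelope timing estimates. It is only a sketch: the shift amount and tag constants are copied from the disassembly quoted in that comment, while every function and constant name (`unbox_int32_x64`, `box_int32_x64`, `Bailout`, and so on) is hypothetical and not taken from SpiderMonkey's source.

```python
# Illustrative model only; names are made up for this sketch.
TAG_SHIFT = 47              # from "shrq $47, %r11"
INT32_TAG_X64 = 0x1FFF1     # from "cmpl $0x1fff1, %r11d"
INT32_TAG_X86 = 0xFFFFFF81  # from "cmpl $0xffffff81, %ecx"


class Bailout(Exception):
    """Stands in for the jne to the bailout path."""


def to_signed32(bits):
    """Interpret the low 32 bits as a two's-complement int32."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def box_int32_x64(n):
    """Hypothetical helper: tag in the high bits, payload in the low 32."""
    return (INT32_TAG_X64 << TAG_SHIFT) | (n & 0xFFFFFFFF)


def unbox_int32_x64(value):
    """64-bit path: copy, shift, compare, then take the low 32 bits."""
    tag = value >> TAG_SHIFT        # movq + shrq
    if tag != INT32_TAG_X64:        # cmpl + jne
        raise Bailout("not an int32")
    return to_signed32(value)       # movl %ecx, %eax


def unbox_int32_x86(tag_word, payload_word):
    """32-bit path: the tag already sits in its own register."""
    if tag_word != INT32_TAG_X86:   # cmpl + jne
        raise Bailout("not an int32")
    return to_signed32(payload_word)


def total_ms(per_iteration_ms, iteration_counts):
    """Point 1: per-iteration time multiplied by the summed run sizes."""
    return per_iteration_ms * sum(iteration_counts)


def cycles_to_ms(instructions, clock_hz, cycles_per_instruction=1.0):
    """Point 2: time for N instructions at an assumed cycles-per-instruction."""
    return instructions * cycles_per_instruction / clock_hz * 1000.0
```

For example, `total_ms(0.25, [50, 100, 200, 400, 800])` reproduces the 32-bit estimate from point 1, and `cycles_to_ms(300e6, 2.6e9)` reproduces the one-cycle-per-movl estimate from point 2. The model makes the asymmetry visible: the 64-bit check has to derive the tag from the full word before it can compare, while the 32-bit check compares a register that already holds the tag.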
Comment 10•10 years ago
(In reply to Please do not ask for reviews for a bit [:bz] from comment #9)
> So my money is that the main culprit here is the Unbox:Int32 bits.

Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers. Will post soon, after testing what it does on some other benchmarks.
Comment 11•10 years ago
(In reply to Jan de Mooij [:jandem] from comment #10)
> Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers.
> Will post soon, after testing what it does on some other benchmarks.

Bug 1104199. With the patch there:

x64 before: 0.44, 0.38, 0.425, 0.4, 0.4075
x64 after: 0.28, 0.27, 0.245, 0.26, 0.25125
x86: 0.24, 0.23, 0.25, 0.2475, 0.23625
d8 x64: 0.26, 0.23, 0.235, 0.2425, 0.22625
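One rough way to summarize the numbers in comment 11 is to average each row and compare the means. The sketch below does that in Python. It assumes the five figures in each row are comparable per-iteration timings (which comment 9's arithmetic suggests, but the thread does not state outright), and the names `timings` and `speedup` are made up for illustration.

```python
from statistics import mean

# Timings copied from comment 11; lower is better.
timings = {
    "x64 before": [0.44, 0.38, 0.425, 0.4, 0.4075],
    "x64 after": [0.28, 0.27, 0.245, 0.26, 0.25125],
    "x86": [0.24, 0.23, 0.25, 0.2475, 0.23625],
    "d8 x64": [0.26, 0.23, 0.235, 0.2425, 0.22625],
}


def speedup(slower, faster):
    """Ratio of mean timings; values above 1 mean `faster` is quicker."""
    return mean(timings[slower]) / mean(timings[faster])


if __name__ == "__main__":
    for name, samples in timings.items():
        print(f"{name}: mean {mean(samples):.4f}")
    print(f"patch speedup on x64: {speedup('x64 before', 'x64 after'):.2f}x")
    print(f"x64 after vs d8 x64: {speedup('x64 after', 'd8 x64'):.2f}x")
```

A mean over five runs hides variance, so this is only a quick sanity check on the claim that the patch brings x64 close to the 32-bit numbers, not a substitute for a proper benchmark run.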
Updated•2 years ago
Severity: normal → S3