Open Bug 624299 Opened 14 years ago Updated 2 months ago

2x slower than v8 on recursion+scope chain testcase

Categories

(Core :: JavaScript Engine, enhancement, P3)

People

(Reporter: bzbarsky, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [js:t] [js:perf])

Attachments

(2 files, 2 obsolete files)

See bug 614834 comment 27. The testcase in question is in the url field.
The ratio improved, but we're still slowest here (all numbers on my rMBP @ 2.7 GHz):

SpiderMonkey: 0.58 0.56 0.565 0.56 0.5575
JSC: 0.34 0.33 0.32 0.3175 0.3175
d8: 0.3 0.29 0.295 0.3325 0.29625
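
To put a single number on "still slowest", the five runs per engine can be averaged and compared. This is a minimal sketch: the run values are copied from the comment, while the helper name `slowdown` is hypothetical and the plain arithmetic mean is only one reasonable way to summarize five runs (lower numbers are better).

```python
from statistics import mean

# Five runs per engine, copied from the comment (rMBP @ 2.7 GHz); lower is better.
RUNS = {
    "SpiderMonkey": [0.58, 0.56, 0.565, 0.56, 0.5575],
    "JSC": [0.34, 0.33, 0.32, 0.3175, 0.3175],
    "d8": [0.3, 0.29, 0.295, 0.3325, 0.29625],
}


def slowdown(runs, engine, baseline):
    """Ratio of mean times; a result above 1.0 means `engine` is slower than `baseline`."""
    return mean(runs[engine]) / mean(runs[baseline])


if __name__ == "__main__":
    for other in ("JSC", "d8"):
        ratio = slowdown(RUNS, "SpiderMonkey", other)
        print(f"SpiderMonkey vs {other}: {ratio:.2f}x")
```

With these runs SpiderMonkey comes out about 1.74x slower than JSC and 1.86x slower than d8, in line with the "2x" in the bug summary.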
OS: Mac OS X → All
Hardware: x86 → All
Summary: 4x slower than v8 on recursion+scope chain testcase → 2x slower than v8 on recursion+scope chain testcase
Whiteboard: [js:t] [js:perf]
Assignee: general → nobody
Firefox 33 is faster than Chrome 39 for me. Firefox goes from 0.60 to 0.45, and Chrome goes from 0.70 to 0.55.
For me (same setup as in comment 1), we're still slowest (and note the progress JSC has made):

SpiderMonkey: 0.46 0.44 0.45 0.4525 0.4575
JSC: 0.22 0.2 0.21 0.2175 0.215
d8: 0.26 0.31 0.28 0.2775 0.2625

Current Nightly and Canary also reflect this. Safari is about 50% slower than JSC, but still faster than us.
This is a lot faster on 32-bit. On OS X I get 0.23-0.26 ms with an x86 build and 0.39-0.42 ms with an x64 build. It could be our boxing format, or us spilling more registers somewhere; we should investigate.
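
The boxing-format hypothesis can be made concrete with a small model of the two Value layouts. This is a sketch, assuming SpiderMonkey's NUNBOX32 layout on x86 (a Value split into a 32-bit type tag and a 32-bit payload, held in separate registers) and PUNBOX64 on x64 (one 64-bit word with the tag in bits 47 and up). The tag constants are the ones that appear in the Unbox:Int32 listings in comment 9; the function names are hypothetical and this is not engine code.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

INT32_TAG_X86 = 0xFFFF_FF81  # compared against %ecx in the 32-bit listing
TAG_SHIFT_X64 = 47           # shrq $47 in the 64-bit listing
INT32_TAG_X64 = 0x1_FFF1     # compared against %r11d in the 64-bit listing


def to_int32(bits):
    """Reinterpret the low 32 bits as a signed two's-complement integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x8000_0000 else bits


def box_int32_x64(n):
    """Pack an int32 into one 64-bit word: tag in the high bits, payload in the low 32."""
    return ((INT32_TAG_X64 << TAG_SHIFT_X64) | (n & MASK32)) & MASK64


def unbox_int32_x86(type_word, payload_word):
    """x86: tag and payload already live in separate registers.

    One compare on the tag register (cmpl + jne); the payload needs no decoding.
    Returns None where the JIT would take the jne to its bailout path.
    """
    if type_word != INT32_TAG_X86:
        return None
    return to_int32(payload_word)


def unbox_int32_x64(value):
    """x64: the tag must first be extracted from the combined 64-bit word.

    Mirrors movq (copy) + shrq (shift) + cmpl + jne, then movl for the payload.
    """
    tag = (value >> TAG_SHIFT_X64) & MASK32  # cmpl only looks at the low 32 bits
    if tag != INT32_TAG_X64:
        return None
    return to_int32(value)  # movl %ecx, %eax keeps just the low 32 bits
```

For example, `unbox_int32_x64(box_int32_x64(-5))` gives back `-5`. The extra x64 work is the copy and shift needed to isolate the tag; the 32-bit layout mostly avoids it because the tag already sits in its own register, although getting it there has a cost of its own.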
Attached file 32-bit JIT Inspector output (obsolete)
Attachment #8527735 - Attachment is obsolete: true
Attachment #8527736 - Attachment is obsolete: true
Some thoughts in no particular order:

1) The overall time of the testcase on 32-bit is about 0.25 * (50 + 100 + 200 + 400 + 800) = 387.5 ms. The x86-64 times are about 2x that, in the 800-900 ms range. So we need to account for about 400-500 ms of runtime.

2) The testcase executes about 300e6 Unbox:Int32 instructions. On x86, there's nothing to do for these if we know we have an int. On x86-64, these correspond to a single movl. What this means on the hardware, I don't know, but if we assume it takes one cycle, that's 300e6 cycles; the CPU is at 2.6 GHz, so about 115 ms. Worse yet, in some of these cases we don't know we have an int. In that case, on 32-bit we get things like:

    [MoveGroup]
    movl %edx, %eax
    [Unbox:Int32]
    cmpl $0xffffff81, %ecx
    jne ((366))

And on 64-bit we get:

    [Unbox:Int32]
    movq %rcx, %r11
    shrq $47, %r11
    cmpl $0x1fff1, %r11d
    jne ((383))
    movl %ecx, %eax

So that's an extra move and shift, though on 32-bit we presumably paid part of that cost when we initially placed the high 32 bits of the Value in %ecx.

3) On x86-64 there's an extra MoveGroup before the first CallKnown. But the actual call is cheaper, and in any case there aren't _that_ many CallKnowns here (about 75e6).

So my money is that the main culprit here is the Unbox:Int32 bits.
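
The back-of-envelope numbers in points 1 and 2 can be re-derived mechanically. This sketch uses only figures from the comment; it reads the multipliers 50 through 800 as the testcase's repetition counts (an assumption), and it keeps the comment's own assumption of one cycle per movl. The constant and helper names are hypothetical.

```python
# Inputs taken from the comment; CYCLES_PER_MOVL is the comment's stated assumption.
PER_UNIT_MS_X86 = 0.25                  # typical 32-bit figure
MULTIPLIERS = [50, 100, 200, 400, 800]  # assumed repetition counts
UNBOX_INT32_COUNT = 300e6               # Unbox:Int32 instructions executed
CYCLES_PER_MOVL = 1
CLOCK_HZ = 2.6e9


def total_ms(per_unit_ms, multipliers):
    """Point 1: overall testcase time implied by the per-unit figure."""
    return per_unit_ms * sum(multipliers)


def movl_cost_ms(count, cycles_each, clock_hz):
    """Point 2: time spent on the extra movl if each costs `cycles_each` cycles."""
    return count * cycles_each / clock_hz * 1000.0


if __name__ == "__main__":
    x86_ms = total_ms(PER_UNIT_MS_X86, MULTIPLIERS)
    unbox_ms = movl_cost_ms(UNBOX_INT32_COUNT, CYCLES_PER_MOVL, CLOCK_HZ)
    print(f"32-bit total: {x86_ms} ms")
    print(f"one-cycle movl on every Unbox:Int32: ~{unbox_ms:.0f} ms")
```

On these assumptions the movl alone accounts for roughly 115 ms of the 400-500 ms gap, so the rest presumably has to come from elsewhere, such as the longer tag-check sequence used when the type is not known.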
(In reply to Please do not ask for reviews for a bit [:bz] from comment #9)
> So my money is that the main culprit here is the Unbox:Int32 bits.

Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers. Will post soon, after testing what it does on some other benchmarks.
Depends on: 1104199
(In reply to Jan de Mooij [:jandem] from comment #10)
> Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers.
> Will post soon, after testing what it does on some other benchmarks.

Bug 1104199. With the patch there:

x64 before: 0.44, 0.38, 0.425, 0.4, 0.4075
x64 after: 0.28, 0.27, 0.245, 0.26, 0.25125
x86: 0.24, 0.23, 0.25, 0.2475, 0.23625
d8 x64: 0.26, 0.23, 0.235, 0.2425, 0.22625
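
One way to quantify "close to the 32-bit numbers" is the fraction of the x64-vs-x86 gap that the patch removes, using mean times. This is a minimal sketch: the numbers are copied from the comment, and `gap_closed` is a hypothetical helper.

```python
from statistics import mean

# Runs copied from the comment (bug 1104199 patch); lower is better.
X64_BEFORE = [0.44, 0.38, 0.425, 0.4, 0.4075]
X64_AFTER = [0.28, 0.27, 0.245, 0.26, 0.25125]
X86 = [0.24, 0.23, 0.25, 0.2475, 0.23625]


def gap_closed(before, after, target):
    """Share of the before->target gap removed (0.0 = no change, 1.0 = matches target)."""
    b, a, t = mean(before), mean(after), mean(target)
    return (b - a) / (b - t)


if __name__ == "__main__":
    print(f"x64 gap to x86 closed: {gap_closed(X64_BEFORE, X64_AFTER, X86):.0%}")
```

With these runs the patch closes roughly 88% of the gap between the x64 and x86 builds.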
Severity: normal → S3

Nightly: 0.312s
Chrome: 0.18s

So we are still close to 2x slower here (0.312 s vs. 0.18 s, about 1.7x).

Blocks: sm-js-perf
No longer blocks: WebJSPerf
Severity: S3 → N/A
Type: defect → enhancement
Priority: -- → P3