Open Bug 624299 Opened 14 years ago Updated 2 years ago

2x slower than v8 on recursion+scope chain testcase

Categories: Core :: JavaScript Engine, defect
People: (Reporter: bzbarsky, Unassigned)
Whiteboard: [js:t] [js:perf]
Attachments: (2 files, 2 obsolete files)

See bug 614834 comment 27.  The testcase in question is in the url field.

The ratio improved, but we're still slowest here (all numbers on my rMBP @ 2.7GHz):

SpiderMonkey:
0.58
0.56
0.565
0.56
0.5575

JSC:
0.34
0.33
0.32
0.3175
0.3175

d8:
0.3
0.29
0.295
0.3325
0.29625
OS: Mac OS X → All
Hardware: x86 → All
Summary: 4x slower than v8 on recursion+scope chain testcase → 2x slower than v8 on recursion+scope chain testcase
Whiteboard: [js:t] [js:perf]
Assignee: general → nobody
Firefox 33 is faster than Chrome 39 for me.

Firefox goes from 0.60 to 0.45 and Chrome goes from 0.70 to 0.55.

For me (same setup as in comment 1), we're still slowest (and note the progress JSC has made):

SpiderMonkey:
0.46
0.44
0.45
0.4525
0.4575

JSC:
0.22
0.2
0.21
0.2175
0.215

d8:
0.26
0.31
0.28
0.2775
0.2625

Current Nightly and Canary also reflect this. Safari is about 50% slower than JSC, but still faster than us.

This is a lot faster on 32-bit. On OS X I get 0.23-0.26 ms with an x86 build, 0.39-0.42 ms with an x64 build.

Could be our boxing format or us spilling more registers somewhere, we should investigate.
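
For illustration (a sketch assuming the usual nunbox32/punbox64 split; the struct names and field layout are made up for clarity, and the real definitions live in js/public/Value.h), the two boxing formats look roughly like this:

#include <cstdint>

// 32-bit builds ("nunbox32"): a Value is a pair of 32-bit words, with the
// type tag kept separate from the payload.  Unboxing an int32 only needs a
// compare on the tag word; the payload word is already usable as-is.
struct Nunbox32Value {
    uint32_t payload;  // the int32 itself (or a pointer, boolean, ...)
    uint32_t tag;      // full 32-bit type tag
};

// 64-bit builds ("punbox64"): a Value is a single 64-bit word with the type
// tag packed into the upper bits.  Unboxing an int32 needs extra work to
// recover the tag for the guard and to extract the low 32 payload bits.
struct Punbox64Value {
    uint64_t bits;  // tag in the high bits, int32 payload in the low 32 bits
};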
Attached file 32-bit JIT Inspector output (obsolete) —
Attachment #8527735 - Attachment is obsolete: true
Attachment #8527736 - Attachment is obsolete: true
Some thoughts in no particular order:

1)  The overall time of the testcase on 32-bit is about 0.25 * (50 + 100 + 200 + 400 + 800) = 387.5ms.  The x86-64 times are about 2x that, in the 800-900ms range.  So we need to account for about 400-500 ms of runtime.

2)  The testcase executes about 300e6 Unbox:Int32 instructions.  On x86, there's nothing to do for these if we know we have an int.  On x86-64, these correspond to a single movl.  What this means on the hardware I don't know, but if we assume each takes one cycle, that's 300e6 cycles; at 2.6GHz that's about 115ms.  Worse yet, in some of these cases we don't know we have an int.  In that case, on 32-bit we get things like:

[MoveGroup]
    movl       %edx, %eax
[Unbox:Int32]
    cmpl       $0xffffff81, %ecx
    jne        ((366))

And on 64-bit we get:

[Unbox:Int32]
    movq       %rcx, %r11
    shrq       $47, %r11
    cmpl       $0x1fff1, %r11d
    jne        ((383))
    movl       %ecx, %eax

So that's an extra move and shift, though on 32-bit presumably we paid part of that cost when we initially placed the high 32 bits of the Value in ecx.  (A rough sketch of the two guards follows below.)

3)  On x86-64 there's an extra MoveGroup before the first CallKnown.  But the actual call is cheaper, and in any case there aren't _that_ many CallKnowns here (about 75e6).

So my money is that the main culprit here is the Unbox:Int32 bits.
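
To make the guard cost concrete, here is a rough C++ model of the two Unbox:Int32 paths shown above, using the tag constants visible in the disassembly (0xffffff81 as the 32-bit tag word, 0x1fff1 as the 64-bit tag after the 47-bit shift).  This is a sketch of what the generated code does, not engine source:

#include <cstdint>

// Tag constants as they appear in the disassembly above; illustrative only.
constexpr uint32_t NUNBOX32_INT32_TAG = 0xffffff81;
constexpr uint64_t PUNBOX64_INT32_TAG = 0x1fff1;

// 32-bit guard: the tag word is already sitting in a register, so the guard
// is a single compare-and-branch (cmpl $0xffffff81, %ecx; jne <bailout>).
bool unboxInt32Nunbox32(uint32_t tag, uint32_t payload, int32_t* out) {
    if (tag != NUNBOX32_INT32_TAG)
        return false;                       // the jne, i.e. bail out
    *out = static_cast<int32_t>(payload);   // payload is directly usable
    return true;
}

// 64-bit guard: the tag first has to be recovered from the packed word
// (movq + shrq $47 + cmpl $0x1fff1), and the payload still needs a movl.
bool unboxInt32Punbox64(uint64_t bits, int32_t* out) {
    if ((bits >> 47) != PUNBOX64_INT32_TAG)
        return false;                                   // jne <bailout>
    *out = static_cast<int32_t>(bits & 0xffffffffu);    // movl %ecx, %eax
    return true;
}

Per unbox the difference is just a move and a shift, but at roughly 300e6 unboxes a few extra cycles each could plausibly account for a meaningful part of the 400-500 ms gap estimated above.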
(In reply to Please do not ask for reviews for a bit [:bz] from comment #9)
> So my money is that the main culprit here is the Unbox:Int32 bits.

Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers. Will post soon, after testing what it does on some other benchmarks.
Depends on: 1104199
(In reply to Jan de Mooij [:jandem] from comment #10)
> Yes, I have a patch for x64 Unbox that gets us close to the 32-bit numbers.
> Will post soon, after testing what it does on some other benchmarks.

Bug 1104199. With the patch there:

x64 before: 0.44, 0.38, 0.425, 0.4,    0.4075
x64 after:  0.28, 0.27, 0.245, 0.26,   0.25125
x86:        0.24, 0.23, 0.25,  0.2475, 0.23625

d8 x64:     0.26, 0.23, 0.235, 0.2425, 0.22625
Severity: normal → S3