Open Bug 939438 Opened 11 years ago Updated 2 years ago

Bad/unpredictable JS performance in JSIL test case

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect

Tracking

()

People

(Reporter: kael, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

Attached file ElementProxies.zip (obsolete) —
I recently updated the JSIL compiler with a new optimization that reduces GC pressure in certain scenarios. When I applied this to one of my test cases, I found that the test case got faster in Chrome Canary and in a recent build of spidermonkey (js.exe) running in the shell, but performance in actual Firefox fell through the floor (5-10x performance regression for no obvious reason). I can't tell what's going on here or why it is fast (like I'd expect) in shell while running miserably slow in browser.

I looked a little at the generated code via jit inspector, and while it looked bad it didn't look bad in any unusual/obvious way.

Also, *occasionally* it runs fast in Firefox like it does in js.exe. It's just usually slow. I've seen it run fast maybe twice out of the couple dozen test runs I've done. It's not some sort of warming phenomenon - one of the fast runs was the *first* time I ran the test case and it was slow the second time; another fast run was after running it a couple times. It just seems to be random.

Timings from test runs:

Nightly 28.0a1 (2013-11-16)

<16775168.00, 33550336.00, 50325504.00>
Arrays: 00379.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 05750.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 05432.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 05629.00ms

js.exe (2013-11-16)

<16775168.00, 33550336.00, 50325504.00>
Arrays: 00348.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 00617.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 00462.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 00619.00ms

Chrome Canary (Version 33.0.1710.0 canary)

<16775168.00, 33550336.00, 50325504.00>
Arrays: 00515.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 00787.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 00612.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 00600.00ms


I'd love to turn this optimization on since it's an improvement for js.exe and for Chrome, but if it's going to destroy performance in Firefox I will have to leave it turned off :(
I did some fiddling around in js.exe to see if I could reproduce the circumstances for a slow test run. Wasn't able to get something as slow as the actual browser, but:

--ion-eager produces significant slowdown; worse than --no-ion. --ion-eager runs seem to take around 3500-4000 ms for the manually packed test case, while --no-ion takes around 2000ms. Parallel script compilation seemed to have no effect on performance. I tried various values for 'uses before compile' and saw no major difference with them either (though larger values for that made it slower, for some reason)
Attached file JSIL.Unsafe.js (obsolete) —
I found a bailout in JSIL.Unsafe.js; here's an updated version that fixes it. Unfortunately, fixing this bailout does not speed things up in the browser at all. It does shave time off the shell runs, though:

<16775168.00, 33550336.00, 50325504.00>
Arrays: 00345.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 00503.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 00507.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 00521.00ms
Attached file Updated test case
Updated the test case ZIP with the bailout fix and with another bailout fix (I discovered having a 'throw' statement disables inlining, so I fixed that in a bunch of places).

Still slow in the browser =[
Attachment #833346 - Attachment is obsolete: true
Attachment #833348 - Attachment is obsolete: true
I managed to catch one of those 'mysteriously fast' test runs in JIT Inspector.

The fast run appeared to only have 1-2 total compilations of each test function (the loop that does the sum), and no recompilations for the proxies (Vector3d_Proxy_get_x, etc), while a slow run shows lots of recompilations for the proxies and for the test functions. It seems like when the test case runs slowly, for some reason Spidermonkey is invalidating and recompiling the involved JS over and over.

From looking at the compilations of the test functions in the 'mysteriously fast run', it looked like *the entire test case* had been inlined into the outer function that runs it, which is what I'd expect in this case. During slow test runs, it looks like inlining might not be happening for the proxies, but it's hard to tell.

In particular recompiling the proxies is mysterious: There's no reason they should ever get compiled, they look like this:

function Vector3d_Proxy_get_y(value) {
    var bytes = this.$bytes;
var offset = ((this.$offset | 0) + 8) | 0;
  clampedByteView[0] = bytes[(offset + 0) | 0];
  clampedByteView[1] = bytes[(offset + 1) | 0];
  clampedByteView[2] = bytes[(offset + 2) | 0];
  clampedByteView[3] = bytes[(offset + 3) | 0];
  clampedByteView[4] = bytes[(offset + 4) | 0];
  clampedByteView[5] = bytes[(offset + 5) | 0];
  clampedByteView[6] = bytes[(offset + 6) | 0];
  clampedByteView[7] = bytes[(offset + 7) | 0];
return nativeView[0];
    }

nativeView and clampedByteView are known typed arrays provided via closure. this.$bytes and this.$offset are always a Uint8Array and an integer - their types don't change during loop iterations.
(In reply to K. Gadd (:kael) from comment #3)
> (I discovered having a 'throw' statement disables inlining, so I fixed that
> in a bunch of places).

Yeah that's a bit silly, filed bug 939816 to fix this.
Investigating. Big perf differences between shell and browser are bad.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Kevin, you're right: for some reason we're not inlining op_Addition etc in Program_TestManuallyPackedStructs in the browser. Because we don't inline this, we end up with polymorphic, scripted getter calls in op_Addition I think (for the x, y, z properties) and these are not inlined/cached in any way, so they are really slow.

I'll try to find out why we're not inlining the op_Addition call.
The problem is that when we inline, we ensure the callee's "parent" matches the current global. With parallel parsing, it's possible to have a JSFunction with a parent that's the "parallel parsing global". Parallel parsing is not used in the shell, so that explains why this is browser only.

The patch in bug 938124 will remove this check.

Kevin, can you confirm that turning off javascript.options.parallel_parsing in about:config fixes this?
Depends on: 938124
Flags: needinfo?(kg)
Thanks for the prompt investigation. Confirming that javascript.options.parallel_parsing=false fixes this in Nightly.

It *is* the case that the getters are polymorphic, but they're only coarse-grained polymorphic: each loop always uses a given getter for every invocation, and the test run invokes each loop sequentially, so you're getting like 10k addition calls with getter 1, then 10k addition calls with getter 2, etc. I'm sure this is suboptimal in that it either requires a smart on-demand recompile or a polymorphic dispatch with two possible types. Not sure what to do there other than beg for cloneAtCallsite to be accessible from user code like this (maybe I could clone the callsites myself from the JS side?) It seems plausible to me that doing this could improve performance even more, but I'm pretty happy with these test cases being remotely close to the performance for the 'Arrays' test case.
Flags: needinfo?(kg)
The numbers are much better now - we're a bit slower than in Chrome, but I no longer see a huge cliff like the one in comment 0.
Assignee: jdemooij → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: