Closed Bug 852421 Opened 11 years ago Closed 11 years ago

Slow performance in JSIL Vector3 test case

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla25

People

(Reporter: kael, Assigned: jandem)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [games:p?])

Attachments

(3 files, 1 obsolete file)

The attached test case does a synthetic workload summing up an array of 3-double vectors, and compares various ways of storing them (JS array full of JS objects; byte array manually packed with structs; byte array packed automatically by JSIL infrastructure).

While fiddling with my code generator, I discovered that doing '>>> 0' to cast a value to unsigned int (for example, the offsets into my typed arrays) causes a significant performance hit compared to '| 0', even though I would expect unsigned offsets to be optimal for bounds checks and memory accesses (just check n < max instead of 0 <= n < max). Testing revealed that '>> 0' is also faster than '>>> 0'.

It looks like this is a JM problem given that --no-ion makes no difference and --ion-eager makes all the test cases slower; I have no idea why Ion isn't active for this code.

Looking at the different functions generated for '>>> 0' and '| 0' in the JIT inspector didn't reveal any obvious differences; I don't see anything like stub calls either.
What happens with --no-jm?  This issue is a WONTFIX for JM (see the sad history in bug 769765).  Random guess, if it's not running with Ion it could be due to the script size limit, try with --ion-limit-script-size=off or with background compilation (--ion-parallel-compile=on in a threadsafe build; the script size limit is 10x greater in this case).
--no-jm is so slow that it doesn't finish. It's hard to tell whether it's actually running in Ion; there are a bunch of aborts (shown via IONFLAGS=), some of which I fixed and others that are caused by bugs (the main one is an abort because |new Function| doesn't produce compile-and-go scripts).

I'll try those two command line flags though and see if it changes.

Is there a way I can tell whether a particular function is actually Ion compiled? Right now --ion-eager makes everything slower but I can't see a robust way to tell if a function body is running under ion.
Setting IONFLAGS=scripts should indicate which scripts compilation starts/finishes for.  You can use that with 'aborts' to see why compilation is aborting on the script you are looking at.
OK, I ran it with --ion-eager --ion-limit-script-size=off and IONFLAGS=aborts to get abort information.

Here are the aborts for the normal test case:
http://pastebin.mozilla.org/2228300

Sorry everything is 'JSIL.Core.js:2599' and such; SpiderMonkey totally ignores closure names and sourceURLs when displaying information for |new Function| bodies. If it helps, 'JSIL.Core.js:2599' is the closure builder (top level function evaluated with |new Function|) and JSIL.Core.js:2601 is where it invokes the closure builder to get a bound closure.
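The |new Function| pattern in question looks roughly like this (a hedged sketch; the names and body are invented, not JSIL's actual closure builder):

```javascript
// A top-level "closure builder" compiled via new Function (the analogue of
// JSIL.Core.js:2599), then invoked to produce a bound closure (the analogue
// of JSIL.Core.js:2601).
function buildBoundClosure(captured) {
  var builder = new Function(
    "captured",
    "return function (a, b) { return (a + b + captured) | 0; };"
  );
  return builder(captured);
}

var add10 = buildBoundClosure(10);
add10(1, 2); // 13
```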

Here are the aborts with jsshell evaluate() hacked in as a replacement for 'new Function' (since evaluate lets you make the loaded code compile-and-go):
http://pastebin.mozilla.org/2228287

Other than a few ones that I will probably fix in the future (I think all the fun.apply noise is because I still use Function.bind in a bunch of places) I don't really know much about the other aborts.

With the script size limit off some of the compiles fail due to the functions being too big, but I don't think that should be causing any of the hot paths to abort. They're all small (on purpose).
OK, I managed to get meaningful output from IONFLAGS=scripts,aborts, though sadly it has to be with the evaluate hack, so it's not fully representative of what performance would be like with |new Function|. With the evaluate hack, the '>>> 0' and '| 0' test cases are roughly the same speed, but both are dramatically slower than the version using |new Function|. Whether this is because Ion is generating bad code for them, or because evaluate generates inferior code (much like eval does), I don't know.

http://pastebin.mozilla.org/2228354
Based on this output, at least, it looks like almost all the hot functions are being analyzed by Ion and aren't triggering an abort. So I can't figure out why they'd be way slower (2700ms with evaluate used to make them compile-and-go, versus 970ms with |new Function|).
Since the >>> 0 thing is almost certainly (at this point, at least) just an artifact of my code being run in JM, I'm renaming the bug. Let me know if you want me to break this out into smaller bugs or smaller test cases.

I did some more testing with the normal version of the test case to figure out why it's running in JM instead of ion. I figured out that I needed to set IONFLAGS=scripts,aborts,bailouts - because some aborts are caused by bailouts. I was seeing aborts without any reason information in the log.

With this set, I can now see that bailouts are occurring in the hottest parts of the test case and seem to be permanently disabling Ion compilation for those functions. --ion-eager seems to prevent these bailouts from happening; that, or it results in the functions just getting recompiled right away instead of Ion compilation being turned off. Not sure which.

The bailouts look like this:

[Scripts] Analyzing script Vector3.cs.js:195 (043AC560) (usecount=10245) (maxloopcount=0)
[Scripts] Analyzing script Vector3.cs.js:176 (043AC340) (usecount=18433) (maxloopcount=0)
[Abort] Disabling Ion mode 0 compilation of script Vector3.cs.js:195
[Bailouts] Took invalidation bailout! Snapshot offset: 175
[Bailouts] Bailing out Vector3.cs.js:195, IonScript 040F5188
[Bailouts]  reading from snapshot offset 175 size 309
[Bailouts]  restoring frame
[Bailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: osipoint [41]
[Bailouts]  expr stack slots 5, is function frame 1
[Bailouts]  frame slots 5, nargs 2, nfixed 0
[Bailouts]  pushing 1 expression stack slots
[Bailouts]  new PC is offset 65 within script 043AC560 (line 196)
[Bailouts] reflowing type info
[Bailouts] reflowing type info at Vector3.cs.js:195 pcoff 65
[Abort] Disabling Ion mode 0 compilation of script Vector3.cs.js:53
[Bailouts] Took invalidation bailout! Snapshot offset: 364
[Bailouts] Bailing out Vector3.cs.js:53, IonScript 047F19B8
[Bailouts]  reading from snapshot offset 364 size 405
[Bailouts]  restoring frame
[Bailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: osipoint [80]
[Bailouts]  expr stack slots 3, is function frame 1
[Bailouts]  frame slots 5, nargs 0, nfixed 2
[Bailouts]  pushing 1 expression stack slots
[Bailouts]  new PC is offset 79 within script 043A3AB0 (line 57)
[Bailouts] reflowing type info
[Bailouts] reflowing type info at Vector3.cs.js:53 pcoff 79

Vector3.cs.js:195 is one of the hottest parts of the test case - it's the Vector3 addition operation, a one-line function. I can't tell from this why it's bailing out or why it would be getting forced into JM as a result. 

My best guess would be that perhaps it's bailing out due to the lazy initialization machinery used by JSIL, but it doesn't seem like that should be happening if the usecounts on these functions are in the tens of thousands - all the lazy initialization occurs on the first call to a given function.
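The lazy-initialization pattern referred to above is roughly a self-replacing function (an illustrative sketch, not JSIL's actual machinery):

```javascript
var Vector3Ops = {
  // The published method does its expensive setup once on the first call,
  // then replaces itself with the real implementation.
  add: function (a, b) {
    var impl = function (a, b) {   // expensive setup would happen here
      return { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
    };
    Vector3Ops.add = impl;         // later calls skip initialization entirely
    return impl(a, b);
  }
};

var s1 = Vector3Ops.add({x: 1, y: 2, z: 3}, {x: 4, y: 5, z: 6}); // initializes
var s2 = Vector3Ops.add(s1, s1); // uses the replaced implementation
```

With usecounts in the tens of thousands, only the very first call should ever see the setup path.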
Summary: '>>> 0' generates slower code than '| 0' for typed array manipulation → Slow performance in JSIL Vector3 test case
This test case has gotten slightly faster after the compile-and-go fix landed but it's still something like 2-3x slower than it was a few months ago.

C:\Users\Kevin\Desktop\UnsignedVsSigned>js-0506 runTest.js
// Using >>> operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 00714.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02355.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 02288.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02326.00ms
// Using >> operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 00769.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02306.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 02283.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02300.00ms
// Using | operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 00754.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02315.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 02292.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02304.00ms

(The Arrays case shaved off a couple hundred ms, you'll note, so that's a win. But the other three cases are still terrible; in the JM era their runtimes were around 800-900ms.)
With a nightly jsshell from yesterday, this is still incredibly slow - if anything it's gotten even worse:

C:\Users\Kevin\Desktop\UnsignedVsSigned>js runTest.js
// Using >>> operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 01002.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02876.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 02910.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02976.00ms
// Using >> operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 00978.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02912.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 02866.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02921.00ms
// Using | operator to force type of offset computations
<16775168.00, 33550336.00, 50325504.00>
Arrays: 00959.00ms
<16775168.00, 33550336.00, 50325504.00>
ManuallyPackedStructs: 02926.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructs: 03004.00ms
<16775168.00, 33550336.00, 50325504.00>
PackedStructPointers: 02978.00ms
Whiteboard: [games:p?]
I looked a bit into this. One of the problems is that we don't have good type information for |this| in Vector3d__ctor and take the slow path for the 3 SETPROPs there.

It looks like constructing a new Vector3d goes through several layers, I have to understand where that unknown type comes from...
I'd love to optimize out the layers, but I've made lots of attempts and haven't found any way to do it given the constraints of JavaScript. There's an unfortunate cyclic relationship involved that can't easily be expressed.

I did come up with an experimental variant of JSIL that removed one of the layers of abstraction, but I saw no performance improvement in SpiderMonkey so I shelved that branch (the code was way more complex and fragile) - let me know if you'd like me to dust it off so you can look at it for a comparison.
Somehow Vector3d objects and their protos are marked as having unknown properties. What's on the prototype of these objects? It's pretty hard to find out with all the dynamic code generation. Could this be caused by one of the Object.create calls? Object.create's result has unknown properties (we should really fix that).

(In reply to Kevin Gadd (:kael) from comment #10)
> I did come up with an experimental variant of JSIL that removed one of the
> layers of abstraction, but I saw no performance improvement in SpiderMonkey
> so I shelved that branch (the code was way more complex and fragile) - let
> me know if you'd like me to dust it off so you can look at it for a
> comparison.

Maybe later, but for now I think there are more serious problems.
Each type has a prototype created using Object.create in order to derive it from its base type. So in effect the prototype chain mirrors the .NET type chain, like for

System.Object -> System.ValueType -> Vector3d

There are four nodes in the prototype chain:

Object (js) -> System.Object -> System.ValueType -> Vector3d

And each node is created using Object.create, then populated. The prototype created based on the chain is attached to the constructor for the type. (The constructor has to be created way earlier, before its actual contents are known, because it acts as the object in the global namespace that represents the type. That's why it has function indirection inside it.)
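That layering can be sketched like so (simplified, with invented names):

```javascript
// Each node of the prototype chain is created with Object.create and then
// populated, mirroring the .NET type chain.
var SystemObjectProto = Object.create(Object.prototype);
var SystemValueTypeProto = Object.create(SystemObjectProto);
var Vector3dProto = Object.create(SystemValueTypeProto);

// The constructor exists before the prototype's contents are known; the
// finished prototype is attached to it once the chain is built.
function Vector3d(x, y, z) { this.x = x; this.y = y; this.z = z; }
Vector3d.prototype = Vector3dProto;
Vector3dProto.add = function (o) {
  return new Vector3d(this.x + o.x, this.y + o.y, this.z + o.z);
};

var v = new Vector3d(1, 2, 3).add(new Vector3d(4, 5, 6));
// Property lookups on v can walk all four nodes: Vector3dProto ->
// SystemValueTypeProto -> SystemObjectProto -> Object.prototype
```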
Adding a current build of the Vector3 test case (JSIL has had codegen and library improvements and this includes a newer jsshell)
Attachment #726504 - Attachment is obsolete: true
Attaching a minimal test case for the Object.create deoptimization jandem and I discovered (it severely affects the Vector3 test case).
So the problem is that Object.create(X) will deoptimize not just the result of Object.create but any object that has X as proto, since they share the same TypeObject.
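A minimal sketch of the shape of the problem (illustrative names; the attached minimal test case is authoritative):

```javascript
// Instances of Vec and the result of Object.create(Vec.prototype) share the
// same prototype, so they share a TypeObject. Pre-fix, marking the
// Object.create result's type as having unknown properties pessimized every
// Vec instance too; behavior is unchanged, only JIT type info suffers.
function Vec(x) { this.x = x; }
Vec.prototype.double = function () { return this.x * 2; };

var hot = new Vec(21);
var created = Object.create(Vec.prototype); // triggers the deoptimization

hot.double(); // 42 -- correct, but (pre-fix) no longer well-typed for Ion
```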

Brian, can we use FixObjectType instead of MarkTypeObjectUnknownProperties in obj_create? That seems to fix this problem, but I didn't run jit-tests or anything yet.
Flags: needinfo?(bhackett1024)
Blocks: JSIL
(In reply to Jan de Mooij [:jandem] from comment #15)
> So the problem is that Object.create(X) will deoptimize not just the result
> of Object.create but any object that has X as proto, since they share the
> same TypeObject.
> 
> Brian can we use FixObjectType instead of MarkTypeObjectUnknownProperties in
> obj_create? That seems to fix this problem but I didn't run jit-tests or
> anything yet.

FixObjectType will only work on objects whose prototype is Object.prototype.  The MarkTypeObjectUnknownProperties call isn't necessary for correctness, and is a pointless deoptimization I think I added to improve some JSLint benchmark which relied heavily on Object.create.  That was lazy though and this call should just be removed.
Flags: needinfo?(bhackett1024)
Attached patch PatchSplinter Review
Remove the MarkTypeObjectUnknownProperties call.
Assignee: general → jdemooij
Status: NEW → ASSIGNED
Attachment #772021 - Flags: review?(bhackett1024)
Attachment #772021 - Flags: review?(bhackett1024) → review+
https://hg.mozilla.org/mozilla-central/rev/0960402d8145
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Blocks: gecko-games