Closed Bug 654410 Opened 13 years ago Closed 8 years ago

NES emulator 3X faster in Chrome

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: dvander, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Author said his NES emulator runs 3X faster in Chrome, so filing a tracking bug to investigate.
Attached file A free test ROM
Here's a free program to use with the emulator so that people don't need to use infringing copies.  The archive contains spritecans.nes (the NES executable) and complete corresponding source code.
According to http://nesdev.parodius.com/bbs/viewtopic.php?p=77416#77416
the emulator is a fork of JSNES, for which we have bug 509986.
On spritecans, I get 30fps in Fx4, 60fps in Chrome.

xperf confirms that 2/3 of our time here is in JS, 1/3 in "unknown", but almost none in gfx, so definitely a JS issue. We should try our JS profilers on this.
I've tracked down the primary performance problem to the emulate: function. The problem is probably the JIT not compiling the switches into jump tables. This is the "ideal" method to implement the emulator. The other possibility is to have a separate function for each op, but that adds function call setup overhead which is non-trivial. Overall I've found the switch implementation method to be faster in chrome thus far, but there are some possibilities for the function call method that I haven't explored.
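For anyone following along, here is a minimal sketch of the two dispatch styles being compared. It is not the emulator's actual code: the two-opcode "CPU", the opcode numbers, and the names stepSwitch, stepTable, handlers, and run are all made up for illustration.

```javascript
// Hypothetical opcode numbers, for illustration only.
var OP_INC = 0, OP_DEC = 1;

// Style 1: one big switch with small, consecutive integer case labels.
function stepSwitch(state, opcode) {
    switch (opcode) {
    case 0: // OP_INC
        state.acc = (state.acc + 1) & 0xFF;
        break;
    case 1: // OP_DEC
        state.acc = (state.acc - 1) & 0xFF;
        break;
    default:
        throw new Error("unknown opcode " + opcode);
    }
}

// Style 2: one function per op, looked up in an array indexed by opcode.
// Every step pays for an indirect call on top of the op itself.
var handlers = [
    function (state) { state.acc = (state.acc + 1) & 0xFF; }, // OP_INC
    function (state) { state.acc = (state.acc - 1) & 0xFF; }  // OP_DEC
];

function stepTable(state, opcode) {
    var handler = handlers[opcode];
    if (!handler) {
        throw new Error("unknown opcode " + opcode);
    }
    handler(state);
}

// Runs a list of opcodes through either stepper and returns the accumulator.
function run(step, program) {
    var state = { acc: 0 };
    for (var i = 0; i < program.length; i++) {
        step(state, program[i]);
    }
    return state.acc;
}
```

Both steppers compute the same result, e.g. run(stepSwitch, [OP_INC, OP_INC, OP_DEC]) and run(stepTable, [OP_INC, OP_INC, OP_DEC]) agree; which one is faster depends on the engine and has to be measured.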
(In reply to comment #4)
> I've tracked down the primary performance problem to the emulate: function.
> The problem is probably the JIT not compiling the switches into jump tables.
> This is the "ideal" method to implement the emulator. The other possibility
> is to have a separate function for each op, but that adds function call
> setup overhead which is non-trivial. Overall I've found the switch
> implementation method to be faster in chrome thus far, but there are some
> possibilities for the function call method that I haven't explored.

I think we do compile switches to jump tables (except on ARM), if the switch is done with JSOP_TABLESWITCH by the bytecode compiler. So maybe it's getting compiled as JSOP_LOOKUPSWITCH, or maybe the problem is something else.
In the case of switches over symbolic constants which are properties of the same object, could we maybe guard on the shape of the holder object and then generate a table?
(In reply to comment #5)
> 
> I think we do compile switches to jump tables (except on ARM), if the switch
> is done with JSOP_TABLESWITCH by the bytecode compiler

I hope you're talking about the method JIT, because we recently removed table switch support from the trace JIT (bug 620757).
Here's a shell version + the free ROM. It runs 100 frames like this:
--
100 frames: 3042ms.
32.9 fps
--
It reads the ROM using the shiny new snarf(.., "binary").
Most time is spent in GetElem/SetElem stubs. Shark shows 23.5% under js::PropertyTable::search...

The switch statements here look like table switches, and I see no switch-related stub calls in the profile.
On some NES ROMs, with JSNES, the arrays fill in an order that causes them to become sparse. Could something similar be happening here?
(In reply to comment #10)
> On some NES ROMs, with JSNES, the arrays fill in an order that causes them
> to become sparse. Could something similar be happening here?

Yeah if I change this:
--
var i = 256*240;
while(i--) {
    buffer[i] = bgColor;
}
var pixrendered = this.pixrendered;
i = pixrendered.length;
while(i--) {
    pixrendered[i]=65;
}
--
to this:
--
for(var i=0; i<256*240; i++) {
    buffer[i] = bgColor;
}
var pixrendered = this.pixrendered;
for(i = 0; i < pixrendered.length; i++) {
    pixrendered[i]=65;
}
--
we're almost twice as fast:

100 frames: 1753ms.
57 fps
I've updated the code everywhere from a simple new Array(size) to a function which creates the array and then initializes all its elements in 0 .. size-1 order (rather than reverse order).

It is indeed faster. Before I was getting 10-15fps on my MacBook Air; now it's more like 23-25fps. Still not 60fps like Chrome, but definitely an improvement!
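A minimal sketch of what such a helper could look like. The name makeFilledArray and its parameters are assumptions for illustration, not the emulator's real function.

```javascript
// Hypothetical helper: allocate an array and fill it from index 0 upward,
// so the engine sees elements appear in ascending order.
function makeFilledArray(size, value) {
    var arr = new Array(size);
    for (var i = 0; i < size; i++) {
        arr[i] = value;
    }
    return arr;
}

// Example: a 256x240 frame buffer initialized to 0.
var buffer = makeFilledArray(256 * 240, 0);
```

Call sites that previously did new Array(size) followed by a reverse-order fill would call the helper instead.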
This is definitely something I can work with :) I'll swat at the code tonight and see if I can squeeze out some perf =D Thanks guys!
Blocks: 467263
Jon: if/as you find any more performance faults, please let us know so (as dvander did with bug 586842) we can compile use cases to guide our optimization efforts.
Initializing an array via 
{ someArray: [ v1, v2, v3, v4] } 

seems much faster when using the array after initialization than initing via
this.someArray = new Array(4)
this.someArray[0] = v1;
this.someArray[1] = v2;
this.someArray[2] = v3;
this.someArray[3] = v4;

Am I hallucinating a performance improvement here? (obviously with much larger than 4 elements and with random accesses)

So far I've improved perf by 40% or so in FF4. Still looking for places to improve and tricks to get things faster.
That example isn't 100% accurate

{foo: [{a:0, b:0}, {a:0, b:0}, {a:0, b:0}, {a:0, b:0}]}

vs

this.foo = new Array(4);
this.foo[0] = [];
this.foo[0].a = 0;
this.foo[1] = [];
this.foo[1].a = 0;
this.foo[2] = [];
this.foo[2].a = 0;
this.foo[3] = [];
this.foo[3].a = 0;
this.foo[0].b = 0;
this.foo[1].b = 0;
this.foo[2].b = 0;
this.foo[3].b = 0;
(In reply to comment #16)
> Initializing an array via 
> { someArray: [ v1, v2, v3, v4] } 
> 
> seems much faster when using the array after initialization than initing via
> this.someArray = new Array(4)
> this.someArray[0] = v1;
> this.someArray[1] = v2;
> this.someArray[2] = v3;
> this.someArray[3] = v4;
> 
> Am I hallucinating a performance improvement here? (obviously with much
> larger than 4 elements and with random accesses)

For those particular examples, I get a dense array in either one. I also tried your code in comment 17 and that gave me a dense array, too. But if you initialize the elements of a long array in random order, then it will probably get demoted to a sparse array.

Btw, I check whether the array is dense or not using the |dumpObject| function of the JS shell. Example of a dense array:

js> x = [ 1, 2, 3 ]
[1, 2, 3]
js> dumpObject(x)
object 00D0C048
class 016A8D48 Array
flags: none
elements
   0: 1
   1: 2
   2: 3

Example of a sparse array:

js> y = new Array(4)
[, , , ,]
js> y[1000000] = 8
8
js> dumpObject(y)
object 00D0C090
class 016A8F60 Array
flags: indexed
proto <Array object at 00D02118>
parent <global object at 00D02028>
private 000F4241
properties:
    ((Shape *) 00D09AA0) permanent shared getterOp=0130B880 setterOp=0130B8D0 "length": slot -1
    ((Shape *) 00D09AC8) enumerate 1000000: slot 0 = 8

Only the sparse (slow) array has proto, parent, or properties. Only the dense array has |elements|. The |class| value is different, but it doesn't tell you which is which so that's less useful.
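If you don't have a shell build with dumpObject handy, a rough timing comparison can hint at the same thing. This is only a sketch: the names N, STRIDE, fillAscending, fillScattered, sumAll, and timeReads are made up here, the scattered order is a fixed permutation rather than true randomness, and whether the scattered fill actually ends up slower depends on the engine and build.

```javascript
var N = 1 << 16;     // number of elements
var STRIDE = 40503;  // odd, so (i * STRIDE) % N visits every index exactly once

// Fill in 0 .. n-1 order.
function fillAscending(n) {
    var a = new Array(n);
    for (var i = 0; i < n; i++) {
        a[i] = i;
    }
    return a;
}

// Fill the same slots with the same values, but in a scattered order.
function fillScattered(n) {
    var a = new Array(n);
    for (var i = 0; i < n; i++) {
        var j = (i * STRIDE) % n;
        a[j] = j;
    }
    return a;
}

// Read every element once.
function sumAll(a) {
    var total = 0;
    for (var i = 0; i < a.length; i++) {
        total += a[i];
    }
    return total;
}

// Build an array with `make`, read it 50 times, and report elapsed time.
function timeReads(make) {
    var start = Date.now();
    var a = make(N);
    var total = 0;
    for (var rep = 0; rep < 50; rep++) {
        total += sumAll(a);
    }
    return { ms: Date.now() - start, total: total };
}

var ascending = timeReads(fillAscending);
var scattered = timeReads(fillScattered);
```

Compare ascending.ms with scattered.ms using whatever output function your environment provides (print in the JS shell). The two total values should match, which confirms both arrays hold the same data and only the fill order differs.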
(In reply to comment #16)
> Initializing an array via 
> { someArray: [ v1, v2, v3, v4] } 
> 
> seems much faster when using the array after initialization than initing via
> this.someArray = new Array(4)
> this.someArray[0] = v1;
> this.someArray[1] = v2;
> this.someArray[2] = v3;
> this.someArray[3] = v4;
> 
> Am I hallucinating a performance improvement here? (obviously with much
> larger than 4 elements and with random accesses)

At least in JaegerMonkey, array initializers are very fast - it knows the layout up-front and can poke directly into the slots. In the latter example there are a lot more instructions and memory traffic needed.
I know this isn't the right place for this. But for the next HTML5 spec, I'd like to request support for gamepads from HTML5 devices.
Bug 827490 just landed.  It might help here.
I just ran the shell test case on my Linux64 box and got:

  100 frames: 1803ms.
  55.5 fps

Then I ran an old (pre-bug 827490) build and got almost identical numbers.

As for the browser, I tried http://zelex.net/nezulator/ with spritecans.nes but I couldn't get it to do anything useful (i.e. the FPS was stuck at 0).  Can someone who knows how to run it try it again with a Nightly build?
Assignee: general → nobody
According to https://arewefastyet.com/#machine=11&view=single&suite=misc&subtest=bugs-654410-nezulator this bug got fixed in the first half of 2013.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME