This is probably not worth looking into in detail until bug 580468 is fixed. Steps to reproduce: 1) Load http://grantgalitz.org/gameboy/ 2) In the in-page gameboy thing, select File > Open As > URL Address 3) In the prompt, put in http://grantgalitz.org/gameboy/romStorage/proxima.gb for the ROM. Grant says that his CPU usage for just methodjit is about 93% while method+tracing is over 98%. jsc/v8 end up at about 75%, so even methodjit has some headroom here (or maybe graphics does).
Yeah, overall firefox isn't as fast as chrome yet on my site, as of 4.0b7pre. Many gameboy and gameboy color roms are averaging the same result, though the gameboy color games with "double clock speed mode" in the terminal will be taking up more processing power than the usual gameboy or gameboy color game.
Disable auto frameskip in the settings to get a better visual, and to max out the CPU as such.
People gotsa play Pokemon Yellow without stuttering. ;)
On the webkit nightlies the gameboy color CPU usage can sometimes drop to as low as 40%, while on FF4 it will get ~90%. Maybe webkit is optimizing how it handles switch statement execution?
(In reply to comment #4) > On the webkit nightlies the gameboy color CPU usage can sometimes drop to as > low as 40%, while on FF4 it will get ~90%. Maybe webkit is optimizing how it > handles switch statement execution? Nevermind, seems likely that the webkit nightlies use hardware acceleration, so to judge that against FF4 with no HW acceleration would be cruel.
You can load your own roms in locally now thanks to the File API, so it should be easier to test different cases with this emulator.
Aside from switch statements, another major issue for emulators is array accesses. Emulators can have a lot of code that looks like this; var MEMORY = ; // Defined once; this is the emulated RAM ... var b = MEMORY[a+10]; MEMORY[b+98] = 17; ... for (var i = 0; i < 100; i++) MEMORY[c+i] = MEMORY[d+i]; and so forth. Typically the exact same array is used for all of these (each time the same statement is executed, but also the same across all of these different statements), and always with the same data type being stored, numbers (maybe mixing integers and floating points though). Perhaps this type of code can be optimized, if it isn't already, using a PIC-like approach, or something else? Any speedup here would significantly affect emulator performance (and also code compiled into JS from C, LLVM, etc.).
Gal/Nick, was there some plan to have a COPYELEM opcode?
Alon: Yes, the emulator builds upon this. Also it uses multiple separate arrays for RAM banking. There are also ranges of each array that are left out (never initialized) because these separate arrays do these ranges' jobs (As required for banking). And yes, the opcodes are executed by switch, as well as by specifying the opcode as a property of the this object (and accessing it later in the array-like format). I'm wondering for curiosity what would be the fastest way for opcode execution (in an object-oriented kind of way (As in things use the this variable)).
I think there could be multiple problems trying to statically or dynamically JIT the code. The issues are that the code can be self-modifying, and that all the data, not just the OPs are in the ROM. The image data can be in the ROM data interlaced possibly with instruction code. Also, trying to dynarec JIT can also be a problem, since the interrupts are very sensitive and possibly skipping over them to a "final state" for dynarec'd instructions can possibly cause problems along the way.
Interesting. Ok, then I guess all that can be done is what was mentioned before (to optimize the JS engine implementation of array accesses and/or switches).
This problem is going to get worse though when the N64 emulator goes live. Though I'm trying to port the dynarec techniques from mupen64plus. Hopefully that will work.
The only game that currently works for the JS N64 emulator surprisingly is Diddy Kong Racing, though there are numerous dimension miscalculations from the ricevideolinux plugin port causing trouble with WebGL
Also, there's the matter of emulating 64 bit integers in 32 bit numerical typing.
Add Mario Kart to Comment 14 as well. :D
(In reply to comment #8) > Gal/Nick, was there some plan to have a COPYELEM opcode? I've heard the idea mentioned, but never more than that. Let's think about what it could gain us. A GETELEM/SETELEM pair on an initialized dense array, eg. a[j] = a[i]; involves these steps in trace-generated code (I imagine JM code must do similar things): GETELEM 1. load pointer to 'a' from the stack 2. if 'a' is not a dense array, exit 3. if 'i' exceeds the capacity of 'a', exit 4. load the dslots pointer of 'a' 5. compute the address of the dslot for a[i] 6. load a[i]'s tag from the dslot 7. if the tag doesn't match the expected type, exit 8. load a[i]'s payload 9. unbox the value 10. store the unboxed value on the stack SETELEM 11. load pointer to 'a' from the stack 12. if 'a' is not a dense array, exit 13. if 'j' exceeds the capacity of 'a', call js_EnsureDenseArrayCapacity() 14. load the dslots pointer of 'a' 15. compute the address of the dslot for a[j] 16. load the a[j]'s existing tag from the dslot 17. if the existing tag is for a hole, call js_Array_dense_setelem_hole 18. box the new value 19. store the boxed value into a[j]'s dslot Horrid, isn't it? Here's what COPYELEM would do: COPYELEM 1. load pointer to 'a' from the stack 2. if 'a' is not a dense array, exit 3. if 'i' exceeds the capacity of 'a', exit 4. if 'j' exceeds the capacity of 'a', call js_EnsureDenseArrayCapacity() 5. load the dslots pointer of 'a' 6. compute the address of the dslot for a[i] 7. load a[i]'s tag and payload from the dslot 8. compute the address of the dslot for a[j] 9. load a[j]'s existing tag 10. if the existing tag is for a hole, call js_Array_dense_setelem_hole 11. store a[i]'s tag and payload into a[j]'s slot. So this has the following benefits (for trace-generated code): - No need to reload 'a' from the stack and test if it's a dense array. (The tracer can omit these steps thanks to CSE, but the methodjit presumably can't.) - No need to reload the dslots pointer and recompute the dslots address. (The tracer can't CSE those in the GETELEM/SETELEM version because of the possibly intervening call to js_EnsureDenseArrayCapacity(), which can reallocate dslots.) - No need to do anything with the gotten value -- type-test it, unbox it, or store it to the stack. So that's quite a bit better, I would guess close to 2x faster. This assumes that COPYELEM does not put a copy of the element on the stack; doing so would negate several of the potential benefits, and it doesn't make much sense. One issue would be whether the front-end fuses separate GETELEM/SETELEM pairs, eg: var tmp = a[i]; a[j] = tmp; You'd hope so, but I'm not certain, I don't know much about the front-end. Things can get more complicated if there is code between the GETELEM and the SETELEM.
Don't test against Internet Explorer 9, since IE9 has broken CanvasPixelArray.data setting support (Error seems to indicate that this is read-only in IE9, while every other browser that supports this object allows for this to be set.).
(In reply to comment #17) > > I've heard the idea mentioned, but never more than that. Let's think about > what it could gain us. A GETELEM/SETELEM pair on an initialized dense array, > eg. a[j] = a[i]; involves these steps in trace-generated code (I imagine JM > code must do similar things): > > GETELEM > 1. load pointer to 'a' from the stack > 2. if 'a' is not a dense array, exit > 3. if 'i' exceeds the capacity of 'a', exit > 4. load the dslots pointer of 'a' > 5. compute the address of the dslot for a[i] > 6. load a[i]'s tag from the dslot > 7. if the tag doesn't match the expected type, exit > 8. load a[i]'s payload > 9. unbox the value > 10. store the unboxed value on the stack > Hmm, I am reminded of an idea of pcwalton's - to use WebGL typed arrays for this kind of code. In theory perhaps this could work like so (for GETs): 1. load pointer to 'a' from the stack 2. if 'a' is not a WebGL typed array, exit 3. if 'i' exceeds the capacity of 'a', exit 4. load 'a's pointer to the actual memory buffer 5. load a[i]'s value 6. store that value on the stack ? On the emulator side, the JS code would need to read&write only the proper type to the array, but that is feasible. (It might require separate arrays for integer and float values.)
(In reply to comment #19) > > Hmm, I am reminded of an idea of pcwalton's - to use WebGL typed arrays for > this kind of code. Absolutely, typed arrays are less flexible and so will be faster. If the array we are getting/setting is a typed array, the generated code will a bit different (and simpler) than that above (for the trace JIT at least; it will see that the array is typed at record-time and so can specialize the code for that case). (I suspect a lot of JS programmers think that arrays are fast, because they are in other languages. But they're not in JS, even though a lot of effort has gone into making them fast, and more will be done in the future, esp. because Kraken uses them a lot.)
Nick: The emulator is now using Uint8Array for the main memory. Seems to be working in Chrome 7 and Firefox 4. I haven't measured the performance difference yet. Thanks though for bringing this stuff up, since I explicitly needed giant 8-bit memory arrays.
(FYI, typed arrays are not IC'd in JM yet. See bug 594247.)
Well, I guess I added typed array support in advance inside the GBC emu for when special code paths are made later on, so that it will make an improvement later on.
The GBC emu initializes most of its RAM now through the format: new Uint8Array(memory bank length in bytes);
I opened bug 602366 on adding a COPYELEM bytecode.
Memory is much more branchy than the only other gameboy emulator out there, since I support a lot ram banking types, as well as gameboy color specific ram banking. So trace jit especially fails on my emu.
Wow, I'm a bad typer, here's what I meant to say if you weren't able to follow the last comment: GameBoy Online's memory is much more branchy than JSGB's memory (the other gameboy emu).
Also, sound support seems to be adding a bit of load, but https://bugzilla.mozilla.org/show_bug.cgi?id=578916 seems to have helped possibly.
Also, the original ROM posted in a much earlier comment won't work anymore unless you disable the BIOS ROM from booting on game startup in the settings, sine that ROM has a bad ROM header that the GameBoy Color BIOS ROM checks in order to execute Nintendo's lockout of unapproved ROMs.
Also the measurements earlier for CPU usage are now invalid, since the way opcodes are executed and memory is read and written to has changed.
It would probably be helpful if you kept older versions archived somewhere so that Moz developers can test specific optimization problems -- that's difficult when there's a moving target.
Yeah, I need to archive my revisions. It's sad that I don't even have an SVN, a GIT, or a CVS repository (Que in the laziness explanation). But overall performance should be getting better for JM, though TM seems to suck a lot when enabled. If JM can easily access any of over a thousand functions as properties of the this variable, then JM is set for speed, since the opcodes and memory functions were chewed down into functions that are now numeric properties of "this" that can be accessed like this[address + 0x10000] or this[op + 0x200].
Correction: Numerically addressed properties
Grant, can you _please_ stop the noise? I don't need all the email that's not really relevant to getting this bug resolved....
Bug seems to have been fixed since hueristics have landed. Firefox might not run it as fast as V8, but the specific topic of this bug has been resolved (trace jit slowing down method jit)
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.