Closed Bug 678939 Opened 13 years ago Closed 11 years ago

slower than V8 on code translated from C++ to JS

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: onangames, Assigned: jandem)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.35 Safari/535.1

Steps to reproduce:

you can check these 2 games
http://gunbros.glu.com
http://chrome.monsterdashgame.com/
these games are using mandreel technology to convert from c++ into js. 


Actual results:

Performance is not good


Expected results:

Have a better framerate. On my computer chrome 13 is 3 times faster
Please don't file bugs with inciting title/content or you will be banned from buzilla for violating the bug etiquette: https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
Both sites request access to my google account. Mildly suspicious.
you can see these games in the chrome web store
https://chrome.google.com/webstore/detail/cknghehebaconkajgiobncfleofebcog?hl=en-US&hc=hp&hcp=new
https://chrome.google.com/webstore/detail/ciamkmigckbgfajcieiflmkedohjjohh?hl=en-US&hc=hp&hcp=new

we are accessing your google account to store the date in the cloud

it's real :)

http://www.mandreel.com/

Gunbros.glu.com is asking for some information from your Google Account andreas.gal@gmail.com

[Allow] [No thanks]

Remember this approval

This got to be the least meaningful security dialog in the history of time. I tried the other link. That game displays "bad fopen" dialogs in a current Nightly.

If someone feels brave enough for this or wants to make a fake google account, we could look a bit more into this.
Summary: javacript engine → slower than V8 on C++ code translated from C++ to JS
Summary: slower than V8 on C++ code translated from C++ to JS → slower than V8 on code translated from C++ to JS
Version: 5 Branch → Trunk
With the first game, I registered a fake google account and granted the application access. It downloaded 140 MB of offline content, and then stopped doing anything besides animate a small running outline in the corner of the game. Refreshing the page a few times eventually yielded a single "bad fopen" dialog, a black rectangle, and no further output. This behavior occurred in both current Aurora and current Nightly, Linux x86_64.

With the second game, I got the same repeated "bad fopen" dialogs Andreas reports on Aurora, but was unable to try on a Linux x86_64 Nightly because the server began reporting (with the link provided):

Error: Server Error

The server encountered an error and could not complete your request.

If the problem persists, please report your problem and mention this error message and the query that caused it.
it works for me with firefox 5 on windows

don't know what's going on and it works on chrome as well

I didn't try a nightly build
it seems something changed on XMLHttpRequest

the response array buffer is not working any more

    var xhr = new XMLHttpRequest();  
		
    xhr.open("GET", full_name, false);  
	
    if(xhr.hasOwnProperty("responseType"))
       xhr.responseType="arraybuffer";
    else
       xhr.overrideMimeType('text/plain; charset=x-user-defined');
	 				
	
	try{
	xhr.send(null); 
	}catch(e)
	{
	dump('\nerror opening file ' + full_name + '\n');
	r_g0 = 0;
	return;
	}
	
	//dump(xhr.status);
	
	
	

	if (xhr.responseType=="arraybuffer")
		buffer=xhr.response;
	else if (xhr.mozResponseArrayBuffer != null)
		buffer = xhr.mozResponseArrayBuffer;  
	else
		alert('bad fopen');
I guess the problem is here

var xhr = new XMLHttpRequest();  
xhr.hasOwnProperty('responseType')


hasOwnProperty is not working properly, XMLHttpRequest does have responseType
This is way I solved the problem

I will let you know when I upload a new version of the games with this "fix"

if(xhr.hasOwnProperty("responseType"))
  xhr.responseType="arraybuffer";
else
{
  if (BrowserDetect.browser == 'Firefox' && BrowserDetect.version >=6)
    xhr.responseType='arraybuffer';
  else
    xhr.overrideMimeType('text/plain; charset=x-user-defined');
}
(In reply to onangames from comment #8)
> var xhr = new XMLHttpRequest();  
> xhr.hasOwnProperty('responseType')
> 
> hasOwnProperty is not working properly, XMLHttpRequest does have responseType

The problem is that responseType is on the prototype, but hasOwnProperty only checks if its on the object itself. See also bug 657550. To fix it you can change 

if (xhr.hasOwnProperty("responseType")) {

to this:

if ("responseType" in xhr) {
thanks,

I already changed it
tomorrow we will deploy new versions with the fix
onangames, can you please provide a console version of this code that is also unminified, so that it is easy to benchmark and debug this using the sm shell and d8?
Hi,

here you can download a bullet physics library benchmark demo
www.mandreel.com/demos/BenchmarkDemo.zip

it runs around 12 fps on chrome and around 4 fps on firefox

that's the performance we usually get on firefox, 3 times slower than chrome

let me know if you need anything else

we really want to make our games playable on firefox

Thanks
Miguel
We're spending time in the interpreter because analysis of some script fails (and therefore JM does not compile it). I will try to find out why...
Status: UNCONFIRMED → NEW
Ever confirmed: true
we are gonna release 4 more games this month and more to come

we really want the games work on firefox as well
if you need any help let me know

thanks,
Miguel
The problem is with this function: ZN8OpenGLES9OpenGLES213OpenGLESState17setCurrentProgramEv (4500 lines...)

JM and jsanalyze need to support IFEQX and TABLESWITCHX to compile this function. Fortunately the jsanalyze part of this is already done on the TI branch so I will just implement these ops in JM and see how much it helps here.
Miguel, meanwhile as a partial workaround you can try to not use |switch| in your generated code. Most of those switches look like they could be replaced by ifs anyhow. This would also make the generated code smaller so I don't see a downside there.

That leaves the IFEQX stuff though. I'm not familiar with it but guessing from the SM code it looks like an extended (big?) jump? Jan, is there some workaround for that, or will it always be generated in large functions?
indeed we can change that switch/case to if/else, most of the time it's just a switch/case with only one case

the IFEQX no idea, all that inline and big functions it's because we are using llvm-ld with optimizations on, it does a lot inlining, generating huge methods
(In reply to Alon Zakai (:azakai) from comment #17)
> Miguel, meanwhile as a partial workaround you can try to not use |switch| in
> your generated code. Most of those switches look like they could be replaced
> by ifs anyhow. This would also make the generated code smaller so I don't
> see a downside there.

The downside is that it's a bit less efficient because a table switch is optimized to a jump table. But yeah, we have to compile them first for this to work...

> That leaves the IFEQX stuff though. I'm not familiar with it but guessing
> from the SM code it looks like an extended (big?) jump? Jan, is there some
> workaround for that, or will it always be generated in large functions?

IFEQ and IFNE are used for if-statements and loops. These ops have an argument, the difference between the current PC and the target PC (jump offset). If this offset is very large the extended ops (IFEQX etc) are used. So to workaround them you need a smaller if/else/loop body... (note that this offset is relative, so smaller if-statements in a very large function should be okay)

@onangames, the best workaround here is to make the body of this function smaller: _ZN8OpenGLES9OpenGLES213OpenGLESState17setCurrentProgramEv. There's one huge if-statement:

        if (!(r2 == 0)) //_LBB1282_736
        {
            r2 = 0;
            r3 = r1 >> 2;
        //...

There's also one very large switch statement. This function has many deeply nested loops and switch-statements; would it be possible to create separate functions for some of these statements? We're currently unable to compile this function due to these two statements.

I will attach a patch to compile these ops, but it will take at least a few months before it ens up in a release build... And these super large functions have other problems, compiling them can be expensive.
For comparison, with Emscripten we see better results with not doing inlining in LLVM, instead we let closure compiler do inlining - it has a much better understanding of where it makes sense to do that in JS than llvm-ld I think. Hopefully that can help with these overly-large functions.
Attached patch PatchSplinter Review
Compile IFEQX, IFNEX, TABLESWITCHX and GOTOX + tests.
Assignee: general → jandemooij
Status: NEW → ASSIGNED
Attachment #553559 - Flags: review?(bhackett1024)
Comment on attachment 553559 [details] [diff] [review]
Patch

Review of attachment 553559 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/methodjit/Compiler.cpp
@@ +1644,5 @@
>            }
>            END_CASE(JSOP_IFNE)
>  
> +          BEGIN_CASE(JSOP_IFEQX)
> +          BEGIN_CASE(JSOP_IFNEX)

Can you combine these with the IFEQ/IFNE case above, and use GetJumpOffset?
Attachment #553559 - Flags: review?(bhackett1024) → review+
(In reply to Brian Hackett from comment #22)
> Can you combine these with the IFEQ/IFNE case above, and use GetJumpOffset?

Pushed with this fixed. Also had to add an PC += js_CodeSpec[op].length; and break, as IFEQ and IFEQX have different length.
 
http://hg.mozilla.org/projects/jaegermonkey/rev/427522c34b31
Hi,

we have been playing around with llvm-ld and llc, disabling some optimizations, but if we disable inline optimizations, overall performance is a lot of worse

by the way, you can try
http://chrome.monsterdashgame.com/
now it works, we fixed the problem with XMLHttpRequest
we have to check with our client to see if we can provide you a build with "readable" js
(In reply to onangames from comment #24)
>
> we have been playing around with llvm-ld and llc, disabling some
> optimizations, but if we disable inline optimizations, overall performance
> is a lot of worse
> 

Yes, inlining is crucial for good performance. But using closure compiler advanced opts to do inlining is usually better than letting LLVM do it, since it is tuned for different things. However maybe you can modify LLVM to inline less aggressively.
@onangames: we fixed some problems on the experimental TI branch. In case you want to try it out, here's a Windows build: ftp://ftp.mozilla.org/pub/firefox/tinderbox-builds/jaegermonkey-win32/1313683080/firefox-8.0a1.en-US.win32.zip

The FPS of the demo zip file should be better now. Can you confirm this? And how do you measure performance on http://chrome.monsterdashgame.com/? It does not display the FPS?
Hi Jan,

performance boot is really impressive from 1.5 fps on firefox 6 to 6/7 fps
chrome 15 canary build is 9/10 fps

on chrome we usually use --show-fps-counter to display fps
is there something like that on firefox?

congratualations
Hi Jan,

is it optimizing the switch? 

Miguel
(In reply to onangames from comment #27)
> performance boot is really impressive from 1.5 fps on firefox 6 to 6/7 fps
> chrome 15 canary build is 9/10 fps

Nice. It will take at least a few months for this to appear in a stable Firefox release though, so I'd still try azakai's suggestion of using Closure Compiler.

Clang seems to respect __attribute__((noinline)). Would it help to add this attribute to some (large?) functions to prevent inlining them in _ZN8OpenGLES9OpenGLES213OpenGLESState17setCurrentProgramEv and similar functions? So instead of

static int test() { ... }

You could do something like this:

#ifdef COMPILING_TO_JS
#define NEVER_INLINE_JS __attribute__((noinline))
#else
#define NEVER_INLINE_JS
#endif

static NEVER_INLINE_JS int test() {  ... }

> on chrome we usually use --show-fps-counter to display fps
> is there something like that on firefox?

I don't know..

(In reply to onangames from comment #28)
> is it optimizing the switch? 

Yeah it should be optimizing the very large switch I mentioned in comment 19.
Hi Jan,

closure doesn't work with mandreel, it breaks the javascript. We will have to check this with google team

so this new js engine is gonna be ready for firefox 8?

Miguel
I think it is (tentatively) planned for FF9, not 8.
Hi,

we already prepared a new performance test
this time uncompressing pvr textures

this is the link for the js implementation
http://www.mandreel.com/demos/test1/index.html

this is the link for the js implementation
http://www.mandreel.com/demos/test1/indexFlash.html


on my machine, i7
chrome : 2500
flash: 1900
firefox nightly 11.0a1 (2011-11-22): 5300

Thanks,
Miguel
Miguel, it's very hard to see what JS code actually runs in that link. The JS files included on the actual web page don't seem to be the important ones - I assume the important code is loaded later somehow.

Without seeing the source, it's hard for us to help you and/or find ways to make Firefox faster on this code.

Any chance you can provide a clearer example?
here you have the generated js

http://www.mandreel.com/demos/test1/mandreel.js

and this is the troublespot _ZN15chanka_test_pvr15PVRTCDecompressEPKviiiPh

let me know if you need anything else
Miguel, I have some comments about your generated code. For example,

  function fmod(sp)
  {
  var value = heapDouble[sp>>3];sp+=8;
  var value2 = heapDouble[sp>>3];
	f_g0 = value % value2;
  }

1. If instead of sp += 8, you just did (sp+8) in the second appearance of sp, I think this could run faster in principle. sp += 8 means the variable itself will be incremented, while (sp+8) means that value will be calculated just for its use, so the compiler has more room to optimize.

2. Passing around a stack pointer and reading all arguments from memory (and writing them to memory beforehand) is almost certainly slower than passing around normal JS arguments. JS engines have been optimized for normal JS arguments, various optimizations should work better that way I would expect.

Regarding

        r7 = (r0 + 96)&-1;
	heap32[(fp+-331)] = r7;

3. The assignment to r7 is unnecessary (it isn't re-used later). This happens a lot in the critical function. Emscripten btw has a tool that can eliminate similar things (eliminator.coffee - but it works on variables, not 'registers'), perhaps you can adapt that to help here.

4. In general, I think I already mentioned that I am skeptical of the register approach you have. You create a very large number of variables on the stack that way (bad), and I assume values can end up being spilled to memory (also bad). Instead, you can just use normal JS variables and let the JS engine do actual register allocation. Also, you should probably consolidate variables (closure compiler advanced opts can do that for you - and actually that might work well enough even with your current register allocation).

All of this is general stuff, I don't know if it can explain why Chrome is faster on this benchmark. It might be though that Chrome is better at something that your code relies on, for example reading/writing to typed arrays, while Firefox is better at other things; the optimizations I mentioned before did seem to indicate that there are unnecessary memory accesses in your generated code.
Hi,

thanks for your comments

1) the thing using sp+=8, it's just for clarity, it's not in the critical functions
2) that's something it's in the todo, but we are aggressively inlining functions I don't think in this case it could help a lot
3) we changed the target generator to support this heap32[(fp+-331)] = (r0 + 96)
and no performance gain
4) the register approach in this case only uses 25 registers, there is no spill
Aggressively inlining functions is something that was discussed previously in this bug. That can probably explain the problem here, I have seen it in several other benchmarks (broadway h264, bullet, for example).

Right now, Firefox's type inference can be slow on large functions, because it ends up recompiling the entire thing. There are bugs open on it, for example bug 698822 and dependencies.

To verify that this is the problem, disable inlining or make it much less aggressive. I expect this will make you *much* faster on Firefox nightly.

Two minor additional comments:

1. Even if you have no spilling of registers, there is a cost to defining and using 25 local variables in a function. Coalescing them can speed things up (again, closure advanced does this for example).
2. Looks like you create the heapX's in a function. If you create them instead in the top level, then in theory that can help global static inference (it can see they are definitely created, and only created once, so it knows what it will get wherever they are used). I don't know if right now there is optimization for that, but it's worth trying (if there isn't right now, I expect there will be in the future).
Last activity here was over a year ago and a *lot* has happened in the meantime, both in the JS engine and I assume Mandreel. For instance we are now faster than v8 on Octane-mandreel.

Please file a new bug with updated tests if you still see this problem.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: