Last Comment Bug 557423 - WebGL quake "too slow to be supported" in Firefox
: WebGL quake "too slow to be supported" in Firefox
Status: NEW
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 9 votes (vote)
: ---
Assigned To: Mike Shaver (:shaver -- probably not reading bugmail closely)
:
Mentors:
http://code.google.com/p/quake2-gwt-p...
Depends on: 692180 592833
Blocks:
  Show dependency treegraph
 
Reported: 2010-04-05 20:34 PDT by Mike Shaver (:shaver -- probably not reading bugmail closely)
Modified: 2012-03-27 11:35 PDT (History)
34 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
add "webgl" canvas-id (988 bytes, patch)
2010-06-21 07:14 PDT, Benoit Jacob [:bjacob] (mostly away)
vladimir: review-
Details | Diff | Review
quake2-gwt-port changes (12.59 KB, patch)
2010-06-21 08:10 PDT, Barak Gross
no flags Details | Diff | Review
Updated diff (23.29 KB, patch)
2010-06-21 15:52 PDT, Barak Gross
no flags Details | Diff | Review
latest diff (28.91 KB, patch)
2010-06-22 04:14 PDT, Barak Gross
no flags Details | Diff | Review
screenshot (240.64 KB, image/jpeg)
2010-06-22 04:27 PDT, Barak Gross
no flags Details
screenshot: quake2 scene in minefield (62.19 KB, image/jpeg)
2010-06-22 16:14 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
latest diff, keeping OBF (recommended if you just want to play) (42.44 KB, patch)
2010-06-23 06:10 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Review
Perf (profiler) results for quake2 benchmark (130.07 KB, text/plain)
2010-06-23 08:43 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Perf results, more detailed, less readable (291.21 KB, text/plain)
2010-06-23 08:54 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Raw perf output, compressed (907.20 KB, application/octet-stream)
2010-06-23 09:05 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Raw perf data with call-graph recording (1019.19 KB, application/x-xz)
2010-06-23 12:26 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Perf call-graph (plain text, ASCII art) (82.19 KB, application/x-xz)
2010-06-23 12:31 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
:) (157.75 KB, image/jpeg)
2010-06-24 13:08 PDT, Barak Gross
no flags Details
latest diff (30.55 KB, patch)
2010-06-24 17:06 PDT, Barak Gross
no flags Details | Diff | Review
Profiler simple results against tracemonkey branch (126.24 KB, text/plain)
2010-07-16 12:09 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Profiler call graph against tracemonkey branch (76.03 KB, application/x-xz)
2010-07-16 12:11 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Patch to address texImage2D changes (2.89 KB, patch)
2010-08-15 12:15 PDT, Barak Gross
no flags Details | Diff | Review
Shader valdation patch (517 bytes, patch)
2010-08-22 10:15 PDT, Barak Gross
no flags Details | Diff | Review
Windbg log of the crash (216.87 KB, text/plain)
2010-09-01 11:58 PDT, Barak Gross
no flags Details
Windbg log of the latest nightly (546.73 KB, text/plain)
2010-09-01 12:31 PDT, Barak Gross
no flags Details
Backtraces for first few threads (9.82 KB, text/plain)
2010-09-01 12:32 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
Backtraces for all threads when crash occurs (481.88 KB, text/plain)
2010-09-01 12:38 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
tracing thread creation (47.74 KB, text/plain)
2010-09-01 12:54 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details

Description Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-04-05 20:34:38 PDT
"Comment by stefan.haustein,  Apr 02 (3 days ago)

@jeremycalvert93: We would love to support FireFox. I think we would not have WebGL today without Vladimir Vukićević's pioneering work on Canvas 3D and WebGL. Actually FireFox was the first browser to support Canvas 3D: http://blog.vlad1.com/2007/11/26/canvas-3d-gl-power-web-style/

The main problem with FireFox is JavaScript performance: While WebGL shifts a lot of the work to the graphics card, there is still a significant amount of JavaScript code that needs to be executed. The frame rates we saw when doing experiments with FireFox were just too low to make the game playable."

To the bat-profiler, I guess.
Comment 1 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-04-05 20:40:43 PDT
Oh sweet jeebers, you have to compiler a bunch of Java stuff.  I'll...sit on this for a while.
Comment 2 Andreas Gal :gal 2010-04-05 21:06:52 PDT
Yeah, its GWT, a JavaScript interpreter running Java bytecode.
Comment 3 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-04-05 21:11:44 PDT
I think GWT produces JS as output, not an interpreter running JVM bytecode.  But the server side of it is a bunch of Java that I have to set up an environment for, which will take some cycles.
Comment 4 David Mandelin [:dmandelin] 2010-04-06 13:29:42 PDT
Could we get them to test it? Send them a debug build and have them send back TM spew and/or TraceVis results?
Comment 5 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-04-06 13:34:21 PDT
Or just point us to a public url where the thing is set up....
Comment 6 Robert Sayre 2010-04-07 07:55:35 PDT
I followed their build instructions and got the Jetty server up and running, but it says my WebGL support is turned off. I think I have the right prefs... which is it?
Comment 8 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-04-07 08:00:10 PDT
webgl.enabled_for_all_sites should be set to true.  Does the page at https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/webgl-conformance-tests.html say "WebGL not enabled" in red near the "run tests" button at the top?

(I don't recommend running the tests, since the first one I clicked crashed me. :-P )
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2010-06-16 09:01:44 PDT
@ Robert Sayre especially:

quake2-gwt does *not even try* to create a WebGL context before saying "WebGL Support Required". I know that because I set breakpoints in our WebGL initialization code, and they are never reached when loading quake2-gwt, while of course they are reached by every WebGL-using page.

So at least this particular error message, "WebGL Support Required", is erroneous. There must be other reasons why quake2-gwt doesn't run in minefield, for example web sockets or something like that.

I filed a bug report but didn't get a reply:
http://code.google.com/p/quake2-gwt-port/issues/detail?id=26

In any case I'm not happy about this erroneous error message that gives our WebGL implementation a bad name, and the above-quoted comment on our JS performance doesn't hold water either, because:
 * if they were able to test then why aren't _we_ able to test?
 * it's not like we're 10 times slower than google, so if it's playable for them then (on good enough hardware) it must be playable for us.
Comment 10 Barak Gross 2010-06-21 05:49:04 PDT
After seeing Benoit's comment I was curious to see where exactly in the code Firefox bails out, what I've found is the following:
First off- the messages API that the project uses (while working fine with Chrome) doesn't work at all with Firefox so debugging the code was very hard initially - using the title bar to "throw out" messages (later on, I added the gwt-log module which lets the project throw messages in Firebug as well as Chrome which seems to be working well now).

After some poking in to the code, I saw that it fails in:
gl = WebGL.getContext(canvas, JavaScriptObject.createObject()) (src/jake2/gwt/client/WebGLAdapter.java, line:117).

which goes into

    var names = ["webgl", "moz-webgl", "webkit-webgl", "webkit-3d", "experimental-webgl"];
    for (var i = 0; i < names.length; i++) {
      var ctx = canvas.getContext(names[i], options);
      if (ctx != null) { 
          return ctx;
      }
    }
    return null;

src/com/google/gwt/corp/webgl/client/WebGL.java , line:420

As you can see, the code assumes that if the returned value from getContext != NULL it's a valid value.
After consulting with Benoit I changed the order of the possible canvas ids so that "experimental-webgl" and "moz-webgl" were be the first to be initialized and it worked (I guess firefox returns something non-null once it's not the correct id?) :)

After that hurdle it seemed to get along a little further- initializing resource loader, compatibility stuff (had to disable some test there so it'll pass that part), sound, web sockets.
But then it crashes in initializing some of the basic routines (initialization, shutdown, frame generation) so I had to fix logging to understand where the other problems are.
After I fixing the logging I have the following problems (compared to latest Chrome nightly running on my computer, I'd use the term running loosely since it also doesn't run):

* Config files aren't loaded properly (Web Storage API problems?)
* WebGL info such as:
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce 8600M GT/PCI/SSE2
GL_VERSION: OpenGL ES 2.0 Chromium

Don't load up properly in Firefox, a number of javascript errors appear in the console instead

* After that the same errors that appear in Chrome appear in Firefox- a bug was opened about it in the project's website (http://code.google.com/p/quake2-gwt-port/issues/detail?id=25) and supposedly fixed but it still doesn't seem to work on my pc...
The errors are:
at Unknown.$Kd((Unknown source:0)
at Unknown.yOd(Unknown source:0)
at Unknown.vFb(Unknown source:0)
at Unknown.Ecb(Unknown source:0)
at Unknown.C_d((Unknown source:0)
at Unknown.tPd((Unknown source:0)
etc...

Would you like me to post the changes I made so it'll get this far? if so, where to and how?
Also, I'm a CS student (in between tests) so I probably won't have any more time to spend on this in the near future (couple of weeks) - sorry :)
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2010-06-21 07:13:34 PDT
Adding Vlad in CC...

> After consulting with Benoit I changed the order of the possible canvas
> ids so that "experimental-webgl" and "moz-webgl" were be the first to
> be initialized and it worked (I guess firefox returns something non-null
> once it's not the correct id?) :)

Strange. I tested some JS trying to create a canvas with a wrong id, and it did return null:

    canvas = document.createElement("canvas");
    var context = null;
    context = canvas.getContext("somenonsensicalcanvasid");

So I don't understand why this isn't happening with quake2-gwt. On the other hand, what would people here think about supporting the "webgl" canvas id? Is it important to wait for the webgl 1.0 release to do that? I'm attaching a patch doing that.

> But then it crashes

Just a clarification for other people reading this - in private email, Barak told me he means that quake2-gwt stops running here, not that Firefox itself crashes.

> * WebGL info such as:
> GL_VENDOR: NVIDIA Corporation
> GL_RENDERER: GeForce 8600M GT/PCI/SSE2
> GL_VERSION: OpenGL ES 2.0 Chromium
> 
> Don't load up properly in Firefox, a number of javascript errors
> appear in the console instead

That is certainly because of the recent WebGL spec change, basically getString() got removed from the API and instead this is now done with getParameter(), so quake2-gwt probably needs to be updated to call gl.getParameter(gl.VERSION) instead of gl.getString(gl.VERSION). Apparently, Chromium is still supporting the old API for this, but either they also support the new API or they very soon will.

Notice that here Chromium is basically telling the Web the name of your graphics card -- something they too will probably need to fix ;) We're just returning "Mozilla" here to protect your privacy.

> Would you like me to post the changes I made so it'll get this far?
> if so, where to and how?

For now, you could attach here the diff of your changes to quake2-gwt. Changes that are mozilla-specific debugging hacks should go here. If you make changes that are potentially interesting to quake2-gwt devs, send them to them!

To generate your diff:
    cd quake2-gwt-port
    hg diff > somefile.diff
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2010-06-21 07:14:44 PDT
Created attachment 452714 [details] [diff] [review]
add "webgl" canvas-id

This adds "webgl" as a canvas id that we support.
Comment 13 Barak Gross 2010-06-21 08:10:32 PDT
Created attachment 452725 [details] [diff] [review]
quake2-gwt-port changes

This patch mostly adds debugging messages in the code.
It also adds firebug output to console, I didn't write it in my original post but the method for sending messages to the browser console (for some unknown reason, maybe an oversight) also in some instances sends messages to the server console, meaning some of the server's console messages appear twice and in a little garbled mode- Not that big of an issue for now :)
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2010-06-21 08:15:14 PDT
Great. I see that your patch also has the change making it test "experimental-webgl" first, so one can test stuff without applying my patch to firefox.
Comment 15 Barak Gross 2010-06-21 11:49:27 PDT
Yeah, I guess I wasn't clear enough in comment 13- the attached patch includes everything I've talked about in the previous comment :)
Comment 16 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-21 14:05:03 PDT
Comment on attachment 452714 [details] [diff] [review]
add "webgl" canvas-id

No, it can't be called "webgl" until we have WebGL 1.0; all the browsers agreed to support "experimental-webgl" until then (and do so) -- is something not working there?

The bug actually seems to be that we throw when we get an invalid context string, instead of returning null.  That's an easy fix.
Comment 17 Barak Gross 2010-06-21 14:17:36 PDT
Well, I did as Benoit suggested and added the getParameter() method and used it instead of getString() - it makes Firefox show the GL info:
GL_VENDOR: Mozilla
GL_RENDERER: Mozilla
GL_VERSION: WebGL 1.0
GL_EXTENSIONS: null

But breaks Chrome:
GL_VENDOR: null
GL_RENDERER: null
GL_VERSION: null
GL_EXTENSIONS: null

Which means the Chrome guys aren't supporting this yet, right?
Looking at the code, I can see that GL_EXTENSIONS are supposed to be filled with all my GL extensions but I guess that isn't working in both Chrome (see the comment I wrote before) and Firefox.
The code retrieves it with getParameter(GLAdapter.GL_EXTENSIONS), GLAdapter.GL_EXTENSIONS == 0x1F03 - is that the "correct" way?
My video cards is Opengl 3.2 compatible and an external extensions viewer (no glxinfo type prog in windows right?) reports a lot of extensions- so it definitely shouldn't be null :)

Any ideas?
Comment 18 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-21 14:39:36 PDT
(In reply to comment #17)
> Well, I did as Benoit suggested and added the getParameter() method and used it
> instead of getString() - it makes Firefox show the GL info:
> GL_VENDOR: Mozilla
> GL_RENDERER: Mozilla
> GL_VERSION: WebGL 1.0
> GL_EXTENSIONS: null
> 
> But breaks Chrome:
> GL_VENDOR: null
> GL_RENDERER: null
> GL_VERSION: null
> GL_EXTENSIONS: null
> 
> Which means the Chrome guys aren't supporting this yet, right?
> Looking at the code, I can see that GL_EXTENSIONS are supposed to be filled
> with all my GL extensions but I guess that isn't working in both Chrome (see
> the comment I wrote before) and Firefox.
> The code retrieves it with getParameter(GLAdapter.GL_EXTENSIONS),
> GLAdapter.GL_EXTENSIONS == 0x1F03 - is that the "correct" way?

Nope, it should be null -- there are no WebGL extensions defined yet, so none are listed.  Though, perhaps, an empty array is more correct here rather than null.
Comment 19 Barak Gross 2010-06-21 15:52:41 PDT
Created attachment 452887 [details] [diff] [review]
Updated diff

You were right, that wasn't the problem here.
The problem was that gwt-quake expects the returned GL_VERSION to be only numbers and hits an exception when trying to convert it to float (btw- the same problem exists in Chrome and the fix posted on the project website by users doesn't work for Chrome either).
I've fixed it to cut everything and now (after a few other fixes as well) it goes a little further- it runs the game's inner loop (you can see in the messages it responds to browser window resizes) but it still doesn't render properly- I figured it's because it doesn't load the game's config files properly (like I wrote earlier) but for some odd reason it now loads the game's config files! :)

I'd post the exception it throws but since it's gwt it won't mean much to anyone (method names translated by gwt are random letters- to make it compress better) - to get to the bottom of this someone will have to debug the hell out of this thing :)

Like I said- for the next couple of weeks I'll have absolutely no free time for this so I'm leaving this patch for prosperity :)
Comment 20 Barak Gross 2010-06-21 15:54:12 PDT
note:
to cut everything = to cut everything other than the version numbers
Comment 21 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-21 16:11:19 PDT
It really shouldn't care what the version number is -- I don't think there's anything useful with that information that it can do currently anyway.  What's the exception that's generated?
Comment 22 Barak Gross 2010-06-21 16:41:42 PDT
I agree- couldn't see anywhere the version number is actually being used, I fixed it because it's a very easy fix and since it affects Chrome as well as Firefox.
The exception is:

com.google.gwt.core.client.JavaScriptException: (TypeError): yqb is null stack: Dcb(394)@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:7837 B_d(394)@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:8053 FPd()@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:7327 bab()@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:5561 (393)@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:6101 BB((function () {a.ee();}),[object XPCCrossOriginWrapper],[object Object])@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:6384 (393)@http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html:5900 fileName: http://10.0.0.66:8080/gwtquake/516096A248E688D17A636C1E78AE5D89.cache.html lineNumber: 7837
at Unknown.B_d(Unknown source:0)
at Unknown.FPd(Unknown source:0)
at Unknown.bab(Unknown source:0)
at Unknown.anonymous(Unknown source:0)
at Unknown.ee();}),(Unknown source:0)
at Unknown.anonymous(Unknown source:0)

Like I said- you can't really make sense of the "translated" code :)
Comment 23 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-21 16:45:03 PDT
oh, I thought you were getting an xpcom exception.  Still, TypeError is interesting there -- "yqb is null" is the actual error.  So, if you want to do some spelunking, take a look at what's on that line and see how yqb is being used.  Don't they have some way to debug this goop?
Comment 24 Barak Gross 2010-06-21 17:27:10 PDT
There actually is- gwt can be passed an argument to build the methods with meaningful names - http://java.dzone.com/news/understanding-gwt-compiler.
This project uses the Maven build system and I couldn't gwt to work by passing these argument or by setting it in it's config files (http://mojo.codehaus.org/gwt-maven-plugin/compile-mojo.html) - I probably missed something obvious...
Oh, according to what I read enabling these modes also outputs to Firebug's console so the change I made (so that Firebug output works) wouldn't be needed anymore :)
Comment 25 Barak Gross 2010-06-22 04:14:52 PDT
Created attachment 453036 [details] [diff] [review]
latest diff

OK, so- I've flipped on DETAILED mode which gave me the exact place this failed (I forgot to set some variable, silly me :)) and fixed the part that was failing.
It now gets into the main menu of the game, when I tried to start a new game it failed on trying to create/load a save game- I've disabled that part but the game still won't load.
Looking back, I can see that after the graphics initialization it throws a glGetEror() = 0x500.
As I understand it that means GL_INVALID_ENUM which shouldn't cause the game to not work...
Anyway, I'm done with messing with this for a while, a few pointers to anyone brave enough to touch this in the mean time:
* Looks like previous versions of Chrome did output the GL Extensions and by parsing that the game knew which rendering options to load (it is after all originally based on OpenGL), what I've done is to set only the flags that seem necessary for the game to run (all of this hackery is done in /src/jake2/render/gl/Main.java, line:1277) - it is very possible that rendering doesn't work because of these flags not set correctly :)
* The comment I made above about a save game not created/loaded is probably due to some deeper problem with saving / loading files

Good luck :)
Comment 26 Barak Gross 2010-06-22 04:27:50 PDT
Created attachment 453037 [details]
screenshot
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2010-06-22 05:28:19 PDT
(In reply to comment #25)
> Looking back, I can see that after the graphics initialization it throws a
> glGetEror() = 0x500.
> As I understand it that means GL_INVALID_ENUM which shouldn't cause the game to
> not work...

This rings a bell: I saw code in quake2-gwt throwing an exception on every GL error. So the next step is to disable that.

Many thanks for your work - seeing the main game menu is a huge progress already.




> Anyway, I'm done with messing with this for a while, a few pointers to anyone
> brave enough to touch this in the mean time:
> * Looks like previous versions of Chrome did output the GL Extensions and by
> parsing that the game knew which rendering options to load (it is after all
> originally based on OpenGL), what I've done is to set only the flags that seem
> necessary for the game to run (all of this hackery is done in
> /src/jake2/render/gl/Main.java, line:1277) - it is very possible that rendering
> doesn't work because of these flags not set correctly :)
> * The comment I made above about a save game not created/loaded is probably due
> to some deeper problem with saving / loading files
> 
> Good luck :)
Comment 28 Benoit Jacob [:bjacob] (mostly away) 2010-06-22 15:02:35 PDT
I applied your 'latest diff' but I don't see anything; I get this error:

JavaScript error: http://localhost:8080/gwtquake/7245D2C92A086CFA7731D30F48655FD3.cache.html, line 734: console is not defined

The corresponding source code is:

console.log(this.com_google_gwt_corp_compatibility_ConsolePrintStream_buf.java_lang_StringBuilder_impl.com_google_gwt_core_client_impl_StringBufferImplAppend_string);

How to fix that? Sorry, i'm very ignorant about Java and JS.
Comment 29 Barak Gross 2010-06-22 15:07:27 PDT
That's alright, I'm pretty ignorant at JS as well :)
That message appears when the Firebug console isn't enabled... 
Do you have Firebug installed? if not you need to install it from http://getfirebug.com/downloads (latest alpha versions), open it and enable console...
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2010-06-22 16:13:19 PDT
OK, thanks, once this was fixed, I got the game actually loading !!!! It just took veeeeery long. Should retry with a release build, instead of debug --- that's going to make a huge difference here.

Here is a screenshot!
Anyone happy about this, should thank Barak!
Comment 31 Benoit Jacob [:bjacob] (mostly away) 2010-06-22 16:14:07 PDT
Created attachment 453225 [details]
screenshot: quake2 scene in minefield
Comment 32 Benoit Jacob [:bjacob] (mostly away) 2010-06-22 16:55:17 PDT
Just tried with a release build of Minefield, and it's already playable!
Comment 33 Nicholas Nethercote [:njn] (on vacation until July 11) 2010-06-22 18:09:24 PDT
Benoit, Barak:  awesome work!
Comment 34 Barak Gross 2010-06-23 00:23:01 PDT
Great to hear :)
What did you change to make it get past the main menu?
Did you change back from DETAILED mode (back to OBF)? it probably also has a lot to do with the current slowness of the game...
Comment 35 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 04:22:40 PDT
The only difference between what you told me and what I did, is that I commented out all the console... lines in the generated HTML, as I had not yet read your reply about this requiring the firebug console to be opened.

Also, note that with a debug build of firefox, it takes a good minute or two from the time you launch a game in the main menu, to the time you actually start playing. With a release build of firefox, this goes much faster.
Comment 36 Barak Gross 2010-06-23 05:03:38 PDT
I see, well- I'm using a nightly build from the mozilla-central trunk builds (http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/)- any idea if those are built in release or debug mode?

I tried to wait a few minutes but the game still won't load, I think the difference is that I'm running the browser in a Windows box and the server is running in a Linux VM (since quake-gwt server won't compile/run on Windows)- while you're obviously running both on the same box- I'm not sure if my configuration is even supported in quake-gwt...
The exceptions I get are about load/save file permission problems on the Linux box- so it sounds plausible that the "server/client on different environments" issue is causing this.
I've tried connecting via the multiplayer option but the default config tries to connect to the localhost, and when I manually try to connect to the VM's ip address the server won't accept my connection... oh well, at least it works on your end :)

If anyone else cares to try this, I'd love to hear more success/fail stories :)
And like I said in my previous comment- changing the build options back from DETAILED mode to OBF mode (in the diff, see the second change in pom.xml)  means you won't have to install Firebug to get this working...
Comment 37 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 05:08:05 PDT
Indeed, I am running both client and server on the same machine.

With a colleague at Mozilla, we tried letting  him use my server from his machine. After unblocking port 8080 on my machine, he could access my server from his on the local network. He then tried 2 machines. With his windows one, all he got was "WebGL support required". But with his Mac, he could actually play!
Comment 38 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 05:10:13 PDT
Ah, thanks for the DETAILED/OBF idea, I had overlooked this!
Comment 39 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 05:42:55 PDT
I tried the OBF config, by removing the corresponding line from your diff, that didn't work, even on a clean build directory. Still the same error about 'console' being undefined. I then tried with firebug console open: still the same error!

So I redid my trick about removing manually the 'console' thing from the generated HTML, namely this file (only 1 file matches this pattern):

    war/gwtquake/*.cache.html

and that fixed again the game.
Comment 40 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 06:08:12 PDT
The benchmark is running! However, it seems to be running many times and I was not patient enough to wait for it to complete all its runs. Here's the result from the 1st run, which was the best out of 4:

632 frames, 71 seconds, 8.9 FPS

This is largely independent of the size of the windows; fwiw, it was 1324x768.
Comment 41 Barak Gross 2010-06-23 06:10:02 PDT
That's weird :)
could you check that the file /maven-build/client/pom.xml has <style>OBF</style> instead of <style>DETAILED</style> - just to make sure that the diff was applied correctly...
Firebug seems to have a "work/no-work" relationship with the latest builds- seems to work fine, close and open Firefox and for some reason it stops working... what usually fixes that for me is a reset it's settings/uninstall/reinstall loop.
Could you try and hold off on setting the style back and try to get it work with the Firebug console on a windows machine? I'd love to see if that box has the same problem I'm having (I get a NoFileFoundException, trying to open a RandomAccessFile)
Comment 42 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 06:10:50 PDT
Created attachment 453365 [details] [diff] [review]
latest diff, keeping OBF (recommended if you just want to play)
Comment 43 Barak Gross 2010-06-23 06:11:42 PDT
That's weird = about console, not about the benchmark :)
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 06:35:08 PDT
HOWTO play Quake2 in Firefox:

Prerequisites: you need a Linux or Mac box, and some packages installed: Lame, Ogg Vorbis, Java. These instructions are adapted from
  http://code.google.com/p/quake2-gwt-port/wiki/BuildingAndRunning

Check out the Quake2 code:

  $ hg clone https://quake2-gwt-port.googlecode.com/hg/ quake2-gwt-port
  $ cd quake2-gwt-port

Now apply Barak's patch, the version keeping OBF: go to
  https://bug557423.bugzilla.mozilla.org/attachment.cgi?id=453365
and save as diff-keeping OBF, and apply it:

  $ patch -p1 < /path/to/diff-keeping-OBF

Now build quake2 and install the resources (will download them for you).

  $ ./build-dedicated-server
  $ ./install-resources
  $ cp -r maven-build/server/target/gwtquake/war/gwtquake war

Note, the 'cp' line above is really needed for now (explained on above-linked install instructions).

Now it where I have to remove the console.log(...) calls. I use this command:

  $ find war/gwtquake/ -name *.cache.html | xargs sed -i 's/console\.log([^)]*);*//g'

Now you should have a working quake2, just launch the server:

  $ ./run-dedicated-server

Lauch a RELEASE BUILD of Minefield and navigate to:

  http://localhost:8080/GwtQuake.html
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 06:37:29 PDT
(In reply to comment #41)
> That's weird :)
> could you check that the file /maven-build/client/pom.xml has
> <style>OBF</style> instead of <style>DETAILED</style> - just to make sure that
> the diff was applied correctly...

Yes, I checked. In OBF, there are only 3 console.log() calls to remove, but I still have to remove them.

> Firebug seems to have a "work/no-work" relationship with the latest builds-
> seems to work fine, close and open Firefox and for some reason it stops
> working... what usually fixes that for me is a reset it's
> settings/uninstall/reinstall loop.
> Could you try and hold off on setting the style back and try to get it work
> with the Firebug console on a windows machine?

Sorry, I don't have a windows setup at the moment. But the quake2-gwt wiki says windows is not supported anyway.
Comment 46 Barak Gross 2010-06-23 06:49:04 PDT
Windows isn't supported as a server, should work as a client...
I don't really understand the reason behind my problem but I'm just ecstatic this works on anything :)
Comment 47 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 08:43:42 PDT
Created attachment 453388 [details]
Perf (profiler) results for quake2 benchmark

Here is the output of 'perf' (the new linux kernel profiler).

Quick summary: we're spending most of our time in the JS engine, very little time in WebGL. That can be explained by the fact that Quake2, released in 1997, does a lot of stuff in software, it doesn't take full advantage of GL.

I hope that this inspires our JS gurus to make us run faster! There seems to be room for improvement: I get only 8.9 FPS on my big Core i7 @ 1.6 GHz; I think to have read somewhere that Chromium gets much better framerates.

*** How this was generated ***

On a release build of minefield:
  $ perf record -f -- dist/bin/firefox -P test-release -no-remote
Then go to quake2-gwt, run the benchmark, let it complete two runs, quit firefox, then do:
  $ perf report --sort symbol | c++filt  > ~/quake2-perf

*** More analysis of the results ***

The symbols accounting for more that 1% of execution time are:

    33.45%  [.] js_Interpret
     5.88%  [.] JSScopeProperty::trace(JSTracer*)
     4.77%  [.] js_TraceObject
     4.46%  [.] js_CallGCMarker(JSTracer*, void*, unsigned int)
     4.25%  [.] 0x000038200802bf
     3.39%  [.] 0x0000383bc6ba20
     3.01%  [.] JSScope::searchTable(long, bool)
     2.15%  [.] gfxUtils::PremultiplyImageSurface(gfxImageSurface*, gfxImageSurface*)
     2.09%  [.] js_LookupPropertyWithFlags
     1.55%  [.] js_GetPropertyHelper
     1.43%  [.] JS_TraceChildren
     1.30%  [.] js::SweepScopeProperties(JSContext*)
     1.27%  [.] JSScope::changeTable(JSContext*, int)
     1.22%  [.] js_NativeGet

This is mostly JS stuff.

Where are the WebGL symbols? Here are the top 10:

$ grep WebGL ~/quake2-perf | head -10
     0.15%  [.] mozilla::WebGLContext::QueryInterface(nsID const&, void**)
     0.09%  [.] mozilla::WebGLContext::ValidateBuffers(unsigned int)
     0.08%  [.] mozilla::WebGLContext::AddRef()
     0.06%  [.] nsICanvasRenderingContextWebGL_Uniform1i(JSContext*, unsigned int, long*)
     0.04%  [.] int xpc_qsUnwrapThis<nsICanvasRenderingContextWebGL>(JSContext*, JSObject*, JSObject*, nsICanvasRenderingContextWebGL**, nsISupports**, long*, XPCLazyCallContext*)
     0.04%  [.] mozilla::WebGLUniformLocation::QueryInterface(nsID const&, void**)
     0.04%  [.] nsICanvasRenderingContextWebGL_DrawArrays(JSContext*, unsigned int, long*)
     0.03%  [.] unsigned int xpc_qsUnwrapArg<nsIWebGLUniformLocation>(JSContext*, long, nsIWebGLUniformLocation**, nsISupports**, long*)
     0.03%  [.] mozilla::WebGLContext::Uniform1i(nsIWebGLUniformLocation*, int)
     0.03%  [.] mozilla::WebGLContext::Release()

Not only this is very little, but much of that is actually XPCOM stuff!

In conclusion, to make Quake2 go faster, we need to look at the JS side of things, not the WebGL side. As you guys had apparently guessed in the very first comments on this bug :)
Comment 48 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 08:54:14 PDT
Created attachment 453392 [details]
Perf results, more detailed, less readable

This version is more detailed, it has been generated by this command:

perf report --sort comm,dso,symbol | c++filt  > ~/quake2-perf-moreinfo
Comment 49 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 09:05:14 PDT
Created attachment 453397 [details]
Raw perf output, compressed

Here's the raw output of 'perf', compressed.

Uncompress it:
  $ bunzip2 perf.data.bz2

Then you can generate reports for yourself with 'perf report'.
Comment 50 Andreas Gal :gal 2010-06-23 09:06:23 PDT
Amongst others we GC ourselves to death here. We are working on that part. Great work everyone. Fun to watch this bug make progress.
Comment 51 Andreas Gal :gal 2010-06-23 09:29:35 PDT
We should try this with the fatval branch.
Comment 52 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-06-23 09:32:42 PDT
Can we please get bugs blocking this one filed on at least the following issues:

1) 15% or more is spent in GC, based on comment 47.
2) We're somewhat failing to trace here (30+% in js_Interpret!).
Comment 53 Andreas Gal :gal 2010-06-23 09:36:04 PDT
bz, if you guys want to try the fatval branch, I wouldn't be surprised if the GC stuff is doubles.
Comment 54 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-06-23 09:47:50 PDT
Does browser run on fatval branch?  If so, are there nightlies?  If not, can you point Benoit at the right thing to pull?

I still think it's worth the separate bugs to keep issues straight.
Comment 55 David Mandelin [:dmandelin] 2010-06-23 10:45:01 PDT
(In reply to comment #54)
> Does browser run on fatval branch?  If so, are there nightlies?  If not, can
> you point Benoit at the right thing to pull?
> 
> I still think it's worth the separate bugs to keep issues straight.

The fatval browser works. There are no nightlies yet. But I think you want to wait until we get the traceable-native quickstubs re-enabled there before trying this one out. I'll post here when it is ready.
Comment 56 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 12:26:35 PDT
Created attachment 453464 [details]
Raw perf data with call-graph recording

This was recorded with the -g option, so it has call graph info.

You need xz to uncompress it (it was really big).

You can then do:
  $ perf report | c++filt | wc -l

and you should see a nice graph in ASCII art.
Comment 57 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 12:31:05 PDT
Created attachment 453466 [details]
Perf call-graph (plain text, ASCII art)

Uncompress this with xz. The resulting file is a plain text call-graph.
Comment 58 David Mandelin [:dmandelin] 2010-06-23 12:35:26 PDT
(In reply to comment #52)
> Can we please get bugs blocking this one filed on at least the following
> issues:
> 
> 1) 15% or more is spent in GC, based on comment 47.

fatvals may help here, as Andreas said. Compartments could conceivably mitigate this a bit too.

> 2) We're somewhat failing to trace here (30+% in js_Interpret!).

Feel free to find out why; maybe it's just a simple missing feature. Otherwise, JM hopefully will help with this. I'm going to be trying to get the 'moo' browser in a testable state very soon, so we can see how it does there.
Comment 59 David Mandelin [:dmandelin] 2010-06-23 12:37:09 PDT
Oh, one more thing from the profile: it seems like some property access functions are in the top group. Some of this may be fixed by bug 558451. But there may be other kinds of property accesses that we are not caching, but should. We should figure out what's going on there so we know whether current scope and PIC work will cover it, or if we need more.
Comment 60 Nicholas Nethercote [:njn] (on vacation until July 11) 2010-06-23 14:20:07 PDT
In case people don't know, "fatvals" refers to a branch currently being developed in which the representation of basic JS values has changed.  Most significantly for this bug, doubles are represented directly, rather than being represented by pointers to values on the heap.  This make them faster to access and also means much less garbage collection is required.
Comment 61 David Mandelin [:dmandelin] 2010-06-23 17:20:09 PDT
OK, fatvals is ready to test this. You can check out and build from:

  http://hg.mozilla.org/users/lwagner_mozilla.com/fatval/

on any *32-bit* OS (don't worry, 64-bit support is getting done even as we speak). Or, we could supply someone with an opt build. Just let me know what platform.
Comment 62 Benoit Jacob [:bjacob] (mostly away) 2010-06-23 19:26:29 PDT
My platform is linux x86-64, please let me know when it is supported by fatvals.
Comment 63 Barak Gross 2010-06-24 11:45:42 PDT
Hi Everyone,
Well- I've snooped a little more into the code (to try and understand why it doesn't work on my setup - Windows browser, linux server on a different machine) and have a good understanding of how the code is supposed to work but have no idea why it doesn't:
It seems to get stuck at loading certain files (mainly map files).
To load these files the game uses XMLHttpRequest and has an async callback function which is supposed to run whenever XMLHttpRequest's internal status changes (http://msdn.microsoft.com/en-us/library/dd576252%28v=VS.85%29.aspx is how the async callback is set up, http://msdn.microsoft.com/en-us/library/ms534361%28v=VS.85%29.aspx are possible states it checks) - inside it checks the current state and by that determines if the file has finished loading or not (Sorry if this stuff is obvious to you- all new to me :)).

Using debug messages I can see that the callback is only called once (after it's been setup - readyState = 1) but never called again, even if there is some kind of a connection issue it should be called again at the end of the call (after it's command has been set and the send command was called - should change to readyState = 2).
Trying to load a large map file only shows that the browser sent the command to the server (via Firebug console- GET http://10.0.0.66:8080/baseq2/maps/demo1.bsp 200 OK 13ms) but it never changes past that and eventually times out (internal game timer).
At a small demo files (GET http://10.0.0.66:8080/baseq2/demos/q2demo1.dm2 200 OK 99ms - total size 160kb for example) I can see via the Firebug net column that the file has finished downloading but the callback function is never called!

The function that's responsible for setting all of this up (and has the callback function) can be seen here: http://www.google.com/codesearch/p?hl=en#bwYYd3m9KuA/src/jake2/gwt/client/GwtResourceLoaderImpl.java&q=loadResourceasync%20package:http://quake2-gwt-port\.googlecode\.com&sa=N&cd=3&ct=rc&l=68

Who should I talk to about this? anyone with any insight on this?

Thanks :)
Comment 64 Barak Gross 2010-06-24 13:08:26 PDT
Created attachment 453827 [details]
:)

Hooray it works on my setup as well :)
Going over XMLHTTPREQUEST bugs I ran into Bug 343769 - (XMLHttpRequest onreadystatechange not firing after readyState = 1) which basically means the behavior I described is exactly what happens when Firebug is running.
So- I removed any firebug related stuff in the code (OBF + Benoit's script) and it worked :)

Boy do I feel dumb for not following Benoit's exact instructions, oh well- no harm done... so this is tested to work on Linux and Windows (with a different machine as the server of course, since windows is not supported as a server) :)
Comment 65 Barak Gross 2010-06-24 16:19:01 PDT
So... since I'm running a 32bit OS I thought I'd give the fatvals branch a try so I built a build from the repo David linked to.
Unfortunately the game didn't start the first time I ran it, after (enabling and) looking at the error console I saw a "GetParameter: invalid parameter" error :(
Which I guessed means that the whole "get by getParameter and not by getString" discussion we had above (see above) about the WebGL spec changes was not merged into the Fatvals branch.
I changed it to use the getString method but that still didn't work- it returned null values and the following message appeared in the console:

com.google.gwt.core.client.JavaScriptException: (NS_ERROR_NOT_IMPLEMENTED): Component returned failure code: 0x80004001 (NS_ERROR_NOT_IMPLEMENTED) [nsICanvasRenderingContextWebGL.getString] QueryInterface: function QueryInterface() { [native code] } result: 2147500033 filename: http://10.0.0.66:8080/gwtquake/D1208C0C140A6CE3CB97175A50D436F3.cache.html lineNumber: 61084 columnNumber: 0 inner: null data: null initialize: function initialize() { [native code] 

So I'm guessing this is half baked right now?

Anyway, instead of doing this querying stuff I Just set these strings in advance (as I probably should've done in the first place- to avoid compatibility problems with Chrome as well) to my mozilla strings (just set the strings to match the ones outputted above) and... the game works :)

Unfortunately the FPS in the benchmark is lower than on the previous build :(
Latest nightly: 6.9fps
Fatvals branch: 5.7fps

I know this info is useless without an analysis of the run, I'm running windows so if anyone wants to talk me through it I'll be happy to do it :)
I'll post these latest changes in an updated diff (though not much as changed- like I said...).
Comment 66 Benoit Jacob [:bjacob] (mostly away) 2010-06-24 16:46:11 PDT
Hm, indeed the WebGL implementation has improved a lot recently, and fatvals seems not to have merged in mozilla-central changes in the past few weeks, so fatvals doesn't have these WebGL fixes. You could try doing the merge yourself, e.g. "hg merge ../mozilla-central", but I have no idea if this is likely to give a lot of conflicts, you need fatvals people here :)
Comment 67 Barak Gross 2010-06-24 17:06:57 PDT
Created attachment 453924 [details] [diff] [review]
latest diff

Like I wrote in the previous comment- changed it so that we no longer query any of the GL variables (GL_VENDOR, GL_RENDERER etc...) and instead these variables are preset.
This also makes the latest Chrome run (although it outputs tons of error messages about one of the GL functions being obsolete) but it freezes a couple of seconds into the game... boo hoo :)
Comment 68 Nicholas Nethercote [:njn] (on vacation until July 11) 2010-06-24 17:09:22 PDT
The fatvals branch will be merged into the TraceMonkey repository (hg.mozilla.org/tracemonkey) soon, perhaps by this weekend?  TraceMonkey is kept pretty well up-to-date w.r.t. mozilla-central.  So perhaps it's best to wait until that merge happens.
Comment 69 Barak Gross 2010-07-07 01:08:37 PDT
Thanks to Stefan Haustein the Firefox fixes were merged to mainline and latest Minefield nightly works out of the box :)
(http://code.google.com/p/quake2-gwt-port/issues/detail?id=26)
Comment 70 Benoit Jacob [:bjacob] (mostly away) 2010-07-07 12:18:42 PDT
Thanks a lot Barak and Stefan, for handling this, this is beautiful!
Comment 71 Barak Gross 2010-07-15 15:08:47 PDT
As I understand it fatvals was recently merged into Tracemonkey so... anyone up for profiling this with a Tracemonkey build? :)
Comment 72 Benoit Jacob [:bjacob] (mostly away) 2010-07-15 15:15:30 PDT
Does it support linux x86-64 now?
Comment 73 Barak Gross 2010-07-15 15:31:35 PDT
Well, the TraceMonkey nightly builds directory in http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-tracemonkey/ has a linux x86_64 package so I guess so...
Can any of the JS guys confirm this assumption?
Comment 74 David Anderson [:dvander] 2010-07-15 15:35:25 PDT
Yes (it couldn't have landed without x64 support).
Comment 75 Benoit Jacob [:bjacob] (mostly away) 2010-07-16 12:03:36 PDT
OK, indeed the tracemonkey branch runs (very well) here on linux x86-64. I just had to apply the patch from bug 578215 which fixes an important canvas-related crash.

I have run and profiled quake2's benchmark with the tracemonkey branch.

Summary (by "before" I mean the results reported above in this bug report, i.e. mozilla-central from a few weeks ago):

 * In-game performance is noticeably improved, and the quake2 benchmark now reports 9.5 FPS, up from 8.9 FPS before.

 * What these numbers don't say, is that the benchmark *loading* speed is much improved. Runs an order of magnitude faster now.

 * From what I understand, profiler results (see below) seem to show that the crazy GC'ing is indeed fixed now --- presumably thanks to fat values.

 * But, profiler results still show 37% of the time spent in js::Interpret, and, as far as I understand, only 4% of the time spent in JIT'ed code.


I am attaching below the profiler results, let me just paste here the top of the list of most expensive functions:

# Overhead  Symbol
# ........  ......
#
    37.58%  [.] js::Interpret(JSContext*)
     3.85%  [.] 0x00003825818769
     3.74%  [.] JSScope::searchTable(long, bool)
     3.73%  [.] JSScopeProperty::trace(JSTracer*)
     3.21%  [.] js_TraceObject(JSTracer*, JSObject*)
     2.82%  [.] js::Mark(JSTracer*, void*, unsigned int)
     2.61%  [.] js_LookupPropertyWithFlags(JSContext*, JSObject*, long, unsigned int, JSObject**, JSProperty**)
     2.53%  [.] gfxUtils::PremultiplyImageSurface(gfxImageSurface*, gfxImageSurface*)
     1.62%  [.] js_GetPropertyHelper(JSContext*, JSObject*, long, unsigned int, js::Value*)
     1.48%  [.] JSScope::changeTable(JSContext*, int)
     1.44%  [.] 0x000038218448f0
     1.38%  [.] js_NativeGet(JSContext*, JSObject*, JSObject*, JSScopeProperty*, unsigned int, js::Value*)
     1.15%  [.] js_SetPropertyHelper(JSContext*, JSObject*, long, unsigned int, js::Value*)
     0.95%  [.] JSScope::addProperty(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int)
     0.90%  [k] _nv006601rm
     0.86%  [.] js::SweepScopeProperties(JSContext*)
     0.84%  [k] audit_syscall_entry
     0.80%  [.] JS_TraceChildren
     0.50%  [.] JSScope::addPropertyHelper(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int, JSScopeProperty**)
Comment 76 Benoit Jacob [:bjacob] (mostly away) 2010-07-16 12:09:12 PDT
Created attachment 457928 [details]
Profiler simple results against tracemonkey branch

Profiler results for quake2 benchmark in a build of the tracemonkey branch from today.

This is a simple table of most expensive functions, sorted by decreasing own time.
Comment 77 Benoit Jacob [:bjacob] (mostly away) 2010-07-16 12:11:51 PDT
Created attachment 457929 [details]
Profiler call graph against tracemonkey branch

This file is XZ-compressed because it was big.

It's an ASCII-art call graph of today's profiler results with the tracemonkey branch.

I'm not uploading the raw perf.data file this time, because it's quite big and I'm not sure if anybody is using it. Just tell me if you would like to have it.
Comment 78 Barak Gross 2010-08-15 12:15:05 PDT
Created attachment 466158 [details] [diff] [review]
Patch to address texImage2D changes

Hey Everyone,
I use the Firefox nightly builds on a daily basis so every now and then I run the quake2-gwt-port and see if it's improved (it mostly has :)).
Anyway, when I ran it today I saw that it no longer works- the error console shows errors regarding texImage2D and after some reading I realized that Firefox was changed to no longer accept the deprecated form of texImage2D (Bug 584815).

The attached patch makes mostly trivial changes to the function's signature/call in the quake2-gwt-port code.
I've opened a bug on the quake2-gwt-port project (http://code.google.com/p/quake2-gwt-port/issues/detail?id=30) to take back the patch but in the mean time this patch is needed to make it run with the latest nightlies.
Comment 79 Benoit Jacob [:bjacob] (mostly away) 2010-08-15 14:08:37 PDT
(In reply to comment #78)
> Created attachment 466158 [details] [diff] [review]
> Patch to address texImage2D changes
> 
> Hey Everyone,
> I use the Firefox nightly builds on a daily basis so every now and then I run
> the quake2-gwt-port and see if it's improved (it mostly has :)).
> Anyway, when I ran it today I saw that it no longer works- the error console
> shows errors regarding texImage2D and after some reading I realized that
> Firefox was changed to no longer accept the deprecated form of texImage2D (Bug
> 584815).

Yep. Chromium is also no longer accepting the old form.

> 
> The attached patch makes mostly trivial changes to the function's
> signature/call in the quake2-gwt-port code.
> I've opened a bug on the quake2-gwt-port project
> (http://code.google.com/p/quake2-gwt-port/issues/detail?id=30) to take back the
> patch but in the mean time this patch is needed to make it run with the latest
> nightlies.

Awesome! So you have done everything that was useful to do, I don't have anything to add :-)
Comment 80 Barak Gross 2010-08-22 10:15:28 PDT
Created attachment 468156 [details] [diff] [review]
Shader valdation patch

Hey again,
Firefox stopped working a few days ago, after a little investigation I found that the problem was with the fragment shader validation (due to http://bugzilla.mozilla.org/show_bug.cgi?id=589011) and required a pretty straight forward fix.
As before, I've opened a bug on the quake2-gwt-port project (http://code.google.com/p/quake2-gwt-port/issues/detail?id=31) to take back the patch but in the mean time this is needed to run on Firefox... :)
Comment 81 Benoit Jacob [:bjacob] (mostly away) 2010-08-22 10:53:10 PDT
Thanks once again Barak!

Notice that another work-around is to go to about:config and set webgl.shader_validator to false.
Comment 82 Barak Gross 2010-08-22 11:18:57 PDT
(In reply to comment #81)
> Thanks once again Barak!
Sure thing :)

> Notice that another work-around is to go to about:config and set
> webgl.shader_validator to false.
Sure, though solving it with the patch is better since shader validation will always be on by default, right?
Comment 83 Benoit Jacob [:bjacob] (mostly away) 2010-08-22 12:03:01 PDT
Right! I was just mentioning this as a work-around for testing.
Comment 84 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 06:44:18 PDT
I tested today the tracemonkey repo, changeset e2e1ea2a39ce. My limited understanding is that this is _after_ a big jaegermonkey merge.

The results are not very good: it's sluggish and tends to crash while trying to load the game's files.

The only useful info that I have is that when it crashes, there are 512 threads (!!!). Most of them are @pthread_cond_wait.

Is there something in Jaegermonkey that would explain why so many threads are being created?

Do you need a good backtrace (requires me to make a debug build) or a profiler run?
Comment 85 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-09-01 07:16:32 PDT
Is there a reason your opt builds don't have debugging symbols?  That's the simplest way to make sure you always get useful stacks...

Are you willing to check whether it's the merge itself that caused the performance drop?  Things to maybe test:

1)  Disable the method jit in preferences.  How does that affect things?
2)  Check builds from right before and after the merge.
3)  Check things from right before and after recursion-tracing was disabled.
Comment 86 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 08:52:08 PDT
(In reply to comment #85)
> Is there a reason your opt builds don't have debugging symbols?  That's the
> simplest way to make sure you always get useful stacks...

Right, I didn't know how to get debugging symbols, now I've found it, thanks. Rebuilt with that option; now I sometimes (not consistently) get a different crash:

Program received signal SIGILL, Illegal instruction.

(gdb) bt
#0  0x00007fff853fa9bc in ?? ()
#1  0x0300000000000000 in ?? ()
#2  0x0000000000000000 in ?? ()


My only guess is a JIT assembler bug.

When this crash happens, there are roughly 500 other threads.

> 
> Are you willing to check whether it's the merge itself that caused the
> performance drop?  Things to maybe test:
> 
> 1)  Disable the method jit in preferences.  How does that affect things?

Disabling method jit, and also disabling all four jit options doesn't make any visible difference: still same speed, still hundreds of threads created.

I couldn't reproduce any crash with method jit disabled. But that doesn't mean much as the crash was hard to reproduce anyway.

> 2)  Check builds from right before and after the merge.

I tried the revision just before the Jaegermonkey merge. That is fc11a310d9a2 "Merge mozilla-central to tracemonkey." Unfortunately it doesn't build here (linux x86-64). Going before it is nontrivial as it is a merge, so I decided I don't have time, sorry.
Comment 87 Barak Gross 2010-09-01 10:25:18 PDT
Hey Everyone,
First of all- the above patches are no longer needed as they were merged back to the quake2-gwt source.

Regarding the recent TM builds:
To test the issue I downloaded an opt TM build from the TM tryserver builds and more specifically the one built from http://hg.mozilla.org/tracemonkey/rev/319b1a4e0883 (a Win32 build).
I've had the same problem as Benoit- the browser crashes after about a minute into the game and there seems to be a consistent pause every second or so (GC problem?) of gameplay which makes it seem very slow, looking at the crash log I see around 200 threads when it crashed.
Just as Benoit noted he browser doesn't seem to crash once I turn off methodjit but I have no idea how to check the number of threads.
Other than looking at the crash log, how can I see the number of threads in windows? (Process Explorer to the rescue maybe?)

Crash reports:
http://crash-stats.mozilla.com/report/index/bp-8df09db8-5eaa-4a53-8791-850752100901
http://crash-stats.mozilla.com/report/index/bp-aa13c280-cd1a-4ba7-9211-690ed2100901

> 3)  Check things from right before and after recursion-tracing was disabled.

Is this still suspected? if so, around which date/cset did this change happen?
Comment 88 Barak Gross 2010-09-01 10:29:35 PDT
Oh and I forgot to mention that it still crashes even when I disable tracing and only enable methodjit...
Comment 89 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 10:30:51 PDT
(In reply to comment #87)
> the above patches are no longer needed

Then, for each of them, can you click 'Details' and check 'obsolete' ? Thanks.

> but I have no idea how to check the number of threads.
> Other than looking at the crash log, how can I see the number of threads in
> windows? (Process Explorer to the rescue maybe?)

I don't know much about windows, will let others reply. On my side, this is something that I see when running Firefox in a debugger.
Comment 90 Barak Gross 2010-09-01 10:58:30 PDT
(In reply to comment #89)
> I don't know much about windows, will let others reply. On my side, this is
> something that I see when running Firefox in a debugger.

Well, Process Explorer does let me see the number of threads for a given process.
There are 250+ threads even with tracing only (shoots up when the game starts).
I have the latest nightly (which TM was merged to just before JM was merged to TM as I understand it) and this 250+ threads behaviour happens there as well.
Comment 91 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-09-01 11:30:48 PDT
In a debugger, can someone look at what all those threads are doing, please?
Comment 92 Barak Gross 2010-09-01 11:58:57 PDT
Created attachment 471206 [details]
Windbg log of the crash

Here's a Windbg log of the crash... interesting stuff :)
Comment 93 Vladimir Vukicevic [:vlad] [:vladv] 2010-09-01 12:13:48 PDT
That's unfortunately not useful -- you don't actually have symbols for any of the gecko files, so it's just picking nearest exported symbols.

(*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Users\Barak\Downloads\firefox\TM After merge\01092010\firefox-4.0b5pre.en-US.win32\firefox\xul.dll -)

If these are TM nightlies, do we know that TM nightlies push their symbols to the symbol server?
Comment 94 Barak Gross 2010-09-01 12:31:56 PDT
Created attachment 471224 [details]
Windbg log of the latest nightly

This is a Windbg log of the latest nightly, it doesn't crash but I did a break in the middle of gameplay so you could see all the background threads...
Comment 95 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 12:32:26 PDT
Created attachment 471225 [details]
Backtraces for first few threads

Here are backtraces for the first dozen threads or so.

There are hundreds more threads omitted there.

The active thread is thread 1.
Comment 96 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-09-01 12:35:01 PDT
Hmm.  Those last several threads with the ThreadFunc thing going on bother me.

What about breakpointing in thread creation and seeing who does it?
Comment 97 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-09-01 12:36:20 PDT
In particular, nsThread::Init.
Comment 98 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 12:38:04 PDT
Created attachment 471227 [details]
Backtraces for all threads when crash occurs

OK, here's a backtrace of all hundreds of threads when the crash occurs.

Thread 1 is the active thread (grrr, it is the only thread not to have debug info... does that mean that it's JITted code?)
Comment 99 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 12:54:41 PDT
Created attachment 471234 [details]
tracing thread creation

OK, so I have let quake2 run until it starts getting crazy, interrupted it, set a breakpoint on nsThread::Init, and got some traces, see attachment.
Comment 100 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-09-01 12:59:15 PDT
> does that mean that it's JITted code?)

Seems likely, yes.  Or just a busted stack.

All those extra threads seem to be for media elements (<audio> or <video>).  iirc we currently use 3 threads per such element... and it sounds like this page has dozens to hundreds of them?
Comment 101 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 13:06:18 PDT
Indeed, this page is loading a large number of <audio> elements.

Is it really reasonnable to let a web page coerce us into spawning hundreds of concurrent threads? Naive question here --- I really don't know if it's OK to have hundreds of threads. Perhaps the crash and sluggishness are unrelated.
Comment 102 Brendan Eich [:brendan] 2010-09-01 13:11:27 PDT
No, it's not ok for us to fail to pool and share threads. Bug on file on that?

/be
Comment 103 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 13:39:27 PDT
(In reply to comment #102)
> No, it's not ok for us to fail to pool and share threads. Bug on file on that?

OK, filed bug 592833 blocking this one.
Comment 104 Chris Pearce (:cpearce) 2010-09-01 14:38:08 PDT
(In reply to comment #101)
> Indeed, this page is loading a large number of <audio> elements.
> 
> Is it really reasonnable to let a web page coerce us into spawning hundreds of
> concurrent threads? Naive question here --- I really don't know if it's OK to
> have hundreds of threads. Perhaps the crash and sluggishness are unrelated.

If you disable sound does Quake2 performance improve significantly?

We've been trying to rewrite the audio subsystem to remove one of the threads per media element, but that's not going to make FF4 unfortunately. We probably could do something clever to pool and reduce all the media element threads, it would be a major piece of work, and need significant testing though.
Comment 105 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 14:47:47 PDT
(In reply to comment #104)
> If you disable sound does Quake2 performance improve significantly?

Unfortunately, I can't see any way of disabling sound in Quake2-GWT (this Javascript port). All there is is a volume slider that doesn't seem to go down to zero.

> 
> We've been trying to rewrite the audio subsystem to remove one of the threads
> per media element, but that's not going to make FF4 unfortunately.

Anyway, if having hundreds of threads is really the issue here, then removing one thread per element will not fix this issue, as it will still leave 2 threads per element...

> We probably
> could do something clever to pool and reduce all the media element threads, it
> would be a major piece of work, and need significant testing though.

Yes, I guess that if the large number of threads is really what causes problems, I guess that setting a hard limit on it is the only solution; this would mean that audio loading 'jobs' would have to be queued, I guess. (being naive).
Comment 106 Chris Pearce (:cpearce) 2010-09-01 15:01:07 PDT
(In reply to comment #105)
> Anyway, if having hundreds of threads is really the issue here, then removing
> one thread per element will not fix this issue, as it will still leave 2
> threads per element...

Well, if we reduce the thread count by 1 per element, we're 30% of the way to a solution. ;)

But ultimately yes, we do need to pool the media element threads.
Comment 107 Barak Gross 2010-09-01 15:14:09 PDT
I couldn't see any easy way to disable sound either so I reverted stefan's patch to add ogg support for Firefox, since by default quake2-gwt used mp3 which Firefox doesn't support (http://code.google.com/p/quake2-gwt-port/source/detail?r=8a4cb2e062e8e08cc44c173a3725392aa001c53f has the details, if you don't feel like editing it yourself I can attach a diff).
Doing this does seem (at least to my eyes) to make the game alot better (speed wise) but it still crashes when methodjit is enabled...

Here's the crash report for this (unlike the previous reports this one seems to have a stack trace- has something changed in the last few hours?):
http://crash-stats.mozilla.com/report/index/f74d8bd7-a903-4362-89fb-4aa562100901
Comment 108 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 15:26:28 PDT
(In reply to comment #107)
> Here's the crash report for this (unlike the previous reports this one seems to
> have a stack trace- has something changed in the last few hours?):
> http://crash-stats.mozilla.com/report/index/f74d8bd7-a903-4362-89fb-4aa562100901

More likely, this is a different crash: this new crash seems to be happening in non-jitted javascript (the trace shows js::Interpret and friends), while the previous crash we were having (with the sounds) looked like it happened in jitted javascript (the trace showed only memory addresses with no function names).
Comment 109 Barak Gross 2010-09-01 15:34:36 PDT
(In reply to comment #108)
> More likely, this is a different crash: this new crash seems to be happening in
> non-jitted javascript (the trace shows js::Interpret and friends), while the
> previous crash we were having (with the sounds) looked like it happened in
> jitted javascript (the trace showed only memory addresses with no function
> names).

Ah, right :)
Comment 110 Barak Gross 2010-09-01 15:53:23 PDT
(In reply to comment #107)
> Doing this does seem (at least to my eyes) to make the game alot better (speed
> wise) but it still crashes when methodjit is enabled...

Well, the browser works without crashing just long enough to run the built-in benchmark, here are my machine's results while enabling/disabling audio:
Audio enabled: 6.4-7.1 fps
Audio disabled: 10.5 fps
Comment 111 Nicholas Nethercote [:njn] (on vacation until July 11) 2010-09-01 15:54:54 PDT
(In reply to comment #105)
> 
> Yes, I guess that if the large number of threads is really what causes
> problems

Hang on... although hundreds of audio threads isn't good, the program was working well before the JM merge, right?  So either the methodjit or some other changes that were merged in along with the methodjit is the cause of the slowdown.  

When the methodjit is disabled is it still worse than it was before the JM merge?
Comment 112 Benoit Jacob [:bjacob] (mostly away) 2010-09-01 16:20:28 PDT
(In reply to comment #111)
> (In reply to comment #105)
> > 
> > Yes, I guess that if the large number of threads is really what causes
> > problems
> 
> Hang on... although hundreds of audio threads isn't good, the program was
> working well before the JM merge, right?

See comment 107. Between today and the previous time that we tried Quake2-GWT, there have also been changes in Quake2-GWT, adding sound support for Firefox. So the previous time, we just were not doing any sound.
Comment 113 Barak Gross 2010-09-01 17:11:57 PDT
(In reply to comment #112)
> See comment 107. Between today and the previous time that we tried Quake2-GWT,
> there have also been changes in Quake2-GWT, adding sound support for Firefox.
> So the previous time, we just were not doing any sound.

Hold on... sound support for Firefox was added about two months ago (see the date of the revision I linked to) - I'm not sure if the audio threads issue is a new regression or not (I've never checked out the number of threads before), I'll try and find a regression range (if one exists) tomorrow.

> Hang on... although hundreds of audio threads isn't good, the program was
> working well before the JM merge, right?
As far as performance before the JM merge and after it- subjectively speaking the "after JM merge" builds seem to have pauses every second or so while the "pre JM merge" ones doesn't seem to have that.
I've tried to use the built-in benchmark as my guide but it seems to fluctuate too much to be a decent tool for measuring these changes (at least on my system).
Comment 114 David Mandelin [:dmandelin] 2010-09-01 17:56:49 PDT
I am working on some patches to add debugging prefs that disable the ICs (inline caches) used by the methodjit. I'll let you know when they are available in TM nightlies. Seeing if the JM-related problems still occur with those pref'd off will give us some clues about what's going on.
Comment 115 David Mandelin [:dmandelin] 2010-09-01 18:15:57 PDT
(In reply to comment #114)
> I am working on some patches to add debugging prefs that disable the ICs
> (inline caches) used by the methodjit. I'll let you know when they are
> available in TM nightlies. Seeing if the JM-related problems still occur with
> those pref'd off will give us some clues about what's going on.

Actually, that plan looks like it would be more annoying and crufty than I expected. I'm just going to make a few special builds and do some testing myself.
Comment 116 Benoit Jacob [:bjacob] (mostly away) 2010-09-02 04:06:35 PDT
(In reply to comment #113)
> (In reply to comment #112)
> > See comment 107. Between today and the previous time that we tried Quake2-GWT,
> > there have also been changes in Quake2-GWT, adding sound support for Firefox.
> > So the previous time, we just were not doing any sound.
> 
> Hold on... sound support for Firefox was added about two months ago (see the
> date of the revision I linked to)

Until yesterday, I was using a revision of Quake2-GWT dating back to June. Basically, I hadn't updated since I wrote my HOWTO (comment 44). Also, yesterday was the first time that I could hear any sound.
Comment 117 Chris Pearce (:cpearce) 2010-09-02 04:50:18 PDT
(In reply to comment #110)
> (In reply to comment #107)
> > Doing this does seem (at least to my eyes) to make the game alot better (speed
> > wise) but it still crashes when methodjit is enabled...
> 
> Well, the browser works without crashing just long enough to run the built-in
> benchmark, here are my machine's results while enabling/disabling audio:
> Audio enabled: 6.4-7.1 fps
> Audio disabled: 10.5 fps

You could do the following to improve sound perf:

1. Convert the (short) sounds into WAV files. I assume they're mostly short SFX files? The Wave format isn't compressed, so is much less compute intensive to playback. The wave backend also requires fewer threads per media element FWIW.
2. Set the <audio> element's 'preload' attribute to 'none'. This means no network traffic will occur until the first time the sound is loaded. If you're creating 100 audio elements at startup to preload all the sounds (including the ones you're not going to use), this will also put a strain on your HTTP connection limit. Incidentally, WAV files require fewer connections than Ogg to load even if you've not set preload='none'.

The above measures will probably help much more than us refactoring the media decoders to have fewer threads. I discussed this today with the other guys in Auckland, reducing the thread count won't reduce the amount of work we need to do to play back or preload multiple media files concurrently. We're pretty careful to put the threads to sleep when we're not using them. We'd just be swapping the overhead of thread context switches for event dispatching overhead (or however we implemented it). Reducing the thread count would reduce memory overhead (particularly on Linux), so is still worth doing however.

Note You need to log in before you can comment on or make changes to this bug.