Bug 679864 - Upgrade WebGL conformance test suite to r15318
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Canvas: WebGL
Version: unspecified
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Depends on: 681400
Reported: 2011-08-17 13:52 PDT by Benoit Jacob [:bjacob] (mostly away)
Modified: 2011-08-24 08:34 PDT


Description Benoit Jacob [:bjacob] (mostly away) 2011-08-17 13:52:25 PDT
This should be almost the 1.0.1 release. We're currently on 1.0.0.

Tryserver:
http://tbpl.mozilla.org/?tree=Try&rev=14c11082c56d
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-14c11082c56d
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2011-08-17 14:03:21 PDT
Test failure stats:


OS    | Failing pages  | Newly failing | Newly passing | Failing pages
      | before upgrade | pages         | pages         | after upgrade
------+----------------+---------------+---------------+---------------
Win   | 11             | 8             | 0             | 19
Linux | 15             | 17            | 5             | 27
Mac   | 14             | 16            | 2             | 28
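
The last column follows from the other three: pages failing after the upgrade = pages failing before + newly failing - newly passing. A minimal sketch that re-checks the rows above (the object layout and field names are made up for illustration and are not part of the test suite):

```javascript
// Rows copied from the table above; field names are hypothetical.
const stats = [
  { os: "Win",   before: 11, newlyFailing: 8,  newlyPassing: 0, after: 19 },
  { os: "Linux", before: 15, newlyFailing: 17, newlyPassing: 5, after: 27 },
  { os: "Mac",   before: 14, newlyFailing: 16, newlyPassing: 2, after: 28 },
];

// Failing-after = failing-before + newly failing - newly passing.
function expectedAfter(row) {
  return row.before + row.newlyFailing - row.newlyPassing;
}

// True only if every row's last column matches the formula.
const consistent = stats.every((row) => expectedAfter(row) === row.after);
console.log(consistent); // prints true
```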
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2011-08-18 15:37:14 PDT
Had to increase the timeout for webGLarray.html on Win7 debug, and ignore a couple of shader-related tests that fail intermittently ("images are different") on Mac.

New try: http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4fd4c1c9ffa4
Comment 4 Benoit Jacob [:bjacob] (mostly away) 2011-08-19 08:42:18 PDT
webGLarray.html was still intermittently timing out even after I doubled the timeout delay, so I reverted that change and instead just ignored this test on Windows. It's not timing out on other platforms.
Comment 5 :Ehsan Akhgari 2011-08-19 11:23:49 PDT
I had to back it out because of mochitest-1 oranges:

http://hg.mozilla.org/mozilla-central/rev/64a6b17da6e7
Comment 7 Benoit Jacob [:bjacob] (mostly away) 2011-08-19 14:06:23 PDT
The webGLarray.html test is actually intermittent on all OSes. Commenting it out of the test list.
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2011-08-21 06:00:52 PDT
The mochitest is now green on all platforms; retriggering to make sure.
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2011-08-21 19:39:52 PDT
OK, what happens is that whichever test runs after quickCheckAPI.html intermittently times out due to a long GC pause. In our case, webGLarray.html was timing out and when I disabled it above I started getting timeouts in the next test, bindBuffer.html.

I used to work around that by disabling quickCheckAPI.html (before this test suite upgrade), but instead a better approach is to implement ad-hoc code in the test harness to run the GC manually after this test. This way, we don't have to disable any test.

New try:

http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=14e5f9c1be4d
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-14e5f9c1be4d
Comment 11 Josh Matthews [:jdm] (away until 9/3) 2011-08-21 19:48:39 PDT
If that's a mochitest, please use SpecialPowers.gc() instead of the explicit QI etc.
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2011-08-21 20:12:51 PDT
Ah OK. Yes, that's a mochitest, will do.
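
A minimal sketch of the idea from comment 10, assuming a harness that runs the test pages one after another. `runPages`, `runPage`, and `gcAfter` are hypothetical names, not the actual WebGL mochitest harness; in a real mochitest the `gc` callback would be `SpecialPowers.gc()`, and page loading would be asynchronous rather than a plain loop.

```javascript
// Hypothetical harness loop: run each page, then force a GC after selected pages.
function runPages(pages, runPage, gc, gcAfter) {
  const log = [];
  for (const page of pages) {
    runPage(page);
    log.push("ran " + page);
    if (gcAfter.has(page)) {
      gc(); // in a mochitest: SpecialPowers.gc()
      log.push("gc after " + page);
    }
  }
  return log;
}

// Stand-ins so the sketch runs outside the browser.
let gcCount = 0;
const log = runPages(
  ["quickCheckAPI.html", "webGLarray.html", "bindBuffer.html"],
  (page) => {},          // a real harness would load the page and wait for it
  () => { gcCount++; },  // stub standing in for SpecialPowers.gc()
  new Set(["quickCheckAPI.html"])
);
console.log(log.join("\n"));
```

Keeping the list of "GC after this page" entries separate from the page list means no test has to be disabled to apply the workaround.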
Comment 13 Benoit Jacob [:bjacob] (mostly away) 2011-08-22 08:56:15 PDT
It turns out that running the GC didn't make a difference.

The faulty test, quickCheckAPI.html, is known for its high and random memory usage. It's a naive random fuzzer, and causes some randomly sized buffers to be allocated. Presumably the timeouts happen when the memory usage is really high, maybe due to swapping... I can't reproduce locally, but I have 4G of RAM.

This test has been disabled in our copy of the test suite for this reason. I tried to re-enable it, but I'm giving up for now. Will file a follow-up bug to make it use less memory and re-enable it.
Comment 15 :Ehsan Akhgari 2011-08-22 11:26:21 PDT
(In reply to comment #10)
> OK, what happens is that whichever test runs after quickCheckAPI.html
> intermittently times out due to a long GC pause. In our case, webGLarray.html
> was timing out and when I disabled it above I started getting timeouts in the
> next test, bindBuffer.html.
> 
> I used to work around that by disabling quickCheckAPI.html (before this test
> suite upgrade), but instead a better approach is to implement ad-hoc code in
> the test harness to run the GC manually after this test. This way, we don't
> have to disable any test.

You can also try using SimpleTest.requestLongerTimeout...
Comment 16 Matt Brubeck (:mbrubeck) 2011-08-22 13:31:06 PDT
Backed out because of Win debug test failures:
https://hg.mozilla.org/mozilla-central/rev/0ae2d673d617
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2011-08-22 13:55:18 PDT
CAN HAS WORKING TEST SYSTEM??? KTHX

This might be the same pattern as in comment 13: the WebGL mochitest uses lots of memory, possibly leaving the test slaves swapping pages...
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2011-08-22 13:59:01 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #15)
> (In reply to comment #10)
> > OK, what happens is that whichever test runs after quickCheckAPI.html
> > intermittently times out due to a long GC pause. In our case, webGLarray.html
> > was timing out and when I disabled it above I started getting timeouts in the
> > next test, bindBuffer.html.
> > 
> > I used to work around that by disabling quickCheckAPI.html (before this test
> > suite upgrade), but instead a better approach is to implement ad-hoc code in
> > the test harness to run the GC manually after this test. This way, we don't
> > have to disable any test.
> 
> You can also try using SimpleTest.requestLongerTimeout...

Well, if tests time out for this reason, that's an indication that this quickCheckAPI test really goes overboard with memory usage and I'd rather fix it.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2011-08-22 14:23:06 PDT
All the mochitests that fail are tests that are run SHORTLY AFTER the WebGL mochitest. This confirms that running the WebGL mochitest leaves the test slave in some sort of 'disturbed' state.

Stale WebGL contexts are all GC'd BEFORE the other failing mochitests start. This means that this isn't going to be fixed by triggering the GC somewhere specific. Though it could still be that triggering the GC after every WebGL test page keeps memory usage low enough to prevent the problem we're seeing here.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2011-08-23 08:58:21 PDT
Sorry, apparently I mistakenly marked this as fixed...
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2011-08-23 12:08:32 PDT
Running mochitests locally on Win7 debug, I saw a pop-up dialog box informing me that Firefox had triggered an abort(). Filed bug 681400 about this; it turns out to be an ANGLE assertion triggered by a WebGL test; the patch in bug 681400 fixes it.

The strange thing is that the abort() didn't kill the process, didn't even stop it, it really only resulted in showing this pop-up dialog, with the mochitest continuing in the background.

With the patch from bug 681400, the mochitests that are causing trouble here did run fine, so hopefully this was the only issue.

Try:
http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=337d8cbc7a6c
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-337d8cbc7a6c
Comment 22 Benoit Jacob [:bjacob] (mostly away) 2011-08-23 15:29:04 PDT
Seems fixed! I now have a strange orange on Mac, which seems to be a regression in mozilla-central when I last pulled; let's do one more try:

http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4d12e80cd934
Comment 23 Benoit Jacob [:bjacob] (mostly away) 2011-08-23 15:34:58 PDT
So, here's the explanation for the weird mochitest failures we were getting.

The new WebGL conformance tests that I'm trying to land here are triggering a new bug in the ANGLE library we're using. The bug is a bad assert, which is why we only have trouble in debug builds, and it is in the ANGLE Direct3D renderer, which is why it only happens on Windows.

The bad assert makes us call abort(), which is overridden on the test slaves (like it is on my Windows machine; seems to be an MSVC thing) so that it pops up an 'Abort/Retry' dialog box. This dialog box steals focus from the Mochitest/Firefox window, causing subsequent focus-dependent event mochitests to fail.

This bad assert is what bug 681400 fixes.
Comment 24 Benoit Jacob [:bjacob] (mostly away) 2011-08-24 08:29:28 PDT
(In reply to Benoit Jacob [:bjacob] from comment #22)
> Seems fixed! I now have a strange orange on Mac, which seems to be a
> regression in mozilla-central when I last pulled; let's do one more try:
> 
> http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4d12e80cd934

Is now as green as an alien brain preserved in plutonium slime.
