Closed Bug 679864 Opened 13 years ago Closed 13 years ago

Upgrade WebGL conformance test suite to r15318

Categories

(Core :: Graphics: CanvasWebGL, defect)

Platform: x86_64, Linux
Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED

People

(Reporter: bjacob, Unassigned)

Test failure stats:


OS    | Failing pages  | Newly failing | Newly passing | Failing pages
      | before upgrade | pages         | pages         | after upgrade
------+----------------+---------------+---------------+---------------
Win   | 11             | 8             | 0             | 19
Linux | 15             | 17            | 5             | 27
Mac   | 14             | 16            | 2             | 28
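
As a quick consistency check on the table above, the "after upgrade" column should equal the "before upgrade" count plus the newly failing pages minus the newly passing pages. That relation is inferred from the column names rather than stated in the bug, and the field names below are made up for illustration. A minimal sketch:

```javascript
// Rows copied from the table above (failing-page counts per OS).
// Field names are hypothetical; only the numbers come from the bug.
const stats = [
  { os: "Win",   before: 11, newlyFailing: 8,  newlyPassing: 0, after: 19 },
  { os: "Linux", before: 15, newlyFailing: 17, newlyPassing: 5, after: 27 },
  { os: "Mac",   before: 14, newlyFailing: 16, newlyPassing: 2, after: 28 },
];

// Assumed relation: old failures, plus new failures, minus pages that started passing.
function expectedAfter(row) {
  return row.before + row.newlyFailing - row.newlyPassing;
}

for (const row of stats) {
  const verdict = expectedAfter(row) === row.after ? "matches" : "does not match";
  console.log(`${row.os}: computed ${expectedAfter(row)}, ${verdict} the table`);
}
```
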
I had to increase the timeout for webGLarray.html on Win7 debug, and ignore a couple of shader-related tests that fail intermittently ("images are different") on Mac.

New try: http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4fd4c1c9ffa4
webGLarray.html was still intermittently timing out even after I doubled the timeout delay, so I reverted that change and instead just ignored this test on Windows. It's not timing out on other platforms.
I had to back it out because of mochitest-1 oranges:

http://hg.mozilla.org/mozilla-central/rev/64a6b17da6e7
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The webGLarray.html test is actually intermittent on all OSes. Commenting it out from the test list.
The mochitest is now green on all platforms, retriggering to make sure.
OK, what happens is that whichever test runs after quickCheckAPI.html intermittently times out due to a long GC pause. In our case, webGLarray.html was timing out and when I disabled it above I started getting timeouts in the next test, bindBuffer.html.

I used to work around that by disabling quickCheckAPI.html (before this test suite upgrade), but instead a better approach is to implement ad-hoc code in the test harness to run the GC manually after this test. This way, we don't have to disable any test.

New try:

http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=14e5f9c1be4d
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-14e5f9c1be4d
If that's a mochitest, please use SpecialPowers.gc() instead of the explicit QI etc.
Ah OK. Yes, that's a mochitest, will do.
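
For illustration, the "run the GC after one specific test page" idea discussed above might look roughly like the sketch below. This is not the actual patch: `runPages`, `runPage`, and `gcAfter` are hypothetical names, and the collector is passed in as a plain function so the snippet runs outside Firefox. In a real mochitest that function would presumably be `() => SpecialPowers.gc()`.

```javascript
// Hypothetical harness loop: run each test page in order and force a
// garbage collection right after the pages listed in gcAfter.
function runPages(pages, runPage, gc, gcAfter) {
  const results = [];
  for (const page of pages) {
    results.push({ page, passed: runPage(page) });
    if (gcAfter.has(page)) {
      gc(); // clean up after the memory-hungry page before the next one starts
    }
  }
  return results;
}

// Stand-ins so the sketch runs outside the browser: every page "passes",
// and both callbacks record what happened and in which order.
const eventLog = [];
const pageResults = runPages(
  ["quickCheckAPI.html", "webGLarray.html", "bindBuffer.html"],
  (page) => { eventLog.push(page); return true; },
  () => { eventLog.push("gc"); },
  new Set(["quickCheckAPI.html"])
);
console.log(eventLog.join(" -> "));
```

Keeping the list of pages that need a collection in a set means no test has to be disabled; only the harness knows which pages get the extra step.
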
It turns out that running the GC didn't make a difference.

The faulty test, quickCheckAPI.html, is known for its high and random memory usage. It's a naive random fuzzer, and causes some randomly sized buffers to be allocated. Presumably the timeouts happen when the memory usage is really high, maybe due to swapping... I can't reproduce locally, but I have 4G of RAM.

This test has been disabled in our copy of the test suite for this reason. I tried to re-enable it, but I'm giving up for now. I will file a follow-up bug to make it use less memory and re-enable it.
(In reply to comment #10)
> OK, what happens is that whichever test runs after quickCheckAPI.html
> intermittently times out due to a long GC pause. In our case, webGLarray.html
> was timing out and when I disabled it above I started getting timeouts in the
> next test, bindBuffer.html.
> 
> I used to work around that by disabling quickCheckAPI.html (before this test
> suite upgrade), but instead a better approach is to implement ad-hoc code in
> the test harness to run the GC manually after this test. This way, we don't
> have to disable any test.

You can also try using SimpleTest.requestLongerTimeout...
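
A rough sketch of that suggestion: in a mochitest, `SimpleTest.requestLongerTimeout(factor)` asks the harness to multiply the default timeout for the current test file. The stand-in object below exists only so the snippet runs outside Firefox; its 1000 ms base value and the `effectiveTimeoutMs` helper are assumptions made up for illustration, not part of the real harness.

```javascript
// Stand-in for the mochitest SimpleTest object. Only the
// requestLongerTimeout(factor) call mirrors the real API; the base
// timeout and effectiveTimeoutMs() are hypothetical.
const SimpleTestStandIn = {
  baseTimeoutMs: 1000,
  factor: 1,
  requestLongerTimeout(factor) {
    this.factor = factor;
  },
  effectiveTimeoutMs() {
    return this.baseTimeoutMs * this.factor;
  },
};

// Use the real harness object when it exists, otherwise the stand-in.
const harness = typeof SimpleTest !== "undefined" ? SimpleTest : SimpleTestStandIn;

// Ask for twice the default time budget for this test file.
harness.requestLongerTimeout(2);
```
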
Backed out because of Win debug test failures:
https://hg.mozilla.org/mozilla-central/rev/0ae2d673d617
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
CAN HAS WORKING TEST SYSTEM??? KTHX

This might be the same pattern as in comment 13: the WebGL mochitest uses lots of memory, possibly leaving the test slaves swapping pages...
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Ehsan Akhgari [:ehsan] from comment #15)
> (In reply to comment #10)
> > OK, what happens is that whichever test runs after quickCheckAPI.html
> > intermittently times out due to a long GC pause. In our case, webGLarray.html
> > was timing out and when I disabled it above I started getting timeouts in the
> > next test, bindBuffer.html.
> > 
> > I used to work around that by disabling quickCheckAPI.html (before this test
> > suite upgrade), but instead a better approach is to implement ad-hoc code in
> > the test harness to run the GC manually after this test. This way, we don't
> > have to disable any test.
> 
> You can also try using SimpleTest.requestLongerTimeout...

Well, if tests time out for this reason, that's an indication that this quickCheckAPI test really goes overboard with memory usage, and I'd rather fix it.
All the mochitests that fail are tests that run SHORTLY AFTER the WebGL mochitest. This confirms that running the WebGL mochitest leaves the test slave in some sort of 'disturbed' state.

Stale WebGL contexts are all GC'd BEFORE the other failing mochitests start. This means that this isn't going to be fixed by triggering the GC somewhere specific. Though it could still be that triggering the GC after every WebGL test page keeps memory usage low enough to prevent the problem we're seeing here.
Sorry, apparently I mistakenly marked this as fixed...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 681400
Running mochitests locally on Win7 debug, I saw a pop-up dialog box informing me that Firefox had triggered an abort(). I filed bug 681400 about this; it turns out to be an ANGLE assertion triggered by a WebGL test, and the patch in bug 681400 fixes it.

The strange thing is that the abort() didn't kill the process, or even stop it. It only resulted in this pop-up dialog being shown, with the mochitest continuing in the background.

With the patch from bug 681400, the mochitests that are causing trouble here did run fine, so hopefully this was the only issue.

Try:
http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=337d8cbc7a6c
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-337d8cbc7a6c
Seems fixed! I now have a strange orange on Mac, which seems to be a regression in mozilla-central when I last pulled; let's do one more try:

http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4d12e80cd934
So, here's the explanation for the weird mochitest failures we were getting.

The new WebGL conformance tests that I'm trying to land here are triggering a new bug in the ANGLE library we're using. The bug is a bad assert, which is why we only have trouble in debug builds, and it is in the ANGLE Direct3D renderer, which is why it only happens on Windows.

The bad assert makes us call abort(), which is overridden on the test slaves (as it is on my Windows machine; it seems to be an MSVC thing) so that it pops up an 'Abort/Retry' dialog box. This dialog box steals focus from the Mochitest/Firefox window, causing subsequent focus-dependent event mochitests to fail.

This bad assert is what bug 681400 fixes.
(In reply to Benoit Jacob [:bjacob] from comment #22)
> Seems fixed! I now have a strange orange on Mac, which seems to be a
> regression in mozilla-central when I last pulled; let's do one more try:
> 
> http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=4d12e80cd934

Is now as green as an alien brain preserved in plutonium slime.