Last Comment Bug 763355 - Intermittent on Android: test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
: Intermittent on Android: test_webgl_conformance_test_suite.html | [conformanc...
Status: RESOLVED FIXED
: intermittent-failure
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: unspecified
: ARM Android
: -- normal (vote)
: mozilla17
Assigned To: Benoit Jacob [:bjacob] (mostly away)
:
Mentors:
: 763453 (view as bug list)
Depends on:
Blocks: 438871 1080922
  Show dependency treegraph
 
Reported: 2012-06-10 16:47 PDT by Benoit Jacob [:bjacob] (mostly away)
Modified: 2014-10-09 19:00 PDT (History)
1 user (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
log exceptions to understand timeout in read-pixels-test (842 bytes, patch)
2012-06-10 17:15 PDT, Benoit Jacob [:bjacob] (mostly away)
jgilbert: review+
emorley: checkin+
Details | Diff | Splinter Review
more logging (3.30 KB, patch)
2012-06-14 12:25 PDT, Benoit Jacob [:bjacob] (mostly away)
jgilbert: review+
emorley: checkin+
Details | Diff | Splinter Review
even moar logging (3.12 KB, patch)
2012-06-18 10:28 PDT, Benoit Jacob [:bjacob] (mostly away)
jgilbert: review+
emorley: checkin+
Details | Diff | Splinter Review
stop loading js-test-post.js altogether; instead, paste its contents into finishTest() (67.85 KB, patch)
2012-08-10 12:20 PDT, Benoit Jacob [:bjacob] (mostly away)
jgilbert: review+
Details | Diff | Splinter Review

Description Benoit Jacob [:bjacob] (mostly away) 2012-06-10 16:47:48 PDT

    
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2012-06-10 17:15:23 PDT
Created attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

The logs seem to show that the timeout is always at the end of this test, and the only thing we're doing there is calling this finishTest() function, so it's worth checking if an exception is occurring there.

By the way, it would be wonderful if all uncaught exceptions could be logged in the test logs.
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2012-06-11 06:20:35 PDT
*** Bug 763453 has been marked as a duplicate of this bug. ***
Comment 3 Ed Morley [:emorley] 2012-06-11 06:25:14 PDT
(Adding whiteboard/blocking:randomorange so this shows up on TBPL)
Comment 4 Jeff Gilbert [:jgilbert] 2012-06-11 14:05:33 PDT
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Review of attachment 631785 [details] [diff] [review]:
-----------------------------------------------------------------

This could be overwritten when we upgrade the conformance suite, but this should be a temporary thing anyway.
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2012-06-12 14:22:23 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/70dccb11784f
Comment 6 Ed Morley [:emorley] 2012-06-13 05:59:02 PDT
https://hg.mozilla.org/mozilla-central/rev/70dccb11784f
Comment 7 Ed Morley [:emorley] 2012-06-14 06:59:52 PDT
Please use '[leave open]', rather than 'Keep open!', or the new marking tool will still resolve:
https://wiki.mozilla.org/Tree_Rules/Inbound#Please_do_the_following_after_pushing_to_inbound
Comment 8 Treeherder Robot 2012-06-14 07:00:08 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=12657877&tree=Profiling
Android Tegra 250 profiling opt test mochitest-1 on 2012-06-14 04:37:13
slave: tegra-214

11718 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 9 Treeherder Robot 2012-06-14 08:38:18 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=12662854&tree=Firefox
Android Tegra 250 mozilla-central opt test mochitest-1 on 2012-06-14 07:31:15
slave: tegra-048

11718 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2012-06-14 08:56:40 PDT
This last push has the patch, and no exception is logged, so my exception theory was wrong.
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2012-06-14 12:25:59 PDT
Created attachment 633232 [details] [diff] [review]
more logging

I know this is starting to get ugly, but I chatted with Ehsan and we agreed this was the righ/onlyt way to debug this...

It's possible that the problem is we don't correctly hook into nonKhronosFrameworkNotifyDone(), will investigate in parallel (and this logging will help decide that).
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2012-06-14 13:13:07 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/2a03063474b4
Comment 13 Treeherder Robot 2012-06-14 21:23:48 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12661957&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-1 on 2012-06-14 07:08:58
slave: tegra-237

35988 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_text_wholeText.html | an unexpected uncaught JS exception reported through window.onerror - TypeError: e1.stopWatchingInstanceCount is not a function at http://mochi.test:8888/tests/content/base/test/test_plugin_freezing.html:21
46949 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 14 Ed Morley [:emorley] 2012-06-15 05:58:39 PDT
https://hg.mozilla.org/mozilla-central/rev/2a03063474b4
Comment 15 Treeherder Robot 2012-06-18 06:14:42 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=12753525&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-1 on 2012-06-18 04:53:54
slave: tegra-246

36038 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_text_wholeText.html | an unexpected uncaught JS exception reported through window.onerror - TypeError: e1.stopWatchingInstanceCount is not a function at http://mochi.test:8888/tests/content/base/test/test_plugin_freezing.html:21
47500 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2012-06-18 10:16:31 PDT
Trimmed log from this push:

(WebGL mochitest) Starting test page
Test passed - read-pixels-test start: 1339996845526
Test passed - color pixel at 0, 0 should be about 127.5
Test passed - color pixel at 0, 0 should be about 178.5
Test passed - color pixel at 0, 0 should be about 255
Test passed - color pixel at 0, 0 should be about 255
... snip ...
Test passed - read-pixels-test continueTestPart2: 1339996845930
Test passed - actual is expected
Test passed - actual is expected
Test passed - actual is expected
Test passed - getError was expected value: NO_ERROR : there should be no GL errors
Test passed - read-pixels-test calling finishTest: 1339997640033
Test passed - beginning of finishTest: 1339997640035
Test passed - end of finishTest: 1339997640045
Test passed - read-pixels-test after finishTest: 1339997640048
Timeout in this test page

So here we see that that problem is that while the first part of the test only took 0.4 seconds, continueTestPart2 took (1339997640033 - 1339996845930)/1000 = 800 seconds = 13 minutes.

After that, finishTest by itself runs fast.

Conclusion: we need to instrument continueTestPart2 to figure what's taking long.
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2012-06-18 10:28:36 PDT
Created attachment 634092 [details] [diff] [review]
even moar logging
Comment 18 Treeherder Robot 2012-06-19 12:35:14 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12797009&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-1 on 2012-06-19 09:47:29
slave: tegra-238

36021 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_title.html | an unexpected uncaught JS exception reported through window.onerror - TypeError: e1.stopWatchingInstanceCount is not a function at http://mochi.test:8888/tests/content/base/test/test_plugin_freezing.html:21
47430 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 19 Treeherder Robot 2012-06-26 07:02:41 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=13001493&tree=Firefox
Android Tegra 250 mozilla-central opt test mochitest-1 on 2012-06-26 05:36:21
slave: tegra-234

24711 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_bug435425.html | Wrong responseText - got Bad request
36060 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_title.html | an unexpected uncaught JS exception reported through window.onerror - TypeError: e1.stopWatchingInstanceCount is not a function at http://mochi.test:8888/tests/content/base/test/test_plugin_freezing.html:21
47469 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2012-06-29 14:21:58 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/a3322674e2c3
Comment 21 Ryan VanderMeulen [:RyanVM] 2012-06-30 12:43:29 PDT
https://hg.mozilla.org/mozilla-central/rev/a3322674e2c3
Comment 22 Treeherder Robot 2012-07-11 15:43:18 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=13438136&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-1 on 2012-07-11 13:52:04
slave: tegra-273

47540 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 23 Treeherder Robot 2012-07-12 02:48:09 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=13453165&tree=Mozilla-Inbound
Android Tegra 250 mozilla-inbound opt test mochitest-1 on 2012-07-12 01:38:03
slave: tegra-138

47546 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 24 Treeherder Robot 2012-07-16 02:23:55 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=13503088&tree=Profiling
Android Tegra 250 profiling opt test mochitest-1 on 2012-07-13 12:05:47
slave: tegra-179

47534 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 25 Ed Morley [:emorley] 2012-07-30 02:05:19 PDT
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Adding checked-in+ since bug still open and helps me differentiate between [oranges] that have patches that have not yet been checked in vs not (so as to avoid patches being lost in the tbplbot spam).
Comment 26 Treeherder Robot 2012-08-08 16:48:41 PDT
:emk
https://tbpl.mozilla.org/php/getParsedLog.php?id=14226971&tree=Try
Android Tegra 250 try opt test mochitest-1 on 2012-08-08 06:41:16
slave: tegra-170

45727 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2012-08-08 18:26:28 PDT
So, looking at the logs:

45726 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Test passed - read-pixels-test after finishTest: 1344409944746
45727 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page

I can't make sense of that :-( The test proceeds to after the very last line of the script fast, and _then_ it times out.
Comment 28 Benoit Jacob [:bjacob] (mostly away) 2012-08-08 18:46:01 PDT
OK, here's what's happening.

The test calls finishTest() at the end. The logs show that this function is run. It is defined in js-test-pre.js:

function finishTest() {
  testPassed("beginning of finishTest: " + (new Date().getTime()));

  successfullyParsed = true;
  var epilogue = document.createElement("script");
  epilogue.onload = function() {
    testPassed("finishTest epilogue onload: " + (new Date().getTime()));
    if (window.nonKhronosFrameworkNotifyDone) {
      window.nonKhronosFrameworkNotifyDone();
    }
  };

  var basePath = "";
  var expectedBase = "js-test-pre.js";
  var scripts = document.getElementsByTagName('script');
  for (var script, i = 0; script = scripts[i]; i++) {
    var src = script.src;
    var l = src.length;
    if (src.substr(l - expectedBase.length) == expectedBase) {
      basePath = src.substr(0, l - expectedBase.length);
      break;
    }
  }
  epilogue.src = basePath + "js-test-post.js";
  document.body.appendChild(epilogue);
  testPassed("end of finishTest: " + (new Date().getTime()));
}

The testPassed here are debug stuff added in the present bug. The logs show these messages BUT they do not have the "finishTest epilogue onload" message.

Here, the point of this code is to run a script, "js-test-post.js". The fact that "finishTest epilogue onload" isn't in the logs means that this trick to run this script doesn't work.

Is this code valid? I.e. is it a Firefox bug?
Comment 29 Treeherder Robot 2012-08-09 05:43:19 PDT
ttaubert
https://tbpl.mozilla.org/php/getParsedLog.php?id=14255916&tree=Fx-Team
Android Tegra 250 fx-team opt test mochitest-1 on 2012-08-09 04:44:50
slave: tegra-183

45709 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2012-08-10 12:20:27 PDT
Created attachment 650971 [details] [diff] [review]
stop loading js-test-post.js altogether; instead, paste its contents into finishTest()

I have good hopes that this will remove the timeouts even though I still don't understand everything here.

The timeouts are apparently caused by js-test-post.js failing to load. Test pages load it in either of two possible ways:
 - some pages have a <script src="../../resources/js-test-post.js">
 - some pages call finishTest(), which programatically creates such a script element.

Timeouts happen in both approaches.

This page changes all pages to call finishTest(); and changes finishTest() to directly do the job by pasting the (very short) contents of js-test-post.js instead of attempting to load it.

There is one difference: I dropped the assert(successfullyParsed); I never really understood that part, it was giving me trouble, and out of experience, hasn't been what caught webgl regressions.
Comment 31 Jeff Gilbert [:jgilbert] 2012-08-10 16:32:12 PDT
Comment on attachment 650971 [details] [diff] [review]
stop loading js-test-post.js altogether; instead, paste its contents into finishTest()

This was the most fun review ever.

Can we not remove js-test-post, as well? I would feel better if we deleted it because it would make any tests that might have been missed here fail.
Comment 32 Benoit Jacob [:bjacob] (mostly away) 2012-08-11 08:04:35 PDT
Oops. I should have mentioned that only js-test-pre.js had custom changes; the rest was search and replace.

Good idea about removing js-test-post.js.
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2012-08-14 11:54:08 PDT
On try, since inbound is still closed:
https://tbpl.mozilla.org/?tree=Try&rev=b2f58297b04b

you were right, i had indeed forgotten quite a few occurences, caught by removing this file.
Comment 34 Benoit Jacob [:bjacob] (mostly away) 2012-08-15 07:27:44 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/1a96900f09c8

Seems pretty stable now, after a few tries. 

Funnily, this makes glsl-function-nodes.html pass. Looking into why.
Comment 35 Benoit Jacob [:bjacob] (mostly away) 2012-08-15 07:32:23 PDT
Hm, this test was failing at the "Threw exception ReferenceError: successfullyParsed is not defined" check, according to logs. Don't know why. Normally that would mean that an exception occured in this tests, but uncaught exceptions don't get logged on android.
Comment 36 Ryan VanderMeulen [:RyanVM] 2012-08-15 18:46:30 PDT
https://hg.mozilla.org/mozilla-central/rev/1a96900f09c8
Comment 37 Treeherder Robot 2012-08-19 15:38:28 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=14513027&tree=Mozilla-Aurora
Android Tegra 250 mozilla-aurora opt test mochitest-1 on 2012-08-19 14:38:52
slave: tegra-151

47525 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
Comment 38 Benoit Jacob [:bjacob] (mostly away) 2012-08-20 08:12:37 PDT
Comment 37 is aurora: does not invalidate the m-c fix (comment 36).

Note You need to log in before you can comment on or make changes to this bug.