Closed Bug 763355 Opened 12 years ago Closed 12 years ago

Intermittent on Android: test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page

Categories

(Core :: Graphics: CanvasWebGL, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: bjacob, Assigned: bjacob)

References

Details

(Keywords: intermittent-failure)

Attachments

(4 files)

      No description provided.
Summary: Intermittent on Android → Intermittent on Android: test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
The logs seem to show that the timeout is always at the end of this test, and the only thing we're doing there is calling this finishTest() function, so it's worth checking if an exception is occurring there.

By the way, it would be wonderful if all uncaught exceptions could be logged in the test logs.
Attachment #631785 - Flags: review?(jgilbert)
(Adding whiteboard/blocking:randomorange so this shows up on TBPL)
Blocks: 438871
Whiteboard: [orange]
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Review of attachment 631785 [details] [diff] [review]:
-----------------------------------------------------------------

This could be overwritten when we upgrade the conformance suite, but this should be a temporary thing anyway.
Attachment #631785 - Flags: review?(jgilbert) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/70dccb11784f
Assignee: nobody → bjacob
Whiteboard: [orange] → [orange] Keep open!
Target Milestone: --- → mozilla16
https://hg.mozilla.org/mozilla-central/rev/70dccb11784f
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Please use '[leave open]', rather than 'Keep open!', or the new marking tool will still resolve:
https://wiki.mozilla.org/Tree_Rules/Inbound#Please_do_the_following_after_pushing_to_inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [orange] Keep open! → [orange][leave open]
This last push has the patch, and no exception is logged, so my exception theory was wrong.
Attached patch more loggingSplinter Review
I know this is starting to get ugly, but I chatted with Ehsan and we agreed this was the righ/onlyt way to debug this...

It's possible that the problem is we don't correctly hook into nonKhronosFrameworkNotifyDone(), will investigate in parallel (and this logging will help decide that).
Attachment #633232 - Flags: review?(jgilbert)
Attachment #633232 - Flags: review?(jgilbert) → review+
Trimmed log from this push:

(WebGL mochitest) Starting test page
Test passed - read-pixels-test start: 1339996845526
Test passed - color pixel at 0, 0 should be about 127.5
Test passed - color pixel at 0, 0 should be about 178.5
Test passed - color pixel at 0, 0 should be about 255
Test passed - color pixel at 0, 0 should be about 255
... snip ...
Test passed - read-pixels-test continueTestPart2: 1339996845930
Test passed - actual is expected
Test passed - actual is expected
Test passed - actual is expected
Test passed - getError was expected value: NO_ERROR : there should be no GL errors
Test passed - read-pixels-test calling finishTest: 1339997640033
Test passed - beginning of finishTest: 1339997640035
Test passed - end of finishTest: 1339997640045
Test passed - read-pixels-test after finishTest: 1339997640048
Timeout in this test page

So here we see that that problem is that while the first part of the test only took 0.4 seconds, continueTestPart2 took (1339997640033 - 1339996845930)/1000 = 800 seconds = 13 minutes.

After that, finishTest by itself runs fast.

Conclusion: we need to instrument continueTestPart2 to figure what's taking long.
Attachment #634092 - Flags: review?(jgilbert)
Attachment #634092 - Flags: review?(jgilbert) → review+
Whiteboard: [orange][leave open] → [orange]
Target Milestone: mozilla16 → ---
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Adding checked-in+ since bug still open and helps me differentiate between [oranges] that have patches that have not yet been checked in vs not (so as to avoid patches being lost in the tbplbot spam).
Attachment #631785 - Flags: checkin+
Attachment #633232 - Flags: checkin+
Attachment #634092 - Flags: checkin+
So, looking at the logs:

45726 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Test passed - read-pixels-test after finishTest: 1344409944746
45727 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page

I can't make sense of that :-( The test proceeds to after the very last line of the script fast, and _then_ it times out.
OK, here's what's happening.

The test calls finishTest() at the end. The logs show that this function is run. It is defined in js-test-pre.js:

function finishTest() {
  testPassed("beginning of finishTest: " + (new Date().getTime()));

  successfullyParsed = true;
  var epilogue = document.createElement("script");
  epilogue.onload = function() {
    testPassed("finishTest epilogue onload: " + (new Date().getTime()));
    if (window.nonKhronosFrameworkNotifyDone) {
      window.nonKhronosFrameworkNotifyDone();
    }
  };

  var basePath = "";
  var expectedBase = "js-test-pre.js";
  var scripts = document.getElementsByTagName('script');
  for (var script, i = 0; script = scripts[i]; i++) {
    var src = script.src;
    var l = src.length;
    if (src.substr(l - expectedBase.length) == expectedBase) {
      basePath = src.substr(0, l - expectedBase.length);
      break;
    }
  }
  epilogue.src = basePath + "js-test-post.js";
  document.body.appendChild(epilogue);
  testPassed("end of finishTest: " + (new Date().getTime()));
}

The testPassed here are debug stuff added in the present bug. The logs show these messages BUT they do not have the "finishTest epilogue onload" message.

Here, the point of this code is to run a script, "js-test-post.js". The fact that "finishTest epilogue onload" isn't in the logs means that this trick to run this script doesn't work.

Is this code valid? I.e. is it a Firefox bug?
I have good hopes that this will remove the timeouts even though I still don't understand everything here.

The timeouts are apparently caused by js-test-post.js failing to load. Test pages load it in either of two possible ways:
 - some pages have a <script src="../../resources/js-test-post.js">
 - some pages call finishTest(), which programatically creates such a script element.

Timeouts happen in both approaches.

This page changes all pages to call finishTest(); and changes finishTest() to directly do the job by pasting the (very short) contents of js-test-post.js instead of attempting to load it.

There is one difference: I dropped the assert(successfullyParsed); I never really understood that part, it was giving me trouble, and out of experience, hasn't been what caught webgl regressions.
Attachment #650971 - Flags: review?(jgilbert)
Comment on attachment 650971 [details] [diff] [review]
stop loading js-test-post.js altogether; instead, paste its contents into finishTest()

This was the most fun review ever.

Can we not remove js-test-post, as well? I would feel better if we deleted it because it would make any tests that might have been missed here fail.
Attachment #650971 - Flags: review?(jgilbert) → review+
Oops. I should have mentioned that only js-test-pre.js had custom changes; the rest was search and replace.

Good idea about removing js-test-post.js.
On try, since inbound is still closed:
https://tbpl.mozilla.org/?tree=Try&rev=b2f58297b04b

you were right, i had indeed forgotten quite a few occurences, caught by removing this file.
http://hg.mozilla.org/integration/mozilla-inbound/rev/1a96900f09c8

Seems pretty stable now, after a few tries. 

Funnily, this makes glsl-function-nodes.html pass. Looking into why.
Target Milestone: --- → mozilla17
Hm, this test was failing at the "Threw exception ReferenceError: successfullyParsed is not defined" check, according to logs. Don't know why. Normally that would mean that an exception occured in this tests, but uncaught exceptions don't get logged on android.
https://hg.mozilla.org/mozilla-central/rev/1a96900f09c8
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Comment 37 is aurora: does not invalidate the m-c fix (comment 36).
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.