Intermittent on Android: test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page

RESOLVED FIXED in mozilla17

Status

()

Core
Canvas: WebGL
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: bjacob, Assigned: bjacob)

Tracking

({intermittent-failure})

unspecified
mozilla17
ARM
Android
intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

Comment hidden (empty)
(Assignee)

Updated

5 years ago
Summary: Intermittent on Android → Intermittent on Android: test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page
(Assignee)

Comment 1

5 years ago
Created attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

The logs seem to show that the timeout is always at the end of this test, and the only thing we're doing there is calling this finishTest() function, so it's worth checking if an exception is occurring there.

By the way, it would be wonderful if all uncaught exceptions could be logged in the test logs.
Attachment #631785 - Flags: review?(jgilbert)
(Assignee)

Updated

5 years ago
Duplicate of this bug: 763453

Comment 3

5 years ago
(Adding whiteboard/blocking:randomorange so this shows up on TBPL)
Blocks: 438871
Whiteboard: [orange]
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Review of attachment 631785 [details] [diff] [review]:
-----------------------------------------------------------------

This could be overwritten when we upgrade the conformance suite, but this should be a temporary thing anyway.
Attachment #631785 - Flags: review?(jgilbert) → review+
(Assignee)

Comment 5

5 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/70dccb11784f
Assignee: nobody → bjacob
Whiteboard: [orange] → [orange] Keep open!
Target Milestone: --- → mozilla16

Comment 6

5 years ago
https://hg.mozilla.org/mozilla-central/rev/70dccb11784f
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 7

5 years ago
Please use '[leave open]', rather than 'Keep open!', or the new marking tool will still resolve:
https://wiki.mozilla.org/Tree_Rules/Inbound#Please_do_the_following_after_pushing_to_inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [orange] Keep open! → [orange][leave open]
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
This last push has the patch, and no exception is logged, so my exception theory was wrong.
Created attachment 633232 [details] [diff] [review]
more logging

I know this is starting to get ugly, but I chatted with Ehsan and we agreed this was the righ/onlyt way to debug this...

It's possible that the problem is we don't correctly hook into nonKhronosFrameworkNotifyDone(), will investigate in parallel (and this logging will help decide that).
Attachment #633232 - Flags: review?(jgilbert)
Attachment #633232 - Flags: review?(jgilbert) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/2a03063474b4
Comment hidden (Treeherder Robot)
https://hg.mozilla.org/mozilla-central/rev/2a03063474b4
Comment hidden (Treeherder Robot)
Trimmed log from this push:

(WebGL mochitest) Starting test page
Test passed - read-pixels-test start: 1339996845526
Test passed - color pixel at 0, 0 should be about 127.5
Test passed - color pixel at 0, 0 should be about 178.5
Test passed - color pixel at 0, 0 should be about 255
Test passed - color pixel at 0, 0 should be about 255
... snip ...
Test passed - read-pixels-test continueTestPart2: 1339996845930
Test passed - actual is expected
Test passed - actual is expected
Test passed - actual is expected
Test passed - getError was expected value: NO_ERROR : there should be no GL errors
Test passed - read-pixels-test calling finishTest: 1339997640033
Test passed - beginning of finishTest: 1339997640035
Test passed - end of finishTest: 1339997640045
Test passed - read-pixels-test after finishTest: 1339997640048
Timeout in this test page

So here we see that that problem is that while the first part of the test only took 0.4 seconds, continueTestPart2 took (1339997640033 - 1339996845930)/1000 = 800 seconds = 13 minutes.

After that, finishTest by itself runs fast.

Conclusion: we need to instrument continueTestPart2 to figure what's taking long.
Created attachment 634092 [details] [diff] [review]
even moar logging
Attachment #634092 - Flags: review?(jgilbert)
Attachment #634092 - Flags: review?(jgilbert) → review+
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
http://hg.mozilla.org/integration/mozilla-inbound/rev/a3322674e2c3
https://hg.mozilla.org/mozilla-central/rev/a3322674e2c3
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Updated

5 years ago
Whiteboard: [orange][leave open] → [orange]
Target Milestone: mozilla16 → ---
Comment on attachment 631785 [details] [diff] [review]
log exceptions to understand timeout in read-pixels-test

Adding checked-in+ since bug still open and helps me differentiate between [oranges] that have patches that have not yet been checked in vs not (so as to avoid patches being lost in the tbplbot spam).
Attachment #631785 - Flags: checkin+

Updated

5 years ago
Attachment #633232 - Flags: checkin+

Updated

5 years ago
Attachment #634092 - Flags: checkin+
Comment hidden (Treeherder Robot)
So, looking at the logs:

45726 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Test passed - read-pixels-test after finishTest: 1344409944746
45727 ERROR TEST-UNEXPECTED-FAIL | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/reading/read-pixels-test.html] Timeout in this test page

I can't make sense of that :-( The test proceeds to after the very last line of the script fast, and _then_ it times out.
OK, here's what's happening.

The test calls finishTest() at the end. The logs show that this function is run. It is defined in js-test-pre.js:

function finishTest() {
  testPassed("beginning of finishTest: " + (new Date().getTime()));

  successfullyParsed = true;
  var epilogue = document.createElement("script");
  epilogue.onload = function() {
    testPassed("finishTest epilogue onload: " + (new Date().getTime()));
    if (window.nonKhronosFrameworkNotifyDone) {
      window.nonKhronosFrameworkNotifyDone();
    }
  };

  var basePath = "";
  var expectedBase = "js-test-pre.js";
  var scripts = document.getElementsByTagName('script');
  for (var script, i = 0; script = scripts[i]; i++) {
    var src = script.src;
    var l = src.length;
    if (src.substr(l - expectedBase.length) == expectedBase) {
      basePath = src.substr(0, l - expectedBase.length);
      break;
    }
  }
  epilogue.src = basePath + "js-test-post.js";
  document.body.appendChild(epilogue);
  testPassed("end of finishTest: " + (new Date().getTime()));
}

The testPassed here are debug stuff added in the present bug. The logs show these messages BUT they do not have the "finishTest epilogue onload" message.

Here, the point of this code is to run a script, "js-test-post.js". The fact that "finishTest epilogue onload" isn't in the logs means that this trick to run this script doesn't work.

Is this code valid? I.e. is it a Firefox bug?
Comment hidden (Treeherder Robot)
Created attachment 650971 [details] [diff] [review]
stop loading js-test-post.js altogether; instead, paste its contents into finishTest()

I have good hopes that this will remove the timeouts even though I still don't understand everything here.

The timeouts are apparently caused by js-test-post.js failing to load. Test pages load it in either of two possible ways:
 - some pages have a <script src="../../resources/js-test-post.js">
 - some pages call finishTest(), which programatically creates such a script element.

Timeouts happen in both approaches.

This page changes all pages to call finishTest(); and changes finishTest() to directly do the job by pasting the (very short) contents of js-test-post.js instead of attempting to load it.

There is one difference: I dropped the assert(successfullyParsed); I never really understood that part, it was giving me trouble, and out of experience, hasn't been what caught webgl regressions.
Attachment #650971 - Flags: review?(jgilbert)
Comment on attachment 650971 [details] [diff] [review]
stop loading js-test-post.js altogether; instead, paste its contents into finishTest()

This was the most fun review ever.

Can we not remove js-test-post, as well? I would feel better if we deleted it because it would make any tests that might have been missed here fail.
Attachment #650971 - Flags: review?(jgilbert) → review+
Oops. I should have mentioned that only js-test-pre.js had custom changes; the rest was search and replace.

Good idea about removing js-test-post.js.
On try, since inbound is still closed:
https://tbpl.mozilla.org/?tree=Try&rev=b2f58297b04b

you were right, i had indeed forgotten quite a few occurences, caught by removing this file.
http://hg.mozilla.org/integration/mozilla-inbound/rev/1a96900f09c8

Seems pretty stable now, after a few tries. 

Funnily, this makes glsl-function-nodes.html pass. Looking into why.
Target Milestone: --- → mozilla17
Hm, this test was failing at the "Threw exception ReferenceError: successfullyParsed is not defined" check, according to logs. Don't know why. Normally that would mean that an exception occured in this tests, but uncaught exceptions don't get logged on android.
https://hg.mozilla.org/mozilla-central/rev/1a96900f09c8
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Comment hidden (Treeherder Robot)
Comment 37 is aurora: does not invalidate the m-c fix (comment 36).
Keywords: intermittent-failure
Whiteboard: [orange]
Blocks: 1080922
You need to log in before you can comment on or make changes to this bug.