Closed Bug 1437295 Opened 2 years ago Closed 2 years ago

Intermittent gfx/layers/apz/test/mochitest/test_bug1151663.html | found the RCD node

Categories

(Core :: Panning and Zooming, enhancement, P5)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted][stockwell fixed:product])

Attachments

(6 files)

+++ This bug was initially created as a clone of Bug #1374166 +++

I landed a patch for bug 1374166, but the failure persisted (see bug 1374166 comment 32). Filing this bug to track it.
Recent failure log (tier 2): https://treeherder.mozilla.org/logviewer.html#?job_id=161520884&repo=mozilla-inbound&lineNumber=6696

[task 2018-02-10T15:05:28.537Z] 15:05:28     INFO - TEST-START | gfx/layers/apz/test/mochitest/test_bug1151663.html
[task 2018-02-10T15:05:29.877Z] 15:05:29     INFO - GECKO(5382) | Flushed APZ repaints, waiting for callback...
[task 2018-02-10T15:05:30.238Z] 15:05:30     INFO - GECKO(5382) | Flushed APZ repaints, waiting for callback...
[task 2018-02-10T15:05:30.262Z] 15:05:30     INFO - TEST-INFO | started process screentopng
[task 2018-02-10T15:05:30.872Z] 15:05:30     INFO - TEST-INFO | screentopng: exit 0
[task 2018-02-10T15:05:30.872Z] 15:05:30     INFO - Buffered messages logged at 15:05:30
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO - must wait for load
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_bug1151663.html | expected at least one paint in compositor test data 
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO - Buffered messages finished
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151663.html | found the RCD node 
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO -     testBug1151663@gfx/layers/apz/test/mochitest/helper_bug1151663.html:53:7
[task 2018-02-10T15:05:30.873Z] 15:05:30     INFO -     promise callback*@gfx/layers/apz/test/mochitest/helper_bug1151663.html:59:5
[task 2018-02-10T15:05:30.874Z] 15:05:30     INFO - GECKO(5382) | JavaScript error: http://mochi.test:8888/tests/gfx/layers/apz/test/mochitest/helper_bug1151663.html, line 54: TypeError: rcd is null
[task 2018-02-10T15:10:28.546Z] 15:10:28     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-02-10T15:10:28.547Z] 15:10:28     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151663.html | Test timed out. 
[task 2018-02-10T15:10:28.548Z] 15:10:28     INFO -     reportError@SimpleTest/TestRunner.js:121:7
[task 2018-02-10T15:10:28.548Z] 15:10:28     INFO -     TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2018-02-10T15:10:28.548Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.548Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.549Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.549Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.550Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.550Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.551Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.551Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.551Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.552Z] 15:10:28     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:10:28.552Z] 15:10:28     INFO -     TestRunner.runTests@SimpleTest/TestRunner.js:380:5
[task 2018-02-10T15:10:28.553Z] 15:10:28     INFO -     RunSet.runtests@SimpleTest/setup.js:194:3
[task 2018-02-10T15:10:28.553Z] 15:10:28     INFO -     RunSet.runall@SimpleTest/setup.js:173:5
[task 2018-02-10T15:10:28.554Z] 15:10:28     INFO -     hookupTests@SimpleTest/setup.js:266:5
[task 2018-02-10T15:10:28.554Z] 15:10:28     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2018-02-10T15:10:28.555Z] 15:10:28     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2018-02-10T15:10:28.555Z] 15:10:28     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2018-02-10T15:10:28.556Z] 15:10:28     INFO -     hookup@SimpleTest/setup.js:246:5
[task 2018-02-10T15:10:28.556Z] 15:10:28     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2018-02-10T15:10:29.545Z] 15:10:29     INFO - GECKO(5382) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2018-02-10T15:10:29.545Z] 15:10:29     INFO - GECKO(5382) | MEMORY STAT | vsize 1590MB | residentFast 131MB | heapAllocated 24MB
[task 2018-02-10T15:10:32.554Z] 15:10:32     INFO - TEST-OK | gfx/layers/apz/test/mochitest/test_bug1151663.html | took 304017ms
[task 2018-02-10T15:10:35.560Z] 15:10:35     INFO - Error: Unable to restore focus, expect failures and timeouts.
[task 2018-02-10T15:10:35.598Z] 15:10:35     INFO - TEST-START | gfx/layers/apz/test/mochitest/test_bug1151667.html
[task 2018-02-10T15:15:58.547Z] 15:15:58     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-02-10T15:15:58.548Z] 15:15:58     INFO - Buffered messages logged at 15:10:39
[task 2018-02-10T15:15:58.548Z] 15:15:58     INFO - Error: Unable to restore focus, expect failures and timeouts.
[task 2018-02-10T15:15:58.549Z] 15:15:58     INFO - must wait for load
[task 2018-02-10T15:15:58.550Z] 15:15:58     INFO - must wait for focus
[task 2018-02-10T15:15:58.550Z] 15:15:58     INFO - Buffered messages finished
[task 2018-02-10T15:15:58.551Z] 15:15:58     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151667.html | Test timed out. 
[task 2018-02-10T15:15:58.551Z] 15:15:58     INFO -     reportError@SimpleTest/TestRunner.js:121:7
[task 2018-02-10T15:15:58.551Z] 15:15:58     INFO -     TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2018-02-10T15:15:58.552Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.552Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.553Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.553Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.554Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.554Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.555Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.555Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.556Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.556Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.557Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.557Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.558Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.558Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.559Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.559Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.560Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.560Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.561Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.561Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.562Z] 15:15:58     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:58.563Z] 15:15:58     INFO -     TestRunner.runTests@SimpleTest/TestRunner.js:380:5
[task 2018-02-10T15:15:58.563Z] 15:15:58     INFO -     RunSet.runtests@SimpleTest/setup.js:194:3
[task 2018-02-10T15:15:58.563Z] 15:15:58     INFO -     RunSet.runall@SimpleTest/setup.js:173:5
[task 2018-02-10T15:15:58.563Z] 15:15:58     INFO -     hookupTests@SimpleTest/setup.js:266:5
[task 2018-02-10T15:15:58.564Z] 15:15:58     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2018-02-10T15:15:58.564Z] 15:15:58     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2018-02-10T15:15:58.565Z] 15:15:58     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2018-02-10T15:15:58.565Z] 15:15:58     INFO -     hookup@SimpleTest/setup.js:246:5
[task 2018-02-10T15:15:58.566Z] 15:15:58     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2018-02-10T15:15:59.554Z] 15:15:59     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-02-10T15:15:59.554Z] 15:15:59     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151667.html | [SimpleTest.finish()] waitForFocus() was called a different number of times from the number of callbacks run.  Maybe the test terminated prematurely -- be sure to use SimpleTest.waitForExplicitFinish(). - got 1, expected +0
[task 2018-02-10T15:15:59.555Z] 15:15:59     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:312:5
[task 2018-02-10T15:15:59.556Z] 15:15:59     INFO -     afterCleanup@SimpleTest/SimpleTest.js:1169:13
[task 2018-02-10T15:15:59.556Z] 15:15:59     INFO -     executeCleanupFunction@SimpleTest/SimpleTest.js:1211:13
[task 2018-02-10T15:15:59.556Z] 15:15:59     INFO -     SimpleTest.finish@SimpleTest/SimpleTest.js:1230:5
[task 2018-02-10T15:15:59.557Z] 15:15:59     INFO -     killTest@SimpleTest/TestRunner.js:130:7
[task 2018-02-10T15:15:59.557Z] 15:15:59     INFO -     delayedKillTest@SimpleTest/TestRunner.js:157:47
[task 2018-02-10T15:15:59.558Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:157:7
[task 2018-02-10T15:15:59.559Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.559Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.560Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.560Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.561Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.561Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.562Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.562Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.563Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.563Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.563Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.564Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.565Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.565Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.565Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.566Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.566Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.566Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.567Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.567Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.567Z] 15:15:59     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2018-02-10T15:15:59.568Z] 15:15:59     INFO -     TestRunner.runTests@SimpleTest/TestRunner.js:380:5
[task 2018-02-10T15:15:59.568Z] 15:15:59     INFO -     RunSet.runtests@SimpleTest/setup.js:194:3
[task 2018-02-10T15:15:59.568Z] 15:15:59     INFO -     RunSet.runall@SimpleTest/setup.js:173:5
[task 2018-02-10T15:15:59.568Z] 15:15:59     INFO -     hookupTests@SimpleTest/setup.js:266:5
[task 2018-02-10T15:15:59.568Z] 15:15:59     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2018-02-10T15:15:59.569Z] 15:15:59     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2018-02-10T15:15:59.570Z] 15:15:59     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2018-02-10T15:15:59.570Z] 15:15:59     INFO -     hookup@SimpleTest/setup.js:246:5
[task 2018-02-10T15:15:59.570Z] 15:15:59     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2018-02-10T15:15:59.571Z] 15:15:59     INFO - GECKO(5382) | MEMORY STAT | vsize 1590MB | residentFast 145MB | heapAllocated 25MB
[task 2018-02-10T15:16:01.554Z] 15:16:01     INFO - TEST-OK | gfx/layers/apz/test/mochitest/test_bug1151667.html | took 325956ms
[task 2018-02-10T15:16:04.558Z] 15:16:04     INFO - Error: Unable to restore focus, expect failures and timeouts.
There have been 33 failures in the last week.
The failures occur on linux64-qr/ opt
This is a clone of Bug #1374166

Here is a recent log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=165125656&lineNumber=6596

[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - Buffered messages logged at 07:44:47
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - must wait for load
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - Buffered messages logged at 07:44:48
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_bug1151663.html | expected at least one paint in compositor test data 
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - Buffered messages finished
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151663.html | found the RCD node 
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO -     testBug1151663@gfx/layers/apz/test/mochitest/helper_bug1151663.html:53:7
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO -     promise callback*@gfx/layers/apz/test/mochitest/helper_bug1151663.html:59:5
[task 2018-03-01T07:44:48.746Z] 07:44:48     INFO - GECKO(5239) | JavaScript error: http://mochi.test:8888/tests/gfx/layers/apz/test/mochitest/helper_bug1151663.html, line 54: TypeError: rcd is null
[task 2018-03-01T07:49:46.467Z] 07:49:46     INFO - Not taking screenshot here: see the one that was previously logged
Flags: needinfo?(bugmail)
Whiteboard: [gfx-noted] → [stockwell needswork]
Flags: needinfo?(bugmail)
Whiteboard: [stockwell needswork] → [stockwell needswork][gfx-noted]
Assignee: nobody → bugmail
There are 124 failures in the last 7 days.
All of them are on "linux64-qr". Mostly on "opt", only 2 of them on "debug".

:Kats, do you wanna take a look or should we go on and disable it?
Flags: needinfo?(bugmail)
I'll take a look today, and disable by end of day if I can't figure out what's going on.
Flags: needinfo?(bugmail)
to skip on linux64-qr, please use |skip-if = webrender|, here is an example:
https://searchfox.org/mozilla-central/source/dom/events/test/mochitest.ini#180
Attached the disable patch in case :kats won`t fix it
Attachment #8957211 - Flags: review?(jmaher)
Comment on attachment 8957211 [details] [diff] [review]
Disable test on Linux64-qr

Review of attachment 8957211 [details] [diff] [review]:
-----------------------------------------------------------------

looks good
Attachment #8957211 - Flags: review?(jmaher) → review+
So the problem is that at the time the compositor APZ test data is pulled, the compositor has not yet received a parent-process layer tree that includes a reflayer for the helper's child window. In other words, we do a waitUntilApzStable() in the content process on the test page before spawning the helper into a new window, and we do a waitUntilApzStable() in the content process which ensures the new content-process layer tree has been sent to the compositor, but we never wait to make sure the parent process has actually hooked up the new layer tree from the content process and sent that to the compositor.

This seems to be a pre-existing bug which just gets hit a lot more with WR because of timing differences. Trying to figure out what a good way to fix this is. One way to do it might be to send a waitUntilApzStable() call to the parent process to run there as well, or something equivalent to that.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> One way to do it might be to send a waitUntilApzStable() call to
> the parent process to run there as well, or something equivalent to that.

I implemented this, but it's not sufficient. It looks like the parent process even doesn't have the nsSubDocumentFrame/RenderFrameParent for the new content set up yet, so forcing a paint just forces the paint of the previous page. I'll keep digging, but I probably won't be able to get to the bottom of this today. I'll go ahead and push the test disabling patch.
Err actually you can push it. Looks like the attachment doesn't have author info properly set.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> It looks like the parent
> process even doesn't have the nsSubDocumentFrame/RenderFrameParent for the
> new content set up yet, so forcing a paint just forces the paint of the
> previous page.

I think this was wrong, I got confused in looking at my logging output.

The problem now is that the parent process paint (with the new RefLayer) can happen after the content process paint, and in that case [1] comes into play and skips recording the content-process layer tree info that we are interested in. We need a content process paint to happen *after* the parent process paint, or we need to change the recording code.

[1] https://searchfox.org/mozilla-central/rev/588d8120aa11738657da93e09a03378bcd1ba8ec/gfx/layers/apz/src/APZCTreeManager.cpp#961-964
Comment on attachment 8957659 [details]
Bug 1437295 - Promise-ify some of the paint/flushing methods.

https://reviewboard.mozilla.org/r/226574/#review232478

This is a promising direction :)

::: testing/mochitest/tests/SimpleTest/paint_listener.js:88
(Diff revision 1)
>  
>    window.waitForAllPaints = function(callback) {
>      waitForPaints(callback, null, FlushModes.NOFLUSH);
>    };
> +
> +  window.promiseAllPaintsDone = function(subdoc = null, flush = false) {

Why are we changing the type of the last parameter from a descriptive enum to a nondescript bool?

::: testing/mochitest/tests/SimpleTest/paint_listener.js:91
(Diff revision 1)
>    };
> +
> +  window.promiseAllPaintsDone = function(subdoc = null, flush = false) {
> +    var flushmode = flush ? FlushModes.FLUSH : FlushModes.NOFLUSH;
> +    return new Promise(function (resolve, reject) {
> +      waitForPaints(resultRect => resolve(resultRect), subdoc, flushmode);

Does just |resolve| not work in place of |resultRect => resolve(resultRect)|?
Attachment #8957659 - Flags: review?(botond) → review+
Comment on attachment 8957660 [details]
Bug 1437295 - Ensure cleanup functions registered in a subtest are invoked when unloading the subtest.

https://reviewboard.mozilla.org/r/226576/#review232490

The cleanup function that the next patch is registering calls destroy() on a chrome script created with loadChromeScript().

We have existing APZ mochitests that create and destroy chrome scripts (in getSnapshot(), and the chromeProcessCounter in helper_touch_action_regions.html), and those are also used from subtests.

Why is destroying the chrome script at the end of the test group a problem in this case, and not in those cases?
Comment on attachment 8957661 [details]
Bug 1437295 - Ensure that waitUntilApzStable also flushes the parent process.

https://reviewboard.mozilla.org/r/226578/#review232496

::: gfx/layers/apz/test/mochitest/apz_test_utils.js:298
(Diff revision 1)
> +    // that not only is the content-process layer tree ready in the compositor,
> +    // the parent-process layer tree in the compositor has the appropriate
> +    // RefLayer pointing to the content-process layer tree.
> +
> +    // Sadly this helper function cannot reuse any code from other places because
> +    // it must be totally self-contained to be shipped over to the parent process.

I wonder if we can do |ChromeUtils.import('somepath/apz_test_utils.js')|, and then call the helpers we'd like to call. (I wouldn't spend more than a few minutes trying to make that work, though.)
Attachment #8957661 - Flags: review?(botond) → review+
Comment on attachment 8957662 [details]
Bug 1437295 - Remove unnecessary change from bug 1374166.

https://reviewboard.mozilla.org/r/226580/#review232506

test_bug1151663.html / helper_bug1151663.html is basically an ad-hoc implementation of the "run the test in its own window for greater control" pattern that we've since polished with runSubtestsSeriallyInFreshWindows().

Could we, perhaps in a follow-up, just change this test to use runSubtestsSeriallyInFreshWindows() (even if it's the only test in the group for now)?
Attachment #8957662 - Flags: review?(botond) → review+
Comment on attachment 8957663 [details]
Bug 1437295 - Ensure that we have a compositor APZ testdata before running the test.

https://reviewboard.mozilla.org/r/226582/#review232508

We have a few other tests that query compositor-side test data. Should we be calling forceLayerTreeToCompositor() in those tests as well?
Attachment #8957663 - Flags: review?(botond) → review+
(In reply to Botond Ballo [:botond] from comment #27)
> The cleanup function that the next patch is registering calls destroy() on a
> chrome script created with loadChromeScript().
> 
> We have existing APZ mochitests that create and destroy chrome scripts (in
> getSnapshot(), and the chromeProcessCounter in
> helper_touch_action_regions.html), and those are also used from subtests.
> 
> Why is destroying the chrome script at the end of the test group a problem
> in this case, and not in those cases?

To follow up on this: even if there is a good reason to need to run cleanup at the end of a subtest, I'm uneasy about stepping on SimpleTest's toes like this and assuming that _every_ cleanup function should be run at the end of the subtest.

I'd rather introduce a different cleanup function API where the cleanup functions are explicitly designated as running before unloading the current page. One way to do this could be to maintain our own list of cleanup functions in apz_test_utils.js. By default, we'd register a function (with SimpleTest.registerCleanupFunction()) that calls all of our cleanup functions, but runSubtestsSeriallyInFreshWindows() would override that behaviour to call our functions at the end of each subtest.
Comment on attachment 8957660 [details]
Bug 1437295 - Ensure cleanup functions registered in a subtest are invoked when unloading the subtest.

https://reviewboard.mozilla.org/r/226576/#review232526
Attachment #8957660 - Flags: review?(botond)
(In reply to Botond Ballo [:botond] from comment #26)
> > +  window.promiseAllPaintsDone = function(subdoc = null, flush = false) {
> 
> Why are we changing the type of the last parameter from a descriptive enum
> to a nondescript bool?

The descriptive enum is hidden inside the function scope of the top-level wrapper function in the file, and is not exported onto the window so callers can't see it. What we're actually doing is replacing the overloaded function where the descriptiveness is baked into the function name (waitForAllPaintsFlushed vs waitForAllPaints) with a nondescript bool. I can add an extra promise function if you want but it seemed redundant to me.

> ::: testing/mochitest/tests/SimpleTest/paint_listener.js:91
> > +      waitForPaints(resultRect => resolve(resultRect), subdoc, flushmode);
> 
> Does just |resolve| not work in place of |resultRect => resolve(resultRect)|?

While checking this I realized that neither of those work, because actually the callback is passed 4 arguments (the components of the rect), and you can only pass one thing to a resolve function (the rest seem to get stripped away). So I had to update this to take the 4 rect components and pass them as an array to resolve().

(In reply to Botond Ballo [:botond] from comment #27)
> The cleanup function that the next patch is registering calls destroy() on a
> chrome script created with loadChromeScript().
> 
> We have existing APZ mochitests that create and destroy chrome scripts (in
> getSnapshot(), and the chromeProcessCounter in
> helper_touch_action_regions.html), and those are also used from subtests.
> 
> Why is destroying the chrome script at the end of the test group a problem
> in this case, and not in those cases?

It actually is a problem in those cases too, it just so happens that helper_touch_action_regions is run as the last subtest in the group, so it accidentally works. If you move it to run earlier it fails in the same way. No other subtests do this.

(In reply to Botond Ballo [:botond] from comment #28)
> > +    // Sadly this helper function cannot reuse any code from other places because
> > +    // it must be totally self-contained to be shipped over to the parent process.
> 
> I wonder if we can do |ChromeUtils.import('somepath/apz_test_utils.js')|,
> and then call the helpers we'd like to call. (I wouldn't spend more than a
> few minutes trying to make that work, though.)

ChromeUtils.import seems to take URIs rather than paths, and after spending a few minutes I still couldn't figure which path, if any, would allow importing apz_test_utils.js or other JS files. :/

(In reply to Botond Ballo [:botond] from comment #29)
> Could we, perhaps in a follow-up, just change this test to use
> runSubtestsSeriallyInFreshWindows() (even if it's the only test in the group
> for now)?

Agreed. I think there are other tests we can probably add that group. But I've filed bug 1444611 for it.

(In reply to Botond Ballo [:botond] from comment #30)
> We have a few other tests that query compositor-side test data. Should we be
> calling forceLayerTreeToCompositor() in those tests as well?

I looked at those tests and none of them read the compositor-side test data right at the start of the test. They all do some stuff and (at first glance) wait for the repaints before reading the compositor-side test data, so they should be fine. Some of them might need this but if so they will have their own intermittent failure bugs and we can make the necessary changes after a proper investigation in those cases.

(In reply to Botond Ballo [:botond] from comment #31)
> To follow up on this: even if there is a good reason to need to run cleanup
> at the end of a subtest, I'm uneasy about stepping on SimpleTest's toes like
> this and assuming that _every_ cleanup function should be run at the end of
> the subtest.

Yeah I was a little uneasy about this too, but I couldn't come up with anything better that worked.

> I'd rather introduce a different cleanup function API where the cleanup
> functions are explicitly designated as running before unloading the current
> page. One way to do this could be to maintain our own list of cleanup
> functions in apz_test_utils.js. By default, we'd register a function (with
> SimpleTest.registerCleanupFunction()) that calls all of our cleanup
> functions, but runSubtestsSeriallyInFreshWindows() would override that
> behaviour to call our functions at the end of each subtest.

I had tried something like this but I didn't do a good enough job because I was able to get it working this time.

Updated patches coming soon.
Comment on attachment 8957660 [details]
Bug 1437295 - Ensure cleanup functions registered in a subtest are invoked when unloading the subtest.

https://reviewboard.mozilla.org/r/226576/#review232596

Thanks!

::: gfx/layers/apz/test/mochitest/apz_test_utils.js:196
(Diff revision 2)
>  // to access objects.
>  function runSubtestsSeriallyInFreshWindows(aSubtests) {
>    return new Promise(function(resolve, reject) {
>      var testIndex = -1;
>      var w = null;
> +    var topLevelCleanups = SimpleTest._cleanupFunctions.length;

Not necessary any more
Attachment #8957660 - Flags: review?(botond) → review+
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)
> (In reply to Botond Ballo [:botond] from comment #27)
> > The cleanup function that the next patch is registering calls destroy() on a
> > chrome script created with loadChromeScript().
> > 
> > We have existing APZ mochitests that create and destroy chrome scripts (in
> > getSnapshot(), and the chromeProcessCounter in
> > helper_touch_action_regions.html), and those are also used from subtests.
> > 
> > Why is destroying the chrome script at the end of the test group a problem
> > in this case, and not in those cases?
> 
> It actually is a problem in those cases too, it just so happens that
> helper_touch_action_regions is run as the last subtest in the group, so it
> accidentally works. 

Heh :)

> If you move it to run earlier it fails in the same way.

I'm still curious to know what that way is.
(In reply to Botond Ballo [:botond] from comment #41)
> > +    var topLevelCleanups = SimpleTest._cleanupFunctions.length;
> 
> Not necessary any more

Thanks, fixed.

(In reply to Botond Ballo [:botond] from comment #42)
> > If you move it to run earlier it fails in the same way.
> 
> I'm still curious to know what that way is.

I don't have the exact error message handy but basically the nsIMessageManager.removeMessageListener function call was returning an NS_ERROR_INVALID_VALUE which manifested as a JS exception. I had attached gdb to investigate a bit and the error was produced somewhere in the JS/C++ binding goop, probably around where it checks the function arguments to make sure the compartments are correct.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2d541a790ea
Promise-ify some of the paint/flushing methods. r=botond
https://hg.mozilla.org/integration/autoland/rev/81cbeed33860
Ensure cleanup functions registered in a subtest are invoked when unloading the subtest. r=botond
https://hg.mozilla.org/integration/autoland/rev/49393cd5db23
Ensure that waitUntilApzStable also flushes the parent process. r=botond
https://hg.mozilla.org/integration/autoland/rev/5a973861da6a
Remove unnecessary change from bug 1374166. r=botond
https://hg.mozilla.org/integration/autoland/rev/a89953a37889
Ensure that we have a compositor APZ testdata before running the test. r=botond
Whiteboard: [gfx-noted][stockwell disable-recommended] → [gfx-noted][stockwell fixed:product]
You need to log in before you can comment on or make changes to this bug.