Open Bug 1559570 Opened 5 years ago Updated 2 years ago

Intermittent gfx/layers/apz/test/mochitest/test_bug1151663.html | found the RCD node

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

REOPENED
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected

People

(Reporter: intermittent-bug-filer, Assigned: emilio)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=252011964&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/EfLF6IdGRgy_jefYKH10YQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - Buffered messages finished
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151663.html | found the RCD node
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - testBug1151663@gfx/layers/apz/test/mochitest/helper_bug1151663.html:51:18
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - setTimeout handlerrepaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:174:7
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - GECKO(2911) | JavaScript error: http://mochi.test:8888/tests/gfx/layers/apz/test/mochitest/helper_bug1151663.html, line 52: TypeError: rcd is null
[task 2019-06-15T06:52:36.839Z] 06:52:36 INFO - GECKO(2911) | --DOMWINDOW == 14 (0x7f842a010800) [pid = 2911] [serial = 6] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:37.533Z] 06:52:37 INFO - GECKO(2911) | --DOMWINDOW == 9 (0x7f7daa1b6400) [pid = 3004] [serial = 6] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:37.533Z] 06:52:37 INFO - GECKO(2911) | --DOMWINDOW == 8 (0x7f7daaf9ec00) [pid = 3004] [serial = 3] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:41.153Z] 06:52:41 INFO - GECKO(2911) | --DOMWINDOW == 13 (0x7f8426d23800) [pid = 2911] [serial = 14] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:41.633Z] 06:52:41 INFO - GECKO(2911) | --DOMWINDOW == 7 (0x7f7daa1b6000) [pid = 3004] [serial = 10] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:41.634Z] 06:52:41 INFO - GECKO(2911) | --DOMWINDOW == 6 (0x7f7daa1b4c00) [pid = 3004] [serial = 7] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:42.160Z] 06:52:42 INFO - GECKO(2911) | --DOMWINDOW == 2 (0x7fda63803400) [pid = 3076] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2019-06-15T06:52:42.767Z] 06:52:42 INFO - GECKO(2911) | [Parent 2911, StreamTrans #5] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp, line 371
[task 2019-06-15T06:52:42.768Z] 06:52:42 INFO - GECKO(2911) | [Parent 2911, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp, line 994
[task 2019-06-15T06:58:01.796Z] 06:58:01 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-06-15T06:58:01.798Z] 06:58:01 INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_bug1151663.html | Test timed out.
[task 2019-06-15T06:58:01.798Z] 06:58:01 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
[task 2019-06-15T06:58:01.798Z] 06:58:01 INFO - reportError@SimpleTest/TestRunner.js:121:22
[task 2019-06-15T06:58:01.798Z] 06:58:01 INFO - TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2019-06-15T06:58:01.799Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.799Z] 06:58:01 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.799Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.799Z] 06:58:01 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.799Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.800Z] 06:58:01 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2019-06-15T06:58:01.801Z] 06:58:01 INFO - TestRunner.runTests/<@SimpleTest/TestRunner.js:381:20
[task 2019-06-15T06:58:01.801Z] 06:58:01 INFO - promise callbackTestRunner.runTests@SimpleTest/TestRunner.js:368:50
[task 2019-06-15T06:58:01.801Z] 06:58:01 INFO - RunSet.runtests@SimpleTest/setup.js:201:14
[task 2019-06-15T06:58:01.802Z] 06:58:01 INFO - RunSet.runall@SimpleTest/setup.js:180:12
[task 2019-06-15T06:58:01.802Z] 06:58:01 INFO - hookupTests@SimpleTest/setup.js:273:12
[task 2019-06-15T06:58:01.802Z] 06:58:01 INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2019-06-15T06:58:01.802Z] 06:58:01 INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2019-06-15T06:58:01.803Z] 06:58:01 INFO - EventHandlerNonNull
getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3

Hmm, so that test doesn't seem to have any style change whatsoever, so it's unclear to me how that patch should cause it... I guess the RCD node stands for root content document?

Keywords: regression

I'm not working on this because I don't really know how my patch could've caused it. My patch at best makes some reflows faster, but I don't possibly know how could it affect this.

Kats, am I looking at the right thing? I don't see any dynamic style change in that subtest (nor anything non-statically positioned really, either in the test or in the window it opens).

Is it possible that waitForApzStable is not waiting for the right thing? That waits only for one window, afaict. Also, I think given we don't allow duplicate message listeners, if two different tests are waiting for APZ one of them may get the event too soon and deregister too early... Though I guess that may not happen since before doing that we're waiting for the window to get focus...

Flags: needinfo?(emilio) → needinfo?(kats)

Also, Andreea, do you know if this happens only on WebRender? Or does the failure happen on other platforms?

Flags: needinfo?(apavel)

I'm guessing yes, that's the problem: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252131533&repo=try&lineNumber=390313

Is there some change we can make to that helper to force a full flush/repaint?

Yes, it is indeed... We can probably modify box-sizing or something to cause the same kind of flush that this was causing...

Do you know exactly which part of the layout flush is this code is relying on? My code should still cause a layout flush, just a smaller reflow scoped to the body rather than a full reflow. It'd be great to make it a bit less footgunny :)

Assignee: nobody → emilio
Flags: needinfo?(emilio) → needinfo?(kats)

box-sizing changes seem a bit less likely to be optimized away in the near
future...

(In reply to Emilio Cobos Álvarez (:emilio) from comment #10)

Yes, it is indeed... We can probably modify box-sizing or something to cause the same kind of flush that this was causing...

SGTM, will r+ your patch in a sec.

Do you know exactly which part of the layout flush is this code is relying on? My code should still cause a layout flush, just a smaller reflow scoped to the body rather than a full reflow. It'd be great to make it a bit less footgunny :)

We really just need to force a repaint here, so that an updated layer tree gets sent to the compositor. I think I copy/pasted this code from elsewhere and don't have a great understanding of the layout flush mechanics involved, just that it seemed to do what we wanted.

Flags: needinfo?(kats)
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c58b5952441
Use a different style change to trigger AllReflowHints for now. r=kats
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Status: RESOLVED → REOPENED
Flags: needinfo?(emilio)
Resolution: FIXED → ---
Target Milestone: mozilla69 → ---

It looks like the frequency is much lower now though, right?

I don't really have better ideas to change it to be even more reliable without invalidating the world... Kats, is doing a proper repaint (i.e., changing a color property or background) something to avoid here? Or should we do that?

Flags: needinfo?(emilio) → needinfo?(kats)

Doing that is fine, if it will force a proper repaint and send the full layer tree over.

Flags: needinfo?(kats)

All the recent failures are on macOS without WebRender enabled. Not sure if it's the same problem as what Emilio's patch fixed. In theory it might be a similar thing, or it might be a different regression.

Removing "regression" keyword from APZ intermittents for reasons laid out in bug 1571054 comment 0.

Keywords: regression
Has Regression Range: --- → yes
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: