Closed Bug 1436200 Opened 2 years ago Closed 2 years ago

Intermittent /test_closewindow-with-pointerlock.html, test_window_open_close.html,test_group_pointerevents.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR] after Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID

Categories

(Core :: Graphics: WebRender, defect, P3, critical)

Tracking

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- disabled
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell unknown])

Crash Data

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=160683058&repo=autoland

https://queue.taskcluster.net/v1/task/A4rnj_zySKOUHcsWoLUvUw/runs/0/artifacts/public/logs/live_backing.log

[task 2018-02-06T21:57:28.395Z] 21:57:28     INFO - TEST-PASS | dom/tests/mochitest/pointerlock/test_closewindow-with-pointerlock.html | Check we have locked the pointer 
[task 2018-02-06T21:57:28.395Z] 21:57:28     INFO - Buffered messages finished
[task 2018-02-06T21:57:28.395Z] 21:57:28    ERROR - TEST-UNEXPECTED-FAIL | dom/tests/mochitest/pointerlock/test_closewindow-with-pointerlock.html | application terminated with exit code 11
[task 2018-02-06T21:57:28.396Z] 21:57:28     INFO - runtests.py | Application ran for: 0:00:54.335981
[task 2018-02-06T21:57:28.396Z] 21:57:28     INFO - zombiecheck | Reading PID log: /tmp/tmp1o5Yiqpidlog
[task 2018-02-06T21:57:28.397Z] 21:57:28     INFO - ==> process 1206 launched child process 1226
[task 2018-02-06T21:57:28.397Z] 21:57:28     INFO - ==> process 1206 launched child process 1263
[task 2018-02-06T21:57:28.398Z] 21:57:28     INFO - ==> process 1206 launched child process 1294
[task 2018-02-06T21:57:28.399Z] 21:57:28     INFO - zombiecheck | Checking for orphan process with PID: 1226
[task 2018-02-06T21:57:28.400Z] 21:57:28     INFO - zombiecheck | Checking for orphan process with PID: 1294
[task 2018-02-06T21:57:28.400Z] 21:57:28     INFO - zombiecheck | Checking for orphan process with PID: 1263
[task 2018-02-06T21:57:28.401Z] 21:57:28     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpZlwHtd.mozrunner/minidumps/44107c85-4a27-67c8-944c-0e0ce506a116.dmp /builds/worker/workspace/build/symbols
[task 2018-02-06T21:57:36.358Z] 21:57:36     INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/44107c85-4a27-67c8-944c-0e0ce506a116.dmp
[task 2018-02-06T21:57:36.359Z] 21:57:36     INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/44107c85-4a27-67c8-944c-0e0ce506a116.extra
[task 2018-02-06T21:57:36.437Z] 21:57:36     INFO - PROCESS-CRASH | dom/tests/mochitest/pointerlock/test_closewindow-with-pointerlock.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR]
Summary: Intermittent /test_closewindow-with-pointerlock.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR] after Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID → Intermittent /test_closewindow-with-pointerlock.html, test_window_open_close.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR] after Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID
Component: DOM → Graphics: WebRender
Assignee: nobody → bugmail
So at least part of the problem here is that we're pushing the scroll data update to APZ [1] at the same time that we're pushing a display list update to WR [2]. However, WR takes that display list update and uses it to update a scene that is not the scene used for hit-testing. So the data that APZ is using is out of sync with the data that WR is using. WR will promote that scene to be the active one at the time of frame generation [3], so that's probably when we should be pushing the scroll data update to APZ.

[1] https://searchfox.org/mozilla-central/rev/b9f1a4ecba48b2d8c686669e32d109c40e927b48/gfx/layers/wr/WebRenderBridgeParent.cpp#629
[2] https://searchfox.org/mozilla-central/rev/b9f1a4ecba48b2d8c686669e32d109c40e927b48/gfx/layers/wr/WebRenderBridgeParent.cpp#613-617
[3] https://searchfox.org/mozilla-central/rev/b9f1a4ecba48b2d8c686669e32d109c40e927b48/gfx/layers/wr/WebRenderBridgeParent.cpp#1256-1258
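
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Gecko or WebRender code: ToyWebRender, ToyApz, ToyBridge and all of their methods are hypothetical names invented for illustration, and a "scene" is reduced to the set of scroll ids it contains. The eager variant updates APZ as soon as the display list is sent, so APZ can lose track of a scroll id that the active scene still reports from hit-testing. The deferred variant holds the scroll data back until the pending scene is promoted at frame generation.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>

// Hypothetical stand-ins; none of these types exist in Gecko or WebRender.
using ScrollId = int;
using Scene = std::set<ScrollId>;  // the scroll ids a scene knows about
constexpr ScrollId kNoScrollId = -1;

struct ToyWebRender {
  Scene active;                  // scene used for hit-testing
  std::optional<Scene> pending;  // scene built from the latest display list

  void SetDisplayList(Scene scene) { pending = std::move(scene); }

  // Promotes the pending scene, if any; returns true when a promotion happened.
  bool GenerateFrame() {
    if (!pending) return false;
    active = std::move(*pending);
    pending.reset();
    return true;
  }

  // Toy hit-test: reports the largest scroll id in the active scene.
  ScrollId HitTest() const {
    return active.empty() ? kNoScrollId : *active.rbegin();
  }
};

struct ToyApz {
  Scene known;
  void UpdateScrollData(Scene scene) { known = std::move(scene); }
  bool Knows(ScrollId id) const { return known.count(id) > 0; }
};

struct ToyBridge {
  ToyWebRender wr;
  ToyApz apz;
  std::optional<Scene> deferredScrollData;

  // Eager variant: APZ is updated at the same time as the display list.
  void UpdateEager(const Scene& scene) {
    wr.SetDisplayList(scene);
    apz.UpdateScrollData(scene);
  }

  // Deferred variant: scroll data waits until the scene is promoted.
  void UpdateDeferred(const Scene& scene) {
    wr.SetDisplayList(scene);
    deferredScrollData = scene;
  }

  void GenerateFrame() {
    if (wr.GenerateFrame() && deferredScrollData) {
      apz.UpdateScrollData(std::move(*deferredScrollData));
      deferredScrollData.reset();
    }
  }

  // True when the hit-test result is something APZ knows about.
  bool HitTestConsistent() const {
    ScrollId id = wr.HitTest();
    return id == kNoScrollId || apz.Knows(id);
  }
};

// Walks through the deferred variant; every step should stay in sync.
inline void RunDeferredDemo() {
  ToyBridge bridge;
  bridge.UpdateDeferred(Scene{1});
  bridge.GenerateFrame();
  assert(bridge.HitTestConsistent());
  bridge.UpdateDeferred(Scene{2});  // WR still hit-tests against {1}
  assert(bridge.HitTestConsistent());
  bridge.GenerateFrame();           // both sides switch to {2}
  assert(bridge.HitTestConsistent());
}
```

In this model, calling UpdateEager with a new scene and hit-testing before the next GenerateFrame leaves HitTestConsistent() false, which is loosely analogous to the mismatch behind the assertion in this bug. The real fix has more to account for than this sketch captures.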
In theory the fix is simple but it will likely result in some sort of breakage - subtle breakage if not obvious breakage. Let's see how this plays out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=988d3bdfd6d071b4d3baf31de70b8acbded390a8
Summary: Intermittent /test_closewindow-with-pointerlock.html, test_window_open_close.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR] after Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID → Intermittent /test_closewindow-with-pointerlock.html, test_window_open_close.html,test_group_pointerevents.html | application crashed [@ mozilla::layers::APZCTreeManager::GetAPZCAtPointWR] after Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID
There have been 36 failures in the last week.
The failures occur on linux64-qr/debug.
Here is a recent log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=165789449&lineNumber=2333

[task 2018-03-03T22:30:55.702Z] 22:30:55     INFO - GECKO(1236) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2018-03-03T22:30:55.711Z] 22:30:55     INFO - GECKO(1236) | MEMORY STAT | vsize 1581MB | residentFast 149MB | heapAllocated 25MB
[task 2018-03-03T22:30:55.938Z] 22:30:55     INFO - GECKO(1236) | Assertion failure: scrollId == FrameMetrics::NULL_SCROLL_ID, at /builds/worker/workspace/build/src/gfx/layers/apz/src/APZCTreeManager.cpp:2393
[task 2018-03-03T22:31:27.277Z] 22:31:27     INFO - GECKO(1236) | #01: mozilla::layers::APZCTreeManager::GetTargetAPZC [gfx/layers/apz/src/APZCTreeManager.cpp:2341]
[task 2018-03-03T22:31:27.279Z] 22:31:27     INFO - 
[task 2018-03-03T22:31:27.281Z] 22:31:27     INFO - GECKO(1236) | #02: mozilla::layers::APZCTreeManager::ReceiveInputEvent [mfbt/AlreadyAddRefed.h:121]
Flags: needinfo?(milan)
Whiteboard: [stockwell needswork]
Whiteboard: [stockwell needswork] → [stockwell needswork:owner]
There is a lot of flux in WebRender; we'll keep this test running while sorting it out.
Flags: needinfo?(milan)
I can remove the assertion for now, since the tests it's tripping in aren't actually testing the WR hit-testing code. The proper fix involves bug 1391318 and its dependency chain, which isn't even fully fleshed out yet, so it will be at least a week before it's all done.
Depends on: 1391318
Comment on attachment 8956069 [details]
Bug 1436200 - Disable failing assertions until we have the proper fix in place.

https://reviewboard.mozilla.org/r/225008/#review231048
Attachment #8956069 - Flags: review?(botond) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4e230122cb23
Disable failing assertions until we have the proper fix in place. r=botond
https://hg.mozilla.org/mozilla-central/rev/4e230122cb23
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Duplicate of this bug: 1211206
Depends on: 1452845
No longer depends on: 1391318