Closed Bug 1457590 Opened 7 years ago Closed 7 years ago

Intermittent gfx/layers/apz/test/mochitest/test_group_mouseevents.html | application crashed [@ mozilla::layers::AsyncPanZoomController::ConvertScrollbarPoint(mozilla::gfx::PointTyped<mozilla::ParentLayer after Assertion failure: mIsSome

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla62

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox-esr60

---

wontfix

firefox60

---

wontfix

firefox61

---

wontfix

firefox62

---

fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(3 files)

Bug 1457590 - Strengthen the contract around recycling HitTestingTreeNodes. 7 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 59 bytes, text/x-review-board-request	botond : review+	Details
Bug 1457590 - Add the HitTestingTreeNodeAutoLock class. 7 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 59 bytes, text/x-review-board-request	botond : review+	Details
Bug 1457590 - Use the HitTestingTreeNodeAutoLock. 7 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 59 bytes, text/x-review-board-request	botond : review+	Details

Treeherder Bug Filer

Reporter

Description

•

7 years ago

treeherder

Filed by: ncsoregi [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=176001225&repo=autoland https://queue.taskcluster.net/v1/task/C4TlOa9sS3qVQWi8jiXL4g/runs/0/artifacts/public/logs/live_backing.log 12:05:26 INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | helper_drag_scroll.html | Scroll position strictly increased 12:05:26 INFO - Buffered messages logged at 12:04:37 12:05:26 INFO - must wait for load 12:05:26 INFO - Buffered messages finished 12:05:26 ERROR - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | application terminated with exit code 1 12:05:26 INFO - runtests.py | Application ran for: 0:01:26.715013 12:05:26 INFO - zombiecheck | Reading PID log: /var/folders/_2/5rcdryb97wl3b_q4535l0xqc00000w/T/tmp17ZKFJpidlog 12:05:26 INFO - ==> process 2146 launched child process 2147 12:05:26 INFO - ==> process 2146 launched child process 2148 12:05:26 INFO - ==> process 2146 launched child process 2149 12:05:26 INFO - zombiecheck | Checking for orphan process with PID: 2147 12:05:26 INFO - zombiecheck | Checking for orphan process with PID: 2148 12:05:26 INFO - zombiecheck | Checking for orphan process with PID: 2149 12:05:26 INFO - mozcrash Copy/paste: /Users/cltbld/tasks/task_1524854160/build/macosx64-minidump_stackwalk /var/folders/_2/5rcdryb97wl3b_q4535l0xqc00000w/T/tmpovTx_6.mozrunner/minidumps/1D86CA9E-BCFC-474D-AFAF-CFEC25C082E0.dmp /Users/cltbld/tasks/task_1524854160/build/symbols 12:05:46 INFO - mozcrash Saved minidump as /Users/cltbld/tasks/task_1524854160/build/blobber_upload_dir/1D86CA9E-BCFC-474D-AFAF-CFEC25C082E0.dmp 12:05:46 INFO - mozcrash Saved app info as /Users/cltbld/tasks/task_1524854160/build/blobber_upload_dir/1D86CA9E-BCFC-474D-AFAF-CFEC25C082E0.extra 12:05:46 INFO - PROCESS-CRASH | gfx/layers/apz/test/mochitest/test_group_mouseevents.html | application crashed [@ mozilla::layers::AsyncPanZoomController::ConvertScrollbarPoint(mozilla::gfx::PointTyped<mozilla::ParentLayerPixel, float> const&, mozilla::layers::ScrollbarData const&) const] 12:05:46 INFO - Crash dump filename: /var/folders/_2/5rcdryb97wl3b_q4535l0xqc00000w/T/tmpovTx_6.mozrunner/minidumps/1D86CA9E-BCFC-474D-AFAF-CFEC25C082E0.dmp 12:05:46 INFO - Operating system: Mac OS X 12:05:46 INFO - 10.10.5 14F27 12:05:46 INFO - CPU: amd64 12:05:46 INFO - family 6 model 69 stepping 1 12:05:46 INFO - 4 CPUs 12:05:46 INFO - 12:05:46 INFO - GPU: UNKNOWN 12:05:46 INFO - 12:05:46 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 12:05:46 INFO - Crash address: 0x0 12:05:46 INFO - Process uptime: 37 seconds 12:05:46 INFO -

Comment hidden (Intermittent Failures Robot)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

7 years ago

According to Ben's comment in bug 1458400 comment 0, it's the *thumbData.mDirection at [1] that's asserting. Which is odd, because in the caller function we assert [2]. And these are both MOZ_ASSERTs so it's not a debug vs opt build issue. Somehow mDirection is getting set to Nothing in between [1] and [2]? Maybe a race condition with the updater thread updating the hit-testing tree node while this is running on the controller thread without holding the tree lock? [1] https://searchfox.org/mozilla-central/rev/795575287259a370d00518098472eaa5b362bfa7/gfx/layers/apz/src/AsyncPanZoomController.cpp#1754 [2] https://searchfox.org/mozilla-central/rev/795575287259a370d00518098472eaa5b362bfa7/gfx/layers/apz/src/APZCTreeManager.cpp#1782

Ben Kelly [:bkelly, not reviewing]

Comment 4

•

7 years ago

Note, I only think its the mDirection because thats the only Maybe<> I could find involved in the code. It seems like a lot of inlining was happening, though, which made it hard to be completely sure.

Comment hidden (Intermittent Failures Robot)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Comment 8

•

7 years ago

I think comment 3 is accurate. We can fix either by excluding scrollbar nodes from being recycled, or ensure that we hold the tree lock while handling input events that fiddle with the scrollbar nodes. The latter is probably more robust, but we'll need to make sure we don't introduce any lock ordering violations.

Comment hidden (Intermittent Failures Robot)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Comment 11

•

7 years ago

I discussed this with Botond today, and we decided to do the following to solve the ordering violation: - Add a flag on the HTTN that tracks if the node can be recycled. This flag would only be read/mutated while holding the APZ tree lock - The functions that return a HTTN on the controller thread (e.g. GetTargetNode and GetTargetAPZC) would set the flag before returning the HTTN - The caller would clear the flag when it's done with the HTTN As a mechanism to make sure the flag is set/cleared correctly we could introduce a HitTestingTreeNodeRef structure that stores a RefPtr to the HitTestingTreeNode and manages the flag in a RAII fashion. This type would be the thing returned from GetTargetNode/GetTargetAPZC instead of the RefPtr<HitTestingTreeNode> that is currently used. Furthermore we can make things like HTTN::GetScrollbarData private and only accessible via the HTTNRef, to ensure that we only access it while the flag is set and the node is not in danger of getting recycled.

Comment hidden (Intermittent Failures Robot)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Crash Signature: [@ mozilla::layers::AsyncPanZoomController::ConvertScrollbarPoint(mozilla::gfx::PointTyped<mozilla::ParentLayerPixel, float> const&, mozilla::layers::ScrollbarData const&) const] → [@ mozilla::layers::AsyncPanZoomController::ConvertScrollbarPoint] [@ mozilla::layers::AsyncPanZoomController::ConvertScrollbarPoint(mozilla::gfx::PointTyped<mozilla::ParentLayerPixel, float> const&, mozilla::layers::ScrollbarData const&) const]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Assignee: nobody → bugmail

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 14

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=317dfb9c05c2d01337b738bd747d19488782a71d

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Blocks: 1462370, 1463184

Comment 18

•

7 years ago

I couldn't find a way to reliably reproduce this, but we can monitor the effectiveness of the patches via crash-stats, since we are seeing these crashes in the wild. Also, sorry if I went a little overboard with the aProofOfTreeLock business, but I figured it was better than documenting it in comments since this way it's compiler-enforceable.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 19

•

7 years ago

Update to fix the static analysis compile error: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7c74c0bddbc5ab3a94472d4dab7b571e6da40837

Comment hidden (mozreview-request)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 23

•

7 years ago

... and with the silly-null-deref fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1aa96a507724617e6531b72ba7ab0c7fbc858e46

Comment hidden (mozreview-request)

Botond Ballo [:botond]

Comment 27

•

7 years ago

mozreview-review

Comment on attachment 8981996 [details] Bug 1457590 - Strengthen the contract around recycling HitTestingTreeNodes. https://reviewboard.mozilla.org/r/248048/#review254198

Attachment #8981996 - Flags: review?(botond) → review+

Botond Ballo [:botond]

Comment 28

•

7 years ago

mozreview-review

Comment on attachment 8981997 [details] Bug 1457590 - Add the HitTestingTreeNodeAutoLock class. https://reviewboard.mozilla.org/r/248050/#review254204 ::: gfx/layers/apz/src/HitTestingTreeNode.cpp:456 (Diff revision 3) > + return; > + } > + MOZ_ASSERT(mTreeMutex); > + > + { // scope lock > + RecursiveMutexAutoLock lock(*mTreeMutex); Are we sure a node will never outlive the APZCTM (and thus its mutex)?

Attachment #8981997 - Flags: review?(botond) → review+

Botond Ballo [:botond]

Comment 29

•

7 years ago

mozreview-review

Comment on attachment 8981998 [details] Bug 1457590 - Use the HitTestingTreeNodeAutoLock. https://reviewboard.mozilla.org/r/248052/#review254222 Nice and clean, I like it!

Attachment #8981998 - Flags: review?(botond) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 30

•

7 years ago

(In reply to Botond Ballo [:botond] from comment #28) > Are we sure a node will never outlive the APZCTM (and thus its mutex)? Fairly sure, yeah. Conceptually the APZCTM owns the nodes, and we don't have any code outside APZ that touches the HitTestingTreeNode. Any operations that hold refptrs to nodes should be going through the APZCTreeManager and have that on the stack, so the mutex should still be valid at that point. If we run into situations where that's not the case I think we'd have bigger problems.

Pulsebot

Comment 31

•

7 years ago

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8a1bdb8d8bf5 Strengthen the contract around recycling HitTestingTreeNodes. r=botond https://hg.mozilla.org/integration/autoland/rev/437b901404db Add the HitTestingTreeNodeAutoLock class. r=botond https://hg.mozilla.org/integration/autoland/rev/85d13ab240fa Use the HitTestingTreeNodeAutoLock. r=botond

Andreea Pavel [:apavel]

Comment 32

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8a1bdb8d8bf5 https://hg.mozilla.org/mozilla-central/rev/437b901404db https://hg.mozilla.org/mozilla-central/rev/85d13ab240fa

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Blocks: 1465896

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox60: --- → wontfix

status-firefox61: --- → wontfix

status-firefox-esr52: --- → unaffected

status-firefox-esr60: --- → wontfix

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 33

•

7 years ago

I discussed with with RyanVM on IRC - there is a real correctness issue affecting older versions, which then got exposed as an assertion failure due to recent changes. And it would be nice to fix that correctness issue, but the patch isn't totally trivial either. I'd prefer to let it bake for a bit, and considering we're close to the end of the current cycle we might as well let it ride the trains.

Comment hidden (Intermittent Failures Robot)

You need to log in before you can comment on or make changes to this bug.