Closed Bug 1282283 Opened 8 years ago Closed 8 years ago

TSan: data race on APZCTreeManager::mRootNode

Categories

(Core :: Panning and Zooming, defect, P3)

45 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: jseward, Assigned: kats)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

This is not easy to reproduce -- was observed when surfing with a profile with around 100 tabs using a TSan-enabled build. Sorry I don't have better STR right now. It appears that APZCTreeManager::UpdateHitTestingTree sets mRootNode to null at APZCTreeManager.cpp:174, on the compositor thread 174 mRootNode = nullptr; whilst APZCTreeManager::ReceiveInputEvent dereferences it, on the controller thread 698 if (!apzc && mRootNode) { 699 apzc = mRootNode->GetApzc(); 700 } APZCTreeManager::UpdateHitTestingTree does indeed hold APZCTreeManager::mTreeLock but APZCTreeManager::ReceiveInputEvent does not. This strikes me as distinctly ungood ;-) According to APZCTreeManager.h, comment starting at line 542, mTreeLock must be held whenever accessing the tree rooted at mRootNode. So it seems as if ::ReceiveInputEvent is in error, and could conceivably be fixed just by putting "MutexAutoLock lock(mTreeLock);" at the top of APZCTreeManager::ReceiveInputEvent. Does that sound plausible?
Attached file TSan complaint
Yeah, that looks like a bug in ReceiveInputEvent. We should add a treelock, but I would prefer to scope it just around that hunk of code that reads mRootNode. ReceiveInputEvent does a lot of other stuff and I don't want to hold the lock for the whole function.
Component: Graphics: Layers → Panning and Zooming
Blocks: 1211612
Version: Trunk → 45 Branch
I can write the patch, it's pretty straightforward.
Assignee: nobody → bugmail.mozilla
Attachment #8765284 - Flags: review?(jseward)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2) > Yeah, that looks like a bug in ReceiveInputEvent. We should add a treelock, > but I would prefer to scope it just around that hunk of code that reads > mRootNode. ReceiveInputEvent does a lot of other stuff and I don't want to > hold the lock for the whole function. Ok, so I see that in the patch you restrict the scope of the lock just to that hunk. I just want to check the following: with the patch, we now hold the lock when testing |mRootNode| and then calling |mRootNode->GetApzc()|, and then drop the lock. Is it possible that the other thread could then get the lock, set |mRootNode| to null, and thereby cause the |azpc| value we just pulled out to become invalid? In other words -- is the locked region you've added, big enough to guarantee that the |azpc| value obtained from |mRootNode| remains valid for the rest of its uses in APZCTreeManager::ReceiveInputEvent case MOUSE_INPUT ?
The apzc is held by a RefPtr so it won't become garbage. It has some internal pointers that may get nulled out if mRootNode is destroyed but even if that happens the code in ReceiveInputEvent should work fine. Access to member variables inside apzc has other locks to avoid races.
Comment on attachment 8765284 [details] Bug 1282283 - Fix data race found by TSan. With that patch, I can no longer reproduce the race. Thanks!
Attachment #8765284 - Flags: review?(jseward) → review+
Priority: -- → P3
Whiteboard: [gfx-noted]
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8765284 [details] Bug 1282283 - Fix data race found by TSan. Approval Request Comment [Feature/regressing bug #]: bug 1211612 [User impact if declined]: Possible race condition that results in a crash while dragging. We haven't seen any reports of this in the wild, but it's a theoretical possibility [Describe test coverage new/current, TreeHerder]: not easily testable [Risks and why]: very low risk patch, just adds a bit of locking to avoid the race condition [String/UUID change made/needed]: none
Attachment #8765284 - Flags: approval-mozilla-beta?
Attachment #8765284 - Flags: approval-mozilla-aurora?
Comment on attachment 8765284 [details] Bug 1282283 - Fix data race found by TSan. Fix a race, taking it. Should be in 48 beta 5
Attachment #8765284 - Flags: approval-mozilla-beta?
Attachment #8765284 - Flags: approval-mozilla-beta+
Attachment #8765284 - Flags: approval-mozilla-aurora?
Attachment #8765284 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: