Closed Bug 1282283 Opened 3 years ago Closed 3 years ago

TSan: data race on APZCTreeManager::mRootNode

Categories

(Core :: Panning and Zooming, defect, P3)

45 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: jseward, Assigned: kats)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

This is not easy to reproduce -- was observed when surfing with a
profile with around 100 tabs using a TSan-enabled build.  Sorry I
don't have better STR right now.

It appears that APZCTreeManager::UpdateHitTestingTree sets mRootNode to null
at APZCTreeManager.cpp:174, on the compositor thread

174  mRootNode = nullptr;

whilst APZCTreeManager::ReceiveInputEvent dereferences it, on the
controller thread

698      if (!apzc && mRootNode) {
699        apzc = mRootNode->GetApzc();
700      }

APZCTreeManager::UpdateHitTestingTree does indeed hold
APZCTreeManager::mTreeLock but APZCTreeManager::ReceiveInputEvent does
not.

This strikes me as distinctly ungood ;-)

According to APZCTreeManager.h, comment starting at line 542,
mTreeLock must be held whenever accessing the tree rooted at
mRootNode.  So it seems as if ::ReceiveInputEvent is in error, and
could conceivably be fixed just by putting "MutexAutoLock
lock(mTreeLock);" at the top of APZCTreeManager::ReceiveInputEvent.
Does that sound plausible?
Attached file TSan complaint
Yeah, that looks like a bug in ReceiveInputEvent. We should add a treelock, but I would prefer to scope it just around that hunk of code that reads mRootNode. ReceiveInputEvent does a lot of other stuff and I don't want to hold the lock for the whole function.
Component: Graphics: Layers → Panning and Zooming
I can write the patch, it's pretty straightforward.
Assignee: nobody → bugmail.mozilla
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> Yeah, that looks like a bug in ReceiveInputEvent. We should add a treelock,
> but I would prefer to scope it just around that hunk of code that reads
> mRootNode. ReceiveInputEvent does a lot of other stuff and I don't want to
> hold the lock for the whole function.

Ok, so I see that in the patch you restrict the scope of the lock just to
that hunk.  I just want to check the following: with the patch, we now hold
the lock when testing |mRootNode| and then calling |mRootNode->GetApzc()|,
and then drop the lock.  Is it possible that the other thread could then
get the lock, set |mRootNode| to null, and thereby cause the |azpc|
value we just pulled out to become invalid?

In other words -- is the locked region you've added, big enough to guarantee
that the |azpc| value obtained from |mRootNode| remains valid for the rest
of its uses in APZCTreeManager::ReceiveInputEvent case MOUSE_INPUT ?
The apzc is held by a RefPtr so it won't become garbage. It has some internal pointers that may get nulled out if mRootNode is destroyed but even if that happens the code in ReceiveInputEvent should work fine. Access to member variables inside apzc has other locks to avoid races.
Comment on attachment 8765284 [details]
Bug 1282283 - Fix data race found by TSan.

With that patch, I can no longer reproduce the race.  Thanks!
Attachment #8765284 - Flags: review?(jseward) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb7b5a8e6124
Fix data race found by TSan. r=sewardj
Whiteboard: [gfx-noted]
https://hg.mozilla.org/mozilla-central/rev/fb7b5a8e6124
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8765284 [details]
Bug 1282283 - Fix data race found by TSan.

Approval Request Comment
[Feature/regressing bug #]: bug 1211612
[User impact if declined]: Possible race condition that results in a crash while dragging. We haven't seen any reports of this in the wild, but it's a theoretical possibility
[Describe test coverage new/current, TreeHerder]: not easily testable
[Risks and why]: very low risk patch, just adds a bit of locking to avoid the race condition
[String/UUID change made/needed]: none
Attachment #8765284 - Flags: approval-mozilla-beta?
Attachment #8765284 - Flags: approval-mozilla-aurora?
Comment on attachment 8765284 [details]
Bug 1282283 - Fix data race found by TSan.

Fix a race, taking it.
Should be in 48 beta 5
Attachment #8765284 - Flags: approval-mozilla-beta?
Attachment #8765284 - Flags: approval-mozilla-beta+
Attachment #8765284 - Flags: approval-mozilla-aurora?
Attachment #8765284 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.