TSan: data race widget/gtk/nsWindow.cpp:6100 nsWindow::EndRemoteDrawingInRegion

RESOLVED FIXED in Firefox 42

Status

()

Core
Widget: Gtk
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: froydnj, Assigned: lsalzman)

Tracking

(Blocks: 1 bug, {regression})

unspecified
mozilla42
x86_64
Linux
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox42 fixed)

Details

(Whiteboard: [tsan])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
Created attachment 8643139 [details]
TSan stack trace

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

I've seen this race a couple of times, and each time TSan has been unable to get any information on the writing side of the race, which is unfortunate.  Fortunately, the line information is pretty decent, and the variable that's being touched in mIsFullyObscured.

It looks like the only place in nsWindow that we write to mIsFullyObscured is in OnVisibilityNotifyEvent, which I assume is happening on the main thread.  I guess it's possible that the offending write is in the constructor, but that seems unlikely.

So the problem is that the compositor is calling back into nsWindow on the compositor thread, and we're touching things that are also accessed on the main thread...but without any locking.

Karl, does the above sound like a reasonable diagnosis of the problem?  What can we do to fix this?

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(karlt)
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #0)
> Fortunately, the line information is pretty decent, and the variable that's
> being touched in mIsFullyObscured.

How do you know it is that variable?

I assume it is the line of code with all of these?

(!mGdkWindow || mIsFullyObscured || !mHasMappedToplevel || mIsDestroyed ||
      !mShmImage)

Did you separate them onto different statements to test?

> It looks like the only place in nsWindow that we write to mIsFullyObscured
> is in OnVisibilityNotifyEvent, which I assume is happening on the main
> thread.  I guess it's possible that the offending write is in the
> constructor, but that seems unlikely.

I guess when the window is shown, uncovered, or unminimized, etc.

> 
> So the problem is that the compositor is calling back into nsWindow on the
> compositor thread, and we're touching things that are also accessed on the
> main thread...but without any locking.
> 
> Karl, does the above sound like a reasonable diagnosis of the problem?

Yes.  Nice analysis, thanks.

> What can we do to fix this?

I don't know what is done to ensure the safety of access to any of those
variables or mThebesSurface.

mIsFullyObscured and I guess !mHasMappedToplevel also is an optimization and
can be dropped.

mIsDestroyed is unnecessary because !mGdkWindow covers that.

mGdkWindow is may be modified on the main thread during destruction of the
window or its parents.  The GdkWindow object is a main-thread-only object, and
so can only be used off-main-thread if the main thread is blocked.

Putting mShmImage first would probably work around the problem because that
should not be used in normal configurations.

However it looks like Lee introduced this usage in
https://hg.mozilla.org/mozilla-central/rev/8538bc4d2cbd
so better check with him what this is for and how it should work.
Blocks: 1127752
Flags: needinfo?(karlt) → needinfo?(lsalzman)
Keywords: regression
(Reporter)

Comment 2

3 years ago
(In reply to Karl Tomlinson (ni?:karlt) from comment #1)
> (In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #0)
> > Fortunately, the line information is pretty decent, and the variable that's
> > being touched in mIsFullyObscured.
> 
> How do you know it is that variable?
> 
> I assume it is the line of code with all of these?
> 
> (!mGdkWindow || mIsFullyObscured || !mHasMappedToplevel || mIsDestroyed ||
>       !mShmImage)
> 
> Did you separate them onto different statements to test?

The (edited for clarity) entry for the stack trace says:

nsWindow::EndRemoteDrawingInRegion widget/gtk/nsWindow.cpp:6100:22

and the "22" at the end there refers to column 22, which is mIsFullyObscured.
(Assignee)

Updated

3 years ago
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Flags: needinfo?(lsalzman)
(Assignee)

Comment 3

3 years ago
Created attachment 8643902 [details] [diff] [review]
fix race condition in gtk window EndRemoteDrawingInRegion

This patch just gets rid of the accesses to the variables that might be messed with by other events and assumes that since we already started drawing that we should just finish up drawing.

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f1b87731239
Attachment #8643902 - Flags: review?(nical.bugzilla)
Comment on attachment 8643902 [details] [diff] [review]
fix race condition in gtk window EndRemoteDrawingInRegion

Review of attachment 8643902 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gtk/nsWindow.cpp
@@ +6207,5 @@
>  nsWindow::EndRemoteDrawingInRegion(DrawTarget* aDrawTarget, nsIntRegion& aInvalidRegion)
>  {
>  #ifdef MOZ_X11
>  #  ifdef MOZ_HAVE_SHMIMAGE
> +  if (!mGdkWindow || !mShmImage)

Please use this patch as an occasion to brace the if.
Attachment #8643902 - Flags: review?(nical.bugzilla) → review+
(Assignee)

Comment 5

3 years ago
Created attachment 8644386 [details] [diff] [review]
fix race condition in gtk window EndRemoteDrawingInRegion

Fixed bracketing of if.
Attachment #8643902 - Attachment is obsolete: true
Attachment #8644386 - Flags: review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/03c0f6885ea1
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
You need to log in before you can comment on or make changes to this bug.