Closed Bug 1273854 Opened 4 years ago Closed 4 years ago

TextureChild's mutex makes TSan unhappy

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: nical, Assigned: nical)

Details

Attachments

(3 files)

There's a mutex in TextureChild that is held while TextureClient is locked. different textures can be locked in different orders and on the producer thread only, these locks can overlap. On any other thread there is no overlap so this locking pattern doesn't cause deadlocks, but TSan can't really guess that.

Since the vast majority of the tsan alerts seem to be happening when painting content layers, with which the access to textures is exclusive to the main thread, we can silence them by just not holding the texture's lock in these cases, specifically.

In the case of video frames and the few other content producers that are off the main thread, we do have to hold the lock, so it won't silence potential TSan alerts there. We'll see if the volume is problematic and if we need another workaround for that later.
Assignee: nobody → nical.bugzilla
This would silence most of the warnings. It relies on the fact that all non-imagebridge textures currenlty never leave the main thread so they don't need a mutex.
(In reply to Nicolas Silva [:nical] from comment #2)
> Created attachment 8755789 [details] [diff] [review]
> Only use the locks for ImageBridge textures

After some testing, this seems good to me.  I would like it to land if
possible, if only because the flood of warnings I get from TSan without
the patch applied make other race-removal work on Gecko harder to do.
Attachment #8755789 - Flags: feedback+
Attachment #8757336 - Flags: review?(jseward)
Attachment #8757336 - Flags: review?(bas)
(In reply to Julian Seward [:jseward] from comment #3)
> (In reply to Nicolas Silva [:nical] from comment #2)
> > Created attachment 8755789 [details] [diff] [review]
> > Only use the locks for ImageBridge textures
> 
> After some testing, this seems good to me.  I would like it to land if
> possible, if only because the flood of warnings I get from TSan without
> the patch applied make other race-removal work on Gecko harder to do.

Can we also somehow tell it that this type of pattern is expected for these particular mutices. It really isn't that odd of a thing to do.
Comment on attachment 8757336 [details] [diff] [review]
Document the dependent locking pattern.

Review of attachment 8757336 [details] [diff] [review]:
-----------------------------------------------------------------

Seems a little verbose to me. But I also don't mind.
Attachment #8757336 - Flags: review?(bas) → review+
Comment on attachment 8755789 [details] [diff] [review]
Only use the locks for ImageBridge textures

Review of attachment 8755789 [details] [diff] [review]:
-----------------------------------------------------------------

I don't like this, but while we look for a way to silence TSan about this I guess it's okay.
Attachment #8755789 - Flags: review+
Comment on attachment 8757336 [details] [diff] [review]
Document the dependent locking pattern.

Review of attachment 8757336 [details] [diff] [review]:
-----------------------------------------------------------------

+  // Lock tile B
+  // Lock tile A
+  // Lock tile D
+  // Render tiles A, B and C
+  // Unlock tile B
+  // Unlock tile A
+  // Unlock tile D

Do you really mean "Render tiles A, B and C" there?  Seems like a copy-n-paste
from the previous block.
Attachment #8757336 - Flags: review?(jseward) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #7)
> I don't like this, but while we look for a way to silence TSan about this I
> guess it's okay.

At least, if it starts racing in the future, due to some change, and I'm still
doing TSan runs of Gecko, we will notice.

----

If I understand correctly from Nicolas' comments, what we have is:

* a fixed set of tiles, eg, A B C D E F, each with its own lock

* various random threads that fill or empty a tile, eg
    A.lock();
    A.fill();
    A.unlock();

* a rendering thread that locks some subset of the tiles, renders into
  them, and unlocks them:
    B.lock(); A.lock(); D.lock();
    B.render(); A.render(); D.render();
    B.unlock(); A.unlock(); D.unlock();

The question that comes to mind is: why does the rendering thread need
to lock all of the tiles in the subset before dealing with any of them?
If instead it could do this

   B.lock(); B.render(); B.unlock();
   // the same for A
   // the same for D

then TSan would stop complaining.  And the number of lock/unlock
operations is the same, so no extra performance overhead.
(In reply to Julian Seward [:jseward] from comment #8)
> Do you really mean "Render tiles A, B and C" there?  Seems like a
> copy-n-paste
> from the previous block.

Ah, yes indeed, my bad.

> The question that comes to mind is: why does the rendering thread need
> to lock all of the tiles in the subset before dealing with any of them?
> If instead it could do this
> 
>    B.lock(); B.render(); B.unlock();
>    // the same for A
>    // the same for D
> 
> then TSan would stop complaining.  And the number of lock/unlock
> operations is the same, so no extra performance overhead.

The reason is that the memory management (locking and unlocking texture memory) and the drawing of the set of tiles are operated by very distinct abstraction layers. The drawing layer (Moz2D) has no notion of the memory being potentially mapped/unmapped as textures are locked/unlocked. The rendering itself is also not quite as simple as drawing tiles one after the other (that is a gross simplification on my end as the comment is already too verbose). We get better performance by going through each tile for each drawing command than going through each drawing command for each tile.

it actually looks more like:
tileA.fillRect(...)
tileB.fillRect(...)
tileC.fillRect(...)
tileA.strokePath(...)
tileB.strokePath(...)
tileC.strokePath(...)
etc.

Locking and unlocking also has some overhead so doing it for each drawing command and for each time would be too costly.
(In reply to Nicolas Silva [:nical] from comment #10)
> Locking and unlocking also has some overhead so doing it for each drawing
> command and for each time would be too costly.

s/time/tile
(In reply to Nicolas Silva [:nical] from comment #10)
> The reason is that the memory management [...]

Thanks for the further explanation.

One other idea I had is to take the locks in strictly increasing (or decreasing)
order of memory address.  Then there can be no lock acquisition cycles, so TSan
would stop complaining, and we would not lose any thread-safety.  Is this feasible?

I had a look at the first stack traces in comment #1 to see if I could figure out
where the tiles are iterated over, but I only I only see a loop at
mozilla::layers::ClientContainerLayer::RenderLayer() (ClientContainerLayer.h:65).
(In reply to Julian Seward [:jseward] from comment #12)
> (In reply to Nicolas Silva [:nical] from comment #10)
> > The reason is that the memory management [...]
> 
> Thanks for the further explanation.
> 
> One other idea I had is to take the locks in strictly increasing (or
> decreasing)
> order of memory address.  Then there can be no lock acquisition cycles, so
> TSan
> would stop complaining, and we would not lose any thread-safety.  Is this
> feasible?

It is certainly feasible. It's more a matter of how much complexity we are willing to add to our code in order for it to be understood by a given static analysis pass without adding exceptions.
Doing this for tiling would not be sufficient, we'd also need something like this in the various places where we copy a texture into another. It's not immediately clear to me that it is worth the added complexity, but we should try and decide. In principle I am not opposed to something like this if it doesn't make things even harder to understand.

> 
> I had a look at the first stack traces in comment #1 to see if I could
> figure out
> where the tiles are iterated over, but I only I only see a loop at
> mozilla::layers::ClientContainerLayer::RenderLayer()
> (ClientContainerLayer.h:65).

Among the places where iterate over the tiles to draw into them, there are most of the methods in gfx/2d/DrawTargetTiled.cpp.
There is also an experimental parallel tile painting backend where tiles must all be locked on the main thread before drawing jobs are scheduled in parallel, joined, and everything is unlocked again on the main thread (This backend hasn't landed yet).
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5da8ebecdfb7
Document the locking patterns in TextureChild that don't play well with tsan. r=bas, jseward
https://hg.mozilla.org/integration/mozilla-inbound/rev/7dacef36b2dc
Temporarily disable the TextureChild's mutex if not using ImageBridge. r=bas
https://hg.mozilla.org/mozilla-central/rev/5da8ebecdfb7
https://hg.mozilla.org/mozilla-central/rev/7dacef36b2dc
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.