Closed
Bug 1156966
Opened 10 years ago
Closed 10 years ago
TSan: data race dom/media/MediaTaskQueue.cpp:252 Runner::Run
Categories
(Core :: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
Tracking | Status | |
---|---|---|
firefox40 | --- | fixed |
People
(Reporter: froydnj, Assigned: mattwoodrow)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [tsan])
Attachments
(3 files)
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).
* Specific information about this bug
The stacks here are a bit of a mess. It would be helpful if we knew *which* ::Run method was being called inside MediaTaskQueue::Runner::Run, but apparently that's not possible...
I don't exactly know how there's a race between gfx code and media code, but several people in #content suggested that video bits can do interesting things with layers. Maybe somebody a little more knowledgeable about the media bits can make sense of what's going on.
bholley, do you know who the right person to look at this is?
* General information about TSan, data races, etc.
Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].
If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.
[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(bobbyholley)
Comment 1•10 years ago
|
||
Flagging matt to take a quick look at the gfx stack and see if anything jumps out. Putting this in the media backlog to make sure it gets looked at in the coming weeks.
Blocks: MSRI
Flags: needinfo?(bobbyholley) → needinfo?(matt.woodrow)
Assignee | ||
Comment 2•10 years ago
|
||
Is there any way we can get better stacks?
I can't see any obvious reason that Layers ipdl would be using the same memory as something in the MediaTaskQueue.
Flags: needinfo?(matt.woodrow)
Reporter | ||
Comment 3•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> Is there any way we can get better stacks?
What does "better" mean here?
The stacks pretty much are what they are. I guess I could try running with lower levels of optimization to see if inlining was reduced or similar.
> I can't see any obvious reason that Layers ipdl would be using the same
> memory as something in the MediaTaskQueue.
OK, thanks.
Assignee | ||
Comment 4•10 years ago
|
||
Yeah, less inlining would help I think.
I can't see where MediaTaskQueue::Runner::Run() would call nsTArray::Length (mQueue->mTasks is std::queue), and my line numbers for the autogenerated PLayerTransactionChild don't line up.
Reporter | ||
Comment 5•10 years ago
|
||
This was with changeset 50b95032152c, if that helps any.
Reporter | ||
Comment 6•10 years ago
|
||
Disassembling and examining the code says that the first frame on the stack is really TaskGroupRunnable::Run:
http://mxr.mozilla.org/mozilla-central/source/dom/media/TaskDispatcher.h#121
and we're examining mStateChangeTasks from PerThreadTaskGroup.
But PerThreadTaskGroup, AFAICS, should be destroyed by the runnable, so I can't see how any other thread would get a chance to access it later...
Reporter | ||
Comment 7•10 years ago
|
||
OK, figured out today that I wasn't running with llvm-symbolizer, which does a much better job of parsing DWARF and showing frames for inlined functions than the default of addr2line.
Matt, do these stacks make any more sense? I looked at the layers code, and I don't see exactly how it's touching things that the vsync code has. But the race says that we're modifying the empty array for nsTArray, if I understand correctly, and that seems bad in any event...
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 8•10 years ago
|
||
This looks like a false-positive from nsTArray code.
The assertion at [1] makes sure we don't write non-zero values into sEmtpyHdr, but we'll still happily write 0.
I assume TSan isn't catching the fact that we're writing 0 on top of 0 and ignoring it.
It wouldn't be hard to just skip the write if the destination is sEmptyHdr to make TSan happy.
[1] http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#413
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 9•10 years ago
|
||
Assignee: nobody → matt.woodrow
Attachment #8596950 -
Flags: review?(nfroyd)
Comment 10•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #8)
I think the TSAN warning makes sense.
When you write to a memory location, other threads could see intermediate state when memory barrier is not applied even if the write takes only one instruction.
Reporter | ||
Comment 11•10 years ago
|
||
Comment on attachment 8596950 [details] [diff] [review]
Don't write to EmptyHdr
Review of attachment 8596950 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for investigating and fixing this up.
Attachment #8596950 -
Flags: review?(nfroyd) → review+
Comment 12•10 years ago
|
||
Comment 13•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in
before you can comment on or make changes to this bug.
Description
•