Closed Bug 1156966 Opened 10 years ago Closed 10 years ago

TSan: data race dom/media/MediaTaskQueue.cpp:252 Runner::Run

Categories

(Core :: Audio/Video, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: froydnj, Assigned: mattwoodrow)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [tsan])

Attachments

(3 files)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). * Specific information about this bug The stacks here are a bit of a mess. It would be helpful if we knew *which* ::Run method was being called inside MediaTaskQueue::Runner::Run, but apparently that's not possible... I don't exactly know how there's a race between gfx code and media code, but several people in #content suggested that video bits can do interesting things with layers. Maybe somebody a little more knowledgeable about the media bits can make sense of what's going on. bholley, do you know who the right person to look at this is? * General information about TSan, data races, etc. Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong [2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(bobbyholley)
Flagging matt to take a quick look at the gfx stack and see if anything jumps out. Putting this in the media backlog to make sure it gets looked at in the coming weeks.
Blocks: MSRI
Flags: needinfo?(bobbyholley) → needinfo?(matt.woodrow)
Is there any way we can get better stacks? I can't see any obvious reason that Layers ipdl would be using the same memory as something in the MediaTaskQueue.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #2) > Is there any way we can get better stacks? What does "better" mean here? The stacks pretty much are what they are. I guess I could try running with lower levels of optimization to see if inlining was reduced or similar. > I can't see any obvious reason that Layers ipdl would be using the same > memory as something in the MediaTaskQueue. OK, thanks.
Yeah, less inlining would help I think. I can't see where MediaTaskQueue::Runner::Run() would call nsTArray::Length (mQueue->mTasks is std::queue), and my line numbers for the autogenerated PLayerTransactionChild don't line up.
This was with changeset 50b95032152c, if that helps any.
Disassembling and examining the code says that the first frame on the stack is really TaskGroupRunnable::Run: http://mxr.mozilla.org/mozilla-central/source/dom/media/TaskDispatcher.h#121 and we're examining mStateChangeTasks from PerThreadTaskGroup. But PerThreadTaskGroup, AFAICS, should be destroyed by the runnable, so I can't see how any other thread would get a chance to access it later...
OK, figured out today that I wasn't running with llvm-symbolizer, which does a much better job of parsing DWARF and showing frames for inlined functions than the default of addr2line. Matt, do these stacks make any more sense? I looked at the layers code, and I don't see exactly how it's touching things that the vsync code has. But the race says that we're modifying the empty array for nsTArray, if I understand correctly, and that seems bad in any event...
Flags: needinfo?(matt.woodrow)
This looks like a false-positive from nsTArray code. The assertion at [1] makes sure we don't write non-zero values into sEmtpyHdr, but we'll still happily write 0. I assume TSan isn't catching the fact that we're writing 0 on top of 0 and ignoring it. It wouldn't be hard to just skip the write if the destination is sEmptyHdr to make TSan happy. [1] http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTArray-inl.h#413
Flags: needinfo?(matt.woodrow)
Assignee: nobody → matt.woodrow
Attachment #8596950 - Flags: review?(nfroyd)
(In reply to Matt Woodrow (:mattwoodrow) from comment #8) I think the TSAN warning makes sense. When you write to a memory location, other threads could see intermediate state when memory barrier is not applied even if the write takes only one instruction.
Comment on attachment 8596950 [details] [diff] [review] Don't write to EmptyHdr Review of attachment 8596950 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for investigating and fixing this up.
Attachment #8596950 - Flags: review?(nfroyd) → review+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: