Closed Bug 845523 Opened 8 years ago Closed 8 years ago

TSan: Thread data race in sipcc::PeerConnectionCtx::onDeviceEvent() vs. sipcc::PeerConnectionCtx::ChangeSipccState()

Categories

(Core :: WebRTC, defect, P5)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: posidron, Assigned: abr)

References

()

Details

(Keywords: sec-want, Whiteboard: [tsan][tsan-test-blocker][webrtc][blocking-webrtc-][qa-])

Attachments

(2 files)

Attached file trace
During initial tests with ThreadSanitizer (LLVM version), we get a data race reported as described in the attached log. Trace was created on mozilla-central with changeset 122820:c233837cce08.

According to the TSan devs, most of the reported traces should be real data races, even though they can be "benign". We need to determine if the race can/should be fixed, or put on the ignore list. Even for benign races, TSan devs suggest to fix them (second priority), as they can also cause problems [1].

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
See Also: → 806825
Perhaps another way the sipcc startup code can go wrong, unless this is a "benign" race.

-> adam who's looked at this a lot
Assignee: nobody → adam
Whiteboard: [tsan][tsan-test-blocker] → [tsan][tsan-test-blocker][webrtc][blocking-webrtc+]
Priority: -- → P2
This is another race related to Bug 839647. It looks like we need to move the signaling of the ccAppReadyToStart condition up a level, from CallControlManagerImpl::startSDPMode() to PeerConnectionCtx::Initialize() (immediately before it returns).

This should be fixed, but I would consider it ultralow priority at the moment. The CCApp thread will be doing a lot of stuff before it gets around to reading this variable, giving the main thread a very wide berth for its unsynchronized write (which happens immediately after the CCApp thread is given permission to start processing messages). It's a race that can't be lost.

This should be coupled with a removal of the non-atomic hack to PeerConnectionCtx::onDeviceEvent that moves state into a stack variable "to avoid a data race": once the condition signaling is moved, the race is completely eliminated.
Priority: P2 → P5
comment 2 sends the impression that this is blocking- now
Whiteboard: [tsan][tsan-test-blocker][webrtc][blocking-webrtc+] → [tsan][tsan-test-blocker][webrtc][blocking-webrtc-]
I'm going to go ahead and fix this now, since it'll clear up some uncertainty I have regarding the behavior I see in the logs for Bug 839677.
Comment on attachment 720153 [details] [diff] [review]
Move condition signaling into PeerConnectionCtx init

Ethan: This is pretty much the same fix you reviewed for Bug 839647, but with the signaling of the condition moved up one frame.
Attachment #720153 - Flags: review?(ethanhugg)
Here's a try (along with a patch for bug 846942):

https://tbpl.mozilla.org/?tree=Try&rev=a44a035255ff
Re-try after compile fixes for Linux & Windows:
https://tbpl.mozilla.org/?tree=Try&rev=5157500f9c63
Comment on attachment 720153 [details] [diff] [review]
Move condition signaling into PeerConnectionCtx init

Review of attachment 720153 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #720153 - Flags: review?(ethanhugg) → review+
https://hg.mozilla.org/mozilla-central/rev/6e4fd88df150
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Whiteboard: [tsan][tsan-test-blocker][webrtc][blocking-webrtc-] → [tsan][tsan-test-blocker][webrtc][blocking-webrtc-][qa-]
You need to log in before you can comment on or make changes to this bug.