Closed Bug 821292 Opened 7 years ago Closed 7 years ago

Deadlock and crash on shutdown for basic video peer connection test [@ mozilla::MediaPipeline::SendPacket(mozilla::TransportFlow*, void const*, int)]

Categories

(Core :: WebRTC: Signaling, defect, P1, critical)

x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: whimboo, Assigned: ekr)

References

Details

(Keywords: crash, hang, regression, Whiteboard: [WebRTC][blocking-webrtc+][qa-])

Crash Data

Attachments

(1 file)

Firefox is showing a deadlock on shutdown running my pc video test on bug 796888. I haven't see this before the patch on bug 792175 landed.

Steps:
1. Apply my patch from bug 796888
2. Run make to update /dom/media code and tests
3. Run ./mach mochitest-plain dom/media/tests/mochitest

On shutdown you will notice a hang for more than a minute followed by a crash.

Apple report for a debug Nightly build:

Thread 19 Crashed:
0   XUL                           	0x00000001028a8497 mozilla::DeadlockDetector<mozilla::BlockingResourceBase::DeadlockDetectorEntry>::CheckAcquisition(mozilla::BlockingResourceBase::DeadlockDetectorEntry const*, mozilla::BlockingResourceBase::DeadlockDetectorEntry const*, mozilla::CallStack const&) + 71 (DeadlockDetector.h:402)
1   XUL                           	0x00000001028a7a12 mozilla::BlockingResourceBase::CheckAcquire(mozilla::CallStack const&) + 114 (BlockingResourceBase.cpp:105)
2   XUL                           	0x00000001028a7f7e mozilla::Mutex::Lock() + 30 (BlockingResourceBase.cpp:228)
3   XUL                           	0x000000010107d01b nsSocketTransportService::Dispatch(nsIRunnable*, unsigned int) + 107 (nsCOMPtr.h:764)
4   XUL                           	0x0000000102e88e24 mozilla::MediaPipeline::SendPacket(mozilla::TransportFlow*, void const*, int) + 372 (MediaPipeline.cpp:263)
5   XUL                           	0x0000000102e8afb3 mozilla::MediaPipeline::PipelineTransport::SendRtcpPacket(void const*, int) + 915 (MediaPipeline.cpp:518)
6   XUL                           	0x0000000102e4b86d non-virtual thunk to mozilla::WebrtcVideoConduit::SendRTCPPacket(int, void const*, int) + 93 (VideoConduit.cpp:597)
7   XUL                           	0x0000000102d840fb webrtc::ViESender::SendRTCPPacket(int, void const*, int) + 155 (vie_sender.cc:187)
8   XUL                           	0x0000000102d95323 webrtc::RTCPSender::SendRTCP(unsigned int, int, unsigned short const*, bool, unsigned long long) + 1251 (rtcp_sender.cc:1947)
9   XUL                           	0x0000000102d8d81d webrtc::ModuleRtpRtcpImpl::Process() + 605 (rtp_rtcp_impl.cc:1971)
10  XUL                           	0x0000000102d1c026 webrtc::ProcessThreadImpl::Process() + 294 (process_thread_impl.cc:183)
11  XUL                           	0x0000000102d1bcb9 webrtc::ProcessThreadImpl::Run(void*) + 9 (process_thread_impl.cc:138)
12  XUL                           	0x0000000102d465f1 webrtc::ThreadPosix::Run() + 161 (thread_posix.cc:361)
13  XUL                           	0x0000000102d46409 StartThread + 9 (thread_posix.cc:72)
14  libsystem_c.dylib             	0x00007fff894a38bf _pthread_start + 335
15  libsystem_c.dylib             	0x00007fff894a6b75 thread_start + 13
I have only tested on OS X given that I do not have a dev environment for other platforms.

This blocks us from landing our first peer connection test. Can we get this bug prioritized appropriately? Thanks.
OS: All → Mac OS X
Hardware: All → x86_64
Whiteboard: [WebRTC][automation-blocked]
This is a known issue.

The best thing to do here is to land the peerconnection tests anyway. Yes, they will crash, but they're still useful for command line testing, and since they are not landing on M-C, that doesn't really matter.
The following patch when combined with the current patch for bug 8201012, runs this test cleanly on my machine.
Comment on attachment 693064 [details] [diff] [review]
Diagnose shutdown crash

Review of attachment 693064 [details] [diff] [review]:
-----------------------------------------------------------------

Can you see if this makes the audio stream receiving side shut down for you?
Attachment #693064 - Flags: feedback?(rjesup)
Attachment #693064 - Flags: feedback?(hskupin)
Comment on attachment 693064 [details] [diff] [review]
Diagnose shutdown crash

This patch seems to depend on other changes which are not in mozilla-central yet. So I can't test it.

patching file media/mtransport/runnable_utils.h
Hunk #2 FAILED at 33
1 out of 2 hunks FAILED -- saving rejects to file media/mtransport/runnable_utils.h.rej
patching file media/webrtc/signaling/src/mediapipeline/MediaPipeline.h
Hunk #2 FAILED at 347
1 out of 2 hunks FAILED -- saving rejects to file media/webrtc/signaling/src/mediapipeline/MediaPipeline.h.rej
Attachment #693064 - Flags: feedback?(hskupin) → feedback-
Per comment 4, it applies on top of the patch for bug 820102 (though ekr typoed the number).
Assignee: nobody → ekr
Priority: -- → P1
Whiteboard: [WebRTC][automation-blocked] → [WebRTC][automation-blocked][blocking-webrtc+]
Sorry that I missed that. I'm building now and will test in a bit.
This is strange. So I have applied both patches now and rebuilt the application. When I run the mochitests without the debugger Firefox always hangs on shutdown and crashes. But whenever I use gdb I don't see this hang/crash and the application exit immediately.
So this actually happens when you run at least two of the peer connection tests. Each one runs fine independently but causes this deadlock when run in sequence.
Comment on attachment 693064 [details] [diff] [review]
Diagnose shutdown crash

Review of attachment 693064 [details] [diff] [review]:
-----------------------------------------------------------------

Seems good, but note the test results from whimboo

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.h
@@ +353,5 @@
>                                            TrackRate rate,
>                                            TrackTicks offset,
>                                            uint32_t events,
> +                                          const MediaSegment& queued_media) {
> +    }

all the changes to this file seem moot....
Attachment #693064 - Flags: feedback?(rjesup) → feedback+
Can you please supply a new crash report? The one above does not reflect 820102.
whimboo: I'm not following your description of how to reproduce this in c10. Can you please provide a recipe (preferably a command line) and a crash report?
Flags: needinfo?(hskupin)
Applying both this patch and the patch in 822158 appears to remove the crash.

whimboo, can you confirm?
Flags: needinfo?(hskupin)
Patch landed at:

https://hg.mozilla.org/integration/mozilla-inbound/rev/21409a401d75

This should fix the deadlock, but you may still see crashes until bug 822158 is fixed.
https://hg.mozilla.org/mozilla-central/rev/3bb824035131
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Okay I'm confused. Was this backed out or did this land?
Keywords: verifyme
Backed out and then re-landed.
(In reply to Eric Rescorla (:ekr) from comment #13)
> whimboo: I'm not following your description of how to reproduce this in c10.
> Can you please provide a recipe (preferably a command line) and a crash
> report?

Run the in-tree mochitests via './mach mochitest-plain dom/media/tests/mochitest/' and you will notice.

:abr was also able to reproduce via bug 818714. So he might also be able to verify this is fixed while I'm away.
I don't see this on my machine and the above changes should fix it. Feel free to reopen if it appears again.
in-testsuite+ since this already covered by an existing mochitest
Flags: in-testsuite+
Keywords: verifyme
Whiteboard: [WebRTC][automation-blocked][blocking-webrtc+] → [WebRTC][automation-blocked][blocking-webrtc+][qa-]
Landed a trivial followup to remove semicolon after "NS_INLINE_DECL_THREADSAFE_REFCOUNTING(Destructor);"
 https://hg.mozilla.org/integration/mozilla-inbound/rev/836eb526e5fe

(As shown by this MXR search, that macro doesn't take a semicolon at the end:
  https://mxr.mozilla.org/mozilla-central/search?string=NS_INLINE_DECL_THREADSAFE_REFCOUNTING
)
Whiteboard: [WebRTC][automation-blocked][blocking-webrtc+][qa-] → [WebRTC][blocking-webrtc+][qa-]
You need to log in before you can comment on or make changes to this bug.