Closed
Bug 1406135
Opened 7 years ago
Closed 7 years ago
Can't subscribe to large amount of webRTC streams (regression)
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla59
Tracking | Status | |
---|---|---|
thunderbird_esr52 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox56 | --- | wontfix |
firefox57 | --- | wontfix |
firefox58 | + | wontfix |
firefox59 | --- | fixed |
People
(Reporter: mun.dave, Assigned: padenot)
References
Details
(Keywords: regression)
Attachments
(2 files)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Steps to reproduce:
I tried to use the OpenTok JS SDK to receive 12 webRTC video+audio streams on Win 10 FF 56.
Actual results:
Typically, if I try to subscribe to 12 streams, usually ~8 render video onto the DOM, and the other do not.
I have attached a screenshot repro-ing on a computer with a wired connection that I do not expect to have a CPU or network bottleneck.
Here is a YT link (3rd party repro-ing on a different machine) recording the behavior discrepancy between Win FF and Win Chrome: https://youtu.be/-voCghEJfdQ?t=92
Expected results:
I would expect all 12 to receive video.
I know you would like me to reproduce with a SDK agnostic + minimal footprint, but I am having trouble finding another free service that allows more than 8 concurrent streams.
Comment 2•7 years ago
|
||
Here's a plain JS repro of what is probably the same issue:
https://output.jsbin.com/dezobe
Like the one with OpenTok, this only reproduces on Firefox Windows (tested Chrome & FF on macOS & Windows).
Comment 3•7 years ago
|
||
Improved version of jsbin :-)
https://output.jsbin.com/sizekiy
Comment 4•7 years ago
|
||
Ok final update for now:
https://output.jsbin.com/nubuhap
This adds the option to add a delay between each stream / peer connection. Based on our customer bug report I thought that FF might work if the streams were fed slowly but I still got the issue with a 3000ms delay between each one. It seems .play() never completes for the 6th stream. This is both FF nightly and stable, on Windows.
Updated•7 years ago
|
Component: Untriaged → WebRTC
Product: Firefox → Core
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment 8•7 years ago
|
||
(In reply to Michael Froman [:mjf] from comment #5)
> Nils, this looks like Bug 1402510?
Indeed this appears to be the same issue.
Interestingly the test case works fine for me with 56 and 58 on Mac.
But it reproduces immediately on Win 10 with Fx 56 and 58.
I'll try to investigate...
Flags: needinfo?(drno)
Updated•7 years ago
|
Assignee: nobody → drno
Status: UNCONFIRMED → NEW
Rank: 25
Ever confirmed: true
Priority: -- → P2
Comment 9•7 years ago
|
||
Doesn't repro on Linux with Fx 56 either.
Comment 10•7 years ago
|
||
So here is an interesting observation: when I take out the 'audio: true' from the gUM request in the test page and use mode 'parallel' it works for me on Windows 10.
For some unknown reason if I take out 'audio: true' the serial mode stops working.
Andrew can you please confirm that making the gUM request video if it works for you then as well?
Flags: needinfo?(voltrevo)
Comment 11•7 years ago
|
||
When looking at about:webrtc it's pretty clear that the connection gets established, but the sender does not actually start sending any RTP packets. Since there is only a single gUM request in the test code, I would guess this is some problem with the code replicating an existing track to other streams (MSG)?!
Paul: does the description of being able to request a maximum of 6 audio streams ring a bell? Any thoughts on this?
Flags: needinfo?(padenot)
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #11)
> When looking at about:webrtc it's pretty clear that the connection gets
> established, but the sender does not actually start sending any RTP packets.
> Since there is only a single gUM request in the test code, I would guess
> this is some problem with the code replicating an existing track to other
> streams (MSG)?!
>
> Paul: does the description of being able to request a maximum of 6 audio
> streams ring a bell? Any thoughts on this?
I would believe it can happen with some hardware/driver combination, but I don't have equipment to reproduce this. That said, if we fail opening a system-level audio stream, we should fall-back to a normal thread and re-try periodically to open a new audio stream, so we should be able to send packets (since we fake a clock and keep things rolling).
Flags: needinfo?(padenot)
Comment 13•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #10)
> So here is an interesting observation: when I take out the 'audio: true'
> from the gUM request in the test page and use mode 'parallel' it works for
> me on Windows 10.
>
> For some unknown reason if I take out 'audio: true' the serial mode stops
> working.
>
> Andrew can you please confirm that making the gUM request video if it works
> for you then as well?
Hmm for me it works without audio for both parallel and serial.
Flags: needinfo?(voltrevo)
Comment 14•7 years ago
|
||
@jib: what kind of logs do I need to gather to allow someone to analyze this?
Flags: needinfo?(jib)
Comment 15•7 years ago
|
||
Dunno, depends on whether we suspect the problem is network or local. If it's audio, maybe
MOZ_LOG=timestamp:1,MediaManager:5,MediaStreamGraph:4,TrackUnionStream:5,webrtc_trace:65535 ?
Paul, any others?
I'd try to reduce the JS test case to see if it repros without the peer connection piece.
Do we suspect a regression? If so, a regression range might help.
Flags: needinfo?(jib) → needinfo?(padenot)
Comment 16•7 years ago
|
||
I'm pretty sure it's not a network problem. ICE established the PeerConnection just fine. But we appear to never start sending any media. I'll try yo get the logs tomorrow.
Assignee | ||
Comment 17•7 years ago
|
||
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #15)
> MOZ_LOG=timestamp:1,MediaManager:5,MediaStreamGraph:4,TrackUnionStream:5,
> webrtc_trace:65535 ?
>
> Paul, any others?
MOZ_LOG=timestamp:1,MediaManager:5,MediaStreamGraph:4,TrackUnionStream:5,webrtc_trace:65535,cubeb:5
Flags: needinfo?(padenot)
Comment 18•7 years ago
|
||
While testing with this jsbin (50 streams in parallel) I got a shutdownhang in latest Nightly, [1] (actually I got two, but they seem to be dupes, see [2]).
In the jsbin I consistently see video frames for 26 streams, while the connection is established for all streams (so 50).
Interestingly there are 26 threads in the crash report matching "webrtc::AudioDeviceLinuxPulse::RecThreadProcess". The weird part is that full duplex is on so audio capture should be handled by cubeb.
Paul, do you know more here on what is expected?
[1] https://crash-stats.mozilla.com/report/index/e57d6d23-8b59-4e6d-9c6d-6eebd0171024#allthreads
[2] https://crash-stats.mozilla.com/report/index/e728a85a-c57a-4fb7-8117-e15970171024#allthreads
Flags: needinfo?(padenot)
Assignee | ||
Comment 19•7 years ago
|
||
This is either 1394163 or 1397793, depending on how you look at it (the problem or the solution). Since WebRTC.org 57 landed, we always initialize and create threads and devices using webrtc.org, but we never use them.
This is already covered by my patches that refactor all this. I also have a plan to only land this on its own, but I've been told it's now unnecessary to land it early, because we thought this was only useful to unblock sandboxing, and they can wait for 59.
In light of the above, maybe we can revisit this plan.
Flags: needinfo?(padenot)
Assignee | ||
Comment 20•7 years ago
|
||
A bit of a longer explanation. Since the last webrtc uplift, we initialize "platform objects for audio" [0] each time we initialize VoEBaseImpl [1].
We initialize VoEBaseImpl in the following situations:
- Anytime we EnumerateAudioDevice [2]
- Anytime we Initialize an AudioDevice [3]
- Anytime we create an AudioConduit: [4]
Sometimes we clean those up, sometimes we don't, so sometimes threads and other resources are sticking around, sometimes not. The patch set in bug 1397793 fixes all of this.
[0]: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/audio_device/audio_device_impl.cc#189
[1]: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/voice_engine/voe_base_impl.cc#197
[2]: http://searchfox.org/mozilla-central/source/dom/media/webrtc/MediaEngineWebRTC.cpp#314
[3]: http://searchfox.org/mozilla-central/source/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#787
[4]: http://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/media-conduit/AudioConduit.cpp#320
Comment 21•7 years ago
|
||
Thanks Paul.
I just confirmed that your latest WIP set of patches in bug 1397793, [1] worked for me with the same steps as in comment 18.
This was an unoptimized local build even, so extra good.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=7fafc2ddbf2d7cf3a03c730df04f190a02d1923a
Depends on: 1397793
Comment 22•7 years ago
|
||
Let's start review on bug 1397793 and see how far we get and how comfortable we are with landing in 58.
It's not like this is a common use-case, but spreading patches out a bit might be good, and unblocking sandboxing earlier would be bonus.
Comment 23•7 years ago
|
||
[Tracking Requested - why for this release]: Based on comment 20 this regressed in bug 1341285.
Blocks: 1341285
status-firefox56:
--- → wontfix
status-firefox57:
--- → wontfix
status-firefox58:
--- → fix-optional
status-firefox-esr52:
--- → unaffected
tracking-firefox58:
--- → ?
Keywords: regression
Summary: Can't subscribe to large amount of webRTC streams → Can't subscribe to large amount of webRTC streams (regression)
Comment 24•7 years ago
|
||
We usually don't track bugs that are fix optional. If we want to consider tracking this, we should change 58 status to affected.
Comment 25•7 years ago
|
||
Hold on guys the try build from comment #22 from this URL https://treeherder.mozilla.org/#/jobs?repo=try&revision=7fafc2ddbf2d7cf3a03c730df04f190a02d1923a does not solve the problem, instead it crashes Firefox w/o a crash report.
Comment 26•7 years ago
|
||
drno pointed out this basically blocks anyone who goes into a bigger mesh call.
Comment 27•7 years ago
|
||
To clarify my comment #25: on my Win 10 laptop when I try https://output.jsbin.com/nubuhap RTP never gets send for the 6 or 7th PeerConnection.
But when I tried Andreas try build (see above) instead of not sending any media Firefox crashes. And it crashes so hard that it doesn't even record any crash in about:crashes.
I ran out of time to gather logs or run it with a debugger.
Assignee | ||
Comment 28•7 years ago
|
||
Nils, I mentioned it on irc but not on the bug, those patches are WIP, and only do something meaningful on Linux for now.
Comment 29•7 years ago
|
||
So this is the log from stock Nightly (so no patches applied) with the log parameters from comment #17.
Updated•7 years ago
|
Assignee: drno → nobody
Comment 31•7 years ago
|
||
Andreas, Paul: could either of you take a look at the log I uploaded in comment #29 to asses what is going on there and if bug 1397793 is really going to fix that (note: Andreas try build with the patches from that bug crashed on my Win10 laptop so hard it wouldn't even submit a crash report).
Flags: needinfo?(padenot)
Flags: needinfo?(apehrson)
Comment 32•7 years ago
|
||
I see 12 "webrtc::AudioDeviceModuleImpl::CreatePlatformSpecificObjects" where each will create threads and set up audio apis and more.
I don't know in more detail what goes wrong after that, but I confirmed that on linux Paul's patches help. To figure it out completely I would start to look at where in the chain between MediaStreamGraph (since AIUI all peer connections use the same video track) and the sending side of an RTCPeerConnection frames get blocked.
If all connections are using the same track I'd expect MediaPipeline and VideoConduit to work, so that suggests something happens under the hood in webrtc.org. We know that audio triggers it, but is it a general issue of creating too many threads (the platform specifics mentioned above) or something more explicit in the webrtc stack? It could even be the former and Paul's patches just pushing the boundary a bit.
Flags: needinfo?(apehrson)
Comment 33•7 years ago
|
||
What I find interesting in the log is after the 12th "webrtc::AudioDeviceModuleImpl::AudioDeviceModuleImpl". Could 12 instances be 6 sender PCs and 6 receiver PCs, where the ones created after this didn't work?
> 2017-10-25 22:38:09.296000 UTC - [Main Thread]: D/webrtc_trace (audio_device_wave_win.cc:234): failed to start the timer event
That's http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/media/webrtc/trunk/webrtc/modules/audio_device/win/audio_device_wave_win.cc#234
(But for some reason it's forwarded to our log as debug level instead of error -- had it been error it would have been easier to find)
> 2017-10-25 22:38:10.297000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:534): Audio device initialization failed.
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:1441): webrtc::AudioDeviceModuleImpl::StopPlayout
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:1483): webrtc::AudioDeviceModuleImpl::StopRecording
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:1509): webrtc::AudioDeviceModuleImpl::RegisterEventObserver
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:1522): webrtc::AudioDeviceModuleImpl::RegisterAudioCallback
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_buffer.cc:88): webrtc::AudioDeviceBuffer::RegisterAudioCallback
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:547): webrtc::AudioDeviceModuleImpl::Terminate
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_impl.cc:408): webrtc::AudioDeviceModuleImpl::~AudioDeviceModuleImpl
> 2017-10-25 22:38:10.298000 UTC - [Main Thread]: D/webrtc_trace (audio_device_buffer.cc:82): AudioDeviceBuffer::~dtor
This init failure seems like a 1 second timeout after the previous timer error. Logged at http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/media/webrtc/trunk/webrtc/modules/audio_device/audio_device_impl.cc#534
Comment 34•7 years ago
|
||
The timer start failure is from [1]:
> timerID_ = timeSetEvent(time, 0, (LPTIMECALLBACK)HANDLE(event_), 0,
> TIME_PERIODIC | TIME_CALLBACK_EVENT_PULSE);
> ...
> return timerID_ != NULL;
So it's an error from timeSetEvent, [2]:
> Return value
> Returns an identifier for the timer event if successful or an error otherwise. This function returns NULL if it fails and the timer event was not created. (This identifier is also passed to the callback function.)
But no further info on why it would fail. At least this suggests that getting rid of all these extra audio stacks would help, assuming that this timer failure was the reason the frames stop showing up.
[1] http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/media/webrtc/trunk/webrtc/system_wrappers/source/event_timer_win.cc#66
[2] https://msdn.microsoft.com/en-us/library/dd757634(v=vs.85).aspx
Assignee | ||
Comment 35•7 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #31)
> Andreas, Paul: could either of you take a look at the log I uploaded in
> comment #29 to asses what is going on there and if bug 1397793 is really
> going to fix that (note: Andreas try build with the patches from that bug
> crashed on my Win10 laptop so hard it wouldn't even submit a crash report).
Looking at your log (tell me if I got any of the following wrong):
- This is Windows
- The input device has a rate of 16kHz, the output device 48kHz
- The period is 10ms (one callback each 10ms, providing 10ms of microphone audio, and requesting 10ms of output audio).
- We have a single cubeb stream (duplex) as expected, it's doing fine (we see the callback messages at a correct rate, around one each 10ms). It's doing fine because it's been created first (we create an MSG, it creates and output audio stream, then switches to a duplex stream. We see perfectly normal callbacks throughout, with sensible driver switches. The only error here is that we can't register the endpoint notification client, which is caused by the sandbox on windows, and there is a workaround in place and shipped to release to workaround this.
- Around line 1075, we start seeing some webrtc things. Immediately, we see activity in `audio_device_impl.cc`. This is what I'm explaining in comment 20. Look at the amount of `AudioDeviceModuleImpl::Init` that are happening. Those are very expensive objects.
- Since the computer resources are not infinite, we see things starting to fail (line 6798, benign failure, but it shows the beginning of the resource exhaustion)
- Line 10074 (note that a few seconds have passed since the beginning of the call, around 9 according to the log), we see the culprit: "Audio device initialization failed." from webrtc.org. This is [0], called by [1], called by [2] (it's in the conduit, because we've already initialized successfully at all the other call sites, that happen sequentially before). So the AudioConduit creation failed. This returns error after error, basically unwinding everything until we reach the DOM bindings. This is what you're looking for.
- Around line 26431, we see more logs from `audio_device_impl.cc`, because we're probably shutting things down or the GC kicks in or the like. Remember we already had a bunch of those running, so we need to shut them down.
- Right after this, we don't have an input stream anymore, cubeb switches to an output only stream, and continues running the graph, until the end.
So, long story short, this will be fixed by this patch set, since one of the things that it does is stopping this `audio_device_impl.cc` madness. Again, this is a work in progress, and it's not ready.
[0]: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/audio_device/audio_device_impl.cc#534
[1]: http://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/voice_engine/voe_base_impl.cc#239
[2]: http://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/media-conduit/AudioConduit.cpp#320
Flags: needinfo?(padenot) → needinfo?(drno)
Comment 36•7 years ago
|
||
Marking wontfix after discussion with QA.
Comment 37•7 years ago
|
||
Meaning we'll continue reviews and get this ready to land as soon as 59 opens.
Assignee | ||
Comment 39•7 years ago
|
||
This is now fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•