Closed Bug 1108248 Opened 5 years ago Closed 5 years ago

Firefox on windows can't connect to more than 8 to 10 incoming WebRTC streams.

Categories

(Core :: WebRTC: Networking, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: bhouser.knuddels, Assigned: jesup)

References

Details

Attachments

(4 files, 2 obsolete files)

My company is building a video conferencing app, using Licode (http://lynckia.com/licode/) as a Multipoint Control Unit.  So there are no P2P connections, and no TURN server.  Video/Audio are encoded as VP8/Opus, respectively.

Firefox (33/34/35[developer edition]/latest-nightly-debug) appears to have an issue connecting to more than (approximately) 8 incoming streams.  I’ve successfully connected up to 10 streams, but not more.  It appears to be irrelevant whether or not the incoming streams are displayed in one browser tab, or dispersed though multiple tabs.

Basically, when setting up the connection for the streams that fail, the offer from the remote end is received, I call peerConnection.setRemoteDesription.  The “Success” callback is executed.  But peerConnection.onaddstream is never called.

The quantity of streams which successfully connect appears to be unaffected by how fast or slow the host machine is.  Slow clients are able to connect about 8 streams which are then played back in stop-motion.

The problem occurs on Windows machines (tested on windows 7 and window 8.1).  A Mac seems to happily connect up to 50 streams.  Did not test Linux.
Sorry for the delay, we had to fix a different bug in our code (invalid sdp response) before we could produce the results.

We ran the test again with this recommended binary:
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/bcampen@mozilla.com-16007b74d0f2/try-win32-debug/

Unfortunately the behaviour is the same, no improvement.

The logs are for a test where we attempted to subscribe to 12 streams (3 browser tabs, with 4 streams each).  The first 8 streams connected successfully, the last 4 fail.  The failure scenario is as described in the original bug description.
I see the following in the logfile 4 times:

2014-12-16 13:41:51.343000 UTC - 0[3111140]: [main|WebrtcAudioSessionConduit] AudioConduit.cpp:288: mozilla::WebrtcAudioConduit::Init VoiceEngine Base Not Initialized
2014-12-16 13:41:51.343000 UTC - 0[3111140]: [main|WebrtcAudioSessionConduit] AudioConduit.cpp:52: mozilla::AudioSessionConduit::Create AudioConduit Init Failed 
2014-12-16 13:41:51.344000 UTC - 0[3111140]: [main|PeerConnectionMedia] PeerConnectionMedia.cpp:334: Failed to create receiving pipeline, rv=2147500037 

Maybe we're hitting some limit in the webrtc.org code?
^
Flags: needinfo?(rjesup)
Could you re-run with webrtc.org trace logging (as described here: https://wiki.mozilla.org/Media/WebRTC/Logging )?
Flags: needinfo?(bhouser.knuddels)
If this is caused by some limitation on the number of webrtc.org audio/video engines, we probably want to fix this before (or ASAP after) multistream lands. Also, since the renegotiation work is going to force us to stop linking tx/rx conduits (because either side needs to be able to go away while the other continues to run), that's going to double the number of engines.
Here are the logs with webrtc.org trace logging as requested.

Additionally, we've created a test page which should allow you to reproduce the bug.  I'll send it to you (Byron) by email because I don't want to post it publicly on Bugzilla.  We have a large chat community that shouldn't find the page.
Flags: needinfo?(bhouser.knuddels)
Hmm, it seems that the webrtc logfile must have rolled over? The logfile starts at 10:54:52:532, but the signaling log shows the failures I'm interested in at 10:51:04.796.
Flags: needinfo?(bhouser.knuddels)
You're right about the rollover, the log appears to contain no more than about 5 seconds of useful information.  Sorry about that.

Unfortunately I can't find an option to disable rollover.  The 32 bit build that you linked also doesn't want to start from the command line, so I can't just pipe the output.  Do you have a 64 bit build I can log?
webrtc internal trace logs roll over at ~10MB.  You *can* set WEBRTC_TRACE_FILE=nspr and they get merged with regular logs, but expect performance problems if you do so.
Flags: needinfo?(rjesup)
(In reply to Brian Houser from comment #9)
> You're right about the rollover, the log appears to contain no more than
> about 5 seconds of useful information.  Sorry about that.
> 
> Unfortunately I can't find an option to disable rollover.  The 32 bit build
> that you linked also doesn't want to start from the command line, so I can't
> just pipe the output.  Do you have a 64 bit build I can log?

   If the binary I linked you to did not work any better, there's no reason to test with it further. I'm trying to reproduce now.

(In reply to Randell Jesup [:jesup] from comment #10)
> webrtc internal trace logs roll over at ~10MB.  You *can* set
> WEBRTC_TRACE_FILE=nspr and they get merged with regular logs, but expect
> performance problems if you do so.

   I tried this, and the webrtc.org logs completely stomp all of the nspr logging; only the webrtc.org stuff comes through.
We are failing here:

http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/voice_engine/voe_base_impl.cc#384

    // ADM initialization
    if (_shared->audio_device()->Init() != 0)
    {
        _shared->SetLastError(VE_AUDIO_DEVICE_MODULE_ERROR, kTraceError,
            "Init() failed to initialize the ADM");
        return -1;
    }

The function is pretty straightforward, but the stuff is calls out to is platform-dependent, and dxr doesn't index anything but linux:

http://dxr.mozilla.org/mozilla-central/search?q=%2Bderived%3Awebrtc%3A%3AAudioDeviceUtility
http://dxr.mozilla.org/mozilla-central/search?q=%2Bderived%3Awebrtc%3A%3AAudioDeviceGeneric
This points pretty clearly to this being a webrtc.org limitation on windows. One option of fixing this is to shift over to using a single shared engine, and make use of channels. However, this will not be trivial, since we need some new API from webrtc.org. The second option is to try to fix the limitation on windows. I don't know enough about the webrtc.org code to know which will be easier, but the shared engine approach seems to be the best path going forward, since that seems to be how the webrtc.org code is intended to be used.
See Also: → 784517
Blocks: 1017888
See Also: → 1099318
Did some more testing, and the logging indicates that we're failing to start a timer here:

https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/audio_device/win/audio_device_wave_win.cc#259

    if (!_timeEvent.StartTimer(periodic, TIMER_PERIOD_MS))
    {
        WEBRTC_TRACE(kTraceCritical, kTraceAudioDevice, _id,
                     "failed to start the timer event");
        if (_ptrThread->Stop())
        {
            delete _ptrThread;
            _ptrThread = NULL;
        }
        else
        {
            WEBRTC_TRACE(kTraceWarning, kTraceAudioDevice, _id,
                         "unable to stop the activated thread");
        }
        return -1;
    }

This implies that we're failing here:

https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/system_wrappers/source/event_win.cc#58

bool EventWindows::StartTimer(bool periodic, unsigned long time) {
  if (timerID_ != NULL) {
    timeKillEvent(timerID_);
    timerID_ = NULL;
  }

  if (periodic) {
>>> timerID_ = timeSetEvent(time, 0, (LPTIMECALLBACK)HANDLE(event_), 0, <<<<<<<<<<
                            TIME_PERIODIC | TIME_CALLBACK_EVENT_PULSE);
  } else {
    timerID_ = timeSetEvent(time, 0, (LPTIMECALLBACK)HANDLE(event_), 0,
                            TIME_ONESHOT | TIME_CALLBACK_EVENT_SET);
  }

  return timerID_ != NULL;
}

I'm not sure why this windows API would fail. It is possible that the c'tor for EventWindows failed to init |event_| (by calling CreateEvent), or there may be some other error. In either case, we seem to be running into some limit in the Windows API, and it seems unlikely that we'll find an easy workaround. I'll add some logging and push to try, and see if I can figure out the root cause, but we probably need to start work on moving over to using channels.
Flags: needinfo?(rjesup)
Assignee: nobody → rjesup
There's a limit of 16 timers for timerSetEvent() (per process apparently; not well documented).  On top of that, the call is officially deprecated for the last decade or so - but it turns out works better than any of the newer/"better" alternatives.

Since we don't use the 2ms timers in the wave code, and instead only appear to use it for video-frame-time-ish periods, the higher variation and poor short-duration precision may not matter.  There may be an issue with longer-term drift (at least on XP); we'll see.  I'll upload a patch that may help (still being debugged, but at least builds, runs, and can do video calls).
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(rjesup)
Attachment #8560726 - Attachment is obsolete: true
Byron - give this a try.  It seems to work locally (18 encode/decode peerconnections at once, plus gUM, albeit at lowered resolution to avoid running out of CPU).

If so, I'll clean it up (easy)
Flags: needinfo?(docfaraday)
I'll give it a look when I'm back home with my windows machine.
Doesn't seem to have the limit anymore.
Flags: needinfo?(docfaraday)
Attachment #8561564 - Attachment is obsolete: true
Comment on attachment 8563074 [details] [diff] [review]
Swap CreateTimerQueueTimer() for timerSetEvent() in webrtc win32 code to avoid limits on the number of realtime timers

We don't use any <10ms timers so far as I know, in our usage (there are 2ms timers in the win32 audio code in webrtc, but we don't use it and won't).

Officially, timerSetEvent() is deprecated (for around a decade); people use it because it's more accurate (especially for <10ms timers, doubly for <=2ms) and winxp may have some drift issues with the timer queues, per stuff I've found on the web.  Once we switch to mostly channel-based use of webrtc.org media engines, we may be able to unset this and go back to timerSetEvent().
Attachment #8563074 - Flags: review?(dmajor)
Hmm. The docs don't say whether the APIs use NULL or INVALID_HANDLE_VALUE for bad timer handles. From the _In_ annotation on DeleteTimerQueueTimer, and the fact that they use NULL for timer *queue* handles, I'm going to assume it's NULL.
Comment on attachment 8563074 [details] [diff] [review]
Swap CreateTimerQueueTimer() for timerSetEvent() in webrtc win32 code to avoid limits on the number of realtime timers

Review of attachment 8563074 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/system_wrappers/source/event_win.cc
@@ +57,5 @@
> +// static
> +void CALLBACK EventWindows::TimerRoutine(PVOID lpParam, BOOLEAN TimerOrWaitFired)
> +{
> +  EventWindows *eventwin = (EventWindows*) lpParam;
> +  SetEvent(eventwin->event_);

Does the set/pulse distinction (lines 86/89) not matter here?

@@ +95,5 @@
>  
>  bool EventWindows::StopTimer() {
> +#ifdef WIN32_USE_TIMER_QUEUES
> +  if (timerHandle_) {
> +    DeleteTimerQueueTimer(NULL, timerHandle_, NULL);

Looking at the documentation for the third parameter, I suspect that INVALID_HANDLE_VALUE might make more sense, to let TimerRoutine finish before we reach CloseHandle. What do you think?

(Possibly also for line 68)

::: media/webrtc/trunk/webrtc/system_wrappers/source/event_win.h
@@ +31,5 @@
>    virtual bool Set();
>    virtual bool Reset();
>  
> +#ifdef WIN32_USE_TIMER_QUEUES
> +  static void CALLBACK EventWindows::TimerRoutine(PVOID lpParam, BOOLEAN TimerOrWaitFired);

Can remove "EventWindows::"
Attachment #8563074 - Flags: review?(dmajor) → review+
https://hg.mozilla.org/mozilla-central/rev/73b8180cfd7e
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Flags: needinfo?(bhouser.knuddels)
You need to log in before you can comment on or make changes to this bug.