Closed Bug 1209033 Opened 9 years ago Closed 9 years ago

WebRTC getUserMedia with constraints leads to InternalError or hang in 43

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

42 Branch
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox41 --- wontfix
firefox42 --- wontfix
firefox43 + fixed
firefox44 --- fixed

People

(Reporter: akvakh, Assigned: jib)

References

Details

(Keywords: hang)

Attachments

(5 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36

Steps to reproduce:

1. Open web page using https (not to show devices access dialog in case of device change)
2. Use getUserMedia with constraints to select audio and video recording device (using deviceId)
3. Change audio device id only by passing new constraints to getUserMedia (leave video constraints the same)


Actual results:

errorCallback fired with name: InternalError , message: "Starting video failed” 


Expected results:

successCallback fired
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core
Jib: please verify and update.
Reporter: Can you provide any more details or logs?
Flags: needinfo?(jib)
Flags: needinfo?(gingerbread_man)
Sorry, but I'm unable to test this (which is presumably what was requested). Clearing needinfo flag.
Flags: needinfo?(gingerbread_man)
Looks like the needinfo was meant for the reporter.

I'm unable to reproduce. Here's the fiddle I used: https://jsfiddle.net/77bcfthz/

Andrey, some questions:
1. does it work in http?
2. What OS is this on?
3. What audio devices do you have attached?
4. Is one of the audio devices the "built-in" version of the other, or are you specifically using two unrelated audio devices in your test?
5. Are you able to reproduce with the above fiddle?

Note to self: We should at least change InternalError to either NotReadableError or AbortError to be spec compliant.

Spec [1] says:

"If the user grants permission but a hardware error such as an OS/program/webpage lock prevents access, reject p with a new DOMException object whose name attribute has the value NotReadableError and abort these steps.

If the user grants permission but device access fails for any reason other than those listed above, reject p with a new DOMException object whose name attribute has the value AbortError and abort these steps."

[1] http://w3c.github.io/mediacapture-main/getusermedia.html#dom-mediadevices-getusermedia
Flags: needinfo?(akvakh)
Flags: needinfo?(jib)
Hello,

1. does it work in http?
Yes, but in case of http it shows dialog to choose the recording device.

2. What OS is this on?
Mac OS 10.11

3. What audio devices do you have attached?
Enum shows 3 devices: Default, Built-in Microphone and Display Audio (I have Cinema Display connected to my macbook)

4. Is one of the audio devices the "built-in" version of the other, or are you specifically using two unrelated audio devices in your test?
It doesn't matter, it's related to switch between any 2 devices

5. Are you able to reproduce with the above fiddle?
Fiddle works well. But this app doesn't https://demos.zingaya.com/rtctests/index.html?username=testuser&pwd=testpass&appname=rtctest&accname=aylarov  , just allow access and then click "Enable Self View" and then change audio recording device in combobox
Flags: needinfo?(akvakh)
The problem doesn't reproduce for me in your app (I'm on OSX 10.9.5. and tried FF41 and 42). Can you modify the fiddle until it breaks for you?

Also, if you could test this in Firefox Nightly, that would be great (we fixed a race there that may be related - bug 1103188 comment 92).
Flags: needinfo?(akvakh)
Updated to the latest version of Nightly, the problem is still there, I recorded video:

https://www.youtube.com/watch?v=tWWBaJF3D6U&feature=youtu.be
Flags: needinfo?(akvakh)
I still can't reproduce. I tried to follow the code, but got lost. If you could reproduce it by modifying the minimal fiddle in comment 3 that would help a lot.

(Btw I recommend the actively developed adapter.js, the official WebRTC polyfill, to abstract out browser differences, as it lets you program against the latest standard, with simpler code on your end. https://github.com/webrtc/adapter/blob/master/adapter.js )

This looks to me like either:

A) A race in our code or in the hardware device between calling stream.stop() on the previous gum stream and obtaining the new one. Have you tried putting in a 200 ms delay? It would be interesting to see if it helps. You could also try to switch from stream.stop() to stream.getTracks().forEach(track => track.stop()), which is the new standard, and another difference from the fiddle.

I wonder, does it reproduce in http if you select the device in the permission prompt to match the device you select in-content? (sorry, we have bug 1213046 which is messing up the default choice here). Btw, you can avoid offering the redundant user-choice in the first place by using the 'exact' keyword in constraints [1] (available if you use adapter.js).

If you could try that it would rule out a race, and it might instead be...

B) a hardware error of some kind, and the spec allows us to return NotReadableError when that happens.
   Maybe it's not liking combinations of video and audio from different devices? But in your video it
   looks like you're selecting the same one. hmm.

   The error comes from here:
   http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaManager.cpp?rev=8f665fef765b&mark=279-279#254

[1] https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia#Parameters
It's not B) , since if I change the device one more time after the error it works well.

Looks like I found the reason - I changed MediaStream.stop to track.stop and it doesn't happen anymore.
(In reply to Andrey Kovalenko from comment #8)
> Looks like I found the reason - I changed MediaStream.stop to track.stop and
> it doesn't happen anymore.

FYI - could this be the order-of-calling-stop issue we recently discovered with pehrson's patches?
Flags: needinfo?(jib)
FWIW the order didn't matter. That turned out to be a pilot error on my part.
Flags: needinfo?(jib)
(In reply to Jan-Ivar Bruaroey [:jib] from comment #10)
> FWIW the order didn't matter. That turned out to be a pilot error on my part.

Where does this stand?
See comment 11 (just looking for a status on this)
Flags: needinfo?(jib)
I was able to modify the fiddle to reproduce it: https://jsfiddle.net/jib1/nh3pp8g2/ It doesn't happen on every run, but 30-60% of the time, so reliably enough to find a regression-range.

It happens in 42 and 43 (and probably earlier back to 39), but was fixed in 44 here: https://hg.mozilla.org/mozilla-central/rev/2f4ae4d923bc which I verified by manual regression testing (mozregression isn't working for me atm).

I got it working in 43 with the following patches (I ran into Bug 1210852 with this test, a deadlock, which should be uplifted IMHO):

> 2015-10-14 11:27:15 +0800 | pehrsons: Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib
> 2015-10-03 20:42:26 -0400 | jib: Bug 1210852 - do SelectSettings of device capabilities on media thread.
> 2015-09-30 00:39:38 +0800 | pehrsons: Bug 1103188 - Keep track of stopped tracks in gUM stream listener. r?jib
> 2015-09-30 00:39:08 +0800 | pehrsons: Bug 1103188 - Keep track of capture stop only in gUM stream listener. r?jib
Assignee: nobody → jib
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(jib)
[Tracking Requested - why for this release]:

This request is for pehrsons's patches (3 of the 4 patches mentioned in comment 13), because we cherry-picked them from huge patch sets in those bugs mentioned (and I've flagged Bug 1210852 - the hang - separately since it should be considered on its own).

The regression here (the non-hang-part) is that the camera does not come on in all situations when it should. The STR is to have multiple back-to-back gUM requests with persistent permissions granted (a timing issue).
To clarify: this bug has no patches, but requests uplift of cherry-picked patches from other bugs.

The patches don't apply cleanly, so I tried to push my patch-set to mozreview from my local beta, but stopped when it started preparing 1000s of commits. Am I doing something wrong?
(In reply to Jan-Ivar Bruaroey [:jib] from comment #15)
> To clarify: this bug has no patches, but requests uplift of cherry-picked
> patches from other bugs.
> 
> The patches don't apply cleanly, so I tried to push my patch-set to
> mozreview from my local beta, but stopped when it started preparing 1000s of
> commits. Am I doing something wrong?

Use splinter ;-)  File a bug on Mozreview
Updated status/tracking.  Not changing the 43 entries until I look at what patches we're considering, how changed they are, and see some try results with them.  (Since these are being cherry-picked out of much larger sets.)
backlog: --- → webrtc/webaudio+
Rank: 15
Flags: needinfo?(jib)
Priority: -- → P1
Bug 1103188 - Keep track of capture stop only in gUM stream listener. r?jib
Bug 1103188 - Keep track of stopped tracks in gUM stream listener. r?jib
This is needed to avoid something like:
* [old stream] stop track 1 -> deallocate MediaDevice for track 1
* [new stream] gUM() -> allocate MediaDevice for track 1
* [old stream] stop stream -> deallocate MediaDevice for track 1
* [new stream] gUM() -> start MediaDevice for track 1 (oops, MediaDevice was no more!)
Bug 1210852 - do SelectSettings of device capabilities on media thread.
Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib
We currently avoid Deallocating a CaptureDevice used for multiple gUMStreams
when one of them calls Deallocate() by keeping track of how many called
Start().

The normal lifetime sequence however, is:
Allocate()
Start()
Stop()
Deallocate()

This patches fixes the lifetime management by keeping track of how many
users of the CaptureDevice called Allocate().
I asked on #mozreview, and was told not to be alarmed, so I went ahead, and the patches for beta are above. I launched a try run in mozreview.

Andreas, could you look over these patches for any missing dependencies or potential problems for beta?
Flags: needinfo?(rjesup)
Flags: needinfo?(pehrsons)
Flags: needinfo?(jib)
I don't really want to vouch for moving SelectSettings to the main thread, haven't been involved much there. For the rest I think we're good. That code has been pretty stable since beta.
Flags: needinfo?(pehrsons)
Of course. I'm vouching for that one.
Summary: WebRTC getUserMedia with constraints leads to InternalError → WebRTC getUserMedia with constraints leads to InternalError or hang in 43
Depends on: 1210852
Comment on attachment 8681981 [details]
MozReview Request: Bug 1210852 - do SelectSettings of device capabilities on media thread.

Approval Request Comment
[Feature/regressing bug #]: Bug 1046245
[User impact if declined]:

Calling getUserMedia twice in a row with persistent permission (i.e. without a permission prompt), where the first call succeeded, leads to a race which depending on timing can cause the second stream to either fail with InternalError or deadlock between the main thread and the mediaManager thread. The deadlock seems to happen in non-e10s mode.

The deadlock is fixed by this patch which is from Bug 1210852 and rebased for beta.

[Describe test coverage new/current, TreeHerder]:

Bug 1210852 landed in Nightly almost a month ago and seems to be working well, passing all existing tests.

[Risks and why]: 

Low. The code change follows an established pattern in this file for thread dispatching over to the media thread, where the code was always meant to be run (and used to run before 39).

[String/UUID change made/needed]: none
Attachment #8681981 - Flags: approval-mozilla-beta?
Comment on attachment 8681982 [details]
MozReview Request: Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib

(This request is for the remaining 3 patches fixing the InternalError.)

Approval Request Comment
[Feature/regressing bug #]: Bug 1046245
[User impact if declined]:

Calling getUserMedia twice in a row with persistent permission (i.e. without a permission prompt), where the first call succeeded, leads to a race which depending on timing can cause the second stream to fail with InternalError.

The three patches are from Bug 1103188 and Bug 1070216.

[Describe test coverage new/current, TreeHerder]:

Bug 1103188 landed in Nightly over a month ago, and Bug 1070216 half-a-month ago, and seem to be working well, passing all existing and new tests. These cherry-picked patches are vouched for by pehrsons in comment 23 as being simple and standalone from the other patches in those bugs (they were problems discovered along the way of an implementation feature).

[Risks and why]: 

Low. The first two patches fixed some faulty assumptions about when streams are marked as stopped in comment 19. The third patch fixes the tracking code for when resources for track members can be relinquished (opencount in Start/Stop when it should have been Allocate/Deallcoate) in comment 21.

All of these a bugs in their own right that contributed to this resource overlap problem.

[String/UUID change made/needed]: none
Attachment #8681982 - Flags: approval-mozilla-beta?
Tracking for 43 and 44 for now. I'll look at uplifting this next week or the end of this week once we have beta 1 in place.
Keywords: hang
The patches we've requested uplift on (to Beta - Fx43) are already in Fx44.  So I'm clearing tracking-firefox44.  Also, I think all the needinfo's have been answered; so I'm clearing the remaining one.
Flags: needinfo?(rjesup)
Comment on attachment 8681981 [details]
MozReview Request: Bug 1210852 - do SelectSettings of device capabilities on media thread.

Fix for hang and other video issues. OK to uplift to beta.
Attachment #8681981 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8681982 [details]
MozReview Request: Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib

OK to uplift to beta, this is the 2nd of 4 patches to fix a video/audio hang.
Attachment #8681982 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Wes, can you uplift these and the other two patches attached here that jib rebased for beta. Thanks.
Flags: needinfo?(wkocher)
Comment on attachment 8681979 [details]
MozReview Request: Bug 1103188 - Keep track of capture stop only in gUM stream listener. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23937/diff/1-2/
Attachment #8681979 - Flags: review?(jib)
Attachment #8681980 - Flags: review?(jib)
Comment on attachment 8681980 [details]
MozReview Request: Bug 1103188 - Keep track of stopped tracks in gUM stream listener. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23939/diff/1-2/
Comment on attachment 8681981 [details]
MozReview Request: Bug 1210852 - do SelectSettings of device capabilities on media thread.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23941/diff/1-2/
Attachment #8681982 - Flags: review?(jib)
Comment on attachment 8681982 [details]
MozReview Request: Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23943/diff/1-2/
Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib
Attachment #8685107 - Flags: review?(jib)
Flags: needinfo?(jib)
Attachment #8681979 - Flags: review?(jib)
Attachment #8681980 - Flags: review?(jib)
Attachment #8681982 - Flags: review?(jib)
Attachment #8685107 - Flags: review?(jib)
Comment on attachment 8685107 [details]
MozReview Request: Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib

I forgot to stay on top of try's here, sorry. Got wrong thread asserts:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=3705005d2190

We need one more patch (this one) from Bug 1103188 to fix this. Doing new try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f98856a73ac

Andreas, please let me know if you see any problems with using this patch on beta.

Worst case, I also did a try with just the lone Bug 1210852 patch (the hang):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=804a226b61b0
Flags: needinfo?(pehrsons)
Comment on attachment 8681979 [details]
MozReview Request: Bug 1103188 - Keep track of capture stop only in gUM stream listener. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23937/diff/2-3/
Comment on attachment 8681980 [details]
MozReview Request: Bug 1103188 - Keep track of stopped tracks in gUM stream listener. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23939/diff/2-3/
Comment on attachment 8681982 [details]
MozReview Request: Bug 1070216 - Properly manage lifetime of allocated CaptureDevices. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/23943/diff/2-3/
Comment on attachment 8685107 [details]
MozReview Request: Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/24729/diff/1-2/
Comment on attachment 8685107 [details]
MozReview Request: Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib

Fixed kdiff error.
Looks fine to me.
Flags: needinfo?(pehrsons)
Comment on attachment 8685107 [details]
MozReview Request: Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib

See comment 26.

Last beta landing bounced w/oranges about NotifyFinished being on the wrong thread, because we left out this patch in our cherry-picking from bug 1103188 which fixes that.

Try seems fine now in comment 46 (though it's hard to know what passes for green these days with a lot of oranges still, but they seem unrelated).
Attachment #8685107 - Flags: approval-mozilla-beta?
Comment on attachment 8685107 [details]
MozReview Request: Bug 1103188 - Always call MediaManager::NotifyFinished/NotifyRemoved on main thread. r?jib

Let's try uplifting to beta again for the beta 4 build today.
Attachment #8685107 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
jib: can we close this?
Flags: needinfo?(jib)
Yes, thanks for catching that it wasn't.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(jib)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: