Closed Bug 1284102 Opened 8 years ago Closed 8 years ago

Intermittent browser/base/content/test/webrtc/browser_devices_get_user_media.js | recording-device-events notification unexpected - Got 1, expected 0

Categories

(Core :: WebRTC, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

References

Details

(Keywords: intermittent-failure)

Attachments

(8 files, 5 obsolete files)

2.28 KB, patch
Details | Diff | Splinter Review
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
jib
: review+
Details
58 bytes, text/x-review-board-request
florian
: review+
Details
58 bytes, text/x-review-board-request
florian
: review+
Details
58 bytes, text/x-review-board-request
florian
: review+
Details
58 bytes, text/x-review-board-request
florian
: review+
Details
58 bytes, text/x-review-board-request
florian
: review+
Details
Looks to me like the same (?) problem as in bug 1284038.
backlog: --- → webrtc/webaudio+
Rank: 35
Priority: -- → P3
See Also: → 1284038
It looks to me like like the platform code always fires a single "starting" recording-device-events per stream, but fires one "shutdown" recording-device-events *per track* in the stream.

In other words, for a yield promiseRequestDevice(true, true) + closeStream(), I would expect:

  1 recording-device-events, "starting" 
  1 recording-device-events, "shutdown", true, false,
  1 recording-device-events, "shutdown", false, true,

Where the boolean props represent isAudio and isVideo respectively.

Yet the browser tests don't seem written to wait for both. closeStream [1] appears to always only wait for one, but using a Promise.all that also waits for another unrelated event. Directly after that, it checks that it has received no additional recording-device-events, which is the line of the intermittent in comment 0.

Since it never waits for a second recording-device-events, and the counting of recording-device-events appears to happen in parallel with this promise chain in this test, it seems plausible that this is racing, especially since the platform code offers no timing guarantee that these events fire on the same cycle. 

I haven't been able to confirm this, as I ran out of time to test it (promiseObserverCalled() needs fixing to be able to wait for two events reliably), so if someone could confirm this that'd be great.

[1] https://dxr.mozilla.org/mozilla-central/rev/b3ec8a3373e8faca26c39e9ff577a2d4f8b6927a/browser/base/content/test/webrtc/head.js#385
Flags: needinfo?(florian)
This is now blocking the landing of bug 1294605, which is a P1, because of some timing issues triggering this a bit extra (?)
Blocks: 1294605
Rank: 35 → 18
Priority: P3 → P1
(In reply to Jan-Ivar Bruaroey [:jib] from comment #6)
> It looks to me like like the platform code always fires a single "starting"
> recording-device-events per stream, but fires one "shutdown"
> recording-device-events *per track* in the stream.

How difficult would it be to fix this? Or is there a good reason why it needs to be this way?

This notification triggers an update of all the sharing indicators in the UI, so triggering it twice in a row causes unneeded execution of UI code.
Flags: needinfo?(florian) → needinfo?(jib)
If we have a gUM request succeed with audio+video and stop only video, is it meant to update the UI state to become audio only? If not there's no need for multiple events really.
(In reply to Florian Quèze [:florian] [:flo] from comment #9)
> (In reply to Jan-Ivar Bruaroey [:jib] from comment #6)
> > It looks to me like like the platform code always fires a single "starting"
> > recording-device-events per stream, but fires one "shutdown"
> > recording-device-events *per track* in the stream.
> 
> How difficult would it be to fix this? Or is there a good reason why it
> needs to be this way?

This behavior is intentional, and was added to solve bug 1050802, where users with both screensharing and camera shared could use our in-call doorhangers (see attachment 8470062 [details]) to stop screensharing but leave camera sharing running. Is that still desirable, or has the UX changed?
Flags: needinfo?(jib) → needinfo?(florian)
To me it seems doable to redo the browser tests to expect the right number of events here, but I'll leave it up to florian since this was done to support UX in the first place.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #12)
> To me it seems doable to redo the browser tests to expect the right number
> of events here, but I'll leave it up to florian since this was done to
> support UX in the first place.

Adapting the test to whatever (deterministic) behavior we have currently is doable. My question was more about whether we could find a way to prevent updating the UI twice. Is there any place in the code where a boolean could be used to avoid firing the notification until we are done making changes to the page's streams?

(In reply to Jan-Ivar Bruaroey [:jib] from comment #11)

> users with both screensharing and camera shared could use our in-call
> doorhangers (see attachment 8470062 [details]) to stop screensharing but
> leave camera sharing running. Is that still desirable, or has the UX changed?

This doorhanger no longer exists; it has been replaced in bug 1206233 by sharing indicators in the control center panel. It would actually be desirable to have a way to stop sharing the microphone without the camera and vice versa... but that seems off-topic here.
As I mention in comment 6 there are isAudio and isVideo booleans accompanying the "shutdown" events (I'm unsure if they also accompany the "starting" event, but we could add that if it's missing). So assuming the code knows whether both audio and video were requested or not, it should have the information to avoid updating the UI twice.

E.g. for a gUM(cam+mic) call, wait for

  1 recording-device-events, "shutdown", true, false, and
  1 recording-device-events, "shutdown", false, true,

or

  a single recording-device-events, "shutdown", true, true,

Is that what you mean?
See Also: → 1300093
In case it's confusing, the sub-part of bug 1050802 that's relevant (wrt STOPPED_TRACK and the double events) have to do with separating screensharing from *audio*, not camera, See bug 1050802 comment 26.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #6)
> It looks to me like like the platform code always fires a single "starting"
> recording-device-events per stream, but fires one "shutdown"
> recording-device-events *per track* in the stream.

This is not what I'm observing, apply the attached patch to see in the terminal the notifications fired by MediaManager.
Flags: needinfo?(florian)
Flags: needinfo?(jib)
When the application stops the tracks I get largely what jib described. I.e., on [1] I get:

> start()
> updateIndicators, topic = recording-device-events, aData = starting, aSubject = ({requestURL:"https://fiddle.jshell.net/_display/", isAudio:true, isApp:false, isVideo:true})
> 
> stop()
> updateIndicators, topic = recording-device-events, aData = shutdown, aSubject = ({requestURL:"https://fiddle.jshell.net/_display/", isAudio:true, isApp:false, isVideo:false})
> updateIndicators, topic = recording-device-events, aData = shutdown, aSubject = ({requestURL:"https://fiddle.jshell.net/_display/", isAudio:false, isApp:false, isVideo:true})
> removeBrowserSpecificIndicator, topic = recording-window-ended, aData = 19


And on [2] where I use MediaStream.stop instead of MediaStreamTrack.stop, like in the intermittently failing test, I get:
> start()
> updateIndicators, topic = recording-device-events, aData = starting, aSubject = ({requestURL:"https://fiddle.jshell.net/_display/", isAudio:true, isApp:false, isVideo:true})
> 
> stop()
> removeBrowserSpecificIndicator, topic = recording-window-ended, aData = 20
> updateIndicators, topic = recording-device-events, aData = shutdown, aSubject = ({requestURL:"https://fiddle.jshell.net/_display/", isAudio:true, isApp:false, isVideo:true})


Can someone clarify what we're seeing when the test fails? We're getting an unexpected recording-device-events after stopping the stream it seems, but what's the full sequence? I had a hard time following the test code.


[1] https://jsfiddle.net/w6y9qvpj/1/
[2] https://jsfiddle.net/w6y9qvpj/2/
Florian, do you know what we're seeing in the failure case? And are the results I note in comment 17 unexpected?
Flags: needinfo?(florian)
Clearing the needinfo for now as we discussed over IRC yesterday and there doesn't seem to be anything actionable for me here right now.
Flags: needinfo?(florian)
Assignee: nobody → pehrson
Status: NEW → ASSIGNED
Flags: needinfo?(jib)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #21)
> Let's see how this works:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=920dae571114

If your patch does that I think it does, you may be able to remove all the promiseTodoObserverNotCalled calls.
I had to launch one more try with your logs since this was still failing.

I caught it failing.

The job in question: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a464d23f03a&selectedJob=27891483

The log in question: https://archive.mozilla.org/pub/firefox/try-builds/pehrsons@gmail.com-3a464d23f03a8dc1acbac455ff71a90f097e64d4/try-linux64/try_ubuntu64_vm_test-mochitest-browser-chrome-1-bm113-tests1-linux64-build793.txt.gz

This is interesting because I see many recording-device-events with aData=shutdown separately for audio and video. But it only fails on one. Florian, shouldn't we catch all these? That would make debugging easier.

Locally I ran this test without errors but observed:
> 403 INFO Console message: [JavaScript Warning: "MediaStream.stop() is deprecated and will soon be removed. Use MediaStreamTrack.stop() instead." {file: "https://example.com/browser/browser/base/content/test/webrtc/get_user_media.html" line: 48}]
> 404 INFO TEST-FAIL | browser/base/content/test/webrtc/browser_devices_get_user_media.js | Stream was GC'd before MediaStream.stop() was dispatched (bug 1284038)

The js warning proves that stop() was called so it was clearly not GC'd. Let's not mask errors away please.
Flags: needinfo?(florian)
In the case where it fails locally it's because the tracks end in different MSG iterations.

MediaStream.stop() ends the tracks, but they won't notify about ended until their internal buffer runs out (video doesn't really have one, but the audio buffer stretches a bit into the future).

I think we can make the behavior consistent by stopping all tracks instead of just the stream, in combination with the patches I already wrote. Not a big deal that MediaStream.stop() causes extra UI updates since it's deprecated anyway.
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #26)

> MediaStream.stop() ends the tracks, but they won't notify about ended until
> their internal buffer runs out (video doesn't really have one, but the audio
> buffer stretches a bit into the future).
> 
> I think we can make the behavior consistent by stopping all tracks instead
> of just the stream, in combination with the patches I already wrote. Not a
> big deal that MediaStream.stop() causes extra UI updates since it's
> deprecated anyway.

I'm a bit confused about why calling MediaStream.stop which ends the track would be less deterministic than having the JS code stop every tracks, but if you can get the test to pass consistently by replacing MediaStream.stop() by something that doesn't cause deprecation warnings in the log, that seems great to me :-).
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #27)
> (In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #26)
> 
> > MediaStream.stop() ends the tracks, but they won't notify about ended until
> > their internal buffer runs out (video doesn't really have one, but the audio
> > buffer stretches a bit into the future).
> > 
> > I think we can make the behavior consistent by stopping all tracks instead
> > of just the stream, in combination with the patches I already wrote. Not a
> > big deal that MediaStream.stop() causes extra UI updates since it's
> > deprecated anyway.
> 
> I'm a bit confused about why calling MediaStream.stop which ends the track
> would be less deterministic than having the JS code stop every tracks, but
> if you can get the test to pass consistently by replacing MediaStream.stop()
> by something that doesn't cause deprecation warnings in the log, that seems
> great to me :-).

MediaStreamTrack.stop() ends the track, MediaStream.stop() ends the source that produces data for the tracks which then end when the data they already have runs out.

Anyway, that was not my point. My question was why there are so many occurences in the log of duplicate recording-device-events without any errors? Is there a bug in the test?
This looks good locally - no double shutdown notifications.

Florian, please see the commit message in comment 32. I'll implement that missing path if necessary.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5ed95330332
Comment on attachment 8793815 [details]
Bug 1284102 - Stop getUserMedia stream by stopping all tracks.

https://reviewboard.mozilla.org/r/80440/#review79180
Attachment #8793815 - Flags: review?(florian) → review+
Comment on attachment 8793816 [details]
Bug 1284102 - Remove all todos in browser chrome getUserMedia tests.

https://reviewboard.mozilla.org/r/80442/#review79188

r- because I can't set f+ from reviewboard, but I'm excited to see these todo() finally go away!

::: browser/base/content/test/webrtc/head.js
(Diff revision 1)
>      });
>      mm.sendAsyncMessage("Test:ExpectNoObserverCalled");
>    });
>  }
>  
> -function promiseTodoObserverNotCalled(aTopic) {

Please also remove http://searchfox.org/mozilla-central/source/browser/base/content/test/webrtc/get_user_media_content_script.js#37
Attachment #8793816 - Flags: review?(florian) → review-
Comment on attachment 8793817 [details]
Bug 1284102 - Expect recording-device-events when closing window.

https://reviewboard.mozilla.org/r/80444/#review79192

While updating the UI in this case doesn't seem useful, it's such an edge case that I don't think it matters here. I'm OK with just changing the test.
Attachment #8793817 - Flags: review?(florian) → review+
See Also: → 1304997
Comment on attachment 8793816 [details]
Bug 1284102 - Remove all todos in browser chrome getUserMedia tests.

https://reviewboard.mozilla.org/r/80442/#review79376
Attachment #8793816 - Flags: review?(florian) → review+
Comment on attachment 8794154 [details]
Bug 1284102 - Fix a race in test code.

https://reviewboard.mozilla.org/r/80702/#review79378

r=me assumping try is green.
Attachment #8794154 - Flags: review?(florian) → review+
Comment on attachment 8793308 [details]
Bug 1284102 - Notify chrome of stopped tracks in next stable state.

https://reviewboard.mozilla.org/r/80082/#review80144

lgtm.

::: dom/media/MediaManager.cpp:3667
(Diff revision 2)
> -                           stopAudio ? mAudioDevice.get() : nullptr,
> -                           stopVideo ? mVideoDevice.get() : nullptr,
> +                           stopAudio && mAudioDevice ? mAudioDevice.get()
> +                                                         : nullptr,
> +                           stopVideo && mVideoDevice ? mVideoDevice.get()
> +                                                         : nullptr,

Nit: This is a no-op change.
Attachment #8793308 - Flags: review?(jib) → review+
Comment on attachment 8793814 [details]
Bug 1284102 - Guard against double-deallocating devices in release.

https://reviewboard.mozilla.org/r/80438/#review80158

lgtm.

::: dom/media/webrtc/MediaEngine.h:253
(Diff revision 1)
> -    mRegisteredHandles.RemoveElementAt(mRegisteredHandles.IndexOf(handle, 0,
> -                                                                  Comparator()));
> +    nsTArray<RefPtr<AllocationHandle>>::index_type ix =
> +      mRegisteredHandles.IndexOf(handle, 0, Comparator());

I think this is why they invented auto.

::: dom/media/webrtc/MediaEngine.h:255
(Diff revision 1)
>        }
>      };
> -    MOZ_ASSERT(mRegisteredHandles.Contains(handle, Comparator()));
> -    mRegisteredHandles.RemoveElementAt(mRegisteredHandles.IndexOf(handle, 0,
> -                                                                  Comparator()));
> +
> +    nsTArray<RefPtr<AllocationHandle>>::index_type ix =
> +      mRegisteredHandles.IndexOf(handle, 0, Comparator());
> +    if (ix == nsTArray<RefPtr<AllocationHandle>>::NoIndex) {

if (ix == mRegisteredHandles.NoIndex)
?

::: dom/media/webrtc/MediaEngine.h:256
(Diff revision 1)
> +      MOZ_ASSERT(false);
> +      return NS_ERROR_FAILURE;

How can this happen?
Attachment #8793814 - Flags: review?(jib) → review+
Comment on attachment 8793308 [details]
Bug 1284102 - Notify chrome of stopped tracks in next stable state.

https://reviewboard.mozilla.org/r/80082/#review80182

::: dom/media/MediaManager.cpp:3652
(Diff revision 2)
> +  MOZ_ASSERT(!stopAudio || mAudioDevice,
> +             "If there's a pending stop for audio, there must be an audio device");
> +  MOZ_ASSERT(!stopVideo || !mVideoStopped,
> +             "If there's a pending stop for video, video must not have been stopped");
> +  MOZ_ASSERT(!stopVideo || mVideoDevice,
> +             "If there's a pending stop for video, there must be an video device");

a video device
Comment on attachment 8793814 [details]
Bug 1284102 - Guard against double-deallocating devices in release.

https://reviewboard.mozilla.org/r/80438/#review80158

> How can this happen?

I introduced a bug in an intermediate stage and on opt builds we couldn't assert and we didn't return. That caused weird looking failures because of UAF. We keep an assert for the what-if case and return early for opt builds.
Attachment #8793814 - Attachment is obsolete: true
Attachment #8793815 - Attachment is obsolete: true
Attachment #8793816 - Attachment is obsolete: true
Attachment #8793817 - Attachment is obsolete: true
Attachment #8794154 - Attachment is obsolete: true
Well, now I pushed the patch to the wrong bug anyway. Sorry for that.
Comment on attachment 8796522 [details]
Bug 1284102 - Stop getUserMedia stream by stopping all tracks.

https://reviewboard.mozilla.org/r/82360/#review80944

Will be nice to no longer see this deprecation warning in the test logs. Would be even nicer if we got rid of "JavaScript Warning: "navigator.mozGetUserMedia has been replaced by navigator.mediaDevices.getUserMedia"" too, to have warning free testlogs ;).
Attachment #8796522 - Flags: review?(florian) → review+
Comment on attachment 8796523 [details]
Bug 1284102 - Remove all todos in browser chrome getUserMedia tests.

https://reviewboard.mozilla.org/r/82362/#review80946
Attachment #8796523 - Flags: review?(florian) → review+
Comment on attachment 8796524 [details]
Bug 1284102 - Expect recording-device-events when closing window.

https://reviewboard.mozilla.org/r/82364/#review80948
Attachment #8796524 - Flags: review?(florian) → review+
Comment on attachment 8796525 [details]
Bug 1284102 - Fix a race in test code.

https://reviewboard.mozilla.org/r/82366/#review80950
Attachment #8796525 - Flags: review?(florian) → review+
Comment on attachment 8796549 [details]
Bug 1284102 - Switch from mozGUM to mediaDevice.gUM for chrome tests.

https://reviewboard.mozilla.org/r/82372/#review80958

Thanks!
Attachment #8796549 - Flags: review?(florian) → review+
Comment on attachment 8796521 [details]
Bug 1284102 - Guard against double-deallocating devices in release.

https://reviewboard.mozilla.org/r/82358/#review81290
Attachment #8796521 - Flags: review?(jib) → review+
Depends on: 1304950
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4cdc20d44c1a
Notify chrome of stopped tracks in next stable state. r=jib
https://hg.mozilla.org/integration/autoland/rev/528f093ec47b
Guard against double-deallocating devices in release. r=jib
https://hg.mozilla.org/integration/autoland/rev/d287cffb6bf6
Stop getUserMedia stream by stopping all tracks. r=florian
https://hg.mozilla.org/integration/autoland/rev/8d29221c9a8a
Remove all todos in browser chrome getUserMedia tests. r=florian
https://hg.mozilla.org/integration/autoland/rev/583a86a0e8d2
Expect recording-device-events when closing window. r=florian
https://hg.mozilla.org/integration/autoland/rev/308e82fff677
Fix a race in test code. r=florian
https://hg.mozilla.org/integration/autoland/rev/ce4fb8910a52
Switch from mozGUM to mediaDevice.gUM for chrome tests. r=florian
I'm guessing this is too much to uplift to Beta at this point in the cycle. What are your thoughts about Aurora, though?
Flags: needinfo?(pehrson)
I don't think we need to uplift this at all. Per comment 9, we update chrome's sharing indicators two times instead of one in some cases. That's not user observable so a fix is more for optimization.

Also the patch that needs uplift is fairly large and might be what triggered quite some test fallout on m-c.

I'll mark them wontfix.
Depends on: 1312332
Depends on: 1312234
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: