Unable to view local video when muting outgoing video on a webrtc call

RESOLVED INCOMPLETE

Status

Core
WebRTC: Audio/Video
--
enhancement
5 years ago
3 years ago

People

(Reporter: standard8, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

5 years ago
STR:

1) Establish a WebRTC call between two people.
2) Set up a video mute button to get the MediaStreamTrack items for the video, and set .enabled to false.
3) Click the mute button

Expected Results:

- Local video remains visible
- Outgoing video is muted

Actual Results:

- Both local and outgoing video streams are muted
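For reference, the mute button in step 2 can be sketched roughly as below. This is a minimal illustration, not the code from the actual call page: `setVideoEnabled` is a hypothetical helper name, and the two small interfaces only exist so the sketch type-checks without DOM typings (a browser MediaStream satisfies them).

```typescript
// Minimal structural types, assumed for this sketch only.
// A browser MediaStream / MediaStreamTrack satisfies them.
interface VideoTrackLike {
  enabled: boolean;
}
interface VideoStreamLike {
  getVideoTracks(): VideoTrackLike[];
}

// Hypothetical helper: enable or disable every video track of a stream.
// Setting .enabled to false makes the track produce black frames everywhere
// that same track object is used.
function setVideoEnabled(stream: VideoStreamLike, enabled: boolean): void {
  for (const track of stream.getVideoTracks()) {
    track.enabled = enabled;
  }
}
```

If the same stream object feeds both the local video element and the peer connection, calling `setVideoEnabled(stream, false)` affects both, which matches the actual results reported above.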
(Reporter)

Comment 1

5 years ago
From discussion on irc, I believe the intended way to do this is:

1) Get the local MediaStream from gUM
2) Get the MediaStreamTrack items
3) Clone each track item
4) Add the cloned track items to a new MediaStream object
5) Use one of the MediaStream objects for the local video display, and the other for connecting to the peer connection.

However, steps 3 and 4 are currently not possible, because they are not yet implemented in the platform.
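Assuming a platform that does provide `MediaStreamTrack.clone()` and the `MediaStream(tracks)` constructor, as the Media Capture and Streams spec describes them, the steps above could look roughly like this. The helper names `cloneTracks` and `muteVideo` are made up for illustration, and the browser calls are shown only in comments because they need a page with camera access.

```typescript
// Hypothetical helper: clone every track of a stream (step 3).
// Generic so that it works with real MediaStreamTrack objects or test doubles.
function cloneTracks<T extends { clone(): T }>(source: { getTracks(): T[] }): T[] {
  return source.getTracks().map((track) => track.clone());
}

// Hypothetical helper: disable only the video tracks in the given set.
function muteVideo<T extends { kind: string; enabled: boolean }>(tracks: T[]): void {
  for (const track of tracks) {
    if (track.kind === "video") {
      track.enabled = false;
    }
  }
}

// Browser usage (assumed names: selfView is a <video> element, pc an RTCPeerConnection):
//   const local = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
//   const outgoing = new MediaStream(cloneTracks(local));            // steps 3 and 4
//   selfView.srcObject = local;                                      // step 5, local display
//   outgoing.getTracks().forEach((t) => pc.addTrack(t, outgoing));   // step 5, peer connection
//   muteVideo(outgoing.getTracks());                                 // mutes outgoing video only
```

Since a clone carries its own `enabled` state, disabling the outgoing clone should leave the original track, and therefore the local display, untouched.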
Comment 0's expected behavior doesn't sound right to me in the local context. If you mute a media stream track on a remote stream that came from a local media stream, then I'd expect the video track of both the local and the remote media stream to be muted, given that the remote stream delivered by onaddstream derives from the local stream acquired originally. They are essentially affecting the same underlying media stream track. Cloning them creates confusion over how those tracks are derived, which will confuse web developers.
Whiteboard: [INVALID?]

Comment 3

5 years ago
jsmith: If I understand what you are proposing, namely that if I have a loopback call and I mute the stream that came out of the PC, it should mute the stream that I called addStream() with, then that's not what the spec says, indeed there is no technical mechanism in the spec that would make this happen.

Removing [INVALID?]
Whiteboard: [INVALID?]
(In reply to Eric Rescorla (:ekr) from comment #3)
> jsmith: If I understand what you are proposing, namely that if I have a
> loopback call and I mute the stream that came out of the PC, it should mute
> the stream that I called addStream() with, then that's not what the spec
> says, indeed there is no technical mechanism in the spec that would make
> this happen.
> 
> Removing [INVALID?]

This is talking about a local call, not a remote call. A local call carries the remote stream given by onaddstream in the same JS context as the local stream given on a callback. It makes sense that manipulating tracks for one stream affects the other, since they share the same object references.

If this happens in a remote context, then that's a bug I wouldn't expect to happen.
Whiteboard: [INVALID?]

Comment 5

5 years ago
It seems like by a "local call" you mean a loopback call. If so, then it should behave as I said in c3. There is simply no meaningful connection between streams in one PC and streams in another, even if they happen to be in the same underlying JS context. If you think otherwise, please point to the specification text which supports your position or which would even allow the PC to plausibly detect this condition.
Whiteboard: [INVALID?]
(In reply to Eric Rescorla (:ekr) from comment #5)
> It seems like by a "local call" you mean a loopback call. If so, then it
> should behave as I said in c3. There is simply no meaningful connection
> between streams in one PC and streams in another, even if they happen to be
> in the same underlying JS context. If you think otherwise, please point to
> the specification text which supports your position or which would even
> allow the PC to plausibly detect this condition.

That provides no additional information at all to make that point against my arguments. Please read https://bugzilla.mozilla.org/page.cgi?id=etiquette.html about providing valuable comments in bugzilla.

I want to hear from someone else at this point about local calls with the same JS context.
Whiteboard: [INVALID?]
(In reply to Jason Smith [:jsmith] from comment #6)
> That provides no additional information at all to make that point against my
> arguments. Please read
> https://bugzilla.mozilla.org/page.cgi?id=etiquette.html about providing
> valuable comments in bugzilla.

As far as I can see, you just got a technical opinion on the correct behavior from a WG member and active spec contributor. That seems like relevant information.

> I want to hear from someone else at this point about local calls with the same JS context.

Like EKR, I can't distinguish what you mean by a "local call" that is distinct from a loopback call. A remote stream returned by onaddstream does not contain any MediaStreamTrack objects in common with the PC that sent it the stream, regardless of whether the originating PC was local or remote, so I can't agree that your argument "makes sense".
Whiteboard: [INVALID?]
(Reporter)

Comment 8

5 years ago
(In reply to Jason Smith [:jsmith] from comment #2)
> They are essentially
> affecting the same underlying media stream track. Cloning them creates
> confusion over how those tracks are derived, which will confuse web
> developers.

We would argue that by having this limitation of always muting both ends, you are also restricting web developers to a specific UX.
(In reply to Mark Banner (:standard8) (slow responses, team meetup) from comment #8)
> (In reply to Jason Smith [:jsmith] from comment #2)
> > They are essentially
> > affecting the same underlying media stream track. Cloning them creates
> > confusion over how those tracks are derived, which will confuse web
> > developers.
> 
> We would argue, that by having this limitation of always muting both ends,
> you are also restricting web developers to a specific UX.

In some sense I could agree with that. The problem I see, though, is that it introduces overhead in order to meet the "mute your video" use case. For that use case, the streamlined workflow is that muting a video on one end affects the other end immediately on execution. Otherwise, you'll need to separately notify the remote peer, over a different mechanism, to mute the video. You would even have to trust that the information reaching the remote peer, saying the video should be muted, can itself be trusted. That's too much overhead.
(In reply to Mark Banner (:standard8) (slow responses, team meetup) from comment #8)
> (In reply to Jason Smith [:jsmith] from comment #2)
> We would argue, that by having this limitation of always muting both ends,
> you are also restricting web developers to a specific UX.

Very much so.  And, the restriction is likely going to be contentious, since many use cases could benefit from locally viewing muted video.  Removing this ability means users couldn't check what's visible in their camera before transmitting it.
(In reply to Timothy B. Terriberry (:derf) from comment #7)
> Like EKR, I can't distinguish what you mean by a "local call" that is
> distinct from a loopback call. A remote stream returned by onaddstream does
> not contain any MediaStreamTrack objects in common with the PC that sent it
> the stream, regardless of whether the originating PC was local or remote, so
> I can't agree that your argument "makes sense".

So exactly how are you going to handle the remote video manipulation use cases?
(In reply to Jennifer Morrow [:Boriss] (Firefox UX) from comment #10)
> (In reply to Mark Banner (:standard8) (slow responses, team meetup) from
> comment #8)
> > (In reply to Jason Smith [:jsmith] from comment #2)
> > We would argue, that by having this limitation of always muting both ends,
> > you are also restricting web developers to a specific UX.
> 
> Very much so.  And, the restriction is likely going to be contentious, since
> many use cases could benefit from locally viewing muted video.  Removing
> this ability means users couldn't check what's visible in their camera
> before transmitting it.

The problem with locally muting video while video is still being transmitted remotely is that the user no longer knows what the remote side is actually seeing. That's really bad for privacy.
(In reply to Jason Smith [:jsmith] from comment #11)
> (In reply to Timothy B. Terriberry (:derf) from comment #7)
> > Like EKR, I can't distinguish what you mean by a "local call" that is
> > distinct from a loopback call. A remote stream returned by onaddstream does
> > not contain any MediaStreamTrack objects in common with the PC that sent it
> > the stream, regardless of whether the originating PC was local or remote, so
> > I can't agree that your argument "makes sense".
> 
> So exactly how are you going to handle the remote video manipulation use
> cases?

What are those "remote video" manipulation use cases? Generally, the
receiving/playing side of a call gets to mess with their own view
but not to significantly impact the other side's operations, just
like with an ordinary phone or video call. For example, if you look
at a system like hangouts, I can make random people big or small
but that doesn't affect their self-view.

With that said, the common case is that the same JS is running on
both sides, so if you want there to be feedback from the receiver
to the sender, then it's easy to implement by signaling via the JS.
(In reply to Jason Smith [:jsmith] from comment #12)
> (In reply to Jennifer Morrow [:Boriss] (Firefox UX) from comment #10)
> > (In reply to Mark Banner (:standard8) (slow responses, team meetup) from
> > comment #8)
> > > (In reply to Jason Smith [:jsmith] from comment #2)
> > > We would argue, that by having this limitation of always muting both ends,
> > > you are also restricting web developers to a specific UX.
> > 
> > Very much so.  And, the restriction is likely going to be contentious, since
> > many use cases could benefit from locally viewing muted video.  Removing
> > this ability means users couldn't check what's visible in their camera
> > before transmitting it.
> 
> The problem with locally muting video such that video is still transmitting
> remotely is that the user loses knowledge of knowing what the remote side is
> actually seeing. That's really bad for privacy.

Huh? There's no requirement that a video stream that's attached to a PC be
rendered in any kind of local window at all. It's completely permissible
to have a MediaStream that you acquire through gUM attached to a PC and nothing
else. There's simply nothing in the specification that guarantees that
the user have any feedback at all about what's being transmitted, other than
the requirement that there be indicators that the camera and microphone are
*live*.
(In reply to Jason Smith [:jsmith] from comment #9)
> (In reply to Mark Banner (:standard8) (slow responses, team meetup) from
> comment #8)
> > (In reply to Jason Smith [:jsmith] from comment #2)
> > > They are essentially
> > > affecting the same underlying media stream track. Cloning them creates
> > > confusion over how those tracks are derived, which will confuse web
> > > developers.
> > 
> > We would argue, that by having this limitation of always muting both ends,
> > you are also restricting web developers to a specific UX.
> 
> In some sense I could agree with that. Although the problem I could see here
> is that then introduces overhead in order to meet the "mute your video" use
> case. When that happens, the streamlined workflow is that muting a video on
> one end should affect the other immediately on execution. Otherwise, you'll
> introduce the problem that you'll need to separately notify the remote peer
> over a different mechanism to the mute video. And even trust the fact that
> the information you are getting to the remote peer is trusted that you
> should mute the video. That's too much overhead.

You seem to be conflating the video you are *sending* with the video you are
receiving.

Yes, if I am *sending* a stream and I mute it, it should also stop transmitting
to the other side. However, if I decide not to play your video, you have no
security and privacy interest in knowing that, any more than you have a right
to know that I've closed my eyes or stuck my fingers in my ears. To the contrary,
I have a security interest in stopping your media from displaying without
informing you of that fact.

Now, we may eventually implement a hold-type feature where through the signaling
channel we tell the other side to stop transmitting, but that's not primarily
about user notification but rather about throttling data transmission. The
user notification should be done through the app.
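A rough sketch of what "done through the app" could mean in practice: the sending page disables its own track and tells the other side over whatever signaling channel the app already uses. The `video-mute` message format, the `SignalingChannel` interface, and both function names are invented for this illustration and are not part of any spec.

```typescript
// Hypothetical application-level message; not defined by WebRTC.
interface VideoMuteMessage {
  type: "video-mute";
  muted: boolean;
}

// Assumed shape of the app's signaling channel (a WebSocket would fit).
interface SignalingChannel {
  send(data: string): void;
}

// Sender side: stop (or resume) sending real video, then notify the peer's app.
function notifyVideoMute(
  channel: SignalingChannel,
  track: { enabled: boolean },
  muted: boolean
): void {
  track.enabled = !muted;
  const message: VideoMuteMessage = { type: "video-mute", muted };
  channel.send(JSON.stringify(message));
}

// Receiver side: parse an incoming signaling payload, ignoring anything
// that is not a well-formed video-mute message.
function parseVideoMute(data: string): VideoMuteMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const candidate = parsed as { type?: unknown; muted?: unknown };
  if (candidate.type !== "video-mute" || typeof candidate.muted !== "boolean") {
    return null;
  }
  return { type: "video-mute", muted: candidate.muted };
}
```

The receiving page can use the parsed message to show a "video paused" placeholder. The message is only a UI hint from the remote app; what actually stops the video is the sender disabling its track.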
I think you may have addressed my concern on comment 15 if I understand what you are saying correctly:

If I mute the local video track as part of a media stream that's being sent to a remote peer, then the track should stop being sent to the remote peer. That means that if I add a "mute" button to say "mute my camera stream," then I can then mute the camera stream to disable the video track, which would not give the remote media stream track to a remote peer. Is my understanding of what you are saying correct?

If my understanding above is correct, then that's fine.
(In reply to Eric Rescorla (:ekr) from comment #14)
> (In reply to Jason Smith [:jsmith] from comment #12)
> > (In reply to Jennifer Morrow [:Boriss] (Firefox UX) from comment #10)
> > > (In reply to Mark Banner (:standard8) (slow responses, team meetup) from
> > > comment #8)
> > > > (In reply to Jason Smith [:jsmith] from comment #2)
> > > > We would argue, that by having this limitation of always muting both ends,
> > > > you are also restricting web developers to a specific UX.
> > > 
> > > Very much so.  And, the restriction is likely going to be contentious, since
> > > many use cases could benefit from locally viewing muted video.  Removing
> > > this ability means users couldn't check what's visible in their camera
> > > before transmitting it.
> > 
> > The problem with locally muting video such that video is still transmitting
> > remotely is that the user loses knowledge of knowing what the remote side is
> > actually seeing. That's really bad for privacy.
> 
> Huh? There's no requirement that a video stream that's attached to a PC be
> rendered in any kind of local window at all. It's completely permissible
> to have a MediaStream that you acquire through gUM attached to a PC and
> nothing
> else. There's simply nothing in the specification that guarantees that
> the user have any feedback at all about what's being transmitted, other than
> the requirement that there be indicators that that camera and microphone are
> *live*.

I know what the spec says here, but that's not what our users are saying. We're getting some complaints on SUMO about how to turn the feature off, which makes me question whether we really have a sufficient UX privacy-wise.

https://input.mozilla.org/en-US/?q=webrtc&date_end=2013-07-18&date_start=2013-04-19&happy=0
(In reply to Jason Smith [:jsmith] from comment #16)
> I think you may have addressed my concern on comment 15 if I understand what
> you are saying correctly:
> 
> If I mute the local video track as part of a media stream that's being sent
> to a remote peer, then the track should stop being sent to the remote peer.
> That means that if I add a "mute" button to say "mute my camera stream,"
> then I can then mute the camera stream to disable the video track, which
> would not give the remote media stream track to a remote peer. Is my
> understanding of what you are saying correct?

Yes, but it's important to understand that this isn't a security feature.
There's nothing that stops the site from giving you a mute button, suppressing
local video and yet continuing to send it. That's why there is an in-chrome
indicator for camera and microphone access which stays on as long as the site has
access to the media.
(In reply to Jason Smith [:jsmith] from comment #17)
> (In reply to Eric Rescorla (:ekr) from comment #14)
> > Huh? There's no requirement that a video stream that's attached to a PC be
> > rendered in any kind of local window at all. It's completely permissible
> > to have a MediaStream that you acquire through gUM attached to a PC and
> > nothing
> > else. There's simply nothing in the specification that guarantees that
> > the user have any feedback at all about what's being transmitted, other than
> > the requirement that there be indicators that that camera and microphone are
> > *live*.
> 
> I know what the spec says here, but that's not what our users are saying.
> We're getting some complaints on SUMO on how turn the feature off, which
> makes me question if we're really have a sufficient UX privacy-wise.
> 
> https://input.mozilla.org/en-US/?q=webrtc&date_end=2013-07-18&date_start=2013-04-19&happy=0

Reading the feedback at this link suggests that some number of users
would like to be able to turn off WebRTC entirely. That's totally
reasonable, but it has no bearing on exactly how changes to streams
on side A affect streams on side B.

Regardless, Bugzilla is not the place to debate specification features.
If you believe the specification should be changed, contact the WG at:

public-webrtc@w3.org
To expand on ekr's point, if the actual desire is to be able to turn off the feature entirely, there's a non-user-friendly way to do so now:

media.navigator.enabled = false
(turn off gUM)
and
media.peerconnection.enabled = false
(turn off peerconnections).

If people are actually requesting a user-friendly way to turn it off, that can be evaluated as a separate bug, but that's not this bug.
A minor terminology request:

I appreciate that the audio dimension is symmetrical and can be left out here, but
can we still please say "video muting a video" and "video muting a video stream" to avoid ambiguity, when we're not explicitly talking about tracks?

I find "muting a video" to be ambiguous at best in a discussion that also mixes tracks and streams.
This is now handled in newer, more detailed bugs, and it had become unclear what this bug was actually about.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → INCOMPLETE