1769985 - 'media.setsinkid.enabled' does not display all audio output devices

jacobeelias

Reporter

Description

2 years ago

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

Enabled 'media.setsinkid.enabled' from the about:config page
Went to https://webrtc.github.io/samples/src/content/devices/input-output/ to test inputs and outputs
Clicked Audio Output dropdown

Actual results:

The Audio Output dropdown displays only 3 devices, but my computer has 5. There is no scrolling or anything; two devices are simply missing.

Devices Displayed:

RemotePCSound
ZoomAudioDevice
Internal Speakers

Devices Missing:

Dell U2711D (External Monitor with audio out)
krisp speaker (Virtual Interface)

Funnily enough, the two that are missing are the ones I need most. I am unable to use Firefox for meetings, or really anything, without them.

Expected results:

After enabling media.setsinkid.enabled I should have seen a list of all of my audio outputs. None missing. Ideally, I could also set a default behavior for future visits.

Especially since, when media.setsinkid.enabled is disabled, audio plays properly out of my Dell U2711D output.

Gregory Pappas [:gregp]

Updated

2 years ago

Component: Untriaged → WebRTC: Audio/Video

OS: Unspecified → macOS

Product: Firefox → Core

Hardware: Unspecified → x86_64

Karl Tomlinson (:karlt)

Assignee

Comment 1

2 years ago

The spec's exposure rule "If deviceInfo.groupId is the same as the groupId of any microphone in microphoneList, return true" is what's involved here. I wonder whether we should reconsider that, because on some platforms every output device will have a (virtual) microphone for the monitor of that device. I don't know that there is a good reason to have different behavior on Mac.
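A minimal TypeScript sketch of the groupId rule quoted above, simplified from the spec's exposure decision algorithm; it also includes the path for explicitly user-selected devices mentioned in comment 3. `DeviceInfo` mirrors only the `MediaDeviceInfo` fields used, and `exposedOutputs` and its parameters are hypothetical names for illustration, not spec or Firefox identifiers.

```typescript
// Minimal stand-in for the MediaDeviceInfo fields used here.
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}

// Simplified sketch: an audio output is exposed if the user explicitly
// selected it (e.g. via selectAudioOutput()), or if microphone permission
// is granted and the output shares a groupId with some microphone.
function exposedOutputs(
  all: DeviceInfo[],
  micPermissionGranted: boolean,
  userSelectedIds: Set<string>,
): DeviceInfo[] {
  // groupIds of every microphone the page could see.
  const micGroups = new Set(
    all.filter((d) => d.kind === "audioinput").map((d) => d.groupId),
  );
  return all.filter(
    (d) =>
      d.kind === "audiooutput" &&
      (userSelectedIds.has(d.deviceId) ||
        (micPermissionGranted && micGroups.has(d.groupId))),
  );
}
```

With a hypothetical built-in microphone and speaker sharing one groupId and an external monitor with no microphone, only the built-in speaker is exposed after microphone permission; the monitor appears only once the user selects it explicitly.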

Severity: -- → S3

Type: defect → enhancement

Comment 2

2 years ago

Yes, this is triggering poor behavior across multiple conferencing sites. It affects Windows and Mac, where Firefox does not expose speaker output monitors as microphones. Output monitors are exposed as microphones on Linux.

When the system default speaker has no microphone in the same group (including a built-in speaker on a system with no built-in microphone) and permission is granted for microphone exposure:

https://webrtc.github.io/samples/src/content/devices/input-output/ describes a headset speaker as the output device when the output is in fact the system default speaker.
Neither the system default speakers nor the first exposed "Audio output destination" headset speaker can be selected in the UI, because the first is assumed to be the system default.
When a second headset is connected, its speaker can be selected, and the first headset speaker can subsequently be selected, but there is no way to switch back to the system default.

Similarly, whereby.com indicates that a USB headset speaker is the output when the output is in fact the system default speaker device. Switching behavior is also similar to "input-output" above.

meet.jit.si doesn't say which device is the output while output is (initially) directed to the system default. Output can be switched to any other headset speaker, but can't be switched back to the default device.

Speaker selection is ineffective in Firefox on zoom.us and meet.google.com even when all speakers are exposed, so the issues there might not be immediately related to this bug. Output is always directed to the system default speaker.

zoom.us has a separate "Same as System" speaker, listed last. Speaker selection is effective in Chrome, where "Same as System" is first.
meet.google.com indicates that a headset speaker is the output (when the output is in fact the system default speaker device).

Blocks: 1498512

Updated

2 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Karl Tomlinson (:karlt)

Assignee

Updated

2 years ago

Comment 3

2 years ago

The "exposure decision algorithm" for audio output devices was added without much discussion regarding the reasoning. The expectation was to build on it, if necessary, based on use cases. I guess the choice to expose user-selected devices was acceptable because those devices were explicitly authorized, and the exposure of outputs tied to microphones did not add much fingerprinting surface beyond what was already in the microphone device info.

There was already existing text indicating that "the user agent must acquire the user’s consent to access non-default devices" "to prevent" "nuisance scenarios" like unwanted sound through non-default devices. When that was added (2014), there was no indication of how a sink ID might be obtained (setSinkId() had no parameters), and "access" seems to be more about audio output than device information. There was device-specific "implicit consent to access non-default audio output devices via the getUserMedia() permission prompt: Opening an audio input device via getUserMedia() authorizes access to any output devices with the same groupId." There is no link in the pull request to the discussion that led to the initial mediacapture-output spec.

Perhaps one might argue that granting access to microphone audio already indicates a high level of trust in the site and so exposing device info is less of a concern. I'm not convinced by that argument because microphone audio is typically granted to one site at a time and so does not easily support fingerprinting. Also, the user may be granting access to a microphone with very controlled audio and may not have much trust in the site.

If we believe that it is reasonable to expose device info for all microphones because access has been granted to audio from one microphone, then the same reasoning would apply to exposing all output device info upon granting access to audio from one microphone. However, I am inclined to think that exposing device info from all microphones was a mistake from which it is difficult to back out, and so we should not necessarily repeat the same mistake for output devices. There are some users without many microphones, for whom the additional exposure of output devices would significantly increase fingerprinting.

A microphone-permission solution would also not address the confusion about the default device when outputs are exposed only via selectAudioOutput().

Karl Tomlinson (:karlt)

Assignee

Comment 4

2 years ago

If a default audio output device is included in enumerateDevices() output, then that would avoid introducing this regression when enabling "media.setsinkid.enabled", assuming the site still presents the default output device in its picker. The user may need to use the OS to select the physical device that will be the default output, but at least the default device would be selectable.

The intention is that the default audio output device is identifiable by being the first audio output device listed in enumerateDevices().
Exposure of the default audio output device without any permission was proposed, but this was apparently accidentally dropped from consideration.

If we expose a default audio output device without permission, then it should not identify the actual current default device. i.e. it would be a generic "Default" audio output device corresponding to setSinkId("").

If the current default audio output device is exposed, then adding a generic "Default" audio output device would mean that enumerateDevices() would list two "system default audio output" devices. They cannot both be first, but we need not include the generic "Default" in this case. Doing so could add unnecessary confusion at least in the situation when there is only one audio output device.
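On the page side, the intent described above can be sketched as follows, assuming the first `audiooutput` entry from `enumerateDevices()` is the default. Mapping that entry to `setSinkId("")` rather than to its `deviceId` keeps a route back to whatever the system default currently is; a generic "Default" entry with an empty `deviceId` maps to `""` either way. `OutputChoice` and `outputChoices` are hypothetical names.

```typescript
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}

// One entry in a page's speaker picker.
interface OutputChoice {
  label: string;
  sinkId: string; // value to pass to setSinkId()
  isDefault: boolean;
}

// Builds picker choices from enumerateDevices() results, assuming the first
// audiooutput entry is the default. The default maps to sinkId "" so that
// choosing it follows the system default instead of pinning one device.
function outputChoices(devices: DeviceInfo[]): OutputChoice[] {
  const outputs = devices.filter((d) => d.kind === "audiooutput");
  return outputs.map((d, i) => ({
    label: d.label || (i === 0 ? "Default" : "Audio output"),
    sinkId: i === 0 ? "" : d.deviceId,
    isDefault: i === 0,
  }));
}
```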

Comment 5

2 years ago

First some context for the OP:

Went to https://webrtc.github.io/samples/src/content/devices/input-output/ to test inputs and outputs

That link relies on microphone permission and in-content device selection instead of using the newer selectAudioOutput API when available, which works without microphone permission and should list all your devices. Jacobeelias (if you're still here), can you verify that it works?
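A minimal sketch of the `selectAudioOutput()` flow referred to above, in TypeScript. The structural types stand in for `navigator.mediaDevices` and an `<audio>` element so the sketch does not depend on DOM typings that may not declare `selectAudioOutput()` yet; `chooseSpeaker` and the interface names are hypothetical.

```typescript
// Minimal structural stand-ins for navigator.mediaDevices and a media element.
interface AudioOutputInfo {
  deviceId: string;
  label: string;
}
interface OutputSelector {
  selectAudioOutput(options?: { deviceId?: string }): Promise<AudioOutputInfo>;
}
interface SinkTarget {
  setSinkId(sinkId: string): Promise<void>;
}

// Lets the user pick any output through the browser's own chooser (no
// microphone permission required), then routes the element's audio there.
async function chooseSpeaker(media: OutputSelector, element: SinkTarget): Promise<string> {
  const device = await media.selectAudioOutput();
  await element.setSinkId(device.deviceId);
  return device.label;
}
```

In a page this would be called from a click handler (the spec expects a user gesture before the chooser is shown), passing `navigator.mediaDevices` and the media element. The promise rejects if the user dismisses the chooser.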

...
Expected results:

After enabling media.setsinkid.enabled I should have seen a list of all of my audio outputs. None missing.

The spec was tightened so microphone permission only exposes output devices that share a groupId with a microphone (comment 1). We can't expose all audio output devices here and follow the spec. We should submit a PR on the webrtc samples.

The remaining comments I think discuss whether we want to attempt to change the spec. Karl, do you agree?

Flags: needinfo?(jacobeelias)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 6

2 years ago

(In reply to Karl Tomlinson (:karlt) from comment #1)

I wonder whether we should reconsider that because on some platforms each and every output device will have a (virtual) microphone for the monitor of that device. I don't know that there is a good reason to have different behavior on Mac.

Reconsider which way? To not expose output devices that lack ties to a "physical" microphone? Or expose all output devices on mic permission?

The former would match spec intent better IMHO: groupId was conceived as a hint to apps to "set these together" to bring a (headset-wearing) user into the audio loop.

In contrast, setting a monitored output and its output signal as input together loops on the output. If this doesn't cause your device to catch fire, it should, as it is always an app bug AFAIK and a footgun.

Are other browsers implementing groupId this way? If so we should probably file a bug on the spec to clarify this was not the intent.

jacobeelias

Reporter

Comment 7

2 years ago

(In reply to On PTO. Back Aug 25th from comment #5)

First some context for the OP:

Went to https://webrtc.github.io/samples/src/content/devices/input-output/ to test inputs and outputs

That link relies on microphone permission and in-content device selection instead of using the newer selectAudioOutput API when available, which works without microphone permission and should list all your devices. Jacobeelias (if you're still here), can you verify that it works?

Confirmed. Just followed the link you gave (https://jan-ivar.github.io/dummy/gum_picker_output.html) — yes I was able to set my inputs and outputs separately. From the Speakers button I saw all of my audio outputs.

(But like you've mentioned, few video call apps work this way—instead behaving much more like the example link I added—Google Meet for example. And yes, in Chrome all of my outputs are shown.)

Flags: needinfo?(jacobeelias)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 8

2 years ago

Thanks Jacobeelias for the quick response! — You are correct that sites aren't doing this today. A bit of a chicken and egg problem since we're still behind pref. We have some outreach to do to try to remedy that. There's an inherent tension here between privacy and web compat. It's probably not going to be perfect out the gate.

FWIW here's a second UX example https://jan-ivar.github.io/dummy/gum_picker_output2.html showing partial enumeration.
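One way a page might model such a partial-enumeration picker, sketched in TypeScript: list the outputs the page can already see, plus an "Other..." entry that falls back to `selectAudioOutput()`. This is an assumption about the general approach, not the code behind the linked demo; `speakerMenu`, `onSpeakerChosen`, `OTHER_VALUE` and the interfaces are hypothetical.

```typescript
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}
interface MenuItem {
  value: string;
  label: string;
}
interface SpeakerPicker {
  enumerateDevices(): Promise<DeviceInfo[]>;
  selectAudioOutput(): Promise<{ deviceId: string }>;
}
interface SinkTarget {
  setSinkId(sinkId: string): Promise<void>;
}

// Hypothetical sentinel value for the "Other..." entry.
const OTHER_VALUE = "__other__";

// Menu of currently exposed outputs, always followed by "Other...".
function speakerMenu(devices: DeviceInfo[]): MenuItem[] {
  const items: MenuItem[] = devices
    .filter((d) => d.kind === "audiooutput")
    .map((d) => ({ value: d.deviceId, label: d.label || "Speaker" }));
  items.push({ value: OTHER_VALUE, label: "Other..." });
  return items;
}

// Applies a menu choice and returns the refreshed menu.
async function onSpeakerChosen(
  value: string,
  media: SpeakerPicker,
  target: SinkTarget,
): Promise<MenuItem[]> {
  let sinkId = value;
  if (value === OTHER_VALUE) {
    // Fall back to the browser's chooser, which can list every output device.
    sinkId = (await media.selectAudioOutput()).deviceId;
  }
  await target.setSinkId(sinkId);
  // The chosen device is now exposed to the page, so re-enumerating adds it.
  return speakerMenu(await media.enumerateDevices());
}
```

Re-enumerating after the browser chooser is what makes the newly picked device appear in the page's own list, which matches the behavior the reporter describes in comment 9.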

jacobeelias

Reporter

Comment 9

2 years ago

Oh yeah, partial enumeration is interesting. I really like that I'm able to look through the rest of the options if I don't see my preferred pick initially. What's interesting is that once I pick it from the "Other" section and go through the second output picker from the browser, the new option shows up in the Speakers dropdown (until refresh).

Could I set a default audio input and output for websites or at a browser level?

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 10

2 years ago

Thanks for the input. Glad you like it! Memory would be implemented by sites just like for camera and microphone using localStorage of deviceId. See bug 1712892 comment 2.
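A sketch of the `localStorage` approach, assuming the page stores the chosen `deviceId` and reuses it only when that device is currently listed by `enumerateDevices()`, otherwise falling back to `""` (the system default). `KeyValueStore` is shaped so that `localStorage` itself satisfies it; the storage key and function names are hypothetical.

```typescript
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}

// The parts of the Web Storage API used here, so localStorage fits.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SPEAKER_KEY = "preferredSpeakerId"; // hypothetical storage key

function rememberSpeaker(store: KeyValueStore, deviceId: string): void {
  store.setItem(SPEAKER_KEY, deviceId);
}

// Returns the saved deviceId if that output is currently listed, otherwise ""
// (the system default), so setSinkId() is never given an unlisted id.
function restoreSpeaker(store: KeyValueStore, devices: DeviceInfo[]): string {
  const saved = store.getItem(SPEAKER_KEY);
  if (saved === null) {
    return "";
  }
  const listed = devices.some((d) => d.kind === "audiooutput" && d.deviceId === saved);
  return listed ? saved : "";
}
```

Checking presence first should avoid a rejected `setSinkId()` call when the saved device is unplugged or not yet exposed to the page.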

Comment 11

2 years ago

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #6)

(In reply to Karl Tomlinson (:karlt) from comment #1)

I wonder whether we should reconsider that because on some platforms each and every output device will have a (virtual) microphone for the monitor of that device. I don't know that there is a good reason to have different behavior on Mac.

Reconsider which way? To not expose output devices that lack ties to a "physical" microphone? Or expose all output devices on mic permission?

I was suggesting reconsidering to expose all output devices, but I've moved on from that, and don't feel it is necessary.

The former would match spec intent better IMHO: groupId was conceived as a hint to apps to "set these together" to bring a (headset-wearing) user into the audio loop.

Thank you for filing https://github.com/w3c/mediacapture-main/issues/899 on that.

Are other browsers implementing groupId this way? If so we should probably file a bug on the spec to clarify this was not the intent.

Chromium does not expose output device monitors as input devices at all.

Karl Tomlinson (:karlt)

Assignee

Comment 12

2 years ago

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #5)

The remaining comments I think discuss whether we want to attempt to change the spec. Karl, do you agree?

Pretty much, yes.

At least one thing to fix in the spec is the ability to detect whether the first device listed is the system default device. If a solution to that also addresses the regression here (a chooser that results in the user being able to switch away from the default device but not back to the default device), then there is benefit to that. How much benefit that is depends on whether sites will adjust to Firefox or set their behavior based on Chrome.

Perhaps there is a solution within the spec, such as including a virtual default microphone, which has the same groupId as a virtual default speaker. A virtual default microphone is analogous to the microphone with empty groupId listed before microphone permission is granted, and so a similar speaker with empty groupId would be exposed even without microphone permission, resolving this issue even in the selectAudioOutput()-only situation. I guess an empty groupId is not quite spec compliant because "The group identifier MUST be uniquely generated for each document" and two virtual devices don't necessarily "belong to the same physical device". It would be nice if we could find a solution as close to the intent of the spec as possible.

Karl Tomlinson (:karlt)

Assignee

Comment 13

2 years ago

https://hg.mozilla.org/try/rev/f05e1e956656e34189021548678d4499fd4ee7f7 adds a default virtual audio output device with empty deviceId and groupId to enumerateDevices() results when the first exposed output device is not the system default. It does not add the virtual output device when there is no exposed output device, nor when the first exposed output device is the system default.
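The rule in that try push can be modeled in a few lines of TypeScript. This illustrates the described behavior only; Firefox implements it in C++ (in `FilterExposedDevices()`, per comment 14), and `withDummyDefault`, `systemDefaultId` and the `DeviceInfo` shape are hypothetical. The model assumes that an exposed system default device is listed first.

```typescript
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}

// Model of the rule: prepend a dummy default output (empty deviceId and
// groupId) only when at least one output is exposed and the first exposed
// output is not the current system default. Not Firefox code.
function withDummyDefault(exposedOutputs: DeviceInfo[], systemDefaultId: string): DeviceInfo[] {
  if (exposedOutputs.length === 0) {
    return exposedOutputs; // no exposed outputs: no dummy device
  }
  if (exposedOutputs[0].deviceId === systemDefaultId) {
    return exposedOutputs; // the real default is already listed first
  }
  // Comment 15 later gives this entry a label; its text is not reproduced here.
  const dummy: DeviceInfo = { deviceId: "", kind: "audiooutput", label: "", groupId: "" };
  return [dummy, ...exposedOutputs];
}
```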

This resolves the problems identified in comment 2 on whereby.com and jit.si.

zoom.us and meet.google.com do not call setSinkId() when changing output devices, so I don't know that anything we do can make their selectors behave as expected. FWIW zoom.us still adds a "Same as System" option for both microphone and speaker. meet.google.com still shows only headset speaker devices.

Karl Tomlinson (:karlt)

Assignee

Updated

2 years ago

See Also: → https://github.com/w3c/mediacapture-output/issues/133

Karl Tomlinson (:karlt)

Assignee

Comment 14

2 years ago

Attached file Bug 1769985 add a dummy default audio output device to enumerate devices results when the first exposed output device is not the default r?jib

This is the simplest of the variations of the proposal at
https://github.com/w3c/mediacapture-output/issues/133#issuecomment-1271122304

Creating the dummy output in FilterExposedDevices(), rather than later in
the enumerateDevices() process, provides a "devicechange" event when an
exposed device becomes or ceases to be the default device.
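Because the dummy entry can appear or disappear as the default changes, a page that keeps its own picker should re-read the device list on `devicechange`. A minimal sketch, assuming a structural `DeviceSource` type that `navigator.mediaDevices` should satisfy (a cast may be needed depending on the DOM typings in use); `watchOutputs` is a hypothetical helper.

```typescript
interface DeviceInfo {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
  groupId: string;
}
interface DeviceSource {
  enumerateDevices(): Promise<DeviceInfo[]>;
  addEventListener(type: "devicechange", listener: () => void): void;
  removeEventListener(type: "devicechange", listener: () => void): void;
}

// Calls onChange with the current audio outputs now and after every
// "devicechange" event; returns a function that stops watching.
function watchOutputs(
  media: DeviceSource,
  onChange: (outputs: DeviceInfo[]) => void,
): () => void {
  const refresh = async (): Promise<void> => {
    const all = await media.enumerateDevices();
    onChange(all.filter((d) => d.kind === "audiooutput"));
  };
  const listener = (): void => {
    void refresh();
  };
  media.addEventListener("devicechange", listener);
  void refresh(); // initial population of the picker
  return () => media.removeEventListener("devicechange", listener);
}
```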

Phabricator Automation

Updated

2 years ago

Assignee: nobody → karlt

Status: NEW → ASSIGNED

Karl Tomlinson (:karlt)

Assignee

Comment 15

2 years ago

Attached file Bug 1769985 add a label to the dummy default audio output device for when the default device is not exposed r?jib

Depends on D159013

Karl Tomlinson (:karlt)

Assignee

Updated

2 years ago

Blocks: 1794961

Pulsebot

Comment 16

2 years ago

Pushed by ktomlinson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/93b96cb111eb add a dummy default audio output device to enumerate devices results when the first exposed output device is not the default r=jib

Serban Stanca [:SerbanS]

Comment 17

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/93b96cb111eb

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox108: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 108 Branch

Pulsebot

Comment 18

2 years ago

Pushed by ktomlinson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cf5c533b4ba4 add a label to the dummy default audio output device for when the default device is not exposed r=jib,eemeli

Iulian Moraru

Comment 19

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cf5c533b4ba4

Karl Tomlinson (:karlt)

Assignee

Comment 20

2 years ago

(In reply to jacobeelias from comment #9)

Could I set a default audio input and output for websites or at a browser level?

Without setSinkId(), Firefox uses the defaults provided by the operating system.
I don't know that Mac OS has built-in UI for anything more specific than system-wide defaults. A third-party utility might be able to add this.
Windows (WINNT) has application-level settings. Different Linux environments provide different UI.

Karl Tomlinson (:karlt)

Assignee

Updated

5 months ago

Bug 1769985 add a dummy default audio output device to enumerate devices results when the first exposed output device is not the default r?jib 2 years ago Karl Tomlinson (:karlt) 48 bytes, text/x-phabricator-request
Bug 1769985 add a label to the dummy default audio output device for when the default device is not exposed r?jib 2 years ago Karl Tomlinson (:karlt) 48 bytes, text/x-phabricator-request