Open Bug 1541425 (audio-sharing) Opened 5 years ago Updated 3 months ago

Implement audio capture for getDisplayMedia

Categories

(Core :: WebRTC: Audio/Video, enhancement, P3)

66 Branch
enhancement

People

(Reporter: wilhelm.wanecek, Unassigned)

References

(Depends on 6 open bugs, Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Attempted requesting screen capture with audio constraints, e.g. { video: true, audio: true }.

Actual results:

No audio tracks are obtained (i.e. MediaStream#getAudioTracks() returns an empty list), since the feature is not implemented yet.
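Because the spec lets a user agent silently drop audio, a page that asks for { video: true, audio: true } has to inspect the returned stream itself. Here is a minimal TypeScript sketch of that check. The helper name `classifyCapture` and the structural `TrackSource` type are illustrative assumptions, not part of any browser API. A real `MediaStream` satisfies `TrackSource`.

```typescript
// Structural type so the helper can be exercised without a browser.
// A real MediaStream satisfies it (getAudioTracks/getVideoTracks).
interface TrackSource {
  getAudioTracks(): unknown[];
  getVideoTracks(): unknown[];
}

type CaptureResult = "audio+video" | "video-only" | "no-video";

// Hypothetical helper: classify what a getDisplayMedia() stream delivered.
function classifyCapture(stream: TrackSource): CaptureResult {
  const hasVideo = stream.getVideoTracks().length > 0;
  const hasAudio = stream.getAudioTracks().length > 0;
  if (!hasVideo) return "no-video";
  return hasAudio ? "audio+video" : "video-only";
}

// In a browser (not executed here):
// const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
// if (classifyCapture(stream) === "video-only") { /* no shared audio: fall back */ }
```

In the Firefox version reported here, the check above would return "video-only" because getAudioTracks() comes back empty.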

Expected results:

Firefox should implement the part of the Screen Capture specification[1] describing audio capture. AFAIU, Chromium will implement this as of Chrome 68 [2]. There's some discussion about this feature-set in bug #1321221 [3].

  1. https://w3c.github.io/mediacapture-screen-share/
  2. https://bugs.chromium.org/p/chromium/issues/detail?id=896333
  3. https://bugzilla.mozilla.org/show_bug.cgi?id=1321221
Type: defect → task
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core

Can you help me triage this one?

Flags: needinfo?(jib)

I think this is low priority for us at the moment. This is a MAY in the spec, i.e. it's not mandatory to implement, so we are spec compliant without this feature.

While we do have some old disabled audio-capture code in the tree that we might be able to revive from the Firefox Hello days, I believe that code specifically exposed tab audio, and we don't currently expose tab sharing, which is what it was designed to go along with.

Flags: needinfo?(jib)
Priority: -- → P3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Type: task → enhancement

OK, thanks a lot for explaining the situation :) Chromium is in a different position, since they already had audio capture in their previous, non-standard implementation (although only for tabs on Mac & Linux).

The spec is really hand-wavy:

In the case of audio, the user agent MAY present the end-user with audio sources to share. Which choices are available to choose from is up to the user agent, and the audio source(s) are not necessarily the same as the video source(s). An audio source may be a particular application, window, browser, the entire system audio or any combination thereof. Unlike mediadevices.getUserMedia() with regards to audio+video, the user agent is allowed not to return audio even if the audio constraint is present. If the user agent knows no audio will be shared for the lifetime of the stream it MUST NOT include an audio track in the resulting stream. The user agent MAY accept a request for audio and video by only returning a video track in the resulting stream, or it MAY accept the request by returning both an audio track and a video track in the resulting stream. The user agent MUST reject audio-only requests.
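The quoted rules can be condensed into a small decision function. This is only a sketch that models the spec text as quoted. `DisplayMediaRequest`, `Outcome`, `resolveRequest` and the `audioWillBeShared` flag are hypothetical names, not Firefox or spec identifiers. The sketch also rejects a request that asks for neither audio nor video, which goes beyond the excerpt and is an assumption.

```typescript
// Hypothetical model of the quoted spec rules; not Firefox or spec code.
interface DisplayMediaRequest {
  video: boolean;
  audio: boolean;
}

interface Outcome {
  videoTrack: boolean;
  audioTrack: boolean;
}

// audioWillBeShared: whether the UA knows some audio source will be shared
// (this also covers the UA choosing to return video only).
function resolveRequest(req: DisplayMediaRequest, audioWillBeShared: boolean): Outcome {
  // "The user agent MUST reject audio-only requests."
  // Assumption: a request with neither audio nor video is rejected too.
  if (!req.video) {
    throw new TypeError("video is required; audio-only requests are rejected");
  }
  // Audio is optional: the UA MAY return only a video track, and MUST NOT
  // include an audio track if it knows no audio will be shared.
  return { videoTrack: true, audioTrack: req.audio && audioWillBeShared };
}
```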

We don't know what to do, and authors and users don't have real guarantees, so I guess we can do the following to maximize usefulness:

  • On OSes where we support monitor device/loopback stream, we can offer device monitoring (we have the capability in cubeb for Pulse and WASAPI)
  • We can offer an input stream for the page's output (this is implemented already)
  • We might be able to offer the browser's output, but it's more complicated (we'd need to implement a browser-wide multi-process mixer)
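The three options above amount to capability-gated choices offered to the user. Here is a minimal sketch of that idea. `AudioOption`, `BackendCaps` and `offeredAudioOptions` are hypothetical names, not Gecko or cubeb code.

```typescript
// Hypothetical names illustrating the three options listed above.
type AudioOption = "device-monitor" | "page-output" | "browser-output";

interface BackendCaps {
  loopback: boolean;     // monitor/loopback support (cubeb has it for Pulse and WASAPI, per the comment)
  browserMixer: boolean; // a browser-wide multi-process mixer (would need to be implemented)
}

function offeredAudioOptions(caps: BackendCaps): AudioOption[] {
  // Capturing the page's own output is described as already implemented.
  const options: AudioOption[] = ["page-output"];
  if (caps.loopback) options.push("device-monitor");
  if (caps.browserMixer) options.push("browser-output");
  return options;
}
```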

Just chiming in here: we're creating a product that wants to rely on this feature (the audio sharing in particular), so it's definitely a bummer to see Firefox behind Edge and Chrome here. Would love to see at least tab-audio sharing support.

(I'm assuming caniuse is correct when it shows Firefox not supporting this at all: https://caniuse.com/#feat=mdn-api_mediadevices_getdisplaymedia_audio-capture-support)

Hi there. Is there a way to offer a bounty for this feature? The lack of this feature in Firefox is the only reason I ever open up Chrome anymore. Thanks.

Any update?

I was puzzled when I streamed via the web version of Discord and others told me they couldn't hear any sound. I guess this ticket explains why it doesn't work.

(In reply to Paul Adenot (:padenot) from comment #5)

This one actually seems pretty straightforward. When getDisplayMedia() causes the permission dialog to pop up and the user selects between a specific window or the entire system for screen capture, this behavior should be transparently applied to the sound as well if the user opts in. E.g. the only change in this permissions dialog would be a checkbox that enables sound capture in addition to the video capture:

  • With capturing sound from the specific window if the user chooses a window.
  • Or capturing the entire system sound if the user chooses a system-wide capture.
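The proposal maps the chosen video source directly onto an audio source once the checkbox is ticked. Here is a minimal sketch of that mapping. `VideoChoice`, `SharedAudio` and `audioFor` are hypothetical names, and the sketch ignores the hard part raised in the next paragraph: actually locating a window's audio.

```typescript
// Hypothetical types modelling the proposal: audio follows the video choice.
type VideoChoice = { kind: "window"; windowId: string } | { kind: "screen" };
type SharedAudio = { kind: "window-audio"; windowId: string } | { kind: "system-audio" };

// Returns null when the user leaves the "share audio" checkbox unticked.
function audioFor(choice: VideoChoice, shareAudioChecked: boolean): SharedAudio | null {
  if (!shareAudioChecked) return null;
  // Caveat: resolving a window to its audio can fail when the app plays sound
  // from another process or through a sound server such as PulseAudio.
  return choice.kind === "window"
    ? { kind: "window-audio", windowId: choice.windowId }
    : { kind: "system-audio" };
}
```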

That is pretty simple and probably what the user would expect most, and it would be easy to use and quite efficient. But the difficulty is in the details: tracking the correct sound source for the chosen window might be tricky if the recorded window uses a different process for its audio, or if complex sound systems (possibly PulseAudio and others) are involved.

No longer blocks: tab-sharing
Depends on: tab-sharing

We can offer an input stream for the page's output (this is implemented already)

We should focus on this use case, as it is the most urgent. People want to share e.g. a YouTube tab with their audience and already expect audio.

The simplest and fastest would be to block on bug 1651145 and add ☑ share tab audio to our tab-sharing UX, for parity with Chrome (demo).

Severity: normal → S3
Depends on: 1605894
Alias: audio-sharing
Duplicate of this bug: 1837017
Depends on: 1864067