Closed Bug 1849108 Opened 1 year ago Closed 1 year ago

Feed output audio from non-default sinks (setSinkId) to AEC

Categories

(Core :: Audio/Video: MediaStreamGraph, task, P1)

Tracking

RESOLVED FIXED
123 Branch
Tracking Status
relnote-firefox --- 123+
firefox123 --- fixed

People

(Reporter: pehrsons, Assigned: karlt)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: dev-doc-needed)

Attachments

(1 file)

The webrtc::AudioProcessing instance that we use for echo cancellation needs to be fed one input stream and one output stream to do its processing.

There is one webrtc::AudioProcessing per input stream, be it for a default or non-default device.

However, the output stream feeding webrtc::AudioProcessing is always the audio played out in the default MediaTrackGraph, i.e. for the default output device, where the AudioProcessingTrack lives.

setSinkId with a MediaStreamTrack will create a new MediaTrackGraph running on the callbacks of the device with the given sink id, and connect the track corresponding to the played MediaStreamTrack in the default graph to the new graph. There is no path to feed the output of this new graph to the AudioProcessingTrack in the default graph.
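The web-facing path being described can be sketched as follows; a minimal, hedged example where the function name is mine and deviceId is assumed to come from navigator.mediaDevices.enumerateDevices():

```javascript
// Illustrative sketch: playing a MediaStreamTrack on a non-default output
// device via an HTMLAudioElement. Calling setSinkId() is what creates the
// second MediaTrackGraph described above.
async function playTrackOnSink(track, deviceId) {
  const audioEl = new Audio();                  // element lives with the default graph
  audioEl.srcObject = new MediaStream([track]); // wrap the track for playback
  await audioEl.setSinkId(deviceId);           // routes output to the non-default device
  await audioEl.play();
  return audioEl;
}
```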

We'll need to fix this, possibly through bug 1830247.

Karl, would you have cycles to look into this?

Flags: needinfo?(karlt)

Thank you for identifying this issue and linking to the parts involved. I'll look into it.

I assume multiple audio output devices should be treated similarly to having multiple channels in a single output device, except that clocks and rates can differ.
Perhaps adding more channels to a single AudioProcessing would be more efficient than having a series of AudioProcessing instances.

Assignee: nobody → karlt
Flags: needinfo?(karlt)

Output->echo latencies can also differ. I'm not sure how well the AudioProcessing AEC would handle different scenarios there.

Ah, yes, thanks. I wonder whether the AEC handles widely separated speakers.

Blocks: 1848283

Local output from a MediaTrackGraph can come from either a media element or an AudioContext. Currently, non-default outputs can only be set from a media element, but recent spec changes will require this on AudioContext as well.
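The newer spec surface mentioned here is setSinkId() on AudioContext itself; a hedged sketch (the function name is mine, and the feature-detect reflects that the method is not in all releases):

```javascript
// Illustrative sketch: Web Audio output targeting a non-default device
// directly, without going through a media element.
async function renderOnSink(deviceId) {
  const ctx = new AudioContext();
  if (typeof ctx.setSinkId === "function") { // feature-detect the newer API
    await ctx.setSinkId(deviceId);
  }
  const osc = new OscillatorNode(ctx);
  osc.connect(ctx.destination);              // destination now feeds the chosen sink
  osc.start();
  return ctx;
}
```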

Our current echo cancellation mechanism is intended only for use when the input/output pair is on the same graph and (so) same thread.

The underlying AudioProcessing class does have the necessary locking to be used with input and output on different threads. So we could plausibly have an n * m matrix of AudioProcessing, but we don't need this in the common case.

In the common case that the MediaTrackGraph does not have another active audio output device, instead of always creating a CrossGraphPort for non-default output devices, we should be able to run the MediaTrackGraph from the non-default device. This would be better than what we have now: Cubeb may then need to resample, but, unlike a CrossGraphPort, it will not need to drift-correct.

With the non-default-output-run graph of comment 6, if a default-rate AudioContext is then connected to a MediaStreamTrack of the same graph, a CrossGraphPort will be required if output is to be delivered (from the AudioContext) to the default device. (Connections are not currently supported between different rate graphs.)

  • The CrossGraphPort could either be between a separate graph for the AudioContext and the first graph or at the output of the AudioContext running on the first graph.

  • Having the CrossGraphPort on the AudioContext output would be somewhat simpler as an initial step as it would be consistent with the media element output design, though we would still want the CrossGraphPort between graphs for non-default-rate AudioContexts at some point.

  • The benefit of having an AudioContext on the same graph as the microphone track is that its output can be integrated into our existing echo cancellation mechanism, if there is only one output device.

Adding some detection of non-silent output on the AudioContext will be useful to free up the graph to choose the output device that is actually in use. This would also have performance / energy consumption benefits.

Microphones and PeerConnection remote tracks both currently get a graph with default sample rate.

Perhaps their graphs could be determined lazily so that rendering their signal through a non-default audio output device, via media element, would not require resampling from the default graph rate. However, that would be considerable re-engineering, so I do not intend to do this here.

Severity: N/A → S2
Priority: P3 → P1

I am not adding anything to this, just documenting that Whereby is doing feature detection for setSinkId. dbaker and I were having echo issues in Whereby and got a regression range. Disabling setSinkId and reloading the site cleared up the issue.

See Also: → 1865996

Until this bug is fixed in Firefox, we're running with a monkey-patch for setSinkId:

const originalSetSinkId = HTMLAudioElement.prototype.setSinkId;
HTMLAudioElement.prototype.setSinkId = function (sinkId) {
  // Don't call it in Firefox
  if (navigator.userAgent.includes("Firefox"))
    return Promise.resolve(undefined);
  return originalSetSinkId.call(this, sinkId);
};
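For comparison, the workaround above could be factored so the skip decision is unit-testable rather than baked into the prototype patch; a hedged sketch (the factory name is mine):

```javascript
// Illustrative refactor of the monkey-patch above: build the patched function
// with a factory so the skip logic can be tested without a real
// HTMLAudioElement or userAgent string.
function makeSetSinkIdPatch(original, isAffectedBrowser) {
  return function (sinkId) {
    if (isAffectedBrowser) return Promise.resolve(undefined); // skip the call
    return original.call(this, sinkId);                       // otherwise pass through
  };
}
```

HTMLAudioElement.prototype.setSinkId could then be replaced with makeSetSinkIdPatch(originalSetSinkId, navigator.userAgent.includes("Firefox")).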
Depends on: 1869043
Depends on: 1872519

(In reply to Karl Tomlinson (:karlt) from comment #6)

> In the common case that the MediaTrackGraph does not have another active audio output device, instead of always creating a CrossGraphPort for non-default output devices, we should be able to run the MediaTrackGraph from the non-default device. This would be better than what we have now because Cubeb may then need to resample, but unlike CrossGraphPort it will not need to drift correct.

I haven't done that in the patches here. The patches introduce data structures that would support this, but some additional work would be required to Apply() PendingResumeOperations when a graph's secondary output stream is running, so as not to miss the start of Web Audio rendering. There are some other cases to reconsider too, such as (currently) using an AudioCallbackDriver when there are any audio tracks, even when there are no device outputs or inputs. Perhaps that may similarly relate to having a device immediately available for a new audio output.

Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fde039d640cb
use a secondary output device for echo cancellation if the primary device has no output r=pehrsons
No longer blocks: 1498512
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch

Please don't remove the link to bug 1498512. It's linked as a blocker because ideally setSinkId should have been hooked up to the AEC when it shipped.

Blocks: 1498512

Release Note Request (optional, but appreciated)
[Why is this notable]:
New web platform feature affecting user experience in some cases.
[Affects Firefox for Android]: No
[Suggested wording]:
Audio echo cancellation can now be applied to microphone inputs when the audio output is redirected to another device with setSinkId().
[Links (documentation, blog post, etc)]:
https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackConstraints/echoCancellation
https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/setSinkId
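The scenario the release note describes can be sketched as follows; a hedged example where the function name is mine and remoteStream is assumed to come from, e.g., a PeerConnection:

```javascript
// Illustrative sketch: echo-cancelled microphone capture while call audio
// plays on a user-chosen (non-default) output device.
async function startCallAudio(remoteStream, speakerDeviceId) {
  const mic = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true },  // AEC applied to the mic input
  });
  const out = new Audio();
  out.srcObject = remoteStream;
  await out.setSinkId(speakerDeviceId); // output on a non-default device
  await out.play();                     // with this fix, the AEC still sees this output
  return mic;
}
```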

dev-doc-needed for https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/123, but users may also notice this.

relnote-firefox: --- → ?
Keywords: dev-doc-needed

Note added to 123 nightly release notes in the Web Platform section, thanks.

Duplicate of this bug: 1875745
See Also: → 1879180