Closed Bug 1340718 Opened 7 years ago Closed 7 years ago

If a audio device (Citrix) disappears and fails on re-open attempts, output can be hung until restart

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Tracking

Status

firefox-esr45

53+

fixed

firefox51

---

wontfix

firefox52

---

wontfix

firefox-esr52

53+

fixed

firefox53

---

fixed

firefox54

---

fixed

People

(Reporter: jesup, Assigned: padenot)

References

(Depends on 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

Bug 1340718 - Retry getting the default device when audio devices are present and the first call failed. 7 years ago Paul Adenot (:padenot) 59 bytes, text/x-review-board-request	kinetik : review-	Details
patch 7 years ago Paul Adenot (:padenot) 1.83 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
patch for esr45 7 years ago Paul Adenot (:padenot) 2.72 KB, patch	jesup : review+ Sylvestre : approval-mozilla-esr45+ Sylvestre : approval-mozilla-esr52+	Details \| Diff \| Splinter Review
Beta patch 7 years ago Paul Adenot (:padenot) 1.80 KB, patch	jcristau : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

7 years ago

If using an audio device such as a Citrix pseudo-mic & speakers, if the Citrix connection disconnects, the audio device fails and returns error codes, and also we see devices being removed.  When cubeb tries to reopen devices, it can end up in an ERROR state (3), and after that audio cannot be restarted since datacallbacks will not happen again.

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

7 years ago

Rank: 12

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/114136/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/114136/

Matthew Gregan [:kinetik]

Comment 2

•

7 years ago

mozreview-review

Comment on attachment 8839520 [details]
Bug 1340718 - Retry getting the default device when audio devices are present and the first call failed.

https://reviewboard.mozilla.org/r/114136/#review115750

::: media/libcubeb/src/cubeb_wasapi.cpp:709
(Diff revision 1)
> -  if (FAILED(hr)) {
> +    if (FAILED(hr)) {
> -    LOG("Could not get default audio endpoint: %x\n", hr);
> +      LOG("Could not get default audio endpoint: %x\n", hr);
> -    SafeRelease(enumerator);
> +      Sleep(100);
> -    return hr;
> -  }
> +    }
> +  } while (FAILED(hr) && hr != E_NOTFOUND);

It's hard to review this without information about how it was debugged with a description of what', specifically, is happening.  What error is being returned from GetDefaultAudioEndpoint in this situation?

This fix seems a bit hacky and my concern is there are other situations where we get an error here and instead of falling back to no sound we end up stuck in this loop.

Which caller of get_default_endpoint does this matter for?  If this kind of loop/sleep is a necessary workaround, it might be better handled by the caller.

::: media/libcubeb/src/cubeb_wasapi.cpp:713
(Diff revision 1)
> -  }
> +    }
> +  } while (FAILED(hr) && hr != E_NOTFOUND);
>  
>    SafeRelease(enumerator);
>  
>    return ERROR_SUCCESS;

There's now a situation where GetDefaultAudioEndpoint fails (hr == E_NOTFOUND) where we used to return E_NOTFOUND and now return ERROR_SUCCESS.

Attachment #8839520 - Flags: review?(kinetik) → review-

Matthew Gregan [:kinetik]

Comment 3

•

7 years ago

If this is happening after a device loss event where we try to reconfigure but the new target/no available device is ready to configure, I think a better approach would be to have a "settle" delay between the device loss event (which we don't control) and when we attempt to reconfigure.  IIRC, Chromium has something like this (a single 200ms delay, last time I looked) to try to deal with this sort of thing - the basic problem being that we trigger reconfigure on the first device loss event but there can be a stream of device change events over time and there's no logical beginning or end of them at which to start the reconfigure process.

Paul Adenot (:padenot)

Assignee

Comment 4

•

7 years ago

I forgot to send my message explaining the patch, sorry. Also, thinking again about it, I think this is not ideal (for the reason you found), I'll come up with something better.

Also, yes, I've implemented some sort of debouncing for the reconfigure events: we receive a bunch of notification, and only reconfigure once, after things have settled. This is sitting on a branch on my windows machine for now.

Paul Adenot (:padenot)

Assignee

Comment 5

•

7 years ago

Attached patch patch — Details — Splinter Review

Much better patch, only touching `GraphDriver.cpp`. As far as cubeb is concerned, the job is done: the error state is being signaled properly. The graph driver now decides to fallback to a `SystemClockDriver`. The graph then decides to re-try periodically to switch to a new `AudioCallbackDriver` (because at least one track needs to output audio). I removed the error case about returning when we can't get the preferred latency, this makes us go into the case where we fallback to a `SystemClockDriver`.

STR:
- On windows, disable all the audio devices, apart from a device that can be unplugged (I used a USB headset)
- Run any demo that outputs sound and uses an MSG (web audio/webrtc, whatever)
- Notice that there is sound
- Unplug the USB device
- At this point, the machine has no audio devices, falls back to a system thread, notice that with this patch, the graph still runs (can't output audio though)
- Re-plug the USB device, notice that audio starts to flow again, with correct timing (no buffering or anything)

I think this is similar with what the folks we're talking with experience with Citrix.

This is easily back-portable, and is a mozilla-central patch for now.

Attachment #8840522 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 6

•

7 years ago

Comment on attachment 8840522 [details] [diff] [review]
patch

Review of attachment 8840522 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/GraphDriver.cpp
@@ +1048,5 @@
>  AudioCallbackDriver::StateCallback(cubeb_state aState)
>  {
>    LOG(LogLevel::Debug, ("AudioCallbackDriver State: %d", aState));
> +  if (aState == CUBEB_STATE_ERROR) {
> +    // Fall back to a driver using a normal thread.

Add a comment that device changes (re-plug) will cause it to reopen and switch back to an AudioCallbackDriver

@@ +1051,5 @@
> +  if (aState == CUBEB_STATE_ERROR) {
> +    // Fall back to a driver using a normal thread.
> +    MonitorAutoLock lock(GraphImpl()->GetMonitor());
> +    SystemClockDriver* nextDriver = new SystemClockDriver(GraphImpl());
> +    SetNextDriver(nextDriver);

Comment that we can't use SwitchOnNextIteration since there won't be one.

Attachment #8840522 - Flags: review?(rjesup) → review+

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 7

•

7 years ago

 https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e4f67858bb71c39351342f5e3fe9ab96ee0c789

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

7 years ago

Component: Audio/Video: cubeb → Audio/Video: MediaStreamGraph

Paul Adenot (:padenot)

Assignee

Comment 8

•

7 years ago

Attached patch patch for esr45 — Details — Splinter Review

Same fix, changed a few calls so that it compiles on ESR45, tested, it still works as intended.

Attachment #8840916 - Flags: review?(rjesup)

Pulsebot

Comment 9

•

7 years ago

Pushed by paul@paul.cx:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71bbaf9a0d0f
When the cubeb stream errors out while driving a MediaStreamGraph, fall back to a SystemClockDriver. r=jesup

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 10

•

7 years ago

Comment on attachment 8840916 [details] [diff] [review]
patch for esr45

Review of attachment 8840916 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/GraphDriver.cpp
@@ +892,5 @@
> +    mNextDriver = nextDriver;
> +    DebugOnly<bool> found = mGraphImpl->RemoveMixerCallback(this);
> +    nextDriver->SetGraphTime(this, mIterationStart, mIterationEnd);
> +    mGraphImpl->SetCurrentDriver(nextDriver);
> +    nextDriver->Start();

Clone the comment about SwitchAtNextIteration.

Attachment #8840916 - Flags: review?(rjesup) → review+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 11

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/71bbaf9a0d0f

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox54: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 12

•

7 years ago

Comment on attachment 8840916 [details] [diff] [review]
patch for esr45

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is critical to Intuit who is using Firefox for doing video CSR support for tax preparation (and also other Intuit products).  They use ESR, and are a prime example of an ESR user.  All updates to Firefox are done centrally on the Citrix servers, and they try to avoid CSRs being bugged by Firefox update requests they can't resolve.

User impact if declined: When a CSR has a network glitch (or citrix glitch) and loses connectivity for a few seconds, they'll have to restart the browser on the server, and likely will lose contact with the customer and have to have them call in again.

Fix Landed on Version: 54.  Now that it's verified it will be uplifted to all versions.  We're asking for ESR52 as well.

Risk to taking this patch (and alternatives if risky): Very low risk patch; uses the normal sequence to restart the MediaStreamGraph on cubeb/driver failure callback.

String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #8840916 - Flags: approval-mozilla-esr52?

Attachment #8840916 - Flags: approval-mozilla-esr45?

Sylvestre Ledru [:Sylvestre]

Comment 13

•

7 years ago

Not sure I understand why we need to uplift it to 45? This isn't a new regression. 
Can Intuit workaround the issue?

Gerry Chang [:gchang]

Comment 14

•

7 years ago

We don’t usually uplift patch that isn’t a security fix this late in ESR. We can keep this open, and if we end up with a dot release it could ride along.

Sylvestre Ledru [:Sylvestre]

Comment 15

•

7 years ago

Comment on attachment 8840916 [details] [diff] [review]
patch for esr45

This isn't something that we usually do but here, Randell explained me the rational (partner related) and it makes sense to have this exception.

Attachment #8840916 - Flags: approval-mozilla-esr52?

Attachment #8840916 - Flags: approval-mozilla-esr52+

Attachment #8840916 - Flags: approval-mozilla-esr45?

Attachment #8840916 - Flags: approval-mozilla-esr45+

Sylvestre Ledru [:Sylvestre]

Comment 16

•

7 years ago

Polo, I guess we want to uplift that to all other branches, right?
Merci et bon week end!

Flags: qe-verify+

Flags: needinfo?(padenot)

Ryan VanderMeulen [:RyanVM]

Comment 17

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/3aae8d906162

status-firefox-esr52: affected → fixed

Ryan VanderMeulen [:RyanVM]

Comment 18

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr45/rev/bfa75fc20c2b

status-firefox-esr45: affected → fixed

Paul Adenot (:padenot)

Assignee

Comment 19

•

7 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #16)
> Polo, I guess we want to uplift that to all other branches, right?
> Merci et bon week end!

Yes, thanks.

Flags: needinfo?(padenot)

Paul Adenot (:padenot)

Assignee

Comment 20

•

7 years ago

Attached patch Beta patch — Details — Splinter Review

Attachment #8839520 - Attachment is obsolete: true

Attachment #8847656 - Flags: approval-mozilla-beta?

Julien Cristau [:jcristau]

Comment 21

•

7 years ago

Comment on attachment 8847656 [details] [diff] [review]
Beta patch

Thanks Paul, let's get this in 53.0b3

Attachment #8847656 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 22

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/f64241177060

status-firefox53: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox52: affected → wontfix

Ryan VanderMeulen [:RyanVM]

Comment 23

•

7 years ago

[Tracking Requested - why for this release]:

tracking-firefox-esr45: --- → ?

tracking-firefox-esr52: --- → ?

Julien Cristau [:jcristau]

Updated

•

7 years ago

tracking-firefox-esr45: ? → 53+

tracking-firefox-esr52: ? → 53+

Paul Silaghi, QA [:pauly]

Comment 24

•

7 years ago

Still an issue on Fx 53b10, Win 10 x64:
1. Open https://mozilla.github.io/webrtc-landing/gum_test.html
2. Click 'audio'
3. Unplug the USB headset

Actual results:
"Firefox has stopped working" error message displayed. Closing the error pop-up also closes Firefox.
Browser console log after the browser restarts:
"IndexedDB UnknownErr: ActorsParent.cpp:594 (unknown)
UnknownError  IndexedDBHelper.jsm:71:12
PushService:stateChangeProcessEnqueue: Error transitioning state UnknownError  PushService.jsm:160"
Also, there are no crashes in about:crashes.
Thoughts?

Flags: needinfo?(padenot)

Paul Adenot (:padenot)

Assignee

Comment 25

•

7 years ago

Paul, this is in contradiction with https://bugzilla.mozilla.org/show_bug.cgi?id=1354528#c10, are you sure you're testing with beta 10 ?

Flags: needinfo?(padenot) → needinfo?(paul.silaghi)

Paul Silaghi, QA [:pauly]

Comment 26

•

7 years ago

Yes, but I tested on https://archive.mozilla.org/pub/firefox/candidates/53.0b10-candidates/build2/win32/en-US/ which was build 07-Apr-2017, while Emil tried on a tinderbox build from 09-Apr-2017.
So, I guess bug 1354528 is a follow-up for this, but only for the beta branch?

Flags: needinfo?(paul.silaghi) → needinfo?(padenot)

Paul Silaghi, QA [:pauly]

Comment 27

•

7 years ago

Anyway, 54.0a2 (2017-04-10) and 55.0a1 (2017-04-09) don't crash like beta, but after re-plug the headset, I can no longer hear myself on https://mozilla.github.io/webrtc-landing/gum_test.html / audio.

Paul Adenot (:padenot)

Assignee

Comment 28

•

7 years ago

Paul, please open a new bug about this.

Flags: needinfo?(padenot) → needinfo?(paul.silaghi)

Paul Silaghi, QA [:pauly]

Updated

•

7 years ago

Depends on: 1355773

Paul Silaghi, QA [:pauly]

Comment 29

•

7 years ago

Filed bug 1355773.
Meanwhile dropping qe-verify+, will verify later when bug 1355773 gets fixed.

Flags: qe-verify+

Flags: needinfo?(paul.silaghi)

You need to log in before you can comment on or make changes to this bug.