Closed Bug 1735905 Opened 4 months ago Closed 4 months ago

Shutdown deadlock involving cubeb

Categories

(Core :: Audio/Video: cubeb, defect)

Tracking

RESOLVED FIXED
95 Branch
Tracking Status
firefox-esr91 --- fixed
firefox94 - wontfix
firefox95 --- fixed

People

(Reporter: glandium, Assigned: glandium)

Attachments

(2 files)

Attached file Log

For some reason, when building instrumented builds for PGO with upcoming rustc 1.56, this shutdown deadlock happens 100% of the time on automation... unless using an interactive task (which makes for fun debugging). I wasn't able to reproduce it locally.

Attached is the crash part of the log, once the minidump processing knows about system libraries (because otherwise the stack traces are almost useless). The parts that I think are relevant:

Thread 0 tid 506
 0  libpthread.so.0!__GI___pthread_timedjoin_ex [pthread_join_common.c : 89 + 0x25]
    rax = 0xfffffffffffffe00   rdx = 0x0000000000000282
    rcx = 0x00007feae89f1d2d   rbx = 0x00007feab54fe700
    rsi = 0x0000000000000000   rdi = 0x00007feab54fe9d0
    rbp = 0x0000000000000000   rsp = 0x00007ffd9cfa0ff0
     r8 = 0x00000000000000ca    r9 = 0x00007feab54fe9d0
    r10 = 0x0000000000000000   r11 = 0x0000000000000246
    r12 = 0x0000000000000000   r13 = 0x00007ffd9cfa1000
    r14 = 0x0000000000000000   r15 = 0x00007feae7840310
    rip = 0x00007feae89f1d2d
    Found by: given as instruction pointer in context
 1  libxul.so!<audioipc::core::CoreThread as core::ops::drop::Drop>::drop [core.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 34 + 0x33]
    rbx = 0x0000000000000001   rbp = 0x00007ffd9cfa1130
    rsp = 0x00007ffd9cfa1060   r12 = 0x00007fead1a66590
    r13 = 0x00007fead1a66530   r14 = 0x00007ffd9cfa1078
    r15 = 0x00007feae7840310   rip = 0x00007fead8bd7d73
    Found by: call frame info
 2  libxul.so!audioipc_server_stop [lib.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 169 + 0xd]
    rbx = 0x00007feab5769980   rbp = 0x00007ffd9cfa1160
    rsp = 0x00007ffd9cfa1140   r12 = 0x00007fead1a66590
    r13 = 0x00007fead1a66530   r14 = 0x00007fead5643d10
    r15 = 0x00007feae7840310   rip = 0x00007fead8c0e662
    Found by: call frame info
 3  libxul.so!mozilla::CubebUtils::ShutdownLibrary() [CubebUtils.cpp:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 669 + 0x19]
    rbx = 0x0000000000000000   rbp = 0x00007ffd9cfa1180
    rsp = 0x00007ffd9cfa1170   r12 = 0x00007fead1a66590
    r13 = 0x00007fead1a66530   r14 = 0x00007fead5643d10
    r15 = 0x00007feae7840310   rip = 0x00007fead49e6766
    Found by: call frame info
 4  libxul.so!nsLayoutStatics::Shutdown() [nsLayoutStatics.cpp:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 362 + 0x5]
    rbx = 0x00007feac6027870   rbp = 0x00007ffd9cfa11a0
    rsp = 0x00007ffd9cfa1190   r12 = 0x00007fead1a66590
    r13 = 0x00007fead1a66530   r14 = 0x00007fead5643d10
    r15 = 0x00007feae7840310   rip = 0x00007fead60b017b
    Found by: call frame info

and

Thread 39 tid 642
 0  libpthread.so.0!__pthread_cond_wait [pthread_cond_wait.c : 655 + 0xfb]
    rax = 0xfffffffffffffe00   rdx = 0x0000000000000000
    rcx = 0x00007feae89f69f3   rbx = 0x00007feaa66e39a0
    rsi = 0x0000000000000080   rdi = 0x00007feaa66e39c8
    rbp = 0x00007feaa66e39c4   rsp = 0x00007feab54fcc90
     r8 = 0x0000000000000000    r9 = 0x0000000000000000
    r10 = 0x0000000000000000   r11 = 0x0000000000000246
    r12 = 0x00007feaa66e39c8   r13 = 0x0000000000000000
    r14 = 0x00007feaa663f9d0   r15 = 0x0000000000000014
    rip = 0x00007feae89f69f3
    Found by: given as instruction pointer in context
 1  libpulse.so.0!pa_threaded_mainloop_wait [thread-mainloop.c : 215 + 0x5]
    rbx = 0x00007feaa695bf40   rbp = 0x00007feab54fd260
    rsp = 0x00007feab54fcd60   r12 = 0x00007feaa5ce24c0
    r13 = 0x00007feaacefda00   r14 = 0x00007feadd394618
    r15 = 0x00007feaa695bf40   rip = 0x00007feabe9e2a68
    Found by: call frame info
 2  libxul.so!<cubeb_pulse::backend::stream::PulseStream as cubeb_backend::traits::StreamOps>::stop [stream.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 641 + 0x30]
    rbx = 0x00007feabe9e2a40   rbp = 0x00007feab54fd260
    rsp = 0x00007feab54fcd70   r12 = 0x00007feaa5ce24c0
    r13 = 0x00007feaacefda00   r14 = 0x00007feadd394618
    r15 = 0x00007feaa695bf40   rip = 0x00007fead8d44801
    Found by: call frame info
 3  libxul.so!cubeb_backend::capi::capi_stream_stop [capi.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 186 + 0x5]
    rbx = 0x00007feab570a220   rbp = 0x00007feab54fd270
    rsp = 0x00007feab54fd270   r12 = 0x00007feab54fd6c0
    r13 = 0x00007feab54fd740   r14 = 0x00007feab54fd528
    r15 = 0x00007feab5759200   rip = 0x00007fead8d42851
    Found by: call frame info
 4  libxul.so!audioipc_server::server::CubebServer::process_msg [server.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 506 + 0x60]
    rbx = 0x00007feab570a220   rbp = 0x00007feab54fd510
    rsp = 0x00007feab54fd280   r12 = 0x00007feab54fd6c0
    r13 = 0x00007feab54fd740   r14 = 0x00007feab54fd528
    r15 = 0x00007feab5759200   rip = 0x00007fead8c2b575
    Found by: call frame info
 5  libxul.so!<audioipc_server::server::CubebServer as audioipc::rpc::server::Server>::process [server.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 373 + 0x159]
    rbx = 0x00007feab570a208   rbp = 0x00007feab54fd5e0
    rsp = 0x00007feab54fd520   r12 = 0x00007feab54fd6c0
    r13 = 0x00007feab570a210   r14 = 0x00007feab54fd740
    r15 = 0x00007feab5759200   rip = 0x00007fead8c282c9
    Found by: call frame info
 6  libxul.so!<audioipc::rpc::driver::Driver<T> as futures::future::Future>::poll [driver.rs:58db01c6d5f8d7629d8601dd4e08ca2ded93868e : 132 + 0x5bc]
    rbx = 0x00007feab5759200   rbp = 0x00007feab54fd8c0
    rsp = 0x00007feab54fd5f0   r12 = 0x00007feab54fd740
    r13 = 0x00007feab57592f8   r14 = 0x00007feab54fd758
    r15 = 0x000000000000000d   rip = 0x00007fead8c1fb2a
    Found by: call frame info

I found the race condition.
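For readers skimming the traces: thread 0 is blocked in a pthread join under `CoreThread::drop`, while thread 39 is parked in `pa_threaded_mainloop_wait` under `PulseStream::stop`. Presumably thread 39 is the thread being joined, so the join cannot return until the condition is signalled. The following is only a minimal sketch of that general shape (a join stuck behind a condition wait whose wakeup never arrives), not the cubeb-pulse code. The function name `worker_finishes` and the 200 ms deadline are hypothetical, and the sketch releases the worker at the end so that it always terminates.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: reports whether a worker that waits on a condition
/// variable finishes within a short deadline.
pub fn worker_finishes(signal_arrives: bool) -> bool {
    let state = Arc::new((Mutex::new(false), Condvar::new()));
    let (tx, rx) = mpsc::channel();

    let worker = {
        let state = Arc::clone(&state);
        thread::spawn(move || {
            let (lock, cvar) = &*state;
            let mut done = lock.lock().unwrap();
            // Analogous to waiting on the mainloop: block until signalled.
            while !*done {
                done = cvar.wait(done).unwrap();
            }
            let _ = tx.send(());
        })
    };

    if signal_arrives {
        let (lock, cvar) = &*state;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }

    // A plain `worker.join()` here would hang forever if the signal is lost;
    // the deadline lets the sketch observe that state instead of deadlocking.
    let finished = rx.recv_timeout(Duration::from_millis(200)).is_ok();

    // Release the worker so the sketch itself always terminates.
    {
        let (lock, cvar) = &*state;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }
    worker.join().unwrap();
    finished
}
```

Calling `worker_finishes(false)` models the lost wakeup: the worker only exits because the sketch releases it afterwards.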

Assignee: nobody → mh+mozilla
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/85ab72b52e71
Upgrade cubeb-pulse to fix a race condition that can lead to shutdown deadlock. r=kinetik
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch
Blocks: 1736459

Mike,

I think you need to backport this one in Debian to fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998108 .

It seems this is already in beta. Would it be worth uplifting to release? Other distros (openSUSE, Arch) shipping Rust 1.56/LLVM 13 are affected as well.

Flags: needinfo?(mh+mozilla)

I'll wait for confirmation that it fixes it.

(In reply to Mike Hommey [:glandium] from comment #7)

I'll wait for confirmation that it fixes it.

It seems such confirmation has been given in:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998108#212

Thanks :-)

Flags: needinfo?(kinetik)

Comment on attachment 9246239 [details]
Bug 1735905 - Upgrade cubeb-pulse to fix a race condition that can lead to shutdown deadlock.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: A Rust unsafe violation causes some versions of the Rust compiler to generate unexpected code that can deadlock (in more situations than originally stated; note: this does not affect the mozilla.org builds).
  • User impact if declined: See above.
  • Fix Landed on Version: 95
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Fix is rather straightforward and well understood, replacing a pointer with an atomic pointer.
  • String or UUID changes made by this patch: N/A
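
As an illustration of the kind of change described above (replacing a pointer with an atomic pointer), here is a minimal sketch using `std::sync::atomic::AtomicPtr`. It is not the actual cubeb-pulse patch: the `Stream` type, its `pending` field, and the helper `publish_and_wait` are hypothetical names chosen for the example. The point is that a raw pointer shared between threads without synchronization is a data race, so the compiler may assume it never changes behind its back, whereas atomic loads and swaps make the cross-thread update visible.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a structure shared between two threads.
struct Stream {
    // With a plain `*mut u32` here, concurrent reads and writes would be
    // undefined behavior; `AtomicPtr` makes each access atomic.
    pending: AtomicPtr<u32>,
}

impl Stream {
    fn new() -> Self {
        Stream { pending: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Publish a heap-allocated value, freeing any value it replaces.
    fn publish(&self, value: u32) {
        let raw = Box::into_raw(Box::new(value));
        let old = self.pending.swap(raw, Ordering::AcqRel);
        if !old.is_null() {
            // SAFETY: `old` came from `Box::into_raw` and was just unpublished.
            unsafe { drop(Box::from_raw(old)) };
        }
    }

    /// Atomically take the published value, if any.
    fn take(&self) -> Option<u32> {
        let raw = self.pending.swap(ptr::null_mut(), Ordering::AcqRel);
        if raw.is_null() {
            None
        } else {
            // SAFETY: the swap gave this caller exclusive ownership of `raw`.
            Some(unsafe { *Box::from_raw(raw) })
        }
    }
}

impl Drop for Stream {
    fn drop(&mut self) {
        // Free anything still published.
        let _ = self.take();
    }
}

/// Hypothetical driver: one thread publishes, the caller polls until it sees the value.
pub fn publish_and_wait(value: u32) -> u32 {
    let stream = Arc::new(Stream::new());
    let writer = {
        let stream = Arc::clone(&stream);
        thread::spawn(move || stream.publish(value))
    };
    let got = loop {
        if let Some(v) = stream.take() {
            break v;
        }
        thread::yield_now();
    };
    writer.join().unwrap();
    got
}
```

The `swap` calls are used so that each pointer is owned by exactly one side at a time, which keeps the `unsafe` blocks easy to justify.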

Note: the revision in phabricator was updated with a backport for esr91 based on the branch from comment 10.

Flags: needinfo?(mh+mozilla)
Attachment #9246239 - Flags: approval-mozilla-esr91?
Duplicate of this bug: 1737212

Comment on attachment 9246239 [details]
Bug 1735905 - Upgrade cubeb-pulse to fix a race condition that can lead to shutdown deadlock.

Approved for 91.4esr.

Attachment #9246239 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Duplicate of this bug: 1740260
Duplicate of this bug: 1736770
See Also: → 1740973

Is this fix going to be backported to Firefox 94.x? The release notes for 94.0.2 have not been published yet:

We’re still preparing the notes for this release, and will post them here when they are ready. Please check back later.

The 94.0.2 source archive is already available, but unfortunately does not contain the fix.

 firefox-94.0.2$ md5sum third_party/rust/cubeb-pulse/src/backend/stream.rs 
 4335bd496eeb9a1e576cae5ba635197e  third_party/rust/cubeb-pulse/src/backend/stream.rs

This is the same version as in 94.0.

What could have been done, to get the fix into 94.0.2?

[Tracking Requested - why for this release]:

Debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998108#241) and openSUSE (https://bugzilla.opensuse.org/show_bug.cgi?id=1192067) have backported the fix.

Wouldn't it be better to backport the fix upstream (here) to serve all downstream builds equally?

(Paul Menzel from comment #18)

What could have been done, to get the fix into 94.0.2?

Too late to get this into 94.0.2. Given that Fx95 goes to RC next week, it's pretty unlikely we'll ship another Fx94 in the meantime.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)

Too late to get this into 94.0.2. Given that Fx95 goes to RC next week, it's pretty unlikely we'll ship another Fx94 in the meantime.

So that it works better next time, what should have been done to get this into 94.0.2?

Asking for it around the time of comment 11, when the ESR uplift request was made; no equivalent request was made for release.

Duplicate of this bug: 1741444
Duplicate of this bug: 1743610
Duplicate of this bug: 1744492