Closed Bug 1685463 Opened 10 months ago Closed 8 months ago

rdd Crash in [@ __pthread_setaffinity_new]

Categories

(Core :: Security: Process Sandboxing, defect, P2)

Unspecified
Linux
defect

Tracking

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox84 --- unaffected
firefox85 --- unaffected
firefox86 --- wontfix
firefox87 --- fixed
firefox88 --- fixed

People

(Reporter: aryx, Assigned: alwu)

References

(Depends on 1 open bug, Blocks 2 open bugs, Regression)

Details

(Keywords: crash, regression)

Attachments

(1 file)

rdd linux process crash new in Firefox 86.0a1 20210106155127, likely from bug 1681043.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/1f7b4b07-fb85-495a-90d8-8bf810210107

Reason: SIGSYS

Top 5 frames of crashing thread:

0 libpthread.so.0 __pthread_setaffinity_new sysdeps/unix/sysv/linux/pthread_setaffinity.c:44
1 libmesa_dri_drivers.so util_queue_thread_func src/util/u_queue.c:261
2 libmesa_dri_drivers.so impl_thrd_routine include/c11/threads_posix.h:87
3 libpthread.so.0 start_thread nptl/nptl/pthread_create.c:477
4 libc.so.6 __GI___clone 
Flags: needinfo?(jya-moz)
Crash Signature: [@ __pthread_setaffinity_new] → [@ __pthread_setaffinity_new] [@ mozilla::ipc::IToplevelProtocol::OtherPid]
Summary: Crash in [@ __pthread_setaffinity_new] → rdd Crash in [@ __pthread_setaffinity_new] and [@ mozilla::ipc::IToplevelProtocol::OtherPid]

This is a crash in the profiler;
https://searchfox.org/mozilla-central/rev/ef900cd2258d4c5d968093f612f807d96e6e7c98/dom/media/ipc/RDDParent.cpp#156

Here we have a null deref somewhere (crash address is 0xcb) in thread #6 while the main thread is waiting on it.

Could be a timing issue as the RDD process may be started earlier following bug 1681043 and exposing an existing issue when initialisation of the profiler occurs.

Component: Audio/Video: Playback → Gecko Profiler
Flags: needinfo?(jya-moz) → needinfo?(gsquelart)
Crash Signature: [@ __pthread_setaffinity_new] [@ mozilla::ipc::IToplevelProtocol::OtherPid] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.40] [@ mozilla::ipc::IToplevelProtocol::OtherPid]

I'm not convinced it's the profiler:

In the report linked in comment 0 ( https://crash-stats.mozilla.org/report/index/1f7b4b07-fb85-495a-90d8-8bf810210107#allthreads ), we have:

  • Thread 0 is waiting for the end of the initialization of the new "ProfilerChild" thread.
  • Thread 8 is that new thread, and it's already waiting for tasks to be processed. (Maybe the end-of-initialization monitor is in flight between 8 and 0?)
  • Thread 6 is the crashing thread, I'm not sure where it comes from, but assuming its number is relative to creation order, it may have started before the ProfilerChild thread?

That thread 6 seems to be part of a small group of threads (#4, #5, #6, #7; sometimes more) that look different from the others, and they appear in all the other reports. What are these? They all have their thread main function in libgallium_dri.so or libmesa_dri_drivers.so, both graphics driver libraries.

Looking at some other crash reports:

  1. https://crash-stats.mozilla.org/report/index/e8d251ec-8a29-4ad1-9b65-2ad5e0210107#allthreads : No ProfilerChild thread yet, thread 0 is in RDDParent::RecvInit -> SetRemoteDataDecoderSandbox -> SetCurrentProcessSandbox -> __libc_read, crashing thread is 4.
  2. https://crash-stats.mozilla.org/report/index/912a0132-79da-4ecb-9141-983a60210107#allthreads : Very close to previous one, but crashing thread is 5.
  3. https://crash-stats.mozilla.org/report/index/9f229f43-60f4-4ca1-9126-f16ef0210107#allthreads : ProfilerChild thread started, and a further thread, MediaSupervisor, has also started. I'm not sure what thread 0 is doing. Crashing thread is 4.
  4. https://crash-stats.mozilla.org/report/index/c3bc6530-b9fa-4d1b-9bd4-78c800210107#allthreads : Similar to the report in comment 0, but the ProfilerChild thread has barely started, crashing thread is 4, but interestingly threads 5, 6, 7 are all trying to log a sandbox crash.

Guess at this point (uneducated so don't trust me!) : The RDD process needs graphics capabilities ASAP and pre-loads a graphics library (before the call to SetRemoteDataDecoderSandbox), the graphics library starts initializing 4+ threads in the background and hits a sandbox wall at a random time -- while the main thread is still initializing other Firefox-related things.

Forwarding NI to :gcp to weigh in on the sandbox angle. Please forward (or back to me) as appropriate.

Flags: needinfo?(gsquelart) → needinfo?(gpascutto)
Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.40] [@ mozilla::ipc::IToplevelProtocol::OtherPid] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ mozilla::ipc::IToplevelProtocol::OtherPid]

Oh, and please note that the crash reason is "SIGSYS", so I doubt that the crash address points at a read or write through a nullptr.

SIGSYS is indeed very likely to come from the sandbox. I guess the RDD policy doesn't permit some setaffinity related syscall.

Flags: needinfo?(gpascutto)

I'm not clear on the severity - are we hitting this in the default config?

Component: Gecko Profiler → Security: Process Sandboxing

[@ mozilla::ipc::IToplevelProtocol::OtherPid] looks like a different crash to me. Sebastian would you mind filing that separately?

Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ mozilla::ipc::IToplevelProtocol::OtherPid] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4]
Flags: needinfo?(aryx.bugmail)
Summary: rdd Crash in [@ __pthread_setaffinity_new] and [@ mozilla::ipc::IToplevelProtocol::OtherPid] → rdd Crash in [@ __pthread_setaffinity_new]
Flags: needinfo?(aryx.bugmail)

Gian-Carlo, could we get an assignee for this bug? The volume is significant on Nightly. Thanks

Flags: needinfo?(gpascutto)

Nils, who owns the RDD work?

Flags: needinfo?(gpascutto) → needinfo?(drno)

(In reply to Pascal Chevrel:pascalc from comment #7)

> Gian-Carlo, could we get an assignee for this bug? The volume is significant on Nightly. Thanks

If this is caused by ffmpeg inside RDD, then we have a large number of people who flip the pref, because it is off by default: https://searchfox.org/mozilla-central/source/modules/libpref/init/StaticPrefList.yaml#7245

I'll try to find someone to look into this, but from my point of view this doesn't have the highest priority right now (assuming folks are messing around with prefs which we have not officially approved as usable yet).

Flags: needinfo?(drno)
See Also: → 1686681
Blocks: RDD
Severity: -- → S4
Priority: -- → P5

Changing the priority to P2 as the bug is tracked by a release manager for the current Nightly.
See What Do You Triage for more information

Priority: P5 → P2

Context on the sched syscalls: they can target any thread of any process owned by the same user, so we'd prefer not to allow them in the general case. We can limit it to the calling thread only, but there's no way to allow other threads in the same process without allowing everything.
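
To make the trade-off concrete, here is a minimal model of a "calling thread only" rule. It is only a sketch of the decision, not the Firefox sandbox policy and not a seccomp-bpf filter: the function name `allow_sched_call` and the rule itself are assumptions for illustration. It relies on the kernel convention that a pid argument of 0 means the calling thread.

```python
def allow_sched_call(target_pid: int, caller_tid: int) -> bool:
    """Hypothetical rule: permit a sched_* syscall only when it targets
    the calling thread (pid 0, or the caller's own tid)."""
    return target_pid == 0 or target_pid == caller_tid


# A thread adjusting its own scheduling state passes the check.
print(allow_sched_call(0, 9320))     # True
print(allow_sched_call(9320, 9320))  # True
# Any other target is rejected: the rule cannot tell a sibling thread
# in the same process from a thread in some other process.
print(allow_sched_call(9321, 9320))  # False
```

The last case is the limitation described above: loosening the rule to admit sibling threads would mean admitting every thread id.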

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #0)

> 1 libmesa_dri_drivers.so util_queue_thread_func src/util/u_queue.c:261
> 2 libmesa_dri_drivers.so impl_thrd_routine include/c11/threads_posix.h:87

I'm confused about why GPU drivers are being loaded in the RDD process. I thought VA-API support was still off by default.

(In reply to Nils Ohlmeier [:drno] from comment #9)

> If this is caused by ffmpeg inside RDD, then we have a large number of people who flip the pref, because it is off by default: https://searchfox.org/mozilla-central/source/modules/libpref/init/StaticPrefList.yaml#7245

I notice that media.rdd-ffvpx.enabled is on by default; could that be causing some of our “ffmpeg” problems (like the one where it tries to inspect the terminal to decide whether to print color codes on stderr)?

(In reply to Gerald Squelart [:gerald] (he/him) from comment #2)

> Guess at this point (uneducated so don't trust me!) : The RDD process needs graphics capabilities ASAP and pre-loads a graphics library (before the call to SetRemoteDataDecoderSandbox), the graphics library starts initializing 4+ threads in the background and hits a sandbox wall at a random time -- while the main thread is still initializing other Firefox-related things.

For what it's worth, we have had interesting race conditions with sandbox startup before (e.g., NSPR trying to read and then write some thread scheduling state, and doesn't write if the read failed, but sandbox start can occur between the two calls).

(In reply to Gerald Squelart [:gerald] (he/him) from comment #3)

> Oh, and please note that the crash reason is "SIGSYS", so I doubt that the crash address points at a read or write through a nullptr.

The address is indeed fake: it's the syscall number, copied into the address field to make it searchable on crash-stats; see bug 1017393.
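
As a quick check of that reading, the crash address 0xcb mentioned in comment 1 can be converted back into a syscall number. The small lookup table below is an assumption limited to x86-64 Linux, where 203 is `sched_setaffinity`; other architectures number their syscalls differently, and the helper name is made up for this example.

```python
# Syscall numbers for x86-64 Linux only (assumption: the crashing build is x86-64).
X86_64_SYSCALLS = {203: "sched_setaffinity", 204: "sched_getaffinity"}


def syscall_from_crash_address(address: str) -> str:
    """Interpret a SIGSYS crash address as a syscall number and name it."""
    number = int(address, 16)
    return f"{number} ({X86_64_SYSCALLS.get(number, 'unknown')})"


print(syscall_from_crash_address("0xcb"))  # 203 (sched_setaffinity)
```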

Just some remarks from a user's perspective:

  • I haven't changed any of the RDD settings in about:config.
  • I had at least two crashes that happened so early during startup that the crash reporter didn't seem to have any data (maybe not even started yet?) so that sending the report fails. The last time this happened I got the following terminal output:
Sandbox: seccomp sandbox violation: pid 9311, tid 9320, syscall 203, args 9320 128 140347659128096 8 140347659130432 140347659130432.  Killing process.
Sandbox: seccomp sandbox violation: pid 9311, tid 9321, syscall 203, args 9321 128 140347650735392 8 140347650737728 140347650737728.  Killing process.
Sandbox: seccomp sandbox violation: pid 9311, tid 9322, syscall 203, args 9322 128 140347642342688 8 140347642345024 140347642345024.  Killing process.
Sandbox: crash reporter is disabled (or failed); trying stack trace:
Sandbox: crash reporter is disabled (or failed); trying stack trace:
Sandbox: frame #01: pthread_setaffinity_np[/usr/lib/libpthread.so.0 +0x143a1]
Sandbox: frame #02: ???[/usr/lib/dri/r600_dri.so +0x112ea8]
Sandbox: frame #01: pthread_setaffinity_np[/usr/lib/libpthread.so.0 +0x143a1]
Sandbox: frame #02: ???[/usr/lib/dri/r600_dri.so +0x112ea8]
Sandbox: frame #03: ???[/usr/lib/dri/r600_dri.so +0x112638]
Sandbox: frame #04: ???[/usr/lib/libpthread.so.0 +0x93e9]
Sandbox: frame #05: clone[/usr/lib/libc.so.6 +0x100293]
Sandbox: frame #06: ??? (???:???)
Sandbox: end of stack.
  • I think there are users who disable media.rdd-ffvpx.enabled because they want to use hardware video acceleration for VP9 with VA-API which is currently broken (bug 1673184) unless you disable this setting.
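
The violation lines in the terminal output above are regular enough to parse. The sketch below extracts the fields from one such line; the regular expression and the `parse_violation` helper are assumptions written for this example, not part of Firefox. In these logs the first argument equals the reporting tid, which, given the `sched_setaffinity(pid, cpusetsize, mask)` argument order, suggests that each driver thread was setting its own affinity.

```python
import re

# Matches the "seccomp sandbox violation" lines quoted in this bug.
VIOLATION = re.compile(
    r"seccomp sandbox violation: pid (\d+), tid (\d+), syscall (\d+), args ([\d ]+)\."
)


def parse_violation(line: str):
    """Return the pid, tid, syscall number and raw arguments, or None."""
    match = VIOLATION.search(line)
    if match is None:
        return None
    pid, tid, syscall = (int(g) for g in match.groups()[:3])
    args = [int(a) for a in match.group(4).split()]
    return {"pid": pid, "tid": tid, "syscall": syscall, "args": args}


line = ("Sandbox: seccomp sandbox violation: pid 9311, tid 9320, syscall 203, "
        "args 9320 128 140347659128096 8 140347659130432 140347659130432.  "
        "Killing process.")
v = parse_violation(line)
print(v["syscall"])               # 203
print(v["args"][0] == v["tid"])   # True: the thread targets itself
```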

Nils, does the previous comment change anything about https://bugzilla.mozilla.org/show_bug.cgi?id=1685463#c9 ?

If users can hit this in the default config (!) and we're not necessarily getting crash reports (!!!!) when they do, this is very serious.

Flags: needinfo?(drno)

I had two crashes today during the same browsing session, but I could send only the report for the first one because the crash reporter failed (?) for the second one and I got a similar terminal output as above, but somehow mixed up:

Sandbox: seccomp sandbox violation: pid 19216, tid 19225, syscall 203, args 19225 128 140473493822752 8 140473493825088 140473493825088.  Killing process.
Sandbox: seccomp sandbox violation: pid 19216, tid 19226, syscall 203, args 19226 128 140473485430048 8 140473485432384 140473485432384.  Killing process.
Sandbox: crash reporter is disabled (or failed); trying stack trace:
Sandbox: seccomp sandbox violation: pid 19216, tid 19227, syscall 203, args 19227 128 140473477037344 8 140473477039680 140473477039680.  Killing process.
Sandbox: crash reporter is disabled (or failed); trying stack trace:
Sandbox: frame #01: pthread_setaffinity_np[/usr/lib/libpthread.so.0 +0x143a1]
Sandbox: frame #02: ???[/usr/lib/dri/r600_dri.so +0x112ea8]
Sandbox: frame #03: ???[/usr/lib/dri/r600_dri.so +0x112638]
Sandbox: frame #04: ???[/usr/lib/libpthread.so.0 +0x93e9]
Sandbox: frame #05: clone[/usr/lib/libc.so.6 +0x100293]
Sandbox: frame #01: pthread_setaffinity_np[/usr/lib/libpthread.so.0 +0x143a1]
Sandbox: frame #02: ???[/usr/lib/dri/r600_dri.so +0x112ea8]
Sandbox: frame #03: ???[/usr/lib/dri/r600_dri.so +0x112638]
Sandbox: frame #06: ??? (???:???)
Sandbox: end of stack.
Sandbox: frame #04: ???[/usr/lib/libpthread.so.0 +0x93e9]
Sandbox: frame #05: clone[/usr/lib/libc.so.6 +0x100293]

So I am not sure any more if the crashes I mentioned in comment 12 really happened at startup or just early during the session.

OS: Unspecified → Linux
Duplicate of this bug: 1691292
Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x14491]

Will check this later, add NI.

Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x14491] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x14491]
Flags: needinfo?(drno) → needinfo?(alwu)

Added a signature

Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x14491] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491]

I'm also seeing this crash on an old i965 machine in Nightly:

Sandbox: seccomp sandbox violation: pid 3521, tid 3531, syscall 203, args 3531 128 139860967857440 8 139860967859776 1.  Killing process.
Sandbox: seccomp sandbox violation: pid 3521, tid 3530, syscall 203, args 3530 128 139861197053216 8 139861197055552 1.  Killing process.
Sandbox: crash reporter is disabled (or failed); trying stack trace:
Sandbox: frame #01: pthread_setaffinity_np[/lib64/libpthread.so.0 +0x14491]
Sandbox: frame #02: ???[/usr/lib64/dri/i965_dri.so +0x591d64]
Sandbox: frame #03: ???[/usr/lib64/dri/i965_dri.so +0x59152b]
Sandbox: frame #04: ???[/lib64/libpthread.so.0 +0x93f9]
Sandbox: frame #05: clone[/lib64/libc.so.6 +0x101b53]
Sandbox: frame #06: ??? (???:???)
Sandbox: end of stack.

Those crashes all originate in the driver's shared objects (e.g. libmesa_dri_drivers.so, libgallium_dri.so). According to bug 1667429, creating a DMA buffer runs code in libgallium_dri.so.

I wonder whether creating a DMA buffer involves calling pthread_setaffinity_np, which is not allowed in the seccomp sandbox. NI Martin to see if he has any thoughts from the VAAPI perspective.

Flags: needinfo?(alwu) → needinfo?(stransky)

Alastor, can you clarify: https://bugzilla.mozilla.org/show_bug.cgi?id=1685463#c13

If necessary we might have to poke a hole in the sandbox here, which is why it's important to understand if this would be seen in release configurations.

Flags: needinfo?(alwu)
Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] [@libpthread.so.0@0x12acd]
Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] [@libpthread.so.0@0x12acd] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] [@libpthread.so.0@0x12acd] [@ pthread_setaffinity_np@@GLIBC_2.3.4 ]

I think these are basically users that enable vaapi, which is broken pending bug 1683808, AIUI.

Crash Signature: [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] [@libpthread.so.0@0x12acd] [@ pthread_setaffinity_np@@GLIBC_2.3.4 ] → [@ __pthread_setaffinity_new] [@ pthread_setaffinity_np@@GLIBC_2.3.4] [@ libpthread.so.0@0x12c5d] [@ libpthread.so.0@0x14491] [@libpthread.so.0@0x12acd] [@ pthread_setaffinity_np@@GLIBC_2.3.4 ] [@libpthread.so.0@0x13c11]
Depends on: 1683808

Checking the current default settings: we do NOT use ffmpeg (with full decoding abilities for different codecs) in the RDD process (pref media.rdd-ffmpeg.enabled), but we DO use ffvpx (which includes only the parts of the ffmpeg code used to decode VP8 and VP9) in the RDD process (pref media.rdd-ffvpx.enabled).

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #11)

> I'm confused about why GPU drivers are being loaded in the RDD process. I thought VA-API support was still off by default.

When we load ffvpx in the RDD process, I found this place where we try to link the VAAPI-related libs. However, the libs are loaded only when this condition is true, and that condition won't be true in the RDD process, so the driver doesn't seem to be related to us.

However, we do load part of ffmpeg in the RDD process, and it may use functions that end up calling into the GPU driver, but I'm not sure.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #20)

> If necessary we might have to poke a hole in the sandbox here, which is why it's important to understand if this would be seen in release configurations.

If these crashes are related to functions that ffvpx (ffmpeg with decoding ability for VP8 and VP9 only) uses, then we might need to poke a hole in the sandbox. If not, the crash is caused by turning on the pref media.rdd-ffmpeg.enabled, which is not a default config.

Flags: needinfo?(alwu)

Hi, what are the values of these two prefs (media.rdd-ffmpeg.enabled and media.rdd-ffvpx.enabled) in your configuration?
Thank you.

Flags: needinfo?(viktor_jaegerskuepper)
Flags: needinfo?(robert.mader)

Both are on default values, i.e. media.rdd-ffmpeg.enabled:false and media.rdd-ffvpx.enabled:true. Only non-default setting is the MOZ_ENABLE_WAYLAND=1 env var.

Flags: needinfo?(robert.mader)

Both on default (media.rdd-ffmpeg.enabled:false and media.rdd-ffvpx.enabled:true). All prefs beginning with media.ffmpeg or media.ffvpx are also on their default values, e.g. media.ffmpeg.vaapi.enabled:false.

Flags: needinfo?(viktor_jaegerskuepper)

Thank you. That does indicate the crash might be related to the use of ffvpx, which is our default config.

Flags: needinfo?(gpascutto)

Going to raise the severity, though from the code it looks like it only affects Wayland builds.

Jed, it looks from https://bugzilla.mozilla.org/show_bug.cgi?id=1685463#c11 that we're going to have to do the full GPU driver dance in RDD too?

Assignee: nobody → jld
Severity: S4 → S3
Flags: needinfo?(gpascutto) → needinfo?(jld)

> That does indicate the crash might be related to the use of ffvpx, which is our default config.

So should that be backed out/disabled? From Bugzilla, it looks like it wasn't expected to work yet: See https://bugzilla.mozilla.org/show_bug.cgi?id=1683808.

Flags: needinfo?(alwu)

Based on comment 22, we DID disable linking VAAPI on RDD. Because I'm not the expert on ffvpx and VAAPI, that needs to be answered by :stransky. But as far as I know, ffvpx is the way we decode VP8/VP9, which should work independently of VAAPI. So I guess we might have done something wrong that unexpectedly lets ffvpx use VAAPI.

Flags: needinfo?(alwu)

(In reply to Alastor Wu [:alwu] from comment #29)

> Based on comment 22, we DID disable linking VAAPI on RDD. Because I'm not the expert on ffvpx and VAAPI, that needs to be answered by :stransky. But as far as I know, ffvpx is the way we decode VP8/VP9, which should work independently of VAAPI. So I guess we might have done something wrong that unexpectedly lets ffvpx use VAAPI.

Using VAAPI allows hardware accelerated decoding and was added in bug 1660336. It's something we'd most likely want to work as it heavily reduces CPU usage.

(In reply to Robert Mader [:rmader] from comment #30)

> Using VAAPI allows hardware accelerated decoding and was added in bug 1660336. It's something we'd most likely want to work as it heavily reduces CPU usage.

I meant using VAAPI via ffvpx in RDD: we disabled that in this revision, so it's not clear to me why we would still see VAAPI usage in RDD.

I looked for early clone()s with GDB and found this one:

#0  0x00007efeb5633010 in clone () at /usr/lib/libc.so.6
#1  0x00007efeb5a4e262 in create_thread () at /usr/lib/libpthread.so.0
#2  0x00007efeb5a4fad8 in pthread_create@@GLIBC_2.2.5 () at /usr/lib/libpthread.so.0
#3  0x00007efea755f95c in  () at /usr/lib/dri/iris_dri.so
#4  0x00007efea6b2d6b6 in  () at /usr/lib/dri/iris_dri.so
#5  0x00007efea6b2ef09 in  () at /usr/lib/dri/iris_dri.so
#6  0x00007efea6b2f357 in  () at /usr/lib/dri/iris_dri.so
#7  0x00007efea71ea459 in  () at /usr/lib/dri/iris_dri.so
#8  0x00007efea71f101d in  () at /usr/lib/dri/iris_dri.so
#9  0x00007efea6776957 in  () at /usr/lib/dri/iris_dri.so
#10 0x00007efea6773b3b in  () at /usr/lib/dri/iris_dri.so
#11 0x00007efea6c2c849 in  () at /usr/lib/dri/iris_dri.so
#12 0x00007efea7f7544e in  () at /usr/lib/libgbm.so.1
#13 0x00007efea7f75b9d in  () at /usr/lib/libgbm.so.1
#14 0x00007efea7f71dfa in gbm_create_device () at /usr/lib/libgbm.so.1
#15 0x00007efeaeadf1dd in mozilla::widget::nsGbmLib::CreateDevice(int) (fd=0) at /home/jan/vc/mozilla/widget/gtk/DMABufLibWrapper.h:59
#16 mozilla::widget::nsDMABufDevice::Configure() (this=this@entry=0x7efeb2bf5800 <mozilla::widget::GetDMABufDevice()::dmaBufDevice>) at /home/jan/vc/mozilla/widget/gtk/DMABufLibWrapper.cpp:224
#17 0x00007efeaeadf4c3 in mozilla::widget::nsDMABufDevice::IsDMABufEnabled() (this=0x7efeb2bf5800 <mozilla::widget::GetDMABufDevice()::dmaBufDevice>) at /home/jan/vc/mozilla/widget/gtk/DMABufLibWrapper.cpp:238
#18 mozilla::widget::nsDMABufDevice::IsDMABufVAAPIEnabled() (this=0x7efeb2bf5800 <mozilla::widget::GetDMABufDevice()::dmaBufDevice>) at /home/jan/vc/mozilla/widget/gtk/DMABufLibWrapper.cpp:251
#19 0x00007efeae2c7396 in mozilla::FFmpegLibWrapper::LinkVAAPILibs() (this=0x7efeb2bf2a30 <mozilla::sFFVPXLib>) at /home/jan/vc/mozilla/dom/media/platforms/ffmpeg/FFmpegLibWrapper.cpp:262
#20 0x00007efeae30cfb7 in mozilla::FFVPXRuntimeLinker::Init() () at /home/jan/vc/mozilla/dom/media/platforms/ffmpeg/ffvpx/FFVPXRuntimeLinker.cpp:68
#21 0x00007efeae2b7638 in mozilla::PDMInitializer::InitRddPDMs() () at /home/jan/vc/mozilla/dom/media/platforms/PDMFactory.cpp:95
#22 mozilla::PDMInitializer::InitPDMs() () at /home/jan/vc/mozilla/dom/media/platforms/PDMFactory.cpp:162
#23 0x00007efeae2b9bae in mozilla::PDMFactory::PDMFactory() (this=0x7efeb525feb0) at /home/jan/vc/mozilla/dom/media/platforms/PDMFactory.cpp:254
#24 mozilla::MakeRefPtr<mozilla::PDMFactory>() () at /home/jan/vc/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/RefPtr.h:603
#25 mozilla::PDMFactory::Supported(bool)::$_12::operator()() const (this=<optimized out>) at /home/jan/vc/mozilla/dom/media/platforms/PDMFactory.cpp:658
#26 0x00007efeae2b9b38 in mozilla::PDMFactory::Supported(bool) (aForceRefresh=false) at /home/jan/vc/mozilla/dom/media/platforms/PDMFactory.cpp:704
#27 0x00007efeae20904f in mozilla::RDDParent::RecvInit(nsTArray<mozilla::gfx::GfxVarUpdate>&&, mozilla::Maybe<mozilla::ipc::FileDescriptor> const&, bool const&) (this=0x7efeb5224c20, vars=<optimized out>, aBrokerFd=..., aCanRecordReleaseTelemetry=<optimized out>) at /home/jan/vc/mozilla/dom/media/ipc/RDDParent.cpp:129

Maybe we should try reordering the checks at https://searchfox.org/mozilla-central/source/widget/gtk/DMABufLibWrapper.cpp#242-258 so that the StaticPrefs are checked first.

Bug 1683808 is relevant here, I'll look at it.

Flags: needinfo?(stransky)

The quick fix is to move the RDD check first, so we can skip the remaining checks if we're clearly in a non-supported process.

IsDMABufEnabled() will call Configure(), which may in turn call into the driver via nsGbmLib::CreateDevice().

To avoid calling driver code in the sandboxed RDD process, we should move the RDD check first.
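
The effect of the reordering can be sketched with short-circuit evaluation. This is not the actual patch, and the real conditions in DMABufLibWrapper.cpp are not reproduced here: every name below is hypothetical, and `is_dmabuf_enabled` merely stands in for the call chain that ends in the driver.

```python
def make_dmabuf_check(driver_calls):
    """Build a stand-in for the check that configures the device."""
    def is_dmabuf_enabled():
        # Models Configure() reaching into the GPU driver.
        driver_calls.append("configure")
        return True
    return is_dmabuf_enabled


def vaapi_enabled_old(is_rdd, is_dmabuf_enabled):
    # Old order: the driver-touching check runs first, even in RDD.
    return is_dmabuf_enabled() and not is_rdd


def vaapi_enabled_new(is_rdd, is_dmabuf_enabled):
    # New order: the cheap process check short-circuits the rest.
    return (not is_rdd) and is_dmabuf_enabled()


old_calls, new_calls = [], []
print(vaapi_enabled_old(True, make_dmabuf_check(old_calls)), old_calls)
# False ['configure']
print(vaapi_enabled_new(True, make_dmabuf_check(new_calls)), new_calls)
# False []
```

Both orders return the same answer in the RDD process, but only the new order gets there without touching the driver stand-in.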

Attachment #9206684 - Attachment description: Bug 1685463 - move the RDD check to the first. → Bug 1685463 - rearrange the check order.
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d5fbb3237c70
rearrange the check order. r=stransky
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch

According to the stats there aren't any crashes with the fixed Nightly builds, and I haven't seen any crashes in the terminal either for the last couple of days, which is great!

Can the fix be uplifted to beta?

The needinfo for Jed can probably be cancelled.

Flags: needinfo?(alwu)

I think uplifting this fix requires uplifting of bug 1695930, at least in part.

I think we can safely uplift logging code.

Comment on attachment 9206684 [details]
Bug 1685463 - rearrange the check order.

Beta/Release Uplift Approval Request

  • User impact if declined: For some Linux users, it would cause a crash.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: No
  • List of other uplifts needed: Bug 1695930
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This patch is based on bug 1695930, which only contains debug logging code. In addition, the fix itself doesn't include any new feature or behavioral change; it just rearranges the check order.
  • String changes made/needed: No
Flags: needinfo?(alwu)
Attachment #9206684 - Flags: approval-mozilla-beta?
Assignee: jld → alwu

Comment on attachment 9206684 [details]
Bug 1685463 - rearrange the check order.

approved for 87.0b8

Attachment #9206684 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(jld)