Closed Bug 1680932 Opened 3 years ago Closed 3 years ago

RDD process crash in [@ syscall | numa_init] due to get_mempolicy

Categories

(Core :: Security: Process Sandboxing, defect, P5)

x86_64
Linux
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- unaffected
firefox84 --- unaffected
firefox85 --- disabled
firefox86 --- disabled
firefox87 --- wontfix

People

(Reporter: gsvelto, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/5ff22d8e-3863-4f5e-88ca-b18150201205

Reason: SIGSYS

Top 10 frames of crashing thread:

0 libc.so.6 syscall 
1 libnuma.so.1 numa_init libnuma.c:98
2 ld-linux-x86-64.so.2 call_init.part.0 elf/elf/dl-init.c:60
3 ld-linux-x86-64.so.2 _dl_init elf/elf/dl-init.c:121
4 libc.so.6 __GI__dl_catch_exception elf/elf/dl-error-skeleton.c:182
5 ld-linux-x86-64.so.2 dl_open_worker elf/elf/dl-open.c:783
6 libc.so.6 __GI__dl_catch_exception elf/elf/dl-error-skeleton.c:208
7 ld-linux-x86-64.so.2 _dl_open elf/elf/dl-open.c:864
8 libdl.so.2 dlopen_doit dlfcn/dlfcn/dlopen.c:66
9 libc.so.6 __GI__dl_catch_exception elf/elf/dl-error-skeleton.c:208

It seems like on recent Ubuntu installations glibc is calling into libnuma which in turn invokes the get_mempolicy syscall (nr 239).
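For anyone who wants to check how a given environment treats this syscall, here is a minimal probe. It is only a sketch: `probe_get_mempolicy` and `describe_probe` are made-up names, and the argument list is a simple "query the mode only" call, not necessarily what libnuma passes. The `static_assert` only confirms the syscall number quoted above for x86_64.

```cpp
#include <cerrno>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
static_assert(SYS_get_mempolicy == 239,
              "get_mempolicy is syscall 239 on x86_64");
#endif

// Hypothetical helper: returns 0 if the kernel (and any active seccomp
// filter) let the call through, otherwise the errno it failed with.
int probe_get_mempolicy() {
  int mode = 0;
  // Ask only for the calling thread's policy mode: no node mask,
  // no address, no flags.
  long rv = syscall(SYS_get_mempolicy, &mode, nullptr, 0UL, nullptr, 0UL);
  return rv == 0 ? 0 : errno;
}

// Hypothetical helper: turns the probe result into a readable string.
std::string describe_probe(int err) {
  switch (err) {
    case 0:
      return "allowed";
    case ENOSYS:
      return "rejected with ENOSYS (not implemented or filtered)";
    case EPERM:
      return "rejected with EPERM";
    default:
      return "failed with errno " + std::to_string(err);
  }
}
```

Under a filter that traps on this syscall, the process would receive SIGSYS instead of getting a return value, which matches the crash reason above.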

99.73% of the 1498 numa_init crash reports have Fission enabled, though there are only 27 unique installations.

The earliest crashing build ID is 20201127155321.

Ubuntu 20.04.1 LTS and Ubuntu 20.10 are affected.

Fission Milestone: --- → ?

get_mempolicy was added to the content sandbox policy in bug 1285769, briefly removed in bug 1384804 and then re-added due to regressions. This crash report, however, is for the RDD process.

We could allow it in the common policy, but it's worth considering whether it's a significant fingerprinting risk for GMP; if it is, the GMP policy could override the common policy with a denial.

Alternately, if this is for something (FFmpeg?) that we intend to move from content to RDD, we could add it to the RDD policy only.
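To make the two options concrete, here is a toy model of the policy layering. It is not the real sandbox code: every class name, the `Verdict` enum, and the tiny read/write allow-list are stand-ins invented for illustration.

```cpp
#include <sys/syscall.h>

enum class Verdict { Allow, Deny };

// Hypothetical stand-in for the shared base policy as it is today:
// get_mempolicy is not on the allow-list.
class CommonPolicy {
 public:
  virtual ~CommonPolicy() = default;
  virtual Verdict Evaluate(long sysno) const {
    switch (sysno) {
      case SYS_read:
      case SYS_write:
        return Verdict::Allow;
      default:
        return Verdict::Deny;
    }
  }
};

// Option 1a: allow get_mempolicy in the common policy...
class PermissiveCommonPolicy : public CommonPolicy {
 public:
  Verdict Evaluate(long sysno) const override {
    if (sysno == SYS_get_mempolicy) return Verdict::Allow;
    return CommonPolicy::Evaluate(sysno);
  }
};

// Option 1b: ...and have GMP override it with a denial.
class GMPPolicy : public PermissiveCommonPolicy {
 public:
  Verdict Evaluate(long sysno) const override {
    if (sysno == SYS_get_mempolicy) return Verdict::Deny;
    return PermissiveCommonPolicy::Evaluate(sysno);
  }
};

// Option 2: leave the common policy alone and allow it for RDD only.
class RDDPolicy : public CommonPolicy {
 public:
  Verdict Evaluate(long sysno) const override {
    if (sysno == SYS_get_mempolicy) return Verdict::Allow;
    return CommonPolicy::Evaluate(sysno);
  }
};
```

Option 2 touches fewer process types, at the cost of one more per-process exception to keep track of.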

See Also: → 1285769

(In reply to Chris Peterson [:cpeterson] from comment #1)

> 99.73% of the 1498 numa_init crash reports have Fission enabled, though there are only 27 unique installations.

Based on Jed's comments about get_mempolicy and our process sandbox policies, this bug is probably not Fission related, even though 99% of the crash reports have Fission enabled.

Fission Milestone: ? → ---
Summary: Crash in [@ syscall | numa_init] → RDD process crash in [@ syscall | numa_init]
Regressed by: 1595994
Has Regression Range: --- → yes

(In reply to Chris Peterson [:cpeterson] from comment #3)

> Based on Jed's comments about get_mempolicy and our process sandbox policies, this bug is probably not Fission related, even though 99% of the crash reports have Fission enabled.

This is a crash only on Nightly; on other channels the system call fails with ENOSYS and the library hopefully falls back to ignoring NUMA topology. I don't know if that helps explain the unusually large share of reports with Fission enabled.
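The difference between the two behaviours can be sketched as a raw seccomp-BPF program. This is an illustration only: `BuildMempolicyFilter` is a made-up name, the real policy is not written this way, and I am assuming the Nightly crash corresponds to a trap-style action (SIGSYS) while other channels return an errno. The filter is only built, not installed, and it omits the architecture check a real filter needs.

```cpp
#include <cerrno>
#include <cstddef>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>
#include <vector>

// Hypothetical helper: builds a four-instruction classic BPF program that
// singles out get_mempolicy and allows everything else.
std::vector<sock_filter> BuildMempolicyFilter(bool crashOnViolation) {
  // Trap delivers SIGSYS to the process; the errno action makes the
  // syscall fail with ENOSYS without running it.
  const __u32 violation =
      crashOnViolation ? SECCOMP_RET_TRAP
                       : (SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA));
  return {
      // Load the syscall number from struct seccomp_data.
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
      // get_mempolicy: fall through to the violation action.
      // Anything else: skip one instruction and allow.
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_get_mempolicy, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, violation),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
}
```

With `crashOnViolation` set to false, a caller sees an ordinary failed syscall, so whether anything breaks depends on how the library handles that error.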

Using FFmpeg in the RDD process was originally Nightly-only but is riding the trains as of bug 1681228.

Summary: RDD process crash in [@ syscall | numa_init] → RDD process crash in [@ syscall | numa_init] due to get_mempolicy

High crash volume. Even if it's Nightly-only, it's annoying enough that we'll have to fix it.

Blocks: 1681228
Severity: -- → S2
Priority: -- → P1

(In reply to Gabriele Svelto [:gsvelto] from comment #0)

> It seems like on recent Ubuntu installations glibc is calling into libnuma which in turn invokes the get_mempolicy syscall (nr 239).

To clarify: glibc's dynamic loader is calling the FFmpeg libraries' initializer functions, which in turn call libnuma. As far as I can tell this is specific to the case of loading FFmpeg, and not a case of glibc itself trying to be NUMA-aware.
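The mechanism here is ELF initializer functions: code that runs when an object is loaded, before the caller has asked the library to do anything. Here is a minimal sketch using the GCC/Clang `constructor` attribute. The names `LoadLog` and `HypotheticalLibraryInit` are invented, and the example sits in the main program rather than a shared library; a library's initializers are run the same way when `dlopen` loads it, which is where `call_init` appears in the stack above.

```cpp
#include <string>
#include <vector>

// Records what has run so far. A function-local static avoids depending
// on the order of global initialization.
std::vector<std::string>& LoadLog() {
  static std::vector<std::string> log;
  return log;
}

// GCC/Clang extension: registers this function as an initializer, so it
// runs at load time without anyone calling it explicitly.
__attribute__((constructor)) static void HypotheticalLibraryInit() {
  // A NUMA-aware library could query memory policy at this point,
  // which means a syscall is issued before any of its API is used.
  LoadLog().push_back("initializer ran");
}
```

So if a library makes a syscall from an initializer, merely loading it inside a sandboxed process is enough to trip the filter.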

The fix is simple: allow the syscall for the RDD process, and leave a comment on the pref definition (or someplace similarly obvious) to remind us to remove it from the content process policy if/when the pref is removed.

> Using FFmpeg in the RDD process was originally Nightly-only but is riding the trains as of bug 1681228.

AFAICT, RDD is enabled for FFmpeg on Linux in 85 Beta [1], but we have thousands of numa_init crash reports from 85 Nightly but none from 85 Beta.

[1] https://hg.mozilla.org/releases/mozilla-beta/file/de29dec5f30bfb41c42156c8d89aac69485caa2d/modules/libpref/init/StaticPrefList.yaml#l7249

(In reply to Chris Peterson [:cpeterson] from comment #8)

> AFAICT, RDD is enabled for FFmpeg on Linux in 85 Beta [1], but we have thousands of numa_init crash reports from 85 Nightly but none from 85 Beta.

See comment 4: we don't crash on sandbox violations on Beta (not directly, anyway).

I am still seeing this with 87.0a1 (20210214213026). Any news on this?

Just double-checking whether the priority and severity are still accurate based on crash volume.

Flags: needinfo?(gpascutto)

Alastor, was this fixed by Bug 1685463 perhaps?

Severity: S2 → S4
Flags: needinfo?(gpascutto) → needinfo?(alwu)
Priority: P1 → P5

No, this one is different. This is using ffmpeg (not ffvpx), which is off by default on all branches. Calling PR_LoadLibraryWithFlags ends up in dlopen, which triggers other system calls that the sandbox policy doesn't allow (bug 1672516).

Blocks: RDD
Flags: needinfo?(alwu)

Closing because no crashes have been reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME