Closed Bug 1824682 Opened 2 years ago Closed 2 years ago

Crash in [@ core::result::unwrap_failed | core::result::Result<T>::expect]

Categories

(Toolkit :: Telemetry, defect)

Unspecified
Linux
defect

Tracking


RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox111 --- wontfix
firefox112 --- fixed
firefox113 --- fixed

People

(Reporter: gsvelto, Assigned: chutten)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/e03ce9b2-ccda-46c6-8258-345c90230326

MOZ_CRASH Reason: failed to open /dev/urandom: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }

Top 10 frames of crashing thread:

0  libxul.so  MOZ_Crash  mfbt/Assertions.h:261
0  libxul.so  RustMozCrash  mozglue/static/rust/wrappers.cpp:18
1  libxul.so  mozglue_static::panic_hook  mozglue/static/rust/lib.rs:91
2  libxul.so  core::ops::function::Fn::call  library/core/src/ops/function.rs:79
3  libxul.so  <alloc::boxed::Box<F, A> as core::ops::function::Fn<Args>>::call  library/alloc/src/boxed.rs:2002
3  libxul.so  std::panicking::rust_panic_with_hook  library/std/src/panicking.rs:692
4  libxul.so  std::panicking::begin_panic_handler::{{closure}}  library/std/src/panicking.rs:579
5  libxul.so  std::sys_common::backtrace::__rust_end_short_backtrace  library/std/src/sys_common/backtrace.rs:137
6  libxul.so  rust_begin_unwind  library/std/src/panicking.rs:575
7  libxul.so  core::panicking::panic_fmt  library/core/src/panicking.rs:64

This is a crash triggered via Glean, though Glean itself is not at fault. It's similar to what we've experienced on Windows when failing to get random values. There are two types of crashes here: either we fail to open /dev/urandom, or we block reading from it (the latter seems to happen with very old distros/kernels).

Correction: blocking on /dev/urandom seems to happen on recent distros/kernels, not old ones. Also, a fair number of these crashes are in the rdd process, and due to the non-specific stack signature there are a few that also seem unrelated.

A pity "Top 10 frames of crashing thread" doesn't go to frame number 10 as 9 and 10 are the fun ones:

9 	libxul.so 	std::sys::unix::rand::imp::fill_bytes 	library/std/src/sys/unix/rand.rs:135 	inlined
9 	libxul.so 	std::sys::unix::rand::hashmap_random_keys 	library/std/src/sys/unix/rand.rs:5 	cfi
10 	libxul.so 	std::collections::hash::map::RandomState::new::KEYS::__init 	library/std/src/collections/hash/map.rs:3125 	inlined

I'm not entirely certain what we can do about this short of reworking FOG's and Glean's internal storages to not use hashed collections. I'm also unsure what the best component'd be to get this looked at.
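
For illustration only (not FOG/Glean code), a minimal sketch of why merely building a std HashMap ends up in the frames above: RandomState lazily seeds its per-process hashing keys from the OS RNG on first use, and on kernels without getrandom(2) that means reading /dev/urandom, with std panicking if that fails.

use std::collections::HashMap;

fn main() {
    // Building a default HashMap uses RandomState, whose per-process hashing
    // keys are lazily seeded from the OS RNG on first use (the
    // hashmap_random_keys frame above). On kernels without getrandom(2),
    // std falls back to /dev/urandom and panics if that read fails.
    let mut map: HashMap<&str, u32> = HashMap::new();
    map.insert("glean", 1);
    println!("{:?}", map.get("glean"));
}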

A quick scan of the source suggests that, like Firefox, it only uses urandom if getrandom's unavailable/unhappy. :gsvelto, is FOG/Glean merely stealing crashes from other urandom failures Firefox would have? (or would it silently use 0 instead)?

Flags: needinfo?(gsvelto)

(In reply to Chris H-C :chutten from comment #2)

> I'm not entirely certain what we can do about this short of reworking FOG's and Glean's internal storages to not use hashed collections. I'm also unsure what the best component'd be to get this looked at.

Maybe the Widget like we did with bug 1816953?

> is FOG/Glean merely stealing crashes from other urandom failures Firefox would have? (or would it silently use 0 instead)?

Yes, I think it just happens to be the first piece of code to stumble upon this.

Flags: needinfo?(gsvelto)

bug 1816953 (well, the hook from bug 1788004) has access to a fallback, though. /dev/urandom is the fallback. Do we have another fallback hidden someplace? Or should we try waiting for a few millis and try again? I'm not sure that panicking when there is no source of randomness is actually wrong.

But I might be misunderstanding what it means to fail to read from /dev/urandom
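
To make the "wait a few millis and try again" idea concrete, here's a rough sketch (illustrative only, not the std or Glean code; the retry count and backoff are arbitrary assumptions) of filling a buffer from /dev/urandom with a bounded retry on transient errors instead of panicking:

use std::fs::File;
use std::io::{ErrorKind, Read};
use std::thread::sleep;
use std::time::Duration;

/// Illustrative only: try to fill `buf` from /dev/urandom, retrying a few
/// times on transient failures instead of panicking outright.
fn fill_from_urandom(buf: &mut [u8]) -> std::io::Result<()> {
    const MAX_ATTEMPTS: u32 = 5;
    let mut attempt = 0;
    loop {
        attempt += 1;
        let result = File::open("/dev/urandom").and_then(|mut f| f.read_exact(buf));
        match result {
            Ok(()) => return Ok(()),
            // EAGAIN / EINTR are plausibly transient; back off briefly and retry.
            Err(e) if attempt < MAX_ATTEMPTS
                && matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) =>
            {
                sleep(Duration::from_millis(10 * attempt as u64));
            }
            // PermissionDenied (the sandbox case) won't get better by retrying.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut seed = [0u8; 16];
    match fill_from_urandom(&mut seed) {
        Ok(()) => println!("got seed: {:02x?}", seed),
        Err(e) => eprintln!("no randomness available: {e}"),
    }
}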

Flags: needinfo?(gsvelto)

(In reply to Chris H-C :chutten from comment #4)

> bug 1816953 (well, the hook from bug 1788004) has access to a fallback, though. /dev/urandom is the fallback. Do we have another fallback hidden someplace? Or should we try waiting for a few millis and try again? I'm not sure that panicking when there is no source of randomness is actually wrong.

> But I might be misunderstanding what it means to fail to read from /dev/urandom

It seems we have two cases. One is that the read() call is returning EAGAIN, possibly because there's not enough entropy available. In that case I think that waiting would be preferable to crashing. In the other case we hit EPERM which would be super-odd... but it only happens in the rdd process! So maybe we've got a sandboxing issue there on older machines (it's a Debian 8 installation)?

Flags: needinfo?(gsvelto)

Unless I'm missing something, the EAGAINs aren't about /dev/urandom; they happen when trying to start a thread. I think those are exactly what they look like: clone failing with EAGAIN, probably because we're over the RLIMIT_NPROC resource limit (on Linux that limits the number of threads, not processes). In particular, disproportionately many of the crashes are from ALT Linux, which has come to our attention before for having an unusually low RLIMIT_NPROC setting and causing crashes.
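
(A quick way to check whether a machine is anywhere near that limit, sketched with the libc crate; this is a throwaway diagnostic, not Firefox code.)

// Sketch only; requires the `libc` crate. Inspect RLIMIT_NPROC, the limit
// that clone() runs into when thread creation fails with EAGAIN.
fn nproc_limit() -> Option<(u64, u64)> {
    let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
    // getrlimit returns 0 on success.
    let rc = unsafe { libc::getrlimit(libc::RLIMIT_NPROC, &mut lim) };
    if rc == 0 {
        Some((lim.rlim_cur as u64, lim.rlim_max as u64))
    } else {
        None
    }
}

fn main() {
    match nproc_limit() {
        Some((soft, hard)) => println!("RLIMIT_NPROC: soft={soft} hard={hard}"),
        None => eprintln!("getrlimit failed"),
    }
}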

As for /dev/urandom and RDD: it doesn't seem to be allowed in the sandbox policy. (13 is EACCES, not EPERM.) That's easy enough to fix. Those are all kernel 3.16, which makes sense: the getrandom syscall was added in 3.17.
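
(A throwaway diagnostic along the same lines, not Firefox code: opening /dev/urandom directly and printing the raw errno from inside the affected process would show the same 13/EACCES that the crash annotation reports when the sandbox denies the path.)

use std::fs::File;

fn main() {
    match File::open("/dev/urandom") {
        Ok(_) => println!("/dev/urandom is readable"),
        // In the sandboxed rdd case this is expected to be raw_os_error() == Some(13),
        // i.e. EACCES ("Permission denied"), matching the crash annotation above.
        Err(e) => println!("open failed: kind={:?} errno={:?}", e.kind(), e.raw_os_error()),
    }
}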

If we allow it on RDD we would need it on all process types, if it's due to FOG.

See Also: → 1825822
Duplicate of this bug: 1825822

(In reply to Alexandre LISSY :gerard-majax from comment #7)

> If we allow it on RDD we would need it on all process types, if it's due to FOG.

utility and socket and content are the other sandboxed process types on Linux, and they all already have /dev/urandom, so we're clear there.

On Linuxen without getrandom(), Rust (and Firefox broadly) uses /dev/urandom as a fallback. Rust uses it for a few things, notably hashmaps... and will panic if it can't use it.
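
(A simplified sketch of that fallback order, using the libc crate; illustrative only, not the actual std implementation, and it ignores short reads.)

use std::fs::File;
use std::io::Read;

fn fill_random(buf: &mut [u8]) -> std::io::Result<()> {
    // Try getrandom(2) first; it was added in Linux 3.17.
    let n = unsafe { libc::syscall(libc::SYS_getrandom, buf.as_mut_ptr(), buf.len(), 0usize) };
    if n == buf.len() as libc::c_long {
        return Ok(());
    }
    let err = std::io::Error::last_os_error();
    if err.raw_os_error() != Some(libc::ENOSYS) {
        // getrandom exists but failed for some other reason; surface it.
        return Err(err);
    }
    // Kernel < 3.17: fall back to /dev/urandom. This is the read that needs
    // the device to be visible inside the content/rdd/socket/utility sandboxes.
    File::open("/dev/urandom")?.read_exact(buf)
}

fn main() {
    let mut buf = [0u8; 16];
    match fill_random(&mut buf) {
        Ok(()) => println!("random bytes: {:02x?}", buf),
        Err(e) => eprintln!("no randomness: {e}"),
    }
}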

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c1002a2f6da
Add /dev/urandom to rdd process sandbox on Linux r=gcp
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

The patch landed in nightly and beta is affected.
:chutten, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox112 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(chutten)

Comment on attachment 9326362 [details]
Bug 1824682 - Add /dev/urandom to rdd process sandbox on Linux r?gcp!

Beta/Release Uplift Approval Request

  • User impact if declined: Infrequent Linux-specific crash due to sandboxing will still occur.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk because it's Linux-only: it adds an exception to one process type's sandbox that's already present on all other sandboxed process types' exception lists.
  • String changes made/needed:
  • Is Android affected?: Unknown
Flags: needinfo?(chutten)
Attachment #9326362 - Flags: approval-mozilla-beta?

Comment on attachment 9326362 [details]
Bug 1824682 - Add /dev/urandom to rdd process sandbox on Linux r?gcp!

Approved for 112.0rc2

Attachment #9326362 - Flags: approval-mozilla-beta? → approval-mozilla-release+