Closed Bug 1655929 Opened 4 years ago Closed 4 years ago

Crash in [@ rand::rngs::thread::THREAD_RNG_KEY::__init::{{closure}}]

Categories

(Core :: XPCOM, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
81 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox79 + fixed
firefox80 + fixed
firefox81 + fixed

People

(Reporter: pascalc, Assigned: froydnj)

Details

(Keywords: crash)

Crash Data

Attachments

(4 files)

This bug is for crash report bp-2c11bc35-8ae6-4c97-853d-4ed260200729.

Top 10 frames of crashing thread:

0 libxul.so RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 libxul.so mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 libxul.so core::ops::function::Fn::call src/libcore/ops/function.rs:72
3 libxul.so std::panicking::rust_panic_with_hook src/libstd/panicking.rs:474
4 libxul.so rust_begin_unwind src/libstd/panicking.rs:378
5 libxul.so core::panicking::panic_fmt src/libcore/panicking.rs:85
6 libxul.so rand::rngs::thread::THREAD_RNG_KEY::__init::{{closure}} third_party/rust/rand/src/rngs/thread.rs:65
7 libxul.so rand::rngs::thread::thread_rng third_party/rust/rand/src/rngs/thread.rs:81
8 libxul.so uuid::v4::<impl uuid::Uuid>::new_v4 third_party/rust/uuid/src/v4.rs:28
9 libxul.so GkRustUtils_GenerateUUID xpcom/rust/gkrust_utils/src/lib.rs:14

Looks like we're crashing while generating a UUID? Hopefully this is not a rust bug.

Component: Stability → XPCOM
Product: Fenix → Core

The crash reason on the report in comment 0 is "could not initialize thread_rng: getrandom: unknown code 0x00000016 -"

Oh noes, getrandom + Linux, that's just what we need! /s

(That's also the crash reason for all 23 crashes I see for 79.)

All these crashes except for one happen on API 23. It doesn't look like we collect kernel version info, sadly.

The official Linux kernel 3.14.29 (as suggested by the original crash report) doesn't support the getrandom syscall, so I would think that we would hit the code here:

https://searchfox.org/mozilla-central/source/third_party/rust/getrandom/src/linux_android.rs#48

but apparently we're not doing that?

I note that our getrandom crate is kind of old and doesn't include the fix:

https://github.com/rust-random/getrandom/commit/6716ad06a5a1e502b80bd1983308eab655d90b83

Maybe we're running into problems with some argument type (and therefore calling convention) mismatches?

https://github.com/rust-random/getrandom/pull/73 suggests that bad things can happen with the current version of the getrandom crate.

And indeed https://github.com/torvalds/linux/blob/v5.4/drivers/char/random.c#L2186 would return EINVAL if we had bogus bits set in the upper 32 bits of the argument, and EINVAL is 22 or 0x16, which matches comment 2.

Updating getrandom is a bit of a mess, but since I've started this, I might as well try and finish it.

Assignee: nobody → nfroyd

[Tracking Requested - why for this release]: Crashes in the new Firefox for Android are not great. It's going to be a biggish patch to update this, but relatively safe.

We need this bump for a couple of reasons:

  1. It enables floating the libc crate version, which makes keeping up-to-date
    with upstream changes easier.
  2. It enables floating the getrandom crate version, which enables updating
    getrandom to a version with less buggy behavior around the Linux
    getrandom syscall on some architectures.

We also take this opportunity to sync up our vendored lucet with the lucet
that we use to build things in taskcluster.

As the commit title suggests, this is a manual update, but we need to also
update config/rules.mk for the changed location of the wasi bindings.

Depends on D85408

We can do this update now that lucet-runtime-internals no longer pins the
getrandom crate version; the conflicts around spinlocks have been removed
and the new version of getrandom brings in an important fix for calling
SYS_getrandom on some architectures.

Depends on D85409

Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/56611f947075
bump upstream versions of rlbox + lucet; r=firefox-build-system-reviewers,rstewart
https://hg.mozilla.org/integration/autoland/rev/3f50dac7d0ba
run `cargo update` and `mach vendor rust` for rlbox + lucet updates; r=firefox-build-system-reviewers,rstewart
https://hg.mozilla.org/integration/autoland/rev/6d86f8e2824d
run `cargo update -p getrandom` + `mach vendor rust` to bump `getrandom` version; r=firefox-build-system-reviewers,rstewart

I don't suppose for uplift we can just do the one-line change to getrandom?

(In reply to Julien Cristau [:jcristau] from comment #16)

I don't suppose for uplift we can just do the one-line change to getrandom?

We could make a fork of the getrandom crate that pulled in just the syscall fix, if you thought that was more palatable.

Anything that would improve the likelihood of being able to uplift for a potential 79 dot release for GV would be great, IMHO.

The patch landed in nightly and beta is affected.
:froydnj, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(nfroyd)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #19)

The patch landed in nightly and beta is affected.
:froydnj, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

I think so? (Can't believe I'm asking questions of a bot.)

I'm trying to get a backport of the syscall fix done, and then get at least beta patched (ideally that patch will apply to release as well).

Flags: needinfo?(nfroyd)

[Tracking Requested - why for this release]: aklotz suggested in comment 18 that we might want to do a dot release for this, so asking for tracking to get an opinion on that.

This patch for beta takes a slightly different approach than the central
fix. Rather than updating all of lucet just to update the getrandom
create, we instead backport the necessary syscall calling fix to the 0.1.3
branch and vendor that backport instead.

Comment on attachment 9167698 [details]
Bug 1655929 - backport the getrandom syscall fix to 0.1.3; r=#build

Beta/Release Uplift Approval Request

  • User impact if declined: Weird startup crashes on arm64 devices for Fenix.

We are writing a beta-only (possibly for release as well) patch because this particular patch is much less invasive than the patch that landed on central.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: We don't have any STR, but it would be fantastic if QE could somehow find STR to reproduce on an arm64 Android device and then verify that this patch fixes those crashes.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We're just updating a single crate, and in that crate, we're just changing how a particular syscall calling sequence works; there shouldn't (crosses fingers) be any other functional changes happening here.
  • String changes made/needed: None
Attachment #9167698 - Flags: approval-mozilla-release?
Attachment #9167698 - Flags: approval-mozilla-beta?

Comment on attachment 9167698 [details]
Bug 1655929 - backport the getrandom syscall fix to 0.1.3; r=#build

crash fix for fenix; approved for 80.0b4, thanks

Attachment #9167698 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9167698 [details]
Bug 1655929 - backport the getrandom syscall fix to 0.1.3; r=#build

Approved for Fenix 79.0.5.

Attachment #9167698 - Flags: approval-mozilla-release? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: