Closed Bug 1700675 Opened 3 years ago Closed 3 years ago

Crash in [@ rand::rngs::thread::THREAD_RNG_KEY::__getit]

Categories

(Core :: DOM: Security, defect, P1)

ARM
Android
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr91 --- wontfix
firefox93 --- unaffected
firefox94 --- unaffected
firefox95 --- unaffected
firefox96 --- unaffected

People

(Reporter: gsvelto, Assigned: n.goeggi)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: [domsecurity-backlog1])

Crash Data

Attachments

(3 obsolete files)

Crash report: https://crash-stats.mozilla.org/report/index/5073611f-b2ff-4414-b672-b2a9d0210324

Reason: SIGSEGV /SEGV_MAPERR

Top 10 frames of crashing thread:

0 libxul.so RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 libxul.so mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 libxul.so core::ops::function::Fn::call /builds/worker/fetches/rustc/lib/rustlib/src/rust/library/core/src/ops/function.rs:70
3 libxul.so std::panicking::rust_panic_with_hook library/std/src/panicking.rs:597
4 libxul.so std::panicking::begin_panic_handler::{{closure}} library/std/src/panicking.rs:499
5 libxul.so std::sys_common::backtrace::__rust_end_short_backtrace library/std/src/sys_common/backtrace.rs:141
6 libxul.so rust_begin_unwind library/std/src/panicking.rs:495
7 libxul.so std::panicking::begin_panic_fmt library/std/src/panicking.rs:437
8 libxul.so rand::rngs::thread::THREAD_RNG_KEY::__getit /builds/worker/fetches/rustc/lib/rustlib/src/rust/library/std/src/thread/local.rs:183
9 libxul.so <rand::rngs::thread::ThreadRng as core::default::Default>::default third_party/rust/rand/src/rngs/thread.rs:88

This is an odd crash but with a non-trivial volume. It seems to happen only on 32-bit ARM devices and specifically it appears to be happening on Android TV appliances, not phones. Filing it under DOM: Security because the crash always seems to originate from caps/NullPrincipalURI.cpp.

nsNullPrincipal has been making a simple GkRustUtils::GenerateUUID(mPath); call since Feb 2019. Did the rust crate get updated when this crash started happening in mid-December 2020 (or sometime in development leading up to that release)?

Flags: needinfo?(ryanvm)

Ryan: please pass the needinfo on to anyone who might know the answer to comment 1

AFAICT, rand hasn't been updated since 2019. Maybe glandium or emilio would have a better idea, though.

Flags: needinfo?(ryanvm)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(emilio)

I don't think anything in this particular codepath has changed. It seems like we're failing to initialize the OS RNG: https://searchfox.org/mozilla-central/rev/b6f52976b562008c9d9ceeda22907e1eda506c8e/third_party/rust/rand/src/rngs/thread.rs#64-65

Looking at the latest version of the UUID crate, they switched from rand to getrandom, for Uuid::new_v4, but I don't think that'd fix it. Alternatively we could use a weaker RNG, but for principals that might not be great? Not sure who's the right person to make a call there.

Flags: needinfo?(emilio)

The problem is deeper than that (it seems related to thread local storage), but for some reason we don't have the annotation for the MOZ_Crash in the crash? That's certainly weird.

Flags: needinfo?(mh+mozilla)

Looking at the raw minidump, the panic string seems to be "could not initialize thread_rng: Invalid argument", which does come from the failure to initialize the thread local storage value, not a failure from the thread local storage itself.

Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.

Flags: needinfo?(gsvelto)

The code flow would be:
https://searchfox.org/mozilla-central/rev/b6f52976b562008c9d9ceeda22907e1eda506c8e/third_party/rust/rand/src/rngs/thread.rs#64-65
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/rand_core/src/lib.rs#355-359
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/rand_core/src/os.rs#67-70
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/lib.rs#277-282
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/linux_android.rs#14-23

From there, we either use
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/linux_android.rs#38-44
or
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/use_file.rs#30-43

In any case, whether UUID uses getrandom or rand doesn't change anything, since rand uses getrandom and the failure originates from getrandom anyways.

The error message "Invalid argument" would be EINVAL, so that would mean either getrandom() or read() returned EINVAL. getrandom's manpage says EINVAL happens with invalid flags value (last argument), and we pass 0, it would be hard for that to be invalid without getrandom actually being ENOSYS. Getrandom was added to kernel 3.17 and the crash was on kernel 3.14.x, but we can't really assume that getrandom was not backported. OTOH, from the code flow I guess we can assume that we're not in using getrandom. read's manpage says EINVAL happens in the following cases:

  • fd is attached to an object which is unsuitable for reading
  • fd was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the file offset is not suitably aligned.
  • fd was created via a call to timerfd_create(2) and the wrong size buffer was given to read()

None of these directly make sense.

There also has been no substantial change to the use_file.rs file in getrandom since the version we currently use (0.1.14).

No need to investigate, we decided to go back and use our built-in UUID generator we use in other places. No need for calling into rust code after all.

I'll fix that.

Assignee: nobody → ckerschb
Severity: -- → S3
Status: NEW → ASSIGNED
Flags: needinfo?(gsvelto)
Priority: -- → P1
Whiteboard: [domsecurity-active]

(In reply to Mike Hommey [:glandium] from comment #7)

Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.

I think this is another instance of bug 1681846 which is specific to Android/ARM.

(In reply to Gabriele Svelto [:gsvelto] from comment #11)

(In reply to Mike Hommey [:glandium] from comment #7)

Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.

I think this is another instance of bug 1681846 which is specific to Android/ARM.

FWIW, it's probably unrelated, because as far as I can tell from the machine code for MozRustCrash in a recent nightly, the crash annotation is stored in gMozCrashReason. I'd need to double check in the actual build the user had for the crash report.

(In reply to Mike Hommey [:glandium] from comment #12)

FWIW, it's probably unrelated, because as far as I can tell from the machine code for MozRustCrash in a recent nightly, the crash annotation is stored in gMozCrashReason. I'd need to double check in the actual build the user had for the crash report.

Yeah, I did some quick testing yesterday and it seems to be a different issue. I can find the string in the minidump so it's likely being set but it's not being added to the crash annotations. Probably a problem with the exception handler then.

Most likely Niklas will pick that one up while working on Meta Bug 1716730.

Assignee: ckerschb → nobody
Status: ASSIGNED → NEW
Whiteboard: [domsecurity-active] → [domsecurity-backlog1]
Attachment #9217343 - Attachment is obsolete: true
Assignee: nobody → ngogge
Status: NEW → ASSIGNED

Long ago we had mysterious issues with Fennec getting an error when trying to open /dev/urandom on some devices. That was Bug 1140806. Perhaps getrandom’s use_file.read is failing to open /dev/random on these 32-bit Android TVs?

Niklas, do you still plan to land these patches to use MFBT's RandomUint64OrDie() in nsUUIDGenerator?

I have some similar nsUUIDGenerator changes planned (in bug 1723674) that could either build upon or duplicate your patches.

I see the most recent crashing versions for this crash signature are from Fenix 86.1.1 and 87.0.0-rc.1, so possibly these crashes were fixed in Fenix 88 or in the OS. However, we are still receiving crash reports from Fenix 84 and 86.1.1 even this week, six months after though those versions were released. I don't know why those users are stuck on such old Fenix builds. Perhaps they are manually side-loading Fenix APKs on their Android TVs and not receiving app updates through Google Play Store or whatever OEM app store is on their Android TV.

Flags: needinfo?(ngogge)

(Sorry for my late response i was on vacation)

I was planing on landing these before my vacation but i will abandon them now in favor of your new patches in bug 1723674.

Flags: needinfo?(ngogge)
Attachment #9228768 - Attachment is obsolete: true
Depends on: 1724152
Attachment #9230044 - Attachment is obsolete: true

I think we can close this bug now. Bug 1724152 changed the UUID generation code to no longer call the Rust RNG (in Nightly 96). Plus we haven't seen any crash reports with this signature since Fenix 91.4.0.

https://crash-stats.mozilla.org/signature/?signature=rand%3A%3Arngs%3A%3Athread%3A%3ATHREAD_RNG_KEY%3A%3A__getit&date=%3E%3D2021-05-05T17%3A29%3A00.000Z&date=%3C2021-11-05T17%3A29%3A00.000Z

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: