Crash in [@ rand::rngs::thread::THREAD_RNG_KEY::__getit]
Categories
(Core :: DOM: Security, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | wontfix |
firefox93 | --- | unaffected |
firefox94 | --- | unaffected |
firefox95 | --- | unaffected |
firefox96 | --- | unaffected |
People
(Reporter: gsvelto, Assigned: n.goeggi)
References
(Blocks 1 open bug)
Details
(Keywords: crash, Whiteboard: [domsecurity-backlog1])
Crash Data
Attachments
(3 obsolete files)
Crash report: https://crash-stats.mozilla.org/report/index/5073611f-b2ff-4414-b672-b2a9d0210324
Reason: SIGSEGV /SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 libxul.so mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 libxul.so core::ops::function::Fn::call /builds/worker/fetches/rustc/lib/rustlib/src/rust/library/core/src/ops/function.rs:70
3 libxul.so std::panicking::rust_panic_with_hook library/std/src/panicking.rs:597
4 libxul.so std::panicking::begin_panic_handler::{{closure}} library/std/src/panicking.rs:499
5 libxul.so std::sys_common::backtrace::__rust_end_short_backtrace library/std/src/sys_common/backtrace.rs:141
6 libxul.so rust_begin_unwind library/std/src/panicking.rs:495
7 libxul.so std::panicking::begin_panic_fmt library/std/src/panicking.rs:437
8 libxul.so rand::rngs::thread::THREAD_RNG_KEY::__getit /builds/worker/fetches/rustc/lib/rustlib/src/rust/library/std/src/thread/local.rs:183
9 libxul.so <rand::rngs::thread::ThreadRng as core::default::Default>::default third_party/rust/rand/src/rngs/thread.rs:88
This is an odd crash but with a non-trivial volume. It seems to happen only on 32-bit ARM devices and specifically it appears to be happening on Android TV appliances, not phones. Filing it under DOM: Security because the crash always seems to originate from caps/NullPrincipalURI.cpp
.
Comment 1•4 years ago
|
||
nsNullPrincipal has been making a simple GkRustUtils::GenerateUUID(mPath);
call since Feb 2019. Did the rust crate get updated when this crash started happening in mid-December 2020 (or sometime in development leading up to that release)?
Updated•4 years ago
|
Comment 2•4 years ago
|
||
Ryan: please pass the needinfo on to anyone who might know the answer to comment 1
Comment 3•4 years ago
|
||
AFAICT, rand hasn't been updated since 2019. Maybe glandium or emilio would have a better idea, though.
Comment 4•4 years ago
|
||
I don't think anything in this particular codepath has changed. It seems like we're failing to initialize the OS RNG: https://searchfox.org/mozilla-central/rev/b6f52976b562008c9d9ceeda22907e1eda506c8e/third_party/rust/rand/src/rngs/thread.rs#64-65
Looking at the latest version of the UUID crate, they switched from rand
to getrandom
, for Uuid::new_v4
, but I don't think that'd fix it. Alternatively we could use a weaker RNG, but for principals that might not be great? Not sure who's the right person to make a call there.
Comment 5•4 years ago
|
||
The problem is deeper than that (it seems related to thread local storage), but for some reason we don't have the annotation for the MOZ_Crash in the crash? That's certainly weird.
Comment 6•4 years ago
|
||
Looking at the raw minidump, the panic string seems to be "could not initialize thread_rng: Invalid argument", which does come from the failure to initialize the thread local storage value, not a failure from the thread local storage itself.
Comment 7•4 years ago
|
||
Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.
Comment 8•4 years ago
|
||
The code flow would be:
https://searchfox.org/mozilla-central/rev/b6f52976b562008c9d9ceeda22907e1eda506c8e/third_party/rust/rand/src/rngs/thread.rs#64-65
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/rand_core/src/lib.rs#355-359
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/rand_core/src/os.rs#67-70
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/lib.rs#277-282
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/linux_android.rs#14-23
From there, we either use
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/linux_android.rs#38-44
or
https://searchfox.org/mozilla-central/rev/9b430bb1a11d7152cab2af4574f451ffb906b052/third_party/rust/getrandom/src/use_file.rs#30-43
In any case, whether UUID uses getrandom or rand doesn't change anything, since rand uses getrandom and the failure originates from getrandom anyways.
The error message "Invalid argument" would be EINVAL, so that would mean either getrandom() or read() returned EINVAL. getrandom's manpage says EINVAL happens with invalid flags
value (last argument), and we pass 0, it would be hard for that to be invalid without getrandom actually being ENOSYS. Getrandom was added to kernel 3.17 and the crash was on kernel 3.14.x, but we can't really assume that getrandom was not backported. OTOH, from the code flow I guess we can assume that we're not in using getrandom. read
's manpage says EINVAL happens in the following cases:
- fd is attached to an object which is unsuitable for reading
- fd was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the file offset is not suitably aligned.
- fd was created via a call to timerfd_create(2) and the wrong size buffer was given to read()
None of these directly make sense.
There also has been no substantial change to the use_file.rs file in getrandom since the version we currently use (0.1.14).
Comment 9•4 years ago
|
||
No need to investigate, we decided to go back and use our built-in UUID generator we use in other places. No need for calling into rust code after all.
I'll fix that.
Comment 10•4 years ago
|
||
Reporter | ||
Comment 11•4 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #7)
Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.
I think this is another instance of bug 1681846 which is specific to Android/ARM.
Comment 12•4 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
(In reply to Mike Hommey [:glandium] from comment #7)
Gabriele, it would be interesting to figure out why the panic string doesn't appear in the crash report.
I think this is another instance of bug 1681846 which is specific to Android/ARM.
FWIW, it's probably unrelated, because as far as I can tell from the machine code for MozRustCrash
in a recent nightly, the crash annotation is stored in gMozCrashReason. I'd need to double check in the actual build the user had for the crash report.
Reporter | ||
Comment 13•4 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #12)
FWIW, it's probably unrelated, because as far as I can tell from the machine code for
MozRustCrash
in a recent nightly, the crash annotation is stored in gMozCrashReason. I'd need to double check in the actual build the user had for the crash report.
Yeah, I did some quick testing yesterday and it seems to be a different issue. I can find the string in the minidump so it's likely being set but it's not being added to the crash annotations. Probably a problem with the exception handler then.
Comment 14•3 years ago
|
||
Most likely Niklas will pick that one up while working on Meta Bug 1716730.
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
|
||
Assignee | ||
Comment 16•3 years ago
|
||
Depends on D118714
Comment 17•3 years ago
|
||
Long ago we had mysterious issues with Fennec getting an error when trying to open /dev/urandom on some devices. That was Bug 1140806. Perhaps getrandom’s use_file.read is failing to open /dev/random on these 32-bit Android TVs?
Comment 18•3 years ago
|
||
Niklas, do you still plan to land these patches to use MFBT's RandomUint64OrDie()
in nsUUIDGenerator?
I have some similar nsUUIDGenerator changes planned (in bug 1723674) that could either build upon or duplicate your patches.
I see the most recent crashing versions for this crash signature are from Fenix 86.1.1 and 87.0.0-rc.1, so possibly these crashes were fixed in Fenix 88 or in the OS. However, we are still receiving crash reports from Fenix 84 and 86.1.1 even this week, six months after though those versions were released. I don't know why those users are stuck on such old Fenix builds. Perhaps they are manually side-loading Fenix APKs on their Android TVs and not receiving app updates through Google Play Store or whatever OEM app store is on their Android TV.
Assignee | ||
Comment 19•3 years ago
|
||
(Sorry for my late response i was on vacation)
I was planing on landing these before my vacation but i will abandon them now in favor of your new patches in bug 1723674.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 20•3 years ago
|
||
I think we can close this bug now. Bug 1724152 changed the UUID generation code to no longer call the Rust RNG (in Nightly 96). Plus we haven't seen any crash reports with this signature since Fenix 91.4.0.
Updated•3 years ago
|
Description
•