startup Crash in [@ uuid::Uuid::new_v4]
Categories
(Toolkit :: Telemetry, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | unaffected |
firefox104 | --- | unaffected |
firefox105 | --- | wontfix |
firefox106 | --- | wontfix |
firefox110 | --- | wontfix |
firefox111 | --- | verified |
firefox112 | --- | verified |
People
(Reporter: aryx, Assigned: yannis)
References
Details
(Keywords: crash, Whiteboard: [win:stability])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
dmeehan
:
approval-mozilla-release-
|
Details | Review |
40 crash reports from 8+ installations, all on Windows 7 or 8.1 and all with 32-bit builds of 105 betas. >90% in the first minute after launch.
Could this be from the changes in bug 1781030? There hadn't been a crash during the Nightly development cycle or 104 betas.
Crash report: https://crash-stats.mozilla.org/report/index/3680107d-572f-4da2-80ca-b52b90220830
MOZ_CRASH Reason: could not retreive random bytes for uuid: OS Error: 1073741947
Top 10 frames of crashing thread:
0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:91
2 xul.dll core::ops::function::Fn::call<void ../4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248
3 xul.dll std::panicking::rust_panic_with_hook library/std/src/panicking.rs:702
4 xul.dll std::panicking::begin_panic_handler::closure$0 library/std/src/panicking.rs:588
5 xul.dll std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0, never$> library/std/src/sys_common/backtrace.rs:138
6 xul.dll std::panicking::begin_panic_handler library/std/src/panicking.rs:584
7 xul.dll core::panicking::panic_fmt library/core/src/panicking.rs:142
8 xul.dll uuid::Uuid::new_v4 third_party/rust/uuid/src/v4.rs:38
9 xul.dll glean_core::metrics::ping::PingType::submit_sync third_party/rust/glean-core/src/metrics/ping.rs:141
Comment 1•2 years ago
|
||
bug 1781030 didn't change anything about how we use uuid::Uuid::new_v4
, nor have we touched that particular crashing code path.
The uuid crate was updated from 0.8.1 to 0.8.2 in v103.
That uuid release changed from the rand
crate to getrandom
. That introduced that panic on a getrandom error.
That still doesn't explain why we're only seeing this now.
We could downgrade to 0.8.1 if this causes that many crashes.
Comment 2•2 years ago
|
||
Sounds similar to https://github.com/rust-lang/rust/issues/94098
Comment 3•2 years ago
|
||
Does that mean we should try to get a similar fallback into getrandom
now or do we rather go the downgrade route for now?
Comment 4•2 years ago
•
|
||
So I tried to traceback how uuid
uses rand
and how rand
then works.
rand
also uses getrandom
under the hood, so we at some point end up calling BCryptGenRandom
, but probably not as often, because rand
uses that just as a seed for its chacha implementation (if I understood it right).
I'm still not sure whether that means it might succeed the time we need it and further calls to create uuids will simply not call it.
I guess we have some options here for now:
- We downgrade back to uuid 0.8.1, which goes the
rand
way. We haven't updatedrand
in a while, so this might "fix" it for now. - Or we implement the same fallback algorithm for getrandom is easy
- but getrandom decided to explicitly only use
BCryptGenRandom
a while back, so I'm not sure how much leverage we have convincing them to take this fallback. - Of course we can use the patched version, but then we have all the downsides of a fork (missing out on updates when they happen upstream, ...).
- but getrandom decided to explicitly only use
we pull in the minimal things needed fromnot really an option, as it turns out we need quite a bit more than just v4 UUIDs.uuid
andgetrandom
into Glean and then can patch it up as we need it (essentially also a fork with the same issues, but given the smaller API maaaybe ok)
Comment 5•2 years ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Updated•1 year ago
|
Comment 7•1 year ago
|
||
but getrandom decided to explicitly only use BCryptGenRandom a while back, so I'm not sure how much leverage we have convincing them to take this fallback.
AFAIK cmartin successfully argued them to reverse this in the stdlib. Note that one justification in that thread is "that's what Firefox does" and as you discovered here (and we also did in previous bugs), "what Firefox does" is actually broken on older Windows.
Comment 8•1 year ago
|
||
It's worth nothing that Rust StdLib eventually removed RtlGenRandom again after my original fix and went a different route with it. The getrandom
crate has decided to do the same, and there is currently a WIP PR to fix this, but it's been in limbo for the last 5 weeks (presumably due to time constraints). Perhaps it would be a good idea to help the maintainers of that crate get this fixed soon?
Assignee | ||
Comment 9•1 year ago
•
|
||
Hello,
I did a bit of automated analysis on 71 crashes. In particular I tried to identify and restore recurring strings on the stack of the thread that crashes. Among those 71 crashes there were actually two different reasons for crashing:
- for 67 crashes:
could not retreive random bytes for uuid: OS Error: 1073741947
which maps toNTSTATUS
0xC000007B
STATUS_INVALID_IMAGE_FORMAT
; - for 4 crashes:
could not retreive random bytes for uuid: OS Error: 1073742053
which maps toNTSTATUS
0xC00000E5
STATUS_INTERNAL_ERROR
.
Crashes of type (1) appear to be failures to load the SysWOW64 variant of the bcryptprimitives.dll
library from code that lives in bcrypt.dll
. I recovered recurring partial strings \SysWOW64\bcryptpr
and W64\bcryptprimitives.dll
, which indicate that it is likely that an attempt to load this library occurred shortly before crashing. bcrypt!_LoadImage
is reachable from bcrypt!BCryptGenRandom
when used with BCRYPT_USE_SYSTEM_PREFERRED_RNG
, and it contains code that looks as follows, which makes it the most likely location where this error originated from:
if (!LoadLibraryW(...)) {
return GetLastError() == ERROR_MOD_NOT_FOUND ? STATUS_DLL_NOT_FOUND : STATUS_INVALID_IMAGE_FORMAT;
}
The problem with this code is that as users of BCryptGenRandom
we do not have access to the original last error value but only to this STATUS_INVALID_IMAGE_FORMAT
(and for some reason, there doesn't seem to be last errors attached to these crashes). Recovering the last error may give us a better explanation why we failed to load the library: we could try to load bcryptprimitives.dll
manually within Firefox and catch the last error value if we want to explore this further. Note that this doesn't appear to be a sandboxing issue, as the crashes seem to be main process crashes.
Crashes of type (2) appear to be linked to a product from the InfoTeCS company (who name their products itcs-*
). This product likely registers as a preferred RNG provider for the system but sometimes fails to generate numbers. I recovered the following partial strings from the stacks of these 4 crashes:
itcs-cng-provider 0xe2c itcsrng_msinterface.cpp:242 itccng::bcrypt::ItcsRng_MsInterface::GenRandom <- !!!FA
;
itcs-cng-provider 0xba0 itcsrng_msinterface.cpp:242 itccng::bcrypt::ItcsRng_MsInterface::GenRandom <- !!!FA
;
itcs-cng-provider 0x1368 itcsrng_msinterface.cpp:242 itccng::bcrypt::ItcsRng_MsInterface::GenRandom <- !!!F
;
itcs-cng-provider 0xf04 itcsrng_msinterface.cpp:242 itccng::bcrypt::ItcsRng_MsInterface::GenRandom <- !!!FA
.
Given that we have at least 2 different reasons why BCryptGenRandom
can fail, it seems to me that having a fallback solution was a good idea.
Comment 10•1 year ago
|
||
(In reply to Yannis Juglaret from comment #9)
The problem with this code is that as users of
BCryptGenRandom
we do not have access to the original last error value but only to thisSTATUS_INVALID_IMAGE_FORMAT
(and for some reason, there doesn't seem to be last errors attached to these crashes).
The last error value is held in the TEB and we only capture that in nightly. Given how useful this is I was thinking of extending this to beta and release. We don't want all crash reports to become bigger because of it, so maybe we could do it statistically, like stuff the minidump with extra data once every two beta crashes and once every twenty release ones. I'll file a bug for that.
Comment 11•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 20 desktop browser crashes on beta
:chutten, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 12•1 year ago
|
||
I'd love to be able to do more beyond incrementing the severity, but all I can do is add more background:
uuid
was just upped to 1.0 in bug 1804359, butgetrandom
is still unconditionally usingBCryptGenRandom
in their latest, so I don't expect any improvement.- I've added links to a previous rustlang issue for this as well as
getrandom
's own issue for this fallback. Jan-Erik, it looks as though they are (or were as of October) amenable to adding a fallback. Does that change your impression of how such a patch might be received?
Comment 13•1 year ago
|
||
(In reply to Chris H-C :chutten from comment #12)
- I've added links to a previous rustlang issue for this as well as
getrandom
's own issue for this fallback. Jan-Erik, it looks as though they are (or were as of October) amenable to adding a fallback. Does that change your impression of how such a patch might be received?
Updated•1 year ago
|
Comment 14•1 year ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 15•1 year ago
•
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #13)
Thanks [:janerik] for working on this! I played with the patch, and while it does provide a fallback, the fallback itself also relies on successfully loading bcryptprimitives.dll
, which seems to be what's failing for the vast majority of our crash reports (see comment 9). So I don't think this would be enough to fix our stability issue, except maybe for the few crashes related to InfoTeCS.
I think the stability issue can only be fixed by a fallback that doesn't rely on successful loading of bcryptprimitives.dll
, so one that uses previous APIs such as ADVAPI32!RtlGenRandom
like mentioned in comment 8. If you believe that getrandom
won't agree to have this fallback, then maybe we should move on our own about this? For example, I can try to write a patch to hook BCryptGenRandom
on Windows 7, and implement the RtlGenRandom
-based fallback in that hook?
To reach this conclusion, I played with a test executable in a WinDbg, and simulated the scenario where trying to load bcryptprimitives.dll
fails. In that case, both the original path and the fallback path end up failing, and rust ultimately panics with the same error as the one I recovered for most crashes in comment 9. Below are the two call stacks that lead to loading bcryptprimitives.dll
, the first is in the original path, the second is in the fallback path.
...
// First call stack
0:000> k
# Child-SP RetAddr Call Site
00 000000e8`53cfeff8 00007ff9`ba8f6ff5 KERNELBASE!LoadLibraryExW
01 000000e8`53cff000 00007ff9`ba8f6ad9 bcrypt!_LoadImage+0xa1
02 000000e8`53cff060 00007ff9`ba8f48c1 bcrypt!LoadProviderEx+0x189
03 000000e8`53cff0f0 00007ff9`ba8f18d6 bcrypt!BCryptOpenAlgorithmProvider+0x701
04 000000e8`53cff500 00007ff9`ba8f177c bcrypt!OpenSystemPreferredAlgorithmProvider+0x9a
05 000000e8`53cff560 00007ff6`0e022a0e bcrypt!BCryptGenRandom+0x1ac
06 000000e8`53cff5e0 00007ff6`0e02264e bcrypt_exe!getrandom::imp::Rng::random+0x9e [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\windows.rs @ 128]
07 000000e8`53cff680 00007ff6`0e02217a bcrypt_exe!getrandom::imp::getrandom_inner+0xee [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\windows.rs @ 65]
08 000000e8`53cff770 00007ff6`0e02221f bcrypt_exe!getrandom::getrandom_uninit+0x7a [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\lib.rs @ 335]
09 000000e8`53cff800 00007ff6`0e021e1d bcrypt_exe!getrandom::getrandom+0x3f [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\lib.rs @ 307]
0a 000000e8`53cff8a0 00007ff6`0e021488 bcrypt_exe!uuid::rng::bytes+0x2d [C:\repos\rust\bcrypt\uuid\src\rng.rs @ 7]
0b 000000e8`53cff8e0 00007ff6`0e021011 bcrypt_exe!uuid::Uuid::new_v4+0x18 [C:\repos\rust\bcrypt\uuid\src\v4.rs @ 34]
...
// First parameter
0:000> du rcx
000001e8`7ce0a700 "C:\WINDOWS\system32\bcryptprimit"
000001e8`7ce0a740 "ives.dll"
// Simulating a failure to load the library
0:000> r rip=poi(rsp); r rsp=rsp+8; r rax=0; g
...
// Second call stack
0:000> k
# Child-SP RetAddr Call Site
00 000000e8`53cfefc8 00007ff9`ba8f6ff5 KERNELBASE!LoadLibraryExW
01 000000e8`53cfefd0 00007ff9`ba8f6ad9 bcrypt!_LoadImage+0xa1
02 000000e8`53cff030 00007ff9`ba8f48c1 bcrypt!LoadProviderEx+0x189
03 000000e8`53cff0c0 00007ff6`0e0227f3 bcrypt!BCryptOpenAlgorithmProvider+0x701
04 000000e8`53cff4d0 00007ff6`0e022a8b bcrypt_exe!getrandom::imp::Rng::open+0xe3 [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\windows.rs @ 100]
05 000000e8`53cff5d0 00007ff6`0e022681 bcrypt_exe!getrandom::imp::fallback_rng+0x2b [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\windows.rs @ 147]
06 000000e8`53cff680 00007ff6`0e02217a bcrypt_exe!getrandom::imp::getrandom_inner+0x121 [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\windows.rs @ 66]
07 000000e8`53cff770 00007ff6`0e02221f bcrypt_exe!getrandom::getrandom_uninit+0x7a [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\lib.rs @ 335]
08 000000e8`53cff800 00007ff6`0e021e1d bcrypt_exe!getrandom::getrandom+0x3f [C:\Users\yjugl\.cargo\git\checkouts\getrandom-4e87a219ad2fb861\6f67832\src\lib.rs @ 307]
09 000000e8`53cff8a0 00007ff6`0e021488 bcrypt_exe!uuid::rng::bytes+0x2d [C:\repos\rust\bcrypt\uuid\src\rng.rs @ 7]
0a 000000e8`53cff8e0 00007ff6`0e021011 bcrypt_exe!uuid::Uuid::new_v4+0x18 [C:\repos\rust\bcrypt\uuid\src\v4.rs @ 34]
// First parameter
0:000> du rcx
000001e8`7ce0a700 "C:\WINDOWS\system32\bcryptprimit"
000001e8`7ce0a740 "ives.dll"
// Simulating a failure to load the library
0:000> r rip=poi(rsp); r rsp=rsp+8; r rax=0; g
...
// Result (this is NTSTATUS 0xc000007b)
thread 'main' panicked at 'could not retrieve random bytes for uuid: OS Error: 1073741947', uuid\src\rng.rs:9:13
Assignee | ||
Comment 16•1 year ago
|
||
This recent crash signature indeed suggests that Rust's "fallback" implementation doesn't cover the case where bcryptprimitives.dll
fails to load, these crashes all have moz crash reason RNG broken: 0xc000007b, fallback RNG broken: 0xc000007b
(see comment 9 for what 0xc000007b means). We need RtlGenRandom
.
Comment 17•1 year ago
•
|
||
I filed an issue on Rust StdLib, and they have a PR up to fix this. When that merges, we should be good. (You know... In a few months when the new version of Rust is released)
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 18•1 year ago
|
||
I'm adding the std::sys::windows::rand::fallback_rng
signature here because the root cause is the same: calling BCryptGenRandom
fails, usually because of failure to load bcryptprimitives.dll
. So these signatures should be monitored together. In fact, the reason we do not see the uuid crash in 110 is not because it's fixed, but because these users crash earlier in std::sys::windows::rand::fallback_rng
and never reach uuid code.
Assignee | ||
Comment 19•1 year ago
•
|
||
Important: These STR can leave your system in an unbootable and/or corrupt state. Please follow them in a virtual machine.
Important: These STR only apply to Windows 7, which is the main concern for this crash.
Steps to reproduce:
- Take a snapshot of your Windows 7 VM, and boot it.
- Install the Firefox version you want to test.
- If testing Firefox 32-bit on Windows 64-bit, navigate to
C:\Windows\SysWOW64
.
In all other cases, navigate toC:\Windows\System32
instead.
In case of doubt, do the steps in both folders. - Right click on
bcryptprimitives.dll
, clickProperties
, then underSecurity
clickAdvanced
. - Under
Owner
, clickEdit...
, select your current user, clickOK
, clickOK
. - Close
Advanced Security Settings
. - Back in
bcryptprimitives.dll Properties
, underSecurity
, clickEdit...
, selectUsers
, click theAllow
box next toFull control
, clickOK
, clickYes
. - Back in the file explorer, rename
bcryptprimitives.dll
tobcryptprimitivesnotfound.dll
, clickContinue
. - Run Firefox.
- After noticing the problem, restore your VM snapshot.
Problem:
- Firefox refuses to start and crashes immediately (on Windows 7).
Expected:
- Firefox starts (on Windows 7).
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 20•1 year ago
|
||
Below is the stack trace for the first use of BCryptGenRandom
in firefox.exe
currently. We would need to setup a hook before that call.
00 000000ed`295ff688 00007ffb`d0d84f95 bcrypt!BCryptGenRandom
01 (Inline Function) --------`-------- xul!std::collections::hash::map::impl$82::new::KEYS::__getit::closure$0+0xb0 [/rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\thread\local.rs @ 353]
02 (Inline Function) --------`-------- xul!std::thread::local::lazy::LazyKeyInner::initialize+0xb0 [/rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\thread\local.rs @ 809]
03 000000ed`295ff690 00007ffb`d0d95a86 xul!std::thread::local::fast::Key::try_initialize<core::cell::Cell<tuple$<u64,u64> >,std::collections::hash::map::impl$82::new::KEYS::__getit::closure_env$0>+0xb5 [/rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\thread\local.rs @ 987]
04 (Inline Function) --------`-------- xul!std::thread::local::fast::Key::get+0x22 [/rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\thread\local.rs @ 970]
05 000000ed`295ff6d0 00007ffb`d1ed9fdf xul!std::collections::hash::map::impl$82::new::KEYS::__getit+0x26 [/rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\thread\local.rs @ 356]
06 (Inline Function) --------`-------- xul!std::thread::local::LocalKey<core::cell::Cell<tuple$<u64,u64> > >::try_with+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\thread\local.rs @ 444]
07 (Inline Function) --------`-------- xul!std::thread::local::LocalKey<core::cell::Cell<tuple$<u64,u64> > >::with+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\thread\local.rs @ 421]
08 (Inline Function) --------`-------- xul!std::collections::hash::map::RandomState::new+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\collections\hash\map.rs @ 3129]
09 (Inline Function) --------`-------- xul!std::collections::hash::map::impl$87::default+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\collections\hash\map.rs @ 3209]
0a (Inline Function) --------`-------- xul!std::collections::hash::map::impl$8::default+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\collections\hash\map.rs @ 1320]
0b (Inline Function) --------`-------- xul!std::collections::hash::map::HashMap<enum2$<core::option::Option<alloc::string::String> >,log::LevelFilter,std::collections::hash::map::RandomState>::new+0x7 [/rustc/897e37553bba8b42751c67658967889d11ecd120\library\std\src\collections\hash\map.rs @ 235]
0c (Inline Function) --------`-------- xul!env_logger::filter::Builder::new+0x7 [/builds/worker/checkouts/gecko/third_party/rust/env_logger/src/filter/mod.rs @ 175]
0d (Inline Function) --------`-------- xul!env_logger::filter::impl$2::default+0x7 [/builds/worker/checkouts/gecko/third_party/rust/env_logger/src/filter/mod.rs @ 264]
0e (Inline Function) --------`-------- xul!env_logger::impl$9::default+0x7 [/builds/worker/checkouts/gecko/third_party/rust/env_logger/src/lib.rs @ 382]
0f 000000ed`295ff700 00007ffb`d1c75247 xul!env_logger::Builder::new+0xf [/builds/worker/checkouts/gecko/third_party/rust/env_logger/src/lib.rs @ 416]
10 (Inline Function) --------`-------- xul!gecko_logger::GeckoLogger::new+0x5 [/builds/worker/checkouts/gecko/xpcom/rust/gecko_logger/src/lib.rs @ 164]
11 000000ed`295ff740 00007ffb`d0871510 xul!gecko_logger::GeckoLogger::init+0x17 [/builds/worker/checkouts/gecko/xpcom/rust/gecko_logger/src/lib.rs @ 179]
12 (Inline Function) --------`-------- xul!gkrust_shared::GkRust_Init+0x5 [/builds/worker/checkouts/gecko/toolkit/library/rust/shared/lib.rs @ 141]
13 000000ed`295ff9e0 00007ffb`d4f83b1d xul!NS_InitXPCOM+0x60 [/builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp @ 256]
14 (Inline Function) --------`-------- xul!ScopedXPCOMStartup::Initialize+0x1f [/builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp @ 2076]
15 000000ed`295ffb00 00007ffb`d28bfa82 xul!XREMain::XRE_main+0x2ed [/builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp @ 5969]
16 000000ed`295ffbc0 00007ff7`dd00e9ae xul!XRE_main+0x62 [/builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp @ 6029]
17 (Inline Function) --------`-------- firefox!do_main+0xcc [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 226]
18 (Inline Function) --------`-------- firefox!NS_internal_main+0x364 [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 430]
19 000000ed`295ffcc0 00007ff7`dd01cad8 firefox!wmain+0x5fe [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 167]
1a (Inline Function) --------`-------- firefox!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
1b 000000ed`295ffef0 00007ffc`81ef26bd firefox!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
1c 000000ed`295fff30 00007ffc`8384dfb8 KERNEL32!BaseThreadInitThunk+0x1d
1d 000000ed`295fff60 00000000`00000000 ntdll!RtlUserThreadStart+0x28
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 21•1 year ago
|
||
Updated•1 year ago
|
Comment 22•1 year ago
|
||
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/78e7281703a2 Implement a hook-based fallback for BCryptGenRandom to mitigate Rust panics. r=cmartin
Comment 23•1 year ago
|
||
Backed out changeset 78e7281703a2 (Bug 1788004) for causing SM failures.
Failure log: https://treeherder.mozilla.org/logviewer?job_id=406848582&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/609890ddffaef7dbc4c5b64c53292aa2257bf5a3
Comment 24•1 year ago
|
||
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a5eb0191c293 Implement a hook-based fallback for BCryptGenRandom to mitigate Rust panics. r=cmartin
Comment 25•1 year ago
|
||
bugherder |
Updated•1 year ago
|
Comment 26•1 year ago
|
||
Not sure if this is something wanted to try to uplift to Beta or not, but it does graft cleanly if we did.
Assignee | ||
Comment 27•1 year ago
•
|
||
This is now fixed in Nightly:
- Following the STR in comment 19 with Firefox Nightly 2023-02-27-21-47-33 leads to a startup crash with both 32-bit and 64-bit builds.
- Following the STR in comment 19 with Firefox Nightly 2023-02-28-08-53-39 leads to normal startup.
The vast majority of crashes are with 32-bit builds, and the string recovery in the investigation from comment 9 more precisely suggests that this a majority of 32-bit builds running on a 64-bit system. This correlation could be explained by the fact that the Windows system itself can maybe cope with failing to load 32-bit bcryptprimitives.dll
on a 64-bit system, but not with other variations (i.e. corrupt 64-bit DLL on 64-bit system, or corrupt 32-bit DLL on 32-bit system) where it may refuse to boot. Note that although failing to load bcryptprimitives.dll
seems to be the root cause for the vast majority of crashes, BCryptGenRandom
can fail for other reasons, and the patch should fix those situations as well provided that BCryptGenRandom
fails consistently. For the situations where BCryptGenRandom
may fail intermittently, those will only be fixed properly by bug 1816953.
For the hook to take place with 64-bit builds on the versions which I tested, I had to manually add support for some specific instructions in our hooking code. Those instructions were in the 13 first bytes for BCryptGenRandom
on my Windows 7 and Windows 11 64-bit variants of bcrypt.dll
. If we still see crashes with 64-builds after the patch, that could be explained either by BCryptGenRandom
consistent failures and non-supported instructions in our hooking code, and in that case we could look for the 13 first bytes of the bcrypt.dll
version found in the crashes and implement them, or by BCryptGenRandom
intermittent failures, since we only try to hook if the first call to BCryptGenRandom
fails. This is not a problem for 32-bit builds, where BCryptGenRandom
has a hot-patch point, so we can apply a hooking strategy that does not require us to implement specific code for each possible variation of the first bytes of the function.
Assignee | ||
Comment 28•1 year ago
|
||
Comment on attachment 9319295 [details]
Bug 1788004 - Implement a hook-based fallback for BCryptGenRandom to mitigate Rust panics. r=cmartin,handyman
Beta/Release Uplift Approval Request
- User impact if declined: Firefox refuses to start and crashes at launch.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: The STR are listed in comment 19.
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch will only change behavior for the users impacted by this crash. More precisely, we only try to hook BCryptGenRandom if that function fails consistently, which is precisely the situation that leads to this crash. We won't try to add a hook on machines where BCryptGenRandom seems to work normally, so the patch should not impact other users.
- String changes made/needed:
- Is Android affected?: No
Assignee | ||
Updated•1 year ago
|
Comment 29•1 year ago
|
||
Comment on attachment 9319295 [details]
Bug 1788004 - Implement a hook-based fallback for BCryptGenRandom to mitigate Rust panics. r=cmartin,handyman
Approved for 111.0b8
Rejecting release uplift request, next week is RC week for 111 and there's no 110 dot release on the radar before then.
Comment 30•1 year ago
|
||
bugherder uplift |
Updated•1 year ago
|
Comment 31•1 year ago
|
||
I've reproduced the crash using steps from comment 19 using Firefox 110.0 RC on both Windows 7 32 and 64bit VM:
- https://crash-stats.mozilla.org/report/index/3c9215eb-17d7-49af-a727-7cab20230303 (Crash 32bit)
- https://crash-stats.mozilla.org/report/index/9c72e443-4d88-4fa6-be1f-e07c80230303 (Crash 64bit)
I verified that using latest builds Latest Nightly 112.0a1 from 2023-03-02 and Firefox Beta 110.0b8 the crashes no longer occur on both Windows 7 32 and 64bit VM.
Comment 32•1 year ago
|
||
Firefox Beta 110.0b8 the crashes no longer occur on both Windows 7 32 and 64bit VM.
I think this would have been Beta 111.0b8.
Description
•