Patch Rust crate "getrandom" to fall back to RtlGenRandom
Categories
(Core :: Widget: Win32, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox114 | --- | fixed |
People
(Reporter: cmartin, Assigned: chutten)
References
Details
(Whiteboard: [win:stability])
Attachments
(5 files, 1 obsolete file)
On a small set of machines, BCrypt.dll
and bcryptprimitives.dll
will fail to load, causing everything that relies on BCrypt
exclusively to fail.
This issue has luckily been fixed in Rust std
, but the problem remains open in getrandom
, which is a highly-popular crate used as the backbone of many other important Rust crates, such as uuid
, rand
, and deflate
, which themselves are the base of many important crates.
We should patch getrandom
in our tree to fallback to RtlGenRandom
if BCrypt
fails.
For inspiration, here are some links:
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 1•2 years ago
|
||
Also note that getrandom
has been made aware of the issue, and they seem open to the idea of following what std
did, so it might make sense for whoever authors this change to offer them the patch as well. It would probably be better for everyone if the fix was pushed upstream (and we could eventually not have to maintain this patch).
Reporter | ||
Updated•2 years ago
|
Comment 2•2 years ago
•
|
||
This patch will fix bug 1788004. [:janerik] would have been great for this patch as a continuation of bug 1788004, but it seems that he is currently away until March 13. [:chutten], would someone else on Telemetry side have the time to work on this?
Updated•2 years ago
|
Assignee | ||
Comment 3•2 years ago
|
||
We're all as short on time as you might expect... and also short on expertise. But maybe I can give this a gander.
Assignee | ||
Comment 4•2 years ago
|
||
Alrighty, I've given it a try at https://github.com/rust-random/getrandom/pull/337. Builds and tests on my Windows machine.
Let's try patching Firefox.
Comment 5•2 years ago
•
|
||
Thanks [:chutten], I'll review the proposed changes for Windows-specific knowledge.
Assignee | ||
Comment 6•2 years ago
|
||
Assignee | ||
Comment 7•2 years ago
|
||
I threw it on try and no crashes so far: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=0791bd4f2fe50b600f37c797e8233b08b3c7a0e9
I'm happy to find a reviewer. I'm patching tip-of-tree for getrandom (not 0.2.8) so I'm not quite sure how we want to proceed here.
Updated•2 years ago
|
Comment 8•2 years ago
|
||
I'm happy with the patch that was merged the master
branch of getrandom
thanks to [:chutten]. I was able to generate a custom Rust binary that uses Uuid::new_v4()
, and it didn't panic in a Windows 7 VM where bcryptprimitives.dll
is unavailable. Since it was merged, does this mean we don't need have an in-tree getrandom
?
Assignee | ||
Comment 9•2 years ago
|
||
Well, all of our third-party sources need to be in-tree to support offline builds.
If getrandom
isn't heading towards an imminent v0.2.9 (I hope they are, and I've asked them with my customary subtlety), then we'll still need to point at a specific SHA for our vendoring if we want that fix soonish.
Assignee | ||
Comment 10•2 years ago
|
||
After a conversation on #rust:mozilla.org, it's uncovered that there's no rush here as the same problem's causing issues in stdlib. As we need to wait for stdlib changes which ought to land in time for Firefox 113, that sets the tempo here too.
I'll set a reminder so autonag'll ni?
me on March 20 so I can request tracking when the flag's been added. And maybe there'll have been a getrandom
release by then, so we can bring it in.
Updated•2 years ago
|
Comment 11•2 years ago
|
||
25 days ago, Chris H-C :chutten placed a reminder on the bug using the whiteboard tag [reminder-test 2023-03-20]
.
chutten, please refer to the original comment to better understand the reason for the reminder.
Assignee | ||
Comment 12•2 years ago
|
||
Yannis: given your patch on bug 1788004, are we still looking forward to stdlib's and getrandom's updates here? Should I be asking getrandom
for a release?
Comment 13•2 years ago
•
|
||
(In reply to Chris H-C :chutten from comment #12)
Yannis: given your patch on bug 1788004, are we still looking forward to stdlib's and getrandom's updates here? Should I be asking
getrandom
for a release?
Yes, it is better to have the new stdlib and getrandom with fallbacks. The hook from bug 1788004 is not applied with intermittent failures (probably resulting in the very few crashes we received in 111 there), it may fail due to unhandled instructions in our x64 hooking code, and overall hooking should be considered a last resort solution. We should get the new stdlib and getrandom and add tests that check that stdlib and getrandom have a working fallback (to be futureproof and detect it the day they remove RtlGenRandom). Then we can remove the hook as long as the tests pass (and restore it in the future if unfortunately the situation changes). I can help or work on writing these tests.
Assignee | ||
Comment 14•2 years ago
|
||
First step would be to ask for a getrandom
release, I suppose. Do you have any contacts on that crate, or should I file an issue? (There's no contact information anywhere, so I'm not sure how else to ask)
Comment 16•2 years ago
•
|
||
(In reply to Chris H-C :chutten from comment #14)
First step would be to ask for a
getrandom
release, I suppose. Do you have any contacts on that crate, or should I file an issue? (There's no contact information anywhere, so I'm not sure how else to ask)
I confirm that our current Rust stdlib when I build Nightly uses a fallback based on RtlGenRandom
, so now we are only waiting for a new getrandom
release indeed. Neither me nor [:cmartin] have special contacts, so filing an issue sounds good.
Assignee | ||
Comment 17•2 years ago
|
||
Fingers crossed: https://github.com/rust-random/getrandom/issues/349
All else fails we can pin to a SHA on the repo (wouldn't be the first or the final time we do that), but I'd like it better if it were in a release.
Comment 18•2 years ago
|
||
(In reply to Yannis Juglaret [:yannis] from comment #13)
We should get the new stdlib and getrandom and add tests that check that stdlib and getrandom have a working fallback (to be futureproof and detect it the day they remove RtlGenRandom). Then we can remove the hook as long as the tests pass (and restore it in the future if unfortunately the situation changes).
So about these tests, I think we would need a C++ gtest calling into Rust code. The C++ test would:
- intercept calls to
BCryptGenRandom
(similar to what I did in the patch for the current crash) to make the function always return an error; - intercept calls to
RtlGenRandom
to set a boolean whenever a call occurs; - call into
Uuid::new_v4
and check that we went throughRtlGenRandom
; - call into a custom Rust function that uses a stdlib hashmap and check that we went through
RtlGenRandom
.
This would allow us to automatically notice changes in the stdlib and getrandom implementations that could regress the crash. Would you like to give this a try, [:chutten]? I'm happy to help.
Assignee | ||
Comment 19•2 years ago
|
||
Alas, I'm absolutely swamped. Sounds like an excellent task for anyone in the intersection of Windows and Rust, though.
Comment 20•2 years ago
|
||
OK, I'll try this myself and see how this goes.
Comment 21•2 years ago
|
||
Comment 22•2 years ago
|
||
In bug 1788004, we started hooking BCryptGenRandom on the machines where
calling it for the first time fails. This was useful to mitigate Rust
panics linked to RNG function failures in the Rust stdlib and
in the getrandom crate. Both now have proper fallbacks again, so we can
remove our hook.
Depends on D174965
Comment 23•2 years ago
|
||
Our hooking code was misinterpreting 48 FF 15 as the beginning of a
JUMP/4 instruction, but it's a CALL/4. We must check more bits on the
third byte to distinguish a JMP from a CALL, and from other instructions
that also fall into the pattern we were matching.
Depends on D174966
Comment 24•2 years ago
|
||
We have had a lot of back and forth with the Rust stdlib and getrandom
crate removing RtlGenRandom as a fallback to BCryptGenRandom. We need
this fallback in particular for Windows 7 machines, where the two
functions are completely independent. Some users seem to be unable to
load the 32-bit variant of bcryptprimitives.dll on their machine when using
Firefox 32-bit on Windows 7 64-bit and the RtlGenRandom fallback saves
them. This led to a lot of crashes that are Rust panics in bug 1788004.
This patch adds a gtest that ensures that RtlGenRandom is called as a
fallback in case of BCryptGenRandom failures, so that we can
automatically detect a new hypothetical removal of the RtlGenRandom
fallbacks in the future.
Depends on D174967
Comment 25•2 years ago
|
||
The getrandom release has arrived! My tests currently work locally but I have failures on try, so WIP status for the moment. I understood that [:chutten] is on PTO next week, so I'm proposing other people as reviewers. Feel free to correct me if I'm wrong.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 26•2 years ago
|
||
We were using a nice high-level syntax for our jumper:
gDetouredCall(aCallee);
We were relying on compiling this to:
jmp qword ptr [rip + offset gDetouredCall]
Unfortunately, in mingwclang the compiler was adding useless
instructions before the jmp, thus breaking the test. So we now use a
less friendly syntax to achieve the same goal.
Depends on D174968
Comment 27•2 years ago
|
||
This looks like a -fomit-frame-pointer
/ -fno-omit-frame-pointer
thing: https://godbolt.org/z/EavP1Tjxz maybe the default is different depending on the compiler?
Comment 28•2 years ago
|
||
Comment 29•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/664508850a28
https://hg.mozilla.org/mozilla-central/rev/dffff6afac9e
https://hg.mozilla.org/mozilla-central/rev/e5bb3a791721
https://hg.mozilla.org/mozilla-central/rev/b65f15660566
https://hg.mozilla.org/mozilla-central/rev/0cdaa9edab36
Description
•