Multithreaded Wasm hits compartment mismatch crash with Warp changes
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: jryans, Assigned: lth)
References
(Regression, )
Details
(Keywords: csectype-uaf, regression, sec-high, Whiteboard: [adv-main86+][sec-survey])
Attachments
(2 files)
48 bytes,
text/x-phabricator-request
|
jandem
:
feedback+
pascalc
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
227 bytes,
text/plain
|
Details |
Summary
A multithreaded Wasm project I am working on has started crashing in recent Firefox versions with a compartment mismatch. mozregression
identified bug 1660862 (https://hg.mozilla.org/integration/autoland/rev/4d530fda38c294549a80f51db315c9279fdbec77) as the cause.
STR
- Clone the demo repo at https://github.com/jryans/wasm-warp-compartment-mismatch
yarn install
(to get a local HTTP server with COOP / COEP headers)yarn serve
- Go to http://localhost:5000
- Wait ~10s for the large Wasm blob to download
- Crash (reproduces every time)
Notes
- Opening the browser DevTools somehow prevents the crash (perhaps because the debugger sets the JIT down a different path)
Crash reports
- https://crash-stats.mozilla.org/report/index/490a9c3a-53a9-464f-835c-ca7ef0201210
- https://crash-stats.mozilla.org/report/index/2fb6a7f8-941e-4574-800c-bc0da0201210
- https://crash-stats.mozilla.org/report/index/b7111bdd-e17e-4f03-a329-a06280201210
- https://crash-stats.mozilla.org/report/index/f86abc5f-ebb1-4e9d-9c4a-429c30201210
- https://crash-stats.mozilla.org/report/index/ba14e939-4ebf-4aa9-9d86-a8cd20201210
- https://crash-stats.mozilla.org/report/index/605f946a-437e-4cba-872d-214ee0201210
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Lars, is it possible Wasm export functions are incorrectly exposed to other Wasm threads? Can you help clarify the threading situation here?
Assignee | ||
Comment 2•4 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #1)
Lars, is it possible Wasm export functions are incorrectly exposed to other Wasm threads? Can you help clarify the threading situation here?
We are supposed to be sharing code and metadata (because we share the module) but not function objects - those must all be private to the instance, and the instance is private to the thread.
Assignee | ||
Comment 3•4 years ago
•
|
||
I'll take a look, possibly related to bug 1667984. Won't have time until January.
Updated•4 years ago
|
Assignee | ||
Comment 4•4 years ago
|
||
In a local Linux debug build:
Assertion failure: cx_ == cx, at /home/lhansen/m-u/js/src/threading/ProtectedData.cpp:57:
#0 0x00007fffee74c5cc in js::CheckContextLocal::check() const (this=0x7fffa80b9838)
at /home/lhansen/m-u/js/src/threading/ProtectedData.cpp:57
#1 0x00007fffee58566c in js::ProtectedData<js::CheckContextLocal, bool>::ref() const (this=0x7fffa80b9830)
at /home/lhansen/m-u/js/src/threading/ProtectedData.h:149
#2 0x00007fffee585625 in js::ProtectedData<js::CheckContextLocal, bool>::operator bool const&() const (this=0x7fffa80b9830)
at /home/lhansen/m-u/js/src/threading/ProtectedData.h:93
#3 0x00007fffee57f61e in JSContext::isExceptionPending() const (this=0x7fffa80b9000) at /home/lhansen/m-u/js/src/vm/JSContext.h:784
#4 0x00007fffee57d9ad in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (cx=0x7fffa80b9000, native=0x7fffefb1a420 <WasmCall(JSContext*, unsigned int, JS::Value*)>, reason=js::CallReason::Call, args=...)
at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:497
#5 0x00007fffee56a1d0 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
(cx=0x7fffa80b9000, args=..., construct=js::NO_CONSTRUCT, reason=js::CallReason::Call)
at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:594
#6 0x00007fffee56a98f in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)
(cx=0x7fffa80b9000, args=..., reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:647
#7 0x00007fffee56aa37 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (cx=0x7fffa80b9000, fval=..., thisv=..., args=..., rval=..., reason=js::CallReason::Call)
at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:664
#8 0x00007fffef444b64 in js::jit::InvokeFunction(JSContext*, JS::Handle<JSObject*>, bool, bool, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (cx=0x7fffa80b9000, obj=..., constructing=false, ignoresReturnValue=false, argc=2, argv=0x7fffa66e5540, rval=...)
at /home/lhansen/m-u/js/src/jit/VMFunctions.cpp:753
#9 0x00007fffef444e77 in js::jit::InvokeFromInterpreterStub(JSContext*, js::jit::InterpreterStubExitFrameLayout*)
(cx=0x7fffa80b9000, frame=0x7fffa66e5518) at /home/lhansen/m-u/js/src/jit/VMFunctions.cpp:773
#10 0x0000273bc98fb6af in ()
#11 0x0000381669c9d420 in ()
#12 0x00007fffa66e5518 in ()
...
#104 0xfff8800000ca79f0 in ()
#105 0x00007fffa0feb000 in ()
#106 0x00007fffefb1ff2c in js::WasmTableObject::get(JSContext*, unsigned int, JS::Value*) (cx=0x2, argc=14358, vp=0x3042)
at /home/lhansen/m-u/js/src/wasm/WasmJS.cpp:2862
(The end of the stack doesn't make a ton of sense.)
Another crash I've seen is this, no more info so far:
Assertion failure: *stack == reinterpret_cast<Rooted<detail::RootListEntry*>*>(this), at /home/lhansen/m-u/obj-x86_64-pc-linux-gnu/dist/include/js/RootingAPI.h:1134
Assignee | ||
Comment 5•4 years ago
|
||
My preliminary analysis is that the patch cited in comment 0 is wrong (specifically line 3.19) because it patches context-specific code into what is, and should be, a shared jump table, and this is wrong. The change I landed afterwards was really not correct either, it just papers over the race problems we ran into with the original patch.
I need to read the code more in depth to be sure, because this is tricky territory; hope to get to that tomorrow.
Assignee | ||
Comment 6•4 years ago
|
||
"The change I landed afterwards" was in bug 1666568 and amounts to replacing the setJitEntry in Jan's patch with something that is thread-aware: https://hg.mozilla.org/integration/autoland/rev/6a81d2da051d9870405d833303567112ccd17e85
Assignee | ||
Comment 7•4 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #5)
My preliminary analysis is that the patch cited in comment 0 is wrong (specifically line 3.19) because it patches context-specific code into what is, and should be, a shared jump table, and this is wrong. The change I landed afterwards was really not correct either, it just papers over the race problems we ran into with the original patch.
I need to read the code more in depth to be sure, because this is tricky territory; hope to get to that tomorrow.
Specifically, code emitted by generateInterpreterStub computes the context as masm.loadJSContext(), which is defined as
movePtr(ImmPtr(GetJitContext()->runtime->mainContextPtr()), dest);
which is almost certainly inappropriate for code shared across threads. For wasm, the correct behavior here is to load the JSContext* from the wasm tls instead.
This more or less suggests that we could clone getInterpreterStub for wasm with code that gets the context correctly and that this might work with current code, and might not even need my patch, since the code that's patched in would be context-invariant and if we overwrite anything we would be overwriting with the same value.
Assignee | ||
Comment 8•4 years ago
|
||
Ryan, is this program supposed to do something, if it doesn't crash?
Assignee | ||
Comment 9•4 years ago
|
||
Use a context-agnostic interpreterStub for wasm, so as to allow the
stub to be referenced from a table that's shared between threads.
Assignee | ||
Comment 10•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
How about something like this? The patch isn't completely clean yet, I'd want to update the commentary and maybe look for ways of sharing code, and then there's also the question about whether the JS interpreter stub could not be using this code too.
This sits on top of both of the previous fixes: yours to fix Ian's crash and enable better perf, and mine, which is still necessary to avoid the stub->null transition that bit us before. That suggests that there's more work to be done here to clean up how we handle these stubs, but I guess we knew that.
(I also need to check that this does not regress perf again, I guess there's a risk since there's now a different stub address.)
Comment 11•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
This makes sense, but the trampoline's JIT code will be thrown out when the JSContext/thread goes away. Is that guaranteed to outlive the callers of the trampoline? The trampoline is being called from other JS threads after all...
Reporter | ||
Comment 12•4 years ago
•
|
||
(In reply to Lars T Hansen [:lth] from comment #8)
Ryan, is this program supposed to do something, if it doesn't crash?
It's an early attempt at running Seshat (a Matrix message index for searching) in the browser. At the moment, it does not do much beyond printing a few "Seshat" prefixed messages to the console, but it's possible you might not even see those, as it seems to trigger other Firefox-specific Wasm issues related to too many threads allocating too many WebAssembly.Instance
s.
I guess that's a long way of saying it might have further unrelated issues, so for the purposes of this bug, as long as it doesn't crash, that seems like success to me. (Sorry it's not the easiest test case to work with... I wasn't sure which parts were necessary to trigger the problem.)
Assignee | ||
Comment 13•4 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #11)
Comment on attachment 9196058 [details]
Bug 1681724 - Use a different stub for Wasm.This makes sense, but the trampoline's JIT code will be thrown out when the JSContext/thread goes away. Is that guaranteed to outlive the callers of the trampoline? The trampoline is being called from other JS threads after all...
Ouch. The module can be sent on a message channel to other contexts in the same agent cluster, and the originating context can go away after that while the module lives on. The trampoline must have a lifetime at least as long as the module's, ideally it's allocated with the module I guess. Maybe that's the trick we're looking for.
Assignee | ||
Comment 14•4 years ago
|
||
A wrinkle here is that the work Dmitry is doing on speeding up indirect calls in wasm may want less code to be shared across instances (because the code will bake in the context pointer). I think what I'll do here is find a fix for the current problem quickly, but any larger redesign we'll defer until the consequences of Dmitry's work become clearer.
Assignee | ||
Comment 15•4 years ago
|
||
I think we'll manage code for this trampoline the same way we manage code for builtin thunks: create at startup, destroy at shutdown, everyone shares the same trampoline.
Assignee | ||
Comment 16•4 years ago
|
||
... except, masm.failureLabel() is jumped to if C++ returns an error code, and the code at failureLabel is not context-unaware at all, and is indeed platform-specific, and can't be generated until later (requires a JitRuntime, which we don't have when we generate context-unaware code). I guess this code can possibly be further parameterized by the JSContext* which might be returned as the error signal from our modified stub (why not?) but there's quite some surgery required.
Assignee | ||
Comment 17•4 years ago
|
||
After chatting with Jan, we're going to try plan B: we generate a stub that calls out to C++ to look up the context-specific interpreter stub and return it, and then we tailcall (jump to) that stub. This is in a sense nicer than what we have already because it is a local change, it does not depend on setting up the callout stackframe partly in machine code and partly in C++.
Assignee | ||
Comment 18•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
This should be a lot better, it has sensible storage management and delays looking up the context's interpreter stub until the last moment. With this patch, we could possibly undo my previous change that prevented overwriting a non-null entry in the table, but there's further cleanup to do (in other bugs), so that can wait.
A couple things I still don't like here:
- there's a messy construction of the masm in the generator function in WasmBuiltins.cpp for reasons documented there, I'd like to perhaps do something cleaner if there are options (discuss)
- ditto another cleanup item in that file flagged as TODO
- construction of the context-agnostic stub is being ensured at the latest possible moment, just before we install it in the table (or check it with an assertion); i would like to perhaps move this earlier, but I haven't found a good choke point yet, as not all modules are compiled (they can be deserialized from cache).
Comment 19•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
The approach makes sense to me.
I agree the Wasm-stub-generator is messy. What if we generate this code in wasm::GenerateStubs
instead? That already emits the throw-stub and a bunch of other things so this one would be a relative small addition there...
Assignee | ||
Comment 20•4 years ago
|
||
Memo to self: do I properly understand the semantics of the stubs jump tables, eg, is there any risk that anything is serialized? Because if so, then the stub must indeed be per-module. (This code is very tricky and everyone who knew it well have been laid off or have left. More thought needed.)
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 21•4 years ago
|
||
There's rather a lot of documentation now in a patch on bug 1667984 (it's still a little WIP, I don't understand eager stubs yet, but that is not important here). In regards to the present bug:
- what we're looking for here is what's called the "provisional jit-entry" in that documentation and the "context-agnostic interpreter stub" in my WIP patch on this bug
- the jit-entry stubs belong to a module (indeed to a code tier), but they are not going to be serialized with their module and could be created standalone as in my WIP patch here or with the module as Jan suggests; all we care is that there's sharing at least within the module so that we only need to generate one, not one per function. There's possibly a slight advantage for a future improvement I'm thinking about for the stub to be shared globally but really I don't think it'll matter in practice.
- both Jan's earlier patch and my later improvements to that are necessary for correctness: it is only correct - in the sense of what we intend to have happen - to install the provisional jit-entry if the jit-entry table element is null. If we install it unconditionally then - due to sharing - we racily downgrade a stub. While it may be that that would work, or technically, it would never fail, there are side tables containing stubs that are used to check whether stubs need to be recompiled, for example, and it's probably an intended invariant on those tables that they don't hold stubs that aren't present in the table.
So what I'll do here is try to clean up my patch by moving the code generation into the Module to generate with the other stubs, and then put it up for review again; I want this landed for FF88, and the window is closing, esp wrt sec-review and all that.
Assignee | ||
Comment 22•4 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #19)
Comment on attachment 9196058 [details]
I agree the Wasm-stub-generator is messy. What if we generate this code in
wasm::GenerateStubs
instead? That already emits the throw-stub and a bunch of other things so this one would be a relative small addition there...
Yes, and the resulting patch is very clean. Unfortunately it doesn't work (at least not so far) because the generated code bakes in a pointer to C++ (since it needs to call C++) and the code can therefore not be serialized. But if the code is generated with the stubs, it will need to be serialized with the module. Generating it with the builtins solves this problems (since they all have this problem).
Two avenues: Either clean up the existing WIP patch, or try to separate out the initialization code for this one stub while remaining in WasmStubs.cpp. I'll try the latter before I give up and revert to the former.
Updated•4 years ago
|
Assignee | ||
Comment 23•4 years ago
|
||
Ended up cleaning the current WIP patch significantly, I think the result is pretty clean and could likely be uplifted with little risk if we miss the merge window.
Assignee | ||
Comment 24•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Probably hard, we found this because we hit a run-time check in a release build. Even if such checks could be bypassed it's unclear how an exploit would be built.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: FF79 and later
- If not all supported branches, which bug introduced the flaw?: Latent problem resulting from enabling SharedArrayBuffer
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: I expect a backport to FF85, FF86 would be straightforward
- How likely is this patch to cause regressions; how much testing does it need?: Not likely to cause regressions, because it modifies existing logic only slightly.
Comment 25•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
Approved to land and request uplift from Relman.
Updated•4 years ago
|
Assignee | ||
Comment 26•4 years ago
|
||
Current patch grafts onto beta with no conflicts, I'll file an uplift request when we know the patch has stuck to Nightly.
![]() |
||
Comment 27•4 years ago
|
||
Landed: https://hg.mozilla.org/integration/autoland/rev/71b724bd06812b3e3ca99ac21ca0f941374da358
Backed out for causing failure at WasmJS.cpp in non-unified builds:
https://hg.mozilla.org/integration/autoland/rev/a92708095b70aa615871e33dc73fe9dcb5d98bac
Failure log: https://treeherder.mozilla.org/logviewer?job_id=329319408&repo=autoland
/builds/worker/checkouts/gecko/js/src/wasm/WasmJS.cpp:1986:14: error: use of undeclared identifier 'EnsureBuiltinThunksInitialized'; did you mean 'EnsureHelperThreadsInitialized'?
/builds/worker/checkouts/gecko/js/src/wasm/WasmJS.cpp:1989:41: error: use of undeclared identifier 'ProvisionalJitEntryStub'
Assignee | ||
Comment 28•4 years ago
•
|
||
Missing include file for nonunified builds.
![]() |
||
Comment 29•4 years ago
|
||
Shared provisional stub. r=jandem
https://hg.mozilla.org/integration/autoland/rev/d0eef4ce1d433bfa2ce0ef3cb5bdb451e5327a03
![]() |
||
Comment 30•4 years ago
|
||
Assignee | ||
Comment 31•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
Beta/Release Uplift Approval Request
- User impact if declined: Some functionality loss (multi-threaded webassembly) due to crashes or compartment errors; slight risk of security exploit. Sec-approval requested uplift.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: See comment 0 on this bug
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch tweaks existing logic only slightly.
- String changes made/needed:
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 32•4 years ago
|
||
Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem
Approved for out last beta, thanks.
Comment 33•4 years ago
|
||
As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.
Please visit this google form to reply.
Assignee | ||
Updated•4 years ago
|
Comment 34•4 years ago
|
||
uplift |
Comment 35•4 years ago
|
||
I was able to reproduce this bug on macOS 10.15, by using the steps from comment 0, on an affected Nightly build 2020-12-10.
The crash doesn't reproduce anymore on latest Nightly 87 and Beta 86.0b9 (downloaded from treeherder). I've tested this on macOS 10.15, Win 10 x64 and Ubuntu 18 x64.
Updated•4 years ago
|
Comment 36•4 years ago
|
||
Updated•4 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Description
•