Closed Bug 1681724 (CVE-2021-23970) Opened 2 years ago Closed 2 years ago

Multithreaded Wasm hits compartment mismatch crash with Warp changes

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

defect

Tracking

()

VERIFIED FIXED
87 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- wontfix
firefox86 + verified
firefox87 + verified

People

(Reporter: jryans, Assigned: lth)

References

(Regression, )

Details

(Keywords: csectype-uaf, regression, sec-high, Whiteboard: [adv-main86+][sec-survey])

Attachments

(2 files)

Summary

A multithreaded Wasm project I am working on has started crashing in recent Firefox versions with a compartment mismatch. mozregression identified bug 1660862 (https://hg.mozilla.org/integration/autoland/rev/4d530fda38c294549a80f51db315c9279fdbec77) as the cause.

STR

  1. Clone the demo repo at https://github.com/jryans/wasm-warp-compartment-mismatch
  2. yarn install (to get a local HTTP server with COOP / COEP headers)
  3. yarn serve
  4. Go to http://localhost:5000
  5. Wait ~10s for the large Wasm blob to download
  6. Crash (reproduces every time)

Notes

  • Opening the browser DevTools somehow prevents the crash (perhaps because the debugger sets the JIT down a different path)

Crash reports

Flags: needinfo?(jdemooij)

Lars, is it possible Wasm export functions are incorrectly exposed to other Wasm threads? Can you help clarify the threading situation here?

Flags: needinfo?(jdemooij) → needinfo?(lhansen)

(In reply to Jan de Mooij [:jandem] from comment #1)

Lars, is it possible Wasm export functions are incorrectly exposed to other Wasm threads? Can you help clarify the threading situation here?

We are supposed to be sharing code and metadata (because we share the module) but not function objects - those must all be private to the instance, and the instance is private to the thread.

I'll take a look, possibly related to bug 1667984. Won't have time until January.

Assignee: nobody → lhansen
Group: core-security
Severity: -- → S3
Status: NEW → ASSIGNED
Component: JavaScript Engine: JIT → Javascript: WebAssembly
Flags: needinfo?(lhansen)
Priority: -- → P2
Group: core-security → javascript-core-security
Keywords: csectype-uaf
Summary: Multithreaded Wasm hits component mismatch crash with Warp changes → Multithreaded Wasm hits compartment mismatch crash with Warp changes

In a local Linux debug build:

Assertion failure: cx_ == cx, at /home/lhansen/m-u/js/src/threading/ProtectedData.cpp:57:

#0  0x00007fffee74c5cc in js::CheckContextLocal::check() const (this=0x7fffa80b9838)
    at /home/lhansen/m-u/js/src/threading/ProtectedData.cpp:57
#1  0x00007fffee58566c in js::ProtectedData<js::CheckContextLocal, bool>::ref() const (this=0x7fffa80b9830)
    at /home/lhansen/m-u/js/src/threading/ProtectedData.h:149
#2  0x00007fffee585625 in js::ProtectedData<js::CheckContextLocal, bool>::operator bool const&() const (this=0x7fffa80b9830)
    at /home/lhansen/m-u/js/src/threading/ProtectedData.h:93
#3  0x00007fffee57f61e in JSContext::isExceptionPending() const (this=0x7fffa80b9000) at /home/lhansen/m-u/js/src/vm/JSContext.h:784
#4  0x00007fffee57d9ad in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (cx=0x7fffa80b9000, native=0x7fffefb1a420 <WasmCall(JSContext*, unsigned int, JS::Value*)>, reason=js::CallReason::Call, args=...)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:497
#5  0x00007fffee56a1d0 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
    (cx=0x7fffa80b9000, args=..., construct=js::NO_CONSTRUCT, reason=js::CallReason::Call)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:594
#6  0x00007fffee56a98f in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)
    (cx=0x7fffa80b9000, args=..., reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:647
#7  0x00007fffee56aa37 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (cx=0x7fffa80b9000, fval=..., thisv=..., args=..., rval=..., reason=js::CallReason::Call)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:664
#8  0x00007fffef444b64 in js::jit::InvokeFunction(JSContext*, JS::Handle<JSObject*>, bool, bool, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (cx=0x7fffa80b9000, obj=..., constructing=false, ignoresReturnValue=false, argc=2, argv=0x7fffa66e5540, rval=...)
    at /home/lhansen/m-u/js/src/jit/VMFunctions.cpp:753
#9  0x00007fffef444e77 in js::jit::InvokeFromInterpreterStub(JSContext*, js::jit::InterpreterStubExitFrameLayout*)
    (cx=0x7fffa80b9000, frame=0x7fffa66e5518) at /home/lhansen/m-u/js/src/jit/VMFunctions.cpp:773
#10 0x0000273bc98fb6af in  ()
#11 0x0000381669c9d420 in  ()
#12 0x00007fffa66e5518 in  ()
...
#104 0xfff8800000ca79f0 in  ()
#105 0x00007fffa0feb000 in  ()
#106 0x00007fffefb1ff2c in js::WasmTableObject::get(JSContext*, unsigned int, JS::Value*) (cx=0x2, argc=14358, vp=0x3042)
    at /home/lhansen/m-u/js/src/wasm/WasmJS.cpp:2862

(The end of the stack doesn't make a ton of sense.)

Another crash I've seen is this, no more info so far:

Assertion failure: *stack == reinterpret_cast<Rooted<detail::RootListEntry*>*>(this), at /home/lhansen/m-u/obj-x86_64-pc-linux-gnu/dist/include/js/RootingAPI.h:1134

My preliminary analysis is that the patch cited in comment 0 is wrong (specifically line 3.19) because it patches context-specific code into what is, and should be, a shared jump table, and this is wrong. The change I landed afterwards was really not correct either, it just papers over the race problems we ran into with the original patch.

I need to read the code more in depth to be sure, because this is tricky territory; hope to get to that tomorrow.

"The change I landed afterwards" was in bug 1666568 and amounts to replacing the setJitEntry in Jan's patch with something that is thread-aware: https://hg.mozilla.org/integration/autoland/rev/6a81d2da051d9870405d833303567112ccd17e85

(In reply to Lars T Hansen [:lth] from comment #5)

My preliminary analysis is that the patch cited in comment 0 is wrong (specifically line 3.19) because it patches context-specific code into what is, and should be, a shared jump table, and this is wrong. The change I landed afterwards was really not correct either, it just papers over the race problems we ran into with the original patch.

I need to read the code more in depth to be sure, because this is tricky territory; hope to get to that tomorrow.

Specifically, code emitted by generateInterpreterStub computes the context as masm.loadJSContext(), which is defined as

  movePtr(ImmPtr(GetJitContext()->runtime->mainContextPtr()), dest);

which is almost certainly inappropriate for code shared across threads. For wasm, the correct behavior here is to load the JSContext* from the wasm tls instead.

This more or less suggests that we could clone getInterpreterStub for wasm with code that gets the context correctly and that this might work with current code, and might not even need my patch, since the code that's patched in would be context-invariant and if we overwrite anything we would be overwriting with the same value.

Ryan, is this program supposed to do something, if it doesn't crash?

Flags: needinfo?(jryans)

Use a context-agnostic interpreterStub for wasm, so as to allow the
stub to be referenced from a table that's shared between threads.

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

How about something like this? The patch isn't completely clean yet, I'd want to update the commentary and maybe look for ways of sharing code, and then there's also the question about whether the JS interpreter stub could not be using this code too.

This sits on top of both of the previous fixes: yours to fix Ian's crash and enable better perf, and mine, which is still necessary to avoid the stub->null transition that bit us before. That suggests that there's more work to be done here to clean up how we handle these stubs, but I guess we knew that.

(I also need to check that this does not regress perf again, I guess there's a risk since there's now a different stub address.)

Attachment #9196058 - Flags: feedback?(jdemooij)

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

This makes sense, but the trampoline's JIT code will be thrown out when the JSContext/thread goes away. Is that guaranteed to outlive the callers of the trampoline? The trampoline is being called from other JS threads after all...

Flags: needinfo?(lhansen)
Attachment #9196058 - Flags: feedback?(jdemooij) → feedback+

(In reply to Lars T Hansen [:lth] from comment #8)

Ryan, is this program supposed to do something, if it doesn't crash?

It's an early attempt at running Seshat (a Matrix message index for searching) in the browser. At the moment, it does not do much beyond printing a few "Seshat" prefixed messages to the console, but it's possible you might not even see those, as it seems to trigger other Firefox-specific Wasm issues related to too many threads allocating too many WebAssembly.Instances.

I guess that's a long way of saying it might have further unrelated issues, so for the purposes of this bug, as long as it doesn't crash, that seems like success to me. (Sorry it's not the easiest test case to work with... I wasn't sure which parts were necessary to trigger the problem.)

Flags: needinfo?(jryans)

(In reply to Jan de Mooij [:jandem] from comment #11)

Comment on attachment 9196058 [details]
Bug 1681724 - Use a different stub for Wasm.

This makes sense, but the trampoline's JIT code will be thrown out when the JSContext/thread goes away. Is that guaranteed to outlive the callers of the trampoline? The trampoline is being called from other JS threads after all...

Ouch. The module can be sent on a message channel to other contexts in the same agent cluster, and the originating context can go away after that while the module lives on. The trampoline must have a lifetime at least as long as the module's, ideally it's allocated with the module I guess. Maybe that's the trick we're looking for.

Flags: needinfo?(lhansen)
Keywords: sec-high

A wrinkle here is that the work Dmitry is doing on speeding up indirect calls in wasm may want less code to be shared across instances (because the code will bake in the context pointer). I think what I'll do here is find a fix for the current problem quickly, but any larger redesign we'll defer until the consequences of Dmitry's work become clearer.

I think we'll manage code for this trampoline the same way we manage code for builtin thunks: create at startup, destroy at shutdown, everyone shares the same trampoline.

... except, masm.failureLabel() is jumped to if C++ returns an error code, and the code at failureLabel is not context-unaware at all, and is indeed platform-specific, and can't be generated until later (requires a JitRuntime, which we don't have when we generate context-unaware code). I guess this code can possibly be further parameterized by the JSContext* which might be returned as the error signal from our modified stub (why not?) but there's quite some surgery required.

After chatting with Jan, we're going to try plan B: we generate a stub that calls out to C++ to look up the context-specific interpreter stub and return it, and then we tailcall (jump to) that stub. This is in a sense nicer than what we have already because it is a local change, it does not depend on setting up the callout stackframe partly in machine code and partly in C++.

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

This should be a lot better, it has sensible storage management and delays looking up the context's interpreter stub until the last moment. With this patch, we could possibly undo my previous change that prevented overwriting a non-null entry in the table, but there's further cleanup to do (in other bugs), so that can wait.

A couple things I still don't like here:

  • there's a messy construction of the masm in the generator function in WasmBuiltins.cpp for reasons documented there, I'd like to perhaps do something cleaner if there are options (discuss)
  • ditto another cleanup item in that file flagged as TODO
  • construction of the context-agnostic stub is being ensured at the latest possible moment, just before we install it in the table (or check it with an assertion); i would like to perhaps move this earlier, but I haven't found a good choke point yet, as not all modules are compiled (they can be deserialized from cache).
Attachment #9196058 - Flags: feedback+ → feedback?(jdemooij)

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

The approach makes sense to me.

I agree the Wasm-stub-generator is messy. What if we generate this code in wasm::GenerateStubs instead? That already emits the throw-stub and a bunch of other things so this one would be a relative small addition there...

Attachment #9196058 - Flags: feedback?(jdemooij) → feedback+

Memo to self: do I properly understand the semantics of the stubs jump tables, eg, is there any risk that anything is serialized? Because if so, then the stub must indeed be per-module. (This code is very tricky and everyone who knew it well have been laid off or have left. More thought needed.)

Priority: P2 → P1

There's rather a lot of documentation now in a patch on bug 1667984 (it's still a little WIP, I don't understand eager stubs yet, but that is not important here). In regards to the present bug:

  • what we're looking for here is what's called the "provisional jit-entry" in that documentation and the "context-agnostic interpreter stub" in my WIP patch on this bug
  • the jit-entry stubs belong to a module (indeed to a code tier), but they are not going to be serialized with their module and could be created standalone as in my WIP patch here or with the module as Jan suggests; all we care is that there's sharing at least within the module so that we only need to generate one, not one per function. There's possibly a slight advantage for a future improvement I'm thinking about for the stub to be shared globally but really I don't think it'll matter in practice.
  • both Jan's earlier patch and my later improvements to that are necessary for correctness: it is only correct - in the sense of what we intend to have happen - to install the provisional jit-entry if the jit-entry table element is null. If we install it unconditionally then - due to sharing - we racily downgrade a stub. While it may be that that would work, or technically, it would never fail, there are side tables containing stubs that are used to check whether stubs need to be recompiled, for example, and it's probably an intended invariant on those tables that they don't hold stubs that aren't present in the table.

So what I'll do here is try to clean up my patch by moving the code generation into the Module to generate with the other stubs, and then put it up for review again; I want this landed for FF88, and the window is closing, esp wrt sec-review and all that.

(In reply to Jan de Mooij [:jandem] from comment #19)

Comment on attachment 9196058 [details]

I agree the Wasm-stub-generator is messy. What if we generate this code in wasm::GenerateStubs instead? That already emits the throw-stub and a bunch of other things so this one would be a relative small addition there...

Yes, and the resulting patch is very clean. Unfortunately it doesn't work (at least not so far) because the generated code bakes in a pointer to C++ (since it needs to call C++) and the code can therefore not be serialized. But if the code is generated with the stubs, it will need to be serialized with the module. Generating it with the builtins solves this problems (since they all have this problem).

Two avenues: Either clean up the existing WIP patch, or try to separate out the initialization code for this one stub while remaining in WasmStubs.cpp. I'll try the latter before I give up and revert to the former.

Attachment #9196058 - Attachment description: Bug 1681724 - Use a different stub for Wasm. → Bug 1681724 - Better provisional jit-entry stub for Wasm r?jandem

Ended up cleaning the current WIP patch significantly, I think the result is pretty clean and could likely be uplifted with little risk if we miss the merge window.

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Probably hard, we found this because we hit a run-time check in a release build. Even if such checks could be bypassed it's unclear how an exploit would be built.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: FF79 and later
  • If not all supported branches, which bug introduced the flaw?: Latent problem resulting from enabling SharedArrayBuffer
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: I expect a backport to FF85, FF86 would be straightforward
  • How likely is this patch to cause regressions; how much testing does it need?: Not likely to cause regressions, because it modifies existing logic only slightly.
Attachment #9196058 - Flags: sec-approval?

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

Approved to land and request uplift from Relman.

Attachment #9196058 - Flags: sec-approval? → sec-approval+
Attachment #9196058 - Attachment description: Bug 1681724 - Better provisional jit-entry stub for Wasm r?jandem → Bug 1681724 - Shared provisional stub. r?jandem

Current patch grafts onto beta with no conflicts, I'll file an uplift request when we know the patch has stuck to Nightly.

Landed: https://hg.mozilla.org/integration/autoland/rev/71b724bd06812b3e3ca99ac21ca0f941374da358

Backed out for causing failure at WasmJS.cpp in non-unified builds:

https://hg.mozilla.org/integration/autoland/rev/a92708095b70aa615871e33dc73fe9dcb5d98bac

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=71b724bd06812b3e3ca99ac21ca0f941374da358

Failure log: https://treeherder.mozilla.org/logviewer?job_id=329319408&repo=autoland
/builds/worker/checkouts/gecko/js/src/wasm/WasmJS.cpp:1986:14: error: use of undeclared identifier 'EnsureBuiltinThunksInitialized'; did you mean 'EnsureHelperThreadsInitialized'?
/builds/worker/checkouts/gecko/js/src/wasm/WasmJS.cpp:1989:41: error: use of undeclared identifier 'ProvisionalJitEntryStub'

Flags: needinfo?(lhansen)

Missing include file for nonunified builds.

Flags: needinfo?(lhansen)
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

Beta/Release Uplift Approval Request

  • User impact if declined: Some functionality loss (multi-threaded webassembly) due to crashes or compartment errors; slight risk of security exploit. Sec-approval requested uplift.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See comment 0 on this bug
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch tweaks existing logic only slightly.
  • String changes made/needed:
Attachment #9196058 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9196058 [details]
Bug 1681724 - Shared provisional stub. r?jandem

Approved for out last beta, thanks.

Attachment #9196058 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(lhansen)
Whiteboard: [sec-survey]
Flags: needinfo?(lhansen)

I was able to reproduce this bug on macOS 10.15, by using the steps from comment 0, on an affected Nightly build 2020-12-10.

The crash doesn't reproduce anymore on latest Nightly 87 and Beta 86.0b9 (downloaded from treeherder). I've tested this on macOS 10.15, Win 10 x64 and Ubuntu 18 x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Whiteboard: [sec-survey] → [adv-main86+][sec-survey]
Attached file advisory.txt
Alias: CVE-2021-23970
Group: core-security-release
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.