Closed Bug 1666568 (CVE-2020-15681) Opened 4 years ago Closed 4 years ago

WebAssembly threads crash on release assert for CurrentThreadCanAccessZone

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
83 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- wontfix
firefox82 --- fixed
firefox83 --- fixed

People

(Reporter: alonzakai, Assigned: lth)

References

Details

(Keywords: csectype-race, sec-moderate, Whiteboard: [adv-main82+][post-critsmash-triage])

Crash Data

Attachments

(3 files)

Attached file emscripten_test.zip

Example crash report: https://crash-stats.mozilla.org/report/index/a028e387-9af2-4489-a85d-6ede40200922

Top 10 frames of crashing thread:

0 libxul.so js::ObjectGroup::defaultNewGroup js/src/vm/ObjectGroup.cpp:541
1 libxul.so js::NewObjectWithGivenTaggedProto js/src/vm/JSObject.cpp:857
2 libxul.so js::SavedFrame::create js/src/vm/SavedStacks.cpp:563
3 libxul.so js::SavedStacks::getOrCreateSavedFrame js/src/vm/SavedStacks.cpp:1753
4 libxul.so js::SavedStacks::saveCurrentStack js/src/vm/SavedStacks.cpp:1307
5 libxul.so js::CaptureStack js/src/jsexn.cpp:221
6 libxul.so js::ErrorToException js/src/jsexn.cpp:331
7 libxul.so js::ReportErrorNumberVA js/src/vm/ErrorReporting.cpp:477
8 libxul.so JS_ReportErrorNumberASCII js/src/jsapi.cpp:4702
9 libxul.so js::ReportOverRecursed js/src/vm/JSContext.cpp:309

This started to crash consistently for us on Emscripten CI, which runs Firefox Dev edition, so this may be a regression that happened in the last few days.

The crashes almost always have

MOZ_RELEASE_ASSERT(CurrentThreadCanAccessZone(zone))

failing in them.

To reproduce in emscripten,

./tests/runner.py browser.test_pthread_create

Attached is a zip of the build files, just run a webserver and browse to the HTML. (Will need either a COOP/COEP-enabled server, or to flip the "insecure" flag, happens either way.)

Note that this is specific to webassembly as building to JS (-s WASM=0) does not hit the problem.

Comment 1 is helpful, thanks - that makes it more mysterious.

Severity: -- → S3
OS: Unspecified → Linux
Priority: -- → P1
Hardware: Unspecified → x86_64

I'm inclined to think that this is a corrupted cx resulting from some dodgy code from bug 1639153, just backed out - will try to verify that.

Not fixed after backout of that bug, something else.

Debug build: Assertion failure: *callee.wasmJitEntry() != interpStub, at /home/lhansen/m-u/js/src/wasm/WasmInstance.cpp:1975

Assignee: nobody → lhansen
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

In a debug build, ./mach run --disable-e10s --debug and step past the SIGILL and we end up as in comment 5 with a stack:

#0  0x00007fffefd2ed12 in GetInterpEntry(JSContext*, js::wasm::Instance&, unsigned int, JS::CallArgs, void**, js::wasm::FuncType const**)
    (cx=0x7fffad765000, instance=..., funcIndex=28, args=..., interpEntry=0x7ffe285db990, funcType=0x7ffe285db988) at /home/lhansen/m-u/js/src/wasm/WasmInstance.cpp:1975
#1  0x00007fffefd2dfa6 in js::wasm::Instance::callExport(JSContext*, unsigned int, JS::CallArgs) (this=0x7fffb1418980, cx=0x7fffad765000, funcIndex=28, args=...)
    at /home/lhansen/m-u/js/src/wasm/WasmInstance.cpp:2124
#2  0x00007fffefdaf903 in WasmCall(JSContext*, unsigned int, JS::Value*) (cx=0x7fffad765000, argc=1, vp=0x7ffe285dc7a0) at /home/lhansen/m-u/js/src/wasm/WasmJS.cpp:1896
#3  0x00007fffee750a23 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&)
    (cx=0x7fffad765000, native=0x7fffefdaf820 <WasmCall(JSContext*, unsigned int, JS::Value*)>, reason=js::CallReason::Call, args=...) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:508
#4  0x00007fffee73a80a in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
    (cx=0x7fffad765000, args=..., construct=js::NO_CONSTRUCT, reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:600
#5  0x00007fffee73afce in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (cx=0x7fffad765000, args=..., reason=js::CallReason::Call)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:665
#6  0x00007fffee73b077 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)
    (cx=0x7fffad765000, fval=..., thisv=..., args=..., rval=..., reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:682
#7  0x00007fffeeb6fefd in js::fun_apply(JSContext*, unsigned int, JS::Value*) (cx=0x7fffad765000, argc=2, vp=0x7fffb1425168) at /home/lhansen/m-u/js/src/vm/JSFunction.cpp:1209
#8  0x00007fffee750a23 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&)
    (cx=0x7fffad765000, native=0x7fffeeb6f850 <js::fun_apply(JSContext*, unsigned int, JS::Value*)>, reason=js::CallReason::Call, args=...) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:508
#9  0x00007fffee73a80a in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
    (cx=0x7fffad765000, args=..., construct=js::NO_CONSTRUCT, reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:600
#10 0x00007fffee73afce in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (cx=0x7fffad765000, args=..., reason=js::CallReason::Call)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:665
#11 0x00007fffee73ada2 in js::CallFromStack(JSContext*, JS::CallArgs const&) (cx=0x7fffad765000, args=...) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:669
#12 0x00007fffee72e0ac in Interpret(JSContext*, js::RunState&) (cx=0x7fffad765000, state=...) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:3337
#13 0x00007fffee722cc9 in js::RunScript(JSContext*, js::RunState&) (cx=0x7fffad765000, state=...) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:469
#14 0x00007fffee73aa78 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
    (cx=0x7fffad765000, args=..., construct=js::NO_CONSTRUCT, reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:637
#15 0x00007fffee73afce in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (cx=0x7fffad765000, args=..., reason=js::CallReason::Call)
    at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:665
#16 0x00007fffee73b077 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)
    (cx=0x7fffad765000, fval=..., thisv=..., args=..., rval=..., reason=js::CallReason::Call) at /home/lhansen/m-u/js/src/vm/Interpreter.cpp:682
#17 0x00007fffee8c4829 in JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)
    (cx=0x7fffad765000, thisv=..., fval=..., args=..., rval=...) at /home/lhansen/m-u/js/src/jsapi.cpp:2821
#18 0x00007fffe8f1e609 in mozilla::dom::EventHandlerNonNull::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::Event&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&)
    (this=0x7fffaca9ba80, cx=..., aThisVal=..., event=..., aRetVal=..., aRv=...) at EventHandlerBinding.cpp:278
#19 0x00007fffe995c02c in mozilla::dom::EventHandlerNonNull::Call<nsCOMPtr<mozilla::dom::EventTarget> >(nsCOMPtr<mozilla::dom::EventTarget> const&, mozilla::dom::Event&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JS::Realm*)
    (this=0x7fffaca9ba80, thisVal=..., event=..., aRetVal=..., aRv=..., aExecutionReason=0x7fffdf54b8c4 "EventHandlerNonNull", aExceptionHandling=mozilla::dom::CallbackObject::eReportExceptions, 
    aRealm=0x0) at /home/lhansen/m-u/obj-x86_64-pc-linux-gnu/dist/include/mozilla/dom/EventHandlerBinding.h:367
#20 0x00007fffe99475fa in mozilla::JSEventHandler::HandleEvent(mozilla::dom::Event*) (this=0x7fffa9f26a30, aEvent=0x7fffacad0920) at /home/lhansen/m-u/dom/events/JSEventHandler.cpp:201
...
#32 0x00007fffeac74fa4 in mozilla::dom::WorkerRunnable::Run() (this=0x7fffa9f37030) at /home/lhansen/m-u/dom/workers/WorkerRunnable.cpp:370
#33 0x00007fffe4e96459 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fffaca54800, aMayWait=false, aResult=0x7ffe285e1fa7) at /home/lhansen/m-u/xpcom/threads/nsThread.cpp:1234
#34 0x00007fffe4e9c163 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fffaca54800, aMayWait=false) at /home/lhansen/m-u/xpcom/threads/nsThreadUtils.cpp:513
#35 0x00007fffeac63ebe in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) (this=0x7fffbb8c2800, aCx=0x7fffad765000) at /home/lhansen/m-u/dom/workers/WorkerPrivate.cpp:2982
#36 0x00007fffeac21f0a in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() (this=0x7fffb2128800) at /home/lhansen/m-u/dom/workers/RuntimeService.cpp:2218
#37 0x00007fffe4e96459 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fffaca54800, aMayWait=false, aResult=0x7ffe285e2a67) at /home/lhansen/m-u/xpcom/threads/nsThread.cpp:1234
#38 0x00007fffe4e9c163 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fffaca54800, aMayWait=false) at /home/lhansen/m-u/xpcom/threads/nsThreadUtils.cpp:513
#39 0x00007fffe5d263ba in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (this=0x7fffb21289c0, aDelegate=0x7ffe285e2c78)
    at /home/lhansen/m-u/ipc/glue/MessagePump.cpp:302
#40 0x00007fffe5c1c087 in MessageLoop::RunInternal() (this=0x7ffe285e2c78) at /home/lhansen/m-u/ipc/chromium/src/base/message_loop.cc:334
#41 0x00007fffe5c1c005 in MessageLoop::RunHandler() (this=0x7ffe285e2c78) at /home/lhansen/m-u/ipc/chromium/src/base/message_loop.cc:327
#42 0x00007fffe5c1bfc3 in MessageLoop::Run() (this=0x7ffe285e2c78) at /home/lhansen/m-u/ipc/chromium/src/base/message_loop.cc:309
#43 0x00007fffe4e92520 in nsThread::ThreadFunc(void*) (aArg=0x7fffffff1730) at /home/lhansen/m-u/xpcom/threads/nsThread.cpp:442
#44 0x00007ffff7a57df0 in _pt_root (arg=0x7fffca5c8700) at /home/lhansen/m-u/nsprpub/pr/src/pthreads/ptthread.c:201
#45 0x00007ffff7f8f432 in start_thread () at /lib64/libpthread.so.0
#46 0x00007ffff7b65913 in clone () at /lib64/libc.so.6

That is, we're definitely in a worker. One possibility is that there are assumptions somewhere here that are not completely thread-aware. That would correspond nicely with the originally reported error, which seems thread-related.

Group: core-security

There is a table of lazy stubs for exported functions. A call to Instance::getExportedFunction will initialize the lazy stub entry to a sentinel (the pointer to the JS baseline compiler's interpreter stub) and a later call to callExport will create the actual stub and replace the entry. This allows for various optimizations in stub creation, and the creation of fewer stubs. getExportedFunction will store the function it creates in a per-instance table, and if the table is present then it is used the next time the exported function is gotten, and this will also prevent the lazy stubs table from being overwritten; stubs created by callExport will stay in the table.

The lazy stubs table for a module is however shared among all the instances that share the module. Since the table of created functions is per-instance, a function having been put in it in one instance does not prevent another instance from not finding a function and thus overwriting the entry in the shared stubs table.

It is possible for this to happen in non-threaded code (and I'll try to create a test case) but massively threaded code makes this much more likely to happen.

(This is clearly a serious functionality bug, but whether it's exploitable is open to question. It's a long-standing problem, this code is not new.)

Preventing the overwrite makes the crash go away. For this I used a cmpxchg, but I suspect this is not the solution we want. Part of the problem is that code retains pointers into the table, so we need to worry about whether loads from the table need to be atomic, or if this is just a matter of locking the table during updates since updates are essentially monotonic null -> sentinel -> final value, or what. Initializing the table with the sentinel might also work; on the other hand, there may be other, similar, cases to consider.

After some trivial plumbing it becomes clear that every other use of setJitEntry (and setTieringEntry for that matter) than the one in getExportedFunction either updates the stubs table under a CodeTier lock or does not need to lock (because init code). Readers do not need to lock or synchronize because we assume the element read and write are copy-atomic; so long as a write eventually gets to the memory we're fine. This is technically racy but (modulo very clever compilers, as usual) benign. If getExportedFunction were to lock it would be able to inspect the value to see if it were null and could thus avoid the overwrite without resorting to an atomic operation.

In principle the CodeTier lock is the wrong lock, because the stubs table is per-code; the mechanism needs to use a lock that is per-code. It is possible that this new lock has to be acquired before any of the tier locks in a situation where that is a concern. But another possibility is that we use the stable tier lock as the lock to control the jump table, this would probably be simplest.

We could keep the structure of the code as it is by plumbing a lock witness into setJitEntry, probably. The alternative is to move the jump table functions from the Code to the CodeTier and expose them only on the stable tier. Not sure how much I like that.

Group: core-security → javascript-core-security

Fairly sure this goes back a long way and we want to uplift to beta minimally. This bug will potentially affect a lot of wasm code, though it has to be structured just so, and multithreaded code is more vulnerable. We're pushing for eg Earth to use threads in Firefox and would want to have solid code for threads as soon as possible.

Following some investigation, I have concluded that this needs a short-term local (and ideally upliftable) fix based on cmpxchg so that non-null entries are not overwritten, and a more elaborate long-term fix that properly takes care of the raciness in how the jump table is handled. Some of that raciness was designed in; some appears to perhaps be accidental. I'll file a different bug for that.

This patch introduces a method for initializing jump table entries and makes sure it does not overwrite non-null entries. For this it uses a cmpxchg. This is technically racy because other accesses of the table do not use atomics, but I think it is no worse than the raciness already in that code. (If you think it's necessary I could change setJitEntry to use an atomic store, to prevent the compiler from recognizing the problem.) Anyway, using the naked clang atomic built-in seems to work OK since we use clang on all platforms.

A future fix will get rid of the raciness altogether by properly locking things and/or by using our own safe-for-races atomics.

Flags: needinfo?(bbouvier)

Thanks, I'll comment in the review.

Flags: needinfo?(bbouvier)
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

Comment on attachment 9178173 [details]
Bug 1666568 - Initialize null jump table entries only once. r?bbouvier

Beta/Release Uplift Approval Request

  • User impact if declined: Some webassembly code, especially codee using multiple threads, will be crashy.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch ensures (by means of an atomic compare-and-swap) that a field in a shared data structure is initialized only once. Previously, a later reinitialization could overwrite a value written since the first initialization and thus cause a crash.
  • String changes made/needed:
Attachment #9178173 - Flags: approval-mozilla-beta?

Comment on attachment 9178173 [details]
Bug 1666568 - Initialize null jump table entries only once. r?bbouvier

approved for 82.0b5

Attachment #9178173 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [adv-main82+]

Do we want this on esr78?

Flags: needinfo?(lhansen)

(In reply to Julien Cristau [:jcristau] from comment #20)

Do we want this on esr78?

I'm torn. We discovered this only because we turned on threads in Wasm. We turned threads on post-78. On the other hand, it's not a very tricky fix.

Let me mull it over for a day and look at what 78 looks like.

(In reply to Julien Cristau [:jcristau] from comment #20)

Do we want this on esr78?

We do not. I believe the present patch fixes a bug / adapts to a change introduced in bug 1660862, related to the interaction of wasm and the new Warp JS compiler. That change definitely landed after 78 branched.

Flags: needinfo?(lhansen)
See Also: → 1660862
Flags: qe-verify-
Whiteboard: [adv-main82+] → [adv-main82+][post-critsmash-triage]
Attached file advisory.txt

I described it as 'potentially exploitable' ¯_(ツ)_/¯

Alias: CVE-2020-15681
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: