1776658 - Crash [@ std::__atomic_base<unsigned long>::load] through [@ JSObject::shape]

Reporter

Description

•

2 years ago

In experimental IPC fuzzing, we found the following crash on mozilla-central revision 20220627-a66dcaea4196 (fuzzing-asan-nyx-opt build):

==11691==ERROR: AddressSanitizer: SEGV on unknown address 0x20b5f574f575 (pc 0x7fcf60e757f8 bp 0x7fffbc22d1c0 sp 0x7fffbc22d1c0 T0)
==11691==The signal is caused by a READ memory access.
    #0 0x7fcf60e757f8 in std::__atomic_base<unsigned long>::load(std::memory_order) const /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:396:9
    #1 0x7fcf60e757f8 in mozilla::detail::IntrinsicMemoryOps<unsigned long, (mozilla::MemoryOrdering)0>::load(std::atomic<unsigned long> const&) /builds/worker/workspace/obj-build/dist/include/mozilla/Atomics.h:195:17
    #2 0x7fcf60e757f8 in mozilla::detail::AtomicBaseIncDec<unsigned long, (mozilla::MemoryOrdering)0>::operator unsigned long() const /builds/worker/workspace/obj-build/dist/include/mozilla/Atomics.h:340:31
    #3 0x7fcf60e757f8 in js::gc::CellWithTenuredGCPointer<js::gc::Cell, js::Shape>::headerPtr() const /js/src/gc/Cell.h:802:46
    #4 0x7fcf60e757f8 in JSObject::shape() const /js/src/vm/JSObject.h:99:37
    #5 0x7fcf60e757f8 in JSObject::compartment() const /js/src/vm/JSObject.h:149:49
    #6 0x7fcf60f124c7 in js::ContextChecks::check(JSObject*, int) /js/src/vm/JSContext-inl.h:90:18
    #7 0x7fcf60f124c7 in js::ContextChecks::check(JS::Value const&, int) /js/src/vm/JSContext-inl.h:131:7
    #8 0x7fcf62da23bb in void JSContext::checkImpl<JS::MutableHandle<JS::Value> >(JS::MutableHandle<JS::Value> const&) /js/src/vm/JSContext-inl.h:213:33
    #9 0x7fcf62da23bb in void JSContext::check<JS::MutableHandle<JS::Value> >(JS::MutableHandle<JS::Value> const&) /js/src/vm/JSContext-inl.h:220:5
    #10 0x7fcf62da23bb in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /js/src/vm/Interpreter.cpp:423:9
    #11 0x7fcf62da23bb in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:508:12
    #12 0x7fcf62da4656 in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:575:10
    #13 0x7fcf62da4656 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:606:8
    #14 0x7fcf62da66a6 in js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:732:10
    #15 0x7fcf61324e27 in CallGetter(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, js::PropertyInfoBase<unsigned int>, JS::MutableHandle<JS::Value>) /js/src/vm/NativeObject.cpp:1973:12
    #16 0x7fcf612f8a9d in bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2001:12
    #17 0x7fcf612f8a9d in bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2143:14
    #18 0x7fcf612f8a9d in js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) /js/src/vm/NativeObject.cpp:2174:10
    #19 0x7fcf60ee72da in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:120:10
    #20 0x7fcf60ee72da in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::PropertyName*, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:127:10
    #21 0x7fcf62dab684 in js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:4684:10
    #22 0x7fcf62d6bc85 in Interpret(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:2995:12
    #23 0x7fcf62d66aef in js::RunScript(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:390:13
    #24 0x7fcf62da1edf in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:540:13
    #25 0x7fcf62da4656 in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:575:10
    #26 0x7fcf62da4656 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:606:8
    #27 0x7fcf610543ef in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /js/src/vm/CallAndConstruct.cpp:53:10
    #28 0x7fcf51fdc36a in nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) /js/xpconnect/src/XPCWrappedJSClass.cpp:981:17
    #29 0x7fcf4feeadd3 in PrepareAndDispatch /xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:115:37
    #30 0x7fcf4fee97b2 in SharedStub xptcstubs_x86_64_linux.cpp
    #31 0x7fcf4fc9b32f in nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) /xpcom/ds/nsObserverList.cpp:70:19
    #32 0x7fcf4fca6beb in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) /xpcom/ds/nsObserverService.cpp:291:19
    #33 0x7fcf5932ecd4 in (anonymous namespace)::HangMonitorParent::ClearHangNotification() /dom/ipc/ProcessHangMonitor.cpp:818:20
    #34 0x7fcf5932f37a in void DispatchToMethod<(anonymous namespace)::HangMonitorParent, void ((anonymous namespace)::HangMonitorParent::*)()>((anonymous namespace)::HangMonitorParent*, void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0 const&) /ipc/chromium/src/base/tuple.h:381:3
    #35 0x7fcf5932f37a in mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::RunnableMethod<void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/ipc/TaskFactory.h:80:7
    #36 0x7fcf5932f17c in mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::TaskWrapper<mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::RunnableMethod<void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0> >::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/ipc/TaskFactory.h:37:44
    #37 0x7fcf4fe99fb3 in mozilla::RunnableTask::Run() /xpcom/threads/TaskController.cpp:538:16
    #38 0x7fcf4fe44485 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:851:26
    #39 0x7fcf4fe4078f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:683:15
    #40 0x7fcf4fe41011 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:461:36
    #41 0x7fcf4fea69ce in mozilla::TaskController::InitializeInternal()::$_0::operator()() const /xpcom/threads/TaskController.cpp:187:37
    #42 0x7fcf4fea69ce in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
    #43 0x7fcf4fe7047a in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1205:16
    [...]
    #56 0x7fcf7c3e5c86 in __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:310

The good news is: We have a test case and we can reproduce this using the new Nyx in-browser replay code without a specialized environment. All you need is the attached testcase, a test page that I will attach in a few and a TC build that you can get as follows:

python -mfuzzfetch --nyx -a --fuzzing --target firefox

You can then create a new profile (called test1 here) and run:

AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin path/to/firefox --headless -P test1 file:///path/to/fuzz.html

The attached log also has all the raw IPC packets going over the wire.

Christian Holler (:decoder)

Reporter

Comment 1

•

2 years ago

Attached file Detailed Crash Information — Details

Christian Holler (:decoder)

Reporter

Comment 2

•

2 years ago

Attached file Testcase — Details

Christian Holler (:decoder)

Reporter

Comment 3

•

2 years ago

Attached file page.zip — Details

This is the full page that is used during fuzzing.

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Group: core-security → dom-core-security

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Group: dom-core-security → javascript-core-security

Andrew McCreight [:mccr8]

Comment 4

•

2 years ago

I haven't tried to reproduce this, but looking at the code that is in the stacks my guess is that this is an issue with the hang monitor and not the JS engine. HangMonitorParent::ClearHangNotification() is on the stack. That gets dispatched to the main thread in HangMonitorParent::RecvClearHang(). Maybe there's a ClearHang message, then the hang monitor parent shuts down, then the ClearHang runs and while it is in the middle of running JS something gets torn down?

Mostly I find it alarming that we're running JS while this is on the stack:

observerService->NotifyObservers(mProcess, "clear-hang-report", nullptr);

A strong ref should be taken of mProcess on the stack, and then that value passed in, or something.

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Group: javascript-core-security → dom-core-security

Component: JavaScript Engine → DOM: Content Processes

Bobby Holley (:bholley)

Comment 5

•

2 years ago

Tom, I recall that eliminating nsObserverList was one of the things you proposed to harden against sandbox escapes. Is this more evidence of the footgun-ness or is it a coincidence?

Flags: needinfo?(tom)

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 6

•

2 years ago

I think a coincidence. I don't see any evidence of us adding an observer to an nsObserverList twice...

Flags: needinfo?(tom)

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Keywords: csectype-sandbox-escape, csectype-wildptr, sec-high

Christian Holler (:decoder)

Reporter

Comment 7

•

2 years ago

Hey Jens, any idea who could look into this? If this is about ProcessHangMonitor, it looks like you also made some audit changes on that code in April (not sure if this is related, but we did not have continuous testing since then, so it could be a regression from this).

Flags: needinfo?(jstutte)

Christian Holler (:decoder)

Reporter

Updated

•

2 years ago

Whiteboard: [fuzzblocker]

Jens Stutte [:jstutte]

Comment 8

•

2 years ago

•

Edited

Thanks for the heads up, I wanted to take a look here, yes.

Jens Stutte [:jstutte]

Comment 9

•

2 years ago

(In reply to Andrew McCreight [:mccr8] (PTOish) from comment #4)

I haven't tried to reproduce this, but looking at the code that is in the stacks my guess is that this is an issue with the hang monitor and not the JS engine. HangMonitorParent::ClearHangNotification() is on the stack. That gets dispatched to the main thread in HangMonitorParent::RecvClearHang(). Maybe there's a ClearHang message, then the hang monitor parent shuts down, then the ClearHang runs and while it is in the middle of running JS something gets torn down?

Thanks for the analysis!

Mostly I find it alarming that we're running JS while this is on the stack:
observerService->NotifyObservers(mProcess, "clear-hang-report", nullptr);

We run most probably this ProcessHangMonitor observer written as jsm. That seems wanted by design (similar situation on Android).

A strong ref should be taken of mProcess on the stack, and then that value passed in, or something.

Looking at ProcessHangMonitor::RemoveProcess I do not think this alone would help. And the relation with the crash seems not obvious to me, but maybe I am overlooking something.

:decoder, I do not have a setup here where I can easily generate a pernosco session for this, it would be great to actually see the sequence of IPC calls where they hit the code. In alternative: Do we have any human-readable form of the IPC messages?

Flags: needinfo?(jstutte) → needinfo?(choller)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 10

•

2 years ago

https://pernos.co/debug/i45t58X7pWKaWIuPSa8gVQ/index.html is from decoder.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

2 years ago

•

Edited

Everything is rather heavily optimized out in that pernosco session.
decoder, would it be possible to get a pernosco session from a tiny bit less optimized build?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

2 years ago

https://pernos.co/debug/-8C0PqEO7_GiR9LNf3FQng/index.html
seems to be quite a bit better pernosco session.

Christian Holler (:decoder)

Reporter

Comment 13

•

2 years ago

•

Edited

The one Olli posted in comment 12 is with --disable-optimize, let me know if this works.

Flags: needinfo?(choller)

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

status-firefox103: affected → wontfix

Christian Holler (:decoder)

Reporter

Comment 14

•

2 years ago

I enabled some debug statements to output the actor / message type of each message, here is a trace for a run that reproduced the crash:

INFO: Performing snapshot...
DEBUG: MakeTargetDecision: Protocol: PImageBridge msgType: PImageBridge::Msg_PMediaSystemResourceManagerConstructor
INFO: [OnActorConnected] ActorID 1 Protocol: PMediaSystemResourceManager
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg_RemoveResourceManager
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg___delete__
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg___delete__
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg_RemoveResourceManager
DEBUG: MakeTargetDecision: Protocol: PImageBridge msgType: PImageBridge::Msg_ParentAsyncMessages
DEBUG: MakeTargetDecision: Protocol: PContent msgType: PContent::Msg_RawMessage
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang
DEBUG: MakeTargetDecision: Protocol: PWindowGlobal msgType: PWindowGlobal::Msg_CheckPermitUnload
DEBUG: MakeTargetDecision: Protocol: PHttpBackgroundChannel msgType: PHttpBackgroundChannel::Msg_SetClassifierMatchedTrackingInfo
DEBUG: MakeTargetDecision: Protocol: PProfiler msgType: PProfiler::Msg_Stop

It's likely that not all of these messages are relevant of course.

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 15

•

2 years ago

•

Edited

Interesting. On the crashing stack I see HangMonitorParent::ClearHangNotification which then fires a clear-hang-report notification which triggers a JS observer.

Could you try to replay only:

DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang

or even just the Msg_ClearHang message to see if it is enough to reproduce?

Flags: needinfo?(jstutte) → needinfo?(choller)

Christian Holler (:decoder)

Reporter

Comment 16

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #15)

Could you try to replay only:
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang
or even just the Msg_ClearHang message to see if it is enough to reproduce?

Unfortunately we can't replay selective messages yet. The reason is that the decision making for each packet also depends on the number of actors present (target actor selection is based on which actors are alive, so taking away some other messages can break the test).

I can try to see if I can eliminate some messages that wouldn't affect the actor count. In the long run, we would like to have a framework for serializing out test cases so they don't have the random selection in them anymore.

Flags: needinfo?(choller)

Christian Holler (:decoder)

Reporter

Comment 17

•

2 years ago

Since PProcessHangMonitor is a toplevel actor, my last comment is not applicable (number of actors only matters for sub protocols) and I was able to isolate the two messages. I can confirm that the bug reproduces just with the two messages to PProcessHangMonitor. I will try now to pin it to a single message.

Christian Holler (:decoder)

Reporter

Comment 18

•

2 years ago

So it looks like none of the messages alone trigger the bug, you really need both of the messages to PProcessHangMonitor in sequence.

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 19

•

2 years ago

(In reply to Christian Holler (:decoder) from comment #18)

So it looks like none of the messages alone trigger the bug, you really need both of the messages to PProcessHangMonitor in sequence.

Thanks! Would it be too much to ask for an updated pernosco (debug build) session with just those two messages? In the other session from comment 12 I see two calls of RecvHangEvidence which seems strange and I'd like to know if this could be part of the problem.

Flags: needinfo?(jstutte)

Christian Holler (:decoder)

Reporter

Comment 20

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #19)

Thanks! Would it be too much to ask for an updated pernosco (debug build) session with just those two messages? In the other session from comment 12 I see two calls of RecvHangEvidence which seems strange and I'd like to know if this could be part of the problem.

I can make an updated pernosco build, but it will have two calls again for each of the messages, only the second pair of calls is actually required. I confirmed locally that only the second pair is responsible because with my newer fuzzing logic, the first few message cause a "pinning" to a toplevel actor for the next 5 messages, so the fuzzer ends up producing different calls. With the filter in place to only actually send the two PProcessHangMonitor messages, this pinning never happens and the fuzzer again resorts to the older series of calls.

Kris Maglione [:kmag]

Assignee

Comment 21

•

2 years ago

I think Andrew is right that we should be storing a strong reference to mProcess. I have a hard time navigating JS stacks in Pernosco without the GDP pretty printing helpers we normally have, but we're crashing accessing the "hangDuration" property here. The report object is QIed from the mProcess that we pass in as the subject, which means it's probably getting freed somewhere in the middle of that function. Although I am not entirely clear on how that could happen.

Christian Holler (:decoder)

Reporter

Comment 22

•

2 years ago

Since I can reproduce the bug locally, I can test a potential patch with a strong reference to mProcess, just let me know.

Jens Stutte [:jstutte]

Comment 23

•

2 years ago

(In reply to Kris Maglione [:kmag] from comment #21)

I think Andrew is right that we should be storing a strong reference to mProcess. I have a hard time navigating JS stacks in Pernosco without the GDP pretty printing helpers we normally have, but we're crashing accessing the "hangDuration" property here. The report object is QIed from the mProcess that we pass in as the subject, which means it's probably getting freed somewhere in the middle of that function. Although I am not entirely clear on how that could happen.

I added some more notes in the pernosco notebook. AFAICS there is no release of the `HangMonitorParent' instance that contains the 'mProcess' before we crash. So I'd assume holding a reference to it while we notify will not change anything?
And yes, reading the JS as C++ stack is hard, no real progress here.

Jens Stutte [:jstutte]

Comment 24

•

2 years ago

•

Edited

Poking at this a bit more, I found an interesting twist. Best viewed in the pernosco session.

In a nutshell: HangDuration is read as -nan(0xfffffffffffff) and that makes the following check function derail, apparently. I assume we can heal this for this specific case if we just check the duration in HangMonitorParent::RecvHangEvidence, but I assume we want to check the check function, too ?

Flags: needinfo?(kmaglione+bmo)

Kris Maglione [:kmag]

Assignee

Comment 25

•

2 years ago

OK, that actually makes more sense. The wrapper for the process object should hold it alive, but I've seen cases where we've done weird things with reference counting and XPCOM objects get released while the wrapper is still alive anyway. But that also makes more sense of why the crash would be in the compartment checker and on the return value.

Anyway. This is actually quite scary...

So, I think the problem is here where we convert the return value. It essentially does JS::Value::setDouble which just does a bitwise cast. Since JS values use NaN boxing to determine the type of the object, that has the effect making us interpret the value as the wrong type if we have an odd NaN value.

That shouldn't be exploitable by numbers that come from JavaScript, since it generally shouldn't be able to generate those weird NaN values. But for values that we get from the network, or ArrayBuffers, that could open an exploit.

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

Flags: needinfo?(kmaglione+bmo)

Kris Maglione [:kmag]

Assignee

Comment 26

•

2 years ago

It might also be nice if the fuzzers could specifically try to use NaN values that the JS engine will interpret as objects in places where we expect double values, if they don't already.

Assignee: nobody → kmaglione+bmo

Component: DOM: Content Processes → XPConnect

Jens Stutte [:jstutte]

Comment 27

•

2 years ago

•

Edited

(In reply to Kris Maglione [:kmag] from comment #25)

Anyway. This is actually quite scary...

So, I think the problem is here where we convert the return value. It essentially does JS::Value::setDouble which just does a bitwise cast. Since JS values use NaN boxing to determine the type of the object, that has the effect making us interpret the value as the wrong type if we have an odd NaN value.

That shouldn't be exploitable by numbers that come from JavaScript, since it generally shouldn't be able to generate those weird NaN values. But for values that we get from the network, or ArrayBuffers, that could open an exploit.

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

Well, we can probably have more than one strategy to avoid situations like this. We might want to avoid such values to get into our C++ variables from the beginning, such that we do not need to mitigate (and can have maybe just an assert) on read ?
As you mention "values that we get from the network" I assume you intend (also) IPC, and it would be indeed nice if already HangMonitorParent::RecvHangEvidence (or its direct caller on IPC level) would return an IPC_FAIL here, potentially just killing the bogus child. The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

Jan de Mooij [:jandem]

Comment 28

•

2 years ago

(In reply to Kris Maglione [:kmag] from comment #25)

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

We have a debug-assertion in Value::setDouble. We could consider making it a diagnostic or release assertion, but it would need some careful perf and binary size measurements.

Jens Stutte [:jstutte]

Comment 29

•

2 years ago

BTW, I do not think we can call this a regression? Bug 1558179 might have changed something here, but it seems limited to Sparc?

Andrew McCreight [:mccr8]

Updated

•

2 years ago

status-firefox104: --- → affected

status-firefox105: --- → affected

status-firefox-esr102: --- → wontfix

status-firefox-esr91: --- → affected

Keywords: regression

Andrew McCreight [:mccr8]

Comment 30

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #27)

Well, we can probably have more than one strategy to avoid situations like this. We might want to avoid such values to get into our C++ variables from the beginning, such that we do not need to mitigate (and can have maybe just an assert) on read ?

I don't think we can check that all floating point values in the browser (or even ones that we send over IPC) are NaN-boxable. Presumably graphics code has tons of floats that never end up in JS.

The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

Just having NaN isn't the problem. The problem is having the "wrong" kind of NaN value. Maybe we could canonicalize NaN values passing through IPC, though? I don't know enough about floating point values to know if that could cause subtle issues for graphics or media code...

Kris Maglione [:kmag]

Assignee

Comment 31

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #27)

As you mention "values that we get from the network" I assume you intend (also) IPC

Yes, I'm including IPC, but there are lots of other places a binary double could potentially come over the network and pass through XPConnect to JavaScript, e.g., WebRTC.

The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

I wouldn't necessarily be opposed to that. But I also don't think that using non-canonical NaN values for signaling is necessarily "abuse". They're valid values per the spec. They just become a problem when we pass them to the JavaScript engine which already assigns special meaning to them.

(In reply to Andrew McCreight [:mccr8] from comment #30)

Just having NaN isn't the problem. The problem is having the "wrong" kind of NaN value. Maybe we could canonicalize NaN values passing through IPC, though? I don't know enough about floating point values to know if that could cause subtle issues for graphics or media code...

I'd prefer crashing or returning IPC_FAIL rather than silently canonicalizing, since there could be code that legitimately uses them for sentinel values, and silently changing them could cause worse problems.

(In reply to Jan de Mooij [:jandem] from comment #28)

(In reply to Kris Maglione [:kmag] from comment #25)

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

We have a debug-assertion in Value::setDouble. We could consider making it a diagnostic or release assertion, but it would need some careful perf and binary size measurements.

Yes, I'm still thinking about what that would have to look like. I think we'd probably want to limit the change to the public API, and have JS internal callers (which are presumably more trustworthy) continue to set it with only a debug assert. For the XPConnect conversion code, I'm probably going to change it to use JS_NumberValue, which does the appropriate canonicalization already.

Kris Maglione [:kmag]

Assignee

Comment 32

•

2 years ago

I'm actually somewhat inclined at this point to just have setNumber(double) do CanonicalizeNaN since the only caller in the JS engine that uses it right now is MIR.cpp, which does retVal.setNumber(JS::CanonicalizeNaN(ret)), and I'm replacing all of the non-JS-internal callers.

But if we do, I'd rather do it in a follow-up. I'm trying to make this patch noisy enough that it doesn't point a direct finger at NaNs.

Kris Maglione [:kmag]

Assignee

Comment 33

•

2 years ago

(In reply to Kris Maglione [:kmag] from comment #32)

I'm actually somewhat inclined at this point to just have setNumber(double) do CanonicalizeNaN since the only caller in the JS engine that uses it right now is MIR.cpp, which does retVal.setNumber(JS::CanonicalizeNaN(ret)), and I'm replacing all of the non-JS-internal callers.

I take that back. Searchfox gets confused by setNumber(double) being called on wrapper types.

Kris Maglione [:kmag]

Assignee

Comment 34

•

2 years ago

Attached file Bug 1776658: Update some setNumber callers. r=jandem — Details

Kris Maglione [:kmag]

Assignee

Comment 35

•

2 years ago

Attached file Bug 1776658: Add test. r=jandem,mccr8 — Details

Kris Maglione [:kmag]

Assignee

Comment 36

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Security Approval Request

How easily could an exploit be constructed based on the patch?: Not easily. Once an attacker understands that the issue is the JavaScript engine misinterpreting certain NaN values, they would still have to find a place where they could introduce a binary double value to one of the several call locations changed by the patch.

At that point, they could introduce an arbitrary pointer value that would be interpreted as a JSObject*. But they would then have to understand the memory layout of the process well enough to construct a pointer which wouldn't crash in the JS compartment checker, and would then be interpreted in an exploitable way. It's probably possible, but non-trivial.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
Which older supported branches are affected by this flaw?: All
If not all supported branches, which bug introduced the flaw?: None
Do you have backports for the affected branches?: No
If not, how different, hard to create, and risky will they be?: The patch likely applies cleanly to all supported branches, but if not, it would be trivial to adapt.
How likely is this patch to cause regressions; how much testing does it need?: Unlikely. The only significant change is that several locations where we set JavaScript number values now also canonicalize any NaN values so they aren't interpreted as the wrong type. Any callers which would be affected in normal usage would already trigger a debug assertion in the current code.
Is Android affected?: Yes

Attachment #9287153 - Flags: sec-approval?

Kris Maglione [:kmag]

Assignee

Updated

•

2 years ago

Attachment #9287154 - Flags: sec-approval?

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 37

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved to land and uplift. I would also like us to file a followup bug on some defense in depth work we can do here.

Attachment #9287153 - Flags: sec-approval? → sec-approval+

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 38

•

2 years ago

Comment on attachment 9287154 [details]
Bug 1776658: Add test. r=jandem,mccr8

Can land the test after or after Sept 6

Attachment #9287154 - Flags: sec-approval?

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

2 years ago

Whiteboard: [fuzzblocker] → [fuzzblocker][reminder-test 2022-09-06]

Kris Maglione [:kmag]

Assignee

Comment 39

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Beta/Release Uplift Approval Request

User impact if declined: Possible privilege escalation or sandbox escape due to raw pointer access.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: No
Needs manual test from QE?: Yes
If yes, steps to reproduce: STR is in comment 0
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): It makes trivial changes to places where we set a JS::Value to a double so that only NaN values which would otherwise be misinterpreted as pointers or other special values are correctly interpreted as NaNs. Any places where that would cause a difference in behavior on production would otherwise cause a crash or possible exploit.
String changes made/needed:
Is Android affected?: Yes

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Possible privilege escalation or sandbox escape due to raw pointer access.
Fix Landed on Version:
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): It makes trivial changes to places where we set a JS::Value to a double so that only NaN values which would otherwise be misinterpreted as pointers or other special values are correctly interpreted as NaNs. Any places where that would cause a difference in behavior on production would otherwise cause a crash or possible exploit.

Attachment #9287153 - Flags: approval-mozilla-release?

Attachment #9287153 - Flags: approval-mozilla-esr102?

Attachment #9287153 - Flags: approval-mozilla-beta?

Kris Maglione [:kmag]

Assignee

Updated

•

2 years ago

Flags: qe-verify+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 40

•

2 years ago

Landed: https://hg.mozilla.org/integration/autoland/rev/6409e37a183412fae409e9949f398c811e14af3c

Backed out for causing assertion failures in Value.h:
https://hg.mozilla.org/integration/autoland/rev/c545bb8954d577ddc8812bf5eb30aa8f888e5e8e

Push with failures
Failure log

Flags: needinfo?(kmaglione+bmo)

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

status-firefox-esr102: wontfix → affected

tracking-firefox104: --- → +

tracking-firefox105: --- → +

tracking-firefox-esr102: --- → 104+

tracking-firefox-esr91: --- → 104+

Kris Maglione [:kmag]

Assignee

Comment 41

•

2 years ago

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #40)

Landed: https://hg.mozilla.org/integration/autoland/rev/6409e37a183412fae409e9949f398c811e14af3c

Backed out for causing assertion failures in Value.h:
https://hg.mozilla.org/integration/autoland/rev/c545bb8954d577ddc8812bf5eb30aa8f888e5e8e

Push with failures
Failure log

The cookie service sets the expiry time for session cookies to INT64_MAX, which can't be losslessly represented as a double. The setNumber overload for int64_t asserts that there isn't any loss of precision, while the previous explicit cast to a double ignored the loss in precision and asserts when we try to return it from a getter.

I think the version that asserts is more correct, so I'll try to return to it in a follow-up, but for now I'm just going to revert the change to that part of the conversion function to get this landed.

Flags: needinfo?(kmaglione+bmo)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 42

•

2 years ago

Update some setNumber callers. r=jandem
https://hg.mozilla.org/integration/autoland/rev/88c94f32cf0271a185cfc7a47665d3be5452e858
https://hg.mozilla.org/mozilla-central/rev/88c94f32cf02

Group: dom-core-security → core-security-release

Status: NEW → RESOLVED

Closed: 2 years ago

status-firefox105: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 105 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

Attachment #9287153 - Flags: approval-mozilla-esr91?

Donal Meehan [:dmeehan]

Comment 43

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Rejecting release uplift request for 103 - patch will go out with 104, ESR91.13, and ESR102.2 pending uplifts.

Attachment #9287153 - Flags: approval-mozilla-release? → approval-mozilla-release-

Dianna Smith [:diannaS]

Comment 44

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 104.0b6

Attachment #9287153 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Dianna Smith [:diannaS]

Comment 45

•

2 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/22075771cc8a

status-firefox104: affected → fixed

Brindusa Tot, Desktop QA

Updated

•

2 years ago

QA Whiteboard: [qa-triaged]

Alexandru Trif, Desktop QA [:atrif]

Comment 46

•

2 years ago

Attached file Firefox104.0b6 — Details

Reproduced the issue with Firefox 105.0a1 (20220727155540-5c209c536dad) asan-fuzzing-nyx build on Ubuntu 21.04. After executing the AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin '/home/alex/Desktop/1/firefox/firefox' --headless -P test1 file:'/home/alex/Downloads/fuzz.html' command on the affected build the AddressSanitizer: SEGV /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:396:9 in std::__atomic_base<unsigned long>::load(std::memory_order) crash occurs.

The crash no longer occurs by using the same command with Firefox 105.0a1 (20220805092851) and 104.0b6 (20220804183742) asan-fuzzing-nyx builds on Ubuntu 21.04. However terminal gives Exiting due to channel error. after [Replay Mode] Nyx::release() called. and Firefox does not start (I don't know if it should). Is this expected? Attaching the log from the terminal as well. Thank you in advance.

Flags: needinfo?(kmaglione+bmo)

Christian Holler (:decoder)

Reporter

Comment 47

•

2 years ago

(In reply to Alexandru Trif, QA [:atrif] from comment #46)

However terminal gives Exiting due to channel error. after [Replay Mode] Nyx::release() called. and Firefox does not start (I don't know if it should). Is this expected? Attaching the log from the terminal as well. Thank you in advance.

Yes, this is expected behavior :)

Flags: needinfo?(kmaglione+bmo)

Ryan VanderMeulen [:RyanVM]

Comment 48

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 102.2esr.

Attachment #9287153 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+

Ryan VanderMeulen [:RyanVM]

Comment 49

•

2 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr102/rev/e54330341b52

status-firefox-esr102: affected → fixed

Alexandru Trif, Desktop QA [:atrif]

Comment 50

•

2 years ago

•

Edited

(In reply to Christian Holler (:decoder) from comment #47)

Yes, this is expected behavior :)

Thank you Christian. Marking Firefox 105.0a1 and 104.0b6 as verified per comment 46 and comment 47.
Verified fixed with 102.0esr (20220808014342) asan-fuzzing-nyx build from comment 49 on Ubuntu 21.04. The crash no longer occurs after executing AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin '/home/alex/Desktop/esr102/firefox/firefox' --headless -P testesr2 file:'/home/alex/Downloads/fuzz.html' command.

status-firefox104: fixed → verified

status-firefox105: fixed → verified

status-firefox-esr102: fixed → verified

Ryan VanderMeulen [:RyanVM]

Comment 51

•

2 years ago

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 91.13esr.

Attachment #9287153 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Ryan VanderMeulen [:RyanVM]

Comment 52

•

2 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr91/rev/b40d2204f6ce

status-firefox-esr91: affected → fixed

Alexandru Trif, Desktop QA [:atrif]

Comment 53

•

2 years ago

•

Edited

Hello! Can this issue be verified on 91.13esr as well? I'm asking because I can't find an asan-fuzzing-nyx build inside treeherder for 91.13esr like the one for 102.0esr (example). Is there a way to manually verify this? Thank you!

Flags: needinfo?(choller)

Christian Holler (:decoder)

Reporter

Comment 54

•

2 years ago

No, this cannot be verified on 91ESR.

Flags: needinfo?(choller)

Alexandru Trif, Desktop QA [:atrif]

Comment 55

•

2 years ago

(In reply to Christian Holler (:decoder) from comment #54)

No, this cannot be verified on 91ESR.

Thanks again. Closing this and removing flags.

Status: RESOLVED → VERIFIED

QA Whiteboard: [qa-triaged]

Flags: qe-verify+

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

2 years ago

Whiteboard: [fuzzblocker][reminder-test 2022-09-06] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r]

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

2 years ago

Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r]

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

2 years ago

Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r]

Daniel Veditz [:dveditz]

Updated

•

1 year ago

Group: core-security-release

BugBot [:suhaib / :marco/ :calixte]

Comment 56

•

1 year ago

5 months ago, Tom Ritter [:tjr] placed a reminder on the bug using the whiteboard tag [reminder-test 2022-09-06] .

kmag, please refer to the original comment to better understand the reason for the reminder.

Flags: needinfo?(kmaglione+bmo)

Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r] → [fuzzblocker][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r]

Andrew McCreight [:mccr8]

Comment 57

•

1 year ago

I'll try to take a look into landing the test.

Flags: needinfo?(kmaglione+bmo) → needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 58

•

1 year ago

I rebased and tested it locally, and then landed the old patch.

Flags: needinfo?(continuation)

Pulsebot

Comment 59

•

1 year ago

Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0e5bee0c373f
Add test. r=jandem,mccr8

Marian-Vasile Laza

Comment 60

•

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0e5bee0c373f

Detailed Crash Information 2 years ago Christian Holler (:decoder) 46.50 KB, text/plain		Details
Testcase 2 years ago Christian Holler (:decoder) 10.21 KB, application/octet-stream		Details
page.zip 2 years ago Christian Holler (:decoder) 5.66 MB, application/zip		Details
Bug 1776658: Update some setNumber callers. r=jandem 2 years ago Kris Maglione [:kmag] 48 bytes, text/x-phabricator-request	diannaS : approval-mozilla-beta+ dmeehan : approval-mozilla-release- RyanVM : approval-mozilla-esr102+ RyanVM : approval-mozilla-esr91+ tjr : sec-approval+	Details \| Review
Bug 1776658: Add test. r=jandem,mccr8 2 years ago Kris Maglione [:kmag] 48 bytes, text/x-phabricator-request		Details \| Review
Firefox104.0b6 2 years ago Alexandru Trif, Desktop QA [:atrif] 39.27 KB, application/octet-stream		Details