Closed Bug 1776658 Opened 2 years ago Closed 2 years ago

Crash [@ std::__atomic_base<unsigned long>::load] through [@ JSObject::shape]

Categories

(Core :: XPConnect, defect)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
105 Branch
Tracking Status
firefox-esr91 104+ fixed
firefox-esr102 104+ verified
firefox103 --- wontfix
firefox104 + verified
firefox105 + verified

People

(Reporter: decoder, Assigned: kmag)

Details

(5 keywords, Whiteboard: [fuzzblocker][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r])

Crash Data

Attachments

(6 files)

In experimental IPC fuzzing, we found the following crash on mozilla-central revision 20220627-a66dcaea4196 (fuzzing-asan-nyx-opt build):

==11691==ERROR: AddressSanitizer: SEGV on unknown address 0x20b5f574f575 (pc 0x7fcf60e757f8 bp 0x7fffbc22d1c0 sp 0x7fffbc22d1c0 T0)
==11691==The signal is caused by a READ memory access.
    #0 0x7fcf60e757f8 in std::__atomic_base<unsigned long>::load(std::memory_order) const /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:396:9
    #1 0x7fcf60e757f8 in mozilla::detail::IntrinsicMemoryOps<unsigned long, (mozilla::MemoryOrdering)0>::load(std::atomic<unsigned long> const&) /builds/worker/workspace/obj-build/dist/include/mozilla/Atomics.h:195:17
    #2 0x7fcf60e757f8 in mozilla::detail::AtomicBaseIncDec<unsigned long, (mozilla::MemoryOrdering)0>::operator unsigned long() const /builds/worker/workspace/obj-build/dist/include/mozilla/Atomics.h:340:31
    #3 0x7fcf60e757f8 in js::gc::CellWithTenuredGCPointer<js::gc::Cell, js::Shape>::headerPtr() const /js/src/gc/Cell.h:802:46
    #4 0x7fcf60e757f8 in JSObject::shape() const /js/src/vm/JSObject.h:99:37
    #5 0x7fcf60e757f8 in JSObject::compartment() const /js/src/vm/JSObject.h:149:49
    #6 0x7fcf60f124c7 in js::ContextChecks::check(JSObject*, int) /js/src/vm/JSContext-inl.h:90:18
    #7 0x7fcf60f124c7 in js::ContextChecks::check(JS::Value const&, int) /js/src/vm/JSContext-inl.h:131:7
    #8 0x7fcf62da23bb in void JSContext::checkImpl<JS::MutableHandle<JS::Value> >(JS::MutableHandle<JS::Value> const&) /js/src/vm/JSContext-inl.h:213:33
    #9 0x7fcf62da23bb in void JSContext::check<JS::MutableHandle<JS::Value> >(JS::MutableHandle<JS::Value> const&) /js/src/vm/JSContext-inl.h:220:5
    #10 0x7fcf62da23bb in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /js/src/vm/Interpreter.cpp:423:9
    #11 0x7fcf62da23bb in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:508:12
    #12 0x7fcf62da4656 in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:575:10
    #13 0x7fcf62da4656 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:606:8
    #14 0x7fcf62da66a6 in js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:732:10
    #15 0x7fcf61324e27 in CallGetter(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, js::PropertyInfoBase<unsigned int>, JS::MutableHandle<JS::Value>) /js/src/vm/NativeObject.cpp:1973:12
    #16 0x7fcf612f8a9d in bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2001:12
    #17 0x7fcf612f8a9d in bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2143:14
    #18 0x7fcf612f8a9d in js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) /js/src/vm/NativeObject.cpp:2174:10
    #19 0x7fcf60ee72da in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:120:10
    #20 0x7fcf60ee72da in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::PropertyName*, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:127:10
    #21 0x7fcf62dab684 in js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:4684:10
    #22 0x7fcf62d6bc85 in Interpret(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:2995:12
    #23 0x7fcf62d66aef in js::RunScript(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:390:13
    #24 0x7fcf62da1edf in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:540:13
    #25 0x7fcf62da4656 in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:575:10
    #26 0x7fcf62da4656 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:606:8
    #27 0x7fcf610543ef in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /js/src/vm/CallAndConstruct.cpp:53:10
    #28 0x7fcf51fdc36a in nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) /js/xpconnect/src/XPCWrappedJSClass.cpp:981:17
    #29 0x7fcf4feeadd3 in PrepareAndDispatch /xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:115:37
    #30 0x7fcf4fee97b2 in SharedStub xptcstubs_x86_64_linux.cpp
    #31 0x7fcf4fc9b32f in nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) /xpcom/ds/nsObserverList.cpp:70:19
    #32 0x7fcf4fca6beb in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) /xpcom/ds/nsObserverService.cpp:291:19
    #33 0x7fcf5932ecd4 in (anonymous namespace)::HangMonitorParent::ClearHangNotification() /dom/ipc/ProcessHangMonitor.cpp:818:20
    #34 0x7fcf5932f37a in void DispatchToMethod<(anonymous namespace)::HangMonitorParent, void ((anonymous namespace)::HangMonitorParent::*)()>((anonymous namespace)::HangMonitorParent*, void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0 const&) /ipc/chromium/src/base/tuple.h:381:3
    #35 0x7fcf5932f37a in mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::RunnableMethod<void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/ipc/TaskFactory.h:80:7
    #36 0x7fcf5932f17c in mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::TaskWrapper<mozilla::ipc::TaskFactory<(anonymous namespace)::HangMonitorParent>::RunnableMethod<void ((anonymous namespace)::HangMonitorParent::*)(), Tuple0> >::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/ipc/TaskFactory.h:37:44
    #37 0x7fcf4fe99fb3 in mozilla::RunnableTask::Run() /xpcom/threads/TaskController.cpp:538:16
    #38 0x7fcf4fe44485 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:851:26
    #39 0x7fcf4fe4078f in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/TaskController.cpp:683:15
    #40 0x7fcf4fe41011 in mozilla::TaskController::ProcessPendingMTTask(bool) /xpcom/threads/TaskController.cpp:461:36
    #41 0x7fcf4fea69ce in mozilla::TaskController::InitializeInternal()::$_0::operator()() const /xpcom/threads/TaskController.cpp:187:37
    #42 0x7fcf4fea69ce in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
    #43 0x7fcf4fe7047a in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1205:16
    [...]
    #56 0x7fcf7c3e5c86 in __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:310

The good news is: We have a test case and we can reproduce this using the new Nyx in-browser replay code without a specialized environment. All you need is the attached testcase, a test page that I will attach in a few and a TC build that you can get as follows:

python -mfuzzfetch --nyx -a --fuzzing --target firefox

You can then create a new profile (called test1 here) and run:

AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin path/to/firefox --headless -P test1 file:///path/to/fuzz.html

The attached log also has all the raw IPC packets going over the wire.

Attached file Testcase
Attached file page.zip

This is the full page that is used during fuzzing.

Group: core-security → dom-core-security
Group: dom-core-security → javascript-core-security

I haven't tried to reproduce this, but looking at the code that is in the stacks my guess is that this is an issue with the hang monitor and not the JS engine. HangMonitorParent::ClearHangNotification() is on the stack. That gets dispatched to the main thread in HangMonitorParent::RecvClearHang(). Maybe there's a ClearHang message, then the hang monitor parent shuts down, then the ClearHang runs and while it is in the middle of running JS something gets torn down?

Mostly I find it alarming that we're running JS while this is on the stack:

observerService->NotifyObservers(mProcess, "clear-hang-report", nullptr);

A strong ref should be taken of mProcess on the stack, and then that value passed in, or something.

Group: javascript-core-security → dom-core-security
Component: JavaScript Engine → DOM: Content Processes

Tom, I recall that eliminating nsObserverList was one of the things you proposed to harden against sandbox escapes. Is this more evidence of the footgun-ness or is it a coincidence?

Flags: needinfo?(tom)

I think a coincidence. I don't see any evidence of us adding an observer to an nsObserverList twice...

Flags: needinfo?(tom)

Hey Jens, any idea who could look into this? If this is about ProcessHangMonitor, it looks like you also made some audit changes on that code in April (not sure if this is related, but we did not have continuous testing since then, so it could be a regression from this).

Flags: needinfo?(jstutte)
Whiteboard: [fuzzblocker]

Thanks for the heads up, I wanted to take a look here, yes.

(In reply to Andrew McCreight [:mccr8] (PTOish) from comment #4)

I haven't tried to reproduce this, but looking at the code that is in the stacks my guess is that this is an issue with the hang monitor and not the JS engine. HangMonitorParent::ClearHangNotification() is on the stack. That gets dispatched to the main thread in HangMonitorParent::RecvClearHang(). Maybe there's a ClearHang message, then the hang monitor parent shuts down, then the ClearHang runs and while it is in the middle of running JS something gets torn down?

Thanks for the analysis!

Mostly I find it alarming that we're running JS while this is on the stack:

observerService->NotifyObservers(mProcess, "clear-hang-report", nullptr);

We run most probably this ProcessHangMonitor observer written as jsm. That seems wanted by design (similar situation on Android).

A strong ref should be taken of mProcess on the stack, and then that value passed in, or something.

Looking at ProcessHangMonitor::RemoveProcess I do not think this alone would help. And the relation with the crash seems not obvious to me, but maybe I am overlooking something.

:decoder, I do not have a setup here where I can easily generate a pernosco session for this, it would be great to actually see the sequence of IPC calls where they hit the code. In alternative: Do we have any human-readable form of the IPC messages?

Flags: needinfo?(jstutte) → needinfo?(choller)

Everything is rather heavily optimized out in that pernosco session.
decoder, would it be possible to get a pernosco session from a tiny bit less optimized build?

The one Olli posted in comment 12 is with --disable-optimize, let me know if this works.

Flags: needinfo?(choller)

I enabled some debug statements to output the actor / message type of each message, here is a trace for a run that reproduced the crash:

INFO: Performing snapshot...
DEBUG: MakeTargetDecision: Protocol: PImageBridge msgType: PImageBridge::Msg_PMediaSystemResourceManagerConstructor
INFO: [OnActorConnected] ActorID 1 Protocol: PMediaSystemResourceManager
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg_RemoveResourceManager
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg___delete__
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg___delete__
DEBUG: MakeTargetDecision: Protocol: PMediaSystemResourceManager msgType: PMediaSystemResourceManager::Msg_RemoveResourceManager
DEBUG: MakeTargetDecision: Protocol: PImageBridge msgType: PImageBridge::Msg_ParentAsyncMessages
DEBUG: MakeTargetDecision: Protocol: PContent msgType: PContent::Msg_RawMessage
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang
DEBUG: MakeTargetDecision: Protocol: PWindowGlobal msgType: PWindowGlobal::Msg_CheckPermitUnload
DEBUG: MakeTargetDecision: Protocol: PHttpBackgroundChannel msgType: PHttpBackgroundChannel::Msg_SetClassifierMatchedTrackingInfo
DEBUG: MakeTargetDecision: Protocol: PProfiler msgType: PProfiler::Msg_Stop

It's likely that not all of these messages are relevant of course.

Flags: needinfo?(jstutte)

Interesting. On the crashing stack I see HangMonitorParent::ClearHangNotification which then fires a clear-hang-report notification which triggers a JS observer.

Could you try to replay only:

DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang

or even just the Msg_ClearHang message to see if it is enough to reproduce?

Flags: needinfo?(jstutte) → needinfo?(choller)

(In reply to Jens Stutte [:jstutte] from comment #15)

Could you try to replay only:

DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_HangEvidence
DEBUG: MakeTargetDecision: Protocol: PProcessHangMonitor msgType: PProcessHangMonitor::Msg_ClearHang

or even just the Msg_ClearHang message to see if it is enough to reproduce?

Unfortunately we can't replay selective messages yet. The reason is that the decision making for each packet also depends on the number of actors present (target actor selection is based on which actors are alive, so taking away some other messages can break the test).

I can try to see if I can eliminate some messages that wouldn't affect the actor count. In the long run, we would like to have a framework for serializing out test cases so they don't have the random selection in them anymore.

Flags: needinfo?(choller)

Since PProcessHangMonitor is a toplevel actor, my last comment is not applicable (number of actors only matters for sub protocols) and I was able to isolate the two messages. I can confirm that the bug reproduces just with the two messages to PProcessHangMonitor. I will try now to pin it to a single message.

So it looks like none of the messages alone trigger the bug, you really need both of the messages to PProcessHangMonitor in sequence.

Flags: needinfo?(jstutte)

(In reply to Christian Holler (:decoder) from comment #18)

So it looks like none of the messages alone trigger the bug, you really need both of the messages to PProcessHangMonitor in sequence.

Thanks! Would it be too much to ask for an updated pernosco (debug build) session with just those two messages? In the other session from comment 12 I see two calls of RecvHangEvidence which seems strange and I'd like to know if this could be part of the problem.

Flags: needinfo?(jstutte)

(In reply to Jens Stutte [:jstutte] from comment #19)

Thanks! Would it be too much to ask for an updated pernosco (debug build) session with just those two messages? In the other session from comment 12 I see two calls of RecvHangEvidence which seems strange and I'd like to know if this could be part of the problem.

I can make an updated pernosco build, but it will have two calls again for each of the messages, only the second pair of calls is actually required. I confirmed locally that only the second pair is responsible because with my newer fuzzing logic, the first few message cause a "pinning" to a toplevel actor for the next 5 messages, so the fuzzer ends up producing different calls. With the filter in place to only actually send the two PProcessHangMonitor messages, this pinning never happens and the fuzzer again resorts to the older series of calls.

I think Andrew is right that we should be storing a strong reference to mProcess. I have a hard time navigating JS stacks in Pernosco without the GDP pretty printing helpers we normally have, but we're crashing accessing the "hangDuration" property here. The report object is QIed from the mProcess that we pass in as the subject, which means it's probably getting freed somewhere in the middle of that function. Although I am not entirely clear on how that could happen.

Since I can reproduce the bug locally, I can test a potential patch with a strong reference to mProcess, just let me know.

(In reply to Kris Maglione [:kmag] from comment #21)

I think Andrew is right that we should be storing a strong reference to mProcess. I have a hard time navigating JS stacks in Pernosco without the GDP pretty printing helpers we normally have, but we're crashing accessing the "hangDuration" property here. The report object is QIed from the mProcess that we pass in as the subject, which means it's probably getting freed somewhere in the middle of that function. Although I am not entirely clear on how that could happen.

I added some more notes in the pernosco notebook. AFAICS there is no release of the `HangMonitorParent' instance that contains the 'mProcess' before we crash. So I'd assume holding a reference to it while we notify will not change anything?
And yes, reading the JS as C++ stack is hard, no real progress here.

Poking at this a bit more, I found an interesting twist. Best viewed in the pernosco session.

In a nutshell: HangDuration is read as -nan(0xfffffffffffff) and that makes the following check function derail, apparently. I assume we can heal this for this specific case if we just check the duration in HangMonitorParent::RecvHangEvidence, but I assume we want to check the check function, too ?

Flags: needinfo?(kmaglione+bmo)

OK, that actually makes more sense. The wrapper for the process object should hold it alive, but I've seen cases where we've done weird things with reference counting and XPCOM objects get released while the wrapper is still alive anyway. But that also makes more sense of why the crash would be in the compartment checker and on the return value.

Anyway. This is actually quite scary...

So, I think the problem is here where we convert the return value. It essentially does JS::Value::setDouble which just does a bitwise cast. Since JS values use NaN boxing to determine the type of the object, that has the effect making us interpret the value as the wrong type if we have an odd NaN value.

That shouldn't be exploitable by numbers that come from JavaScript, since it generally shouldn't be able to generate those weird NaN values. But for values that we get from the network, or ArrayBuffers, that could open an exploit.

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

Flags: needinfo?(kmaglione+bmo)

It might also be nice if the fuzzers could specifically try to use NaN values that the JS engine will interpret as objects in places where we expect double values, if they don't already.

Assignee: nobody → kmaglione+bmo
Component: DOM: Content Processes → XPConnect

(In reply to Kris Maglione [:kmag] from comment #25)

Anyway. This is actually quite scary...

So, I think the problem is here where we convert the return value. It essentially does JS::Value::setDouble which just does a bitwise cast. Since JS values use NaN boxing to determine the type of the object, that has the effect making us interpret the value as the wrong type if we have an odd NaN value.

That shouldn't be exploitable by numbers that come from JavaScript, since it generally shouldn't be able to generate those weird NaN values. But for values that we get from the network, or ArrayBuffers, that could open an exploit.

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

Well, we can probably have more than one strategy to avoid situations like this. We might want to avoid such values to get into our C++ variables from the beginning, such that we do not need to mitigate (and can have maybe just an assert) on read ?
As you mention "values that we get from the network" I assume you intend (also) IPC, and it would be indeed nice if already HangMonitorParent::RecvHangEvidence (or its direct caller on IPC level) would return an IPC_FAIL here, potentially just killing the bogus child. The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

(In reply to Kris Maglione [:kmag] from comment #25)

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

We have a debug-assertion in Value::setDouble. We could consider making it a diagnostic or release assertion, but it would need some careful perf and binary size measurements.

BTW, I do not think we can call this a regression? Bug 1558179 might have changed something here, but it seems limited to Sparc?

(In reply to Jens Stutte [:jstutte] from comment #27)

Well, we can probably have more than one strategy to avoid situations like this. We might want to avoid such values to get into our C++ variables from the beginning, such that we do not need to mitigate (and can have maybe just an assert) on read ?

I don't think we can check that all floating point values in the browser (or even ones that we send over IPC) are NaN-boxable. Presumably graphics code has tons of floats that never end up in JS.

The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

Just having NaN isn't the problem. The problem is having the "wrong" kind of NaN value. Maybe we could canonicalize NaN values passing through IPC, though? I don't know enough about floating point values to know if that could cause subtle issues for graphics or media code...

(In reply to Jens Stutte [:jstutte] from comment #27)

As you mention "values that we get from the network" I assume you intend (also) IPC

Yes, I'm including IPC, but there are lots of other places a binary double could potentially come over the network and pass through XPConnect to JavaScript, e.g., WebRTC.

The question is, if having a NaN check for double (float, ...) values already in the IPC layer would be correct - I just hope that we do not abuse NaN to signal a missing parameter or such anywhere... But having a stricter check on incoming IPC parameters in response to IPC fuzzing seems actually quite straight-forward?

I wouldn't necessarily be opposed to that. But I also don't think that using non-canonical NaN values for signaling is necessarily "abuse". They're valid values per the spec. They just become a problem when we pass them to the JavaScript engine which already assigns special meaning to them.

(In reply to Andrew McCreight [:mccr8] from comment #30)

Just having NaN isn't the problem. The problem is having the "wrong" kind of NaN value. Maybe we could canonicalize NaN values passing through IPC, though? I don't know enough about floating point values to know if that could cause subtle issues for graphics or media code...

I'd prefer crashing or returning IPC_FAIL rather than silently canonicalizing, since there could be code that legitimately uses them for sentinel values, and silently changing them could cause worse problems.

(In reply to Jan de Mooij [:jandem] from comment #28)

(In reply to Kris Maglione [:kmag] from comment #25)

I'm not sure whether we should fix this in the XPConnect type conversion code or add stronger checks to the JS engine, though. We should maybe do both.

We have a debug-assertion in Value::setDouble. We could consider making it a diagnostic or release assertion, but it would need some careful perf and binary size measurements.

Yes, I'm still thinking about what that would have to look like. I think we'd probably want to limit the change to the public API, and have JS internal callers (which are presumably more trustworthy) continue to set it with only a debug assert. For the XPConnect conversion code, I'm probably going to change it to use JS_NumberValue, which does the appropriate canonicalization already.

I'm actually somewhat inclined at this point to just have setNumber(double) do CanonicalizeNaN since the only caller in the JS engine that uses it right now is MIR.cpp, which does retVal.setNumber(JS::CanonicalizeNaN(ret)), and I'm replacing all of the non-JS-internal callers.

But if we do, I'd rather do it in a follow-up. I'm trying to make this patch noisy enough that it doesn't point a direct finger at NaNs.

(In reply to Kris Maglione [:kmag] from comment #32)

I'm actually somewhat inclined at this point to just have setNumber(double) do CanonicalizeNaN since the only caller in the JS engine that uses it right now is MIR.cpp, which does retVal.setNumber(JS::CanonicalizeNaN(ret)), and I'm replacing all of the non-JS-internal callers.

I take that back. Searchfox gets confused by setNumber(double) being called on wrapper types.

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easily. Once an attacker understands that the issue is the JavaScript engine misinterpreting certain NaN values, they would still have to find a place where they could introduce a binary double value to one of the several call locations changed by the patch.

At that point, they could introduce an arbitrary pointer value that would be interpreted as a JSObject*. But they would then have to understand the memory layout of the process well enough to construct a pointer which wouldn't crash in the JS compartment checker, and would then be interpreted in an exploitable way. It's probably possible, but non-trivial.

  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: All
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: The patch likely applies cleanly to all supported branches, but if not, it would be trivial to adapt.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely. The only significant change is that several locations where we set JavaScript number values now also canonicalize any NaN values so they aren't interpreted as the wrong type. Any callers which would be affected in normal usage would already trigger a debug assertion in the current code.
  • Is Android affected?: Yes
Attachment #9287153 - Flags: sec-approval?
Attachment #9287154 - Flags: sec-approval?

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved to land and uplift. I would also like us to file a followup bug on some defense in depth work we can do here.

Attachment #9287153 - Flags: sec-approval? → sec-approval+

Comment on attachment 9287154 [details]
Bug 1776658: Add test. r=jandem,mccr8

Can land the test after or after Sept 6

Attachment #9287154 - Flags: sec-approval?
Whiteboard: [fuzzblocker] → [fuzzblocker][reminder-test 2022-09-06]

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Beta/Release Uplift Approval Request

  • User impact if declined: Possible privilege escalation or sandbox escape due to raw pointer access.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: STR is in comment 0
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It makes trivial changes to places where we set a JS::Value to a double so that only NaN values which would otherwise be misinterpreted as pointers or other special values are correctly interpreted as NaNs. Any places where that would cause a difference in behavior on production would otherwise cause a crash or possible exploit.
  • String changes made/needed:
  • Is Android affected?: Yes

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: Possible privilege escalation or sandbox escape due to raw pointer access.
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It makes trivial changes to places where we set a JS::Value to a double so that only NaN values which would otherwise be misinterpreted as pointers or other special values are correctly interpreted as NaNs. Any places where that would cause a difference in behavior on production would otherwise cause a crash or possible exploit.
Attachment #9287153 - Flags: approval-mozilla-release?
Attachment #9287153 - Flags: approval-mozilla-esr102?
Attachment #9287153 - Flags: approval-mozilla-beta?
Flags: qe-verify+

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #40)

Landed: https://hg.mozilla.org/integration/autoland/rev/6409e37a183412fae409e9949f398c811e14af3c

Backed out for causing assertion failures in Value.h:
https://hg.mozilla.org/integration/autoland/rev/c545bb8954d577ddc8812bf5eb30aa8f888e5e8e

Push with failures
Failure log

The cookie service sets the expiry time for session cookies to INT64_MAX, which can't be losslessly represented as a double. The setNumber overload for int64_t asserts that there isn't any loss of precision, while the previous explicit cast to a double ignored the loss in precision and asserts when we try to return it from a getter.

I think the version that asserts is more correct, so I'll try to return to it in a follow-up, but for now I'm just going to revert the change to that part of the conversion function to get this landed.

Flags: needinfo?(kmaglione+bmo)
Group: dom-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Attachment #9287153 - Flags: approval-mozilla-esr91?

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Rejecting release uplift request for 103 - patch will go out with 104, ESR91.13, and ESR102.2 pending uplifts.

Attachment #9287153 - Flags: approval-mozilla-release? → approval-mozilla-release-

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 104.0b6

Attachment #9287153 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]
Attached file Firefox104.0b6

Reproduced the issue with Firefox 105.0a1 (20220727155540-5c209c536dad) asan-fuzzing-nyx build on Ubuntu 21.04. After executing the AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin '/home/alex/Desktop/1/firefox/firefox' --headless -P test1 file:'/home/alex/Downloads/fuzz.html' command on the affected build the AddressSanitizer: SEGV /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:396:9 in std::__atomic_base<unsigned long>::load(std::memory_order) crash occurs.

The crash no longer occurs by using the same command with Firefox 105.0a1 (20220805092851) and 104.0b6 (20220804183742) asan-fuzzing-nyx builds on Ubuntu 21.04. However terminal gives Exiting due to channel error. after [Replay Mode] Nyx::release() called. and Firefox does not start (I don't know if it should). Is this expected? Attaching the log from the terminal as well. Thank you in advance.

Flags: needinfo?(kmaglione+bmo)

(In reply to Alexandru Trif, QA [:atrif] from comment #46)

However terminal gives Exiting due to channel error. after [Replay Mode] Nyx::release() called. and Firefox does not start (I don't know if it should). Is this expected? Attaching the log from the terminal as well. Thank you in advance.

Yes, this is expected behavior :)

Flags: needinfo?(kmaglione+bmo)

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 102.2esr.

Attachment #9287153 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+

(In reply to Christian Holler (:decoder) from comment #47)

Yes, this is expected behavior :)

Thank you Christian. Marking Firefox 105.0a1 and 104.0b6 as verified per comment 46 and comment 47.
Verified fixed with 102.0esr (20220808014342) asan-fuzzing-nyx build from comment 49 on Ubuntu 21.04. The crash no longer occurs after executing AFL_IGNORE_PROBLEMS=1 NYX_FUZZER=IPC_Generic MOZ_FUZZ_TESTFILE=test.bin '/home/alex/Desktop/esr102/firefox/firefox' --headless -P testesr2 file:'/home/alex/Downloads/fuzz.html' command.

Comment on attachment 9287153 [details]
Bug 1776658: Update some setNumber callers. r=jandem

Approved for 91.13esr.

Attachment #9287153 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Hello! Can this issue be verified on 91.13esr as well? I'm asking because I can't find an asan-fuzzing-nyx build inside treeherder for 91.13esr like the one for 102.0esr (example). Is there a way to manually verify this? Thank you!

Flags: needinfo?(choller)

No, this cannot be verified on 91ESR.

Flags: needinfo?(choller)

(In reply to Christian Holler (:decoder) from comment #54)

No, this cannot be verified on 91ESR.

Thanks again. Closing this and removing flags.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
Whiteboard: [fuzzblocker][reminder-test 2022-09-06] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r]
Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r]
Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r] → [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r]
Group: core-security-release

5 months ago, Tom Ritter [:tjr] placed a reminder on the bug using the whiteboard tag [reminder-test 2022-09-06] .

kmag, please refer to the original comment to better understand the reason for the reminder.

Flags: needinfo?(kmaglione+bmo)
Whiteboard: [fuzzblocker][reminder-test 2022-09-06][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r] → [fuzzblocker][adv-main104+r][adv-esr91.13+r][adv-esr102.2+r]

I'll try to take a look into landing the test.

Flags: needinfo?(kmaglione+bmo) → needinfo?(continuation)

I rebased and tested it locally, and then landed the old patch.

Flags: needinfo?(continuation)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: