1740534 - (CVE-2022-22763) Assertion failure: mBodyStream, at /dom/base/BodyStream.cpp:54

Jason Kratzer [:jkratzer]

Reporter

Description

•

3 years ago

•

Edited

Testcase found while fuzzing mozilla-central rev 333f08065c8c (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 333f08065c8c --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10

Assertion failure: mBodyStream, at /dom/base/BodyStream.cpp:54

    ==721216==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f4a52b40f70 bp 0x7f4a484db430 sp 0x7f4a484db3f0 T721294)
    ==721216==The signal is caused by a WRITE memory access.
    ==721216==Hint: address points to the zero page.
        #0 0x7f4a52b40f70 in ForgetBodyStream /dom/base/BodyStream.cpp:54:3
        #1 0x7f4a52b40f70 in mozilla::dom::BodyStream::ReleaseObjects(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /dom/base/BodyStream.cpp:545:18
        #2 0x7f4a52b41916 in mozilla::dom::BodyStream::Close() /dom/base/BodyStream.cpp
        #3 0x7f4a52b3f4eb in operator() /dom/base/BodyStream.cpp:98:51
        #4 0x7f4a52b3f4eb in ~ScopeExit /builds/worker/workspace/obj-build/dist/include/mozilla/ScopeExit.h:106:7
        #5 0x7f4a52b3f4eb in mozilla::dom::BodyStream::Create(JSContext*, mozilla::dom::BodyStreamHolder*, nsIGlobalObject*, nsIInputStream*, mozilla::ErrorResult&) /dom/base/BodyStream.cpp:151:1
        #6 0x7f4a544a322d in mozilla::dom::FetchBody<mozilla::dom::Response>::GetBody(JSContext*, JS::MutableHandle<JSObject*>, mozilla::ErrorResult&) /dom/fetch/Fetch.cpp:1394:3
        #7 0x7f4a53492eed in mozilla::dom::Response_Binding::get_body(JSContext*, JS::Handle<JSObject*>, void*, JSJitGetterCallArgs) /builds/worker/workspace/obj-build/dom/bindings/ResponseBinding.cpp:1295:24
        #8 0x7f4a54053fdf in bool mozilla::dom::binding_detail::GenericGetter<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) /dom/bindings/BindingUtils.cpp:3182:13
        #9 0x7f4a5795e5bf in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /js/src/vm/Interpreter.cpp:385:13
        #10 0x7f4a5795dcbb in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:472:12
        #11 0x7f4a5795f79e in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:532:10
        #12 0x7f4a5796082f in Call /js/src/vm/Interpreter.cpp:549:8
        #13 0x7f4a5796082f in js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:675:10
        #14 0x7f4a57ccda0f in CallGetter /js/src/vm/NativeObject.cpp:1942:12
        #15 0x7f4a57ccda0f in bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:1970:12
        #16 0x7f4a57cce0a3 in bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2116:14
        #17 0x7f4a579647ef in GetProperty /js/src/vm/ObjectOperations-inl.h:115:10
        #18 0x7f4a579647ef in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::PropertyName*, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:122:10
        #19 0x7f4a57963ca0 in js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:4548:10
        #20 0x7f4a579529bb in GetPropertyOperation /js/src/vm/Interpreter.cpp:203:10
        #21 0x7f4a579529bb in Interpret(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:2904:12
        #22 0x7f4a5794bbe5 in js::RunScript(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:354:13
        #23 0x7f4a5795dbb6 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:504:13
        #24 0x7f4a5795f79e in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:532:10
        #25 0x7f4a5795f9a1 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:549:8
        #26 0x7f4a57b17c21 in JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /js/src/vm/CallAndConstruct.cpp:117:10
        #27 0x7f4a53df182c in mozilla::dom::VoidFunction::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::ErrorResult&) /builds/worker/workspace/obj-build/dom/bindings/FunctionBinding.cpp:81:8
        #28 0x7f4a52d81e25 in mozilla::dom::VoidFunction::Call(mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JS::Realm*) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/FunctionBinding.h:172:12
        #29 0x7f4a52d81c0e in QueuedMicrotask::Run(mozilla::AutoSlowOperation&) /dom/base/nsIGlobalObject.cpp:271:31
        #30 0x7f4a50dcec18 in mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint(bool) /xpcom/base/CycleCollectedJSContext.cpp:674:17
        #31 0x7f4a50dcfa3c in mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) /xpcom/base/CycleCollectedJSContext.cpp:463:3
        #32 0x7f4a50efa45a in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1212:24
        #33 0x7f4a50f0126a in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
        #34 0x7f4a554243d8 in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) /dom/workers/WorkerPrivate.cpp:3105:7
        #35 0x7f4a554047a7 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2244:42
        #36 0x7f4a50efa149 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1169:16
        #37 0x7f4a50f0126a in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
        #38 0x7f4a5198e624 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:330:5
        #39 0x7f4a518ac9f7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
        #40 0x7f4a518ac902 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
        #41 0x7f4a518ac902 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
        #42 0x7f4a50ef5dbb in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:391:10
        #43 0x7f4a67a98a07 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #44 0x7f4a6880c608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
        #45 0x7f4a683d4292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/base/BodyStream.cpp:54:3 in ForgetBodyStream
    ==721216==ABORTING

Jason Kratzer [:jkratzer]

Reporter

Comment 1

•

3 years ago

Attached file Testcase (obsolete) — Details

Bugmon [:jkratzer for issues]

Comment 2

•

3 years ago

Bugmon Analysis
Unable to reproduce bug 1740534 using build mozilla-central 20211110092453-333f08065c8c. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

3 years ago

Component: DOM: Core & HTML → DOM: Streams

BugBot [:suhaib / :marco/ :calixte]

Comment 3

•

3 years ago

The severity field is not set for this bug.
:mgaudet, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mgaudet)

Matthew Gaudet (he/him) [:mgaudet]

Comment 4

•

3 years ago

So: I could not reproduce this failure. (I do wonder if the testcase.zip is a bit incomplete? I looked at the test case and it expects a worker.js, but there's no worker.js packaged?)

The stack trace is definitely peculiar: We are in the middle of firing the ScopeExit cleanup method. This means that we are taking an early exit from this method. There are four possible paths.

Given the stack and crash, we can guess that somehow this BodyStream has ended up with a null mStreamHolder member; mStreamHolder is initialized as part of the BodyStream constructor, and there's a diagnostic assert to guarantee it's non-null at that point. The other place where it is modified is part of ReleaseObjects, where it is nulled out. However, before then, there's a state guard....

So I'm not sure what to make of this one. For now, I'm going to mark as P3/S3.

Severity: -- → S3

Flags: needinfo?(mgaudet)

Priority: -- → P3

Jason Kratzer [:jkratzer]

Reporter

Comment 5

•

3 years ago

Attached file Testcase (obsolete) — Details

My apologies. It looks like I had uploaded an incomplete testcase. The newly attached testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build d03f87555639 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10

Attachment #9250210 - Attachment is obsolete: true

Jason Kratzer [:jkratzer]

Reporter

Comment 6

•

3 years ago

Attached file testcase.zip — Details

Attachment #9252865 - Attachment is obsolete: true

Matthew Gaudet (he/him) [:mgaudet]

Comment 7

•

3 years ago

Hmm.

OK; so with the updated test case, I can successfully reproduce the failure now. However, I cannot get it to reproduce under rr, which makes me wonder if there's something racy...; still, just reproducing isn't a huge help, unless I can get more information.

Jason: For me this test case reproduces first try with python3 -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10 --xvfb; I wonder, is it possible to get grizzly.replay to run firefox under a regular debugger? Or is it possible to have your automation try to get a pernosco reproduction automatically?

Flags: needinfo?(jkratzer)

Jason Kratzer [:jkratzer]

Reporter

Comment 8

•

3 years ago

:mgaudet, unfortunately I've also been trying to get a pernosco session of this bug without any luck so far. As for running under a regular debugger, grizzly.replay does not have that capability. However, I can confirm that the crash triggers on a default debug build without any special prefs set. Just unzip the testcase, start a local webserver and point Firefox to testcase.html.

Flags: needinfo?(jkratzer)

Tyson Smith [:tsmith]

Comment 9

•

3 years ago

A Pernosco session is available here: https://pernos.co/debug/paeqwbX6GxYe8l4H0PZdGQ/index.html

Tyson Smith [:tsmith]

Updated

•

3 years ago

Flags: needinfo?(mgaudet)

Matthew Gaudet (he/him) [:mgaudet]

Comment 10

•

3 years ago

🤩 Thanks so much Tyson! I'm excited to dig in!

Tyson Smith [:tsmith]

Comment 11

•

3 years ago

(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #10)

🤩 Thanks so much Tyson! I'm excited to dig in!

Happy to help! That one was one of the most stubborn to reproduce (under rr) I've seen :)

Matthew Gaudet (he/him) [:mgaudet]

Comment 12

•

3 years ago

Ok. There's some more detailed notes in the pernosco trace. Given the trace, I suspect I understand more about why this was so hard to reproduce.

This assertion failure is the combination of two factors playing together:

Worker Shutdown. As I understand it, Workers can go away at any time but as near as I can tell, during shutdown there is still some (JS) code execution that happens once shutdown has started.
Fetch Bodies try to lazily create their body streams. Note, a FetchBody is also a BodyStreamHolder

My read of the Pernosco trace (this bug is a huge huge argument in favour of the productivity benefit it brings) is this:

We have a fetch inside a worker, and we request the stream.
As part of this, We create a WeakWorkerRef, and as a shutdown callback we tell it to call stream->Close()
The worker begins to shutdown. At this point the WeakWorkerRef is notified, and stream->Close() is called. This nulls out the stream in the BodyStreamHolder (FetchBody)
This is the part that is most unclear to me: Some more JS runs, and we yet-again try to get a Stream for the FetchBody. The Lazy stream creation code kicks in again, so we start to create a different BodyStream.
Because we're in a worker, we try to create a WeakWorkerRef, again, but this time the system declines, as we're already cancelling; so we side exit from BodyStream::Create. Now, to ensure cleanup, there's a ScopeExit on the stack inside of BodyStream::Create, which calls stream->Close() on any side-exits from the create function, so this is invoked. This stream knows that it has FetchBody as a stream holder, and tries to disconnect that edge... but here's where the assertion fires, as FetchBody already had a stream created and nulled out, but mStreamCreated was never set to false.

So: the fix for this bug isn't immediately clear to me. It seems to me that we probably don't want to be trying to create a BodyStream for a FetchBody in a worker if the worker is already terminating... but I don't entirely understand what JS code should be continuing to run and under what circumstances during worker shutdown.

Andrew: is there a pattern for this kind of thing that is established elsewhere?

Flags: needinfo?(mgaudet) → needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 13

•

3 years ago

Marking this as a secbug as this now gets into territory of pointing out interesting things for security researchers; it might be reasonable to relax this.

The situation:

We're running a content Promise during a microtask queue drain during worker shutdown and we should not be.
- The pernosco trace is missing the generated Unified_cpp_dom_base8.cpp and UnifiedBindings6.cpp, presumably due to omitting the objdir from the command-line passed to pernosco-submit? But it seems likely these frames are in CallbackObject and are inlined.
The check if we can run script algorithm is generally responsible for making sure we don't call into content too late. It's sorta window-specific in the spec. Specific spec call-sites are:
- In run a classic script.
- In run a module script.
- In HostEnqueuePromiseJob which notably enqueues a microtask that runs the check at runtime (not enqueue time).
Our implementation of "check if we can run script" is this logic in CallbackObject::CallSetup::CallSetup. This combines:
- For windows, the spec window active document check
- In nsIGlobalObject::IsScriptForbidden:
  - A failsafe back-stop for workers to avoid calling into content during ClearMainEventQueue that I added in bug 1545345 (and made some supporting changes).
  - A Scriptability check. This allows:
    - Blocking at creation time by: determination via nsIPrincipal::IsScriptAllowedByPolicy
    - Blocking dynamically via Scriptability::SetWindowAllowsScript which can be altered by getting disabled by going into the bfcache in WindowStateHolder, and BrowserContext/WindowContext propagation and recomputation when the synced AllowJavascript field changes.
    - Blocking dynamically with paired allow/block via: Cu.blockScriptForGlobal, the slow script dialog disabling script, an additional action taken in NukeAllWrappersForRealm.
Notably we don't check nsIGlobalObject::IsDying which is our closest approximation to appropriate logic for "check if we can run script" for a worker.
- This method is used in a number of spots
- A notable other helper API in this area is DOMEventTargetHelper::CheckCurrentGlobalCorrectness which uses IsDying
The lack of this check to IsDying is the problem.
- Interestingly, while QueuedMicrotask::Run does not check IsDying(), PromiseJobRunnable does check IsDying..
This conservative approach here was taken by me in bug 1545345 because there was a security bug that needed to be backported and I was more concerned about regressions for a backport and less confident/in-the-know about spec related things for this. (Also, the spec and consensus around global lifetimes has improved a lot.)

As a solution I would propose:

nsIGlobalObject should gain a method that more directly corresponds to check if we can run script like CanRunScript.
- If mIsDying or mIsScriptForbidden, we return false. This covers the normal global-is-dying case plus the worker paranoia case where ClearMainEventQueue is happening.
- For windows, this maintains the HasActiveDocument check, returning false if it's not active.
- The Scriptability Allowed check from IsScriptForbidden can be in there or can move back out to CallbackObject.
We haveCallbackObject use CanRunScript. It's already the case that we are doing virtual dispatch to call IsScriptForbidden, so as long as we don't get crazy with the virtual dispatch, performance isn't likely to be a problem.

:smaug, does this seem reasonable from an implementation perspective?

:annevk, does it make sense that the can run script spec should perhaps gain some more discussion of workers? (Maybe there's existing issues/PR's on this?)

Group: core-security

Flags: needinfo?(bugs)

Flags: needinfo?(bugmail)

Flags: needinfo?(annevk)

Andrew McCreight [:mccr8]

Updated

•

3 years ago

Group: core-security → dom-core-security

Anne (:annevk)

Comment 14

•

3 years ago

Yeah, it would make sense for that to check https://html.spec.whatwg.org/#dom-workerglobalscope-closing I think. Should be an easy PR, but maybe we want to wait until a fix is in?

Flags: needinfo?(annevk)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

3 years ago

I don't think QueuedMicrotask::Run should check anything, but Callsetup should. And it does. So IsScriptForbidden could check mIsDying

Flags: needinfo?(bugs)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 16

•

3 years ago

(In reply to Olli Pettay [:smaug] from comment #15)

I don't think QueuedMicrotask::Run should check anything, but Callsetup should. And it does. So IsScriptForbidden could check mIsDying

Agreed on QueuedMicrotask. And yeah, just altering IsScriptForbidden sounds like a great minimal fix, thanks! We can potentially rename things in the future (and not near as a security bug) to look more like spec terminology.

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 17

•

3 years ago

•

Edited

I'm categorizing this as sec-moderate from https://wiki.mozilla.org/Security_Severity_Ratings/Client with a rationale of:

I don't believe this is directly exploitable. I also don't believe this is indirectly exploitable.
- This only allows a single draining of the microtask queue after a task where content was already running.
- We will not have run any shutdown-related GC's or any other specialized shutdown logic on the way out of the primary runnable because we have not yet begun to unwind that loop.
  - The scope and JS context are all still valid. The dying global is a DOM construct, not a JS construct.
- WorkerRefs will correctly fail to be created at this time which code should handle.
However this is still definitely an invariant violation and there's no need to point people directly at it, so this still gets to be a security bug. (It's of course plausible that this could be chained with other, more serious bugs.)
Because there is an invariant being violated here, as in this bug, code can get upset as its invariants get violated as code runs after WorkerRefs had their callbacks invoked and their DETHs disconnected. So we do expect assertion violations and potentially nullptr dereferences.
- DOMEventTargetHelper instance will bind to parents even though CheckCurrentGlobalCorrectness would return false, however, the DETH instances will properly be detached again on our way out of the primary run loop (and actually would be done again in nsIGlobalObject's destructor where the scope closes in the next line, but no new DETHs should be added during that interval; if they are, that would be a different, serious bug).

I think it's worth fixing and landing this bug now with a target of uplifting into beta here early in the cycle, especially as it is likely that this will likely help eliminate a number of fuzzing errors that could otherwise be distracting. It could also make sense to let this be uplifted to release if there are dot releases but I don't believe this would necessarily be urgent for security reasons, but could be desirable as there could reasonably be non-exploitable crashes that could be eliminated.

Keywords: sec-moderate

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 18

•

3 years ago

•

Edited

(In reply to Anne (:annevk) from comment #14)

Yeah, it would make sense for that to check https://html.spec.whatwg.org/#dom-workerglobalscope-closing I think. Should be an easy PR, but maybe we want to wait until a fix is in?

Yes, agreed on waiting. There may also be some edge cases here around the difference between a self.close() and a Worker.terminate() to deal with.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 19

•

3 years ago

Attached file Bug 1740534 - Improve global consistency. r=smaug — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 20

•

3 years ago

Try run at https://treeherder.mozilla.org/jobs?repo=try&revision=9d3527ff3b0662fa892d67c8fbbef97d147420a0 shows that tests that are explicitly about things still working in detached iframes get angry when things do not work in detached iframes! I feel like we've been talking about being more strict for detached iframes in general and it seems like there are somehow maybe only 2 of these[1], so maybe this is all okay?

Specific things:

Bug 986542 (in 2014) gave us https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html that wants onclick to work in a detached iframe. This decision seems to have been made on the basis of the site in question breaking between Firefox 27 and Firefox 28.
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html fails with: TEST-UNEXPECTED-FAIL | /dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html | TreeWalker: NodeFilter from detached iframe works as expected - NodeIterator.nextNode: Refusing to execute function from global in which script is disabled.

Chrome (and of course therefore Edge) fail this per wpt.fyi which seems like they're probably right about this? The error they report is FAIL message: Failed to execute 'acceptNode' on 'NodeFilter': The provided callback is no longer runnable. but presumably these differences are our platform specific strings?

:smaug / :annevk, does it seem reasonable for us to be treating detached iframes properly as dead globals or is this the tip of a compat nightmare for now and only workers need to care about dead globals?

1: Of course, as soon as a test fails in a dir, it's basically game over, so I've revised my patch to delete the mochitest for the first failure and altered our expectation to failure for the 2nd (with the expectation that we can fix the test in the future when we do the spec-y stuff) and have pushed that with another try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=c412faec7468fa13cbebc7aa63f0cc34e4d085bf

Flags: needinfo?(bugs)

Flags: needinfo?(annevk)

Anne (:annevk)

Comment 21

•

3 years ago

A detached iframe does not result in a "dead" global. You can still have a hold on it and do things with it. This is different from the worker scenario. In the worker scenario it's really the agent that goes down and in the detached iframe case the agent is still very much there (except I suppose if it was cross-site). A detached iframe does result in a document that is no longer fully active, so it can no longer queue tasks and such.

I think it's known that Chromium has bugs here.

So my recommendation would be to limit this to workers.

Flags: needinfo?(annevk)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 22

•

3 years ago

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #20)

Specific things:

Bug 986542 (in 2014) gave us https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html that wants onclick to work in a detached iframe. This decision seems to have been made on the basis of the site in question breaking between Firefox 27 and Firefox 28.

The pernosco robot gave me a pernosco trace for this automatically.

Copying some limited analysis I did into here: https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html is doing an intentionally weird thing with target.onclick = ifr.contentWindow.f; where it's trying to have the parent window's click handler be a reference to JS code that's explicitly in the iframe's global. (Like the function it's referencing was created by a dynamically generated script tag string, so it's not accidentally creating the function in the parent window's global.)

I think our change in behavior here may be appropriate, as I believe Can Run Script is saying we shouldn't run script if the detached iframe isn't fully active, which would be the situation in this case unless there's a wrapper bug where the environment settings object that should be considered as used is the parent window?

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 23

•

3 years ago

•

Edited

Replicating some of my comments to :annevk in slack here too, but I think the key thing is that I mis-characterized these failures as indicating this would be a major regression in our behavior, when the reality is that there are two tests that seem to be broken per spec.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #20)

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html fails with: TEST-UNEXPECTED-FAIL | /dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html | TreeWalker: NodeFilter from detached iframe works as expected - NodeIterator.nextNode: Refusing to execute function from global in which script is disabled.

For the TreeWalker case, it seems like the situation is that the NodeIterator is created in the iframe, which is then immediately removed, then a test is done to use the NodeIterator which is very definitely created inside the iframe's global (via a helper in the iframe's source file). That's these lines.

And it seems like maybe the NodeIterator actually wants to try and work in that case, but because the node iterator takes the filter function that's definitely created in the iframe global and definitely executed in the iframe global we need to run code in the not fully active document, and we fail there. This also seems like the fix is likely correctly forbidding that.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 24

•

3 years ago

•

Edited

:annevk and I reached clarity/consensus on the failing tests above. The conclusions as I understand them are:

Spec-wise, there is a gap in WebIDL's Invoking Callback Functions and this should be rectified after our fix here sufficiently lands.
- Step 5 and 6 are appropriately retrieving the settings object from the callback, but the WebIDL spec is not calling check if we can run script as part of/prelude to step 11, even though prepare to run script is correctly being called with the right settings object in step 8.
Both of the tests that failed above are in fact depending on the ability to call into script when check if we can run a script would return "do not run" because the detached iframe is not fully active. And so it's reasonable for us to appropriately disable the tests, in this case removing the incorrect mochitest and marking the WPT test as an expected failure, with plans to correct the WPT test as part of the spec-work.
When addressing the spec issues, we'll also make sure to ensure WPT coverage for these changes, with both of these tests providing good insight to examples of how this might be tested!

I'm clearing the needinfo on :smaug because I think I have all the answers necessary here.

I'll post an attachment in a second that's a revised version of the mochitest that I used for exploration when talking to Anne.

Flags: needinfo?(bugs)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 25

•

3 years ago

Attached file modified version of mochitest test_bug986542.html for investigation — Details

My explanations about results/what's under tests:

Only the third case passes. The first case is the original check of the original test. The second case is when the handler was defined in the iframe global, the third case is when the handler was defined in the parent window global.
The 2nd and 3rd tests are checking whether the window global's DOM under-pinnings are still happy to run dispatchEvent on the dead global.
The 1st and 2nd tests are checking whether can run script returns true for a not fully active document (as enforced by our isDying check, and not previously enforced). I guess normally this check would depend on the inner window not being the current inner window for the outer window. But for a detached iframe, the inner window is still the current inner window.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 26

•

3 years ago

My attempt to land this on Friday ran into some variation of bug 1714646 (which has been fixed for a while) where it picked pine as upstream when it should not have picked pine as upstream. (This likely has to do with me switching to using git after a number of very bad experiences with the "evolution" extension creating worst-case scenarios on rebases/histedits gone wrong. git obviously presents additional edge-cases for moz-phab.)

I'm not sure this is easily salvageable from within the existing diff and may just end up resubmitting as a fresh patch and overriding all the things that get upset about patches not being officially reviewed.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 27

•

3 years ago

Improve global consistency. r=smaug
https://hg.mozilla.org/integration/autoland/rev/ebbf05cd3be0f7bb664407e53e03ab47c7da7928
https://hg.mozilla.org/mozilla-central/rev/ebbf05cd3be0

Group: dom-core-security → core-security-release

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox97: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 97 Branch

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 28

•

3 years ago

(Moving this into workers for clarity about where the fix was.)

Component: DOM: Streams → DOM: Workers

Ryan VanderMeulen [:RyanVM]

Comment 29

•

3 years ago

Reading through this bug, it's not clear to me how much bake time this fix needs to identify what (if any) webcompat fallout there will be from it. Are we still thinking we want to uplift this cycle or do want to have it ride 97 instead?

status-firefox95: --- → wontfix

status-firefox96: --- → affected

status-firefox-esr91: --- → affected

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 30

•

3 years ago

Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug

Beta/Release Uplift Approval Request

User impact if declined: There should be no usable-visible impacts from this fix as it concerns edge-cases that would primarily occur when a browser tab is being closed and there is no potential for the page to do anything that would have meaningful impact. (Storage can no longer be written to, etc.)

This fix eliminates an edge-case that, while we are not aware of any security exploits, seems like the type of thing where it would be better to eliminate the edge-case rather than discover later on that security experts are exceedingly clever in ways we could not foresee.

Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Our initial fears of potential web-compat issues on this bug and the patch were overblown. The removed mochitest is a single example of a site doing something very sketchy and where we intentionally maintained behavior for compatibility reasons in the absence of clear semantics at a spec level. The web platform has advanced to a point where we have specs for these things and clarity about what is and is not appropriate; there is a gap in our spec text in WebIDL, but it's clear that our test was wrong and contradicts the authoritative "can run script" check that should be run.

That said, it's possible there could be breakage out there somewhere on the web, but this needs to be weighed against the reality that to maintain that behavior would be to intentionally poke holes in our invariants in a way that could lead to serious security bugs.

String changes made/needed:

Flags: needinfo?(bugmail)

Attachment #9254570 - Flags: approval-mozilla-beta?

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 31

•

3 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #29)

Reading through this bug, it's not clear to me how much bake time this fix needs to identify what (if any) webcompat fallout there will be from it. Are we still thinking we want to uplift this cycle or do want to have it ride 97 instead?

We should definitely uplift this to beta and I've requested that above. I would wait on ESR uplift until after this hits release, however, given that ESR users seem more likely to be running sites that do sketchy and/or Firefox-specific things and where sites are unlikely to be updated in a timely fashion.

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

tracking-firefox96: --- → +

tracking-firefox97: --- → +

tracking-firefox-esr91: --- → 97+

BugBot [:suhaib / :marco/ :calixte]

Comment 32

•

3 years ago

Changing the priority to p1 as the bug is tracked by a release manager for the current beta.
See What Do You Triage for more information

Priority: P3 → P1

Dianna Smith [:diannaS]

Comment 33

•

3 years ago

Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug

Approved for 96.0b7

Attachment #9254570 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Dianna Smith [:diannaS]

Comment 34

•

3 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/4946d7aa99f9

Dianna Smith [:diannaS]

Updated

•

3 years ago

status-firefox96: affected → fixed

Mihai Boldan, Desktop QA [:mboldan]

Updated

•

3 years ago

QA Whiteboard: [post-critsmash-triage]

Flags: qe-verify-

Frederik Braun [:freddy]

Updated

•

3 years ago

Whiteboard: [bugmon:confirm] → [bugmon:confirm][adv-main96+r]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 35

•

3 years ago

Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: This helps rule out eliminate a class of worker shutdown related potential crashes and security bugs.
User impact if declined: (copied from beta request, where beta has now reached release without incident that we know of): There should be no usable-visible impacts from this fix as it concerns edge-cases that would primarily occur when a browser tab is being closed and there is no potential for the page to do anything that would have meaningful impact. (Storage can no longer be written to, etc.)

This fix eliminates an edge-case that, while we are not aware of any security exploits, seems like the type of thing where it would be better to eliminate the edge-case rather than discover later on that security experts are exceedingly clever in ways we could not foresee.

Fix Landed on Version: 96
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): (from the beta request): Our initial fears of potential web-compat issues on this bug and the patch were overblown. The removed mochitest is a single example of a site doing something very sketchy and where we intentionally maintained behavior for compatibility reasons in the absence of clear semantics at a spec level. The web platform has advanced to a point where we have specs for these things and clarity about what is and is not appropriate; there is a gap in our spec text in WebIDL, but it's clear that our test was wrong and contradicts the authoritative "can run script" check that should be run.

That said, it's possible there could be breakage out there somewhere on the web, but this needs to be weighed against the reality that to maintain that behavior would be to intentionally poke holes in our invariants in a way that could lead to serious security bugs.

Attachment #9254570 - Flags: approval-mozilla-esr91?

Ryan VanderMeulen [:RyanVM]

Comment 36

•

3 years ago

Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug

Approved for 91.6esr. Thanks for the detailed approval request.

Attachment #9254570 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Ryan VanderMeulen [:RyanVM]

Comment 37

•

3 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr91/rev/977ec107a58a

status-firefox-esr91: affected → fixed

Tom Ritter [:tjr]

Updated

•

3 years ago

Whiteboard: [bugmon:confirm][adv-main96+r] → [bugmon:confirm][adv-main96+r][adv-esr91.6+r]

Tom Ritter [:tjr]

Comment 38

•

3 years ago

Attached file advisory.txt — Details

Tom Ritter [:tjr]

Updated

•

3 years ago

Alias: CVE-2022-22763

BugBot [:suhaib / :marco/ :calixte]

Comment 39

•

3 years ago

:asuth, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(bugmail)

Kagami Rosylight [:saschanaz] (they/them)

Comment 40

•

3 years ago

There's no range, just a bot issue. https://github.com/mozilla/relman-auto-nag/pull/1476

Flags: needinfo?(bugmail)

Daniel Veditz [:dveditz]

Updated

•

3 years ago

Group: core-security-release

Testcase 3 years ago Jason Kratzer [:jkratzer] 5.14 KB, application/octet-stream		Details
Testcase 3 years ago Jason Kratzer [:jkratzer] 5.54 KB, application/zip		Details
testcase.zip 3 years ago Jason Kratzer [:jkratzer] 5.49 KB, application/zip		Details
Bug 1740534 - Improve global consistency. r=smaug 3 years ago Andrew Sutherland [:asuth] (he/him) 48 bytes, text/x-phabricator-request	diannaS : approval-mozilla-beta+ RyanVM : approval-mozilla-esr91+	Details \| Review
modified version of mochitest test_bug986542.html for investigation 3 years ago Andrew Sutherland [:asuth] (he/him) 2.64 KB, text/html		Details
advisory.txt 3 years ago Tom Ritter [:tjr] 208 bytes, text/plain		Details