Assertion failure: mBodyStream, at /dom/base/BodyStream.cpp:54
Categories
(Core :: DOM: Workers, defect, P1)
Tracking
()
People
(Reporter: jkratzer, Assigned: asuth)
References
(Blocks 1 open bug)
Details
(Keywords: sec-moderate, testcase, Whiteboard: [bugmon:confirm][adv-main96+r][adv-esr91.6+r])
Attachments
(4 files, 2 obsolete files)
5.49 KB,
application/zip
|
Details | |
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
2.64 KB,
text/html
|
Details | |
208 bytes,
text/plain
|
Details |
Testcase found while fuzzing mozilla-central rev 333f08065c8c (built with: --enable-debug --enable-fuzzing).
Testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 333f08065c8c --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10
Assertion failure: mBodyStream, at /dom/base/BodyStream.cpp:54
==721216==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f4a52b40f70 bp 0x7f4a484db430 sp 0x7f4a484db3f0 T721294)
==721216==The signal is caused by a WRITE memory access.
==721216==Hint: address points to the zero page.
#0 0x7f4a52b40f70 in ForgetBodyStream /dom/base/BodyStream.cpp:54:3
#1 0x7f4a52b40f70 in mozilla::dom::BodyStream::ReleaseObjects(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /dom/base/BodyStream.cpp:545:18
#2 0x7f4a52b41916 in mozilla::dom::BodyStream::Close() /dom/base/BodyStream.cpp
#3 0x7f4a52b3f4eb in operator() /dom/base/BodyStream.cpp:98:51
#4 0x7f4a52b3f4eb in ~ScopeExit /builds/worker/workspace/obj-build/dist/include/mozilla/ScopeExit.h:106:7
#5 0x7f4a52b3f4eb in mozilla::dom::BodyStream::Create(JSContext*, mozilla::dom::BodyStreamHolder*, nsIGlobalObject*, nsIInputStream*, mozilla::ErrorResult&) /dom/base/BodyStream.cpp:151:1
#6 0x7f4a544a322d in mozilla::dom::FetchBody<mozilla::dom::Response>::GetBody(JSContext*, JS::MutableHandle<JSObject*>, mozilla::ErrorResult&) /dom/fetch/Fetch.cpp:1394:3
#7 0x7f4a53492eed in mozilla::dom::Response_Binding::get_body(JSContext*, JS::Handle<JSObject*>, void*, JSJitGetterCallArgs) /builds/worker/workspace/obj-build/dom/bindings/ResponseBinding.cpp:1295:24
#8 0x7f4a54053fdf in bool mozilla::dom::binding_detail::GenericGetter<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) /dom/bindings/BindingUtils.cpp:3182:13
#9 0x7f4a5795e5bf in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /js/src/vm/Interpreter.cpp:385:13
#10 0x7f4a5795dcbb in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:472:12
#11 0x7f4a5795f79e in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:532:10
#12 0x7f4a5796082f in Call /js/src/vm/Interpreter.cpp:549:8
#13 0x7f4a5796082f in js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:675:10
#14 0x7f4a57ccda0f in CallGetter /js/src/vm/NativeObject.cpp:1942:12
#15 0x7f4a57ccda0f in bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:1970:12
#16 0x7f4a57cce0a3 in bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) /js/src/vm/NativeObject.cpp:2116:14
#17 0x7f4a579647ef in GetProperty /js/src/vm/ObjectOperations-inl.h:115:10
#18 0x7f4a579647ef in js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::PropertyName*, JS::MutableHandle<JS::Value>) /js/src/vm/ObjectOperations-inl.h:122:10
#19 0x7f4a57963ca0 in js::GetProperty(JSContext*, JS::Handle<JS::Value>, JS::Handle<js::PropertyName*>, JS::MutableHandle<JS::Value>) /js/src/vm/Interpreter.cpp:4548:10
#20 0x7f4a579529bb in GetPropertyOperation /js/src/vm/Interpreter.cpp:203:10
#21 0x7f4a579529bb in Interpret(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:2904:12
#22 0x7f4a5794bbe5 in js::RunScript(JSContext*, js::RunState&) /js/src/vm/Interpreter.cpp:354:13
#23 0x7f4a5795dbb6 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /js/src/vm/Interpreter.cpp:504:13
#24 0x7f4a5795f79e in InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:532:10
#25 0x7f4a5795f9a1 in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) /js/src/vm/Interpreter.cpp:549:8
#26 0x7f4a57b17c21 in JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) /js/src/vm/CallAndConstruct.cpp:117:10
#27 0x7f4a53df182c in mozilla::dom::VoidFunction::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::ErrorResult&) /builds/worker/workspace/obj-build/dom/bindings/FunctionBinding.cpp:81:8
#28 0x7f4a52d81e25 in mozilla::dom::VoidFunction::Call(mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JS::Realm*) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/FunctionBinding.h:172:12
#29 0x7f4a52d81c0e in QueuedMicrotask::Run(mozilla::AutoSlowOperation&) /dom/base/nsIGlobalObject.cpp:271:31
#30 0x7f4a50dcec18 in mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint(bool) /xpcom/base/CycleCollectedJSContext.cpp:674:17
#31 0x7f4a50dcfa3c in mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) /xpcom/base/CycleCollectedJSContext.cpp:463:3
#32 0x7f4a50efa45a in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1212:24
#33 0x7f4a50f0126a in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
#34 0x7f4a554243d8 in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) /dom/workers/WorkerPrivate.cpp:3105:7
#35 0x7f4a554047a7 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2244:42
#36 0x7f4a50efa149 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1169:16
#37 0x7f4a50f0126a in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
#38 0x7f4a5198e624 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:330:5
#39 0x7f4a518ac9f7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
#40 0x7f4a518ac902 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
#41 0x7f4a518ac902 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
#42 0x7f4a50ef5dbb in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:391:10
#43 0x7f4a67a98a07 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
#44 0x7f4a6880c608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#45 0x7f4a683d4292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/base/BodyStream.cpp:54:3 in ForgetBodyStream
==721216==ABORTING
Reporter | ||
Comment 1•3 years ago
|
||
Comment 2•3 years ago
|
||
Bugmon Analysis
Unable to reproduce bug 1740534 using build mozilla-central 20211110092453-333f08065c8c. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Updated•3 years ago
|
Comment 3•3 years ago
|
||
The severity field is not set for this bug.
:mgaudet, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 4•3 years ago
|
||
So: I could not reproduce this failure. (I do wonder if the testcase.zip is a bit incomplete? I looked at the test case and it expects a worker.js
, but there's no worker.js
packaged?)
The stack trace is definitely peculiar: We are in the middle of firing the ScopeExit
cleanup method. This means that we are taking an early exit from this method. There are four possible paths.
Given the stack and crash, we can guess that somehow this BodyStream
has ended up with a null mStreamHolder
member; mStreamHolder is initialized as part of the BodyStream
constructor, and there's a diagnostic assert to guarantee it's non-null at that point. The other place where it is modified is part of ReleaseObjects
, where it is null
ed out. However, before then, there's a state guard....
So I'm not sure what to make of this one. For now, I'm going to mark as P3/S3.
Reporter | ||
Comment 5•3 years ago
|
||
My apologies. It looks like I had uploaded an incomplete testcase. The newly attached testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build d03f87555639 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10
Reporter | ||
Comment 6•3 years ago
|
||
Comment 7•3 years ago
|
||
Hmm.
OK; so with the updated test case, I can successfully reproduce the failure now. However, I cannot get it to reproduce under rr
, which makes me wonder if there's something racy...; still, just reproducing isn't a huge help, unless I can get more information.
Jason: For me this test case reproduces first try with python3 -m grizzly.replay ./firefox/firefox testcase.zip --repeat 10 --xvfb
; I wonder, is it possible to get grizzly.replay to run firefox under a regular debugger? Or is it possible to have your automation try to get a pernosco reproduction automatically?
Reporter | ||
Comment 8•3 years ago
|
||
:mgaudet, unfortunately I've also been trying to get a pernosco session of this bug without any luck so far. As for running under a regular debugger, grizzly.replay does not have that capability. However, I can confirm that the crash triggers on a default debug build without any special prefs set. Just unzip the testcase, start a local webserver and point Firefox to testcase.html.
Comment 9•3 years ago
|
||
A Pernosco session is available here: https://pernos.co/debug/paeqwbX6GxYe8l4H0PZdGQ/index.html
Updated•3 years ago
|
Comment 10•3 years ago
|
||
🤩 Thanks so much Tyson! I'm excited to dig in!
Comment 11•3 years ago
|
||
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #10)
🤩 Thanks so much Tyson! I'm excited to dig in!
Happy to help! That one was one of the most stubborn to reproduce (under rr) I've seen :)
Comment 12•3 years ago
|
||
Ok. There's some more detailed notes in the pernosco trace. Given the trace, I suspect I understand more about why this was so hard to reproduce.
This assertion failure is the combination of two factors playing together:
- Worker Shutdown. As I understand it, Workers can go away at any time but as near as I can tell, during shutdown there is still some (JS) code execution that happens once shutdown has started.
- Fetch Bodies try to lazily create their body streams. Note, a FetchBody is also a BodyStreamHolder
My read of the Pernosco trace (this bug is a huge huge argument in favour of the productivity benefit it brings) is this:
- We have a fetch inside a worker, and we request the stream.
- As part of this, We create a
WeakWorkerRef
, and as a shutdown callback we tell it to callstream->Close()
- The worker begins to shutdown. At this point the
WeakWorkerRef
is notified, andstream->Close()
is called. This nulls out the stream in theBodyStreamHolder
(FetchBody) - This is the part that is most unclear to me: Some more JS runs, and we yet-again try to get a Stream for the
FetchBody
. The Lazy stream creation code kicks in again, so we start to create a different BodyStream. - Because we're in a worker, we try to create a
WeakWorkerRef
, again, but this time the system declines, as we're already cancelling; so we side exit fromBodyStream::Create
. Now, to ensure cleanup, there's aScopeExit
on the stack inside ofBodyStream::Create
, which callsstream->Close()
on any side-exits from the create function, so this is invoked. This stream knows that it hasFetchBody
as a stream holder, and tries to disconnect that edge... but here's where the assertion fires, asFetchBody
already had a stream created and nulled out, butmStreamCreated
was never set to false.
So: the fix for this bug isn't immediately clear to me. It seems to me that we probably don't want to be trying to create a BodyStream for a FetchBody
in a worker if the worker is already terminating... but I don't entirely understand what JS code should be continuing to run and under what circumstances during worker shutdown.
Andrew: is there a pattern for this kind of thing that is established elsewhere?
Assignee | ||
Comment 13•3 years ago
|
||
Marking this as a secbug as this now gets into territory of pointing out interesting things for security researchers; it might be reasonable to relax this.
The situation:
- We're running a content Promise during a microtask queue drain during worker shutdown and we should not be.
- The pernosco trace is missing the generated Unified_cpp_dom_base8.cpp and UnifiedBindings6.cpp, presumably due to omitting the objdir from the command-line passed to pernosco-submit? But it seems likely these frames are in CallbackObject and are inlined.
- The check if we can run script algorithm is generally responsible for making sure we don't call into content too late. It's sorta window-specific in the spec. Specific spec call-sites are:
- In run a classic script.
- In run a module script.
- In HostEnqueuePromiseJob which notably enqueues a microtask that runs the check at runtime (not enqueue time).
- Our implementation of "check if we can run script" is this logic in CallbackObject::CallSetup::CallSetup. This combines:
- For windows, the spec window active document check
- In nsIGlobalObject::IsScriptForbidden:
- A failsafe back-stop for workers to avoid calling into content during ClearMainEventQueue that I added in bug 1545345 (and made some supporting changes).
- A Scriptability check. This allows:
- Blocking at creation time by: determination via nsIPrincipal::IsScriptAllowedByPolicy
- Blocking dynamically via Scriptability::SetWindowAllowsScript which can be altered by getting disabled by going into the bfcache in WindowStateHolder, and BrowserContext/WindowContext propagation and recomputation when the synced AllowJavascript field changes.
- Blocking dynamically with paired allow/block via: Cu.blockScriptForGlobal, the slow script dialog disabling script, an additional action taken in NukeAllWrappersForRealm.
- Notably we don't check nsIGlobalObject::IsDying which is our closest approximation to appropriate logic for "check if we can run script" for a worker.
- This method is used in a number of spots
- A notable other helper API in this area is DOMEventTargetHelper::CheckCurrentGlobalCorrectness which uses IsDying
- The lack of this check to IsDying is the problem.
- Interestingly, while QueuedMicrotask::Run does not check IsDying(), PromiseJobRunnable does check IsDying..
- This conservative approach here was taken by me in bug 1545345 because there was a security bug that needed to be backported and I was more concerned about regressions for a backport and less confident/in-the-know about spec related things for this. (Also, the spec and consensus around global lifetimes has improved a lot.)
As a solution I would propose:
- nsIGlobalObject should gain a method that more directly corresponds to check if we can run script like
CanRunScript
.- If
mIsDying
ormIsScriptForbidden
, we return false. This covers the normal global-is-dying case plus the worker paranoia case where ClearMainEventQueue is happening. - For windows, this maintains the
HasActiveDocument
check, returning false if it's not active. - The Scriptability Allowed check from IsScriptForbidden can be in there or can move back out to CallbackObject.
- If
- We haveCallbackObject use CanRunScript. It's already the case that we are doing virtual dispatch to call
IsScriptForbidden
, so as long as we don't get crazy with the virtual dispatch, performance isn't likely to be a problem.
:smaug, does this seem reasonable from an implementation perspective?
:annevk, does it make sense that the can run script spec should perhaps gain some more discussion of workers? (Maybe there's existing issues/PR's on this?)
Updated•3 years ago
|
Comment 14•3 years ago
|
||
Yeah, it would make sense for that to check https://html.spec.whatwg.org/#dom-workerglobalscope-closing I think. Should be an easy PR, but maybe we want to wait until a fix is in?
Comment 15•3 years ago
|
||
I don't think QueuedMicrotask::Run should check anything, but Callsetup should. And it does. So IsScriptForbidden could check mIsDying
Assignee | ||
Comment 16•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #15)
I don't think QueuedMicrotask::Run should check anything, but Callsetup should. And it does. So IsScriptForbidden could check mIsDying
Agreed on QueuedMicrotask. And yeah, just altering IsScriptForbidden sounds like a great minimal fix, thanks! We can potentially rename things in the future (and not near as a security bug) to look more like spec terminology.
Assignee | ||
Comment 17•3 years ago
•
|
||
I'm categorizing this as sec-moderate from https://wiki.mozilla.org/Security_Severity_Ratings/Client with a rationale of:
- I don't believe this is directly exploitable. I also don't believe this is indirectly exploitable.
- This only allows a single draining of the microtask queue after a task where content was already running.
- We will not have run any shutdown-related GC's or any other specialized shutdown logic on the way out of the primary runnable because we have not yet begun to unwind that loop.
- The scope and JS context are all still valid. The dying global is a DOM construct, not a JS construct.
- WorkerRefs will correctly fail to be created at this time which code should handle.
- However this is still definitely an invariant violation and there's no need to point people directly at it, so this still gets to be a security bug. (It's of course plausible that this could be chained with other, more serious bugs.)
- Because there is an invariant being violated here, as in this bug, code can get upset as its invariants get violated as code runs after WorkerRefs had their callbacks invoked and their DETHs disconnected. So we do expect assertion violations and potentially nullptr dereferences.
- DOMEventTargetHelper instance will bind to parents even though CheckCurrentGlobalCorrectness would return false, however, the DETH instances will properly be detached again on our way out of the primary run loop (and actually would be done again in nsIGlobalObject's destructor where the scope closes in the next line, but no new DETHs should be added during that interval; if they are, that would be a different, serious bug).
I think it's worth fixing and landing this bug now with a target of uplifting into beta here early in the cycle, especially as it is likely that this will likely help eliminate a number of fuzzing errors that could otherwise be distracting. It could also make sense to let this be uplifted to release if there are dot releases but I don't believe this would necessarily be urgent for security reasons, but could be desirable as there could reasonably be non-exploitable crashes that could be eliminated.
Assignee | ||
Comment 18•3 years ago
•
|
||
(In reply to Anne (:annevk) from comment #14)
Yeah, it would make sense for that to check https://html.spec.whatwg.org/#dom-workerglobalscope-closing I think. Should be an easy PR, but maybe we want to wait until a fix is in?
Yes, agreed on waiting. There may also be some edge cases here around the difference between a self.close()
and a Worker.terminate()
to deal with.
Assignee | ||
Comment 19•3 years ago
|
||
Assignee | ||
Comment 20•3 years ago
|
||
Try run at https://treeherder.mozilla.org/jobs?repo=try&revision=9d3527ff3b0662fa892d67c8fbbef97d147420a0 shows that tests that are explicitly about things still working in detached iframes get angry when things do not work in detached iframes! I feel like we've been talking about being more strict for detached iframes in general and it seems like there are somehow maybe only 2 of these[1], so maybe this is all okay?
Specific things:
- Bug 986542 (in 2014) gave us https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html that wants onclick to work in a detached iframe. This decision seems to have been made on the basis of the site in question breaking between Firefox 27 and Firefox 28.
- https://searchfox.org/mozilla-central/source/testing/web-platform/tests/dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html fails with:
TEST-UNEXPECTED-FAIL | /dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html | TreeWalker: NodeFilter from detached iframe works as expected - NodeIterator.nextNode: Refusing to execute function from global in which script is disabled.
- Chrome (and of course therefore Edge) fail this per wpt.fyi which seems like they're probably right about this? The error they report is
FAIL message: Failed to execute 'acceptNode' on 'NodeFilter': The provided callback is no longer runnable.
but presumably these differences are our platform specific strings?
:smaug / :annevk, does it seem reasonable for us to be treating detached iframes properly as dead globals or is this the tip of a compat nightmare for now and only workers need to care about dead globals?
1: Of course, as soon as a test fails in a dir, it's basically game over, so I've revised my patch to delete the mochitest for the first failure and altered our expectation to failure for the 2nd (with the expectation that we can fix the test in the future when we do the spec-y stuff) and have pushed that with another try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=c412faec7468fa13cbebc7aa63f0cc34e4d085bf
Comment 21•3 years ago
|
||
A detached iframe
does not result in a "dead" global. You can still have a hold on it and do things with it. This is different from the worker scenario. In the worker scenario it's really the agent that goes down and in the detached iframe
case the agent is still very much there (except I suppose if it was cross-site). A detached iframe
does result in a document that is no longer fully active, so it can no longer queue tasks and such.
I think it's known that Chromium has bugs here.
So my recommendation would be to limit this to workers.
Assignee | ||
Comment 22•3 years ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #20)
Specific things:
- Bug 986542 (in 2014) gave us https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html that wants onclick to work in a detached iframe. This decision seems to have been made on the basis of the site in question breaking between Firefox 27 and Firefox 28.
The pernosco robot gave me a pernosco trace for this automatically.
Copying some limited analysis I did into here: https://searchfox.org/mozilla-central/source/js/xpconnect/tests/mochitest/test_bug986542.html is doing an intentionally weird thing with target.onclick = ifr.contentWindow.f; where it's trying to have the parent window's click handler be a reference to JS code that's explicitly in the iframe's global. (Like the function it's referencing was created by a dynamically generated script tag string, so it's not accidentally creating the function in the parent window's global.)
I think our change in behavior here may be appropriate, as I believe Can Run Script is saying we shouldn't run script if the detached iframe isn't fully active, which would be the situation in this case unless there's a wrapper bug where the environment settings object that should be considered as used is the parent window?
Assignee | ||
Comment 23•3 years ago
•
|
||
Replicating some of my comments to :annevk in slack here too, but I think the key thing is that I mis-characterized these failures as indicating this would be a major regression in our behavior, when the reality is that there are two tests that seem to be broken per spec.
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #20)
- https://searchfox.org/mozilla-central/source/testing/web-platform/tests/dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html fails with:
TEST-UNEXPECTED-FAIL | /dom/traversal/TreeWalker-acceptNode-filter-cross-realm-null-browsing-context.html | TreeWalker: NodeFilter from detached iframe works as expected - NodeIterator.nextNode: Refusing to execute function from global in which script is disabled.
For the TreeWalker case, it seems like the situation is that the NodeIterator is created in the iframe, which is then immediately removed, then a test is done to use the NodeIterator which is very definitely created inside the iframe's global (via a helper in the iframe's source file). That's these lines.
And it seems like maybe the NodeIterator actually wants to try and work in that case, but because the node iterator takes the filter function that's definitely created in the iframe global and definitely executed in the iframe global we need to run code in the not fully active document, and we fail there. This also seems like the fix is likely correctly forbidding that.
Assignee | ||
Comment 24•3 years ago
•
|
||
:annevk and I reached clarity/consensus on the failing tests above. The conclusions as I understand them are:
- Spec-wise, there is a gap in WebIDL's Invoking Callback Functions and this should be rectified after our fix here sufficiently lands.
- Step 5 and 6 are appropriately retrieving the settings object from the callback, but the WebIDL spec is not calling check if we can run script as part of/prelude to step 11, even though prepare to run script is correctly being called with the right settings object in step 8.
- Both of the tests that failed above are in fact depending on the ability to call into script when check if we can run a script would return "do not run" because the detached iframe is not fully active. And so it's reasonable for us to appropriately disable the tests, in this case removing the incorrect mochitest and marking the WPT test as an expected failure, with plans to correct the WPT test as part of the spec-work.
- When addressing the spec issues, we'll also make sure to ensure WPT coverage for these changes, with both of these tests providing good insight to examples of how this might be tested!
I'm clearing the needinfo on :smaug because I think I have all the answers necessary here.
I'll post an attachment in a second that's a revised version of the mochitest that I used for exploration when talking to Anne.
Assignee | ||
Comment 25•3 years ago
|
||
My explanations about results/what's under tests:
- Only the third case passes. The first case is the original check of the original test. The second case is when the handler was defined in the iframe global, the third case is when the handler was defined in the parent window global.
- The 2nd and 3rd tests are checking whether the window global's DOM under-pinnings are still happy to run dispatchEvent on the dead global.
- The 1st and 2nd tests are checking whether can run script returns true for a not fully active document (as enforced by our isDying check, and not previously enforced). I guess normally this check would depend on the inner window not being the current inner window for the outer window. But for a detached iframe, the inner window is still the current inner window.
Assignee | ||
Comment 26•3 years ago
|
||
My attempt to land this on Friday ran into some variation of bug 1714646 (which has been fixed for a while) where it picked pine as upstream when it should not have picked pine as upstream. (This likely has to do with me switching to using git after a number of very bad experiences with the "evolution" extension creating worst-case scenarios on rebases/histedits gone wrong. git obviously presents additional edge-cases for moz-phab.)
I'm not sure this is easily salvageable from within the existing diff and may just end up resubmitting as a fresh patch and overriding all the things that get upset about patches not being officially reviewed.
Comment 27•3 years ago
|
||
Improve global consistency. r=smaug
https://hg.mozilla.org/integration/autoland/rev/ebbf05cd3be0f7bb664407e53e03ab47c7da7928
https://hg.mozilla.org/mozilla-central/rev/ebbf05cd3be0
Assignee | ||
Comment 28•3 years ago
|
||
(Moving this into workers for clarity about where the fix was.)
Comment 29•3 years ago
|
||
Reading through this bug, it's not clear to me how much bake time this fix needs to identify what (if any) webcompat fallout there will be from it. Are we still thinking we want to uplift this cycle or do want to have it ride 97 instead?
Assignee | ||
Comment 30•3 years ago
|
||
Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug
Beta/Release Uplift Approval Request
- User impact if declined: There should be no usable-visible impacts from this fix as it concerns edge-cases that would primarily occur when a browser tab is being closed and there is no potential for the page to do anything that would have meaningful impact. (Storage can no longer be written to, etc.)
This fix eliminates an edge-case that, while we are not aware of any security exploits, seems like the type of thing where it would be better to eliminate the edge-case rather than discover later on that security experts are exceedingly clever in ways we could not foresee.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Our initial fears of potential web-compat issues on this bug and the patch were overblown. The removed mochitest is a single example of a site doing something very sketchy and where we intentionally maintained behavior for compatibility reasons in the absence of clear semantics at a spec level. The web platform has advanced to a point where we have specs for these things and clarity about what is and is not appropriate; there is a gap in our spec text in WebIDL, but it's clear that our test was wrong and contradicts the authoritative "can run script" check that should be run.
That said, it's possible there could be breakage out there somewhere on the web, but this needs to be weighed against the reality that to maintain that behavior would be to intentionally poke holes in our invariants in a way that could lead to serious security bugs.
- String changes made/needed:
Assignee | ||
Comment 31•3 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #29)
Reading through this bug, it's not clear to me how much bake time this fix needs to identify what (if any) webcompat fallout there will be from it. Are we still thinking we want to uplift this cycle or do want to have it ride 97 instead?
We should definitely uplift this to beta and I've requested that above. I would wait on ESR uplift until after this hits release, however, given that ESR users seem more likely to be running sites that do sketchy and/or Firefox-specific things and where sites are unlikely to be updated in a timely fashion.
Updated•3 years ago
|
Comment 32•3 years ago
|
||
Changing the priority to p1 as the bug is tracked by a release manager for the current beta.
See What Do You Triage for more information
Comment 33•3 years ago
|
||
Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug
Approved for 96.0b7
Comment 34•3 years ago
|
||
uplift |
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 35•3 years ago
|
||
Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: This helps rule out eliminate a class of worker shutdown related potential crashes and security bugs.
- User impact if declined: (copied from beta request, where beta has now reached release without incident that we know of): There should be no usable-visible impacts from this fix as it concerns edge-cases that would primarily occur when a browser tab is being closed and there is no potential for the page to do anything that would have meaningful impact. (Storage can no longer be written to, etc.)
This fix eliminates an edge-case that, while we are not aware of any security exploits, seems like the type of thing where it would be better to eliminate the edge-case rather than discover later on that security experts are exceedingly clever in ways we could not foresee.
- Fix Landed on Version: 96
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): (from the beta request): Our initial fears of potential web-compat issues on this bug and the patch were overblown. The removed mochitest is a single example of a site doing something very sketchy and where we intentionally maintained behavior for compatibility reasons in the absence of clear semantics at a spec level. The web platform has advanced to a point where we have specs for these things and clarity about what is and is not appropriate; there is a gap in our spec text in WebIDL, but it's clear that our test was wrong and contradicts the authoritative "can run script" check that should be run.
That said, it's possible there could be breakage out there somewhere on the web, but this needs to be weighed against the reality that to maintain that behavior would be to intentionally poke holes in our invariants in a way that could lead to serious security bugs.
Comment 36•3 years ago
|
||
Comment on attachment 9254570 [details]
Bug 1740534 - Improve global consistency. r=smaug
Approved for 91.6esr. Thanks for the detailed approval request.
Comment 37•3 years ago
|
||
uplift |
Updated•3 years ago
|
Comment 38•3 years ago
|
||
Updated•3 years ago
|
Comment 39•3 years ago
|
||
:asuth, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.
Comment 40•3 years ago
|
||
There's no range, just a bot issue. https://github.com/mozilla/relman-auto-nag/pull/1476
Updated•2 years ago
|
Description
•