Open Bug 1949168 Opened 6 months ago Updated 2 months ago

sync-about-blank: test_notification_serviceworker_openWindow.html fails

Categories

(Core :: DOM: Navigation, defect, P3)

defect

Tracking

()

ASSIGNED

People

(Reporter: vhilla, Assigned: vhilla)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

test_notification_serviceworker_openWindow.html fails with the patch from Bug 543435. I think there are multiple issues that need to be resolved

ShouldUsePartitionPrincipalForServiceWorker cannot access opener doc

When browser.link.open_newwindow is set to OPEN_CURRENTWINDOW, window.open("https://example.org/") leads to the following call stack. Here, we are creating a viewer and query our opener document. But as we open the URL in ourselves, we appear to be our own opener and fail when asserting we have a viewer. At least our browsing context and opener browsing context have the same address. Some of this code was added in the sync-about-blank patch. I verified that we also are our own opener on central, but don't fail there - even if we wait for the example.org load to complete before closing the window.

Call Stack

 0:24.91 GECKO(474021) #01: nsDocShell::EnsureDocumentViewer(bool, nsIOpenWindowInfo*, mozilla::dom::WindowGlobalChild*) (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:6420)
 0:24.91 GECKO(474021) #02: nsDocShell::GetDocument() (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:3042)
 0:24.91 GECKO(474021) #03: mozilla::StoragePrincipalHelper::ShouldUsePartitionPrincipalForServiceWorker(nsIDocShell*) (/home/vincent/worktree/aboutblank/toolkit/components/antitracking/StoragePrincipalHelper.cpp:417)
 0:24.92 GECKO(474021) #04: nsDocShell::CreateAboutBlankDocumentViewer(nsIPrincipal*, nsIPrincipal*, nsIContentSecurityPolicy*, nsIURI*, bool, mozilla::Maybe<nsILoadInfo::CrossOriginEmbedderPolicy> const&, bool, bool, mozilla::dom::WindowGlobalChild*) (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:6606)
 0:24.92 GECKO(474021) #05: nsDocShell::EnsureDocumentViewer(bool, nsIOpenWindowInfo*, mozilla::dom::WindowGlobalChild*) (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:6423)
 0:24.92 GECKO(474021) #06: nsDocShell::Initialize(nsIOpenWindowInfo*, mozilla::dom::WindowGlobalChild*) (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:465)
 0:24.93 GECKO(474021) #07: nsDocShell::InitWindow(nsIWidget*, int, int, int, int, nsIOpenWindowInfo*, mozilla::dom::WindowGlobalChild*) (/home/vincent/worktree/aboutblank/docshell/base/nsDocShell.cpp:436)
 0:24.93 GECKO(474021) #08: nsWebBrowser::Create(nsIWebBrowserChrome*, nsIWidget*, mozilla::dom::BrowsingContext*, mozilla::dom::WindowGlobalChild*, nsIOpenWindowInfo*) (/home/vincent/worktree/aboutblank/toolkit/components/browser/nsWebBrowser.cpp:160)
 0:24.93 GECKO(474021) #09: mozilla::dom::BrowserChild::Init(mozIDOMWindowProxy*, mozilla::dom::WindowGlobalChild*, nsIOpenWindowInfo*) (/home/vincent/worktree/aboutblank/dom/ipc/BrowserChild.cpp:404)
 0:24.93 GECKO(474021) #10: mozilla::dom::ContentChild::RecvConstructBrowser(mozilla::ipc::ManagedEndpoint<mozilla::dom::PBrowserChild>&&, mozilla::ipc::ManagedEndpoint<mozilla::dom::PWindowGlobalChild>&&, mozilla::dom::IdType<mozilla::dom::BrowserParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::WindowGlobalInit const&, unsigned int const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&, bool const&, bool const&) (/home/vincent/worktree/aboutblank/dom/ipc/ContentChild.cpp:1894)

The added code:

if (!document) {
      // Try opener
      dom::BrowsingContext* bc = aDocShell->GetBrowsingContext();
      if (bc) {  // Can this even be null?
        RefPtr<dom::BrowsingContext> opener = bc->GetOpener();
        if (opener) {
          document = opener->GetDocument();
        }
      }

navigator.serviceWorker.controller is null

Here are two pernosco sessions for the test: central and crash. rr failed to record the full crash, so I had to sleep and SIGINT right before ShouldUsePartitionPrincipalForServiceWorker. On central, I changed open("") to open("about:blank") and have the test wait after open("example.org") to be more similar to the version with the patch.

The main issue seems to be that the ServiceWorkerContainer never gets the controller set. From window.open(""), we'll do a sync-about-blank load and create the ServiceWorkerContainer. But at that point, we haven't yet reached ClientSource::SetController, thus have a null container. I ensured the same happens on central by explicitly loading about:blank there. When we later load file_notification_openWindow.html, WouldReuseInnerWindow here checks IsInitialDocument, which is true with the patch. Thus we reuse our inner window, including the navigator and ServiceWorkerController. But we never update the controller.

There is ServiceWorkerContainer::ControllerChanged that can be called in ClientSource::SetContainer. But both with the patch and on central, when we reach this point, ClientSource doesn't have an owner and thus fails to get the inner window or the current ServiceWorkerContainer.

I guess either we only want to reuse the window for the initial uncommitted about:blank, or need to call ControllerChanged, or update the controller when reusing the window.

Assignee: nobody → vhilla
Severity: -- → S3
Priority: -- → P3

Another thing to note. The patch originally moved when we set browser.link.open_newwindow to before window.open(""). This way we navigate our test window away and immediately shutdown. But of course, then above issues don't arise. Maybe that was an incorrect attempt at fixing the test.

And also to note. Originally, I found a second way to reach the assertion in EnsureDocumentViewer from RecomputeResistFingerprinting. But that was a regression from some other test fix and easy to resolve.

:hsivonen

  1. Do you have any recollection about that ShouldUsePartitionPrincipalForServiceWorker code? Does it seem valid that we are our own opener bc? What failures could we look out for if we break this code? We could use GetExtantDocument, it will be null but not trigger the assert. I think we load example.org from example.com, so there is also a process switch.
  2. Would it seem sensible for WouldReuseInnerWindow to check IsUncommittedInitialDocument?
Flags: needinfo?(hsivonen)
Status: NEW → ASSIGNED

Using IsUncommittedInitialDocument caused many regressions: try.

(In reply to Vincent Hilla [:vhilla] from comment #3)

:hsivonen

  1. Do you have any recollection about that ShouldUsePartitionPrincipalForServiceWorker code?

Sorry, I don't have a recollection of the details of ShouldUsePartitionPrincipalForServiceWorker. I do recall that getting the ServiceWorker tests to pass involved a very long (perhaps my longest) Pernosco debugging effort to compare runs with and without the patch.

Does it seem valid that we are our own opener bc?

That seems weird, but I don't recall if it can be valid.

What failures could we look out for if we break this code?

Redirecting this question to Tim Huang.

  1. Would it seem sensible for WouldReuseInnerWindow to check IsUncommittedInitialDocument?

I don't know. Possibly.

Flags: needinfo?(hsivonen) → needinfo?(tihuang)

Given the try failures, and the wpt custom-elements/custom-element-registry/per-global.html, it seems to me that we shouldn't change WouldReuseInnerWindow. I managed to get the window be reused on central too by doing window.open() and then waiting for the load to complete, instead of overwriting it with some test document. I can see that we reuse the window and anyway create a new ServiceWorkerContainer when loading file_notification_openWindow.html.

It seems that on central, we don't eagerly create a ServiceWorkerContainer for the initial document, as this happens on DispatchContentLoadedEvents.

Edit: Chrome, Edge and Safari also fail per-global.html and do replace the window when navigating away from about:blank. From my initial look, this change would regress around 30 mochitest and browser tests.

I managed to reproduce the failure on central with the following diff:

--- a/dom/notification/test/mochitest/test_notification_serviceworker_openWindow.html
+++ b/dom/notification/test/mochitest/test_notification_serviceworker_openWindow.html
@@ -49,7 +49,10 @@ add_task(async function test() {
       // worker notification is shown, the document will open in the focused tab.
       // If we don't open a new tab, the document will be opened in the
       // current test-runner tab and mess up the test setup.
-      window.open("");
+      const w = window.open("");
+      await new Promise(res => setTimeout(() => setTimeout(res, 0), 0));  // initial doc creation
+      w.navigator.serviceWorker;  // trigger lazy creation
     }
     info(`Setting browser.link.open_newwindow to ${prefValue}.`);
     await SpecialPowers.pushPrefEnv({

This will cause navigator.serviceWorker.controller to be null in file_notification_openWindow.html.

:asuth is c7 a valid bug in service workers? We create a ServiceWorkerContainer for the initial about:blank, aGlobal->GetController() is still null. Then we reuse this window and the container for file_notification_openWindow.html without ever setting the controller.

With forcing a ServiceWorkerContainer for the initial doc, this is quite an edge case. But Bug 543435 will (1) keep the initial about:blank instead of creating a new one if the navigation destination is about:blank (2) fire the load event and thus create Navigator and ServiceWorkerContainer on it and (3) load the about:blank synchronously. This makes it much easier to reach the serviceworker issue.

If this is a bug, should ClientSource::SetContainer have called ServiceWorkerContainer::ControllerChanged? Or do you have other ideas for how to resolve the issue? I'm not sure yet whether we want to reuse the window, not doing so would also solve this bug.

Flags: needinfo?(bugmail)

The ShouldUsePartitionPrincipalForServiceWorker() is used to determine whether to use the partitioned principal for the service worker in a third-party context. So, breaking this code could potentially lead to using an unpartitioned principal for third-party service workers, which breaks the Total Cookie Protection.

I think we have tests to ensure this shouldn't happen, such as browser_partitionedServiceWorkers.js.

Flags: needinfo?(tihuang)
See Also: → 1910011, 1880012

(In reply to Vincent Hilla [:vhilla] from comment #8)

:asuth is c7 a valid bug in service workers? We create a ServiceWorkerContainer for the initial about:blank, aGlobal->GetController() is still null. Then we reuse this window and the container for file_notification_openWindow.html without ever setting the controller.

Yes, definitely a bug.

There's obviously a ton going on with the spec, but I understand the general spec situation to be this:

  • window.open runs the window.open steps, whose step 13 invokes the rules for choosing a navigable whose step 8 takes the "If the user agent has been configured such that in this instance it will create a new top-level traversable" branch whose (sub)step 9 will create a new top-level traversable. Step 3 there will be taken and create a new auxiliary browsing context and document, whose step 4 is to create a new browsing context and document. Step 13 of that is to set up a window environment settings object.
  • That invocation of setting up a window environment settings object is monkeypatched by ServiceWorkers under Control and Use: The window client case with the text "When a window client is created in the process of a browsing context creation:" with the following algorithm:

    If the browsing context’s initial active document’s origin is an opaque origin, the window client’s active service worker is set to null. Otherwise, it is set to the creator document’s service worker client’s active service worker.

  • When we later go to navigate, step 6 of Shared document creation infrastructure will have us reuse the window for the new Document with spec text "If browsingContext's active document's is initial about:blank is true, and browsingContext's active document's origin is same origin-domain with navigationParams's origin, then set window to browsingContext's active window." and with the very helpful info box:

    This means that both the initial about:blank Document, and the new Document that is about to be created, will share the same Window object.

  • The monkeypatching logic mentioned above also has much sketchier things to say about "When a window client is created in the process of the browsing context’s navigation:". But there's actual normative text in step 9.8 of ServiceWorker's Handle Fetch algorithm that the monkeypatching approximates. The key point is that we go through fetch and we will set the reserved client passed into fetch as part of the request's active service worker to the matching registration.
  • And so as a result we do expect the Active Service Worker to be updated.
  • The spec text for navigator.serviceWorker.controller just consults the client's active service worker. So we absolutely should be returning the ServiceWorker and it seems consistent with spec to be reusing the window.

(In reply to Vincent Hilla [:vhilla] from comment #8)

If this is a bug, should ClientSource::SetContainer have called ServiceWorkerContainer::ControllerChanged? Or do you have other ideas for how to resolve the issue? I'm not sure yet whether we want to reuse the window, not doing so would also solve this bug.

The spec does not call for Notify Controller Change to be invoked right now, but the spec arguably is just hand-waving way too much. It certainly seems like it could be reasonable for it to fire.

I need to look a bit more at the plumbing of how we map the spec onto our implementation and the changes you've made, but if you already have changes that accomplish the above result, that's probably fine? I'm setting a self-needinfo on myself but please feel free to re-add a needinfo on me. I will try and have another update on Monday but then I will be on PTO until Friday.

My initial thought is that the moment we choose to reuse the window in SetNewDocument certainly seems like a reasonable time to be updating the ClientSource and its controller. (And as per the above, it also seems fine for that to notify a controller change until we raise and resolve that in the spec.)

Flags: needinfo?(bugmail)
Flags: needinfo?(bugmail)

I didn't yet have a change to accomplish a controller change notification, but doing that in SetNewDocument works.

Though I looked a bit further at pernosco traces and found another location where we could update the controller. I'm unsure if this is better?

When we load the initial about:blank, we create the ServiceWorkerController and try to get the controller from nsDocShell::mInitialClientSource. For this client source, we never set a controller.

When file_notification_openWindow.html starts loading, the parent calls into the ServiceWorkerInterceptController, which leads to ClientHandle::Control. A message is sent to the content to invoke ClientSource::SetController for a client source that lives on the load info. In nsGlobalWindowInner::EnsureClientSource, mClientSource is set to this client source and we again call SetController with loadInfo->GetController(). Although the client source is associated with this window, the window is never assigned as it's owner. So it won't notify the SWC of this controller change. I'm unsure if this is supposed to be the case. But if a new ServiceWorkerController were to be created, as is the case if we wouldn't re-use the window, we would get the controller via window->mClientSource->mController.

So it feels to me like the window should notify the SWC if it changes the controller. At least if the controller doesn't do that on its own. The following diff would address this:

diff --git a/dom/base/nsGlobalWindowInner.cpp b/dom/base/nsGlobalWindowInner.cpp
index 507a2ec16def1..54ad0a121e199 100644
--- a/dom/base/nsGlobalWindowInner.cpp
+++ b/dom/base/nsGlobalWindowInner.cpp
@@ -2015,6 +2016,12 @@ nsresult nsGlobalWindowInner::EnsureClientSource() {
     const Maybe<ServiceWorkerDescriptor> controller = loadInfo->GetController();
     if (controller.isSome()) {
       mClientSource->SetController(controller.ref());

+      if (!mClientSource->GetInnerWindow()) {
+        // ClientSource has no owner, so didn't notify the SWC, bug 1949168
+        MOZ_ASSERT(!nsContentUtils::IsSafeToRunScript()); // pretty sure it always holds
+        nsWeakPtr weakSelf = do_GetWeakReference(this);
+        nsContentUtils::AddScriptRunner(NS_NewRunnableFunction("DelayedControllerChange", [weakSelf](){
+          if (nsCOMPtr<nsPIDOMWindowInner> self = do_QueryReferent(weakSelf)) {
+            RefPtr<ServiceWorkerContainer> swc = self->Navigator()->ServiceWorker();
+            swc->ControllerChanged(IgnoreErrors());
+          }
+        }));
+      }
     }
 
     // We also have to handle the case where te initial about:blank is

Edit: Added script runner to avoid assertion failures from here

Maybe the ServiceWorkerContainer doesn't cope well with window reuse in general. service-workers/service-worker/postmessage-to-client-message-queue.https.html is also failing and I hoped it would be fixed by resolving this bug. But that doesn't seem to be the case, rather I guess ServiceWorkerContainer::mMessagesStarted is set to true by the initial about:blank load and never cleared for consecutive loads.

(Back from PTO now.)

In general, yeah, ServiceWorkerContainer definitely did not expect window reuse. (And an awkwardness of the logic that was about:blank aware was that at the time of development there was no unified spec and Gecko and other browsers diverged.)

So I understand what is probably breaking things to be:

My immediate thoughts/questions are:

  • Is the spec saying that we should be enabling the client message queue in the initial about:blank case? If it's not, our implementation could perhaps just change the call there to not call startMessages for initial about:blank.
  • If the spec is saying we should be doing that for initial about:blank, maybe we could change the spec so it doesn't?

We probably want to discuss this as a spec issue regardless, but if you've traced uses of "The end" to know the answers above, that will be helpful as a starting point, but I can also look into it.

Flags: needinfo?(bugmail)
Attached file sw.js

In general, I believe the spec isn't completely accurate on the handling of the initial about:blank. I understand the spec such that we should enable the client message queue for about:blank. But from testing in chrome, they seem to not do that and even ignore startMessages for about:blank (see attached files). Though I don't see this reflected in their code.

Thanks for linking to the spec. I'm trying to familiarize me with the service worker parts.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #10)

  • When we later go to navigate, step 6 of Shared document creation infrastructure will have us reuse the window for the new Document with spec text "If browsingContext's active document's is initial about:blank is true, and browsingContext's active document's origin is same origin-domain with navigationParams's origin, then set window to browsingContext's active window." and with the very helpful info box:

    This means that both the initial about:blank Document, and the new Document that is about to be created, will share the same Window object.

I just found https://github.com/whatwg/html/issues/3267. There is interest in having the spec not reuse the window.

Attached file test_swc_reuse.html

I was a bit confused by window reuse. It turns out WebKit and Blink do reuse the inner window, as the spec mandates. But unlike the spec, they seem to reset the Navigator (see attached test case, wpt).

I did verify this in WebKit's code: LocalDomWindow::didSecureTransitionTo clears m_navigator. It is called from Document::takeDOMWindowFrom link, which is called in DocumentWriter::begin if shouldReuseDefaultView link. I didn't manage to find similar code in Blink, but at least my test shows a different navigator after navigating away from about:blank.

I'm not completely confident in my test cases. I also wonder why startMessages didn't seem to work on about:blank in Chrome.

But at least, resetting the navigator lets ServiceWorkerContainer avoid dealing with window reuse. That does resolve the two test failures I saw with the sync-about-blank patch.

Regressions: 1968515
No longer regressions: 1968515
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: