Closed Bug 1568597 Opened 5 years ago Closed 4 years ago

Change ServiceWorkerPrivate process selection to be fission-aware

Categories

(Core :: DOM: Service Workers, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
82 Branch
Fission Milestone M6b
Tracking Status
firefox82 --- fixed

People

(Reporter: asuth, Assigned: rpl)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files)

Bug 1231213 chooses processes randomly from the "web" remoteType pool. When Fission is enabled it is appropriate to choose from the appropriate fission origin pool for the origin and we should do that.

Fission Milestone: --- → M5
Assignee: nobody → perry

Perry, is this bug still relevant? Do you plan to work on it soon?

Deferring to Fission Nightly (M6)

Flags: needinfo?(perry)

Moving P2 M5 bugs to M5b milestone

Fission Milestone: M5 → M5b

(In reply to Chris Peterson [:cpeterson] from comment #1)

Deferring to Fission Nightly (M6)

Oops. I forgot to actually change the Fission Milestone field to M6 at that time.

Is this bug still an issue?

Fission Milestone: M5b → M6

Hi Andrew, Perry, this is our last bug on the M6 Fission list, it seems. If it is still an issue, we should prioritize it.

Flags: needinfo?(bugmail)
Fission Milestone: M6 → M6b
Depends on: 1642676

Hi Perry,
I had to investigate some test failure in fission-enabled jobs triggered by pushing Bug 1642676's patches to autoland, and making RemoteWorkerManager::GetRemoteType and RemoteWorkerManager::MatchRemoteType (both added in Bug 1642676's D61708) fission-aware was one of the way to fix those failure (they were triggered because of the additional check that RemoteWorkerChild is doing to verify that the child process remoteType does match the rremote worker's emoteType) and so I ended up to take a look also to that.

The attached patch contains some small changes on top of Bug 1642676's D61708 and it does select a child process with a "webIsolated=[siteOrigin]" remoteType when fission is enabled (and it does pass the fission-enabled tests that were failing in the previous version of D61708, the updated version of that patch is now passing because Andrew and I agreed to relax the RemoteWorkerChild check and leave to this bug a more complete fission-aware process selection).

It's not that I wanted to steal this bug from you, I just agreed with Andrew to attach the patch here in case it may be useful as a start to complete the fission-aware process selection (e.g. I didn't checked explicitly yet that when RemoteWorkerManager does launch a new process with a fission remoteType then everything works correctly from a fission point of view).

I'll be happy to chat (or meet) with you next week if you think that having a brief chat about this patch may be useful to complete this bug.

No worries if you want to take the bug. The logic to make process selection account for fission remote types looks like it will work. And yeah, I suppose we should test to verify that the right fission process is chosen.

Flags: needinfo?(perry)

Luca is working on this, so changing the assignee.

Assignee: perry → lgreco
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
Attachment #9159765 - Attachment description: Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. → Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=perry,asuth!,nika!
Attachment #9159765 - Attachment description: Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=perry,asuth!,nika! → Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth!,nika!

I did notice this issue while investigating a test failure on
browser/components/contextualidentity/test/browser/browser_serviceworkers.js

The issue seems to be triggered when we are using a preallocated child process
for a call to RemoteWorkerManager::LaunchNewContentProcess, when that happens
we do expect that the new process is going to call RemoteWorkerManager::RegisterActor
once its RemoteWorkerService is being initialized in the new child process,
but when we are reusing a preallocated child process the RemoteWorkerService
was already initialized and RegisterActor was already called while the
remoteType for the child process was still "prealloc".

This patch fix the failure by deferring initializing RemoteWorkerService in
child processes to when we do receive a non "prealloc" remoteType in
ContentChild::RecvRemoteType.

Depends on D81373

Attachment #9169226 - Attachment description: Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. → Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r?asuth!

Hi Luca, now that we are on nightly 82: can we land this? Thank you!

Flags: needinfo?(lgreco)

(In reply to Jens Stutte [:jstutte] (REO for FF 81) from comment #11)

Hi Luca, now that we are on nightly 82: can we land this? Thank you!

Absolutely, yesterday I rebased the 3 patches attached to this issue and pushed one last time on try to double-check that there are no new failures triggered after the rebase:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef16e339a0d53230a9b6304e5e4f4f64a50827c6

I'm going to push them to autoland today.

Flags: needinfo?(lgreco)
Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/0c69ed659f23 Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r=asuth https://hg.mozilla.org/integration/autoland/rev/ed846f54fe7d Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/07523f6e8341 Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r=asuth,nika

(In reply to Razvan Maries from comment #14)

Backed out for perma failrues.

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cweb%2Cplatform%2Ctests%2Cwith%2Cfission%2Cenabled%2Ctest-linux1804-64-qr%2Fopt-web-platform-tests-fis-e10s%2Cwpt9&revision=07523f6e83410bf3ae73fc69184ca9ebd28139d7&selectedTaskRun=ULTvOqNYQNGsL8iySz1_iA.0

Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313982937&repo=autoland&lineNumber=67643

Backout: https://hg.mozilla.org/integration/autoland/rev/2ee42ae444dcfa828cac0420c03a5bf2627c86da

Well... the issue behind this wpt test failure (triggered while running with fission enabled) was definitely an interesting one.

I did spent some time investigating that failure yesterday, I was already convinced that the underlying issue was very likely some kind of race
(as additional hint about that was the fact that locally I wasn't able to trigger it in a debug build, but I was able to trigger it consistently enough in an optimized build) but I wanted to see clearly why making the remote workers fully fission aware did start to trigger it, what difference did it make in practice.

The wpt test that did fail is testing the creation of 4 shared workers, the first 2 are expected to be allowed (and their worker principal is going to be secure https://... one) and the last 2 are expected to be disallowed (and their worker principal is going to be an insecure http://... one).

When fission is enabled (and the patches attached to this issue applied):

  • for the first 2, we select the child process with remoteType "webIsolated=https://...", which is the same where the tab from which we do register the SharedWorker and so it is going to be kept alive also by the tab and it is not going to shutdown until we also close the tab

  • for the last 2, we select the child process with remoteType "webIsolated=http://...", which does not exist yet and so:

    • the first of the last 2 shared worker is going to let us to pick one of the child process with remoteType "prealloc", change it to remoteType "webIsolated=http://..." and spawn the first remote worker in this child process
    • for the second of the last 2 shared worker, if the shared worker is being registered fast enough, we will find that we do have already available a RemoteWorkerService actor from a child process for the remoteType "webIsolated=http://..." and so RemoteWorkerManager::SelectTargetActorForSharedWorker will select that one, but that child process isn't kept alive also by a tab as the secure "https://..." one and only the remote worker spawned by the previous test case is keeping it alive and so there is a chance that we may be shutdown that process between the time when we selected it and the time when we are able to spawn the new remote worker into it

Turned out that we already had a similar issue even before fission for the service workers (which could already been spawned into a new child process without any tab keeping that child process alive) and it is described in a very nice detailed inline comment here:

The new patch attached in comment 15 is basically using the same strategy that we are already using in RemoteWorkerManager::SelectTargetActorForServiceWorker as part of the shared worker target actor selection implemented in RemoteWorkerManager::SelectTargetActorForSharedWorker.

With this fix applied the wpt test did finally pass consistently on a local opt build (also while running with --verify), I did push it on try to double-check that it is also going to pass on the build infrastructure:

Flags: needinfo?(lgreco)
Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/8381d47bee21 Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r=asuth https://hg.mozilla.org/integration/autoland/rev/e3b29d1ae141 Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/fbaf90b7996a Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/4ee63a7a01c5 RemoteWorkerManager::SelectTargetActorForSharedWorker should select an actor that is kept alive. r=asuth
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: