Change ServiceWorkerPrivate process selection to be fission-aware
Categories
(Core :: DOM: Service Workers, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox82 | --- | fixed |
People
(Reporter: asuth, Assigned: rpl)
References
(Blocks 2 open bugs)
Details
Attachments
(4 files)
Bug 1231213 chooses processes randomly from the "web" remoteType pool. When Fission is enabled it is appropriate to choose from the appropriate fission origin pool for the origin and we should do that.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Perry, is this bug still relevant? Do you plan to work on it soon?
Deferring to Fission Nightly (M6)
Comment 3•5 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #1)
Deferring to Fission Nightly (M6)
Oops. I forgot to actually change the Fission Milestone field to M6 at that time.
Is this bug still an issue?
Comment 4•5 years ago
|
||
Hi Andrew, Perry, this is our last bug on the M6 Fission list, it seems. If it is still an issue, we should prioritize it.
Updated•5 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
Assignee | ||
Comment 6•4 years ago
|
||
Hi Perry,
I had to investigate some test failure in fission-enabled jobs triggered by pushing Bug 1642676's patches to autoland, and making RemoteWorkerManager::GetRemoteType and RemoteWorkerManager::MatchRemoteType (both added in Bug 1642676's D61708) fission-aware was one of the way to fix those failure (they were triggered because of the additional check that RemoteWorkerChild is doing to verify that the child process remoteType does match the rremote worker's emoteType) and so I ended up to take a look also to that.
The attached patch contains some small changes on top of Bug 1642676's D61708 and it does select a child process with a "webIsolated=[siteOrigin]" remoteType when fission is enabled (and it does pass the fission-enabled tests that were failing in the previous version of D61708, the updated version of that patch is now passing because Andrew and I agreed to relax the RemoteWorkerChild check and leave to this bug a more complete fission-aware process selection).
It's not that I wanted to steal this bug from you, I just agreed with Andrew to attach the patch here in case it may be useful as a start to complete the fission-aware process selection (e.g. I didn't checked explicitly yet that when RemoteWorkerManager does launch a new process with a fission remoteType then everything works correctly from a fission point of view).
I'll be happy to chat (or meet) with you next week if you think that having a brief chat about this patch may be useful to complete this bug.
Comment 7•4 years ago
|
||
No worries if you want to take the bug. The logic to make process selection account for fission remote types looks like it will work. And yeah, I suppose we should test to verify that the right fission process is chosen.
Comment 8•4 years ago
|
||
Luca is working on this, so changing the assignee.
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
I did notice this issue while investigating a test failure on
browser/components/contextualidentity/test/browser/browser_serviceworkers.js
The issue seems to be triggered when we are using a preallocated child process
for a call to RemoteWorkerManager::LaunchNewContentProcess, when that happens
we do expect that the new process is going to call RemoteWorkerManager::RegisterActor
once its RemoteWorkerService is being initialized in the new child process,
but when we are reusing a preallocated child process the RemoteWorkerService
was already initialized and RegisterActor was already called while the
remoteType for the child process was still "prealloc".
This patch fix the failure by deferring initializing RemoteWorkerService in
child processes to when we do receive a non "prealloc" remoteType in
ContentChild::RecvRemoteType.
Depends on D81373
Assignee | ||
Comment 10•4 years ago
|
||
Updated•4 years ago
|
Comment 11•4 years ago
|
||
Hi Luca, now that we are on nightly 82: can we land this? Thank you!
Assignee | ||
Comment 12•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] (REO for FF 81) from comment #11)
Hi Luca, now that we are on nightly 82: can we land this? Thank you!
Absolutely, yesterday I rebased the 3 patches attached to this issue and pushed one last time on try to double-check that there are no new failures triggered after the rebase:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef16e339a0d53230a9b6304e5e4f4f64a50827c6
I'm going to push them to autoland today.
Comment 13•4 years ago
|
||
Comment 14•4 years ago
|
||
Backed out for perma failrues.
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313982937&repo=autoland&lineNumber=67643
Backout: https://hg.mozilla.org/integration/autoland/rev/2ee42ae444dcfa828cac0420c03a5bf2627c86da
Assignee | ||
Comment 15•4 years ago
|
||
Depends on D86590
Assignee | ||
Comment 16•4 years ago
•
|
||
(In reply to Razvan Maries from comment #14)
Backed out for perma failrues.
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313982937&repo=autoland&lineNumber=67643
Backout: https://hg.mozilla.org/integration/autoland/rev/2ee42ae444dcfa828cac0420c03a5bf2627c86da
Well... the issue behind this wpt test failure (triggered while running with fission enabled) was definitely an interesting one.
I did spent some time investigating that failure yesterday, I was already convinced that the underlying issue was very likely some kind of race
(as additional hint about that was the fact that locally I wasn't able to trigger it in a debug build, but I was able to trigger it consistently enough in an optimized build) but I wanted to see clearly why making the remote workers fully fission aware did start to trigger it, what difference did it make in practice.
The wpt test that did fail is testing the creation of 4 shared workers, the first 2 are expected to be allowed (and their worker principal is going to be secure https://...
one) and the last 2 are expected to be disallowed (and their worker principal is going to be an insecure http://...
one).
When fission is enabled (and the patches attached to this issue applied):
-
for the first 2, we select the child process with remoteType
"webIsolated=https://..."
, which is the same where the tab from which we do register the SharedWorker and so it is going to be kept alive also by the tab and it is not going to shutdown until we also close the tab -
for the last 2, we select the child process with remoteType
"webIsolated=http://..."
, which does not exist yet and so:- the first of the last 2 shared worker is going to let us to pick one of the child process with remoteType
"prealloc"
, change it to remoteType"webIsolated=http://..."
and spawn the first remote worker in this child process - for the second of the last 2 shared worker, if the shared worker is being registered fast enough, we will find that we do have already available a RemoteWorkerService actor from a child process for the remoteType
"webIsolated=http://..."
and soRemoteWorkerManager::SelectTargetActorForSharedWorker
will select that one, but that child process isn't kept alive also by a tab as the secure "https://..." one and only the remote worker spawned by the previous test case is keeping it alive and so there is a chance that we may be shutdown that process between the time when we selected it and the time when we are able to spawn the new remote worker into it
- the first of the last 2 shared worker is going to let us to pick one of the child process with remoteType
Turned out that we already had a similar issue even before fission for the service workers (which could already been spawned into a new child process without any tab keeping that child process alive) and it is described in a very nice detailed inline comment here:
The new patch attached in comment 15 is basically using the same strategy that we are already using in RemoteWorkerManager::SelectTargetActorForServiceWorker
as part of the shared worker target actor selection implemented in RemoteWorkerManager::SelectTargetActorForSharedWorker
.
With this fix applied the wpt test did finally pass consistently on a local opt build (also while running with --verify), I did push it on try to double-check that it is also going to pass on the build infrastructure:
Comment 17•4 years ago
|
||
Comment 18•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8381d47bee21
https://hg.mozilla.org/mozilla-central/rev/e3b29d1ae141
https://hg.mozilla.org/mozilla-central/rev/fbaf90b7996a
https://hg.mozilla.org/mozilla-central/rev/4ee63a7a01c5
Description
•