Closed Bug 1568597 Opened 5 years ago Closed 4 years ago

Change ServiceWorkerPrivate process selection to be fission-aware

Tracking

()

Status:

RESOLVED FIXED

Milestone:

82 Branch

Project Flags:

Fission Milestone

M6b

Tracking Flags:

Tracking

Status

firefox82

---

fixed

People

(Reporter: asuth, Assigned: rpl)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files)

Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth!,nika! 4 years ago Luca Greco [:rpl] [:luca] [:lgreco] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r?asuth! 4 years ago Luca Greco [:rpl] [:luca] [:lgreco] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1568597 - Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r?asuth! 4 years ago Luca Greco [:rpl] [:luca] [:lgreco] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1568597 - RemoteWorkerManager::SelectTargetActorForSharedWorker should select an actor that is kept alive. r?asuth! 4 years ago Luca Greco [:rpl] [:luca] [:lgreco] 47 bytes, text/x-phabricator-request		Details \| Review

Andrew Sutherland [:asuth] (he/him)

Reporter

Description

•

5 years ago

Bug 1231213 chooses processes randomly from the "web" remoteType pool. When Fission is enabled it is appropriate to choose from the appropriate fission origin pool for the origin and we should do that.

Nika Layzell [:nika] (ni? for response)

Updated

•

5 years ago

Fission Milestone: --- → M5

Perry Jiang [:perry] [no longer employee, use ni?]

Updated

•

5 years ago

Assignee: nobody → perry

Chris Peterson [:cpeterson]

Comment 1

•

5 years ago

Perry, is this bug still relevant? Do you plan to work on it soon?

Deferring to Fission Nightly (M6)

Flags: needinfo?(perry)

Chris Peterson [:cpeterson]

Comment 2

•

5 years ago

Moving P2 M5 bugs to M5b milestone

Fission Milestone: M5 → M5b

Chris Peterson [:cpeterson]

Comment 3

•

5 years ago

(In reply to Chris Peterson [:cpeterson] from comment #1)

Deferring to Fission Nightly (M6)

Oops. I forgot to actually change the Fission Milestone field to M6 at that time.

Is this bug still an issue?

Fission Milestone: M5b → M6

Jens Stutte [:jstutte]

Comment 4

•

5 years ago

Hi Andrew, Perry, this is our last bug on the M6 Fission list, it seems. If it is still an issue, we should prioritize it.

Flags: needinfo?(bugmail)

Nika Layzell [:nika] (ni? for response)

Updated

•

5 years ago

Fission Milestone: M6 → M6b

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 5

•

4 years ago

Attached file Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth!,nika! — Details

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Updated

•

4 years ago

Depends on: 1642676

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 6

•

4 years ago

Hi Perry,
I had to investigate some test failure in fission-enabled jobs triggered by pushing Bug 1642676's patches to autoland, and making RemoteWorkerManager::GetRemoteType and RemoteWorkerManager::MatchRemoteType (both added in Bug 1642676's D61708) fission-aware was one of the way to fix those failure (they were triggered because of the additional check that RemoteWorkerChild is doing to verify that the child process remoteType does match the rremote worker's emoteType) and so I ended up to take a look also to that.

The attached patch contains some small changes on top of Bug 1642676's D61708 and it does select a child process with a "webIsolated=[siteOrigin]" remoteType when fission is enabled (and it does pass the fission-enabled tests that were failing in the previous version of D61708, the updated version of that patch is now passing because Andrew and I agreed to relax the RemoteWorkerChild check and leave to this bug a more complete fission-aware process selection).

It's not that I wanted to steal this bug from you, I just agreed with Andrew to attach the patch here in case it may be useful as a start to complete the fission-aware process selection (e.g. I didn't checked explicitly yet that when RemoteWorkerManager does launch a new process with a fission remoteType then everything works correctly from a fission point of view).

I'll be happy to chat (or meet) with you next week if you think that having a brief chat about this patch may be useful to complete this bug.

Perry Jiang [:perry] [no longer employee, use ni?]

Comment 7

•

4 years ago

No worries if you want to take the bug. The logic to make process selection account for fission remote types looks like it will work. And yeah, I suppose we should test to verify that the right fission process is chosen.

Flags: needinfo?(perry)

Neha Kochar [:neha]

Comment 8

•

4 years ago

Luca is working on this, so changing the assignee.

Assignee: perry → lgreco

Status: NEW → ASSIGNED

Flags: needinfo?(bugmail)

Phabricator Automation

Updated

•

4 years ago

Attachment #9159765 - Attachment description: Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. → Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=perry,asuth!,nika!

Phabricator Automation

Updated

•

4 years ago

Attachment #9159765 - Attachment description: Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=perry,asuth!,nika! → Bug 1568597 - Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth!,nika!

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 9

•

4 years ago

Attached file Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r?asuth! — Details

I did notice this issue while investigating a test failure on
browser/components/contextualidentity/test/browser/browser_serviceworkers.js

The issue seems to be triggered when we are using a preallocated child process
for a call to RemoteWorkerManager::LaunchNewContentProcess, when that happens
we do expect that the new process is going to call RemoteWorkerManager::RegisterActor
once its RemoteWorkerService is being initialized in the new child process,
but when we are reusing a preallocated child process the RemoteWorkerService
was already initialized and RegisterActor was already called while the
remoteType for the child process was still "prealloc".

This patch fix the failure by deferring initializing RemoteWorkerService in
child processes to when we do receive a non "prealloc" remoteType in
ContentChild::RecvRemoteType.

Depends on D81373

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 10

•

4 years ago

Attached file Bug 1568597 - Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r?asuth! — Details

Phabricator Automation

Updated

•

4 years ago

Attachment #9169226 - Attachment description: Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. → Bug 1568597 - Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r?asuth!

Jens Stutte [:jstutte]

Comment 11

•

4 years ago

Hi Luca, now that we are on nightly 82: can we land this? Thank you!

Flags: needinfo?(lgreco)

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 12

•

4 years ago

(In reply to Jens Stutte [:jstutte] (REO for FF 81) from comment #11)

Hi Luca, now that we are on nightly 82: can we land this? Thank you!

Absolutely, yesterday I rebased the 3 patches attached to this issue and pushed one last time on try to double-check that there are no new failures triggered after the rebase:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef16e339a0d53230a9b6304e5e4f4f64a50827c6

I'm going to push them to autoland today.

Flags: needinfo?(lgreco)

Pulsebot

Comment 13

•

4 years ago

Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/0c69ed659f23 Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r=asuth https://hg.mozilla.org/integration/autoland/rev/ed846f54fe7d Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/07523f6e8341 Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r=asuth,nika

Razvan Maries

Comment 14

•

4 years ago

Backed out for perma failrues.

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cweb%2Cplatform%2Ctests%2Cwith%2Cfission%2Cenabled%2Ctest-linux1804-64-qr%2Fopt-web-platform-tests-fis-e10s%2Cwpt9&revision=07523f6e83410bf3ae73fc69184ca9ebd28139d7&selectedTaskRun=ULTvOqNYQNGsL8iySz1_iA.0

Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313982937&repo=autoland&lineNumber=67643

Backout: https://hg.mozilla.org/integration/autoland/rev/2ee42ae444dcfa828cac0420c03a5bf2627c86da

Flags: needinfo?(lgreco)

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1568597 - RemoteWorkerManager::SelectTargetActorForSharedWorker should select an actor that is kept alive. r?asuth! — Details

Depends on D86590

Luca Greco [:rpl] [:luca] [:lgreco]

Assignee

Comment 16

•

4 years ago

•

Edited

(In reply to Razvan Maries from comment #14)

Backed out for perma failrues.

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cweb%2Cplatform%2Ctests%2Cwith%2Cfission%2Cenabled%2Ctest-linux1804-64-qr%2Fopt-web-platform-tests-fis-e10s%2Cwpt9&revision=07523f6e83410bf3ae73fc69184ca9ebd28139d7&selectedTaskRun=ULTvOqNYQNGsL8iySz1_iA.0

Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313982937&repo=autoland&lineNumber=67643

Backout: https://hg.mozilla.org/integration/autoland/rev/2ee42ae444dcfa828cac0420c03a5bf2627c86da

Well... the issue behind this wpt test failure (triggered while running with fission enabled) was definitely an interesting one.

I did spent some time investigating that failure yesterday, I was already convinced that the underlying issue was very likely some kind of race
(as additional hint about that was the fact that locally I wasn't able to trigger it in a debug build, but I was able to trigger it consistently enough in an optimized build) but I wanted to see clearly why making the remote workers fully fission aware did start to trigger it, what difference did it make in practice.

The wpt test that did fail is testing the creation of 4 shared workers, the first 2 are expected to be allowed (and their worker principal is going to be secure https://... one) and the last 2 are expected to be disallowed (and their worker principal is going to be an insecure http://... one).

When fission is enabled (and the patches attached to this issue applied):

for the first 2, we select the child process with remoteType "webIsolated=https://...", which is the same where the tab from which we do register the SharedWorker and so it is going to be kept alive also by the tab and it is not going to shutdown until we also close the tab
for the last 2, we select the child process with remoteType "webIsolated=http://...", which does not exist yet and so:
- the first of the last 2 shared worker is going to let us to pick one of the child process with remoteType "prealloc", change it to remoteType "webIsolated=http://..." and spawn the first remote worker in this child process
- for the second of the last 2 shared worker, if the shared worker is being registered fast enough, we will find that we do have already available a RemoteWorkerService actor from a child process for the remoteType "webIsolated=http://..." and so RemoteWorkerManager::SelectTargetActorForSharedWorker will select that one, but that child process isn't kept alive also by a tab as the secure "https://..." one and only the remote worker spawned by the previous test case is keeping it alive and so there is a chance that we may be shutdown that process between the time when we selected it and the time when we are able to spawn the new remote worker into it

Turned out that we already had a similar issue even before fission for the service workers (which could already been spawned into a new child process without any tab keeping that child process alive) and it is described in a very nice detailed inline comment here:

https://searchfox.org/mozilla-central/rev/27932d4e6ebd2f4b8519865dad864c72176e4e3b/dom/workers/remoteworkers/RemoteWorkerManager.cpp#411-423

The new patch attached in comment 15 is basically using the same strategy that we are already using in RemoteWorkerManager::SelectTargetActorForServiceWorker as part of the shared worker target actor selection implemented in RemoteWorkerManager::SelectTargetActorForSharedWorker.

With this fix applied the wpt test did finally pass consistently on a local opt build (also while running with --verify), I did push it on try to double-check that it is also going to pass on the build infrastructure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f89f2e3eec2408aeb22588bad4b73af556abc42

Flags: needinfo?(lgreco)

Pulsebot

Comment 17

•

4 years ago

Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/8381d47bee21 Allow extension principal workers to run in the main process if remote extensions are disabled by pref. r=asuth https://hg.mozilla.org/integration/autoland/rev/e3b29d1ae141 Make RemoteWorkerManager::GetRemoteType/MatchRemoteType fission-aware. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/fbaf90b7996a Defer RemoteWorkerService init to ContentChild::RecvRemoteType. r=asuth,nika https://hg.mozilla.org/integration/autoland/rev/4ee63a7a01c5 RemoteWorkerManager::SelectTargetActorForSharedWorker should select an actor that is kept alive. r=asuth

Cosmin Sabou [:CosminS]

Comment 18

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8381d47bee21
https://hg.mozilla.org/mozilla-central/rev/e3b29d1ae141
https://hg.mozilla.org/mozilla-central/rev/fbaf90b7996a
https://hg.mozilla.org/mozilla-central/rev/4ee63a7a01c5

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox82: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 82 Branch

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1661860

You need to log in before you can comment on or make changes to this bug.