Closed Bug 1747261 Opened 2 years ago Closed 2 years ago

Turn on ServiceWorker site isolation for all ServiceWorkers

Categories

(Core :: DOM: Content Processes, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(1 file)

We landed ServiceWorker isolation with an Allow-list. This is for removing the Allow-list and enabling isolation for all ServiceWorkers. We could enable it for a subset (say just Fetch ServiceWorkers), however telemetry tells us that of running ServiceWorkers, >95% of them are handling Fetch. (Of registered ServiceWorkers, only about 2/3 are Fetch.)

This will cost us perhaps 15MB per running SW. Telemetry shows us that 83% of the time we have none running, 9% 1 running, 4% 2 running, and only ~0.5% of the time do we have more than 4 running.

This should help performance (significantly) for many or most sites using ServiceWorkers, especially for Fetch.

Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Attachment #9256543 - Attachment description: WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains → Bug 1747261: enable ServiceWorker process-isolation for all domains
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/autoland/rev/3b51e6804554
enable ServiceWorker process-isolation for all domains r=dom-storage-reviewers,edenchuang

Backed out changeset 3b51e6804554 (Bug 1747261) for causing xpcshell failures on test_E10SUtils_workers_remote_types.js.
Backout link
Push with failures
Failure Log

Flags: needinfo?(rjesup)
Attachment #9256543 - Attachment description: Bug 1747261: enable ServiceWorker process-isolation for all domains → WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains
Flags: needinfo?(rjesup)
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/autoland/rev/76ca93f4a831
WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains r=dom-storage-reviewers,edenchuang,jstutte

Backed out for causing mochitest failures on browser_serviceworker_fetch_new_process.js

Flags: needinfo?(rjesup)
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/autoland/rev/1bcc96832309
WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains r=dom-storage-reviewers,edenchuang,jstutte
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch

Backed out for causing mochitest-devtools failures on browser_dbg-windowless-service-workers.js.

Push with failures

Failure log

Backout link

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 97 Branch → ---

This was brought up during a DevTools meeting. I think the problem is that we can not setup the "service worker pausing" mechanism early enough here.

Previously, the service worker would spawn in an existing content process. Not necessarily the same as the page it controls, but the important bit is that it was an already existing one. With ServiceWorker site isolation, it seems more likely that a new process will be spawned dedicated for the worker.

Early breakpoints for windowless serviceworker debugging work with a WorkerPauser which is created in each and every process. This WorkerPauser listens to the WorkerDebuggerManager, and its role will be to prevent the ServiceWorker from executing any script until the DevTools thread actor is fully ready in the Worker's thread. This way breakpoints are ready when the worker's script starts.

But for a new process, it takes some time to setup everything:

Any serviceworker which was created before all those async steps were done will not break on any early breakpoint. The only point where we can freeze the WorkerDebugger execution is before it is registered in the Manager (see WorkerPrivate.cpp).

Here, when we call pauseMatchingServiceWorkers, the WorkerDebugger was already registered in the WorkerDebuggerManager, so we can no longer freeze it. Then the worker script starts while we are creating the worker target, attaching the thread actor, setting up breakpoints. When all that is done the script has already executed the line on which we had a breakpoint.

tldr; This is a complicated variant of the early breakpoint issue we fixed for regular webpages.

Or rather it's an "early breakpoint in new service worker in new process" issue. I don't think we could easily fix it as it requires to handle service worker targets in the same way as window global targets, and move everything to the server, closer to the creation of the platform objects we try to debug.

I suggest to disable the subtest #4 from browser_dbg-windowless-service-workers. Maybe other peers in DevTools team will have a better idea.

Thanks! In fact, in almost all cases it will create a new process for the serviceworker, unless a serviceworker process for that site is already running. The only way to avoid this startup race that I immediately see would be to somehow ensure the process for that site was already started before we register/start the serviceworker we want to breakpoint.

Flags: needinfo?(rjesup)

(In reply to Julian Descottes [:jdescottes] from comment #9)

I suggest to disable the subtest #4 from browser_dbg-windowless-service-workers. Maybe other peers in DevTools team will have a better idea.

No better idea on my side.

With this patch you are only increasing the chances of missing early breakpoints for service workers in this disabled-by-default feature.
Disabled because of its bad performance, but also because of this known weakness.

Disabling this subtest #4 sounds like a good approach.
Or the whole test if that makes some subsequent test fail.
Or somehow force the creation of the SW process before running this test?

Blocks: 1749341
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/autoland/rev/0ba397633624
WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains r=jdescottes
Regressions: 1749569
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

== Change summary for alert #32933 (as of Thu, 13 Jan 2022 06:15:07 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
11% twitter ContentfulSpeedIndex windows10-64-shippable-qr fission warm webrender 952.67 -> 849.08
11% twitter SpeedIndex windows10-64-shippable-qr fission warm webrender 963.25 -> 858.75
8% twitter loadtime windows10-64-shippable-qr fission warm webrender 575.60 -> 528.21
8% twitter LastVisualChange windows10-64-shippable-qr fission warm webrender 1,480.71 -> 1,367.00
8% twitter PerceptualSpeedIndex windows10-64-shippable-qr fission warm webrender 590.25 -> 545.92

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=32933

Backed out as requested by sparky in Bug 1749569.

Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
Target Milestone: 98 Branch → ---

(In reply to Cristian Tuns from comment #15)

Backed out as requested by sparky in Bug 1749569.

== Change summary for alert #32992 (as of Wed, 19 Jan 2022 14:26:35 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
15% twitter SpeedIndex windows10-64-shippable-qr fission warm webrender 860.40 -> 990.83
15% twitter ContentfulSpeedIndex windows10-64-shippable-qr fission warm webrender 848.60 -> 976.83
11% twitter loadtime windows10-64-shippable-qr fission warm webrender 524.53 -> 580.58
10% twitter PerceptualSpeedIndex windows10-64-shippable-qr fission warm webrender 546.27 -> 600.42
10% twitter LastVisualChange windows10-64-shippable-qr fission warm webrender 1,371.20 -> 1,502.83
7% pinterest loadtime linux1804-64-shippable-qr fission warm webrender 1,496.67 -> 1,599.21
5% outlook fcp windows10-64-shippable-qr fission warm webrender 203.50 -> 213.46
4% outlook fnbpaint windows10-64-shippable-qr fission warm webrender 207.17 -> 215.75
3% pinterest PerceptualSpeedIndex windows10-64-shippable-qr fission warm webrender 1,787.67 -> 1,841.42
2% pinterest dcf windows10-64-shippable-qr fission warm webrender 1,217.41 -> 1,243.04

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
17% reddit FirstVisualChange linux1804-64-shippable-qr fission warm webrender 341.67 -> 283.33
14% imgur fnbpaint macosx1015-64-shippable-qr fission warm webrender 71.62 -> 61.62
14% reddit fcp linux1804-64-shippable-qr fission warm webrender 354.96 -> 305.67
14% reddit fnbpaint linux1804-64-shippable-qr fission warm webrender 369.00 -> 318.67
13% reddit SpeedIndex linux1804-64-shippable-qr fission warm webrender 453.67 -> 396.42
... ... ... ... ...
9% pinterest SpeedIndex macosx1015-64-shippable-qr fission warm webrender 856.46 -> 782.75

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=32992

I'm going to re-land this patch; the issue with office performance tests failing doesn't appear to be directly a result of this patch - serviceworkers are failing to install for office, regardless of my patch being in or not, and the failure with my patch in appears to be some type of race condition in the page when run with mitmproxy. For example, just turning on the profiler "fixes" office with my patch.

Re-landing will cause an improvement in twitter (and maybe pinterest) warmload performance, and a regression in reddit warmload performance (which I'm investigating).

The performance test team is prioritizing re-recording office, and we'll work with that recording to ensure the serviceworkers are installing correctly. In the meantime we'll disable office

Flags: needinfo?(rjesup)
Attachment #9256543 - Attachment description: WIP: Bug 1747261: enable ServiceWorker process-isolation for all domains → Bug 1747261: enable ServiceWorker process-isolation for all domains
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/autoland/rev/51777ba56635
enable ServiceWorker process-isolation for all domains r=jdescottes
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

I assume bug 1752858 comment 4 is still the state of knowledge here.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: