Closed Bug 1438945 Opened 6 years ago Closed 6 years ago

SharedWorkers should be shared cross processes

Categories

(Core :: DOM: Workers, enhancement, P2)

58 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: baku, Assigned: baku)

References

(Blocks 1 open bug)

Details

Attachments

(19 files, 28 obsolete files)

4.90 KB, patch
asuth
: review+
Details | Diff | Splinter Review
29.17 KB, patch
Details | Diff | Splinter Review
45.12 KB, patch
Details | Diff | Splinter Review
14.49 KB, patch
Details | Diff | Splinter Review
15.60 KB, patch
Details | Diff | Splinter Review
4.56 KB, patch
asuth
: review+
Details | Diff | Splinter Review
11.43 KB, patch
asuth
: review+
Details | Diff | Splinter Review
8.76 KB, patch
asuth
: review-
mrbkap
: review+
Details | Diff | Splinter Review
1.70 KB, patch
asuth
: review+
Details | Diff | Splinter Review
83.75 KB, patch
asuth
: review-
Details | Diff | Splinter Review
8.41 KB, patch
asuth
: review+
mrbkap
: review+
Details | Diff | Splinter Review
47.58 KB, patch
asuth
: review+
Details | Diff | Splinter Review
99.23 KB, patch
asuth
: review+
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
We currently execute SharedWorkers in the process where they are created and they are shared only by other globals running on the same process. By spec, they should be shared with any global (with the right principal and SharedWorker name) also cross processes.

The aim is to have SharedWorkers running on a selected process, chosen because of a site isolation algorithm, or round robin, or other reasons.

In order to implement this, I want to proceed in 2 steps:

1. SharedWorkers and IPC (this bug): A centralized service, running in the parent process that allows the sharing of a SharedWorker cross processes. I'll describe it in more details, soon.
2. Remote Workers: a chunk of code able to dispatch a worker on any process and control it from the parent process. This is going to be done in a separate bug.

This bug is about the first point only. These patches are going to land together with the second point, but I prefer to keep the implementation separate.

Here the architecture of IPC SharedWorkers:

a. SharedWorker binding object can run on any process. It uses a PSharedWorker, PBackground protocol, to control the worker thread.

b. SharedWorkerChild/Parent are 2 PBackground actors.

c. SharedWorkerParent keeps alive a SharedWorkerService. This service knows any active worker thread and decides if a new one has to be created or if an existing one has to be shared.

d. Each worker thread is controlled by a SharedWorkerManager. Each SharedWorkerManager is registered into the ServiceWorkerService.

So far, SharedWorkerManager execs the Worker thread on the parent process, but, eventually, it will be able to remote it on a selected process.
I'm going to add several files. I prefer to keep it under dom/workers/sharedworkers. dom/sharedworkers seems a bit too much.
Attachment #8951707 - Flags: review?(bugs)
Attached patch part 2 - PSharedWorker protocol (obsolete) — Splinter Review
This is just the PSharedWorker protocol and the use of it in SharedWorker.
Attachment #8951708 - Flags: review?(bugs)
Here the execution of the Worker thread.
Attachment #8951709 - Flags: review?(bugs)
Attachment #8951710 - Flags: review?(bugs)
Attached patch part 5 - console test (obsolete) — Splinter Review
The console test fails because the console events are received on a different thread. Let's convert it to a browser test.
Attachment #8951840 - Flags: review?(bugs)
There are still a couple of failures related to CSP. I'll take a look on monday.
Attachment #8951710 - Attachment is obsolete: true
Attachment #8951710 - Flags: review?(bugs)
Attachment #8951956 - Flags: review?(bugs)
Priority: -- → P2
Attachment #8951707 - Flags: review?(bugs) → review?(bugmail)
Attachment #8951708 - Flags: review?(bugs) → review?(bugmail)
Attachment #8951709 - Flags: review?(bugs) → review?(bugmail)
Attachment #8951840 - Flags: review?(bugs) → review?(bugmail)
Attachment #8951956 - Flags: review?(bugs) → review?(bugmail)
Attachment #8951707 - Flags: review?(bugmail) → review+
Attachment #8951708 - Flags: review?(bugmail) → review+
Comment on attachment 8951709 [details] [diff] [review]
part 3 - SharedWorkerService and SharedWorkerManager

Review of attachment 8951709 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/sharedworkers/SharedWorkerChild.h
@@ +52,5 @@
> +  void
> +  ActorDestroy(ActorDestroyReason aWhy) override;
> +
> +  // Raw pointer because mParent is set to null when released.
> +  SharedWorker* mParent;

I think this should have a MOZ_NON_OWNING_REF annotation? (https://searchfox.org/mozilla-central/source/mfbt/Attributes.h#579)

::: dom/workers/sharedworkers/SharedWorkerManager.cpp
@@ +245,5 @@
> +      ++suspended;
> +    }
> +  }
> +
> +  if (suspended != mActors.Length()) {

The intent of this comparison (and its analog in UpdateFrozen) seems to be "only suspend/freeze if all owning windows are suspended/frozen".  This makes sense, but UpdateSuspend()/UpdateFrozen() are also invoked for thawing/resuming, and this early return logic breaks them completely because we never will hit the Resume/Thaw calls.

Maybe you want this to be something like the following, ditching this early return:
  bool shouldSuspend = (suspended == mActors.Length());
And then down below where you decide whether to invoke {Suspend,Resume}OnMainThread use shouldSuspend.  That way the first resume will resume the SharedWorker.

Assuming I'm right about that, I'd suggest adding a comment along those lines.  Also, a comment somewhere explaining how the removal of an owning binding does not/cannot result in a suspended/frozen SharedWorker becoming resumed/thawed, which is why neither RemoveActor() or its call-sites Close() and ActorDestroy() invoke UpdateSuspended/UpdateFrozen.

@@ +284,5 @@
> +  }
> +
> +  RefPtr<SharedWorkerManager> self = this;
> +  nsCOMPtr<nsIRunnable> r =
> +    NS_NewRunnableFunction("SharedWorkerManager::UpdateSuspend",

Affirming that this copy-paste error is fixed in part 4 (s/UpdateSuspend/UpdateFrozen).

::: dom/workers/sharedworkers/SharedWorkerService.cpp
@@ +19,5 @@
> +
> +StaticMutex sSharedWorkerMutex;
> +
> +// Raw pointer because SharedWorkerParent keeps this object alive.
> +SharedWorkerService* sSharedWorkerService;

I think this wants a MOZ_NON_OWNING_REF annotation too (https://searchfox.org/mozilla-central/source/mfbt/Attributes.h#579)

@@ +236,5 @@
> +      return;
> +    }
> +  }
> +
> +  manager->ThawOnMainThread();

This could use a short comment like: "If the SharedWorker(Manager) already existed and was frozen, the existence of a new, un-frozen actor should trigger the thawing of the SharedWorker."
Attachment #8951709 - Flags: review?(bugmail) → review+
Comment on attachment 8951956 [details] [diff] [review]
part 4 - errors and communications

Review of attachment 8951956 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/sharedworkers/SharedWorkerTypes.ipdlh
@@ +31,5 @@
>  
>    MessagePortIdentifier portIdentifier;
>  };
>  
> +struct ErrorDataNote {

I think these additions would benefit from some comments along the lines of:

ErrorData/ErrorDataNote correspond to WorkerErrorReport/WorkerErrorNote which in turn correspond to JSErrorReport/JSErrorNotes which allows JS to report complicated errors such as redeclarations that involve multiple distinct lines.  For more generic error-propagation IPC structures, see bug 1357463 on making ErrorResult usable over IPC.
Attachment #8951956 - Flags: review?(bugmail) → review+
Comment on attachment 8951840 [details] [diff] [review]
part 5 - console test

Review of attachment 8951840 [details] [diff] [review]:
-----------------------------------------------------------------

Restating: SpecialPowers.addObserver mechanism has limited support for proxying observer notifications from the parent to the child; in this case it just adds an observer in the current process.  By changing the test to a browser (chrome) test, the observer is added in the parent process.  And although SpecialPowers does have a special API family "registerConsoleListener" that registers via Services.console.registerListener() and a "console-api-log-event" observer, that observer is still just in the same process and does not handle "console-api-profiler".  As we straighten out the console logging in general, we can hopefully improve the specialpowers API.
Attachment #8951840 - Flags: review?(bugmail) → review+
I look forward to the RemoteWorker set of patches!  I may revisit these a little while reviewing those since there are obviously some loose ends here.
Attached patch csp.patch (obsolete) — Splinter Review
Attachment #8952777 - Flags: review?(ckerschb)
Comment on attachment 8952777 [details] [diff] [review]
csp.patch

Review of attachment 8952777 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Baku, this looks great. I know we have discussed that over IRC and I said we shouldn't worry about the aDeliveredViaMetaTag bit, but now I am getting second thoughts about that because I know this piece of infrastructure will be re-used. Can you please propagate the deliveredViaMetaTag information? All you have to do is, set it in the same places we set ReportOnly() [1]. Other than that, this looks really good. Thanks for adding that!

[1] https://dxr.mozilla.org/mozilla-central/source/dom/security/nsCSPUtils.h#671
Attachment #8952777 - Flags: review?(ckerschb) → review+
Attached patch part 6 - CSP via IPC (obsolete) — Splinter Review
I need to make the CSP optional in PrincipalToPrincipalInfo otherwise many tests complain.
Attachment #8952777 - Attachment is obsolete: true
Attachment #8953062 - Flags: review?(ckerschb)
(In reply to Andrea Marchesini [:baku] from comment #16)
> Created attachment 8953062 [details] [diff] [review]
> part 6 - CSP via IPC
> 
> I need to make the CSP optional in PrincipalToPrincipalInfo otherwise many
> tests complain.

Actually, this is a bit worrying.  We have had many bugs where CSP was accidentally propagated via the principal in the past.  Adding CSP to PrincipalInfo maybe open up more ways that can happen and we don't really have good tests to catch it.
(In reply to Ben Kelly [:bkelly] from comment #17)
> Actually, this is a bit worrying.  We have had many bugs where CSP was
> accidentally propagated via the principal in the past.  Adding CSP to
> PrincipalInfo maybe open up more ways that can happen and we don't really
> have good tests to catch it.

Maybe you are right Ben, I know you spent quite so much make CSP in combination with workers work correctly. Is there any alternative that we could consider?
Flags: needinfo?(bkelly)
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #18)
> (In reply to Ben Kelly [:bkelly] from comment #17)
> > Actually, this is a bit worrying.  We have had many bugs where CSP was
> > accidentally propagated via the principal in the past.  Adding CSP to
> > PrincipalInfo maybe open up more ways that can happen and we don't really
> > have good tests to catch it.
> 
> Maybe you are right Ben, I know you spent quite so much make CSP in
> combination with workers work correctly. Is there any alternative that we
> could consider?

If SharedWorker needs the CSP propagated, it could be done out-of-band with the PrincipalInfo for now.  Then once bug 965637 is fixed this mechanism could be aligned with the final solution.  I think that would be a bit safer in the short term.
Flags: needinfo?(bkelly)
I like this approach. Initially I didn't know porting CSP together with the principal was a good way to break everything.
And just because this feature seems to be needed just for SharedWorkers/RemoteWorkers, let's add something in this code, without touching PrincipalInfo.
Attached patch part 6 - CSP via IPC (obsolete) — Splinter Review
Attachment #8953062 - Attachment is obsolete: true
Attachment #8953062 - Flags: review?(ckerschb)
Attachment #8953153 - Flags: review?(ckerschb)
Comment on attachment 8953153 [details] [diff] [review]
part 6 - CSP via IPC

Review of attachment 8953153 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed, I like this approach more than having it on the principal, makes more sense. The whole serialization will become clearer soon anyway once we have fixed bug 965637.
Attachment #8953153 - Flags: review?(ckerschb) → review+
Attached patch part 7 - RemoteWorker IPC (obsolete) — Splinter Review
Not green on try yet. I just want to share what I have so far.
Attachment #8953404 - Flags: feedback?(bugmail)
Attachment #8953405 - Flags: feedback?(bugmail)
Attached patch part 9 - RemoteWorkerObserver (obsolete) — Splinter Review
Attachment #8953406 - Flags: feedback?(bugmail)
Attached patch part 7 - RemoteWorker IPC (obsolete) — Splinter Review
Attachment #8953404 - Attachment is obsolete: true
Attachment #8953404 - Flags: feedback?(bugmail)
Attachment #8953408 - Flags: feedback?(bugmail)
Attachment #8953409 - Attachment description: part 4 - No RemoteWorkerService in the parent process if e10s → part 10 - No RemoteWorkerService in the parent process if e10s
Brain, when this set of patches will land, SharedWorkers (and ServiceWorkers in the future) could be executed on a different content process. From a console point of view, it means that the current process will not be able to retrieve console events generated by that worker.

It is still possible to retrieve such Console events on the parent process, and I changed a couple of tests to enforce it.

I would like to know: do you think having Console events not available on the current process (but only on the selected process and on the parent process), could it be a problem for devtools web console?
Flags: needinfo?(bgrinstead)
(In reply to Andrea Marchesini [:baku] from comment #28)
> Brain, when this set of patches will land, SharedWorkers (and ServiceWorkers
> in the future) could be executed on a different content process. From a
> console point of view, it means that the current process will not be able to
> retrieve console events generated by that worker.

To confirm: this related to / basically the same issue as Bug 1429805 but for SharedWorkers instead of ServiceWorkers?

> It is still possible to retrieve such Console events on the parent process,
> and I changed a couple of tests to enforce it.
>
> I would like to know: do you think having Console events not available on
> the current process (but only on the selected process and on the parent
> process), could it be a problem for devtools web console?

Right now there's only one console actor when debugging a tab, and it runs in the content process. So this means you wouldn't see messages from the Shared Workers in the tab's toolbox (only in the Browser Console which has an actor running in the parent process).

Changing this to also gather messages from the parent process is probably possible, but I'm not sure if it'd be easy. We do have a connection going the other direction in the Browser Console, where we forward messages from the content process up to the parent: https://searchfox.org/mozilla-central/source/devtools/server/actors/webconsole/listeners.js#468 / https://searchfox.org/mozilla-central/source/devtools/server/actors/webconsole/content-process-forward.js#128). Going to needinfo Alex to see if this is something that could be easily extended the other direction.

Alternatively, we're starting to track work needed for site isolation in devtools in Bug 1439048. Part of that will be supporting multiple 'targets' for a single toolbox (i.e. the client can directly connect to a separate console actor for each process). I think with that design and associated frontend changes in the console, you'd be able to see (and fully inspect objects in) messages coming from another process / worker. I'm not sure of the timeline of that project, but it's going to be a lot of work.
Flags: needinfo?(bgrinstead) → needinfo?(poirot.alex)
(In reply to Brian Grinstead [:bgrins] from comment #29)
> We do have a connection going
> the other direction in the Browser Console, where we forward messages from
> the content process up to the parent:
> https://searchfox.org/mozilla-central/source/devtools/server/actors/
> webconsole/listeners.js#468 /
> https://searchfox.org/mozilla-central/source/devtools/server/actors/
> webconsole/content-process-forward.js#128). Going to needinfo Alex to see if
> this is something that could be easily extended the other direction.

This is a good example of what you would do with today's architecture.
But site-isolation also aims to stop/reduce the data transferred to content processes.
So it also looks like something we would like to stop doing.

> Alternatively, we're starting to track work needed for site isolation in
> devtools in Bug 1439048.

So it would be great to wait for things to settle on the site-isolation front
and see if we could setup a new architecture where:
* it will be easier to manage actors that spread across processes
* prevents piping information to content processes (from parent or other content processes)

This doesn't mean waiting for site-isolation as a whole, but just for the definition of this barebone abstraction.
This bug could be a good first application of it.

pinging :jryans as he is leading site-isolation in DevTools.
Flags: needinfo?(poirot.alex) → needinfo?(jryans)
Attached patch part 2 - PSharedWorker protocol (obsolete) — Splinter Review
Attachment #8951708 - Attachment is obsolete: true
Attachment #8951709 - Attachment is obsolete: true
Attachment #8951956 - Attachment is obsolete: true
Attached patch part 5 - test (obsolete) — Splinter Review
Attachment #8951840 - Attachment is obsolete: true
Attachment #8954477 - Flags: review?(bugmail)
Andrea, I was talking to :jryans earlier today and he mentioned they looking at nsILoadContext.topFrameElement:

https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/docshell/base/nsILoadContext.idl#55

I'm guessing that this was probably also lost in the remoting of SharedWorker?

We probably need a new solution anyway.  SharedWorker never really corresponded to a single nsILoadContext anyway.  I believe we effectively picked either the first or last window context that attached to SharedWorker to respond there in the past.  We really need some way to reflect an array of attached clients or maybe window IDs.
Flags: needinfo?(amarchesini)
(In reply to Ben Kelly [:bkelly] from comment #35)
> Andrea, I was talking to :jryans earlier today and he mentioned they looking
> at nsILoadContext.topFrameElement:
> 
> https://searchfox.org/mozilla-central/rev/
> 14d933246211b02f5be21d2e730a57cf087c6606/docshell/base/nsILoadContext.idl#55

Currently this is used when listening in the parent process for network requests.  For service workers today, we listen in the content process and check the request's `associatedWindow`[1] property instead.

We're happy to change any of this as needed, as long as we have some mechanism to filter requests to the relevant client UUIDs and / or window IDs where they should appear.

[1]: https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/docshell/base/nsILoadContext.idl#35
Like Alex and Brian have mentioned, this relates to Site Isolation work for DevTools.

At a high level, I think we should move to a world where the client is connecting to the various targets of interest that exist for a tab (frames, workers, etc.) in each of their respective processes to retrieve messages, listen for new messages, etc.

I am not yet sure exactly what that should look like yet, and I'd rather not craft something ad-hoc right here if possible...  What's your timeline for enabling this work and having shared workers in a different process?

I created bug 1441711 to track the multi-process issue for the Console specifically.
Flags: needinfo?(jryans)
Comment on attachment 8954477 [details] [diff] [review]
part 5 - test

Review of attachment 8954477 [details] [diff] [review]:
-----------------------------------------------------------------

The only change I see to the file based on interdiff is the thing I commented on below.  If I missed something more, please let me know.

::: dom/workers/test/test_WorkerDebugger.xul
@@ +68,5 @@
>          await promise;
>  
>          info("Create a shared worker and wait for its debugger to be " + 
>               "registered");
> +        promise = waitForRegister("chrome://mochitests/content/chrome/dom/workers/test/" + SHARED_WORKER_URL);

Did you really mean to make this change to the tests here and below in this file?  This looks like it might have been a transient regression in your implementation that you hacked around and have since fixed?  At least, as far as I can tell from your most recent try push, your implementation passes SharedWorker::Constructor(...aScriptURL...)'s aScriptURL through as RemoteWorkerData's originalScriptURL and that gets propagated all the way through to RemoteWorkerChild::ExecWorkerOnMainThread which passes that as WorkerPrivate::Constructor's aScriptURL.  (And that becomes WorkerPrivate::mScriptURL which WorkerDebugger::GetUrl returns, which is what dom_worker_helper.js's waitForRegister is checking.)

I'm going to r+ because I'm okay with the hack, but I'd appreciate if you could explain where the normalization is creeping in.
Attachment #8954477 - Flags: review?(bugmail) → review+
I know that part 7 already exists, but I have to move them all. This patch goes after part 6 to make SharedWorker completely green on try.
Flags: needinfo?(amarchesini)
Attachment #8954865 - Flags: review?(bugmail)
This patch also execs the RemoteWorker on the current process if this is not the parent one.
Attachment #8953409 - Attachment is obsolete: true
Attachment #8953409 - Flags: feedback?(bugmail)
Attachment #8954889 - Flags: review?(bugmail)
Attachment #8955040 - Flags: feedback?(bugmail)
Just feedback request because there are details to adjust. These set of patches should be ready soon.
Attachment #8955042 - Flags: feedback?(bugmail)
Attachment #8954474 - Attachment is obsolete: true
Attachment #8954475 - Attachment is obsolete: true
Attachment #8954476 - Attachment is obsolete: true
Attached patch part 5 - testSplinter Review
Attachment #8954477 - Attachment is obsolete: true
Attachment #8953153 - Attachment is obsolete: true
Attachment #8954865 - Attachment is obsolete: true
Attachment #8954865 - Flags: review?(bugmail)
Attached patch part 8 - RemoteWorker IPC (obsolete) — Splinter Review
Attachment #8953405 - Attachment is obsolete: true
Attachment #8953406 - Attachment is obsolete: true
Attachment #8954889 - Attachment is obsolete: true
Attachment #8955040 - Attachment is obsolete: true
Attachment #8955042 - Attachment is obsolete: true
Attachment #8953405 - Flags: feedback?(bugmail)
Attachment #8953406 - Flags: feedback?(bugmail)
Attachment #8954889 - Flags: review?(bugmail)
Attachment #8955040 - Flags: feedback?(bugmail)
Attachment #8955042 - Flags: feedback?(bugmail)
Attachment #8958454 - Flags: review?(bugmail)
Attachment #8958456 - Flags: review?(bugmail)
Attachment #8958457 - Flags: review?(bugmail)
Attachment #8958461 - Flags: review?(bugmail)
Attachment #8953408 - Flags: feedback?(bugmail) → review?(bugmail)
Attachment #8953408 - Attachment is obsolete: true
Attachment #8953408 - Flags: review?(bugmail)
Attachment #8958456 - Attachment is obsolete: true
Attachment #8958456 - Flags: review?(bugmail)
Attachment #8963220 - Flags: review?(bugmail)
Attachment #8963220 - Attachment is obsolete: true
Attachment #8963220 - Flags: review?(bugmail)
Attachment #8969650 - Flags: review?(bugmail)
Attachment #8958461 - Attachment is obsolete: true
Attachment #8958461 - Flags: review?(bugmail)
Attachment #8969653 - Flags: review?(bugmail)
Attachment #8969653 - Flags: review?(mrbkap)
Attachment #8958459 - Flags: review?(mrbkap)
Attachment #8958458 - Flags: review?(mrbkap)
Comment on attachment 8969653 [details] [diff] [review]
part 13 - keeping ContentProcess alive

Review of attachment 8969653 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentParent.h
@@ +355,5 @@
>                          bool aMarkedDestroying);
>  
> +  // This method can be called on any thread.
> +  void
> +  RegisterRemoveWorkerActor();

These two function names have typos, it seems: RegisterRemo*t*eWorkerActor.
Comment on attachment 8958459 [details] [diff] [review]
part 12 - Spawning a new process if needed.

Review of attachment 8958459 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/remoteworkers/RemoteWorkerManager.cpp
@@ +183,5 @@
>    MOZ_ASSERT(XRE_IsParentProcess());
>  
> +  // This runnable will spawn a new process if it doesn't exist yet.
> +  nsCOMPtr<nsIRunnable> r =
> +    NS_NewRunnableFunction("ShutdownOnMainThread", [] () {

This name is wrong.
Fix both comments, I'm not uploading a new version of these patches. Waiting for the entire review.
Attached patch part 8 - RemoteWorker IPC (obsolete) — Splinter Review
This fixes the last leak: MessagePort must be released if the operation is canceled.
Attachment #8958454 - Attachment is obsolete: true
Attachment #8958454 - Flags: review?(bugmail)
Attachment #8970094 - Flags: review?(bugmail)
That was the wrong patch.
Attachment #8970094 - Attachment is obsolete: true
Attachment #8970094 - Flags: review?(bugmail)
Attachment #8970154 - Flags: review?(bugmail)
Attachment #8969650 - Attachment is obsolete: true
Attachment #8969650 - Flags: review?(bugmail)
Attachment #8970155 - Flags: review?(bugmail)
Attachment #8969653 - Flags: review?(mrbkap) → review+
Attachment #8958458 - Flags: review?(mrbkap) → review+
Comment on attachment 8958459 [details] [diff] [review]
part 12 - Spawning a new process if needed.

Review of attachment 8958459 [details] [diff] [review]:
-----------------------------------------------------------------

This patch, in particular, is very difficult to review in isolation. I'm going to defer to asuth's review on it as he has seen the rest of the patch stack and understand where this code fits in.
Attachment #8958459 - Flags: review?(mrbkap)
Comment on attachment 8963219 [details] [diff] [review]
part 3 - SharedWorkerService and SharedWorkerManager

Review of attachment 8963219 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/sharedworkers/SharedWorkerManager.cpp
@@ +249,5 @@
> +      ++suspended;
> +    }
> +  }
> +
> +  if (suspended != 0 && suspended != mActors.Length()) {

This still seems wrong, or at least sufficiently confusing that a comment is merited.  See comment 9 for my original concern, but with an example here:

With mActors.Length() == 2:
* 0->1 transition: we early return and don't suspend because `if (true && true)`. Not suspended.
* 1->2 transition: we don't early return, do suspend because `if (true && false)`. Suspended.
* 2->1 transition: we early return and don't resume because `if (true && true)`. Suspended.
* 1->0 transition: we don't early return, do resume because `if (false && true)`. Not suspended.

The semantics of this are that we suspend only once all actors are suspended, and we resume only when all of the actors are un-suspended which contradicts the comments you removed from the old implementation, excerpted below:
// Shared workers are only frozen if all of their owning documents are
// frozen. It can happen that mSharedWorkers is empty but this thread has
// not been unregistered yet.

// Shared workers are resumed if any of their owning documents are thawed.
// It can happen that mSharedWorkers is empty but this thread has not been
// unregistered yet.
Attachment #8963219 - Flags: review-
Would this be easier if we avoided putting windows attached to a SharedWorker in bfcache at all?  Or maybe avoid putting them in bfcache if they are attached to a SharedWorker that is attached to another window preventing the SharedWorker from being suspended.
> Or maybe avoid putting them in bfcache if they are attached to a SharedWorker that is attached to another window preventing the SharedWorker from being suspended.

Oh, that won't work.  Because another window could attach while the first window is frozen in bfcache creating the same problem again.
Comment on attachment 8958453 [details] [diff] [review]
part 7 - SharedWorker can be intercepted by a ServiceWorker

Review of attachment 8958453 [details] [diff] [review]:
-----------------------------------------------------------------

(Mechanism moves to RemoteWorkerChild in part 9.)
Attachment #8958453 - Flags: review+
Comment on attachment 8969653 [details] [diff] [review]
part 13 - keeping ContentProcess alive

Review of attachment 8969653 [details] [diff] [review]:
-----------------------------------------------------------------

Restating:
- The number of RemoteWorkerActors alive in each ContentParent now gets tracked
  and causes ShouldKeepProcessAlive() to indicate that the process should be
  kept alive.
- The unusual race to worry about is DEBUG build shutdown time when PContent is
  torn down but PBackground is still around and the content process goes through
  an orderly XPCOM-style shutdown for leak-tracking and assertion-y purposes.
  In release builds, content processes will exit() and there's no race.
  - In this situation, ContentParent references will still exist, including by
    the BackgroundParent which holds the PContent reference until its own actor
    is destroyed.
  - Since RegisterRemoteWorkerActor() does not convey success (it's void), and
    the worker stuff doesn't otherwise pay attention, the interesting Q is what
    happens when UnregisterRemoteWorkerActor comes around.  The good news is
    it's all fine because all the calls check mShutdownPending and idempotently
    bail.  (That said, ContentProcessManager::GetTabParentCountByProcessId will
    take an NS_WARN_IF path because ContentProcessManager::RemoveContentProcess
    is invoked by ContentParent::ActorDestroy, so the table entry will no longer
    exist.  This seems like it might be desirable because it'd be interesting
    for chrome privileged code to be doing such a thing, let alone at shutdown.)

::: dom/workers/remoteworkers/RemoteWorkerParent.cpp
@@ +67,5 @@
> +  if (parent) {
> +    parent->RegisterRemoveWorkerActor();
> +
> +    nsCOMPtr<nsIEventTarget> target =
> +      SystemGroup::EventTargetFor(TaskCategory::Other);

I'm slightly un-nerved by the mismatch here between the release using a system group target and GetContentParent using NS_DispatchToMainThread[1] to perform the AddRef.  But NS_DispatchToMainThread uses the thread manager's bare/rare PrioritizedEventQueue, so it should win, so it's fine and I'm not worrying about it.

1: https://searchfox.org/mozilla-central/rev/c621276fbdd9591f52009042d959b9e19b66d49f/ipc/glue/BackgroundImpl.cpp#805
Attachment #8969653 - Flags: review?(bugmail) → review+
Comment on attachment 8970154 [details] [diff] [review]
part 8 - RemoteWorker IPC

Review of attachment 8970154 [details] [diff] [review]:
-----------------------------------------------------------------

At the all-hands I'd raised the issue of the RemoteWorkerManager potentially choosing the preallocated process and that being weird.

The difference between the preallocated process and other DEFAULT_REMOTE_TYPE processes is:
- It has a lower process priority.  This gets updated by PreallocatedProcessManager::Take()
- It is not listed in the ContentParent::GetOrCreatePool() pool for DEFAULT_REMOTE_TYPE until GetNewOrUsedProcess() puts it there.

This situation arises because the per-process RemoveWorkerService registers itself with the parent-process-only PBackground RemoteWorkerManager during ContentChild::InitXPCOM() inside ContentChild::RecvSetXPCOMProcessAttributes() which is triggered by ContentParent::InitInternal() where pretty much all the init logic happens.

I propose we address this by having part 13's ContentParent::RegisterRemoveWorkerActor roughly: if (!mRemoteWorkerActors), then get the pool, check if we're in the pool.  If not, do PreallocatedProcessManager::Take() and add itself to the pool.
Attachment #8970154 - Flags: review?(bugmail) → review+
Attachment #8958459 - Flags: review?(bugmail) → review+
Comment on attachment 8970155 [details] [diff] [review]
part 9 - RemoteWorker in SharedWorkerManager

Review of attachment 8970155 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/remoteworkers/RemoteWorkerChild.cpp
@@ +306,5 @@
> +
> +  nsContentUtils::StorageAccess access =
> +    nsContentUtils::StorageAllowedForPrincipal(info.mPrincipal);
> +  info.mStorageAllowed =
> +    access > nsContentUtils::StorageAccess::ePrivateBrowsing;

Affirming: This is a change for SharedWorkers which previously used eDeny for comparison.  Now the comparison is consistent with ServiceWorkers.  The only actual changes in behavior, however, seem to be limited and/or desirable.  Namely:
- BroadcastChannel gets forbidden in SharedWorkers if the "privacy.restrict3rdpartystorage.enabled" pref is true.
- IndexedDB synchronously throws instead of asynchronously generating errors when the factory op realizes the source is private-browsing.  This is an improvement.

::: dom/workers/remoteworkers/RemoteWorkerChild.h
@@ +113,5 @@
> +    // Worker terminated.
> +    eTerminated,
> +  };
> +
> +  // Touched only on the owning thread (PBackground).

Owning thread is "Worker Launcher", not PBackground.  Same below.
Attachment #8970155 - Flags: review?(bugmail) → review+
Attachment #8958457 - Flags: review?(bugmail) → review+
Comment on attachment 8958458 [details] [diff] [review]
part 11 - selection of RemoteWorker actors

Review of attachment 8958458 [details] [diff] [review]:
-----------------------------------------------------------------

r-'ing to be clear I think we do need to fix the points below, but in general this patch and all the others are great.

::: dom/workers/remoteworkers/RemoteWorkerManager.cpp
@@ +169,3 @@
>  
> +  // System principal workers should run on the parent process.
> +  if (aData.principalInfo().type() == PrincipalInfo::TSystemPrincipalInfo) {

It seems like it might be reasonable to require that aProcessId also be 0 in this case.

@@ +178,5 @@
> +
> +  for (RemoteWorkerServiceParent* actor : mActors) {
> +    // Let's execute the RemoteWorker on the same process but not on the parent
> +    // process.
> +    if (aProcessId && actor->OtherPid() == aProcessId) {

It seems like there should be a Fennec/non-e10s special-case here or in its own loop.  Right now the fall-through case below handles it, but as noted below, that logic is dangerous in the e10s case.

@@ +183,5 @@
> +      return actor;
> +    }
> +  }
> +
> +  // Let's choose an actor, randomly.

This can select the parent process under e10s, which seems dangerous.  Specifically, if an attacker has a way to cause their content process to die, it seems theoretically possible to:
- In the content process, new a SharedWorker() that won't already have a manager.
- In the content process, cause the content process to die really quickly.
- In the parent, a new SharedWorkerManager will be created on the main thread.
- The content process death is processed.
- WorkerManagerCreatedRunnable will still call MaybeCreateRemoteWorker... the actor will only remove itself when it gets the ManagerCreated call which strictly happens after MaybeCreateRemoteWorker.

As noted above, it seems safer to be e10s/non-e10s aware and forbid returning the parent process if we're e10s.

::: dom/workers/sharedworkers/SharedWorkerService.cpp
@@ +156,5 @@
>      AssertIsOnBackgroundThread();
>  
>      if (NS_WARN_IF(!mManager->MaybeCreateRemoteWorker(mData, mWindowID,
> +                                                      mPortIdentifier,
> +                                                      mActor->OtherPid()))) {

Affirming: Although IProtocol::OtherPid uses the nullable mManager to traverse to ask its parent for OtherPid, ActorDestroy doesn't actually null it out and the manager is PBackground so it's not going away even if the SharedWorkerParent did during the time we bounced to the main thread and back.
Attachment #8958458 - Flags: review?(bugmail) → review-
The tl;dr from my notes of how everything here works:

### Class overviews and relations:
The order here is roughly the order in which classes will be involved as a
SharedWorker is created.

- SharedWorker: DOM binding.  Holds a SharedWorkerChild.  Must exist on the main
  thread because we only allow top-level windows to create SharedWorkers.
- SharedWorkerChild: Held by SharedWorker bindings to remotely control
  sharedworker lifecycle and receive error and termination reports.
- PSharedWorker: Protocol for SharedWorker bindings to communicate with
  per-worker SharedWorkerManager instances in the parent via SharedWorkerChild/
  SharedWorkerParent and SharedWorkerService getting/creating the
  SharedWorkerManager if it doesn't already exist.  Main-thread to PBackground.
- SharedWorkerParent: PBackground actor that relays life-cycle events
  (freeze/thaw, suspend/resume, close) to the PBackground SharedWorkerManager
  and relays error/termination back to the child.
- SharedWorkerService: PBackground service that creates and tracks the
  per-worker SharedWorkerManager instances, allowing rendezvous between
  SharedWorkerParent instances and the SharedWorkerManagers they want to talk to
  (1:1).
- SharedWorkerManager: PBackground instance that corresponds to a single logical
  Shared Worker that exists somewhere in the process tree.  Referenced/owned by
  multiple SharedWorkerParent instances on the PBackground thread.  Holds/owns
  a single RemoteWorkerController to interact with the actual shared worker
  thread, wherever it is located.  Creates the RemoteWorkerController via
  RemoteWorkerController::Create which uses RemoteWorkerManager::Launch under
  the hood.

- RemoteWorkerController: PBackground instance created by static
  RemoteWorkerController::Create that builds on RemoteWorkerManager.  Interface
  to control the remote worker as well as receive events via the
  RemoteWorkerObserver interface that the owner (SharedWorkerManager in this
  case) must implement to hear about errors, termination, and whether the
  initial spawning succeeded/failed.  Its methods may be called immediately
  after creation even though the worker is created asynchronously; an internal
  operation queue makes this work.  Communicates with the remote worker via
  owned RemoteWorkerParent over PRemoteWorker protocol.
- RemoteWorkerManager: PBackground instance that keeps tracks of
  RemoteWorkerServiceParent actors (1 per process, including the main process)
  and pending RemoteWorkerController requests to spawn remote workers if the
  spawn request can't be immediately fulfilled.  Decides which
  RemoteWorkerServerParent to use internally via SelectTargetActor in order to
  select a BackgroundParent manager on which to create a RemoteWorkerParent.
- RemoteWorkerService: Every process has a RemoteWorkerService which does the
  actual spawning of RemoteWorkerChild instances.  The RemoteWorkerService
  creates a "Worker Launcher" thread at initialization on which it creates a
  RemoteWorkerServiceChild to service spawn requests.  The thread is exposed as
  RemoteWorkerService::Thread().  A new/distinct thread is used because we
  (eventually) don't want to deal with main-thread contention, content processes
  have no equivalent of a PBackground thread, and actors are bound to specific
  threads.  (Disclaimer: currently most RemoteWorkerOps need to happen on the
  main thread because the main-thread ends up as the owner of the worker and
  all manipulation of the worker must happen from the owning thread.)
- RemoteWorkerServiceChild: "Worker Launcher"-thread child actor created by the
  RemoteWorkerService to register itself with the PBackground
  RemoteWorkerManager in the parent.
- RemoteWorkerServiceParent: PBackground parent actor that registers with the
  PBackground RemoteWorkerManager and used to relay spawn requests.
- RemoteWorkerParent: PBackground-managed parent actor that is mutually
  associated with a single RemoteWorkerController.  Relays error/close events to
  the controller and in turns is told life-cycle events.
- RemoteWorkerChild: PBackground-managed "Worker Launcher"-thread-resident
  created via the RemoteWorkerManager to actually spawn the worker.  Currently,
  the worker will be spawned from the main thread due to nsIPrincipal not being
  able to be created on background threads and other ownership invariants, most
  of which can be relaxed in the future.
Since I've got this locally un-bitrotted (and on try at https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2bf08bb88f2db3c4fbe169b52f77e6141e3d9b5) NI'ing myself to provide fixes for the largely trivial review action items, I'll put them up for review by :baku.  I may augment/add a test or two too.
Flags: needinfo?(bugmail)
Andrew: is this still being worked on? Any eta for the full solution? Ie SharedWorkers being between separate tabs on the same domain?
This is in line behind some other LocalStorage next-gen reviews and the current tracking-protection/ServiceWorker issues.  ETA ~1week.

Do you have a specific SharedWorker use-case that's being impacted?
Flags: needinfo?(bugmail)
Andrew: yes, but its probably an obscure one, so its not something many other people are likely to hit ;)
We're developing a security heads up display (HUD) which overlays security information on a target web page.
We use a service worker to interact with the security tool (ZAP) and the fact that on Firefox we have to handle one service worker per tab has caused us various problems, especially as there used to be just one and still is on Chrome.
If a fix is not far off then thats good news for us :)
The fix is not far off.  Also note that the clients API is already properly browser-wide.
Are you planning to continue this work? I really appreciate. I don't have time to finish it right now, but if I can help in a low profile way, I'm happy to.
Flags: needinfo?(mrbkap)
I am! I'm actually hoping to get this landed in the next few days.
Flags: needinfo?(mrbkap)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbdd77458f58c9b171c272a7d6d5e905717ea9da&selectedJob=211304036

There's one WPT test failure that creating a SharedWorker with a Blob URI that's immediately revoked still works. At the moment, because we do a round-trip to the parent, we aren't able to load the script and time out the test. One obvious fix would be to delay revoking in the child until after a round-trip to the parent, but I'd need to do a bunch more research to see how realistic that is.

I'm tempted to mark the test as failing for the time being and come back to this in a followup.

Andrew, does that seem reasonable to you?
Flags: needinfo?(bugmail)
Actually, my idea above won't work (in practice because there's a bunch of thread hopping in the SharedWorker code, but more generally it would have been fragile). Another idea would be to let the SharedWorker code "pin" the blob URI's data until it knows that either it connected to a SharedWorker thread in another process or the SharedWorker was spawned in its process.
(In reply to Blake Kaplan (:mrbkap) from comment #87)
> I am! I'm actually hoping to get this landed in the next few days.

Wow Cool! Thanks.


> There's one WPT test failure that creating a SharedWorker with a Blob URI
> that's immediately revoked still works. At the moment, because we do a
> round-trip to the parent, we aren't able to load the script and time out the
> test. One obvious fix would be to delay revoking in the child until after a

We already have a similar concept. When a blobURL is revoked, we marked it as revoked, and we start a timer:
https://searchfox.org/mozilla-central/rev/7f7c353e969e61a6a85201cc8ad3c3de12ac30d8/dom/file/uri/BlobURLProtocolHandler.cpp#678,690,699

The timer expires after 5 seconds, which should be enough.
https://searchfox.org/mozilla-central/rev/7f7c353e969e61a6a85201cc8ad3c3de12ac30d8/dom/file/uri/BlobURLProtocolHandler.cpp#34

If the SharedWorker opens the blobURL using OpenAsync2(), we consider also revoked blobURLs:
https://searchfox.org/mozilla-central/rev/7f7c353e969e61a6a85201cc8ad3c3de12ac30d8/dom/file/uri/BlobURLProtocolHandler.cpp#854,872

We should understand why this doesn't happen.
Flags: needinfo?(mrbkap)
(I was just going to say what :baku said.  We have hacks in place that mean it should work, so it's worth understanding why it doesn't work before disabling the test.)
Flags: needinfo?(bugmail)
This implements the behavior that as long as there's one non-frozen or
non-suspended actor, we resume or thaw the manager.
Some small fixes that I noticed.

Depends on D11822
This separates out the parent process from the list of child processes and
makes our handling of non-e10s more explicit.

Depends on D11823
In order to fix the problem mentioned in comment 91 & co, we need to hold onto
the URI object that we resolve in the child process when we construct the
SharedWorker. Otherwise, we risk the Blob getting deallocated from under us.
This patch isn't sufficient to fix that problem, however, because the worker
code itself ends up going back through strings. I fix that in the next couple
of patches.

Depends on D11824
In the next patch I get rid of the duplicated arguments.

Depends on D11825
https://hg.mozilla.org/integration/mozilla-inbound/rev/f816f239955c03172fc46f1caf8a46a898eca4e8
Bug 1438945 - Part 1: Moving SharedWorker in a separate folder. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/d77e06240f3c05f592381bbbbed176eb78c4c7de
Bug 1438945 - Part 2: PSharedWorker protocol. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/062638f850a0bef1cfd94e62419ac2f00325c40b
Bug 1438945 - Part 3: SharedWorkerService and SharedWorkerManager. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/5de8d2303240d72cbca56a1fd1d50f575ec3b2de
Bug 1438945 - Part 4: errors and communications. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/81c5bdf85bd781ba80d4bfc52aa66c778b248702
Bug 1438945 - Part 5: SharedWorker browser test. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/d87a2d39f76f7e0ec05dcfb6387c2abda16ad545
Bug 1438945 - Part 6: CSP via IPC. r=ckerschb

https://hg.mozilla.org/integration/mozilla-inbound/rev/a38f567faf3faedcc609ced953aefa09d73e0fd6
Bug 1438945 - Part 7: SharedWorker can be intercepted by a ServiceWorker. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/6059af9c85adc62a51d9b38b9cdcba3904d4d824
Bug 1438945 - Part 8: RemoteWorker IPC. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/b3d9a379c852669a1fa005148bee537b99c91c6f
Bug 1438945 - Part 9: RemoteWorker in SharedWorkerManager. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/9482aa753a31fd3adc22704e4c3ad1466f110150
Bug 1438945 - Part 10: RemoteWorkerObserver. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/2a1a9e2c2010d94954a3d346a30c9690551a7bcf
Bug 1438945 - Part 11: selection of RemoteWorker actors. r=asuth,mrbkap

https://hg.mozilla.org/integration/mozilla-inbound/rev/6f7cba5101c17b5f2ffea4501bebc9ae0dc873b4
Bug 1438945 - Part 12: Spawning a new process if needed. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/93c96481ec70d75496045840d5c42c90ee8ec761
Bug 1438945 - Part 13: keeping ContentProcess alive. r=asuth,mrbkap

https://hg.mozilla.org/integration/mozilla-inbound/rev/c52099a44a4e780047f71f4a3a5eec59cf2e7b1c
Bug 1438945 - Fix the suspend and freezing logic. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/c9e6293a64a6e3d120ae2b7c56965e9249d5877b
Bug 1438945 - Fix typos in comments. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/5fa24f8069cbffe7635ca7b928956cea63aa5763
Bug 1438945 - Fix process selection comments. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/7733e4286f9c615bde46ad140329ab73dd613e1e
Bug 1438945 - Pass around URIs instead of strings in RemoteWorker. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/fab208773c488ca77ce4778ad40598c5bc51c1b2
Bug 1438945 - Allow passing an existing URL into the Worker script loader. r=asuth

https://hg.mozilla.org/integration/mozilla-inbound/rev/d71fcd5f9cb5f47d7ffa42d6a59f4523e1811d3a
Bug 1438945 - Remove the string parameter in favor of always passing in an nsIURI. r=asuth
I was planning on folding the comment fixes into the (correct) original patches, but after fighting hg for an afternoon, I decided it wasn't worth it.
Blocks: 1508595
Depends on: 1509585
Depends on: 1514733
Depends on: 1530223
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: