Open Bug 1825446 Opened 2 years ago Updated 2 years ago

Change threshold for blocking shutdown for Service Workers in the parent process to workers having notified Cancellation through to the global instead of having fully shutdown

Categories

(Core :: DOM: Service Workers, enhancement)

Tracking

ASSIGNED

People

(Reporter: asuth, Assigned: asuth)

References

(Blocks 1 open bug)

Details

ServiceWorkerShutdownBlocker currently blocks shutdown at the profile-change-teardown barrier under all build types. With the landing of bug 1775784, this potentially converts pre-existing WorkerRef-related logic errors from leaks into shutdown hangs, in a way that does not produce useful telemetry or crash-stats data.
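
For context, a minimal sketch of how a blocker attaches to that barrier, assuming Gecko's nsIAsyncShutdown interfaces; the real logic lives in ServiceWorkerShutdownBlocker, the helper name here is made up, and error handling is elided.

```cpp
#include "nsCOMPtr.h"
#include "nsIAsyncShutdown.h"
#include "nsServiceManagerUtils.h"
#include "nsString.h"

// Hypothetical helper (not the real code): register a blocker on the
// profile-change-teardown barrier. The blocker stays registered until
// RemoveBlocker() is called; this bug proposes doing that once cancellation
// has been notified to all worker globals rather than waiting for the
// workers to fully shut down.
static void AttachShutdownBlocker(nsIAsyncShutdownBlocker* aBlocker) {
  nsCOMPtr<nsIAsyncShutdownService> svc =
      do_GetService("@mozilla.org/async-shutdown-service;1");
  nsCOMPtr<nsIAsyncShutdownClient> barrier;
  svc->GetProfileChangeTeardown(getter_AddRefs(barrier));
  barrier->AddBlocker(aBlocker, NS_LITERAL_STRING_FROM_CSTRING(__FILE__),
                      __LINE__, u""_ns);
}
```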

I would propose that the right course of action is:

  • Release builds: We continue to add the shutdown blocker at profile-before-change so we know about shutdown, but we remove it once RemoteWorkers have acknowledged that all workers have successfully notified cancellation on the worker.
    • This aligns the shutdown blocker's behavior with how it behaved before bug 1775784 landed, but retains the improvements so that RemoteWorkerManager logic does not remove the keepalive from the content process until the worker has fully shut down. This is necessary to avoid the shutdown races that led to leaks, etc.
    • This allows us to establish an important invariant that we don't let shutdown advance until all the content globals we're responsible for have been notified. This is important because it avoids situations where content globals could see errors that are the result of shutdown doing things like aborting in-process IndexedDB database operations, etc. This is a real concern because:
      • It potentially contaminates the telemetry that sites report to themselves about Gecko, as well as telemetry we might gather ourselves.
      • There's also a risk that, if content globals are still alive and see such an error, they might do something more dramatic, like assume the IndexedDB database is broken and delete it, etc.
    • To do this, we would want to add a third callback lambda to WorkerPrivate; currently we have callbacks for the point just prior to the parent initiating cancellation and after the worker has definitely terminated. The NotifyRunnable would send a new runnable back to its parent to perform this notification. This should be consistent with how we use TopLevelWorkerFinishedRunnable for top-level workers (it's just a Runnable) and WorkerFinishedRunnable for nested workers whose parent is therefore a worker (it's a WorkerControlRunnable). A rough sketch follows this list.
  • Debug builds: We do the same thing as in release builds, because we want debug builds to resemble what happens in release builds as much as possible. Also, if we don't, we could still end up with the same problem we wanted to address above in release builds, where a correctness problem with a WorkerRef does not result in useful error reporting. Whereas if we do the same thing as in release builds, then :jmarshall's enhancements in bug 1805613 should help.
    • That said, it's possible that we could make the shutdown blocker help report any problems with WorkerRefs, but given that we have a bunch of pending refactorings, I don't think now is the time for that.
    • But if debug builds end up having weird races, we fall back to waiting for the RemoteWorker to have fully terminated.
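
A rough, non-authoritative sketch of the third-callback idea referenced above; the names (WorkerPrivateSketch, NotifyCancellationObserved, etc.) are hypothetical stand-ins for whatever the actual WorkerPrivate/NotifyRunnable plumbing ends up being:

```cpp
#include <functional>
#include <utility>

class WorkerPrivateSketch {
 public:
  using Callback = std::function<void()>;

  // Conceptually there are two callbacks today: one fired just before the
  // parent initiates cancellation and one fired after the worker has fully
  // terminated. The proposal adds a third, fired once cancellation has been
  // notified through to the worker global.
  WorkerPrivateSketch(Callback aBeforeCancel,
                      Callback aCancellationNotified,
                      Callback aTerminated)
      : mBeforeCancel(std::move(aBeforeCancel)),
        mCancellationNotified(std::move(aCancellationNotified)),
        mTerminated(std::move(aTerminated)) {}

  // Would be invoked (via a runnable dispatched back to the parent, in the
  // spirit of TopLevelWorkerFinishedRunnable / WorkerFinishedRunnable) once
  // the NotifyRunnable has delivered cancellation to the global.
  void NotifyCancellationObserved() {
    if (mCancellationNotified) {
      mCancellationNotified();
    }
  }

 private:
  Callback mBeforeCancel;
  Callback mCancellationNotified;
  Callback mTerminated;
};
```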

Implementation Details

Currently, although RemoteWorkerObserver/PRemoteWorker has a concept of a "Termination" notification, this isn't used for ServiceWorkerPrivate::Shutdown; instead the op completion promise is used. Somewhat interestingly, the op's promise gets tied to the termination promise for the RemoteWorkerChild. However, as I note in a comment I added in bug 1775784, it seems almost certain that the promise will be rejected, because SendClose will run and queue the Close message before the promise's resolution runnable gets to run, and RemoteWorkerPrivate::RecvClose will call MaybeSendDelete, which will delete the actor, resulting in the ExecServiceWorkerOp MozPromise being rejected with ResponseRejectReason::ActorDestroyed. (This doesn't particularly end up mattering, though, because the shutdown blocker doesn't care whether the promise was resolved or rejected.)

For this bug, I think it makes sense for SendClose to continue to be called where it currently is, as a result of the worker termination callback. Since the termination promise only has a single caller (ServiceWorkerOp::MaybeStart), we can rename the Termination promise to CancelNotifiedPromise, have that promise resolved by the new lambda we pass to WorkerPrivate::Constructor, and have the ServiceWorkerOp use that. This nicely eliminates the race and the weirdness of a promise that is only ever rejected.
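
As a rough sketch of that renaming, assuming Gecko's MozPromise; the holder and method names below are illustrative rather than the actual patch:

```cpp
#include "mozilla/MozPromise.h"

// Hypothetical promise type the ServiceWorkerOp would wait on: resolved once
// cancellation has been notified through to the global, instead of the old
// termination promise that in practice was only ever rejected.
using CancelNotifiedPromise =
    mozilla::MozPromise<bool, nsresult, /* IsExclusive */ false>;

class RemoteWorkerChildSketch {
 public:
  RefPtr<CancelNotifiedPromise> GetCancelNotifiedPromise() {
    return mCancelNotifiedHolder.Ensure(__func__);
  }

  // Invoked from the new WorkerPrivate callback once cancellation has been
  // notified to the worker global.
  void OnCancellationNotified() {
    mCancelNotifiedHolder.ResolveIfExists(true, __func__);
  }

 private:
  mozilla::MozPromiseHolder<CancelNotifiedPromise> mCancelNotifiedHolder;
};
```

ServiceWorkerOp::MaybeStart would then chain on this promise instead of the termination promise, so it resolves when cancellation reaches the global rather than being rejected via ActorDestroyed.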

Summary: Change threshold for blocking shutdown for Service Workers in the parent process in release builds (AKA non-DEBUG, non-FUZZING) to workers having notified Cancellation through to the global instead of having fully shutdown → Change threshold for blocking shutdown for Service Workers in the parent process to workers having notified Cancellation through to the global instead of having fully shutdown