Bug 1756172 Comment 9 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Andrew Sutherland [:asuth] (he/him)

on 2022-03-18 14:50:33 PDT

(In reply to Jens Stutte [:jstutte] from comment #7)
> I put some notes. So IIUC we should unroot the global only after having done [ mWorkerPrivate->ClearMainEventQueue(WorkerPrivate::WorkerRan);](https://searchfox.org/mozilla-central/rev/478ecccb085b7efd3c4773edb0480a7b2b16754c/dom/workers/RuntimeService.cpp#2192) ?

This is one of those cases where the thing that's crashing is indicating a potentially concerning lifetime problem but maybe we also want the thing that's catching this by virtue of crashing to assert under fuzzing but silently not crash or do bad things in production.

In terms of the things of what's catching this: In the pernosco trace, we don't actually care about the not-taken call to `WorkerPrivateSaysForbidScript` because the global was already marked as dying (nsIGlobalObject::StartDying) previously during the transitions to Canceling and Killing.  So if we ever got to [CallbackObject::CallSetup::CallSetup's invocation of IsScriptForbidden](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/bindings/CallbackObject.cpp#255) we'd say it's not okay.  But we're not getting there because it's the call to nsIGlobalObject::HasJSGlobal during AutoEntryScript construction for the Promise that's crashing earlier.  In theory maybe this could check `mDying` and say "hey, no, the global's definitely already gone!"  Or maybe we could have the subclassed `WorkerGlobalScopeBase::GetGlobalJSObjectPreserveColor` that it calls assert hard and return false when mWorkerPrivate is null.  This might be preferable over changing the sequence of unrooting/nulling in the sense that I think it's still possible for an extra badly behaved.  :smaug NI-ing you to see if you have any thoughts about HasJSGlobal's behavior through shutdown.  Note that I have not yet translated the Peers meeting about doing better about poisoning globals/JSContexts and part of that discussion did involve the behavior of AutoEntryScript/etc., so that might potentially be a relevant place to interact with the global (for new AES's).

In terms of the lifetime problem, the problem is that [ServiceWorkerGlobalScope::SkipWaiting has a lambda that holds a promise that's basically entirely unaware of the lifecycle stuff going on](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/WorkerScope.cpp#1136).  It's (through a [helper wrapper](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/WorkerPrivate.cpp#5422)) calling [RemoteWorkerChild::MaybeSendSetServiceWorkerSkipWaitingFlag](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/remoteworkers/RemoteWorkerChild.cpp#1042) that involves bouncing to the worker launcher thread (bypassing the main thread).  Arguably this scenario could be simpler if we were sending the SkipWaiting request over a PBackground path from the worker, which is something we'd discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1672493#c1.  Specifically, async MozPromise/callbacks from the IPC actor will automatically reject when the actor is torn down, which is a fairly straightforward lifecycle step.

Note that [skipWaiting is explicitly NewObject](https://w3c.github.io/ServiceWorker/#ref-for-dom-serviceworkerglobalscope-skipwaiting) which means that we can't just have a dedicated holder slot for skipWaiting.  The simplest fix for this might just be to acquire and hold a StrongWorkerRef for the duration of the skipwaiting promise since the request has a finite/bounded lifetime (which is likely why we were cavalier with the promise's ownership).

Revision 1 by

Andrew Sutherland [:asuth] (he/him)

on 2022-03-18 15:04:12 PDT

(In reply to Jens Stutte [:jstutte] from comment #7)
> I put some notes. So IIUC we should unroot the global only after having done [ mWorkerPrivate->ClearMainEventQueue(WorkerPrivate::WorkerRan);](https://searchfox.org/mozilla-central/rev/478ecccb085b7efd3c4773edb0480a7b2b16754c/dom/workers/RuntimeService.cpp#2192) ?

This is one of those cases where the thing that's crashing is indicating a potentially concerning lifetime problem but maybe we also want the thing that's catching this by virtue of crashing to assert under fuzzing but silently not crash or do bad things in production.

In terms of the things of what's catching this: In the pernosco trace, we don't actually care about the not-taken call to `WorkerPrivateSaysForbidScript` because the global was already marked as dying (nsIGlobalObject::StartDying) previously during the transitions to Canceling and Killing.  So if we ever got to [CallbackObject::CallSetup::CallSetup's invocation of IsScriptForbidden](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/bindings/CallbackObject.cpp#255) we'd say it's not okay.  But we're not getting there because it's the call to nsIGlobalObject::HasJSGlobal during AutoEntryScript construction for the Promise that's crashing earlier.  (Edited) Perhaps we could have the subclassed `WorkerGlobalScopeBase::GetGlobalJSObjectPreserveColor` that HasJSGlobal calls assert hard and return null when mWorkerPrivate is null.  I'm not immediately clear on whether there will be runtime bails from this somewhere or just a change in where the crash is.  :smaug NI-ing you to see if you have any thoughts about HasJSGlobal's behavior unrooting; pre-edit I was wondering if we should return false for HasJSGlobal when mDying, but that sounds too dramatic given that the HasJSGlobal was in an assert, not a runtime path.  Note that I have not yet translated the Peers meeting about doing better about poisoning globals/JSContexts and part of that discussion did involve the behavior of AutoEntryScript/etc., so that might potentially be a relevant place to interact with the global (for new AES's).

In terms of the lifetime problem, the problem is that [ServiceWorkerGlobalScope::SkipWaiting has a lambda that holds a promise that's basically entirely unaware of the lifecycle stuff going on](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/WorkerScope.cpp#1136).  It's (through a [helper wrapper](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/WorkerPrivate.cpp#5422)) calling [RemoteWorkerChild::MaybeSendSetServiceWorkerSkipWaitingFlag](https://searchfox.org/mozilla-central/rev/7d379061bd56251df911728686c378c5820513d8/dom/workers/remoteworkers/RemoteWorkerChild.cpp#1042) that involves bouncing to the worker launcher thread (bypassing the main thread).  Arguably this scenario could be simpler if we were sending the SkipWaiting request over a PBackground path from the worker, which is something we'd discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1672493#c1.  Specifically, async MozPromise/callbacks from the IPC actor will automatically reject when the actor is torn down, which is a fairly straightforward lifecycle step.

Note that [skipWaiting is explicitly NewObject](https://w3c.github.io/ServiceWorker/#ref-for-dom-serviceworkerglobalscope-skipwaiting) which means that we can't just have a dedicated holder slot for skipWaiting.  The simplest fix for this might just be to acquire and hold a StrongWorkerRef for the duration of the skipwaiting promise since the request has a finite/bounded lifetime (which is likely why we were cavalier with the promise's ownership).

Back to Bug 1756172 Comment 9