Bug 1833312 Comment 6 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Andrew Sutherland [:asuth] (he/him)

on 2023-06-05 23:19:25 PDT

I dug into this a little more.

Restating:
- The bug that's causing the nullptr crash in this stack is that MessagePortIdentifierRunnable [steals the aOwner RefPtr we pass in by reference using std::move](https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/dom/workers/remoteworkers/RemoteWorkerChild.cpp#114), so it's not safe to call `aOwner->ErrorPropagationDispatch` in the failed dispatch case.
- The weird "hey, I'm stealing your strong reference, muahahaha" idiom I think is an outgrowth of when RemoteWorkerChild was not threadsafe refcounted.  I fixed that in bug 1775784.
- The reason bug 1800659 fixes the problem is that it makes it impossible that dispatch will fail.
  - The only possible situation where dispatch could have failed is if [SharedWorkerGlobalScope.close](https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/dom/webidl/SharedWorkerGlobalScope.webidl#21) was called because this would make dispatch fail but would not have advanced the mState; mState only would advance when the parent receives the self-cancellation request.
- It's not entirely clear to me that we actually need the calls to ForceClose added for SharedWorkerOp in bug 1800659 given the RAII class wrappers for the ports that may end up doing the equivalent thing, although it can potentially impact the thread on which that ends up getting called on so maybe it's desirable?

So I think the most correct fix for this crash is to stop MessagePortIdentifierRunnable's constructor from stealing the refcount which is very unhygienic and just let it have its own freshly allocated refcount.  We can land this on m-c and then uplift it.  I'll quickly provide a non-risky patch for this.

Revision 1 by

Andrew Sutherland [:asuth] (he/him)

on 2023-06-06 00:38:44 PDT

I dug into this a little more.

Restating:
- The bug that's causing the nullptr crash in this stack is that MessagePortIdentifierRunnable [steals the aOwner RefPtr we pass in by reference using std::move](https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/dom/workers/remoteworkers/RemoteWorkerChild.cpp#114), so it's not safe to call `aOwner->ErrorPropagationDispatch` in the failed dispatch case.
- The weird "hey, I'm stealing your strong reference, muahahaha" idiom I think is an outgrowth of when RemoteWorkerChild was not threadsafe refcounted. I fixed that in bug 1775784.
- The reason bug 1800659 fixes the problem is that it makes it impossible that dispatch will fail.
- The only possible situation where dispatch could have failed is if [SharedWorkerGlobalScope.close](https://searchfox.org/mozilla-central/rev/27e4816536c891d85d63695025f2549fd7976392/dom/webidl/SharedWorkerGlobalScope.webidl#21) was called because this would make dispatch fail but would not have advanced the mState; mState only would advance when the parent receives the self-cancellation request.
- Edited: The explicit calls to ForceClose that are newly introduced in that patch are important and necessary because until we have transferred the MessagePortIdentifier from IPC into a UniqueMessagePortId (which wraps MessagePortIdentifier) we don't have any RAII magic that will call ForceClose for us automatically. But they don't factor in to the crash.

So I think the most correct fix for this crash is to stop MessagePortIdentifierRunnable's constructor from stealing the refcount which is very unhygienic and just let it have its own freshly allocated refcount. We can land this on m-c and then uplift it. I'll quickly provide a non-risky patch for this.

Back to Bug 1833312 Comment 6