Potential Race in IPDL Background Thread if a new actor was terminated right after creation.

RESOLVED FIXED in Firefox 56

Status

()

Core
IPC
RESOLVED FIXED
a year ago
11 months ago

People

(Reporter: bevis, Assigned: billm)

Tracking

unspecified
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

Attachments

(1 attachment)

File this bug to address the potential race found in bug 1300927 comment 21.
Update the title and the component category to IPC because this is more related to the problem in ipc implementation according to bug 1300927 comment 16:
1. It is possible to create and close a actor immediately.
2. However, this cause mChannelState set to *ChannelClosing* right after changing to *ChannelOpening* when the while loop in [1] is waiting for *ChannelOpening*.

[1] http://searchfox.org/mozilla-central/rev/cac6cb6a10afb8ebb2ecfbeeedaff7c66f57dd75/ipc/glue/MessageLink.cpp#125
Component: DOM: Workers → IPC
Summary: Potential Race in IPDL Background Thread if a new Worker was terminated right after creation. → Potential Race in IPDL Background Thread if a new actor was terminated right after creation.
I still don't understand this.  The worker thread cannot close while initializing the PBackground actor since it uses BackgroundChild::SynchronouslyCreateForCurrentThread():

https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2810

The thread can't close until the PBackground actor is created or failed to create.

The fact you are getting to this line:

http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2885

Means that the actor using PBackground (IDB in this case) is not holding a WorkerHolder.  This is a bug in IDB, not the worker run loop.  For example, IDB made the dubious decision to explicitly not use a WorkerHolder in some cases.  See bug 1274343 comment 2.
Just to clarify, if an actor on the Worker thread might receive a message it *must* hold a WorkerHolder.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #2)
> I still don't understand this.  The worker thread cannot close while
> initializing the PBackground actor since it uses
> BackgroundChild::SynchronouslyCreateForCurrentThread():
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.
> cpp#2810
> 
> The thread can't close until the PBackground actor is created or failed to
> create.
Yes, the PBackground Actor *Child* for the new worker is created and is returned from SynchronouslyCreateForCurrentThread while PBackground Actor *Parent* is waiting for the mChannelState changed to *ChannelOpening*.
Since there is no active WorkerHolder the WorkerLoop and worker.terminate() was called will be returned immediately and a *Close* request to the PBackground Actor Parent will be sent that causes this potential race.
> The fact you are getting to this line:
> 
> http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.
> cpp#2885
> 
> Means that the actor using PBackground (IDB in this case) is not holding a
> WorkerHolder.  This is a bug in IDB, not the worker run loop.  For example,
> IDB made the dubious decision to explicitly not use a WorkerHolder in some
> cases.  See bug 1274343 comment 2.
No, it't not a bug in IDB because actually there is no IDB operation on this worker:
http://searchfox.org/mozilla-central/rev/ea31c4b64f34a29415a69fb768f8051495547315/dom/indexedDB/test/test_filehandle_workers.html#80-91
So, it is not possible to acquire a WorkerHolder from IDB in this case.

Furthermore, I could hit this race simply with the following 2 lines in rr chaos mode:
  worker = new Worker(url);
  worker.terminate();
(In reply to Bevis Tseng[:bevistseng][:btseng] from comment #4)
> Since there is no active WorkerHolder the WorkerLoop and worker.terminate()
> was called will be returned immediately and a *Close* request to the
Sorry for typo:
Since there is no active WorkerHolder and worker.terminate() was called, 
the WorkerLoop will be returned immediately and a *Close* request to the
PBackground Actor Parent that causes this potential race was sent.
Thanks for explaining.  That makes more sense to me.  We probably need a WorkerHolder until the channel completes opening.

What does the crash stack for this look like?  Are we seeing any of them in the wild?
Flags: needinfo?(btseng)
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #6)
> What does the crash stack for this look like?
m.. it's not a crash but a endless waiting in *IPDL BackgroundThread* in parent process as mentioned in bug 1300927 comment 11 (The call stack was dump there.)
> Are we seeing any of them in the wild?
Sorry, I don't have further information about this. :|
I discovered this problem because of fixing bug 1300927.
Flags: needinfo?(btseng)
Andrea, do you have any bandwidth to look at this?  Its a race that can lead to hanging the parent PBackground thread.  Kind of concerning since I don't think we have any insight into how much it happens in the field, etc.
Flags: needinfo?(amarchesini)

Updated

11 months ago
Blocks: 1371141

Comment 9

11 months ago
Reading through this again it seems like we have two options:

1. Comment 1 suggests maybe we could fix the MessageLink loop to handle an early transition to the closing state.
2. We could try to prevent the Worker thread and its PBackground actor from closing until the MessageLink is fully open.

Bill, what do you think?  Obviously I would prefer option (1), but maybe we could try the worker-specific thing as well.
Flags: needinfo?(wmccloskey)
Created attachment 8888919 [details] [diff] [review]
patch

I think this fixes the problem.
Assignee: nobody → wmccloskey
Attachment #8888919 - Flags: review?(btseng)
Flags: needinfo?(wmccloskey)
(Reporter)

Updated

11 months ago
Attachment #8888919 - Flags: review?(btseng) → review+

Comment 11

11 months ago
Bill, is this ready to land?
Flags: needinfo?(amarchesini) → needinfo?(wmccloskey)

Comment 12

11 months ago
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db7e15eb0f1c
Fix race if IPDL actors are created and then quickly destroyed (r=bevis)
Flags: needinfo?(wmccloskey)

Comment 13

11 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/db7e15eb0f1c
Status: NEW → RESOLVED
Last Resolved: 11 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56

Updated

11 months ago
Duplicate of this bug: 1371141
You need to log in before you can comment on or make changes to this bug.