Closed Bug 1340941 Opened 7 years ago Closed 7 years ago

Potential Race in IPDL Background Thread if a new actor was terminated right after creation.

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: bevis, Assigned: billm)

References

Details

Attachments

(1 file)

File this bug to address the potential race found in bug 1300927 comment 21.
Update the title and the component category to IPC because this is more related to the problem in ipc implementation according to bug 1300927 comment 16:
1. It is possible to create and close a actor immediately.
2. However, this cause mChannelState set to *ChannelClosing* right after changing to *ChannelOpening* when the while loop in [1] is waiting for *ChannelOpening*.

[1] http://searchfox.org/mozilla-central/rev/cac6cb6a10afb8ebb2ecfbeeedaff7c66f57dd75/ipc/glue/MessageLink.cpp#125
Component: DOM: Workers → IPC
Summary: Potential Race in IPDL Background Thread if a new Worker was terminated right after creation. → Potential Race in IPDL Background Thread if a new actor was terminated right after creation.
I still don't understand this.  The worker thread cannot close while initializing the PBackground actor since it uses BackgroundChild::SynchronouslyCreateForCurrentThread():

https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2810

The thread can't close until the PBackground actor is created or failed to create.

The fact you are getting to this line:

http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2885

Means that the actor using PBackground (IDB in this case) is not holding a WorkerHolder.  This is a bug in IDB, not the worker run loop.  For example, IDB made the dubious decision to explicitly not use a WorkerHolder in some cases.  See bug 1274343 comment 2.
Just to clarify, if an actor on the Worker thread might receive a message it *must* hold a WorkerHolder.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #2)
> I still don't understand this.  The worker thread cannot close while
> initializing the PBackground actor since it uses
> BackgroundChild::SynchronouslyCreateForCurrentThread():
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.
> cpp#2810
> 
> The thread can't close until the PBackground actor is created or failed to
> create.
Yes, the PBackground Actor *Child* for the new worker is created and is returned from SynchronouslyCreateForCurrentThread while PBackground Actor *Parent* is waiting for the mChannelState changed to *ChannelOpening*.
Since there is no active WorkerHolder the WorkerLoop and worker.terminate() was called will be returned immediately and a *Close* request to the PBackground Actor Parent will be sent that causes this potential race.
> The fact you are getting to this line:
> 
> http://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.
> cpp#2885
> 
> Means that the actor using PBackground (IDB in this case) is not holding a
> WorkerHolder.  This is a bug in IDB, not the worker run loop.  For example,
> IDB made the dubious decision to explicitly not use a WorkerHolder in some
> cases.  See bug 1274343 comment 2.
No, it't not a bug in IDB because actually there is no IDB operation on this worker:
http://searchfox.org/mozilla-central/rev/ea31c4b64f34a29415a69fb768f8051495547315/dom/indexedDB/test/test_filehandle_workers.html#80-91
So, it is not possible to acquire a WorkerHolder from IDB in this case.

Furthermore, I could hit this race simply with the following 2 lines in rr chaos mode:
  worker = new Worker(url);
  worker.terminate();
(In reply to Bevis Tseng[:bevistseng][:btseng] from comment #4)
> Since there is no active WorkerHolder the WorkerLoop and worker.terminate()
> was called will be returned immediately and a *Close* request to the
Sorry for typo:
Since there is no active WorkerHolder and worker.terminate() was called, 
the WorkerLoop will be returned immediately and a *Close* request to the
PBackground Actor Parent that causes this potential race was sent.
Thanks for explaining.  That makes more sense to me.  We probably need a WorkerHolder until the channel completes opening.

What does the crash stack for this look like?  Are we seeing any of them in the wild?
Flags: needinfo?(btseng)
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #6)
> What does the crash stack for this look like?
m.. it's not a crash but a endless waiting in *IPDL BackgroundThread* in parent process as mentioned in bug 1300927 comment 11 (The call stack was dump there.)
> Are we seeing any of them in the wild?
Sorry, I don't have further information about this. :|
I discovered this problem because of fixing bug 1300927.
Flags: needinfo?(btseng)
Andrea, do you have any bandwidth to look at this?  Its a race that can lead to hanging the parent PBackground thread.  Kind of concerning since I don't think we have any insight into how much it happens in the field, etc.
Flags: needinfo?(amarchesini)
Reading through this again it seems like we have two options:

1. Comment 1 suggests maybe we could fix the MessageLink loop to handle an early transition to the closing state.
2. We could try to prevent the Worker thread and its PBackground actor from closing until the MessageLink is fully open.

Bill, what do you think?  Obviously I would prefer option (1), but maybe we could try the worker-specific thing as well.
Flags: needinfo?(wmccloskey)
Attached patch patchSplinter Review
I think this fixes the problem.
Assignee: nobody → wmccloskey
Flags: needinfo?(wmccloskey)
Attachment #8888919 - Flags: review?(btseng)
Attachment #8888919 - Flags: review?(btseng) → review+
Bill, is this ready to land?
Flags: needinfo?(amarchesini) → needinfo?(wmccloskey)
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db7e15eb0f1c
Fix race if IPDL actors are created and then quickly destroyed (r=bevis)
Flags: needinfo?(wmccloskey)
https://hg.mozilla.org/mozilla-central/rev/db7e15eb0f1c
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.