Closed Bug 942118 Opened 11 years ago Closed 10 years ago

Potential IPC deadlock

Categories

(Core :: IPC, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
1.3 C2/1.4 S2(17jan)

People

(Reporter: gerard-majax, Assigned: gerard-majax)

Details

(Whiteboard: [systemsfe])

Attachments

(2 files, 2 obsolete files)

After looking at the locking code of mMonitor in ipc/glue/MessageChannel, there is a couple of call sites where we are making use of mListener while holding a lock on mMonitor.
ipc/glue/MessageChannel.cpp:1305:    mListener->OnProcessingError(MsgDropped);
  ipc/glue/MessageChannel.cpp:233:   ReportConnectionError("MessageChannel");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:230
    --> possible issue

  ipc/glue/MessageChannel.cpp:257:   ReportConnectionError("MessageChannel");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:255
    --> possible issue

  ipc/glue/MessageChannel.cpp:494:   ReportConnectionError("MessageChannel::SendAndWait");
    ipc/glue/MessageChannel.cpp:479  if (!SendAndWait(aMsg, aReply))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:470
      --> possible issue

  ipc/glue/MessageChannel.cpp:514:   ReportConnectionError("MessageChannel::SendAndWait");
    ipc/glue/MessageChannel.cpp:479  if (!SendAndWait(aMsg, aReply))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:470
      --> possible issue at least for the first iteration of loop

  ipc/glue/MessageChannel.cpp:575:   ReportConnectionError("MessageChannel::Call");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:573
    --> possible issue

  ipc/glue/MessageChannel.cpp:602:   ReportConnectionError("MessageChannel::InterruptCall");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:573
    --> possible issue at least for the first iteration of loop
    
  ipc/glue/MessageChannel.cpp:833:   ReportConnectionError("MessageChannel::OnMaybeDequeueOne");
    ipc/glue/MessageChannel.cpp:869: if (!DequeueOne(&recvd))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:868
      --> possible issue
On all those call sites, I see only one set:
  ipc/glue/MessageChannel.cpp:233:   ReportConnectionError("MessageChannel");
  ipc/glue/MessageChannel.cpp:257:   ReportConnectionError("MessageChannel");
  ipc/glue/MessageChannel.cpp:575:   ReportConnectionError("MessageChannel::Call");
  ipc/glue/MessageChannel.cpp:602:   ReportConnectionError("MessageChannel::InterruptCall");
  ipc/glue/MessageChannel.cpp:833:   ReportConnectionError("MessageChannel::OnMaybeDequeueOne");
  ipc/glue/MessageChannel.cpp:494:   ReportConnectionError("MessageChannel::SendAndWait");
  ipc/glue/MessageChannel.cpp:514:   ReportConnectionError("MessageChannel::SendAndWait");

    -> This set of call sites must perform a MonitorAutoLock() before callong ReportConnectionError() because they are referring to the Connected() method, which makes use of mMonitor.

In other words, every ReportConnectionError() call MUST NOT have a lock on mMonitor(), but all those have one because of Connected() being called just before.
In fast, those call sides are also concerned:

  ipc/glue/MessageChannel.cpp:664:   ReportConnectionError("MessageChannel::DispatchMessage");
  ipc/glue/MessageChannel.cpp:722:   ReportConnectionError("MessageChannel::DispatchInterruptMessage");
  ipc/glue/MessageChannel.cpp:775:   ReportConnectionError("MessageChannel::DispatchUrgentMessage");
  ipc/glue/MessageChannel.cpp:812:   ReportConnectionError("MessageChannel::DispatchRPCMessage");

The structure for those is:
     {
         // In order to send the parent RPC messages and guarantee it will
         // wake up, we must re-use its transaction.
         AutoEnterRPCTransaction transaction(this, recvd);
 
         MonitorAutoUnlock unlock(*mMonitor);
         DispatchUrgentMessage(*recvd);
     }
     if (!Connected()) {
         ReportConnectionError("MessageChannel::DispatchUrgentMessage");
         return false;
     }

So we unlock mMonitor in its scope, meaning we re-lock it when we call Connected(). So we still have the lock on ReportConnectionError call.
The MaybeHandleError() method does the same kind of mListener call, but it is always called under an explicit MonitorAutoUnlock.
I've tested my patch against try, everything is green: https://tbpl.mozilla.org/?tree=Try&rev=4c2edea8fb28
Attachment #8337230 - Flags: review?(bent.mozilla)
Attached image locks.png (obsolete) —
Please find attached an explanatory graph of the locking.
Attached image locks.png
Attachment #8337760 - Attachment is obsolete: true
Whiteboard: [systemsfe]
Target Milestone: --- → 1.3 Sprint 6 - 12/6
Comment on attachment 8337230 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch

Review of attachment 8337230 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessageChannel.cpp
@@ +233,1 @@
>          ReportConnectionError("MessageChannel");

All of this looks right to me but I wonder if we can't make it cleaner.

Can we add |mMonitor->AssertCurrentThreadOwns()| to the beginning of ReportConnectionError, then add the |MonitorAutoUnlock unlock(*mMonitor);| there?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #9)
> Comment on attachment 8337230 [details] [diff] [review]
> 0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch
> 
> Review of attachment 8337230 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/MessageChannel.cpp
> @@ +233,1 @@
> >          ReportConnectionError("MessageChannel");
> 
> All of this looks right to me but I wonder if we can't make it cleaner.
> 
> Can we add |mMonitor->AssertCurrentThreadOwns()| to the beginning of
> ReportConnectionError, then add the |MonitorAutoUnlock unlock(*mMonitor);|
> there?

I was not sure it would be the correct way to deal with it, since we perform lock/unlocking before calling functions at all the other call sites. But if you prefer it this way, and since I see no callpath where it would not be valid, I'll update the patch :)
Please find attached an updated version of the patch that addresses your comments.

The Try instance is available at https://tbpl.mozilla.org/?tree=Try&rev=3592891d1b2b
Attachment #8337230 - Attachment is obsolete: true
Attachment #8337230 - Flags: review?(bent.mozilla)
Attachment #8345743 - Flags: review?(bent.mozilla)
Comment on attachment 8345743 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch

Review of attachment 8345743 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!
Attachment #8345743 - Flags: review?(bent.mozilla) → review+
Looks like this one has been forgotten :)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ee221e1e179f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: 1.3 Sprint 6 - 12/6 → 1.3 C2/1.4 S2(17jan)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: