Potential IPC deadlock

RESOLVED FIXED in 1.3 C2/1.4 S2(17jan)

Status

()

Core
IPC
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: gerard, Assigned: gerard)

Tracking

Trunk
1.3 C2/1.4 S2(17jan)
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [systemsfe])

Attachments

(2 attachments, 2 obsolete attachments)

(Assignee)

Description

5 years ago
After looking at the locking code of mMonitor in ipc/glue/MessageChannel, there is a couple of call sites where we are making use of mListener while holding a lock on mMonitor.
(Assignee)

Comment 1

5 years ago
ipc/glue/MessageChannel.cpp:1305:    mListener->OnProcessingError(MsgDropped);
  ipc/glue/MessageChannel.cpp:233:   ReportConnectionError("MessageChannel");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:230
    --> possible issue

  ipc/glue/MessageChannel.cpp:257:   ReportConnectionError("MessageChannel");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:255
    --> possible issue

  ipc/glue/MessageChannel.cpp:494:   ReportConnectionError("MessageChannel::SendAndWait");
    ipc/glue/MessageChannel.cpp:479  if (!SendAndWait(aMsg, aReply))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:470
      --> possible issue

  ipc/glue/MessageChannel.cpp:514:   ReportConnectionError("MessageChannel::SendAndWait");
    ipc/glue/MessageChannel.cpp:479  if (!SendAndWait(aMsg, aReply))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:470
      --> possible issue at least for the first iteration of loop

  ipc/glue/MessageChannel.cpp:575:   ReportConnectionError("MessageChannel::Call");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:573
    --> possible issue

  ipc/glue/MessageChannel.cpp:602:   ReportConnectionError("MessageChannel::InterruptCall");
    -> called under lock, took at ipc/glue/MessageChannel.cpp:573
    --> possible issue at least for the first iteration of loop
    
  ipc/glue/MessageChannel.cpp:833:   ReportConnectionError("MessageChannel::OnMaybeDequeueOne");
    ipc/glue/MessageChannel.cpp:869: if (!DequeueOne(&recvd))
      -> called under lock, took at ipc/glue/MessageChannel.cpp:868
      --> possible issue
(Assignee)

Comment 2

5 years ago
On all those call sites, I see only one set:
  ipc/glue/MessageChannel.cpp:233:   ReportConnectionError("MessageChannel");
  ipc/glue/MessageChannel.cpp:257:   ReportConnectionError("MessageChannel");
  ipc/glue/MessageChannel.cpp:575:   ReportConnectionError("MessageChannel::Call");
  ipc/glue/MessageChannel.cpp:602:   ReportConnectionError("MessageChannel::InterruptCall");
  ipc/glue/MessageChannel.cpp:833:   ReportConnectionError("MessageChannel::OnMaybeDequeueOne");
  ipc/glue/MessageChannel.cpp:494:   ReportConnectionError("MessageChannel::SendAndWait");
  ipc/glue/MessageChannel.cpp:514:   ReportConnectionError("MessageChannel::SendAndWait");

    -> This set of call sites must perform a MonitorAutoLock() before callong ReportConnectionError() because they are referring to the Connected() method, which makes use of mMonitor.

In other words, every ReportConnectionError() call MUST NOT have a lock on mMonitor(), but all those have one because of Connected() being called just before.
(Assignee)

Comment 3

5 years ago
In fast, those call sides are also concerned:

  ipc/glue/MessageChannel.cpp:664:   ReportConnectionError("MessageChannel::DispatchMessage");
  ipc/glue/MessageChannel.cpp:722:   ReportConnectionError("MessageChannel::DispatchInterruptMessage");
  ipc/glue/MessageChannel.cpp:775:   ReportConnectionError("MessageChannel::DispatchUrgentMessage");
  ipc/glue/MessageChannel.cpp:812:   ReportConnectionError("MessageChannel::DispatchRPCMessage");

The structure for those is:
     {
         // In order to send the parent RPC messages and guarantee it will
         // wake up, we must re-use its transaction.
         AutoEnterRPCTransaction transaction(this, recvd);
 
         MonitorAutoUnlock unlock(*mMonitor);
         DispatchUrgentMessage(*recvd);
     }
     if (!Connected()) {
         ReportConnectionError("MessageChannel::DispatchUrgentMessage");
         return false;
     }

So we unlock mMonitor in its scope, meaning we re-lock it when we call Connected(). So we still have the lock on ReportConnectionError call.
(Assignee)

Comment 4

5 years ago
The MaybeHandleError() method does the same kind of mListener call, but it is always called under an explicit MonitorAutoUnlock.
(Assignee)

Comment 5

5 years ago
I've tested my patch against try, everything is green: https://tbpl.mozilla.org/?tree=Try&rev=4c2edea8fb28
(Assignee)

Comment 6

5 years ago
Created attachment 8337230 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch
Attachment #8337230 - Flags: review?(bent.mozilla)
(Assignee)

Comment 7

5 years ago
Created attachment 8337760 [details]
locks.png

Please find attached an explanatory graph of the locking.
(Assignee)

Comment 8

5 years ago
Created attachment 8337763 [details]
locks.png
Attachment #8337760 - Attachment is obsolete: true
Whiteboard: [systemsfe]
Target Milestone: --- → 1.3 Sprint 6 - 12/6
Comment on attachment 8337230 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch

Review of attachment 8337230 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessageChannel.cpp
@@ +233,1 @@
>          ReportConnectionError("MessageChannel");

All of this looks right to me but I wonder if we can't make it cleaner.

Can we add |mMonitor->AssertCurrentThreadOwns()| to the beginning of ReportConnectionError, then add the |MonitorAutoUnlock unlock(*mMonitor);| there?
(Assignee)

Comment 10

5 years ago
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #9)
> Comment on attachment 8337230 [details] [diff] [review]
> 0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch
> 
> Review of attachment 8337230 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/MessageChannel.cpp
> @@ +233,1 @@
> >          ReportConnectionError("MessageChannel");
> 
> All of this looks right to me but I wonder if we can't make it cleaner.
> 
> Can we add |mMonitor->AssertCurrentThreadOwns()| to the beginning of
> ReportConnectionError, then add the |MonitorAutoUnlock unlock(*mMonitor);|
> there?

I was not sure it would be the correct way to deal with it, since we perform lock/unlocking before calling functions at all the other call sites. But if you prefer it this way, and since I see no callpath where it would not be valid, I'll update the patch :)
(Assignee)

Comment 11

5 years ago
Created attachment 8345743 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch

Please find attached an updated version of the patch that addresses your comments.

The Try instance is available at https://tbpl.mozilla.org/?tree=Try&rev=3592891d1b2b
Attachment #8337230 - Attachment is obsolete: true
Attachment #8337230 - Flags: review?(bent.mozilla)
Attachment #8345743 - Flags: review?(bent.mozilla)
Comment on attachment 8345743 [details] [diff] [review]
0001-Bug-942118-Ensure-that-we-call-MessageChannel-Report.patch

Review of attachment 8345743 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!
Attachment #8345743 - Flags: review?(bent.mozilla) → review+
(Assignee)

Comment 13

4 years ago
Looks like this one has been forgotten :)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ee221e1e179f
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: 1.3 Sprint 6 - 12/6 → 1.3 C2/1.4 S2(17jan)
You need to log in before you can comment on or make changes to this bug.