Closed Bug 1794785 Opened 3 years ago Closed 3 years ago

nsHttpConnectionMgr::PostEvent can execute long-lasting nsSocketTransportService::OnDispatchedEvent under a lock

Categories

(Core :: Networking: HTTP, defect, P2)

defect

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox108 --- fixed

People

(Reporter: jstutte, Assigned: smayya)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(1 file)

From 3cf227fc-23e2-4223-8143-ecae10221011 it seems that nsHttpConnectionMgr::PostEvent does ReentrantMonitorAutoEnter mon(mReentrantMonitor); which will hold a lock until the end of the function.

Inside nsIEventTarget::Dispatch we can end up calling nsSocketTransportService::OnDispatchedEvent synchronously while holding that lock. If those notifications take a while, we can experience a deadlock on other threads (here the main thread).

Blocks: 1505660
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-review]
Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-priority-queue]
Assignee: nobody → smayya

Based on the crash report and description, we could see that during the shutdown, the PollableEvent::Signal() call is blocking on an I/O (file write). This causes nsSocketTransportService::OnDispatchedEvent to block and hence causes the deadlock in the main thread.

As a solution to this problem, I have drawn some ideas from BlockingIOWatcher.
I would propose to implement a watchdog timer to cancel (::CancelSynchronousIo) any pending I/O calls during shutdown inside OnDispatchedEvent


nsSocketTransportService::OnDispatchedEvent() {
  ....
  
  MutexAutoLock lock(mLock);
  if (mPollableEvent) {
       if (gIOService->IsNetTearingDown()) {
        StartBlockingIoWatchDog()
       }
    mPollableEvent->Signal();
    CancelBlockingIoWatchDog();
  }
  return NS_OK;
}

StartBlockingIoWatchDog() {
// start a one-shot timer of 1 sec and calls ::CancelSynchronousIo
}

CancelBlockingIoWatchDog() {
// cancels the timer
}


StartBlockingIoWatchDog() will implemented on similar line as StartPollWatchdog.
@randell, @valentin: please let me know your thoughts on this proposal. I would like to know your thoughts on this one before I proceed.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(rjesup)

It seems to me that the problem is that both Main thread and socket thread call nsHttpConnectionMgr::PostEvent
mReentrantMonitor is only supposed to protect mSocketThreadTarget, so I think you can just do:

  nsCOMPtr<nsIEventTarget> target;
  {
    ReentrantMonitorAutoEnter mon(mReentrantMonitor);
    target = mSocketThreadTarget;
  }

  if (!target) {
    NS_WARNING("cannot post event if not initialized");
    return NS_ERROR_NOT_INITIALIZED;
  }
  nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam);
  return target->Dispatch(event, NS_DISPATCH_NORMAL);
 
Flags: needinfo?(valentin.gosu)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #2)

It seems to me that the problem is that both Main thread and socket thread call nsHttpConnectionMgr::PostEvent
mReentrantMonitor is only supposed to protect mSocketThreadTarget, so I think you can just do:

  nsCOMPtr<nsIEventTarget> target;
  {
    ReentrantMonitorAutoEnter mon(mReentrantMonitor);
    target = mSocketThreadTarget;
  }

  if (!target) {
    NS_WARNING("cannot post event if not initialized");
    return NS_ERROR_NOT_INITIALIZED;
  }
  nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam);
  return target->Dispatch(event, NS_DISPATCH_NORMAL);
 

Thank you for the quick feedback valentin. This will definitely solve our deadlock problem.
However, we still end up with a blocked call (due to blocking Signal() call) in the main thread?
Or is such behavior acceptable during shutdown?

Flags: needinfo?(rjesup) → needinfo?(valentin.gosu)

I guess that's also possible, but we want to solve the mutex deadlock here.
I haven't looked too deeply into how PollableEvent works (I'm still not sure if ::CancelSynchronousIo would abort it), there's a lot of tech debt in there trying to fix silly shutdown hangs - but I see the event has a timeout. Not sure how well that works in practice, but it's definitely a good follow-up.

Flags: needinfo?(valentin.gosu)
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/315ae9ed8f16 ensure event dispatches in nsHttpConnectionMgr is not executed under a lock. r=necko-reviewers,valentin
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: