Closed Bug 1794785 Opened 3 years ago Closed 3 years ago

nsHttpConnectionMgr::PostEvent can execute long-lasting nsSocketTransportService::OnDispatchedEvent under a lock

Tracking

()

Status:

RESOLVED FIXED

Milestone:

108 Branch

Tracking Flags:

Tracking

Status

firefox108

---

fixed

People

(Reporter: jstutte, Assigned: smayya)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(1 file)

Bug 1794785 - ensure event dispatches in nsHttpConnectionMgr is not executed under a lock. r=#necko 3 years ago Sunil Mayya 48 bytes, text/x-phabricator-request		Details \| Review

Jens Stutte [:jstutte]

Reporter

Description

•

3 years ago

From 3cf227fc-23e2-4223-8143-ecae10221011 it seems that nsHttpConnectionMgr::PostEvent does ReentrantMonitorAutoEnter mon(mReentrantMonitor); which will hold a lock until the end of the function.

Inside nsIEventTarget::Dispatch we can end up calling nsSocketTransportService::OnDispatchedEvent synchronously while holding that lock. If those notifications take a while, we can experience a deadlock on other threads (here the main thread).

Jens Stutte [:jstutte]

Reporter

Updated

•

3 years ago

Blocks: 1505660

edgul

Updated

•

3 years ago

Severity: -- → S3

Priority: -- → P2

Whiteboard: [necko-triaged][necko-priority-review]

Valentin Gosu [:valentin] (he/him)

Updated

•

3 years ago

Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged][necko-priority-queue]

Sunil Mayya

Assignee

Updated

•

3 years ago

Assignee: nobody → smayya

Sunil Mayya

Assignee

Comment 1

•

3 years ago

•

Edited

Based on the crash report and description, we could see that during the shutdown, the PollableEvent::Signal() call is blocking on an I/O (file write). This causes nsSocketTransportService::OnDispatchedEvent to block and hence causes the deadlock in the main thread.

As a solution to this problem, I have drawn some ideas from BlockingIOWatcher.
I would propose to implement a watchdog timer to cancel (::CancelSynchronousIo) any pending I/O calls during shutdown inside OnDispatchedEvent


nsSocketTransportService::OnDispatchedEvent() {
  ....
  
  MutexAutoLock lock(mLock);
  if (mPollableEvent) {
       if (gIOService->IsNetTearingDown()) {
        StartBlockingIoWatchDog()
       }
    mPollableEvent->Signal();
    CancelBlockingIoWatchDog();
  }
  return NS_OK;
}

StartBlockingIoWatchDog() {
// start a one-shot timer of 1 sec and calls ::CancelSynchronousIo
}

CancelBlockingIoWatchDog() {
// cancels the timer
}

StartBlockingIoWatchDog() will implemented on similar line as StartPollWatchdog.
@randell, @valentin: please let me know your thoughts on this proposal. I would like to know your thoughts on this one before I proceed.

Flags: needinfo?(valentin.gosu)

Flags: needinfo?(rjesup)

Valentin Gosu [:valentin] (he/him)

Comment 2

•

3 years ago

It seems to me that the problem is that both Main thread and socket thread call nsHttpConnectionMgr::PostEvent
mReentrantMonitor is only supposed to protect mSocketThreadTarget, so I think you can just do:

  nsCOMPtr<nsIEventTarget> target;
  {
    ReentrantMonitorAutoEnter mon(mReentrantMonitor);
    target = mSocketThreadTarget;
  }

  if (!target) {
    NS_WARNING("cannot post event if not initialized");
    return NS_ERROR_NOT_INITIALIZED;
  }
  nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam);
  return target->Dispatch(event, NS_DISPATCH_NORMAL);

Flags: needinfo?(valentin.gosu)

Sunil Mayya

Assignee

Comment 3

•

3 years ago

•

Edited

(In reply to Valentin Gosu [:valentin] (he/him) from comment #2)

It seems to me that the problem is that both Main thread and socket thread call nsHttpConnectionMgr::PostEvent
mReentrantMonitor is only supposed to protect mSocketThreadTarget, so I think you can just do:
  nsCOMPtr<nsIEventTarget> target;
  {
    ReentrantMonitorAutoEnter mon(mReentrantMonitor);
    target = mSocketThreadTarget;
  }

  if (!target) {
    NS_WARNING("cannot post event if not initialized");
    return NS_ERROR_NOT_INITIALIZED;
  }
  nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam);
  return target->Dispatch(event, NS_DISPATCH_NORMAL);
 

Thank you for the quick feedback valentin. This will definitely solve our deadlock problem.
However, we still end up with a blocked call (due to blocking Signal() call) in the main thread?
Or is such behavior acceptable during shutdown?

Flags: needinfo?(rjesup) → needinfo?(valentin.gosu)

Valentin Gosu [:valentin] (he/him)

Comment 4

•

3 years ago

I guess that's also possible, but we want to solve the mutex deadlock here.
I haven't looked too deeply into how PollableEvent works (I'm still not sure if ::CancelSynchronousIo would abort it), there's a lot of tech debt in there trying to fix silly shutdown hangs - but I see the event has a timeout. Not sure how well that works in practice, but it's definitely a good follow-up.

Flags: needinfo?(valentin.gosu)

Sunil Mayya

Assignee

Comment 5

•

3 years ago

Attached file Bug 1794785 - ensure event dispatches in nsHttpConnectionMgr is not executed under a lock. r=#necko — Details

Pulsebot

Comment 6

•

3 years ago

Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/315ae9ed8f16 ensure event dispatches in nsHttpConnectionMgr is not executed under a lock. r=necko-reviewers,valentin

Cristian Tuns

Comment 7

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/315ae9ed8f16

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox108: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 108 Branch

You need to log in before you can comment on or make changes to this bug.

Bugzilla

nsHttpConnectionMgr::PostEvent can execute long-lasting nsSocketTransportService::OnDispatchedEvent under a lock

Categories

(Core :: Networking: HTTP, defect, P2)

Tracking

()

People

(Reporter: jstutte, Assigned: smayya)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Attachment

General

Description

File Name

Content Type