nsHttpConnectionMgr::PostEvent can execute long-lasting nsSocketTransportService::OnDispatchedEvent under a lock
Categories
(Core :: Networking: HTTP, defect, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox108 | --- | fixed |
People
(Reporter: jstutte, Assigned: smayya)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged][necko-priority-queue])
Attachments
(1 file)
From 3cf227fc-23e2-4223-8143-ecae10221011 it seems that nsHttpConnectionMgr::PostEvent does ReentrantMonitorAutoEnter mon(mReentrantMonitor); which will hold a lock until the end of the function.
Inside nsIEventTarget::Dispatch we can end up calling nsSocketTransportService::OnDispatchedEvent synchronously while holding that lock. If those notifications take a while, we can experience a deadlock on other threads (here the main thread).
Updated•3 years ago
|
| Assignee | ||
Updated•3 years ago
|
| Assignee | ||
Comment 1•3 years ago
•
|
||
Based on the crash report and description, we could see that during the shutdown, the PollableEvent::Signal() call is blocking on an I/O (file write). This causes nsSocketTransportService::OnDispatchedEvent to block and hence causes the deadlock in the main thread.
As a solution to this problem, I have drawn some ideas from BlockingIOWatcher.
I would propose to implement a watchdog timer to cancel (::CancelSynchronousIo) any pending I/O calls during shutdown inside OnDispatchedEvent
nsSocketTransportService::OnDispatchedEvent() {
....
MutexAutoLock lock(mLock);
if (mPollableEvent) {
if (gIOService->IsNetTearingDown()) {
StartBlockingIoWatchDog()
}
mPollableEvent->Signal();
CancelBlockingIoWatchDog();
}
return NS_OK;
}
StartBlockingIoWatchDog() {
// start a one-shot timer of 1 sec and calls ::CancelSynchronousIo
}
CancelBlockingIoWatchDog() {
// cancels the timer
}
StartBlockingIoWatchDog() will implemented on similar line as StartPollWatchdog.
@randell, @valentin: please let me know your thoughts on this proposal. I would like to know your thoughts on this one before I proceed.
Comment 2•3 years ago
|
||
It seems to me that the problem is that both Main thread and socket thread call nsHttpConnectionMgr::PostEvent
mReentrantMonitor is only supposed to protect mSocketThreadTarget, so I think you can just do:
nsCOMPtr<nsIEventTarget> target;
{
ReentrantMonitorAutoEnter mon(mReentrantMonitor);
target = mSocketThreadTarget;
}
if (!target) {
NS_WARNING("cannot post event if not initialized");
return NS_ERROR_NOT_INITIALIZED;
}
nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam);
return target->Dispatch(event, NS_DISPATCH_NORMAL);
| Assignee | ||
Comment 3•3 years ago
•
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #2)
It seems to me that the problem is that both Main thread and socket thread call
nsHttpConnectionMgr::PostEvent
mReentrantMonitoris only supposed to protectmSocketThreadTarget, so I think you can just do:nsCOMPtr<nsIEventTarget> target; { ReentrantMonitorAutoEnter mon(mReentrantMonitor); target = mSocketThreadTarget; } if (!target) { NS_WARNING("cannot post event if not initialized"); return NS_ERROR_NOT_INITIALIZED; } nsCOMPtr<nsIRunnable> event = new ConnEvent(this, handler, iparam, vparam); return target->Dispatch(event, NS_DISPATCH_NORMAL);
Thank you for the quick feedback valentin. This will definitely solve our deadlock problem.
However, we still end up with a blocked call (due to blocking Signal() call) in the main thread?
Or is such behavior acceptable during shutdown?
Comment 4•3 years ago
|
||
I guess that's also possible, but we want to solve the mutex deadlock here.
I haven't looked too deeply into how PollableEvent works (I'm still not sure if ::CancelSynchronousIo would abort it), there's a lot of tech debt in there trying to fix silly shutdown hangs - but I see the event has a timeout. Not sure how well that works in practice, but it's definitely a good follow-up.
| Assignee | ||
Comment 5•3 years ago
|
||
Comment 7•3 years ago
|
||
| bugherder | ||
Description
•