Closed Bug 1816633 Opened 2 years ago Closed 2 years ago

Daily hangs frequently from nsImapProtocol::LoadImapUrlInternal

Categories

(Thunderbird :: General, defect, P1)

Thunderbird 111
Unspecified
All

Tracking

(thunderbird_esr102 unaffected, thunderbird110+ wontfix, thunderbird111 fixed)

RESOLVED FIXED
112 Branch

People

(Reporter: emilio, Assigned: mkmelin)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: hang, regression, Whiteboard: [regression: TB111][fixed in bug 1801067])

Attachments

(1 file)

Daily has hung a couple of times this week. I caught it with GDB and it seems like a bug in the IMAP code when it is potentially called reentrantly?

Main thread stack here (sorry, my window manager killed TB so I couldn't investigate much more):

#1  0x00007fb73a09df90 in pthread_cond_wait () from /usr/lib/libc.so.6
#2  0x000055ffc7efa17f in mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) ()
#3  0x00007fb72edc68f1 in mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) () from /home/emilio/thunderbird/libxul.so
#4  0x00007fb733190219 in nsImapProtocol::LoadImapUrlInternal() () from /home/emilio/thunderbird/libxul.so
#5  0x00007fb73318f655 in nsImapProtocol::SetupWithUrl(nsIURI*, nsISupports*) () from /home/emilio/thunderbird/libxul.so
#6  0x00007fb73319926a in nsImapProtocol::LoadImapUrl(nsIURI*, nsISupports*) () from /home/emilio/thunderbird/libxul.so
#7  0x00007fb733156d63 in nsImapIncomingServer::LoadNextQueuedUrl(nsIImapProtocol*, bool*) () from /home/emilio/thunderbird/libxul.so
#8  0x00007fb733156249 in nsImapIncomingServer::GetImapConnectionAndLoadUrl(nsIImapUrl*, nsISupports*) () from /home/emilio/thunderbird/libxul.so
#9  0x00007fb7331b5794 in nsImapService::GetImapConnectionAndLoadUrl(nsIImapUrl*, nsISupports*, nsIURI**) () from /home/emilio/thunderbird/libxul.so
#10 0x00007fb7331b9b8e in nsImapService::GetMessageFromUrl(nsIImapUrl*, int, nsIMsgFolder*, nsIImapMessageSink*, nsIMsgWindow*, nsISupports*, bool, nsIURI**) () from /home/emilio/thunderbird/libxul.so
#11 0x00007fb7331b9630 in nsImapService::FetchMessage(nsIImapUrl*, int, nsIMsgFolder*, nsIImapMessageSink*, nsIMsgWindow*, nsISupports*, nsTSubstring<char> const&, bool, nsIURI**) () from /home/emilio/thunderbird/libxul.so
#12 0x00007fb7331c1c12 in nsImapService::DownloadMessagesForOffline(nsTSubstring<char> const&, nsIMsgFolder*, nsIUrlListener*, nsIMsgWindow*) () from /home/emilio/thunderbird/libxul.so
#13 0x00007fb73314f361 in nsAutoSyncState::DownloadMessagesForOffline(nsTArray<RefPtr<nsIMsgDBHdr> > const&) () from /home/emilio/thunderbird/libxul.so
#14 0x00007fb73314a09c in nsAutoSyncManager::DownloadMessagesForOffline(nsIAutoSyncState*, unsigned int) () from /home/emilio/thunderbird/libxul.so
#15 0x00007fb73314a38c in nsAutoSyncManager::HandleDownloadErrorFor(nsIAutoSyncState*, nsresult) () from /home/emilio/thunderbird/libxul.so
#16 0x00007fb73314bed1 in nsAutoSyncManager::OnDownloadCompleted(nsIAutoSyncState*, nsresult) () from /home/emilio/thunderbird/libxul.so
#17 0x00007fb73314e8b2 in nsAutoSyncState::OnStopRunningUrl(nsIURI*, nsresult) () from /home/emilio/thunderbird/libxul.so
#18 0x00007fb73309ce0f in nsMsgMailNewsUrl::SetUrlState(bool, nsresult) () from /home/emilio/thunderbird/libxul.so
#19 0x00007fb73317ce2d in nsImapMailFolder::SetUrlState(nsIImapProtocol*, nsIMsgMailNewsUrl*, bool, bool, nsresult) () from /home/emilio/thunderbird/libxul.so
#20 0x00007fb7331d2463 in (anonymous namespace)::SyncRunnable5<nsIImapMailFolderSink, nsIImapProtocol*, nsIMsgMailNewsUrl*, bool, bool, nsresult>::Run() () from /home/emilio/thunderbird/libxul.so
#21 0x00007fb72ed3d095 in mozilla::RunnableTask::Run() () from /home/emilio/thunderbird/libxul.so
#22 0x00007fb72ed38de3 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) () from /home/emilio/thunderbird/libxul.so
#23 0x00007fb72ed38049 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) () from /home/emilio/thunderbird/libxul.so
#24 0x00007fb72ed38298 in mozilla::TaskController::ProcessPendingMTTask(bool) () from /home/emilio/thunderbird/libxul.so
#25 0x00007fb72ed3e9f2 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_2>::Run() () from /home/emilio/thunderbird/libxul.so
#26 0x00007fb72ed4b997 in nsThread::ProcessNextEvent(bool, bool*) () from /home/emilio/thunderbird/libxul.so
#27 0x00007fb72ed4f64c in NS_ProcessNextEvent(nsIThread*, bool) () from /home/emilio/thunderbird/libxul.so
Regressed by: 1801067

Okay, caught this again, and killed the process with -11 to get a crash report: bp-110edd22-232f-4cde-9fd4-1e6990230214.

So the socket thread is in nsSocketTransport::ResolveHost waiting sync for the main thread:

 3 	libxul.so 	mozilla::detail::BaseMonitorAutoLock<mozilla::Monitor>::Wait() 	xpcom/threads/Monitor.h:138 	inlined
3 	libxul.so 	mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) 	xpcom/threads/SyncRunnable.h:72 	cfi
4 	libxul.so 	mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, already_AddRefed<nsIRunnable>, bool) 	xpcom/threads/SyncRunnable.h:117 	inlined
4 	libxul.so 	mozilla::net::nsSocketTransport::ResolveHost()

And the main thread is waiting sync on the socket thread:

3 	libxul.so 	mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) 	xpcom/threads/SyncRunnable.h:72 	cfi
4 	libxul.so 	mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, already_AddRefed<nsIRunnable>, bool) 	xpcom/threads/SyncRunnable.h:117 	inlined
4 	libxul.so 	nsImapProtocol::LoadImapUrlInternal() 	mailnews/imap/src/nsImapProtocol.cpp:2417 	cfi
5 	libxul.so 	nsImapProtocol::SetupWithUrl(nsIURI*, nsISupports*) 	mailnews/imap/src/nsImapProtocol.cpp:962 	cfi
6 	libxul.so 	nsImapProtocol::LoadImapUrl(nsIURI*, nsISupports*)

Thus, trivial deadlock :/

Bug 1809755 introduced a very similar issue (sync dispatch from main to socket thread). Is there any reason that really needs to be sync?
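
For illustration only, here is a minimal Python sketch of the cycle in the two stacks above. It is not the Thunderbird code. `EventLoopThread`, `dispatch_sync`, `cross_sync_dispatch` and `cross_async_dispatch` are hypothetical names, and the timeout is an assumption added so the demo terminates; the waits in the stacks above have no timeout, so both threads would block forever.

```python
import queue
import threading


class EventLoopThread:
    """Runs queued tasks one at a time; a hypothetical stand-in for a thread's event queue."""

    def __init__(self, name):
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run, name=name, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is None:  # shutdown sentinel
                return
            task()

    def dispatch(self, fn):
        """Queue fn and return immediately (asynchronous dispatch)."""
        self.tasks.put(fn)

    def dispatch_sync(self, fn, timeout):
        """Queue fn and block until it has run; returns False if the wait timed out."""
        done = threading.Event()

        def run_and_signal():
            fn()
            done.set()

        self.tasks.put(run_and_signal)
        return done.wait(timeout)

    def shutdown(self):
        self.tasks.put(None)
        self.thread.join()


def cross_sync_dispatch(timeout=0.5):
    """Each thread sync-dispatches to the other while busy in its own task."""
    main, sock = EventLoopThread("main"), EventLoopThread("socket")
    results = {}
    both_busy = threading.Barrier(2)      # both tasks are running before either dispatches
    both_finished = threading.Barrier(2)  # neither loop resumes until both waits have ended

    def main_task():
        both_busy.wait()
        results["main"] = sock.dispatch_sync(lambda: None, timeout)
        both_finished.wait()

    def socket_task():
        both_busy.wait()
        results["socket"] = main.dispatch_sync(lambda: None, timeout)
        both_finished.wait()

    main.dispatch(main_task)
    sock.dispatch(socket_task)
    main.shutdown()
    sock.shutdown()
    return results


def cross_async_dispatch(timeout=0.5):
    """Same cross-thread hand-off, but each task returns without waiting."""
    main, sock = EventLoopThread("main"), EventLoopThread("socket")
    main_done, sock_done = threading.Event(), threading.Event()
    main.dispatch(lambda: sock.dispatch(main_done.set))
    sock.dispatch(lambda: main.dispatch(sock_done.set))
    results = {"main": main_done.wait(timeout), "socket": sock_done.wait(timeout)}
    main.shutdown()
    sock.shutdown()
    return results
```

In this sketch, `cross_sync_dispatch()` reports that neither synchronous wait completed within the timeout, because each thread is stuck inside its own task and never gets back to its queue. `cross_async_dispatch()` completes on both sides because no task blocks on the other thread.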

Flags: needinfo?(kaie)
See Also: → 1809755
Severity: -- → S1
Flags: needinfo?(kaie)
Priority: -- → P1

Is it just random, or is it thought that a specific situation triggers it?

Patch has been on daily since Feb 1, I've not had trouble using daily, and reports are rare, so I doubt this would block release of 111 beta.

Keywords: hang

It's pretty random, but I hit it a few times a day.

Note that I haven't been using daily so much because the message list had a variety of bugs, otherwise I'm pretty sure I would've hit this more often.

Severity: S1 → S2
OS: Unspecified → All
Priority: P1 → --
Whiteboard: [regression: TB111]
Version: unspecified → Thunderbird 111

I've also had hangs daily, but hadn't been able to pinpoint it. Actually, 3 today already...
I would consider it a blocker for beta.

Severity: S2 → S1
Priority: -- → P1

Until we have a solution, we may have to back out the landings from bug 1801067 comment 49 and 50.

(In reply to Magnus Melin [:mkmelin] from comment #7)

Until we have a solution, we may have to back out the landings from bug 1801067 comment 49 and 50.

agreed

If we turn IMAP-JS on as planned then it would prevent this from having any effect on users. [on Daily]

It still needs to be fixed, though.

Right, but beta users (+ people on daily who turn off imap-js for testing) would be hit by this.
It could be possible to back out on beta only, but doing so is usually such a mess later on.

(In reply to Magnus Melin [:mkmelin] from comment #7)

Until we have a solution, we may have to back out the landings from bug 1801067 comment 49 and 50.

It's like you read my mind.

But I do not anticipate shipping beta this week no matter what you do, because build 1 of beta 1 hasn't happened yet, it still needs to go through QA, and I'm not going to release a new beta on a Friday. So you have a few days to fix it from its current state, if you want. But it would need to be a solid fix.

If the regression has been on trunk for 12 days, it makes me wonder why there are not more complaints. Are most of the daily users on holiday?

Will back out for daily.

Assignee: nobody → mkmelin+mozilla
Target Milestone: --- → 112 Branch

Backed out as requested.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Whiteboard: [regression: TB111] → [regression: TB111][fixed in bug 1801067]
Blocks: 1817356

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Bug 1809755 introduced a very similar issue (sync dispatch from main to socket thread). Is there any reason that really needs to be sync?

Moved this to bug 1817356.

(In reply to Geoff Lankow (:darktrojan) from comment #13)

Backed out as requested.

https://hg.mozilla.org/comm-central/rev/3d90da3f05c2

I'll export that and attach it to this bug to have something to hang uplift approval on.

[Approval Request Comment]
Beta backout needed to avoid hangs.

Attachment #9318321 - Flags: approval-comm-beta?

Comment on attachment 9318321 [details] [diff] [review]
bug1816633_backouts.patch

[Triage Comment]
Approved for beta

Attachment #9318321 - Flags: approval-comm-beta? → approval-comm-beta+

Note: That attached backout patch is wrong. Just graft 3d90da3f05c2.
