Thunderbird Crash in [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] (password?)
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr102+ fixed, thunderbird104 fixed)
People
(Reporter: worcester12345, Assigned: KaiE)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: crash, dupeme, topcrash-thunderbird, Whiteboard: [TM:102.2.+])
Crash Data
Attachments
(1 file, 2 obsolete files)
48 bytes,
text/x-phabricator-request
|
wsmwk
:
approval-comm-beta+
wsmwk
:
approval-comm-esr102+
|
Details | Review |
This bug is for crash report bp-088ba0db-c9c5-456a-91f6-41ebf0200413.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableSRW
2 kernelbase.dll SleepConditionVariableSRW
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:149
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1108
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:881
8 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:284
9 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:649
Reporter | ||
Comment 1•5 years ago
|
||
Report ID
1f603768-0e2a-4839-ae27-cef399cbe0c2
68.7.0 (32-bit)
Comment 3•5 years ago
|
||
What were you doing at the time of your crashes?
Reporter | ||
Comment 4•5 years ago
|
||
I think I was in Firefox, and Thunderbird was just sitting there in the background. Maybe in Excel. I just noticed the "Mozilla Crash Reporter" in the taskbar and went over to it. Glad I didn't lose my browser, as I have lost data there before.
Comment 5•5 years ago
|
||
This could be related to bug 1524247.
During startup or while thunderbird was running, had you dismissed a password prompt without entering a password? Or entered an incorrect password?
Reporter | ||
Comment 6•5 years ago
|
||
I don't think so. I had already been in email, and was doing something else in Firefox. I noticed the Mozilla Crash Reporter had come up, and Firefox was still running. I think it (Thunderbird) was just sitting there idle in the background.
Comment 7•5 years ago
|
||
Thanks. But I'm not asking whether you had answered a password prompt in the seconds before crash. I mean literally, "During startup or while thunderbird was running" ... and then crashed hours or days later.
Reporter | ||
Comment 8•5 years ago
|
||
Oh, dismissed a password dialog ever when that happened? I don't recall. Usually, I log in and then do other stuff with the window minimized. For this account, I usually just put it in at first launch, then leave it alone.
Comment hidden (obsolete) |
Comment 11•4 years ago
|
||
Are you still crashing?
90% of crashes with this signature are Thunderbird - the minority are Firefox
Reporter | ||
Comment 12•4 years ago
|
||
Not as often, and if I did, I'd log it in here, and it would either remain as new or get "duplicated".
Reporter | ||
Comment 13•4 years ago
|
||
(In reply to Worcester12345 from comment #8)
Oh, dismissed a password dialog ever when that happened? I don't recall. Usually, I log in and then do other stuff with the window minimized. For this account, I usually just put it in at first launch, then leave it alone.
I could attempt to do this, if it helps you. Let me know.
Comment 14•4 years ago
|
||
(In reply to Worcester12345 from comment #13)
(In reply to Worcester12345 from comment #8)
Oh, dismissed a password dialog ever when that happened? I don't recall. Usually, I log in and then do other stuff with the window minimized. For this account, I usually just put it in at first launch, then leave it alone.
I could attempt to do this, if it helps you. Let me know.
sure
Comment 15•4 years ago
|
||
Perhaps a duplicate of bug 1524247 or one of the other shutdown crashes
Comment 16•3 years ago
•
|
||
Looking at one of the crashes, I arrived at nsThread::ShutdownInternal
where I read:
// XXXroc What if posting the event fails due to OOM?
mEvents->PutEvent(event.forget(), EventQueuePriority::Normal);
Inside PutEvent
, there seem to be more than just OOM conditions that might have us return false
here. Shouldn't we check for false
and in case return a nullptr
as shutdown context (which would us not make enter the SpinEventLoopUntil
that apparently waits forever for a flag on the shutdown context to be set) ?
It also seems that we sequentially shut down the threads, waiting for each thread to join before we shut down the next. We might want to decouple the sending of the shutdown event to all threads from the waiting for the threads to join.
Randell, WDYT ?
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Reporter | ||
Comment 22•3 years ago
|
||
By "something", I mean try to click on a folder, open an email, pretty much anything.
I went to "More troubleshooting info" and clicked submit to the right of cb7914eb-4a79-436d-87e0-f5ff0eec3f94, and that is where it turned to "failed". There is nothing further to do there, and no instructions on what to do to fix it. In the past, these have started working again sometimes on their own, with no rhyme or reason as to why or how.
Reporter | ||
Comment 23•3 years ago
|
||
It finally let me submit a report:
https://crash-stats.mozilla.org/report/index/6e57f4c2-c7ff-48cc-8b01-4809b0210928
Comment 24•3 years ago
|
||
(In reply to Worcester12345 from comment #23)
It finally let me submit a report:
https://crash-stats.mozilla.org/report/index/6e57f4c2-c7ff-48cc-8b01-4809b0210928
shutdownhang | nsAppShell::ProcessNextNativeEvent is primarily version 91. I believe this is a continuation of the original crash signature of this bug, which is not so prevalent in version 91. And so perhaps not related to bug 1671405.
Comment 25•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #16)
Looking at one of the crashes, I arrived at
nsThread::ShutdownInternal
where I read:// XXXroc What if posting the event fails due to OOM? mEvents->PutEvent(event.forget(), EventQueuePriority::Normal);
Inside
PutEvent
, there seem to be more than just OOM conditions that might have us returnfalse
here. Shouldn't we check forfalse
and in case return anullptr
as shutdown context (which would us not make enter theSpinEventLoopUntil
that apparently waits forever for a flag on the shutdown context to be set) ?
The false returns from PutEventInternal() are pretty limited - mEventsAreDoomed or no mQueue in the target sink. mEventsAreDoomed can happen if we try to send an event to a thread after it gets past nsThread.cpp:414, where we process events at shutdown time until the event queue is empty. At that point we set mEventsAreDoomed and start cleanup, but an attempt to send us a message could come in there I think.
However for this to happen, we'd have to have already started shutdown for the thread, and ShutdownInternal() doesn't let itself be called twice. If a thread exited on it's own, and Shutdown() was called on it while it was in the process of shutting down, perhaps this sequence could happen. Seems unlikely, but in theory possible for threads that commit suicide.
We could add a MOZ_RELEASE_ASSERT to flag this case, and see if it's being hit.
It also seems that we sequentially shut down the threads, waiting for each thread to join before we shut down the next. We might want to decouple the sending of the shutdown event to all threads from the waiting for the threads to join.
Not shutting them down sequentially should in theory let us shut down faster -- however, it's probably much more likely to expose latent race conditions. So a somewhat risky change, in the short term. Of course, there's no real order imposed currently, so ordering-races could in theory happen already (though in practice they may not), but simultaneous shutdown will definitely make more races possible.
Nika: thoughts?
Comment 26•3 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #25)
It also seems that we sequentially shut down the threads, waiting for each thread to join before we shut down the next. We might want to decouple the sending of the shutdown event to all threads from the waiting for the threads to join.
Not shutting them down sequentially should in theory let us shut down faster -- however, it's probably much more likely to expose latent race conditions. So a somewhat risky change, in the short term. Of course, there's no real order imposed currently, so ordering-races could in theory happen already (though in practice they may not), but simultaneous shutdown will definitely make more races possible.
IIUC in the long run we want to skip this stage anyway (if toolkit.shutdown.fastShutdownStage >= 3
). So it is unclear to me if optimizing performance here is still worth the effort. But we did a similar parallelization for quota clients shutdown and it moved the needle quite significantly. And having some extra opportunity to find (and eliminate) races could be interesting, too. Maybe we could hide it behind a pref and enable it in the beginning only on TSAN for a while?
Comment 27•3 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #25)
However for this to happen, we'd have to have already started shutdown for the thread, and ShutdownInternal() doesn't let itself be called twice. If a thread exited on it's own, and Shutdown() was called on it while it was in the process of shutting down, perhaps this sequence could happen. Seems unlikely, but in theory possible for threads that commit suicide.
We could add a MOZ_RELEASE_ASSERT to flag this case, and see if it's being hit.
I made a try push with a direct return (and warning) for that case to see what happens. There are also other crash reports that indicate this might happen also during profile-before-change
for individual thread shutdowns.
Comment 28•3 years ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #16)
Looking at one of the crashes, I arrived at
nsThread::ShutdownInternal
where I read:// XXXroc What if posting the event fails due to OOM? mEvents->PutEvent(event.forget(), EventQueuePriority::Normal); ```... It also seems that we [sequentially shut down the threads](https://searchfox.org/mozilla-central/rev/50c3cf7a3c931409b54efa009795b69c19383541/xpcom/threads/nsThreadManager.cpp#411-413), waiting for each thread to join before we shut down the next. We might want to decouple the sending of the shutdown event to all threads from the waiting for the threads to join. Randell, WDYT ?
Magnus, any concerns about doing this?
(In reply to Jens Stutte [:jstutte] from comment #27)
...
I made a try push with a direct return (and warning) for that case to see what happens. There are also other crash reports that indicate this might happen also duringprofile-before-change
for individual thread shutdowns.
I suspect no one in the bug here did any testing (I know I didn't - sorry). Is it worth sending this out to a few community members or development team, and if so what do we want to ask of them?
Comment 29•3 years ago
|
||
Jens
The crash signatures for this bug [1] are good for Thunderbird 78 (which is the #1 shutdown crash sig), but neither exists for Thunderbird 91.
The #1 shutdown crash sig for 91, and a good statistical match, is shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown which corresponds to Bug 1505660. Is that signature a match for the scenario you have detailed here?
[1] [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown ] [@ @ shutdownhang | nsAppShell::ProcessNextNativeEvent ]
Updated•3 years ago
|
Comment 30•3 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #29)
Jens
The crash signatures for this bug [1] are good for Thunderbird 78 (which is the #1 shutdown crash sig), but neither exists for Thunderbird 91.
The #1 shutdown crash sig for 91, and a good statistical match, is shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown which corresponds to Bug 1505660. Is that signature a match for the scenario you have detailed here?
No, the scenario from bug 1735284 is supposed to never happen, and by now the MOZ_CRASH we added did not occur.
Comment 31•3 years ago
|
||
Worcester12345 are you still seeing this crash?
Reporter | ||
Comment 33•3 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #31)
Worcester12345 are you still seeing this crash?
If I did, I think bugzilla would hopefully steer me here. If not, I imagine you would. The only question mark on that is that sometimes I can't get Thunderbird to open links to bugzilla correctly. I think eventually they will make their way, or I have to brute-force it to work, so there's that. I'd say you could just mark it wont fix for now if that is what you want. Can't really say it is fixed, since nothing was done to fix it. Also can't say works for me, since there is no way to tell when it might happen again.
Comment 34•3 years ago
|
||
This type of crash occurs when a thread isn't responsive during shutdown, e.g. due to some background blocking work. For exmaple in a crash like https://crash-stats.mozilla.org/report/index/b24de20f-8c19-46ae-a1f8-af2a50220425#allthreads, the blocked thread is probably Thread 26, which is currently running what looks like blocking win32 API calls to verify the signature of loaded DLLs (https://searchfox.org/mozilla-central/rev/13d69189a8abfc5064fe44944550b9b6eb4544f5/toolkit/xre/dllservices/mozglue/Authenticode.cpp#161-189). There's not much we can do about situations like that unfortunately.
In the thunderbird crashes I looked at (admittedly, I only poked at a couple: https://crash-stats.mozilla.org/report/index/b1183302-cc57-47ef-92b0-1b6390220424#allthreads and https://crash-stats.mozilla.org/report/index/ccea46a5-21db-48f3-b9f6-0968d0220423#allthreads), the blocked thread appears to be the "IMAP" thread, which is blocked in nsImapProtocol::GetPassword
waiting on a monitor: https://searchfox.org/comm-central/rev/16430df4c00f639809ba0aea5cf6c31575d55419/mailnews/imap/src/nsImapProtocol.cpp#8473. It looks like whatever system is supposed to tell the thread to die during shutdown in that situation isn't working properly (though it appears there's supposed to be one based on the conditions there).
Comment 35•3 years ago
|
||
important |
IIUC either this loop needs an additional check if we are in shutdown or something else should ensure that nsImapProtocol::OnPromptCanceled()
is called during shutdown if we are waiting for a password?
Comment 36•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #35)
IIUC either this loop needs an additional check if we are in shutdown or something else should ensure that
nsImapProtocol::OnPromptCanceled()
is called during shutdown if we are waiting for a password?
Comment 37•3 years ago
|
||
This particular signature no longer exists in version 91 and newer.
Nor does the signature for bug 1524247.
But the problem surely is not gone.
Comment 38•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 39•3 years ago
|
||
If at all related to bug 1768344, please see STR from bug 1768344 and its related crash using 104.0b2 for any clues / similarities: https://crash-stats.mozilla.org/report/index/e0dca0f9-14f7-492e-b507-4e4e40220802
Updated•3 years ago
|
Comment 40•3 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/173576a762db
Fix Thunderbird Crash in [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] - might be stuck waiting for password on shutdown. r=benc
Assignee | ||
Comment 41•3 years ago
•
|
||
This change introduced startup lockup bug 1783710
Assignee | ||
Comment 42•3 years ago
|
||
I think it's best to reopen.
The call to GetServerShuttingDown() isn't simple - it involves a runnable that's dispatched - probably to the main thread?
Apparently that dispatching, while having the monitor locked, causes a deadlock.
I think we must rework that loop to check GetServerShuttingDown() while the monitor is released.
And while I was looking at this code, I'm worried about the related variables m_passwordObtained and m_passwordStatus. IMHO those variables must only be accessed while holding the monitor, otherwise there are races.
Assignee | ||
Comment 43•3 years ago
|
||
Assignee | ||
Comment 44•3 years ago
|
||
The attached patch fixes the deadlock for me.
I'm not yet asking for review. First, let's check if tests pass:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=cb72d4b0a166337cf1d7c4f936d402450c70a8b1
Updated•3 years ago
|
Comment 45•3 years ago
|
||
(In reply to Kai Engert (:KaiE:) from comment #44)
The attached patch fixes the deadlock for me.
If this is at all related to what I am able to repro in bug 1768344 and will get pulled into 104.0b3/b4, I can test to see if it at least resolves things there.
Comment 46•3 years ago
|
||
(In reply to Kai Engert (:KaiE:) from comment #44)
The attached patch fixes the deadlock for me.
I'm not yet asking for review. First, let's check if tests pass:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=cb72d4b0a166337cf1d7c4f936d402450c70a8b1
Try run looks good I think...
You left me standing - I'm still peppering printf()s everywhere to try and understand the flow of things!
I'll just wrap up what I've done so far then give your patch a proper look.
Comment 47•3 years ago
|
||
(In reply to Kai Engert (:KaiE:) from comment #42)
The call to GetServerShuttingDown() isn't simple - it involves a runnable that's dispatched - probably to the main thread?
Apparently that dispatching, while having the monitor locked, causes a deadlock.
I think we must rework that loop to check GetServerShuttingDown() while the monitor is released.
And while I was looking at this code, I'm worried about the related variables m_passwordObtained and m_passwordStatus. IMHO those variables must only be accessed while holding the monitor, otherwise there are races.
Yep, I think you're right about GetServerShuttingDown() causing it.
I think the deadlock was probably always there, but the loop was preventing event dispatch so it didn't actually happen.
Adding the GetServerShuttingDown() gives the event dispatching a kick and allows the queued-up prompt code to run.... and to deadlock!
Something along those lines anyway. It's not super clear which things are running on which threads (sigh).
Comment 48•3 years ago
|
||
Comment 49•3 years ago
|
||
Backout: https://hg.mozilla.org/comm-central/rev/e8d3e664b980fd02d1ca1c3500391ef4423838cb
Let's land a proper fix instead.
Kai, thanks for jumping in!
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 50•3 years ago
|
||
Comment on attachment 9288919 [details]
Bug 1629669 - Rework nsImapProtocol::GetPassword to avoid deadlock, and better protect m_passwordObtained/m_passwordStatus. r=benc,mkmelin
[Approval Request Comment]
Regression caused by (bug #):
User impact if declined: crash
Testing completed (on c-c, etc.):
Risk to taking this patch (and alternatives if risky): minor risk for IMAP connectivity
Comment 51•3 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/31d6fb3903f2
Rework nsImapProtocol::GetPassword to avoid deadlock, and better protect m_passwordObtained/m_passwordStatus. r=benc,mkmelin
Comment 52•3 years ago
|
||
Just out of interest, what does password = nsString();
do?
https://hg.mozilla.org/comm-central/rev/31d6fb3903f23ad0b6e4e97324636d0f7283a007#l1.12
Why not passwort.Truncate();
or `password = "";`` (if the latter compiles).
Assignee | ||
Comment 53•3 years ago
|
||
(In reply to newsfan from comment #52)
Just out of interest, what does
password = nsString();
do?
https://hg.mozilla.org/comm-central/rev/31d6fb3903f23ad0b6e4e97324636d0f7283a007#l1.12Why not
passwort.Truncate();
or `password = "";`` (if the latter compiles).
It erases the string. Assigning to "" creates local data, so might be less efficient. Calling Truncate would have been perfect, but I couldn't find that function, after trying Delete, Erase, and others, I gave up ;-)
Comment 54•3 years ago
|
||
The line appears to be unnecessary since the password is set in the next line m_imapServerSink->GetServerPassword(password);
. That ultimately runs https://searchfox.org/comm-central/rev/30b643d4b2d25ac9d72567bc0af3f08177d2fd1d/mailnews/base/src/nsMsgIncomingServer.cpp#638 via https://searchfox.org/comm-central/rev/30b643d4b2d25ac9d72567bc0af3f08177d2fd1d/mailnews/imap/src/nsImapIncomingServer.cpp#3016.
Assignee | ||
Comment 55•3 years ago
|
||
That's currently true, but the API is designed to potentially fail (and the code checks for a returned failure). Because an API might not fill out parameters on failure, it's safer to init the out parameter.
Comment 56•3 years ago
|
||
Comment on attachment 9288919 [details]
Bug 1629669 - Rework nsImapProtocol::GetPassword to avoid deadlock, and better protect m_passwordObtained/m_passwordStatus. r=benc,mkmelin
[Triage Comment]
Approved for beta
Comment 57•3 years ago
|
||
FYI I'm excited to have this, but suspect we should let this sit on beta a while, and not take it too quickly to 102, to give us time to watch what happens on beta for 2-3 weeks.
Comment 58•3 years ago
|
||
bugherder uplift |
Thunderbird 104.0b4:
https://hg.mozilla.org/releases/comm-beta/rev/40b774c74ff9
Comment 59•2 years ago
|
||
Time for 102 uplift?
Assignee | ||
Comment 60•2 years ago
|
||
Comment on attachment 9288919 [details]
Bug 1629669 - Rework nsImapProtocol::GetPassword to avoid deadlock, and better protect m_passwordObtained/m_passwordStatus. r=benc,mkmelin
see comment 50 for approval comment
Comment 61•2 years ago
|
||
Comment on attachment 9288919 [details]
Bug 1629669 - Rework nsImapProtocol::GetPassword to avoid deadlock, and better protect m_passwordObtained/m_passwordStatus. r=benc,mkmelin
[Triage Comment]
Approved for esr102
Comment 62•2 years ago
|
||
bugherder uplift |
Thunderbird 102.3.3:
https://hg.mozilla.org/releases/comm-esr102/rev/9710cd8cbcc4
Comment 63•2 years ago
|
||
(Collecting information here, which eventually will go into other bug reports)
I believe the signature has shifted in the last 10 months because shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown has no v102 Thunderbird crashes in the past 6 months, which should be impossible - I would expect to see some pre-102.3.3 examples on crash-stats. Perhaps the crash signature morphed somewhere along the way and I failed to update the bug(s).
Anyway, post-102.3.3, we still have plenty of examples of users crash comments mentioning password. The majority of crash signatures are
- mozilla::dom::workerinternals::RuntimeService::CrashIfHanging - bug 1435343 (where I have also commented)
- shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads - Bug 1505660
- shutdownhang | mozilla::SpinEventLoopUntil | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads - Bug 1505660
Most mention SMTP. For example bp-fe8b8258-7db3-4c48-9244-d9e3c0230323
CrashIfHanging bug 1435343 examples:
- bp-6f1e0582-cbe9-46b3-846d-aab290230402 "Trying to abort/close the master password window at the startup"
- bp-0716a418-1c7e-4586-9586-4adb20230403
- bp-6ec67d36-0c11-4739-a2e4-6025b0230322 "password changed and got locked out"
- bp-8f87a4de-19d9-44c8-a326-3b9730230420
- bp-8d270318-ca53-4210-b9b6-677630230502 smtp, "Unable to authenticate to Outgoing server (SMTP) smtp.office.365.com. Please check the password and verify the "Authentification method""
- bp-c50acea1-4f88-4f03-a2b5-f34ec0230518 warning about security certificates
- bp-3d09bcb6-0e24-4701-bb76-0f6130230530 canceled password prompt after entering incorrect password
SpinEventLoopUntil Bug 1505660 examples:
- bp-f58780e4-ed36-4586-8b82-b45c40230322 "password manager"
Updated•3 months ago
|
Description
•