Open Bug 1586494 Opened 6 months ago Updated 14 days ago

Crash in [@ NS_CycleCollectorSuspect3] from nsImapMockChannel.

Categories

(MailNews Core :: Networking: IMAP, defect, P1, critical)

Unspecified
All
defect

Tracking

(thunderbird_esr68+ affected)

ASSIGNED
Thunderbird 76.0
Tracking Status
thunderbird_esr68 + affected

People

(Reporter: emilio, Assigned: benc)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: crash, leave-open, topcrash-thunderbird)

Crash Data

Attachments

(2 files)

This bug is for crash report bp-6e4e9c87-b863-4b7d-812e-82d560190830.

Top 10 frames of crashing thread:

0 libxul.so NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3763
1 libxul.so <name omitted> uriloader/base/nsDocLoader.cpp:171
2 libxul.so nsImapMockChannel::~nsImapMockChannel comm/mailnews/imap/src/nsImapProtocol.cpp:8480
3 libxul.so nsImapMockChannel::~nsImapMockChannel comm/mailnews/imap/src/nsImapProtocol.cpp:8473
4 libxul.so <name omitted> comm/mailnews/imap/src/nsImapProtocol.cpp:8460
5 libxul.so nsImapProtocol::RetryUrl comm/mailnews/imap/src/nsImapProtocol.cpp:1896
6 libxul.so nsImapProtocol::ProcessCurrentURL comm/mailnews/imap/src/nsImapProtocol.cpp
7 libxul.so nsImapProtocol::ImapThreadMainLoop comm/mailnews/imap/src/nsImapProtocol.cpp:1420
8 libxul.so nsImapProtocol::Run comm/mailnews/imap/src/nsImapProtocol.cpp:1103
9 libxul.so non-virtual thunk to nsImapProtocol::Run comm/mailnews/imap/src/nsImapProtocol.cpp

Summary: Crash in [@ NS_CycleCollectorSuspect3] → Crash in [@ NS_CycleCollectorSuspect3] from nsImapMockChannel.

I've been crashing on imap code two or three times. It wouldn't be fun if these
assertions were the culprit, but an NS_WARNING_ASSERTION is not going to find
out.

So make these MOZ_DIAGNOSTIC_ASSERT.

Keywords: checkin-needed

Hmm, looking at the crash data, there are mostly crashes in TB 68.1. here:
0 xul.dll NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3761 context
1 xul.dll CompositeDataSourceImpl::Release() comm/rdf/base/nsCompositeDataSource.cpp:515 cfi
2 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7507 cfi

We've finally killed RDF, so that code path no longer exists. Let's see how that affects the crash situation.

Keywords: leave-open

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/cda2408c98e8
Strengthen threading assertions in IMAP code. r=jorgk

Keywords: checkin-needed

(In reply to Jorg K (GMT+2) from comment #2)

Hmm, looking at the crash data, there are mostly crashes in TB 68.1. here:
0 xul.dll NS_CycleCollectorSuspect3 xpcom/base/nsCycleCollector.cpp:3761 context
1 xul.dll CompositeDataSourceImpl::Release() comm/rdf/base/nsCompositeDataSource.cpp:515 cfi
2 xul.dll nsImapMailCopyState::~nsImapMailCopyState() comm/mailnews/imap/src/nsImapMailFolder.cpp:7507 cfi

I don't want to discourage in any way the investigation but I want to bring to your attention:

So what is interesting about nsImapMockChannel, are there 71.0a1 crashes, and do you actually find an increase correlating to nsImapMockChannel?

Component: General → Networking: IMAP
Product: Thunderbird → MailNews Core
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/58c46e562118
Follow-up: Reformat. rs=reformat

Here's a crash id from a minute ago, crashing on this assertion: bp-345ec887-e60b-4e6f-ba56-0485b0191009

https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#8467 :-(

I wonder why this matters in the destructor. But we could proxy to the main thread. Maybe we should file a new bug for this crash since bug is about another crash, unless I misunderstand something.

Emilio, any STR that provoke this crash?

Flags: needinfo?(benc)

(In reply to Jorg K (GMT+2) from comment #8)

https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#8467 :-(

I wonder why this matters in the destructor. But we could proxy to the main thread. Maybe we should file a new bug for this crash since bug is about another crash, unless I misunderstand something.

I don't think it's just a matter of proxying to the main thread, necessarily... There's some code around that doesn't make much sense to me. I'm not familiar with it of course but:

https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#1298 is in a function which asserts that it is called off the main thread. But it calls a main-thread-only function, unless I'm holding it wrong:

https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#1138

That code looks somewhat broken.

Emilio, any STR that provoke this crash?

Nothing reliable I fear :(

Yep, as Emilio says, the code looks very confused as to what it expects to be running on :-(

There is a comment in the imap thread cleanup:

 // Release protocol object on the main thread to avoid destruction of 'this'
 // on the IMAP thread, which causes grief for weak references.

https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#1127

Maybe a similar motivation for the mockchannel destruction?

The stack frame for the latest crash capture bp-0bd5b09f-3485-4643-a6ef-2f0ac0191010:

nsImapMockChannel::~nsImapMockChannel() 	comm/mailnews/imap/src/nsImapProtocol.cpp:8467  (ASSERT)
nsImapMockChannel::~nsImapMockChannel() 	comm/mailnews/imap/src/nsImapProtocol.cpp:8463 	
<name omitted> 	comm/mailnews/imap/src/nsImapProtocol.cpp:8450 	
nsImapProtocol::RetryUrl() 	comm/mailnews/imap/src/nsImapProtocol.cpp:1894
nsImapProtocol::ProcessCurrentURL() 	comm/mailnews/imap/src/nsImapProtocol.cpp:0
nsImapProtocol::ImapThreadMainLoop() 	comm/mailnews/imap/src/nsImapProtocol.cpp:1418 	
...

It appears to me that the dtor is being called when RetryUrl() exits, and it's local saveMockChannel goes out of scope.
(which, indeed, happens in the imap thread, not the main thread).

RetryUrl():
https://searchfox.org/comm-central/rev/becdd368fb58a470c60d247e235ca07da787f694/mailnews/imap/src/nsImapProtocol.cpp#1881

If the saveMockChannel release inside RetryUrl() can be deferred to the main thread, maybe that'd help? But it seems a bit of a nasty band-aid fix without really understanding what's going on...

(Either way, I think the proper approach - for this and other IMAP bugs - is to sit down and pick through, mapping out all the IMAP code and figure out just what's happening (and on which threads), and if it's sane. Which is on my TODO list, but it's not a small job :-(

Flags: needinfo?(benc)

Uff, that call to main tread nsImapProtocol::CloseStreams() from a non-main thread function nsImapProtocol::TellThreadToDie() is really confused. The former only MOZ_ASSERTS(), but I've never seen it crash there in a debug build, so I guess that code protected by if (m_imapProtocolSink) never runs.

As for saveMockChannel going out of scope in RetryUrl() on the IMAP thread (which I haven't) checked, how about adding
NS_ReleaseOnMainThreadSystemGroup("nsImapProtocol::RetryUrl", saveMockChannel .forget());since this is how we destroy it in other places like:
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapProtocol.cpp#1011

At least that will fix Emilio's crash. Ben, can you do a patch with that. I don't want to rip off your research and it's just a two-liner.

Flags: needinfo?(benc)

Hmm. There's a patch which might address that specific crash...
But I'm really unhappy with it. There are lots of other places in RetryUrl() which should also cause an assert
eg:

  if (m_imapServerSink)
    (void)m_imapServerSink->PrepareToRetryUrl(kungFuGripImapUrl,
                                              getter_AddRefs(saveMockChannel));

PrepareToRetryUrl() calls nsImapUrl::GetMockChannel() which has a similar thread assert in it.
It's not called if m_imapServerSink is null... but then saveMockChannel isn't set either. It's a rabbithole.

I'm going to leave the NI on it, and next week I'll try and sit down and map out all the IMAP code and get some kind of understanding of it. Without that I'm just looking at an elephant through a microscope :-)

IOW: For saveMockChannel to be set PrepareToRetryUrl()
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapIncomingServer.cpp#448
and GetMockChannel() would need to run and that would crash before the channel is even set if run on the IMAP thread:
https://searchfox.org/comm-central/rev/29706036071c4629c2b44512d1acecb64008cc46/mailnews/imap/src/nsImapUrl.cpp#984

So in conclusion: The patch can't really help. It's crazy since the crash is at the end of RetryUrl() where a mock channel is being destroyed that can't even exist.

Duplicate of this bug: 1589572
See Also: → 1132339
OS: Linux → All
See Also: → 1581075
Version: unspecified → Trunk
Flags: needinfo?(benc)
Blocks: 1517464

(bringing assignment from bug 1586494. note also bug 1175168 comment 7 )

Assignee: nobody → benc
Status: NEW → ASSIGNED
See Also: → 1616581

Magnus, as soon as we can manage it, I'd suggest we have Ben spend significant additional time to slay this, because we are approaching 1% of users affected - based on stats of 77k - 92k users affected in the past month [1] and a baseline of 10M users measured [2]. And this is just one crash signature - there are surely other signatures.

[1] https://crash-stats.mozilla.org/signature/?signature=NS_CycleCollectorSuspect3&date=%3E%3D2020-02-10T11%3A14%3A00.000Z&date=%3C2020-03-10T11%3A14%3A00.000Z
[2] https://stats.thunderbird.net/#default

Yeah we should see if we can track this down. Ben is going to spend some time on imap threading (bug 1618934). If we're lucky that could fix things. Likely not, but should give a better understanding and hopefully enough clues to figure this out.

Priority: -- → P1
Depends on: 1618934

FYI, got a crash again in 74.0b2 (32-bit)...
bp-0dc515a3-75af-4a2e-bce3-6e4a90200312
I was not even using the app when it happened... just crashed by itself in the background...

OK, so the patch I proposed back in comment #12 looked plausible, except for the fact that Jorg and I both thought it was completely bonkers and could never possibly work.

The Short Version:

I've changed my mind and think it could help. I've got a try run going here:

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=6c9481e9ef055152f3e14800101c3102acf85a13

If no problems, we just apply the existing patch as is.

The Long Version:

(mainly just to leave myself a trail of breadcrumbs!)
This is the code in nsImapProtocol::RetryUrl() that had Jorg (comment #14) and I both scratching our heads:
(In reply to Ben Campbell from comment #13)

  if (m_imapServerSink)
    (void)m_imapServerSink->PrepareToRetryUrl(kungFuGripImapUrl,
                                              getter_AddRefs(saveMockChannel));

PrepareToRetryUrl() will assert if called on any thread other than the main thread. And the whole problem here is that we're asserting because saveMockChannel is being deleted on the IMAP thread. So the patch I posted was entirely academic - the stack trace must have been wrong, because PrepareToRetryUrl() wasn't asserting, so we must be on the main thread. The patch would never work.

Except.

After poring through the IMAP code then coming back here and taking a fresh look, I spot that m_imapServerSink isn't actually an nsIMapServerSink. It is, in fact, a ImapServerSinkProxy. This uses the magic of macros to synchronously dispatch nsIMapServerSink
calls to the main thread (see https://searchfox.org/comm-central/source/mailnews/imap/src/nsSyncRunnableHelpers.cpp#419 ).

So it looks like it's being called on the IMAP thread, but it is actually doing the call on the main thread, blocking the IMAP thread until it returns.

Phew.

Doesn't appear to have broken anything (try build), so let's check it in.

Pushed by thunderbird@calypsoblue.org:
https://hg.mozilla.org/comm-central/rev/30c25adcb3d8
Workaround for mainthread assertion in nsImapMockChannel dtor. r=jorgk

Target Milestone: --- → Thunderbird 76.0

Hmm, I can't remember reviewing this. Anyway, there's a grammar error here:

  • // (this is a workaround to try and prevent a specific crash, and
  • // does nothing clarify the threading mess!)

There's a "to" missing, right?

You need to log in before you can comment on or make changes to this bug.