startup or shutdown crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer. reference to nsImapIncomingServer released off the main thread in destructors, due to preferences no longer use threadsafe refcounting

RESOLVED FIXED in Thunderbird 35.0

Status

defect
--
critical
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: wsmwk, Assigned: neil)

Tracking

({crash, regression, topcrash-thunderbird})

Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird32 affected, thunderbird33 fixed, thunderbird34 fixed, thunderbird_esr3132+ fixed)

Details

(Whiteboard: [startupcrash][regression:tb31][post-fix check coment 5], crash signature)

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
Currently, this is #1 crashes for TB31. averaging 1 per day.

bp-3d9cd808-8e72-4a68-98ac-592c22140416 (below) is the oldest build that I have found so far, and is the most common stack of this crash signature.
=============================================================
0	xul.dll	nsObserverService::RemoveObserver(nsIObserver *,char const *)	xpcom/ds/nsObserverService.cpp
1	xul.dll	xul.dll@0x6dd0c	
2	xul.dll	nsPrefBranch::~nsPrefBranch()	modules/libpref/src/nsPrefBranch.cpp
3	mozglue.dll	arena_dalloc_small	memory/mozjemalloc/jemalloc.c
4	xul.dll	nsPrefBranch::`vector deleting destructor'(unsigned int)	
5	xul.dll	nsPrefBranch::Release()	modules/libpref/src/nsPrefBranch.cpp
6	xul.dll	nsMsgIncomingServer::~nsMsgIncomingServer()	mailnews/base/util/nsMsgIncomingServer.cpp
7	xul.dll	nsImapIncomingServer::`vector deleting destructor'(unsigned int)	
8	xul.dll	nsMsgIncomingServer::Release()	mailnews/base/util/nsMsgIncomingServer.cpp
9	xul.dll	nsOfflineCacheBinding::Release()	mailnews/imap/src/nsSyncRunnableHelpers.cpp
10	xul.dll	nsImapProtocol::ProcessCurrentURL()	mailnews/imap/src/nsImapProtocol.cpp 

A few others like this:
bp-bc9e4198-e2a0-465f-a5c3-b6fc82140424
bp-8f37559b-98f3-4e28-9fa8-0922a2140426
bp-47c473ab-42c3-4d11-8f48-73eb42140429
bp-92b7e793-2b24-4bee-8f2c-d2dec2140429
bp-8778b0e8-f0fa-4555-bb78-3b0b12140430
(Reporter)

Comment 1

5 years ago
TB32 - #1 crash for nightly/daily 
TB31 - is now #2 (#1 is now intel_aes_gcmENC bug 1005988)
(Reporter)

Comment 2

5 years ago
regression?  
or addon/user environment?

earliest buildid 20140418030203 // bp-3d9cd808-8e72-4a68-98ac-592c22140416

(Firefox has this signature, but only about 10% are startup)
OS: Windows NT → All
Whiteboard: [startupcrash] → [startupcrash][regression?]
(Reporter)

Comment 3

5 years ago
#1 crash for TB31 beta.
correction to comment 2 ... I think build 20140416030203 crash bp-3d9cd808-8e72-4a68-98ac-592c22140416 is the earliest. tb31a1 crash frequency high enough (about 1 per day) that I suspect the checkin causing this occurred within 48 hours of the build

From reporter of bp-77e428e3-2d2a-4e95-8301-9cc3c2140613 "My probs starts with the migration of Thunderbird from Thunderbird v30.00 Beta 1 Build 2 to Thunderbird v31.00 Beta 1 Build 1".  So I don't have any steps from crash reporters, and I suspect we won't be getting any.
Summary: startup crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) → startup crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer
Whiteboard: [startupcrash][regression?] → [startupcrash][regression:tb31]
(Reporter)

Comment 4

5 years ago
tatze, you've crashed a few times, mostly startup
bp-9c781958-fe09-4022-83a3-d4c962140613
(...a month of no crashes...)
bp-f80b6012-633e-4515-a894-d66092140512
bp-5b6dafed-cd3e-4694-ae68-9e71a2140512
bp-9100edae-bc85-4b3d-a8a1-b5ee32140504
bp-348b0036-8a33-4edb-b807-0304b2140503

Do you have any suspicions what is happening in your evironment?
Flags: needinfo?(tatze)
These might not all be the same crash; the direct cause is that the last reference to an nsImapIncomingServer is being released off the main thread, which is causing a chain of object destructors to run. Some of the calls in those destructors need to run on the main thread, including the one in nsObserverService that is forcing the crash.

I've only looked at a couple of the crashes, but they don't all show the release coming from the same place, which means they might not all be the same bug. That said, it's also possible that there's a single underlying bug causing the main thread to *not* hold a reference that it's supposed to hold.
(Reporter)

Updated

5 years ago
Blocks: TB31found
(Reporter)

Comment 6

5 years ago
#2 crash now that TB31.0 shipped
Irving, can we use nsMainThreadPtrHandle / nsMainThreadPtrHolder ?
Comment on attachment 8461330 [details] [diff] [review]
Proposal fix

setting a review flag to get irving's attention.
Attachment #8461330 - Flags: review?(irving)
(Reporter)

Comment 10

5 years ago
Is this a regression of Bug 991626 - Thunderbird crash in NS_CycleCollectorSuspect3 ?
Flags: needinfo?(tatze) → needinfo?(neil)
(Assignee)

Comment 11

5 years ago
(In reply to Wayne Mery from comment #10)
> Is this a regression of Bug 991626 - Thunderbird crash in
> NS_CycleCollectorSuspect3 ?

No, that was caused by the imap thread trying to release the msgWindow object instead of proxying it to the main thread. This is the imap thread releasing the last (!) ref to an incoming server causing it to release its non-threadsafe members (in particular, preferences no longer use threadsafe refcounting).
Flags: needinfo?(neil)
(Reporter)

Comment 12

5 years ago
(In reply to neil@parkwaycc.co.uk from comment #11)
>  This is the imap thread
> releasing the last (!) ref to an incoming server causing it to release its
> non-threadsafe members (in particular, preferences no longer use threadsafe
> refcounting).

does this suggest which bug# we can block?
Component: General → Networking: IMAP
Product: Thunderbird → MailNews Core
Summary: startup crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer → startup crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer. reference to nsImapIncomingServer released off the main thread in destructors, due to preferences no longer use threadsafe refcounting
Whiteboard: [startupcrash][regression:tb31] → [startupcrash][regression:tb31][post-fix check coment 5]
Comment on attachment 8461330 [details] [diff] [review]
Proposal fix

Review of attachment 8461330 [details] [diff] [review]:
-----------------------------------------------------------------

I can't think of any reason why it would be wise for an object with thread-safe ref counting to hold references to objects with non-thread-safe ref counts. I'm not sure if releasing the last ref to the incoming server on the imap thread is a bug or not, but it certainly seems odd.

If we're going to fix this by proxying a destructor back on to the main thread, I *think* the right place to switch back to the main thread is on the ImapServerSinkProxy, so that when nsImapProtocol releases the reference the whole stack gets destroyed in the right place.

I wonder how many things would break if we changed nsMsgIncomingServer (and perhaps all the other proxied objects) to use main-thread-only reference counting, to enforce the destruction happening on the right thread.

Which is all to say that I'm tentatively OK with this patch, but I'd want all kinds of super-review.
Attachment #8461330 - Flags: superreview?(standard8)
Attachment #8461330 - Flags: review?(irving)
Attachment #8461330 - Flags: review+
Comment on attachment 8461330 [details] [diff] [review]
Proposal fix

Unfortunately I'm not really going to get to spend the time to look at this that this deserves over the next few days. Hence seeing if Neil can take a look.
Attachment #8461330 - Flags: superreview?(standard8) → superreview?(neil)
(Assignee)

Comment 15

5 years ago
Comment on attachment 8461330 [details] [diff] [review]
Proposal fix

These main thread pointer handle holders are overkill. Can't you just proxy the release of the receiver to the main thread in the destructor?

Alternatively, make an nsMainThreadPtr template that's like nsMainThreadPtrHolder but a) doesn't do the refcounting and b) has operator->() and use that as the type of the receiver member.
Attachment #8461330 - Flags: superreview?(neil) → superreview-
(Reporter)

Comment 16

5 years ago
I'm not sure to whom this is directed, so I'm picking both ofyou

(In reply to neil@parkwaycc.co.uk from comment #15)
> Comment on attachment 8461330 [details] [diff] [review]
> Proposal fix
> 
> These main thread pointer handle holders are overkill. Can't you just proxy
> the release of the receiver to the main thread in the destructor?
> 
> Alternatively, make an nsMainThreadPtr template that's like
> nsMainThreadPtrHolder but a) doesn't do the refcounting and b) has
> operator->() and use that as the type of the receiver member.

Linux signature is nsObserverService::RemoveObserver
Crash Signature: [@ nsObserverService::RemoveObserver(nsIObserver*, char const*)] → [@ nsObserverService::RemoveObserver(nsIObserver*, char const*)] [@ nsObserverService::RemoveObserver]
Flags: needinfo?(m_kato)
Flags: needinfo?(irving)
(Reporter)

Comment 17

5 years ago
shutdown, windows   nsObserverService::RemoveObserver(nsIObserver*, char const*)
bp-dde15dba-c2c7-4ab7-8812-c5f3b2140730
startup, linux, same user as above   nsObserverService::RemoveObserver 
bp-50168d86-2cb9-421e-8aa4-fa1682140827
(Assignee)

Comment 18

5 years ago
Attachment #8484510 - Flags: review?(irving)
Attachment #8484510 - Flags: review?(irving) → review+
Comment on attachment 8484510 [details] [diff] [review]
Proposed patch

[Triage Comment]

[Triage Comment]

[Triage Comment]

I'm happy for this to land on trunk whenever.  Also giving a+ for branchws.
Attachment #8484510 - Flags: approval-comm-esr31+
Attachment #8484510 - Flags: approval-comm-beta+
Attachment #8484510 - Flags: approval-comm-aurora+
(Reporter)

Comment 20

5 years ago
(In reply to Wayne Mery (:wsmwk) from comment #12)
> (In reply to neil@parkwaycc.co.uk from comment #11)
> >  This is the imap thread
> > releasing the last (!) ref to an incoming server causing it to release its
> > non-threadsafe members (in particular, preferences no longer use threadsafe
> > refcounting).
> 
> does this suggest which bug# we can block?

irving, I believe we can blame bug 993734 - The observer service should not pretend to be threadsafe - which checked in 4014-04-15, the day before thunderbird crashes started

Murali had one of the earliest crashes bp-5e3cf8fd-9a27-42cc-96ae-62a592140424 April 24
Blocks: 993734
Summary: startup crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer. reference to nsImapIncomingServer released off the main thread in destructors, due to preferences no longer use threadsafe refcounting → startup or shutdown crash in nsObserverService::RemoveObserver(nsIObserver*, char const*) via nsMsgIncomingServer. reference to nsImapIncomingServer released off the main thread in destructors, due to preferences no longer use threadsafe refcounting
Wayne, yes, that's likely the change that turned our previous behaviour into crashes; what I'm not sure about is whether our previous behaviour was correct, or whether we were just getting away with it because the previous thread-unsafe condition very rarely caused errors.

In any case, we're better off with this patch than we were before.
Flags: needinfo?(irving)
https://hg.mozilla.org/comm-central/rev/a8b23b49d30f
Assignee: nobody → neil
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Flags: needinfo?(m_kato)
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 35.0
(Reporter)

Updated

4 years ago
See Also: → 1116502
You need to log in before you can comment on or make changes to this bug.