Closed Bug 461621 Opened 16 years ago Closed 16 years ago

Crash in nsNNTPProtocol::~nsNNTPProtocol() on shutdown after Tb is left open for some time

Categories

(MailNews Core :: Networking: NNTP, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1b2

People

(Reporter: rain1, Assigned: rain1)

Details

(Keywords: crash)

Attachments

(1 file)

For some time, I've been getting crashes if I shutdown after Thunderbird is left open for a few hours. I'm not sure what triggers it to happen only after a while.

The stack is:

> 	msgnews.dll!nsNNTPProtocol::~nsNNTPProtocol()  Line 365 + 0x1d bytes	C++
 	msgnews.dll!nsNNTPProtocol::`scalar deleting destructor'()  + 0xf bytes	C++
 	msgbsutl.dll!nsMsgProtocol::Release()  Line 67 + 0x92 bytes	C++
 	msgnews.dll!nsNNTPProtocol::Release()  Line 309 + 0x11 bytes	C++
 	xpcom_core.dll!nsCOMArray_base::RemoveObjectAt(int aIndex=0)  Line 138 + 0x14 bytes	C++
 	msgnews.dll!nsCOMArray<nsINNTPProtocol>::RemoveObjectAt(int aIndex=0)  Line 265	C++
 	msgnews.dll!nsNntpIncomingServer::CloseCachedConnections()  Line 436	C++
 	msgbase.dll!hashCloseCachedConnections(const nsACString_internal & aKey={...}, nsCOMPtr<nsIMsgIncomingServer> & aServer={...}, void * aClosure=0x00000000)  Line 1043	C++
 	msgbase.dll!nsBaseHashtable<nsCStringHashKey,nsCOMPtr<nsIMsgIncomingServer>,nsIMsgIncomingServer *>::s_EnumStub(PLDHashTable * table=0x04d19ed0, PLDHashEntryHdr * hdr=0x04ecda6c, unsigned int number=2, void * arg=0x0037ee2c)  Line 346 + 0x1e bytes	C++
 	xpcom_core.dll!PL_DHashTableEnumerate(PLDHashTable * table=0x04d19ed0, PLDHashOperator (PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)* etor=0x715a58b0, void * arg=0x0037ee2c)  Line 735 + 0x19 bytes	C
 	msgbase.dll!nsBaseHashtable<nsCStringHashKey,nsCOMPtr<nsIMsgIncomingServer>,nsIMsgIncomingServer *>::Enumerate(PLDHashOperator (const nsACString_internal &, nsCOMPtr<nsIMsgIncomingServer> &, void *)* enumFunc=0x7159aef0, void * userArg=0x00000000)  Line 221 + 0x13 bytes	C++
 	msgbase.dll!nsMsgAccountManager::CloseCachedConnections()  Line 1436	C++
 	msgbase.dll!nsMsgAccountManager::Observe(nsISupports * aSubject=0x003ab848, const char * aTopic=0x71253e20, const wchar_t * someData=0x71253e10)  Line 301	C++
 	xpcom_core.dll!nsObserverList::NotifyObservers(nsISupports * aSubject=0x003ab848, const char * aTopic=0x71253e20, const wchar_t * someData=0x71253e10)  Line 129	C++
 	xpcom_core.dll!nsObserverService::NotifyObservers(nsISupports * aSubject=0x003ab848, const char * aTopic=0x71253e20, const wchar_t * someData=0x71253e10)  Line 184	C++
 	necko.dll!nsIOService::SetOffline(int offline=1)  Line 613	C++
 	necko.dll!nsIOService::Observe(nsISupports * subject=0x049b4aa8, const char * topic=0x72a9dd60, const wchar_t * data=0x72a9ec58)  Line 776	C++
 	xpcom_core.dll!nsObserverList::NotifyObservers(nsISupports * aSubject=0x049b4aa8, const char * aTopic=0x72a9dd60, const wchar_t * someData=0x72a9ec58)  Line 129	C++
 	xpcom_core.dll!nsObserverService::NotifyObservers(nsISupports * aSubject=0x049b4aa8, const char * aTopic=0x72a9dd60, const wchar_t * someData=0x72a9ec58)  Line 184	C++
 	xul.dll!nsXREDirProvider::DoShutdown()  Line 853	C++
 	xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup()  Line 946	C++
 	xul.dll!XRE_main(int argc=1, char * * argv=0x00390fa8, const nsXREAppData * aAppData=0x00393a78)  Line 3309	C++
 	thunderbird.exe!NS_internal_main(int argc=1, char * * argv=0x00390fa8)  Line 103 + 0x12 bytes	C++
 	thunderbird.exe!wmain(int argc=1, wchar_t * * argv=0x00391c88)  Line 87 + 0xd bytes	C++
 	thunderbird.exe!__tmainCRTStartup()  Line 583 + 0x19 bytes	C
 	thunderbird.exe!wmainCRTStartup()  Line 403	C
 	kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x23 bytes	
 	ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

The prlog just before the crash is:

0[4977d0]: (7996650) ClosingConnection
0[4977d0]: (7996650) Sending: QUIT 0[4977d0]: (7996650) ClosingSocket()
0[4977d0]: (7996650) CleanupAfterRunningUrl()
0[4977d0]: (7996650) setting busy to 0
0[4977d0]: (7996320) destroying
0[4977d0]: (7996320) destroying

Essentially, the destructor for the nsNNTPProtocol object at 0x07996320 is indirectly calling itself.

A proposed fix is coming up in a few minutes.
Seems to be several things coming together to create the perfect storm.

In http://mxr.mozilla.org/mozilla/source/mailnews/news/src/nsNntpIncomingServer.cpp?mark=430-435#430, connection->CloseConnection() is very likely to remove the connection from mConnectionCache. This means that connections are actually being removed two at a time!

While this part is seemingly unrelated, what _is_ somewhat related is that the array holds the only reference to the second protocol object (the one for which CloseConnection() hasn't been called). So RemoveObjectAt releases the last ref to the object, which means that the object's destructor gets called.

http://mxr.mozilla.org/mozilla/source/mailnews/news/src/nsNNTPProtocol.cpp#358

However, since we haven't called CloseConnection() yet for the object, m_nntpServer is not null, which means that it attempts to remove itself from the connection cache array again. From this point, bad things start happening.

The fix does three things
- it removes connections one at a time instead of two at a time.
- it ensures that CloseConnection() is called for each connection, thus m_nntpServer is set to null before destruction
- it adds a ref to the protocol object, which means that the array doesn't hold the only ref, so that RemoveObject won't destroy the object

I'd like to be braver and do one or both of
- removing the RemoveObject there entirely -- which means that connection->CloseConnection() will be solely responsible for removing the object from the connection cache array. However the connection's only removed if m_nntpServer is not null, and I'm not exactly sure of which circumstances this is true in.
- tweaking the destructor somehow so that it never causes itself to be called again -- I don't know how this could be done.

However, at least for this particular case, this patch should fix it.
Attachment #344730 - Flags: superreview?(bienvenu)
Attachment #344730 - Flags: review?
Attachment #344730 - Flags: review? → review?(Pidgeot18)
Component: Networking → Networking: News
Product: MailNews Core → Core
QA Contact: mailnews.networking → networking.news
Attachment #344730 - Flags: superreview?(bienvenu) → superreview+
Comment on attachment 344730 [details] [diff] [review]
proposed fix, v1
[Checkin: Comment 3]

I'll admit I have some reservations about this patch, but without being able to investigate the cause further, this is probably the best way to continue.

Ideally, the removal shouldn't need to be performed at all, but that would require ensuring that nsNntpProtocol doesn't escape the server, which is an assumption we fail right now.

I'd like to see a reliable way to generate the crash, but it doesn't look like it's anything simple.

In short, r+.
Attachment #344730 - Flags: review?(Pidgeot18) → review+
Keywords: checkin-needed
Summary: Crash in nsNNTPProtocol::~nsNNTPProtocol() on shutdown after Tb is left open for a few hours → Crash in nsNNTPProtocol::~nsNNTPProtocol() on shutdown after Tb is left open for some time
Comment on attachment 344730 [details] [diff] [review]
proposed fix, v1
[Checkin: Comment 3]

http://hg.mozilla.org/comm-central/rev/efd446a5a224
Attachment #344730 - Attachment description: proposed fix, v1 → proposed fix, v1 [Checkin: Comment 3]
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1b2
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: