Closed Bug 537017 Opened 10 years ago Closed Last year

crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]

Categories

(MailNews Core :: Filters, defect, critical)

1.9.1 Branch
x86
All
defect
Not set
critical

Tracking

(thunderbird_esr6063+ fixed, thunderbird63 fixed, thunderbird64 fixed)

RESOLVED FIXED
Thunderbird 64.0
Tracking Status
thunderbird_esr60 63+ fixed
thunderbird63 --- fixed
thunderbird64 --- fixed

People

(Reporter: wsmwk, Assigned: mkmelin)

References

Details

(Keywords: crash, Whiteboard: [rare])

Crash Data

Attachments

(1 file, 1 obsolete file)

crash [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
very low volume crash, like 2/week

bp-b8d38796-b190-4c00-9fec-03c482091224
0	thunderbird.exe	nsMsgFilterAfterTheFact::ApplyFilter	 mailnews/base/search/src/nsMsgFilterService.cpp:576
1	thunderbird.exe	nsMsgApplyFiltersToMessages::RunNextFilter	mailnews/base/search/src/nsMsgFilterService.cpp:979
2	thunderbird.exe	nsMsgFilterAfterTheFact::AdvanceToNextFolder
3	thunderbird.exe	nsMsgFilterService::ApplyFilters
4	xpcom_core.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101 


also ~8/week [@ nsCOMPtr_base::~nsCOMPtr_base() | nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()] .. bogus top of stack?

bp-0d89f9d4-25f7-4bcf-9892-ece0f2091228
0	xpcom_core.dll	nsCOMPtr_base::~nsCOMPtr_base	 objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81
1	thunderbird.exe	nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact	mailnews/base/search/src/nsMsgFilterService.cpp:332
2	thunderbird.exe	nsMsgFilterAfterTheFact::`scalar deleting destructor'	
3	thunderbird.exe	nsReferencedElement::DocumentLoadNotification::Release	mailnews/imap/src/nsImapOfflineSync.cpp:61
4	xpcom_core.dll	nsCOMPtr_base::~nsCOMPtr_base	objdir-tb/mozilla/xpcom/build/nsCOMPtr.cpp:81
5	thunderbird.exe	nsTArray<nsCOMPtr<nsIMsgSendLaterListener> >::DestructRange	objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:848
6	thunderbird.exe	nsTArray<nsCOMPtr<nsIMsgSendListener> >::RemoveElementsAt	objdir-tb/mozilla/dist/include/xpcom/nsTArray.h:663
7	thunderbird.exe	nsAutoTObserverArray<nsCOMPtr<nsIDBChangeListener>,0>::Clear	objdir-tb/mozilla/dist/include/xpcom/nsTObserverArray.h:242
8	thunderbird.exe	nsMsgMailNewsUrl::SetUrlState	mailnews/base/util/nsMsgMailNewsUrl.cpp:136
9	thunderbird.exe	nsImapMailFolder::SetUrlState	mailnews/imap/src/nsImapMailFolder.cpp:6596
crash rate for v3 is now double compared to comment 0, about 4/week, but still tiny. bp-686d0ada-2b73-42c9-bb7f-e31442100322 has an email address which I have pinged.

filed bug 555539 for nsCOMPtr_base::~nsCOMPtr_base() |
nsMsgFilterAfterTheFact::~nsMsgFilterAfterTheFact()
OS: Windows XP → All
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)]
rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?

(never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ]
Whiteboard: [rare]
(In reply to Wayne Mery (:wsmwk) from comment #2)
> rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?
> 
> (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)

With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also happening with a post-bayes filter, which is quite rare (the user must select this manually during filter definition).
In version 12, I think, signature morphed to nsMsgFilterAfterTheFact::ApplyFilter(bool*) 

(In reply to Kent James (:rkent) from comment #3)
> (In reply to Wayne Mery (:wsmwk) from comment #2)
> > rkent, is bp-e9bf7e3b-f5c4-46e5-95bb-d01722111207 happening during shutdown?
> > 
> > (never got a response from bp-686d0ada-2b73-42c9-bb7f-e31442100322)
> 
> With nsXREDirProvider::DoShutdown in the stack, my guess is yes. It is also
> happening with a post-bayes filter, which is quite rare (the user must
> select this manually during filter definition).

None of the current stacks I viewed have shutdown, so that must have been a fluke.

Do these newer stacks reveal anything more useful?

bp-7aa07743-cac8-4603-9d3a-3bc672121203
0	xul.dll	nsMsgFilterAfterTheFact::ApplyFilter	mailnews/base/search/src/nsMsgFilterService.cpp:538
1	xul.dll	nsMsgFilterAfterTheFact::OnSearchDone	mailnews/base/search/src/nsMsgFilterService.cpp:400
2	xul.dll	nsMsgSearchSession::NotifyListenersDone	mailnews/base/search/src/nsMsgSearchSession.cpp:568
3	xul.dll	nsMsgSearchSession::TimerCallback	mailnews/base/search/src/nsMsgSearchSession.cpp:504
4	xul.dll	nsTimerImpl::Fire	xpcom/threads/nsTimerImpl.cpp:473
5	xul.dll	nsTimerEvent::Run	xpcom/threads/nsTimerImpl.cpp:556 

kent@3238 530 // We'll continue after a copy, but not after a move
kent@3238 531 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder
kent@3238 532 && actionIndex < numActions - 1)
kent@3238 533 m_nextAction = actionIndex + 1;
kent@3238 534 else
kent@3238 535 m_nextAction = 0; // OnStopCopy tests this to move to next filter
kent@3330 536 // Tell postplugin filters if we are moving the message.
kent@3330 537 if (actionType == nsMsgFilterAction::MoveToFolder) 


FWIW, 3/4 of the random stacks I examined incllude exquilla.
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ]
Flags: needinfo?
still low volume

hmm, the failing line didn't paste in the prior comment

bp-9e1d337f-073a-4008-980b-effe32140710
hg@0 536 if (copyService)
kent@3238 537 {
kent@3238 538 rv = copyService->CopyMessages(m_curFolder, m_searchHitHdrs,
kent@3238 539 destIFolder, actionType == nsMsgFilterAction::MoveToFolder,
bugzilla@9082 540 this, m_msgWindow, false);
kent@3238 541 // We'll continue after a copy, but not after a move
kent@3238 542 if (NS_SUCCEEDED(rv) && actionType == nsMsgFilterAction::CopyToFolder
kent@3238 543 && actionIndex < numActions - 1)
kent@3238 544 m_nextAction = actionIndex + 1;
kent@3238 545 else
kent@3238 546 m_nextAction = 0; // OnStopCopy tests this to move to next filter
kent@3330 547 // Tell postplugin filters if we are moving the message.
kent@3330 548 if (actionType == nsMsgFilterAction::MoveToFolder)
ehsan@10936 549 c(uint32_t i = 0; i < m_searchHits.Length(); i++)
Flags: needinfo?
* nsMsgFilterAfterTheFact::ApplyFilter(bool*) signature ends after TB33.0b1
* nsMsgFilterAfterTheFact::ApplyFilter linux signature ends after TB31.8.0

That leaves nsMsgFilterAfterTheFact::ApplyFilter() which is a mess of different stacks, but adding it to the signature list
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] → [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] [@ nsMsgFilterAfterTheFact::ApplyFilter()]
Depends on: 797710
Whiteboard: [rare] → [rare][check TB38.3.0 crash-stats]
No longer depends on: 797710
still happens in 38.4.0 and betas.
bp-ac52d722-2d21-4d74-ab45-b24e72151206
bp-1761da40-1ca0-4f7e-9c40-f142c2151104
Blocks: 915107
See Also: → 693452
Whiteboard: [rare][check TB38.3.0 crash-stats] → [rare]
See Also: → 1444379
After Feb 1 the crash rate dropped about 30%. All or most of the reduction appeared to be from versions prior to version 52 - so perhaps some where in the version 52 cycle we landed something that helped
Looks like m_curFolder was null. So keep a local ref?
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9012851 - Flags: review?(jorgk)
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch

You think this is going to help? So basically you're assuming 'm_curFolder' is going away somewhere during the processing? How would that be caused? Wouldn't it be useful to check 'm_curFolder' at the top of the function?
I'm not sure how, it is null for some reason though. 
We already check m_curFolder on top.
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch

I don't like this at all. Basically you're adding another reference to something that's going missing during the big processing loop for some unknown reason. In doing so, you could cause unpredictable side-effects, especially accessing via that reference. I don't know how that member gets nulled, maybe someone is deleting a folder while the filter is busily processing it in the background. It's a low crash rate, so it must be some unlikely case.

IMHO, you have two options: Either find out what is really happening or apply a different sort of band-aid that just avoids the crash:
  NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED)
at all call sites using 'm_curFolder'. Do we know which one is crashing?
Attachment #9012851 - Flags: review?(jorgk)
The one I tracked down was https://dxr.mozilla.org/comm-esr60/source/mailnews/base/search/src/nsMsgFilterService.cpp#608

I think we do use the pattern I suggested elsewhere, that of keeping a local ref. Can't find and example now though.

As for the options, I don't think we're likely to find out, and  NS_ENSURE_TRUE(m_curFolder, NS_ERROR_UNEXPECTED) before all call sites is not any more likely to help. I assume threading is somehow involved here, or else the crash wouldn't be possible. So the folder could still go away in between such added check and the next line. We're already checking m_curFolder at the top of the loop, so adding any further checks is just a game of timing, and you will still sometimes lose.
(In reply to Magnus Melin [:mkmelin] from comment #13)
> I think we do use the pattern I suggested elsewhere, that of keeping a local
> ref. Can't find and example now though.
I think you're referring to the kungFuDeathGrip. That's a different story:
https://dxr.mozilla.org/comm-central/search?q=kungFuDeathGrip+path%3Acomm&redirect=false

> So the folder could still go away in between such added check and the next line.
Not likely. 'm_curFolder' is a member of the object and no one else will clear that out unless the object itself subjects itself to some async processing. It would be good to know where which dereferencing actually fails. Can we have a recent crash rather than the ones from 2015 :-(

If some strange thing happens, how do you know your copied folder handle still points to a valid folder? And adding another reference to a thing that's been killed elsewhere is asking for trouble since you might block the destruction of the folder.

Maybe Neil wants to comment, he's been active lately.
Flags: needinfo?(vseerror)
E.g. bp-b3d73398-4556-4303-906a-0f77b0180926
Flags: needinfo?(vseerror)
> Can we have a recent crash rather than the ones from 2015 :-(

One can just click the crash sig link at the top of the bug :)
Crash Signature: [@ nsMsgFilterAfterTheFact::ApplyFilter(int*)] [@ nsMsgFilterAfterTheFact::ApplyFilter ] [@ nsMsgFilterAfterTheFact::ApplyFilter(bool*) ] [@ nsMsgFilterAfterTheFact::ApplyFilter()] → [@ nsMsgFilterAfterTheFact::ApplyFilter ]
Comment on attachment 9012851 [details] [diff] [review]
bug537017_apply_filter_crash.patch

Review of attachment 9012851 [details] [diff] [review]:
-----------------------------------------------------------------

I've looked at all 32 reports:
https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=60.0&signature=nsMsgFilterAfterTheFact%3A%3AApplyFilter&date=%3E%3D2018-06-28T22%3A59%3A47.000Z&date=%3C2018-09-28T22%3A59%3A47.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

Most crash on comm/mailnews/base/search/src/nsMsgFilterService.cpp:608 (CopyToFolder) with very few failing on lines 589 (Delelete), 670 (MarkUnread) and 863 (Custom), all places where 'm_curFolder' is dereferenced.

So let's focus on "copy" for a moment. That calls copyService->CopyMessages(..., this, ...), where 'this' is the nsMsgFilterAfterTheFact object which is also a listener. It gets called on OnStartCopy() (no-op) and OnStopCopy(). That one recursively calls ApplyFilter() or may call RunNextFilter(). That one calls eventually calls AdvanceToNextFolder() which resets  'm_curFolder' :-(

So what I imagine happens is that the filtering is happily running and at some stage a failed copy comes back and smashes 'm_curFolder'.

Looks like Magnus solution with a variation is the way to go after all :-) - I'm more convinced with my analysis now.

::: mailnews/base/search/src/nsMsgFilterService.cpp
@@ +525,4 @@
>  nsresult nsMsgFilterAfterTheFact::ApplyFilter()
>  {
>    nsresult rv;
>    do { // error management block, break if unable to continue with filter.

Add this:
// 'm_curFolder' can be reset asynchronously by the copy service calling OnStopCopy(). So take a local copy here and use it throughout the function.
nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder;

@@ +527,5 @@
>    nsresult rv;
>    do { // error management block, break if unable to continue with filter.
>      if (!m_curFilter)
>        break; // Maybe not an error, we just need to call RunNextFilter();
>      if (!m_curFolder)

Change this to curFolder.

@@ +530,5 @@
>        break; // Maybe not an error, we just need to call RunNextFilter();
>      if (!m_curFolder)
>        break; // Maybe not an error, we just need to call AdvanceToNextFolder();
> +
> +    nsCOMPtr<nsIMsgFolder> curFolder = m_curFolder;

Please move this up until after the 'do' since m_curFolder may already be gone here although it tested non-null before.
Attachment #9012851 - Flags: review+
Attachment #9012851 - Attachment is obsolete: true
Attachment #9012977 - Flags: review+
Quite the detective work!
Keywords: checkin-needed
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/4568dcf1dd57
fix crash in nsMsgFilterAfterTheFact::ApplyFilter() caused by async reset of 'm_curFolder'. r=jorgk
Status: ASSIGNED → RESOLVED
Closed: Last year
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 64.0
Comment on attachment 9012977 [details] [diff] [review]
bug537017_apply_filter_crash.patch

Maybe run this through a beta cycle since it's only 98% straight forward.
Attachment #9012977 - Flags: approval-comm-esr60?
Attachment #9012977 - Flags: approval-comm-beta+
Attachment #9012977 - Flags: approval-comm-esr60? → approval-comm-esr60+
You need to log in before you can comment on or make changes to this bug.