Closed Bug 789397 Opened 12 years ago Closed 10 years ago

searching virtual folder. crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Often compact related.

Categories

(Thunderbird :: Search, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(thunderbird36 fixed)

RESOLVED FIXED
Thunderbird 36.0
Tracking Status
thunderbird36 --- fixed

People

(Reporter: m_kato, Assigned: rkent)

References

()

Details

(Keywords: crash, Whiteboard: [gs])

Crash Data

Attachments

(1 file, 2 obsolete files)

m_urlQueueIndex can be out of range by OnStopRunningUrl.

This bug was filed from the Socorro interface and is 
report bp-a95c4a66-30fe-465b-8026-b8f612120904 .
============================================================= 
0 	xul.dll 	nsACString_internal::Assign 	xpcom/string/src/nsTSubstring.cpp:381
1 	xul.dll 	nsACString_internal::Assign 	xpcom/string/src/nsTSubstring.cpp:347
2 	xul.dll 	nsMsgSearchSession::GetNextUrl 	mailnews/base/search/src/nsMsgSearchSession.cpp:464
3 	xul.dll 	nsMsgSearchSession::BuildUrlQueue 	mailnews/base/search/src/nsMsgSearchSession.cpp:446
4 	xul.dll 	nsMsgSearchSession::TimerCallback 	mailnews/base/search/src/nsMsgSearchSession.cpp:502
5 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:473
6 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:556
7 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:624
8 	xul.dll 	NS_ProcessNextEvent_P 	objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:213
9 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:113
10 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
11 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
12 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
13 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
14 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:255
15 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3793
16 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3870
17 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3946
18 	thunderbird.exe 	NS_internal_main 	mail/app/nsMailApp.cpp:200
19 	thunderbird.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:100
20 	thunderbird.exe 	__tmainCRTStartup 	crtexe.c:552
21 	kernel32.dll 	BaseProcessStart
Summary: crash in nsACString_internal::Assign due to out of range access in nsMsgSearchSession::BuildUrlQueue → crash in nsACString_internal::Assign due to out of range access
also bp-aea4e391-b47f-433f-b5d3-c67cc2120904 memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&)  

note not all such signatures fail from OnStopRunningUrl ...  for example bp-ab62f29a-fb56-4f83-84c0-58d9f2120830
Crash Signature: [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] → [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ]
Attached patch fix (obsolete) — Splinter Review
Attachment #663903 - Flags: review?(mconley)
Comment on attachment 663903 [details] [diff] [review]
fix

Review of attachment 663903 [details] [diff] [review]:
-----------------------------------------------------------------

So this looks OK to me, but we might want some people more familiar with back-end to make sure I'm doing the right thing.
Attachment #663903 - Flags: review?(mconley) → feedback+
Comment on attachment 663903 [details] [diff] [review]
fix

Irving:

This looks like an OK solution, but it also might be a kind of end-of-pipe-y solution. I'm not sure though, seeing as this mailnews back-endy stuff isn't really my forte.

Care to give this a look?

-Mike
Attachment #663903 - Flags: review?(irving)
Comment on attachment 663903 [details] [diff] [review]
fix

Review of attachment 663903 [details] [diff] [review]:
-----------------------------------------------------------------

Do you have steps to reproduce this? With a bit of digging and help from :rkent I came up with a possible cause for the inconsistent value; I'll attach a Work In Progress patch for feedback.

Whatever we decide to do with my WIP, adding the check from this patch is a good idea.

::: mailnews/base/search/src/nsMsgSearchSession.cpp
@@ +462,5 @@
>      return NS_OK;
>  
> +  // OnStopRunningUrl may set index to out of range
> +  if (m_urlQueueIndex >= m_urlQueue.Length())
> +    return NS_OK;

Because this is error checking for an unusual software state, I'd prefer to see

NS_ENSURE_TRUE(m_urlQueueIndex < m_urlQueue.Length(), NS_ERROR_UNEXPECTED);
Attachment #663903 - Flags: review?(irving) → review-
I'm not completely sure about this patch; it makes a significant change to the logic for when we call GetNextUrl() and it's possible that some of the removed calls are necessary
Attachment #665622 - Flags: feedback?(mozilla)
Attachment #665622 - Flags: feedback?(m_kato)
Attachment #665622 - Flags: feedback?(kent)
Attachment #665622 - Flags: feedback?(m_kato) → feedback+
Comment on attachment 665622 [details] [diff] [review]
WIP - clean up logic for possible underlying cause of this crash

I spent a bit of time looking at this. I think it is OK, but it is worth documenting why.

My concern was what happens when we have needUrlQueue == true but queuedUrl == false. That is the condition that I believe leads to the crash in the first place.

What happens is that in DoNextSearch we call SearchWOUrls(), which makes no sense because if needUrlQueue == true then you really can't search without a URL.

What happens though is that you end up eventually in nsMsgSearchSession::TimeSliceSerial which tries to do the invalid search using:

  nsresult rv = scope->TimeSlice(aDone);

which does on the relevant adapter:

  return m_adapter->Search(aDone);
  
Both news and imap search return errors in that case, and we end up cleaning up with:

 if (*aDone || NS_FAILED(rv))
  {
    EnableFolderNotifications(true);
    ReleaseFolderDBRef();
    m_idxRunningScope++;
    EnableFolderNotifications(false);
    // check if the next scope is an online search; if so,
    // set *aDone to true so that we'll try to run the next
    // search in TimerCallback.
    scope = GetRunningScope();
    if (scope && (scope->m_attribute == nsMsgSearchScope::onlineMail ||
      (scope->m_attribute == nsMsgSearchScope::news && scope->m_searchServer)))
    {
      *aDone = true;
      return rv;
    }
  }

That seems to me to be the correct cleanup. Note however that the above code is another place where you should implement needUrlQueue.
Attachment #665622 - Flags: feedback?(kent) → feedback+
m_kato, are you interested in implementing Kent's suggestion? I don't have time to work on this right now.
Comment on attachment 665622 [details] [diff] [review]
WIP - clean up logic for possible underlying cause of this crash

So is this happening in mixed scope searches, e.g., a saved search that searches over both imap and local folders? I wonder if it's possible to recreate the crash and add a test.

nit - extra blank line after needUrlQueue:

+}
+
+
Attachment #665622 - Flags: feedback?(mozilla) → feedback+
(In reply to Irving Reid (:irving) from comment #8)
> m_kato, are you interested in implementing Kent's suggestion? I don't have
> time to work on this right now.
Flags: needinfo?(m_kato)
Summary: crash in nsACString_internal::Assign due to out of range access → searching crash in nsACString_internal::Assign due to out of range access
(In reply to Irving Reid (:irving) from comment #8)
> m_kato, are you interested in implementing Kent's suggestion? I don't have
> time to work on this right now.
10 of 13 crash comments mention compact. Perhaps they are in unified inbox?
For example bp-6c8da6f1-7158-4071-9be0-186052121111 bp-cf3fafb1-ca2a-4f89-a0d9-8c7bb2121115
so far, all the user I have contacted are using unified folder view - so the search is operating on a virtual folder.
Summary: searching crash in nsACString_internal::Assign due to out of range access → searching virtual crash in nsACString_internal::Assign due to out of range access
top 45 crash for TB17.0.2
Keywords: topcrash
(In reply to David :Bienvenu from comment #9)
> So is this happening in mixed scope searches, e.g., a saved search that
> searches over both imap and local folders? 

Irving, Is there a clue that indicates both type folders are involved?
Also, this hasn't picked been up yet by m_kato - perhaps won't be able to. Are you in a better position now to possibly finish it off?

I am trying to get a user testcase.
Flags: needinfo?(irving)
With combined signatures, it's #36 top crasher in TB 17.0.2 so not a top crasher according to https://wiki.mozilla.org/CrashKill/Topcrash
Keywords: topcrash
sad to see this languish halfway

(In reply to David :Bienvenu from comment #9)
> Comment on attachment 665622 [details] [diff] [review]
> WIP - clean up logic for possible underlying cause of this crash
> 
> So is this happening in mixed scope searches, e.g., a saved search that
> searches over both imap and local folders? I wonder if it's possible to
> recreate the crash and add a test.

I might have a user of Unified Folders in bp-b23140ab-c30d-4cdc-b84b-001062130211.
We had a user n bug 739949, but that ship has passed.
bp-4ed28f32-e756-4b07-8cdc-e761f2121115 uses Unified but cannot reproduce

I don't think we can't depend on finding a user to move this bug along 


(In reply to Scoobidiver from comment #16)
> With combined signatures, it's #36 top crasher in TB 17.0.2 so not a top
> crasher according to https://wiki.mozilla.org/CrashKill/Topcrash

I expect the sum of crashes makes this easily top 20-25 crash
https://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&range_value=1&range_unit=weeks&date=02%2F20%2F2013+14%3A14%3A52&query_search=signature&query_type=contains&query=nsACString_internal%3A%3AAssign&reason=&build_id=&process_type=any&hang_type=any&do_query=1
Blocks: 739949
Keywords: topcrash
Summary: searching virtual crash in nsACString_internal::Assign due to out of range access → searching virtual crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Some compact related
Hi,
Wayne suggested I join this bug report because this crash has happened to me several times and so I might be able to help.
I'm happy to try and help but I'm not a techie, so if you want to ask questions, etc. words of one syllable please.
(In reply to DaveL from comment #18)
> Hi,
> Wayne suggested I join this bug report because this crash has happened to me
> several times and so I might be able to help.
> I'm happy to try and help but I'm not a techie, so if you want to ask
> questions, etc. words of one syllable please.

Hi,

I'm here for the same reason as DaveL. Furthermore, I think I might qualify as techie, so maybe I can cope with two syllable words wrt. tech stuff ;)

So, please advise how I could help.
I suppose that I could finish this patch adding my suggestions from comment 7, but irving who would normally review probably should not review since it is his patch to begin with. We are sort of running out reviewers here (my other patches to Irving and Standard8 are at 13 days and counting now, so I am guessing they are probably absorbed in other Mozilla work now.)

I could also do the review if someone else finished the patch. Would any qualified reviewer be willing to review if I finished the patch?
curious that coders no outstrip reviewers. :)

Kent, given your deep understanding of the issues involved, I suggest plunging ahead to produce an updated patch and we'll be resourceful in getting the review done - maybe Neil?  Or can magnus review this if you review his patch in bug 558452? :
Flags: needinfo?(m_kato) → needinfo?(neil)
Only DoNextSearch() calls BuildUrlQueue() and it does so only if the current (m_idxRunningScope) search scope term is online. BuildUrlQueue() then tries to add all of the remaining online search scope terms to the URL queue. We then run the URLs, skipping the search scope terms as we go! This looks wrong if it is possible to have a mixture of online and offline search scope terms.
The only reason I can see for the crash is if no online terms have a search adapter. I don't know what a search adapter is, or what should happen if we hit such a search term.
I agree with the comments regarding needUrlQueue though.
Flags: needinfo?(neil)
Just had another crash.
Id: http://crash-stats.mozilla.com/report/index/bp-f64f74b2-3e64-4991-98af-74f0d2130226.
Sequence:
1.delete item from inbox
2. Popup: do you want to compact now?
3. Yes: crash
Running with unified view and with TB Conversations add-on.
First for a while.
Flags: needinfo?(irving)
With combined signatures, it's #32 crasher in TB 17.0.5.
Keywords: topcrash
(In reply to Scoobidiver from comment #24)
> With combined signatures, it's #32 crasher in TB 17.0.5.

But there are more specific signatures than currently cited in the field. (ref comment 17)  For example ...

mozalloc_abort | NS_DebugBreak_P | nsACString_internal::Assign -- bp-d0725e1d-1240-4aa5-8a59-247c92130424 cites compact // bp-4409b8c6-a61c-4cde-842f-ad3262130410 emptying trash

nsACString_internal::Assign -- bp-b2f19de7-2ce5-4d54-a3db-534bb2130417 compact

mozalloc_abort(char const* const) | NS_DebugBreak_P | nsACString_internal::Assign(char const*, unsigned int) -- bp-c6f38f53-dffb-48f6-a99d-5a2692130426
Crash Signature: [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ] → [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ] [@ mozalloc_abort | NS_DebugBreak_P | nsACString_internal::Assign ] …
Keywords: topcrash
Kent, does comment 22 offer enough additional fodder for you to proceed?  (hopefully)
Flags: needinfo?(kent)
Summary: searching virtual crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Some compact related → searching virtual crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Often compact related.
I've spent a couple of days now playing with the search session logic, trying to see if I am comfortable changing this (because if I change it, I'm afraid that I will own any future issues).

Perhaps in agreement with comment 22, the logic of managing multiple URLs does not seem to make any sense to me. On each call to DoNextSearch where the current scope requires a URL, we append a list of all of the urls to the queue. Doesn't that mean that if there is a random mix of scopes that need or don't need urls, then we will add the urls multiple times?

It also seems like there is the assumption in OnStopRunningUrl that the current running scope & adapter match the url, yet it seems like urls are pulled from the queue independently of the running scope.

I've spent two days trying to figure out how this is supposed to work, and can't convince myself that it does. What I am tempted to do (and have a patch for) is to strip the whole idea of a url queue, and just have a single active url that matches the scope and adapter.

Does anyone understand this better than I do, to convince that is a bad idea?
Flags: needinfo?(kent)
Kent thanks for the analysis!

(In reply to Kent James (:rkent) from comment #27)
> I've spent two days trying to figure out how this is supposed to work, and
> can't convince myself that it does. What I am tempted to do (and have a
> patch for) is to strip the whole idea of a url queue, and just have a single
> active url that matches the scope and adapter.
> 
> Does anyone understand this better than I do, to convince that is a bad idea?

There being no objection ... do you want to float the patch further?
Flags: needinfo?(kent)
Not a significant enough crasher to track specifically for 24, but we'll obviously take a patch if we get it.
(In reply to Wayne Mery (:wsmwk) from comment #28)
> Kent thanks for the analysis!
> 
> (In reply to Kent James (:rkent) from comment #27)
> > I've spent two days trying to figure out how this is supposed to work, and
> > can't convince myself that it does. What I am tempted to do (and have a
> > patch for) is to strip the whole idea of a url queue, and just have a single
> > active url that matches the scope and adapter.
> > 
> > Does anyone understand this better than I do, to convince that is a bad idea?
> 
> There being no objection ... do you want to float the patch further?

m_kato, might you be interested in finishing patch?
Flags: needinfo?(m_kato)
https://getsatisfaction.com/mozilla_messaging/tags/bug_789397 reports crashing on every automatic and manual compact
(marginally top 20, so still topcrash.
skiplisting nsACString_internal::Assign would really nice)
Crash Signature: [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ] [@ mozalloc_abort | NS_DebugBreak_P | nsACString_internal::Assign ] … → [@ mozalloc_abort | NS_DebugBreak_P | nsACString_internal::Assign ] [@ nsACString_internal::Assign ] [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | nsACString_internal::Assign(char const*, unsigned int) ] [@ nsACString_internal::Assign(nsACSt…
Whiteboard: [gs] → [patchlove][has draft patch][gs]
Depends on: 557234
bp-7f138340-4e13-4330-be7f-f21c82130717 David reports using Unified Folders view and "crashes only occur when Thunderbird initiates the process. [i.e. automatic compact triggered by an immediately preceding delete message] When Thunderbird is still - not pulling in any emails - I can compact the folders myself, and 99% of the time, the process completes"
(In reply to :Irving Reid from comment #6)
> Created attachment 665622 [details] [diff] [review]
> WIP - clean up logic for possible underlying cause of this crash
> 
> I'm not completely sure about this patch; it makes a significant change to
> the logic for when we call GetNextUrl() and it's possible that some of the
> removed calls are necessary

Irving, have subsequent patch revisions, or comments satisfied your concerns on this point?

I'd like to do this as a try build with some users who can reproduce. If the patch is flawed, what ill effects should user watch for?


(In reply to David :Bienvenu from comment #9)
> Comment on attachment 665622 [details] [diff] [review]
> WIP - clean up logic for possible underlying cause of this crash
> 
> So is this happening in mixed scope searches, e.g., a saved search that
> searches over both imap and local folders?

"Yes" would be my answer - many of the crash users are using unified folder view.  (However, I have not determined if 100% of cases are mixed scope)

> I wonder if it's possible to recreate the crash and add a test.

If you mean users, we have users who can reproduce.
Flags: needinfo?(irving)
Summary: searching virtual crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Often compact related. → searching virtual folder. crash in nsACString_internal::Assign due to out of range access from nsMsgSearchSession::GetNextUrl. Often compact related.
This patch needs someone to integrate Kent's remarks; at this point I don't have time to put further work into this.
Flags: needinfo?(irving)
Wayne asked today in the TB meeting if someone would take a look at this, and I agreed to look at it again.
Flags: needinfo?(kent)
Attached patch 789397_v3.patchSplinter Review
So here's my theory. The urlQueue was included to support running multiple URLs in parallel, but in fact that was never actually implemented. The logic to support those queues has issues. So I just eliminated the queues and only use a single URL.

I'm not real happy with the error handling in all of this, but I did not really change this. Evidence seems to be that the error happens in cases that are likely to be generating errors. In error cases, the errors seem to be dropped, which would result in the search never terminating. I suppose that is better than crashing.
Attachment #8460753 - Flags: review?(irving)
Comment on attachment 8460753 [details] [diff] [review]
789397_v3.patch

Review of attachment 8460753 [details] [diff] [review]:
-----------------------------------------------------------------

removing unnecessary complication makes me almost as happy as these fresh local plums i'm eating
Attachment #8460753 - Flags: review?(irving) → review+
for unknown reason - no longer topcrash

signature for version 24 is nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&) | nsACString_internal::Assign(nsACString_internal const&) | nsMsgSearchSession::GetNextUrl()

and in version 31 there is some OOM | large | NS_ABORT_OOM(unsigned int) | nsACString_internal::Assign(nsACString_internal const&) | nsMsgSearchSession::GetNextUrl()
Crash Signature: , unsigned int) ] [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ] → , unsigned int) ] [@ nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&)] [@ memcpy | nsACString_internal::Assign(char const*, unsigned int, mozilla::fallible_t const&) ] [@ nsACString_internal::Assign(nsACString_interna…
Flags: needinfo?(m_kato)
Whiteboard: [patchlove][has draft patch][gs] → [gs]
Assignee: nobody → kent
Status: NEW → ASSIGNED
Checked in https://hg.mozilla.org/comm-central/rev/33d45705a4d1
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 36.0
Attachment #663903 - Attachment is obsolete: true
Attachment #665622 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: