Last Comment Bug 770262 - crash in mozalloc_abort ... nsMsgSearchTerm::MatchBody
: crash in mozalloc_abort ... nsMsgSearchTerm::MatchBody
Status: RESOLVED FIXED
[gs][gssolved]
: crash, regression, topcrash+
Product: MailNews Core
Classification: Components
Component: Search (show other bugs)
: Trunk
: x86 Windows NT
: -- critical (vote)
: Thunderbird 18.0
Assigned To: Kent James (:rkent)
:
Mentors:
https://getsatisfaction.com/mozilla_m...
: 786955 787626 790172 790282 813903 833243 (view as bug list)
Depends on:
Blocks: 737164
  Show dependency treegraph
 
Reported: 2012-07-02 11:04 PDT by Wayne Mery (:wsmwk, NI for questions)
Modified: 2013-01-28 03:29 PST (History)
20 users (show)
rkent: in‑testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
+
fixed
+
fixed
fixed


Attachments
Check for errors (1.36 KB, patch)
2012-08-29 11:45 PDT, Kent James (:rkent)
no flags Details | Diff | Review
Rev2: Consolidation of comments (1.32 KB, patch)
2012-08-31 15:00 PDT, Kent James (:rkent)
neil: review+
standard8: approval‑comm‑aurora+
standard8: approval‑comm‑beta+
standard8: approval‑comm‑release+
Details | Diff | Review

Description Wayne Mery (:wsmwk, NI for questions) 2012-07-02 11:04:48 PDT
This bug was filed from the Socorro interface and is 
report bp-cf3afdb6-5bb7-4c04-9042-080e12120515 .
============================================================= 

15.0a1 is where crash first appears. And seen in 
15.0a2 	20120630
16.0a1 	20120629

0	mozalloc.dll	mozalloc_abort	memory/mozalloc/mozalloc_abort.cpp:79
1	xul.dll	NS_DebugBreak_P	xpcom/base/nsDebugImpl.cpp:404
2	xul.dll	nsACString_internal::SetCapacity	xpcom/string/src/nsTSubstring.cpp:565
3	xul.dll	nsACString_internal::SetLength	xpcom/string/src/nsTSubstring.cpp:614
4	xul.dll	nsMsgSearchTerm::MatchBody	mailnews/base/search/src/nsMsgSearchTerm.cpp:993
5	xul.dll	nsMsgSearchOfflineMail::ProcessSearchTerm	mailnews/base/search/src/nsMsgLocalSearch.cpp:529
6	xul.dll	nsMsgSearchBoolExpression::OfflineEvaluate	mailnews/base/search/src/nsMsgLocalSearch.cpp:172
7	xul.dll	nsMsgSearchBoolExpression::OfflineEvaluate	mailnews/base/search/src/nsMsgLocalSearch.cpp:193
8	xul.dll	nsMsgSearchBoolExpression::OfflineEvaluate	mailnews/base/search/src/nsMsgLocalSearch.cpp:193
9	xul.dll	nsMsgSearchOfflineMail::MatchTerms	mailnews/base/search/src/nsMsgLocalSearch.cpp:720
10	xul.dll	nsMsgSearchOfflineMail::MatchTermsForSearch	mailnews/base/search/src/nsMsgLocalSearch.cpp:364
11	xul.dll	nsMsgSearchOfflineMail::Search	mailnews/base/search/src/nsMsgLocalSearch.cpp:770
12	xul.dll	nsMsgSearchScopeTerm::TimeSlice	mailnews/base/search/src/nsMsgSearchTerm.cpp:1921
13	xul.dll	nsMsgSearchSession::TimeSliceSerial	mailnews/base/search/src/nsMsgSearchSession.cpp:676
14	xul.dll	nsMsgSearchSession::TimerCallback	mailnews/base/search/src/nsMsgSearchSession.cpp:524
15	xul.dll	nsTimerImpl::Fire	xpcom/threads/nsTimerImpl.cpp:508
Comment 1 Wayne Mery (:wsmwk, NI for questions) 2012-07-02 11:13:14 PDT
#1 crash in dailies
Comment 2 Wayne Mery (:wsmwk, NI for questions) 2012-08-07 13:55:41 PDT
Mark - suggest this is needs to be fixed for TB15 - #2 crash for tb15 beta 

(if one excludes #2 crash arena_dalloc | libsystem_c.dylib@0x2d8c7 coming from a single Mac user)
Comment 3 Wayne Mery (:wsmwk, NI for questions) 2012-08-07 14:01:07 PDT
suspect also Mac crash TouchBadMemory | mozalloc_abort | NS_DebugBreak_P | nsACString_internal::SetLength bp-43aa4258-b5f2-41d4-aa51-1d28b2120806
Comment 4 Kent James (:rkent) 2012-08-08 10:48:35 PDT
This is probably a regression from Bug 737164 - Make strings malloc-infallible by default

Without trying to understand the root causes, it might be possible to prevent the crash by switching buf from nsCAutoString to nsCString, or we could catch the OOM error with the new fallible forms of the string function.
Comment 5 Mark Banner (:standard8) 2012-08-08 10:53:11 PDT
I was casually thinking about this earlier, and wondering if this line:

http://hg.mozilla.org/comm-central/annotate/b79072737e40/mailnews/base/search/src/nsMsgSearchTerm.cpp#l993

could be calling SetLength with -1.

What I don't really like is the way the previous function call plays around with the buffer of the nsCAutoString without actually using that object. I don't think that would cause an issue but it doesn't really look nice or obvious.
Comment 6 Mark Banner (:standard8) 2012-08-08 10:55:34 PDT
Oh, on the SetLength issue, I could see a buf such as "=\n" causing the crash (because MsgStripQuotedPrintable would strip this to "" - zero length string and with softLineBreak = 1 would give a -1 to SetLength)

Although I'm not convinced we'd actually get a situation where that buf possible.
Comment 8 Kent James (:rkent) 2012-08-29 09:23:56 PDT
The easiest solution to this would be to just use the fallible form of the SetCapacity function, like this example from nsXMLHttpRequest.cpp:

854   if (!mResponseText.SetCapacity(mResponseText.Length() + destBufferLen, fallible_t())) {
855     return NS_ERROR_OUT_OF_MEMORY;
Comment 9 Kent James (:rkent) 2012-08-29 11:45:51 PDT
Created attachment 656540 [details] [diff] [review]
Check for errors

I cannot reproduce this crash, so I'm simply adding checks in this patch. Not asking for review yet since I have not build yet on TB 18, only on TB 17.
Comment 10 Wayne Mery (:wsmwk, NI for questions) 2012-08-29 13:59:21 PDT
TB16 signature for Mac is mozalloc_abort | NS_DebugBreak_P | nsACString_internal::SetLength  bp-cade7bef-b36c-4470-93bd-b0f812120809
Comment 11 Makoto Kato [:m_kato] (PTO 6/20-21, 6/24) 2012-08-29 19:35:21 PDT
Comment on attachment 656540 [details] [diff] [review]
Check for errors

Should we use buf.Length() instead of strlen(buf.get())?
Comment 12 Makoto Kato [:m_kato] (PTO 6/20-21, 6/24) 2012-08-29 19:46:18 PDT
Also, MsgStripQuotedPrintable ((unsigned char*)buf.get()); is invalid usage for Mozilla code-style. If we need writable access, you have to use BeginWriting()/EndWriting() instread of
Comment 13 Makoto Kato [:m_kato] (PTO 6/20-21, 6/24) 2012-08-29 20:00:22 PDT
Comment on attachment 656540 [details] [diff] [review]
Check for errors

Sorry, I cancel this.  MsgStripQuotedPrintable can causes bug easy.  We should use nsACString by this function.
Comment 14 Andrew Sutherland [:asuth] 2012-08-29 20:09:49 PDT
I think you'll want to ask :bienvenu, :rkent, :NeilAway, or :irving (reid) for review on this for your next rev of the patch.
Comment 15 Ludovic Hirlimann [:Usul] 2012-08-30 01:41:59 PDT
*** Bug 786955 has been marked as a duplicate of this bug. ***
Comment 16 Kent James (:rkent) 2012-08-30 11:00:47 PDT
:asuth I am not a valid reviewer for that patch because a) I wrote it, and b) I am not a legal reviewer.

To comment 12: My preference for this would be to do the most conservative patch possible, with the hopes of getting this into a 15.0.1 release. So though there are aspects to the underlying code I don't really like, any change beyond the more simple check is probably too risky for 15.0.1

I'm going to ask for review, with the one caveat: I don't really understand how mozilla::fallible_t() works, but that seems to be the way this check is done.
Comment 17 neil@parkwaycc.co.uk 2012-08-30 12:17:34 PDT
Comment on attachment 656540 [details] [diff] [review]
Check for errors

I'm confused here, because you're checking that you're not passing ~0U to SetLength but you're also calling the fallible version?
Comment 18 Kent James (:rkent) 2012-08-30 13:28:21 PDT
I am doing that because I do not know really why it is crashing, so I am adding as many checks as feasible.

And what I am trying to avoid with the first check is not doing SetLength((uint32_t)(-1))
Comment 19 :Irving Reid (No longer working on Firefox) 2012-08-31 11:01:55 PDT
(In reply to Makoto Kato from comment #11)
> 
> Should we use buf.Length() instead of strlen(buf.get())?

Because MsgStripQuotedPrintable alters the underlying string, I suspect buf.Length() might be invalid. strlen() on the underlying buffer gets the altered length.

(In reply to Makoto Kato from comment #13)
> 
> Sorry, I cancel this.  MsgStripQuotedPrintable can causes bug easy.  We
> should use nsACString by this function.

I agree; maybe not for a 15.0.1 patch but certainly for a permanent solution.


I'd rather not hide our logic behind a ternary operator; how about

size_t bufLength = strlen(buf.get());
if ((bufLength > 0) && softLineBreak)
  bufLength -= 1;
buf.setLength(bufLength);

I don't have a strong opinion about whether to also use fallible_t.
Comment 20 Kent James (:rkent) 2012-08-31 12:28:43 PDT
(In reply to Irving Reid (:irving) from comment #19)
> 
> I'd rather not hide our logic behind a ternary operator; how about
> 
> size_t bufLength = strlen(buf.get());
> if ((bufLength > 0) && softLineBreak)
>   bufLength -= 1;
> buf.setLength(bufLength);
>

In the past, I've had reviews rejected because I added extra code like that for explanatory purposes rather than collapse the result into a ternary operator. Neil, before I bother to change this could you tell me your preference?

> I don't have a strong opinion about whether to also use fallible_t.

The reason for including this is that we really don't know what is causing the crash. using -1 for bufLength is one theory, but in case that is wrong don't you think we should include the other option?

I did this patch because of Wayne's flags, not because I am the expert or have some special insight here. Irving feel free to propose your own patch if you like - it might move this forward more quickly.
Comment 21 neil@parkwaycc.co.uk 2012-08-31 12:58:55 PDT
(In reply to Kent James from comment #20)
> (In reply to Irving Reid from comment #19)
> > I'd rather not hide our logic behind a ternary operator; how about
> > 
> > size_t bufLength = strlen(buf.get());
> > if ((bufLength > 0) && softLineBreak)
> >   bufLength -= 1;
> > buf.setLength(bufLength);
> 
> In the past, I've had reviews rejected because I added extra code like that
> for explanatory purposes rather than collapse the result into a ternary
> operator. Neil, before I bother to change this could you tell me your
> preference?
I slightly prefer this version in the case, as long as you use -- ;-)

> > I don't have a strong opinion about whether to also use fallible_t.
> 
> The reason for including this is that we really don't know what is causing
> the crash. using -1 for bufLength is one theory, but in case that is wrong
> don't you think we should include the other option?
> 
Are we seeing crashes on trunk/aurora, in which case we can tell if it works?
(i.e. I'd rather avoid fallible_t if I can.)
Comment 22 Wayne Mery (:wsmwk, NI for questions) 2012-08-31 13:40:40 PDT
(In reply to neil@parkwaycc.co.uk from comment #21)
> Are we seeing crashes on trunk/aurora, in which case we can tell if it works?

There are roughly 4-5 crashes per day per version for TB16 and Tb17, so the crash is prevalent enough that we should know the impact within a day or two of landing a fix.  Not so for trunk - there are not enough users there - TB17.0a1 had only 1-2 crashes per week for the past month.  If you want to know the impact you are going to have to land it on aurora IMO.
Comment 23 :Irving Reid (No longer working on Firefox) 2012-08-31 13:42:27 PDT
(In reply to Kent James (:rkent) from comment #20)
> In the past, I've had reviews rejected because I added extra code like that
> for explanatory purposes rather than collapse the result into a ternary
> operator.

I will happily join you in fighting those people.

(In reply to neil@parkwaycc.co.uk from comment #21)
> I slightly prefer this version in the case, as long as you use -- ;-)

I was wondering if that would push anyone's button ;->

> Are we seeing crashes on trunk/aurora, in which case we can tell if it works?
> (i.e. I'd rather avoid fallible_t if I can.)

I say we land it without fallible_t, ask standard8 to push it to aurora, and watch crashes for a few days.
Comment 24 Kent James (:rkent) 2012-08-31 15:00:28 PDT
Created attachment 657456 [details] [diff] [review]
Rev2: Consolidation of comments
Comment 25 Kent James (:rkent) 2012-09-01 10:42:11 PDT
Comment on attachment 657456 [details] [diff] [review]
Rev2: Consolidation of comments

Checked in https://hg.mozilla.org/comm-central/rev/25cd109a6a33
Comment 26 Wayne Mery (:wsmwk, NI for questions) 2012-09-03 06:25:18 PDT
Comment on attachment 657456 [details] [diff] [review]
Rev2: Consolidation of comments

standard8?

[Approval Request Comment]
ref comment 23
Comment 27 Ludovic Hirlimann [:Usul] 2012-09-04 07:33:19 PDT
*** Bug 787626 has been marked as a duplicate of this bug. ***
Comment 28 Mark Banner (:standard8) 2012-09-04 11:55:35 PDT
Comment on attachment 657456 [details] [diff] [review]
Rev2: Consolidation of comments

[Triage Comment]
Lets get this landed on aurora and beta for some testing.
Comment 29 Kent James (:rkent) 2012-09-04 12:06:24 PDT
Mark, can you please do the beta and aurora checkins?
Comment 31 Mark Banner (:standard8) 2012-09-07 07:55:57 PDT
Comment on attachment 657456 [details] [diff] [review]
Rev2: Consolidation of comments

[Triage Comment]
Taking for release for a 15.0.1.
Comment 32 Mark Banner (:standard8) 2012-09-07 07:59:39 PDT
Checked in:

http://hg.mozilla.org/releases/comm-release/rev/b4366b1e750c
Comment 33 javad 2012-09-11 10:25:29 PDT
seems to hve been fixed in 15.0.1
Comment 34 Wayne Mery (:wsmwk, NI for questions) 2012-09-13 12:31:31 PDT
*** Bug 790172 has been marked as a duplicate of this bug. ***
Comment 35 Wayne Mery (:wsmwk, NI for questions) 2012-09-13 17:34:31 PDT
also being reported as solved in getstatisfaction.
crash-stats looks pretty clean.
Comment 36 ole.langbehn 2012-09-16 07:14:37 PDT
*** Bug 790282 has been marked as a duplicate of this bug. ***
Comment 37 klonos 2012-10-03 01:24:45 PDT
...coming from bug 790282.

Still seeing this in latest 18.0a1 nightlies (x64 on Win7 x64):

- manually create a simple message filter rule (move email from Inbox to certain subfolder).
- save filter and execute it on inbox.
- tb crashes.
- after restarting tb, the message is *copied* (instead of moved) in the folder specified in the filter. Original message remain in Inbox.

How can I troubleshoot this further?

PS: I'm using mail.serverDefaultStoreContractID @mozilla.org/msgstore/maildirstore;1 but I've had this with the default @mozilla.org/msgstore/berkeleystore;1 too. Don't know if that's useful info, but I thought I should mention it since it deviates from the default config.
Comment 38 Ludovic Hirlimann [:Usul] 2012-10-03 01:29:14 PDT
(In reply to klonos from comment #37)
> ...coming from bug 790282.
> 
> Still seeing this in latest 18.0a1 nightlies (x64 on Win7 x64):
> 
> - manually create a simple message filter rule (move email from Inbox to
> certain subfolder).
> - save filter and execute it on inbox.
> - tb crashes.
> - after restarting tb, the message is *copied* (instead of moved) in the
> folder specified in the filter. Original message remain in Inbox.
> 
> How can I troubleshoot this further?
> 
> PS: I'm using mail.serverDefaultStoreContractID
> @mozilla.org/msgstore/maildirstore;1 but I've had this with the default
> @mozilla.org/msgstore/berkeleystore;1 too. Don't know if that's useful info,
> but I thought I should mention it since it deviates from the default config.

Try to catch it in a debugger and bug some devs on #maildev when you do (try rkent, irving or standard8)
Comment 39 Wayne Mery (:wsmwk, NI for questions) 2012-10-03 04:41:40 PDT
please file a new bug
Comment 40 Mark Banner (:standard8) 2012-10-03 04:59:45 PDT
(In reply to klonos from comment #37)
> ...coming from bug 790282.
> 
> Still seeing this in latest 18.0a1 nightlies (x64 on Win7 x64):

You're likely seeing something different, especially as I see no crash reports for this crash on 18.0a1. As suggested, please file a new bug and include your steps to repeat and crash ids (https://support.mozillamessaging.com/en-US/kb/Mozilla-Crash-Reporter#w_viewing-crash-reports).

> PS: I'm using mail.serverDefaultStoreContractID
> @mozilla.org/msgstore/maildirstore;1 but I've had this with the default
> @mozilla.org/msgstore/berkeleystore;1 too. Don't know if that's useful info,
> but I thought I should mention it since it deviates from the default config.

Please also mention this, it is entirely possible we've fixed the default case but not the maildirstore one (or they are just different cases anyway).
Comment 41 klonos 2012-10-03 19:07:07 PDT
Yes, you are right, the title of the other bug marked as duplicate of this one here had a misleading title (using the term "message filter" when it was actually referring to message quicksearch).

I went ahead and filed bug 797659
Comment 42 Wayne Mery (:wsmwk, NI for questions) 2013-01-27 14:55:54 PST
*** Bug 813903 has been marked as a duplicate of this bug. ***
Comment 43 Wayne Mery (:wsmwk, NI for questions) 2013-01-28 03:29:40 PST
*** Bug 833243 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.