Last Comment Bug 857678 - crash in nsMsgDatabase::RowCellColumnToCollationKey
: crash in nsMsgDatabase::RowCellColumnToCollationKey
Status: RESOLVED FIXED
[tbird betatopcrash][regression:TB22]
: crash, regression, topcrash
Product: MailNews Core
Classification: Components
Component: Database (show other bugs)
: 22
: All All
: -- critical (vote)
: Thunderbird 26.0
Assigned To: Magnus Melin
:
Mentors:
Depends on:
Blocks: 834757
  Show dependency treegraph
 
Reported: 2013-04-03 10:59 PDT by Wayne Mery (:wsmwk, NI for questions)
Modified: 2013-09-10 02:15 PDT (History)
11 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
+
fixed
fixed
fixed
fixed
fixed


Attachments
proposed fix (2.16 KB, patch)
2013-06-14 11:26 PDT, Magnus Melin
Pidgeot18: review+
jh: feedback-
Details | Diff | Splinter Review
Possible patch (629 bytes, patch)
2013-06-27 01:32 PDT, neil@parkwaycc.co.uk
Pidgeot18: review+
standard8: approval‑comm‑aurora+
standard8: approval‑comm‑beta+
standard8: approval‑comm‑release+
Details | Diff | Splinter Review
Another possible fix (14.18 KB, patch)
2013-07-15 18:25 PDT, Hiroyuki Ikezoe (:hiro)
neil: review-
jh: feedback+
Details | Diff | Splinter Review
Add null check barrier (1.95 KB, patch)
2013-07-24 01:11 PDT, Hiroyuki Ikezoe (:hiro)
no flags Details | Diff | Splinter Review
Add null check barrier (1003 bytes, patch)
2013-07-24 01:14 PDT, Hiroyuki Ikezoe (:hiro)
neil: review-
Details | Diff | Splinter Review
Add null check barrier v2 (1009 bytes, patch)
2013-08-20 01:38 PDT, Hiroyuki Ikezoe (:hiro)
neil: review+
standard8: approval‑comm‑aurora+
standard8: approval‑comm‑beta+
Details | Diff | Splinter Review

Description Wayne Mery (:wsmwk, NI for questions) 2013-04-03 10:59:45 PDT
This bug was filed from the Socorro interface and is 
report bp-ff9b71a1-3473-432e-892c-42b6a2130323 .
============================================================= 
unknown reporter "Crashing when accessing a newsgroup."

This bug was filed from the Socorro interface and is 
report bp-ff9b71a1-3473-432e-892c-42b6a2130323 .
============================================================= 
0	libxul.so	nsMsgDatabase::RowCellColumnToCollationKey	objdir-tb/mozilla/dist/include/nsCharTraits.h:514
1	libxul.so	morkBookAtom::EqualFormAndBody const	db/mork/src/morkAtom.cpp:450
2	libxul.so	morkMap::find const	db/mork/src/morkMap.cpp:268
3	libxul.so	morkMap::Get	db/mork/src/morkMap.cpp:698
4	libxul.so	nsMsgDBView::GetCollationKey	mailnews/base/src/nsMsgDBView.cpp:4067
5	libxul.so	nsMsgDatabase::YarnTonsString	objdir-tb/mozilla/dist/include/nsTSubstring.h:503
6	libxul.so	nsMsgDatabase::RowCellColumnTonsString	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3483
7	libxul.so	nsMsgDatabase::GetPropertyAsNSString	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3933
8	libxul.so	nsMsgDBView::GetCurCustomColumn	objdir-tb/mozilla/dist/include/nsCOMPtr.h:453
9	libxul.so	nsMsgDBView::GetColumnHandler	objdir-tb/mozilla/dist/include/nsTSubstring.h:85
10	libxul.so	nsMsgDBView::GetCurColumnHandlerFromDBInfo	objdir-tb/mozilla/dist/include/nsTSubstring.h:85
11	libxul.so	nsMsgDatabase::GetMsgHdrForKey	mailnews/db/msgdb/src/nsMsgDatabase.cpp:1813
Comment 1 Wayne Mery (:wsmwk, NI for questions) 2013-05-15 11:22:55 PDT
perhaps regression from bug 834757.
first crash 2013-03-21 bp-a15d794c-ae78-4fea-8974-b2b262130321 22.0a1

0	msvcr100.dll	zzz_AsmCodeRange_Begin	f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\strlen.asm:81
1	xul.dll	nsMsgDatabase::RowCellColumnToCollationKey	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3645
2	xul.dll	nsMsgHdr::GetSubjectCollationKey	mailnews/db/msgdb/src/nsMsgHdr.cpp:694
3	xul.dll	nsMsgDBView::GetCollationKey	mailnews/base/src/nsMsgDBView.cpp:4067
4	xul.dll	nsMsgDBView::GetIndexForThread	mailnews/base/src/nsMsgDBView.cpp:5072
5	xul.dll	nsMsgSearchDBView::AddHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:444
6	xul.dll	nsMsgSearchDBView::InsertHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:658
7	xul.dll	nsMsgXFVirtualFolderDBView::OnSearchHit	mailnews/base/src/nsMsgXFVirtualFolderDBView.cpp:281 

Pidgeot18@12125 3643 err = m_mimeConverter->DecodeMimeHeaderToUTF8(
Pidgeot18@12125 3644 nsDependentCString(nakedString), charSet.get(), characterSetOverride,
Pidgeot18@12125 3645 true, decodedStr);
Comment 2 Joshua Cranmer [:jcranmer] 2013-05-15 12:41:02 PDT
I'm going to guess that nakedString here is null, and it looks like nsDependentCString constructors aren't happy about getting NULL pointers as a value.
Comment 3 Mark Banner (:standard8) (afk until 26th July) 2013-05-15 13:20:48 PDT
(In reply to Joshua Cranmer [:jcranmer] from comment #2)
> I'm going to guess that nakedString here is null, and it looks like
> nsDependentCString constructors aren't happy about getting NULL pointers as
> a value.

Yep, that's a definite
Comment 4 Wayne Mery (:wsmwk, NI for questions) 2013-05-30 07:10:46 PDT
datapoint - a few dozen individuals are crashing in beta in roughly one week of usage, which is a significant rate for our betas. not a plague, but significant. I'm still assessing impact but for now I'd say it would be best to have patched for TB23 beta lest it potentially hinder wider testing.

Is a testcase needed?
Comment 5 Wayne Mery (:wsmwk, NI for questions) 2013-06-14 03:37:36 PDT
bp-655f20b4-4f04-41a4-9a19-fe7282130611 rkrug is a frequent crasher
Comment 6 Magnus Melin 2013-06-14 11:26:11 PDT
Created attachment 762812 [details] [diff] [review]
proposed fix

RowCellColumnToConstCharPtr shouldn't return NS_OK when it doesn't set the outparam.
Comment 7 Joshua Cranmer [:jcranmer] 2013-06-14 11:43:28 PDT
I don't like this patch, since it makes things throw errors that didn't use to happen.

I'd prefer a more minimal fix, which I haven't had time to prepare yet.
Comment 8 Magnus Melin 2013-06-14 12:10:24 PDT
Well, it's certainly against conventions to allow ok when you pass out null - if you can't trust that you end up null-checking everything. Now would give it some time to bake before the ESR, and if there's problems we can always back it out and just null-check nakedString instead, but there might be other cases the current patch also fixes.
Comment 9 :aceman 2013-06-20 12:00:56 PDT
So any other potential reviewers?
Comment 10 Adam Hauner 2013-06-25 03:08:36 PDT
Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a2
Build identifier: 20130624013001

I was crashing with this bug with this repro:
1. I created view and saved it as folder.
2. When saved view show several e-mails, I moved them all to another folder.
3. In this moment saved view was still showing same count of unread e-mails.
4. When I click next time on saved view name in Folder Pane, SeaMonkey instantly
   crashed.

I didn't try this reproduction steps on another "physical" folder.

Relevant crash report:
https://crash-stats.mozilla.com/report/index/bp-b2344e3a-a798-46ce-abfb-6f6652130625
Comment 11 Jens Hatlak (:InvisibleSmiley) 2013-06-26 02:04:39 PDT
I experience this crash 100% each time I click one of my search folders (which happens to be a "Status is New" search across multiple folders of multiple accounts) in SM 2.19b1. Is there any workaround (other than not clicking the search folder of course)? I already tried compacting all folders.
Comment 12 :aceman 2013-06-26 02:42:06 PDT
The line in question (from the last crash report) was specifically changed by jcranmer in this patch http://hg.mozilla.org/releases/comm-aurora/diff/93e89494abfd/mailnews/db/msgdb/src/nsMsgDatabase.cpp (not sure which bug). Maybe some of the strings is null and it is not handled properly.
Comment 13 Magnus Melin 2013-06-26 03:30:29 PDT
I think it's pretty sure it's nakedString being null... (The check that it's set was bitten by the broken conventions of RowCellColumnToConstCharPtr)
Comment 14 neil@parkwaycc.co.uk 2013-06-27 01:32:26 PDT
Created attachment 768210 [details] [diff] [review]
Possible patch

How about this for a workaround?
Comment 15 Mark Banner (:standard8) (afk until 26th July) 2013-06-27 01:52:33 PDT
Looks like we now need feedback from Joshua.
Comment 16 Justin Wood (:Callek) 2013-06-27 02:22:22 PDT
joshua, for reference, urgency here is a SeaMonkey 2.19 final release due out on Tuesday, July 2. So if we can eliminate this fairly common/large crasher from our users, we'll be very happy. We just need a correctness/risk assessment by someone who knows this code in order for us to land it.
Comment 17 Justin Wood (:Callek) 2013-06-27 11:59:51 PDT
Comment on attachment 768210 [details] [diff] [review]
Possible patch

a=me for comm-release (a SeaMonkey-only branch)
Comment 18 neil@parkwaycc.co.uk 2013-06-27 12:13:06 PDT
Comment on attachment 768210 [details] [diff] [review]
Possible patch

[Approval Request Comment]
Regression caused by (bug #): 834757
User impact if declined: crash
Risk to taking this patch (and alternatives if risky): low
Comment 19 neil@parkwaycc.co.uk 2013-06-27 12:17:01 PDT
http://hg.mozilla.org/comm-central/rev/c95e6993f4df

Dunno whether this counts as fixed or whether someone wants to write some better code.
Comment 20 Magnus Melin 2013-06-27 12:23:14 PDT
Comment on attachment 762812 [details] [diff] [review]
proposed fix

I'd still like this to go in (for trunk). Passing to Joshua for a verdict, seems i never set the review flag for this.
Comment 22 Hiroyuki Ikezoe (:hiro) 2013-07-01 16:05:12 PDT
I've noticed non-null-terminated string in nsParseMailbox.cpp might be the root cause. See bug 877831 comment 13.
Comment 23 Joshua Cranmer [:jcranmer] 2013-07-06 13:04:20 PDT
Comment on attachment 762812 [details] [diff] [review]
proposed fix

The ramifications of this make me nervous, but I don't see anybody who would be negatively impacted by this as far as I could look...
Comment 24 Jens Hatlak (:InvisibleSmiley) 2013-07-11 23:15:41 PDT
I'm still seeing this with SM 2.20b1:
https://crash-stats.mozilla.com/report/index/1e94edaa-e1d1-4220-b361-529cb2130712
Comment 25 :aceman 2013-07-12 00:14:05 PDT
So cay you try the patch?
Comment 26 Jens Hatlak (:InvisibleSmiley) 2013-07-12 01:05:56 PDT
(In reply to :aceman from comment #25)
> So cay you try the patch?

You mean the one from comment 6? Only if someone provided me an SM build (ideally built from comm-beta code) containing the proposed fix within the next 5 hours. After that I'll be AFK for a week. When I'm back I'll be able to build/test myself again.
Comment 27 Hiroyuki Ikezoe (:hiro) 2013-07-15 18:25:19 PDT
Created attachment 776096 [details] [diff] [review]
Another possible fix

Here is another fix for this crash.

I am thinking the root cause of this crash is come from nsParseMailMessageState.cpp because nsParseMailMessageState passes non-null-terminated string to nsMsgHdr (then, to nsMsgDatabase).

This patch can be applied after the fix for bug 877831 is applied.
Comment 28 Jens Hatlak (:InvisibleSmiley) 2013-07-24 00:11:30 PDT
Comment on attachment 762812 [details] [diff] [review]
proposed fix

This patch does not fix the crash I'm experiencing (just compiled two builds off comm-beta tip yesterday, one vanilla and one with just this patch). Will try to create a build with the other patch later today.
Comment 29 Hiroyuki Ikezoe (:hiro) 2013-07-24 01:11:15 PDT
Created attachment 780258 [details] [diff] [review]
Add null check barrier

I don't actually know what will happen once non-null-terminated string has stored in the db.

But anyway this patch might be solve the crash in case of NULL string.
Comment 30 Hiroyuki Ikezoe (:hiro) 2013-07-24 01:12:18 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)

> But anyway this patch might be solve the crash in case of NULL string.
this patch *will* solve, I meant.
Comment 31 Hiroyuki Ikezoe (:hiro) 2013-07-24 01:14:39 PDT
Created attachment 780261 [details] [diff] [review]
Add null check barrier

I am sorry the previous patch has garbage.
Comment 32 Jens Hatlak (:InvisibleSmiley) 2013-07-25 00:32:14 PDT
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

This fixed the crash for me. :-) I built from the same c-b/m-b changesets as with my previous try, but now with only the following applied:
* attachment 766955 [details] [diff] [review] from bug 877831 (adapted/applied manually)
* attachment 776096 [details] [diff] [review] (this patch)
* attachment 780261 [details] [diff] [review] (Add null check barrier)

After I had run this build and opened the search folder which triggered the crash with other builds, I can now open the affected search folder with the other builds, too, so for the time being I won't be able to reproduce the crash anymore.
Comment 33 Scoobidiver (away) 2013-08-08 04:46:10 PDT
It's currently #1 top crasher in TB 23.0b1 that contains the possible patch fix.
Comment 35 Scoobidiver (away) 2013-08-11 01:30:06 PDT
Only 17% of crashes happen within one minute.
Comment 36 Wayne Mery (:wsmwk, NI for questions) 2013-08-15 14:26:41 PDT
(In reply to Jens Hatlak (:InvisibleSmiley) from comment #32)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> This fixed the crash for me. :-) I built from the same c-b/m-b changesets as
> with my previous try, but now with only the following applied:
> * attachment 766955 [details] [diff] [review] from bug 877831
> (adapted/applied manually)
> * attachment 776096 [details] [diff] [review] (this patch)
> * attachment 780261 [details] [diff] [review] (Add null check barrier)
> 
> After I had run this build and opened the search folder which triggered the
> crash with other builds, I can now open the affected search folder with the
> other builds, too, so for the time being I won't be able to reproduce the
> crash anymore.

nice testing

(Mark sure has been stretched thin)
Sounds like we should get these landed for trunk testing so they have potential to get in TB24.
Comment 37 Mark Banner (:standard8) (afk until 26th July) 2013-08-15 14:38:56 PDT
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

I suggest Neil should have a look at these, as I'm going to be afk for a few days.
Comment 38 neil@parkwaycc.co.uk 2013-08-18 14:51:50 PDT
Comment on attachment 780261 [details] [diff] [review]
Add null check barrier

># HG changeset patch
># Parent 2827cd29313bb7118826c9a39a5613cd2ebe3596
># User Hiroyuki Ikezoe <hiikezoe@mozilla-japan.org>
>Bug 857678 - Add null check barrier in nsMsgDatabase::RowCellColumnToCollationKey
>
>diff --git a/mailnews/db/msgdb/src/nsMsgDatabase.cpp b/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>--- a/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>+++ b/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>@@ -3750,17 +3750,17 @@ nsresult nsMsgDatabase::GetCollationKeyG
> }
> 
> nsresult nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow *row, mdb_token columnToken, uint8_t **result, uint32_t *len)
> {
>   const char *nakedString = "";
>   nsresult err;
> 
>   err = RowCellColumnToConstCharPtr(row, columnToken, &nakedString);
>-  if (NS_SUCCEEDED(err))
>+  if (NS_SUCCEEDED(err) && nakedString)
I see from YarnTonsCString that mYarn_Buf could be null, but in that case I'd prefer to back out my original patch and replace null with "" after fetching the pointer.
Comment 39 neil@parkwaycc.co.uk 2013-08-18 14:54:17 PDT
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

>-        m_newMsgHdr->SetStringProperty("replyTo", replyTo->value);
>+        m_newMsgHdr->SetStringProperty("replyTo", nsAutoCString(replyTo->value, replyTo->length).get());
What are you hoping to achieve by this? If you're trying to avoid writing nulls then I guess it's too late for that now, there are people with existing summary files with null values in them.
Comment 40 Hiroyuki Ikezoe (:hiro) 2013-08-18 15:07:43 PDT
(In reply to neil@parkwaycc.co.uk from comment #39)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> >-        m_newMsgHdr->SetStringProperty("replyTo", replyTo->value);
> >+        m_newMsgHdr->SetStringProperty("replyTo", nsAutoCString(replyTo->value, replyTo->length).get());
> What are you hoping to achieve by this? If you're trying to avoid writing
> nulls then I guess it's too late for that now, there are people with
> existing summary files with null values in them.

The purpose is to make the string passing to SetStringProperty null-terminated.
Comment 41 neil@parkwaycc.co.uk 2013-08-19 14:09:24 PDT
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

I don't think this is the right approach, FinializeHeaders (and InternSubject which it calls) should be able to assume that the headers are null-terminated, and wallpapering over unterminated headers isn't a reasonable solution.
Comment 42 Hiroyuki Ikezoe (:hiro) 2013-08-19 16:50:39 PDT
(In reply to neil@parkwaycc.co.uk from comment #41)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> I don't think this is the right approach, FinializeHeaders (and
> InternSubject which it calls) should be able to assume that the headers are
> null-terminated, and wallpapering over unterminated headers isn't a
> reasonable solution.

Then we need to change ParseHeaders(), right? I do not think that such changes are good in the meantime.
Comment 43 Hiroyuki Ikezoe (:hiro) 2013-08-20 01:38:24 PDT
Created attachment 792686 [details] [diff] [review]
Add null check barrier v2

Addressed the comment.
Comment 44 neil@parkwaycc.co.uk 2013-08-20 02:25:58 PDT
Comment on attachment 792686 [details] [diff] [review]
Add null check barrier v2

This is OK but I would move the nakedString check to nearer where it is used because that would make it clearer that we need it because nsDependentCString doesn't accept null pointers.
Comment 45 neil@parkwaycc.co.uk 2013-08-20 02:27:38 PDT
(In reply to Hiroyuki Ikezoe from comment #42)
> (In reply to comment #41)
> > I don't think this is the right approach, FinializeHeaders (and
> > InternSubject which it calls) should be able to assume that the headers are
> > null-terminated, and wallpapering over unterminated headers isn't a
> > reasonable solution.
> 
> Then we need to change ParseHeaders(), right? I do not think that such
> changes are good in the meantime.

If ParseHeaders has any bugs then they should be fixed. I would consider generating unterminated headers a bug.
Comment 46 Hiroyuki Ikezoe (:hiro) 2013-08-21 02:27:54 PDT
I do not object to fix ParseHeaders in the future.

My thought of priority is that:

1) stop the crash 
2) stop storing non null-terminated strings in db to avoid data corruption in the db
3) refactor ParseHeaders and related functions (and write more unit tests)
Comment 48 Wayne Mery (:wsmwk, NI for questions) 2013-08-28 11:12:06 PDT
(In reply to Magnus Melin from comment #47)
> Comment on attachment 762812 [details] [diff] [review]
> proposed fix
> 
> https://hg.mozilla.org/comm-central/rev/339bc9eac2fc

I see no obvious crash reports related to this checkin.

But 26.0a1 has an odd crash, most with mork on the stack, which does not coincide with any checkin from this bug, so I am mentioning it now so it's confusing in future 
strlen | nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow*, unsigned int, unsigned char**, unsigned int*) 
first crash 20130802030203
last crash 20130823064649
https://crash-stats.mozilla.com/report/list?signature=strlen+|+nsMsgDatabase%3A%3ARowCellColumnToCollationKey%28nsIMdbRow*%2C+unsigned+int%2C+unsigned+char**%2C+unsigned+int*%29&product=Thunderbird&query_type=is_exactly&range_unit=weeks&process_type=any&hang_type=any&date=2013-08-28+16%3A00%3A00&range_value=4
Comment 49 Mark Banner (:standard8) (afk until 26th July) 2013-08-28 11:42:38 PDT
Comment on attachment 792686 [details] [diff] [review]
Add null check barrier v2

[Triage Comment]
As trunk is a bit broken, but I wanted to land this for the upcoming beta, I've just pushed this to aurora and beta. As a reminder, we need to fix the review comment before this lands on trunk.

https://hg.mozilla.org/releases/comm-aurora/rev/50e64aa0beb7
https://hg.mozilla.org/releases/comm-beta/rev/bccdca1a2011
Comment 50 Mark Banner (:standard8) (afk until 26th July) 2013-09-10 02:15:50 PDT
I've just pushed this so that we can get this bug closed:

https://hg.mozilla.org/comm-central/rev/54fc7ea4e21d

If we want any more patches for or related to this bug, please do them as new bugs so that we can easily track them for branch landings, especially now that we're heading into the 24 end game.

Note You need to log in before you can comment on or make changes to this bug.