Closed Bug 857678 Opened 11 years ago Closed 11 years ago

crash in nsMsgDatabase::RowCellColumnToCollationKey

Categories

(MailNews Core :: Database, defect)

defect
Not set
critical

Tracking

(thunderbird24+ fixed, thunderbird25 fixed, seamonkey2.19 fixed, seamonkey2.20 fixed, seamonkey2.21 fixed)

RESOLVED FIXED
Thunderbird 26.0
Tracking Status
thunderbird24 + fixed
thunderbird25 --- fixed
seamonkey2.19 --- fixed
seamonkey2.20 --- fixed
seamonkey2.21 --- fixed

People

(Reporter: wsmwk, Assigned: mkmelin)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [tbird betatopcrash][regression:TB22])

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-ff9b71a1-3473-432e-892c-42b6a2130323 .
============================================================= 
unknown reporter "Crashing when accessing a newsgroup."

This bug was filed from the Socorro interface and is 
report bp-ff9b71a1-3473-432e-892c-42b6a2130323 .
============================================================= 
0	libxul.so	nsMsgDatabase::RowCellColumnToCollationKey	objdir-tb/mozilla/dist/include/nsCharTraits.h:514
1	libxul.so	morkBookAtom::EqualFormAndBody const	db/mork/src/morkAtom.cpp:450
2	libxul.so	morkMap::find const	db/mork/src/morkMap.cpp:268
3	libxul.so	morkMap::Get	db/mork/src/morkMap.cpp:698
4	libxul.so	nsMsgDBView::GetCollationKey	mailnews/base/src/nsMsgDBView.cpp:4067
5	libxul.so	nsMsgDatabase::YarnTonsString	objdir-tb/mozilla/dist/include/nsTSubstring.h:503
6	libxul.so	nsMsgDatabase::RowCellColumnTonsString	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3483
7	libxul.so	nsMsgDatabase::GetPropertyAsNSString	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3933
8	libxul.so	nsMsgDBView::GetCurCustomColumn	objdir-tb/mozilla/dist/include/nsCOMPtr.h:453
9	libxul.so	nsMsgDBView::GetColumnHandler	objdir-tb/mozilla/dist/include/nsTSubstring.h:85
10	libxul.so	nsMsgDBView::GetCurColumnHandlerFromDBInfo	objdir-tb/mozilla/dist/include/nsTSubstring.h:85
11	libxul.so	nsMsgDatabase::GetMsgHdrForKey	mailnews/db/msgdb/src/nsMsgDatabase.cpp:1813
perhaps regression from bug 834757.
first crash 2013-03-21 bp-a15d794c-ae78-4fea-8974-b2b262130321 22.0a1

0	msvcr100.dll	zzz_AsmCodeRange_Begin	f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\strlen.asm:81
1	xul.dll	nsMsgDatabase::RowCellColumnToCollationKey	mailnews/db/msgdb/src/nsMsgDatabase.cpp:3645
2	xul.dll	nsMsgHdr::GetSubjectCollationKey	mailnews/db/msgdb/src/nsMsgHdr.cpp:694
3	xul.dll	nsMsgDBView::GetCollationKey	mailnews/base/src/nsMsgDBView.cpp:4067
4	xul.dll	nsMsgDBView::GetIndexForThread	mailnews/base/src/nsMsgDBView.cpp:5072
5	xul.dll	nsMsgSearchDBView::AddHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:444
6	xul.dll	nsMsgSearchDBView::InsertHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:658
7	xul.dll	nsMsgXFVirtualFolderDBView::OnSearchHit	mailnews/base/src/nsMsgXFVirtualFolderDBView.cpp:281 

Pidgeot18@12125 3643 err = m_mimeConverter->DecodeMimeHeaderToUTF8(
Pidgeot18@12125 3644 nsDependentCString(nakedString), charSet.get(), characterSetOverride,
Pidgeot18@12125 3645 true, decodedStr);
Assignee: nobody → Pidgeot18
Blocks: 834757
Crash Signature: [@ nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow*, unsigned int, unsigned char**, unsigned int*)] → [@ nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow*, unsigned int, unsigned char**, unsigned int*)] [@ zzz_AsmCodeRange_Begin | nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow*, unsigned int, unsigned char**, unsigned int*)]
Keywords: regression, topcrash
OS: Linux → All
I'm going to guess that nakedString here is null, and it looks like nsDependentCString constructors aren't happy about getting NULL pointers as a value.
(In reply to Joshua Cranmer [:jcranmer] from comment #2)
> I'm going to guess that nakedString here is null, and it looks like
> nsDependentCString constructors aren't happy about getting NULL pointers as
> a value.

Yep, that's a definite
datapoint - a few dozen individuals are crashing in beta in roughly one week of usage, which is a significant rate for our betas. not a plague, but significant. I'm still assessing impact but for now I'd say it would be best to have patched for TB23 beta lest it potentially hinder wider testing.

Is a testcase needed?
Flags: needinfo?(Pidgeot18)
bp-655f20b4-4f04-41a4-9a19-fe7282130611 rkrug is a frequent crasher
Whiteboard: [startupcrash][regression:TB22]
Attached patch proposed fixSplinter Review
RowCellColumnToConstCharPtr shouldn't return NS_OK when it doesn't set the outparam.
Assignee: Pidgeot18 → mkmelin+mozilla
Status: NEW → ASSIGNED
I don't like this patch, since it makes things throw errors that didn't use to happen.

I'd prefer a more minimal fix, which I haven't had time to prepare yet.
Flags: needinfo?(Pidgeot18)
Well, it's certainly against conventions to allow ok when you pass out null - if you can't trust that you end up null-checking everything. Now would give it some time to bake before the ESR, and if there's problems we can always back it out and just null-check nakedString instead, but there might be other cases the current patch also fixes.
So any other potential reviewers?
Flags: needinfo?(mozilla)
Flags: needinfo?(mbanner)
Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20a2
Build identifier: 20130624013001

I was crashing with this bug with this repro:
1. I created view and saved it as folder.
2. When saved view show several e-mails, I moved them all to another folder.
3. In this moment saved view was still showing same count of unread e-mails.
4. When I click next time on saved view name in Folder Pane, SeaMonkey instantly
   crashed.

I didn't try this reproduction steps on another "physical" folder.

Relevant crash report:
https://crash-stats.mozilla.com/report/index/bp-b2344e3a-a798-46ce-abfb-6f6652130625
I experience this crash 100% each time I click one of my search folders (which happens to be a "Status is New" search across multiple folders of multiple accounts) in SM 2.19b1. Is there any workaround (other than not clicking the search folder of course)? I already tried compacting all folders.
The line in question (from the last crash report) was specifically changed by jcranmer in this patch http://hg.mozilla.org/releases/comm-aurora/diff/93e89494abfd/mailnews/db/msgdb/src/nsMsgDatabase.cpp (not sure which bug). Maybe some of the strings is null and it is not handled properly.
I think it's pretty sure it's nakedString being null... (The check that it's set was bitten by the broken conventions of RowCellColumnToConstCharPtr)
Attached patch Possible patchSplinter Review
How about this for a workaround?
Attachment #768210 - Flags: review?(Pidgeot18)
Looks like we now need feedback from Joshua.
Flags: needinfo?(mozilla)
Flags: needinfo?(mbanner)
joshua, for reference, urgency here is a SeaMonkey 2.19 final release due out on Tuesday, July 2. So if we can eliminate this fairly common/large crasher from our users, we'll be very happy. We just need a correctness/risk assessment by someone who knows this code in order for us to land it.
Attachment #768210 - Flags: review?(Pidgeot18) → review+
Comment on attachment 768210 [details] [diff] [review]
Possible patch

a=me for comm-release (a SeaMonkey-only branch)
Comment on attachment 768210 [details] [diff] [review]
Possible patch

[Approval Request Comment]
Regression caused by (bug #): 834757
User impact if declined: crash
Risk to taking this patch (and alternatives if risky): low
Attachment #768210 - Flags: approval-comm-release?
Attachment #768210 - Flags: approval-comm-beta?
Attachment #768210 - Flags: approval-comm-aurora?
http://hg.mozilla.org/comm-central/rev/c95e6993f4df

Dunno whether this counts as fixed or whether someone wants to write some better code.
Comment on attachment 762812 [details] [diff] [review]
proposed fix

I'd still like this to go in (for trunk). Passing to Joshua for a verdict, seems i never set the review flag for this.
Attachment #762812 - Flags: review?(Pidgeot18)
Attachment #768210 - Flags: approval-comm-release?
Attachment #768210 - Flags: approval-comm-release+
Attachment #768210 - Flags: approval-comm-beta?
Attachment #768210 - Flags: approval-comm-beta+
Attachment #768210 - Flags: approval-comm-aurora?
Attachment #768210 - Flags: approval-comm-aurora+
I've noticed non-null-terminated string in nsParseMailbox.cpp might be the root cause. See bug 877831 comment 13.
Comment on attachment 762812 [details] [diff] [review]
proposed fix

The ramifications of this make me nervous, but I don't see anybody who would be negatively impacted by this as far as I could look...
Attachment #762812 - Flags: review?(Pidgeot18) → review+
So cay you try the patch?
(In reply to :aceman from comment #25)
> So cay you try the patch?

You mean the one from comment 6? Only if someone provided me an SM build (ideally built from comm-beta code) containing the proposed fix within the next 5 hours. After that I'll be AFK for a week. When I'm back I'll be able to build/test myself again.
Here is another fix for this crash.

I am thinking the root cause of this crash is come from nsParseMailMessageState.cpp because nsParseMailMessageState passes non-null-terminated string to nsMsgHdr (then, to nsMsgDatabase).

This patch can be applied after the fix for bug 877831 is applied.
Attachment #776096 - Flags: review?(mbanner)
Comment on attachment 762812 [details] [diff] [review]
proposed fix

This patch does not fix the crash I'm experiencing (just compiled two builds off comm-beta tip yesterday, one vanilla and one with just this patch). Will try to create a build with the other patch later today.
Attachment #762812 - Flags: feedback-
Attached patch Add null check barrier (obsolete) — Splinter Review
I don't actually know what will happen once non-null-terminated string has stored in the db.

But anyway this patch might be solve the crash in case of NULL string.
Attachment #780258 - Flags: review?(mbanner)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #29)

> But anyway this patch might be solve the crash in case of NULL string.
this patch *will* solve, I meant.
Attached patch Add null check barrier (obsolete) — Splinter Review
I am sorry the previous patch has garbage.
Attachment #780258 - Attachment is obsolete: true
Attachment #780258 - Flags: review?(mbanner)
Attachment #780261 - Flags: review?(mbanner)
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

This fixed the crash for me. :-) I built from the same c-b/m-b changesets as with my previous try, but now with only the following applied:
* attachment 766955 [details] [diff] [review] from bug 877831 (adapted/applied manually)
* attachment 776096 [details] [diff] [review] (this patch)
* attachment 780261 [details] [diff] [review] (Add null check barrier)

After I had run this build and opened the search folder which triggered the crash with other builds, I can now open the affected search folder with the other builds, too, so for the time being I won't be able to reproduce the crash anymore.
Attachment #776096 - Flags: feedback+
It's currently #1 top crasher in TB 23.0b1 that contains the possible patch fix.
Only 17% of crashes happen within one minute.
Whiteboard: [startupcrash][regression:TB22] → [tbird betatopcrash][regression:TB22]
(In reply to Jens Hatlak (:InvisibleSmiley) from comment #32)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> This fixed the crash for me. :-) I built from the same c-b/m-b changesets as
> with my previous try, but now with only the following applied:
> * attachment 766955 [details] [diff] [review] from bug 877831
> (adapted/applied manually)
> * attachment 776096 [details] [diff] [review] (this patch)
> * attachment 780261 [details] [diff] [review] (Add null check barrier)
> 
> After I had run this build and opened the search folder which triggered the
> crash with other builds, I can now open the affected search folder with the
> other builds, too, so for the time being I won't be able to reproduce the
> crash anymore.

nice testing

(Mark sure has been stretched thin)
Sounds like we should get these landed for trunk testing so they have potential to get in TB24.
Flags: needinfo?(mbanner)
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

I suggest Neil should have a look at these, as I'm going to be afk for a few days.
Attachment #776096 - Flags: review?(mbanner) → review?(neil)
Attachment #780261 - Flags: review?(mbanner) → review?(neil)
Comment on attachment 780261 [details] [diff] [review]
Add null check barrier

># HG changeset patch
># Parent 2827cd29313bb7118826c9a39a5613cd2ebe3596
># User Hiroyuki Ikezoe <hiikezoe@mozilla-japan.org>
>Bug 857678 - Add null check barrier in nsMsgDatabase::RowCellColumnToCollationKey
>
>diff --git a/mailnews/db/msgdb/src/nsMsgDatabase.cpp b/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>--- a/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>+++ b/mailnews/db/msgdb/src/nsMsgDatabase.cpp
>@@ -3750,17 +3750,17 @@ nsresult nsMsgDatabase::GetCollationKeyG
> }
> 
> nsresult nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow *row, mdb_token columnToken, uint8_t **result, uint32_t *len)
> {
>   const char *nakedString = "";
>   nsresult err;
> 
>   err = RowCellColumnToConstCharPtr(row, columnToken, &nakedString);
>-  if (NS_SUCCEEDED(err))
>+  if (NS_SUCCEEDED(err) && nakedString)
I see from YarnTonsCString that mYarn_Buf could be null, but in that case I'd prefer to back out my original patch and replace null with "" after fetching the pointer.
Attachment #780261 - Flags: review?(neil) → review-
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

>-        m_newMsgHdr->SetStringProperty("replyTo", replyTo->value);
>+        m_newMsgHdr->SetStringProperty("replyTo", nsAutoCString(replyTo->value, replyTo->length).get());
What are you hoping to achieve by this? If you're trying to avoid writing nulls then I guess it's too late for that now, there are people with existing summary files with null values in them.
(In reply to neil@parkwaycc.co.uk from comment #39)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> >-        m_newMsgHdr->SetStringProperty("replyTo", replyTo->value);
> >+        m_newMsgHdr->SetStringProperty("replyTo", nsAutoCString(replyTo->value, replyTo->length).get());
> What are you hoping to achieve by this? If you're trying to avoid writing
> nulls then I guess it's too late for that now, there are people with
> existing summary files with null values in them.

The purpose is to make the string passing to SetStringProperty null-terminated.
Comment on attachment 776096 [details] [diff] [review]
Another possible fix

I don't think this is the right approach, FinializeHeaders (and InternSubject which it calls) should be able to assume that the headers are null-terminated, and wallpapering over unterminated headers isn't a reasonable solution.
Attachment #776096 - Flags: review?(neil) → review-
(In reply to neil@parkwaycc.co.uk from comment #41)
> Comment on attachment 776096 [details] [diff] [review]
> Another possible fix
> 
> I don't think this is the right approach, FinializeHeaders (and
> InternSubject which it calls) should be able to assume that the headers are
> null-terminated, and wallpapering over unterminated headers isn't a
> reasonable solution.

Then we need to change ParseHeaders(), right? I do not think that such changes are good in the meantime.
Addressed the comment.
Attachment #780261 - Attachment is obsolete: true
Attachment #792686 - Flags: review?(neil)
Comment on attachment 792686 [details] [diff] [review]
Add null check barrier v2

This is OK but I would move the nakedString check to nearer where it is used because that would make it clearer that we need it because nsDependentCString doesn't accept null pointers.
Attachment #792686 - Flags: review?(neil) → review+
(In reply to Hiroyuki Ikezoe from comment #42)
> (In reply to comment #41)
> > I don't think this is the right approach, FinializeHeaders (and
> > InternSubject which it calls) should be able to assume that the headers are
> > null-terminated, and wallpapering over unterminated headers isn't a
> > reasonable solution.
> 
> Then we need to change ParseHeaders(), right? I do not think that such
> changes are good in the meantime.

If ParseHeaders has any bugs then they should be fixed. I would consider generating unterminated headers a bug.
I do not object to fix ParseHeaders in the future.

My thought of priority is that:

1) stop the crash 
2) stop storing non null-terminated strings in db to avoid data corruption in the db
3) refactor ParseHeaders and related functions (and write more unit tests)
(In reply to Magnus Melin from comment #47)
> Comment on attachment 762812 [details] [diff] [review]
> proposed fix
> 
> https://hg.mozilla.org/comm-central/rev/339bc9eac2fc

I see no obvious crash reports related to this checkin.

But 26.0a1 has an odd crash, most with mork on the stack, which does not coincide with any checkin from this bug, so I am mentioning it now so it's confusing in future 
strlen | nsMsgDatabase::RowCellColumnToCollationKey(nsIMdbRow*, unsigned int, unsigned char**, unsigned int*) 
first crash 20130802030203
last crash 20130823064649
https://crash-stats.mozilla.com/report/list?signature=strlen+|+nsMsgDatabase%3A%3ARowCellColumnToCollationKey%28nsIMdbRow*%2C+unsigned+int%2C+unsigned+char**%2C+unsigned+int*%29&product=Thunderbird&query_type=is_exactly&range_unit=weeks&process_type=any&hang_type=any&date=2013-08-28+16%3A00%3A00&range_value=4
Flags: needinfo?(mbanner)
Comment on attachment 792686 [details] [diff] [review]
Add null check barrier v2

[Triage Comment]
As trunk is a bit broken, but I wanted to land this for the upcoming beta, I've just pushed this to aurora and beta. As a reminder, we need to fix the review comment before this lands on trunk.

https://hg.mozilla.org/releases/comm-aurora/rev/50e64aa0beb7
https://hg.mozilla.org/releases/comm-beta/rev/bccdca1a2011
Attachment #792686 - Flags: approval-comm-beta+
Attachment #792686 - Flags: approval-comm-aurora+
I've just pushed this so that we can get this bug closed:

https://hg.mozilla.org/comm-central/rev/54fc7ea4e21d

If we want any more patches for or related to this bug, please do them as new bugs so that we can easily track them for branch landings, especially now that we're heading into the 24 end game.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 26.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: