Open Bug 698093 Opened 8 years ago Updated 3 days ago

crash @ nsMsgDBView::FnSortIdKeyPtr and @ med3 sorting message list columns

Categories

(MailNews Core :: Backend, defect, critical)

x86
All
defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: wsmwk, Unassigned)

References

(Depends on 1 open bug, )

Details

(Keywords: crash, testcase-wanted, Whiteboard: [rare])

Crash Data

This bug was filed from the Socorro interface and is report bp-b3d9a73b-403f-48a1-b40a-d9c772111017.
============================================================= 

In Bug 420257 comment 0 I reported the same stack/signature
nsMsgDBView::FnSortIdKeyPtr(void const*, void const*, void*)

no comments on crash-stats = no steps to reproduce

bp-b3d9a73b-403f-48a1-b40a-d9c772111017 version 7

0	xul.dll	nsMsgDBView::FnSortIdKeyPtr	mailnews/base/src/nsMsgDBView.cpp:3796
1	xul.dll	nsMsgDBView::GetIndexForThread	mailnews/base/src/nsMsgDBView.cpp:5157
2	xul.dll	nsMsgSearchDBView::AddHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:476
3	xul.dll	nsMsgSearchDBView::InsertHdrFromFolder	mailnews/base/src/nsMsgSearchDBView.cpp:690
4	xul.dll	nsMsgXFVirtualFolderDBView::OnSearchHit	mailnews/base/src/nsMsgXFVirtualFolderDBView.cpp:312
5	xul.dll	nsMsgSearchSession::AddSearchHit	mailnews/base/search/src/nsMsgSearchSession.cpp:601
6	xul.dll	nsMsgSearchOfflineMail::AddResultElement	mailnews/base/search/src/nsMsgLocalSearch.cpp:818
7	xul.dll	nsMsgSearchOfflineMail::Search	mailnews/base/search/src/nsMsgLocalSearch.cpp:770
8	xul.dll	nsMsgSearchScopeTerm::TimeSlice	mailnews/base/search/src/nsMsgSearchTerm.cpp:1934
9	xul.dll	nsMsgSearchSession::TimeSliceSerial	mailnews/base/search/src/nsMsgSearchSession.cpp:691
10	xul.dll	nsMsgSearchSession::TimerCallback	mailnews/base/search/src/nsMsgSearchSession.cpp:544
Blocks: 420257
Only a couple per month.
e.g. bp-c9331c2b-fded-4f5c-9254-7e7f02121223
Whiteboard: [rare]
Removing myself from all the bugs I'm CCed on. Please NI me if you need something from me on MailNews Core bugs.
I picked 2 from the list (TB 45 and 38) and they crash on stack overflow in NS_QUICKSORT?

Maybe we pass some array with bogus values (uninitialized memory) to sort and it crashes on it.

Or, the FnSortIdUint32 function is used as the member comparison function in the quicksort. Maybe it compares incorrectly in some cases thus causing an infinite loop (until stack is depleted). E.g. I do not see when the function would return 0 (members equal). But I don't know if some members can ever be equal.
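To make the "never returns 0" concern concrete, here is a minimal sketch of a three-way comparator in the shape qsort-style sorts expect. It is not the code from nsMsgDBView.cpp: the `SortEntry` struct, its fields, and the function names are hypothetical, and whether the real comparator misbehaves on equal members is exactly the open question in this comment.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical record for illustration; NOT the real IdUint32 layout.
struct SortEntry {
  uint32_t key;     // value the column is sorted on
  uint32_t msgKey;  // tie-breaker so equal keys still order deterministically
};

// Three-way comparator: negative, zero, or positive, and consistent
// when the arguments are swapped (cmp(a, b) == -cmp(b, a)).
static int CompareSortEntry(const void* a, const void* b) {
  const SortEntry* lhs = static_cast<const SortEntry*>(a);
  const SortEntry* rhs = static_cast<const SortEntry*>(b);
  if (lhs->key != rhs->key) return lhs->key < rhs->key ? -1 : 1;
  if (lhs->msgKey != rhs->msgKey) return lhs->msgKey < rhs->msgKey ? -1 : 1;
  return 0;  // members really are equal
}

// Sorts in place with the C library qsort().
static void SortEntries(std::vector<SortEntry>& entries) {
  if (entries.empty()) return;
  std::qsort(entries.data(), entries.size(), sizeof(SortEntry),
             CompareSortEntry);
}
```

A comparator that reports "greater" for both (a, b) and (b, a) gives the sort contradictory answers. What a particular quicksort implementation does with that is implementation-specific, so this only illustrates the property worth checking.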
Flags: needinfo?(acelists)
(In reply to :aceman from comment #4)
> I picked 2 from the list (TB 45 and 38) and they crash on stack overflow in
> NS_QUICKSORT?
> 
> Maybe we pass some array with bogus values (uninitialized memory) to sort
> and it crashes on it.
> 
> Or, the FnSortIdUint32 function is used as the member comparison function in
> the quicksort. Maybe it compares incorrectly in some cases thus causing an
> infinite loop (until stack is depleted). E.g. I do not see when the function
> would return 0 (members equal). But I don't know if some members can ever be
> equal.

Let's assume this morphed to @ nsMsgDBView::FnSortIdUint32 bp-f8194569-0ed5-4fa3-b72b-748e82170202 and a friend @ med3 bp-a894f465-c3b2-412c-8c87-53b922170126

see also http://forums.mozillazine.org/viewtopic.php?f=39&t=3027062
Crash Signature: [@ nsMsgDBView::FnSortIdKeyPtr(void const*, void const*, void*)] [@ nsMsgDBView::FnSortIdKeyPtr ] → [@ nsMsgDBView::FnSortIdKeyPtr ] [@ nsMsgDBView::FnSortIdUint32 ] [@ med3 ]
Both the reports crash due to EXCEPTION_STACK_OVERFLOW ...

Derrick, you made the last logic change in https://hg.mozilla.org/releases/mozilla-esr45/file/tip/xpcom/glue/nsQuickSort.cpp. Can you please see if you can spot a problem in the algorithm that would explain why the stack gets depleted?
Flags: needinfo?(derrick_moser)
I tried to PM Derrick but got no response, so he seems to be gone. (And he hasn't logged into bz since 2014.)
Flags: needinfo?(derrick_moser) → needinfo?(acelists)
The report at bp-1c5cb55c-9318-4717-b727-617180170727 seems to have a recursive stack of xpcom/glue/nsQuickSort.cpp:171 calls. I think we either send some malformed data to the quicksort (but I'm not sure what that would be, the sort should handle any possible array of ints), or there is a bug in the algorithm so it runs endlessly on some specific type of input and exhausts the stack.
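For reference on the "exhausts the stack" hypothesis, the sketch below shows one common way a quicksort keeps its recursion depth logarithmic: recurse only into the smaller partition and loop on the larger one. This is an illustration only and not the code in xpcom/glue/nsQuickSort.cpp; the function name and the three-way partition are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sorts v[lo, hi) and returns the deepest recursion level reached.
// Hypothetical helper for illustration; not taken from nsQuickSort.cpp.
static int BoundedQuickSort(std::vector<uint32_t>& v, size_t lo, size_t hi,
                            int depth = 1) {
  int maxDepth = depth;
  while (hi - lo > 1) {
    const uint32_t pivot = v[lo + (hi - lo) / 2];
    // Three-way partition: [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
    size_t lt = lo, i = lo, gt = hi;
    while (i < gt) {
      if (v[i] < pivot) {
        std::swap(v[lt++], v[i++]);
      } else if (v[i] > pivot) {
        std::swap(v[i], v[--gt]);
      } else {
        ++i;
      }
    }
    const size_t leftLen = lt - lo;
    const size_t rightLen = hi - gt;
    // Recurse into the smaller side only; keep looping on the larger side.
    if (leftLen < rightLen) {
      if (leftLen > 1) {
        maxDepth = std::max(maxDepth, BoundedQuickSort(v, lo, lt, depth + 1));
      }
      lo = gt;
    } else {
      if (rightLen > 1) {
        maxDepth = std::max(maxDepth, BoundedQuickSort(v, gt, hi, depth + 1));
      }
      hi = lt;
    }
  }
  return maxDepth;
}
```

Because each recursive call handles at most half of the current range, the depth stays around log2(n) regardless of input order. A sort that recurses into a partition without regard to its size has no such bound, which is one way a long chain of nested sort frames like the one in the report could arise.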
Flags: needinfo?(acelists)
66% of crash locales are ja for signature nsMsgDatabase::GetCollationKeyGenerator bp-470d4b51-5b1c-4238-868f-0269d0171222

 0 	xul.dll	nsMsgDatabase::GetCollationKeyGenerator()	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:3670
1 	xul.dll	nsMsgDatabase::CompareCollationKeys(unsigned int, unsigned char*, unsigned int, unsigned char*, int*)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:3732
2 	xul.dll	nsMsgDBView::FnSortIdKey(void const*, void const*, void*)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/base/src/nsMsgDBView.cpp:3850
3 	xul.dll	med3	xpcom/glue/nsQuickSort.cpp:93
4 	xul.dll	NS_QuickSort	xpcom/glue/nsQuickSort.cpp:125
5 	xul.dll	NS_QuickSort	xpcom/glue/nsQuickSort.cpp:171
m_kato, can you spot the error?

nsMsgDBView::FnSortIdUint32 bp-3e92fc71-249b-4510-8e4e-691f90181116

med3 is the most common signature.  
bp-b3117ab5-5707-4bc4-ade7-213e10181123 ja locale
bp-e74b3f95-d67f-47bd-913d-364e60181121 ru locale
bp-522bd067-d970-4390-b3c7-bf6ca0180902 sorted junk% score (which can only be done with junquilla addon installed)
bp-e2f6b40d-721c-425e-9588-123980180711 sorted on junk status
Flags: needinfo?(m_kato)
Summary: crash nsMsgDBView::FnSortIdKeyPtr → crash @ nsMsgDBView::FnSortIdKeyPtr and @ med3 sorting message list columns
Might bug 1498313 help?
Maybe, since that switches from NS_QuickSort() to the standard library's qsort(). However, I don't think NS_QuickSort() is that buggy, so there might be an underlying issue.
Depends on: 1498313
Flags: needinfo?(m_kato)

User in https://support.mozilla.org/en-US/questions/1267021

Crashing with https://crash-stats.mozilla.com/report/index/dc508528-c545-4b53-a0b0-54d1c0190821

In this instance the crash was initiated by clicking on the sort by star. Thunderbird now crashes on each restart.

nsMsgThreadedDBView.cpp:428 calls nsMsgDBView::Sort(sortType, sortOrder). Something goes wrong in there, and then it either crashes in NS_QuickSort or recurses until it gets a stack overflow. Bug 1498313 might help, or standard qsort will then also crash.

This needs careful code inspection. Maybe Ben wants to take a look.

Flags: needinfo?(benc)

That's an epic callstack :-)
I've had a look through the code and nothing obvious stands out, but there's a lot of code and a lot of potential for subtle issues there.
Is there a reliable way to trigger this crash? I reckon I could nail it down pretty quickly with a debugger attached to it...

Flags: needinfo?(benc)

That poor reporter in https://support.mozilla.org/questions/1267021... The advice should have been to rename and save the folder's .msf file as well as a copy of the folder in some place outside the profile, and maybe also rename and save panacea.dat only if the .msf rename didn't work. Asking users to recreate profiles is very extreme and painful.

Perhaps this is another case:
Thunderbird 68.4.1
Crash Reason EXCEPTION_STACK_OVERFLOW
Signature nsMsgDBView::FnSortIdKey
https://crash-stats.mozilla.com/report/index/7f1ac55a-b890-410b-808f-040790200123
"function": "nsMsgDBView::FnSortIdKey(void const *,void const *,void *)"

Further to comment 21:
This person uses the Quick Filter Bar, which coincides with another person who had a crash after using the star option on the Quick Filter Bar.
https://support.mozilla.org/en-US/questions/1277937

We'll see if Mark can provide us with the testcase Ben wanted.

Keywords: testcase-wanted