Open Bug 698093 Opened 13 years ago Updated 6 months ago

crash @ nsMsgDBView::FnSortIdKeyPtr and @ med3 sorting message list columns

Categories

(MailNews Core :: Backend, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: wsmwk, Unassigned)

References

()

Details

(Keywords: crash, testcase-wanted, Whiteboard: [rare])

Crash Data

This bug was filed from the Socorro interface and is report bp-b3d9a73b-403f-48a1-b40a-d9c772111017.
In Bug 420257 comment 0 I reported the same stack/signature nsMsgDBView::FnSortIdKeyPtr(void const*, void const*, void*)
no comments on crash-stats = no steps to reproduce
bp-b3d9a73b-403f-48a1-b40a-d9c772111017 version 7
0 xul.dll nsMsgDBView::FnSortIdKeyPtr mailnews/base/src/nsMsgDBView.cpp:3796
1 xul.dll nsMsgDBView::GetIndexForThread mailnews/base/src/nsMsgDBView.cpp:5157
2 xul.dll nsMsgSearchDBView::AddHdrFromFolder mailnews/base/src/nsMsgSearchDBView.cpp:476
3 xul.dll nsMsgSearchDBView::InsertHdrFromFolder mailnews/base/src/nsMsgSearchDBView.cpp:690
4 xul.dll nsMsgXFVirtualFolderDBView::OnSearchHit mailnews/base/src/nsMsgXFVirtualFolderDBView.cpp:312
5 xul.dll nsMsgSearchSession::AddSearchHit mailnews/base/search/src/nsMsgSearchSession.cpp:601
6 xul.dll nsMsgSearchOfflineMail::AddResultElement mailnews/base/search/src/nsMsgLocalSearch.cpp:818
7 xul.dll nsMsgSearchOfflineMail::Search mailnews/base/search/src/nsMsgLocalSearch.cpp:770
8 xul.dll nsMsgSearchScopeTerm::TimeSlice mailnews/base/search/src/nsMsgSearchTerm.cpp:1934
9 xul.dll nsMsgSearchSession::TimeSliceSerial mailnews/base/search/src/nsMsgSearchSession.cpp:691
10 xul.dll nsMsgSearchSession::TimerCallback mailnews/base/search/src/nsMsgSearchSession.cpp:544
Blocks: 420257
Only a couple per month, e.g. bp-c9331c2b-fded-4f5c-9254-7e7f02121223
Whiteboard: [rare]
Removing myself from all the bugs I'm cc'ed on. Please NI me if you need something from me on MailNews Core bugs.
I picked 2 from the list (TB 45 and 38) and they crash on stack overflow in NS_QUICKSORT? Maybe we pass some array with bogus values (uninitialized memory) to sort and it crashes on it. Or, the FnSortIdUint32 function is used as the member comparison function in the quicksort. Maybe it compares incorrectly in some cases thus causing an infinite loop (until stack is depleted). E.g. I do not see when the function would return 0 (members equal). But I don't know if some members can ever be equal.
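To make the second hypothesis concrete, here is a minimal sketch of how one could check a comparator for the "never returns 0" problem. Everything in it is an assumption for illustration: Entry, CompareNeverEqual and CompareThreeWay are hypothetical stand-ins, not the real IdUint32 layout or the real FnSortIdUint32 code. A comparator that reports "greater" in both directions for two identical entries breaks the contract qsort-style sorts rely on; whether a particular quicksort then misbehaves depends on its implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sort entry; NOT the real IdUint32 layout from nsMsgDBView.
struct Entry {
  uint32_t key;  // primary sort value
  uint32_t id;   // tie-breaker
};

// Comparator in the style discussed above: it never returns 0,
// so two identical entries compare as "greater" in both directions.
int CompareNeverEqual(const Entry& a, const Entry& b) {
  if (a.key != b.key) return a.key < b.key ? -1 : 1;
  return a.id < b.id ? -1 : 1;
}

// Variant that does return 0 when both fields are equal.
int CompareThreeWay(const Entry& a, const Entry& b) {
  if (a.key != b.key) return a.key < b.key ? -1 : 1;
  if (a.id != b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

static int Sign(int x) { return (x > 0) - (x < 0); }

// Counts pairs (i <= j) where cmp(a, b) and cmp(b, a) do not have
// opposite signs, i.e. where the comparator contradicts itself.
template <typename Cmp>
std::size_t CountAntisymmetryViolations(const std::vector<Entry>& v, Cmp cmp) {
  std::size_t bad = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    for (std::size_t j = i; j < v.size(); ++j) {
      if (Sign(cmp(v[i], v[j])) != -Sign(cmp(v[j], v[i]))) ++bad;
    }
  }
  return bad;
}
```

Running CountAntisymmetryViolations over a captured sort array (if we ever get a testcase) with the comparator under suspicion would tell us quickly whether equal members occur and whether the comparator handles them consistently.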
Flags: needinfo?(acelists)
(In reply to :aceman from comment #4)
> I picked 2 from the list (TB 45 and 38) and they crash on stack overflow in
> NS_QUICKSORT?
>
> Maybe we pass some array with bogus values (uninitialized memory) to sort
> and it crashes on it.
>
> Or, the FnSortIdUint32 function is used as the member comparison function in
> the quicksort. Maybe it compares incorrectly in some cases thus causing an
> infinite loop (until stack is depleted). E.g. I do not see when the function
> would return 0 (members equal). But I don't know if some members can ever be
> equal.
Let's assume this morphed to @ nsMsgDBView::FnSortIdUint32 bp-f8194569-0ed5-4fa3-b72b-748e82170202
and a friend @ med3 bp-a894f465-c3b2-412c-8c87-53b922170126
see also http://forums.mozillazine.org/viewtopic.php?f=39&t=3027062
Crash Signature: [@ nsMsgDBView::FnSortIdKeyPtr(void const*, void const*, void*)] [@ nsMsgDBView::FnSortIdKeyPtr ] → [@ nsMsgDBView::FnSortIdKeyPtr ] [@ nsMsgDBView::FnSortIdUint32 ] [@ med3 ]
Both reports crash due to EXCEPTION_STACK_OVERFLOW ... Derrick, you made the last logic change in https://hg.mozilla.org/releases/mozilla-esr45/file/tip/xpcom/glue/nsQuickSort.cpp, can you please see if you can spot a problem in the algorithm that would explain why the stack gets depleted?
Flags: needinfo?(derrick_moser)
I tried to PM Derrick but no response, so he seems to be gone. (And hasn't logged into bz since 2014)
Flags: needinfo?(derrick_moser) → needinfo?(acelists)
The report at bp-1c5cb55c-9318-4717-b727-617180170727 seems to have a recursive stack of xpcom/glue/nsQuickSort.cpp:171 calls. I think we either send some malformed data to the quicksort (but I'm not sure what that would be, the sort should handle any possible array of ints), or there is a bug in the algorithm so it runs endlessly on some specific type of input and exhausts the stack.
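As an illustration of the second possibility, here is a small sketch showing that a correct recursive quicksort can still reach a recursion depth proportional to the input size on unlucky input, with no malformed data involved. This is NOT the nsQuickSort.cpp code: the last-element pivot, the Lomuto partition, and the names Partition, SortRange and MaxDepthOnSortedInput are all simplifying assumptions made for the example. The smallerFirst flag switches between always recursing into the left partition and recursing only into the smaller one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Lomuto partition of a[lo..hi] (inclusive) around the last element.
// Returns the final index of the pivot.
static std::size_t Partition(std::vector<uint32_t>& a, std::size_t lo,
                             std::size_t hi) {
  const uint32_t pivot = a[hi];
  std::size_t store = lo;
  for (std::size_t i = lo; i < hi; ++i) {
    if (a[i] < pivot) std::swap(a[i], a[store++]);
  }
  std::swap(a[store], a[hi]);
  return store;
}

// Sorts the half-open range a[lo, hi) and records the deepest recursion
// level reached. One side is handled by recursion, the other by looping.
static void SortRange(std::vector<uint32_t>& a, std::size_t lo, std::size_t hi,
                      bool smallerFirst, std::size_t depth,
                      std::size_t& maxDepth) {
  maxDepth = std::max(maxDepth, depth);
  while (hi - lo > 1) {
    const std::size_t p = Partition(a, lo, hi - 1);
    const std::size_t leftLen = p - lo;
    const std::size_t rightLen = hi - (p + 1);
    if (!smallerFirst || leftLen <= rightLen) {
      SortRange(a, lo, p, smallerFirst, depth + 1, maxDepth);  // recurse left
      lo = p + 1;                                              // loop on right
    } else {
      SortRange(a, p + 1, hi, smallerFirst, depth + 1, maxDepth);  // recurse right
      hi = p;                                                      // loop on left
    }
  }
}

// Sorts an already sorted array of n values and returns the maximum
// recursion depth that was needed.
std::size_t MaxDepthOnSortedInput(std::size_t n, bool smallerFirst) {
  std::vector<uint32_t> a(n);
  std::iota(a.begin(), a.end(), 0u);
  std::size_t maxDepth = 0;
  SortRange(a, 0, a.size(), smallerFirst, 0, maxDepth);
  return maxDepth;
}
```

In this sketch, always recursing into the left partition needs one stack frame per element on sorted input, while recursing into the smaller partition keeps the depth logarithmic at worst. Whether anything like this applies to NS_QuickSort would need to be checked against the actual source.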
Flags: needinfo?(acelists)
66% of crash locales are ja for signature nsMsgDatabase::GetCollationKeyGenerator
bp-470d4b51-5b1c-4238-868f-0269d0171222
0 xul.dll nsMsgDatabase::GetCollationKeyGenerator() C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:3670
1 xul.dll nsMsgDatabase::CompareCollationKeys(unsigned int, unsigned char*, unsigned int, unsigned char*, int*) C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:3732
2 xul.dll nsMsgDBView::FnSortIdKey(void const*, void const*, void*) C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/mailnews/base/src/nsMsgDBView.cpp:3850
3 xul.dll med3 xpcom/glue/nsQuickSort.cpp:93
4 xul.dll NS_QuickSort xpcom/glue/nsQuickSort.cpp:125
5 xul.dll NS_QuickSort xpcom/glue/nsQuickSort.cpp:171
m_kato, can you spot the error?
nsMsgDBView::FnSortIdUint32 bp-3e92fc71-249b-4510-8e4e-691f90181116
med3 is the most common signature.
bp-b3117ab5-5707-4bc4-ade7-213e10181123 ja locale
bp-e74b3f95-d67f-47bd-913d-364e60181121 ru locale
bp-522bd067-d970-4390-b3c7-bf6ca0180902 sorted junk% score (which can only be done with junquilla addon installed)
bp-e2f6b40d-721c-425e-9588-123980180711 sorted on junk status
Flags: needinfo?(m_kato)
Summary: crash nsMsgDBView::FnSortIdKeyPtr → crash @ nsMsgDBView::FnSortIdKeyPtr and @ med3 sorting message list columns
bp-b3117ab5-5707-4bc4-ade7-213e10181123, wow, stack overflow in Quicksort :-(
Might bug 1498313 help?
Maybe, since that switches from NS_QuickSort() to C++'s standard qsort(). However, I don't think NS_QuickSort() is that buggy, so there might be an underlying issue.
Depends on: 1498313
Flags: needinfo?(m_kato)

User in https://support.mozilla.org/en-US/questions/1267021

Crashing with https://crash-stats.mozilla.com/report/index/dc508528-c545-4b53-a0b0-54d1c0190821

In this instance the crash was initiated by clicking on the sort by star. Thunderbird now crashes on each restart.

nsMsgThreadedDBView.cpp:428 calls nsMsgDBView::Sort(sortType, sortOrder). Something goes wrong in there, and then it either crashes in NS_QuickSort or keeps recursing until it gets a stack overflow. Bug 1498313 might help, or the standard qsort will then also crash.

This needs careful code inspection. Maybe Ben wants to take a look.

Flags: needinfo?(benc)

That's an epic callstack :-)
I've had a look through the code and nothing obvious stands out, but there's a lot of code and a lot of potential for subtle issues there.
Is there a reliable way to trigger this crash? I reckon I could nail it down pretty quickly with a debugger attached to it...

Flags: needinfo?(benc)

That poor reporter in https://support.mozilla.org/questions/1267021... The advice should have been to rename and save the folder's .msf file as well as a copy of the folder in some place outside the profile, and maybe also rename and save panacea.dat only if the .msf rename didn't work. Asking users to recreate profiles is very extreme and painful.

Perhaps this is another case:
Thunderbird 68.4.1
Crash Reason EXCEPTION_STACK_OVERFLOW
Signature nsMsgDBView::FnSortIdKey
https://crash-stats.mozilla.com/report/index/7f1ac55a-b890-410b-808f-040790200123
"function": "nsMsgDBView::FnSortIdKey(void const *,void const *,void *)"

Further to comment 21
The person uses the Quick Filter Bar, which matches another user who crashed after using the star option on the Quick Filter Bar.
https://support.mozilla.org/en-US/questions/1277937

We'll see if Mark can provide the testcase Ben asked for.

Keywords: testcase-wanted

Still quite rare.

Severity: critical → S3
See Also: → 1628242

Perhaps this will become moot when new folder management/display comes to pass?

91.2.1 nsMsgDBView::FnSortIdUint32 bp-984d4918-59f7-44d0-85d0-11a9e0211027

Severity: S3 → S4
Flags: needinfo?(mkmelin+mozilla)

Might be.

Flags: needinfo?(mkmelin+mozilla)