Closed Bug 90215 Opened 24 years ago Closed 24 years ago

M092 & Trunk crash pressing "n" in thread pane [@ MSVCRT.dll - morkNode::CutWeakRef]

Categories

(MailNews Core :: Backend, defect)

x86
All
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: dmosedale, Assigned: Bienvenu)

References

Details

(Keywords: crash, topcrash, Whiteboard: PDT+)

Crash Data

Attachments

(2 files)

This has been around since at least 0.9.1, and is still present on the 0.9.2 branch as well as the trunk. I've got a particular mail folder where all the messages except one are marked read. I use the threaded view, and the one unread message is a child in a collapsed thread. When I open the folder and press n, which should take me to the unread message, instead it crashes. Top of the stack trace is attached; the full stack is _extremely_ deep (at least 1500 frames, perhaps a lot more), some sort of incredibly deep recursion in ListIdsInThreadOrder. Note that the mailbox in question only has about 50 messages in it.
Attached file: top of the stack trace
your mail folder database is corrupt - you'll need to delete it. If after the db gets recreated, you still see this crash, then I'll look at it, but this db corruption is an old bug that was fixed between 0.9.1 and 0.9.2 and regenerating it should fix it.
Corrupt or not, the code should cope with this condition more gracefully than just crashing. Folks upgrading from previous versions (perhaps including NS 6.0x?) who have corrupt databases won't have any idea what to do.
Dan, can you apply the attached patch in mailnews/base/src and try it out? I can't get a corrupted db anymore. The patch just checks if we've found more msgs in a thread than we thought there were, and if so, invalidates the db and errors out. Since I can't test this code, I don't know if it will work. I know that I'm not hitting the assert so I don't think it will cause any problems with db's that are not corrupt.
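For readers without the patch at hand, the check described above can be sketched roughly as follows. This is not the attached patch: ThreadRecord, FolderDb, ListIdsBounded and ListThread are hypothetical stand-ins, and the only idea taken from the comment is "stop and invalidate the db once more messages turn up than the thread says it holds".

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a thread row in a mail summary database.
struct ThreadRecord {
    std::uint32_t declaredChildCount = 0;  // how many messages the thread claims to hold
    // parent message key -> keys of its direct replies
    std::unordered_map<std::uint32_t, std::vector<std::uint32_t>> children;
};

// Hypothetical stand-in for the folder database; only tracks validity here.
struct FolderDb {
    bool summaryValid = true;
};

enum class ListResult { Ok, CorruptDatabase };

// Depth-first listing of reply keys under parentKey. A corrupt thread whose
// links form a cycle would recurse without end, so the walk gives up as soon
// as it is about to list more messages than the thread declares.
static ListResult ListIdsBounded(const ThreadRecord& thread,
                                 std::uint32_t parentKey,
                                 std::vector<std::uint32_t>& out) {
    auto it = thread.children.find(parentKey);
    if (it == thread.children.end()) {
        return ListResult::Ok;  // no replies under this message
    }
    for (std::uint32_t child : it->second) {
        if (out.size() >= thread.declaredChildCount) {
            return ListResult::CorruptDatabase;  // found more than we thought there were
        }
        out.push_back(child);
        ListResult r = ListIdsBounded(thread, child, out);
        if (r != ListResult::Ok) {
            return r;  // propagate the failure up the recursion
        }
    }
    return ListResult::Ok;
}

// Wrapper: on corruption, mark the summary invalid so it can be rebuilt the
// next time the folder is opened, and report the error to the caller.
static ListResult ListThread(FolderDb& db, const ThreadRecord& thread,
                             std::uint32_t rootKey,
                             std::vector<std::uint32_t>& out) {
    out.clear();
    ListResult r = ListIdsBounded(thread, rootKey, out);
    if (r == ListResult::CorruptDatabase) {
        db.summaryValid = false;
    }
    return r;
}
```

In this sketch the recursion depth can never exceed declaredChildCount, which is what turns a 1500-frame stack into an ordinary error return.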
And, Dan, you could either attach the corrupt .msf file to this bug, along with the folder it belongs to, or send them to me privately. Assuming you haven't deleted the .msf file, of course. Assuming it's not an IMAP folder, that is, though I might be able to fake my way through with just the .msf file if it was.
Seth and Navin, can I get reviews for this patch? Lisa sent me a corrupted news database and I verified that we don't crash and that the next time you open the newsgroup, all the headers are re-downloaded and the db is no longer corrupt. Thanks.
Excellent; thanks Lisa & David! As David guessed, I made the dumb move of deleting my .msf file already, so I can no longer verify.
makes sense, r=naving.
looks good, two minor comments / questions:
1) + threadHdr->GetNumChildren(&numChildren);
   do we want to check the return result of that?
   rv = threadHdr->GetNumChildren(&numChildren);
2) + while (NS_SUCCEEDED(rv) && NS_SUCCEEDED(rv = msgEnumerator->HasMoreElements(&hasMore)) && (hasMore == PR_TRUE))
   "&& (hasMore == PR_TRUE)" should be "&& hasMore"
sr=sspitzer
GetNumChildren is just an accessor for a member variable - it won't fail. I'll change the hasMore part. Thanks for the review.
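The review does not say why "&& hasMore" is preferred. One common reason, assuming the boolean is an integer typedef (as NSPR's PRBool was), is that a "true" value other than 1 fails an equality test against the TRUE constant. A small sketch with hypothetical stand-in types (ToyBool, TOY_TRUE and ToyEnumerator are not Mozilla APIs):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical int-backed boolean, in the spirit of an integer typedef.
typedef int ToyBool;
const ToyBool TOY_TRUE = 1;

// Hypothetical enumerator. It reports "more" as the number of items left,
// which is nonzero (true) but not necessarily equal to TOY_TRUE.
struct ToyEnumerator {
    std::vector<int> items;
    std::size_t pos = 0;

    int HasMoreElements(ToyBool* hasMore) {
        *hasMore = static_cast<ToyBool>(items.size() - pos);
        return 0;  // 0 means success in this sketch
    }
    int GetNext(int* out) {
        *out = items[pos++];
        return 0;
    }
};

// Loop written with a plain truthiness test, as the review suggests.
static int CountWithTruthiness(ToyEnumerator e) {
    int rv = 0;
    int count = 0;
    ToyBool hasMore = 0;
    while (rv == 0 && (rv = e.HasMoreElements(&hasMore)) == 0 && hasMore) {
        int value = 0;
        rv = e.GetNext(&value);
        ++count;
    }
    return count;
}

// Same loop, but comparing against the TRUE constant.
static int CountWithEquality(ToyEnumerator e) {
    int rv = 0;
    int count = 0;
    ToyBool hasMore = 0;
    while (rv == 0 && (rv = e.HasMoreElements(&hasMore)) == 0 &&
           (hasMore == TOY_TRUE)) {
        int value = 0;
        rv = e.GetNext(&value);
        ++count;
    }
    return count;
}
```

With three items in the toy enumerator, the truthiness loop visits all three, while the equality loop exits before visiting any, because the first "has more" answer is 3 rather than 1.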
fix checked in.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
wonderful, david! Jay - can you see if this stack trace is a topcrash? http://bugzilla.mozilla.org/showattachment.cgi?attach_id=38533 Thanks
Keywords: crash
*** Bug 89506 has been marked as a duplicate of this bug. ***
QA Contact: esther → stephend
Adding topcrash keyword and Trunk & M092, [@ MSVCRT.dll - morkNode::CutWeakRef] to summary for tracking. Lisa, this indeed is a topcrasher for Mozilla 0.9.2 and I also found a few crashes on the Trunk. I'll keep an eye out for any more of these crashes to see if we can verify the fix. I'm pretty sure this crash is also happening on the branch, but since we don't have a lot of Talkback data on those builds, it's hard to know for sure. stephend, could you try to reproduce this with builds prior to the fix and then see if you can verify the fix with the latest builds on the trunk and branch? If you crash, please enter the bug number somewhere so I can easily find it in the Talkback reports...thanks.
Keywords: topcrash
Summary: crash pressing "n" in thread pane → M092 & Trunk crash pressing "n" in thread pane [@ MSVCRT.dll - morkNode::CutWeakRef]
Changing OS to All since the majority of the Talkback data shows this crash occurring on Win32 builds.
OS: Linux → All
thanks, Jay. I'm pretty sure this crash is on the branch also as I had reported a similar bug using branch. (David - pls correct if I'm wrong.) David and Scott P. Since this is a topcrash, how do you feel about the risk factor of the fix to see if we can get this into the branch? If ok, then I will try to run this by the rest of PDT. Thanks.
*** Bug 88471 has been marked as a duplicate of this bug. ***
It looks like none of my changes to this bug yesterday made it. Stephen, can you test around this bug (it sounds like we don't have a reproducible case, but we should at least make sure that nothing else has broken). Adding nsBranch as a candidate.
Keywords: nsBranch
Disclaimer: I think this bug might be slightly more complex (or require numerous DB/msf syncs, etc.). The limited testing that I did was as follows (and please let me know if there are additional steps, or if I should omit some.)
1. Marked all of netscape.test read (news.mozilla.org).
2. Replied to a thread's top level posting.
3. Did a "GetMsg" to get the new posting. (Didn't read the posting.)
4. Did a "ReplyTo" on the posting, thereby creating a child of the child I created in step #2.
5. Now that we are 2 levels deep in hierarchy, went back to the newsgroup with the 2 unread messages and used View | Messages | Thread with Unread, and pressed n twice, effectively reading both of the sibling messages that I created in steps #2 and #4.
The truly 100% method of verifying this fix is to check with people (like Bradley Baetz, Dan Mosedale, etc.) and to also check the Talkback reports (assuming people who crash with this stack are using Talkback for those instances.) Anyway, I'll go ahead and say that my rudimentary testing is done (for now.)
Trunk builds:
Mac OS 9.1 - 2001-07-13-09
Windows 2K - 2001-07-13-04
RedHat 7.1 - 2001-07-13-08
In today's PDT meeting, lchiang mentioned she has a reproducible case. Lisa, can you try this out on the trunk?
Well, since it's impossible to reply to a message without selecting it (and therefore reading it ;-) I meant that I changed the diamond icon in the threadpane (I do this automatically, so much so that I don't think about it, sorry about that.) So basically, I've altered by hand the unread status of those two messages that I replied to. But that shouldn't (and didn't in this case) cause me to crash.
My reproducible case is for bug http://bugzilla.mozilla.org/show_bug.cgi?id=86016 which David says has the same fix as this bug report. Here are my findings:
1. 2001-07-13-06-0.9.2 branch build. Selecting the newsgroup for my particular case as described in bug 86016 causes a crash.
2. 2001-07-13-19-trunk build. Selecting the newsgroup for my particular case as described in bug 86016 no longer crashes. The news messages appear in the thread pane.
David, I did have one question though: I went back to the branch build, following the correction of the .msf file in step 2 by your fix, and the crash still occurs. Is this expected if the fix were on the trunk only? thanks.
*** Bug 90812 has been marked as a duplicate of this bug. ***
I've run through Lisa's steps on:
Windows 2000 - 2001-07-15-08
MacOS 9.1 - 2001-07-13-09
RedHat 7.1 - 2001-07-13-21
I ran into no crashers, or any other odd occurrences. Still, I'd like to wait a while and keep checking this stack on the Talkback server. Reason being, many of the crashes we have seen were with older profiles. If this fix on the trunk works, even if they get a corrupted .msf, we will rebuild it, and they shouldn't crash.
Lisa, to answer your question, I believe it means there still is some sort of threading problem that's not fixed, that causes this problem. I think we suspected this because you, among others, reported that they could recreate this crash with builds after my previous threading fix. The trick is for me to reproduce this and get a minimal case. I could not reproduce it on the newsgroup you mentioned - how many headers did you download again?
100, I believe. Marking rest as read. I can show you on Tuesday.
PDT+
Whiteboard: PDT+
fix checked into branch as well (sorry, I haven't a clue what keyword/status whiteboard mumbo jumbo I should use to indicate that this is now fixed on the branch)
You don't need to know all the keyword stuff, david :-) mark vbranch for verification on branch since stephen has already tested on trunk.
Keywords: vbranch
I've run through the testcase that Lisa provided, as well as ad-hoc testing of my own, along with the comments in this bug. Still no crashing. Keeping my fingers crossed though, and I'll wait a while before I verify this. Builds: 2001-07-20-06-0.9.2 Windows 2000 2001-07-20-03-0.9.2 Mac OS 9.1 2001-07-20-04-0.9.2 RedHat Linux 7.1
I did a stack trace search on http://climate (internal Talkback servers) for MSVCRT.dll - morkNode::CutWeakRef and found nothing. Since I've already checked this on both trunk and branch, and was just waiting to check Talkback, I'm confident that this is now fixed. Certainly will file again if I/others see this.
Status: RESOLVED → VERIFIED
stephend, if you run a Talkback query for the string "MSVCRT.dll - morkNode::CutWeakRef", you will never get any data back, since the stack signature is just "MSVCRT.dll" and the "- morkNode::CutWeakRef" part is the second frame in the stack. In the future, the easiest way to verify crashes like this (.DLL and .so crashes) is to look at the detailed topcrash report, click through all the stack traces for the given stack signature, and see whether the second (sometimes 3rd or 4th) frame in the stack is no longer there. If you need more clarification, contact me offline.
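The two-step check described above (match on the stack signature first, then look for the function a few frames down) might be sketched like this. CrashReport and CountMatchingReports are hypothetical and say nothing about how the Talkback server actually stores or queries its data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical crash report: frames[0] is the top of the stack, which this
// sketch assumes is also what the report is filed under as its signature.
struct CrashReport {
    std::vector<std::string> frames;
};

// Counts reports whose top frame equals `signature` and which contain
// `innerFrame` somewhere in frames 2..maxDepth (1-based), i.e. just below
// the signature frame.
static std::size_t CountMatchingReports(const std::vector<CrashReport>& reports,
                                        const std::string& signature,
                                        const std::string& innerFrame,
                                        std::size_t maxDepth = 4) {
    std::size_t count = 0;
    for (const CrashReport& report : reports) {
        if (report.frames.empty() || report.frames[0] != signature) {
            continue;  // filed under a different signature
        }
        const std::size_t limit = std::min(maxDepth, report.frames.size());
        for (std::size_t i = 1; i < limit; ++i) {
            if (report.frames[i] == innerFrame) {
                ++count;
                break;  // count each report at most once
            }
        }
    }
    return count;
}
```

Searching this toy data for the combined string "MSVCRT.dll - morkNode::CutWeakRef" as a signature finds nothing, which mirrors the empty query result, whereas searching for signature "MSVCRT.dll" with inner frame "morkNode::CutWeakRef" finds the relevant reports.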
Product: MailNews → Core
Product: Core → MailNews Core
Crash Signature: [@ MSVCRT.dll - morkNode::CutWeakRef]