Open Bug 818325 Opened 12 years ago Updated 4 months ago

crash in nsMsgLocalMailFolder::EndCopy

Categories

(MailNews Core :: Backend, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: wsmwk, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [regression:TB12][rare])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-502d97f1-54e8-4cb5-86ff-debd52121203 .
============================================================= 

0	xul.dll	nsMsgLocalMailFolder::EndCopy	mailnews/local/src/nsLocalMailFolder.cpp:2257
1	xul.dll	nsCopyMessageStreamListener::EndCopy	mailnews/base/src/nsCopyMessageStreamListener.cpp:121
2	xul.dll	nsMailboxProtocol::OnStopRequest	mailnews/local/src/nsMailboxProtocol.cpp:297
3	xul.dll	nsInputStreamPump::OnStateStop	netwerk/base/src/nsInputStreamPump.cpp:555
4	xul.dll	nsInputStreamPump::OnInputStreamReady	netwerk/base/src/nsInputStreamPump.cpp:373
5	xul.dll	nsOutputStreamReadyEvent::Run	xpcom/io/nsStreamUtils.cpp:82
6	xul.dll	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:624
7	xul.dll	NS_ProcessNextEvent_P	objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:217
8	xul.dll	nsXULWindow::ShowModal	xpfe/appshell/src/nsXULWindow.cpp:378
9	xul.dll	nsContentTreeOwner::ShowAsModal	xpfe/appshell/src/nsContentTreeOwner.cpp:529
10	xul.dll	nsWindowWatcher::OpenWindowJSInternal	embedding/components/windowwatcher/src/nsWindowWatcher.cpp:995
11	xul.dll	nsWindowWatcher::OpenWindowJS	embedding/components/windowwatcher/src/nsWindowWatcher.cpp:446
12	xul.dll	nsGlobalWindow::OpenInternal	dom/base/nsGlobalWindow.cpp:9217
13	xul.dll	nsGlobalWindow::OpenDialog	dom/base/nsGlobalWindow.cpp:5941
14	xul.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:70 

Other stacks are not so long. lilke
bp-9d56da9d-090a-4d0d-92c7-ed9552121204
bp-502d97f1-54e8-4cb5-86ff-debd52121203
bp-1d4ee2fc-5fd6-4824-bc8e-31e382121204 Mac
This crash begsin in version 12, so I believe it is a regression of bug 402392 (maildir)


bp-cc8b6ee5-315f-4906-a075-580382140901 TB31

0 	xul.dll	nsMsgLocalMailFolder::EndCopy(bool)	mailnews/local/src/nsLocalMailFolder.cpp
1 	xul.dll	nsCopyMessageStreamListener::EndCopy(nsISupports*, tag_nsresult)	mailnews/base/src/nsCopyMessageStreamListener.cpp
2 	xul.dll	nsMailboxProtocol::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult)	mailnews/local/src/nsMailboxProtocol.cpp
3 	xul.dll	nsInputStreamPump::OnStateStop()	netwerk/base/src/nsInputStreamPump.cpp 

http://hg.mozilla.org/releases/comm-esr31/annotate/6219e5a9a8d8/mailnews/base/src/nsCopyMessageStreamListener.cpp#l114
bienvenu@10291 109 nsresult rv;
bienvenu@10291 110 nsCOMPtr<nsIURI> uri = do_QueryInterface(url, &rv);
hg@0 111
bienvenu@10291 112 NS_ENSURE_SUCCESS(rv, rv);
bienvenu@10291 113 bool copySucceeded = (aStatus == NS_BINDING_SUCCEEDED);
bienvenu@10291 114 rv = mDestination->EndCopy(copySucceeded); 

http://hg.mozilla.org/releases/comm-esr31/annotate/6219e5a9a8d8/mailnews/local/src/nsLocalMailFolder.cpp#l2447
hg@0 2434 if (mCopyState->m_isFolder)
mconley@13187 2435 CopyAllSubFolders(srcFolder, nullptr, nullptr); //Copy all subfolders then notify completion
hg@0 2436
hg@0 2437 if (mCopyState->m_msgWindow && mCopyState->m_undoMsgTxn)
hg@0 2438 {
hg@0 2439   nsCOMPtr<nsITransactionManager> txnMgr;
hg@0 2440   mCopyState->m_msgWindow->GetTransactionManager(getter_AddRefs(txnMgr));
hg@0 2441   if (txnMgr)
hg@0 2442     txnMgr->DoTransaction(mCopyState->m_undoMsgTxn);
hg@0 2443 }
kent@3538 2444 
hg@0           2445 // enable the dest folder
bugzilla@10318 2446 EnableNotifications(allMessageCountNotifications, true, false /*dbBatching*/); //dest folder doesn't need db batching
kent@3538      2447 if (srcFolder && !mCopyState->m_isFolder) 


bp-a031c5eb-ae6f-406a-aa6e-c6c732140903 TB12

http://hg.mozilla.org/releases/comm-beta/annotate/33ee8cf6630f/mailnews/base/src/nsCopyMessageStreamListener.cpp#l154
bienvenu@10291 152 NS_ENSURE_SUCCESS(rv, rv);
bienvenu@10291 153 bool copySucceeded = (aStatus == NS_BINDING_SUCCEEDED);
bienvenu@10291 154 rv = mDestination->EndCopy(copySucceeded); 

http://hg.mozilla.org/releases/comm-beta/annotate/33ee8cf6630f/mailnews/local/src/nsLocalMailFolder.cpp#l2287
bienvenu@10291 2281    if (mCopyState->m_parseMsgState)
bienvenu@10291 2282      mCopyState->m_parseMsgState->ParseAFolderLine(CRLF, MSG_LINEBREAK_LEN);
bienvenu@10291  283 }
bienvenu@10291 2284 // flush the copied message. We need a close at the end to get the
bienvenu@10291 2285 // file size and time updated correctly.
bienvenu@10291 2286 rv = mCopyState->m_msgStore->FinishNewMessage(mCopyState->m_fileStream,
bienvenu@10291 2287 mCopyState->m_newHdr);
Blocks: 402392
Keywords: regression
Whiteboard: [regression:TB12]
Removing myslef on all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.

Pretty steady at 20-25 crashes per week (and equal number of users)
bp-9ec2f96d-ad12-4038-9917-8889e0200105 is a current example (and using Avast)
bp-a527365d-db1c-4e3c-9ade-e53f50200104 is not using avast
0 xul.dll nsMsgLocalMailFolder::EndCopy(bool) comm/mailnews/local/src/nsLocalMailFolder.cpp:2135 context
1 xul.dll nsCopyMessageStreamListener::EndCopy(nsISupports*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:107 cfi
2 xul.dll nsCopyMessageStreamListener::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:145 cfi
3 xul.dll nsMsgProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/util/nsMsgProtocol.cpp:384 cfi
4 xul.dll nsMailboxProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/local/src/nsMailboxProtocol.cpp:384 cfi
5 xul.dll nsInputStreamPump::OnStateStop() netwerk/base/nsInputStreamPump.cpp:655 cfi
6 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) netwerk/base/nsInputStreamPump.cpp:403 cfi
7 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp:91 cfi
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1175 cfi
9 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486 cfi
10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:881 cfi
11 xul.dll mozilla::storage::Connection::shutdownAsyncThread() storage/mozStorageConnection.cpp:1000 cfi
12 xul.dll nsresult mozilla::detail::RunnableMethodImpl<(anonymous namespace)::HangMonitorChild*, void ((anonymous namespace)::HangMonitorChild::)() attribute((thiscall)), 0, mozilla::RunnableKind::Standard>::Run() xpcom/threads/nsThreadUtils.h:1174 cfi
13 xul.dll nsThread::ProcessNextEvent(bool, bool
) xpcom/threads/nsThread.cpp:1175 cfi

Crash Signature: [@ nsMsgLocalMailFolder::EndCopy(bool)] [@ nsMsgLocalMailFolder::EndCopy ] → [@ nsMsgLocalMailFolder::EndCopy ]
OS: Windows NT → All
Whiteboard: [regression:TB12] → [regression:TB12][rare]

50% increase over the last six months, so higher crash rate for 78.

bp-2c4d7f1f-6460-47a9-8d56-3b9f40201231
0 xul.dll nsMsgLocalMailFolder::EndCopy(bool) comm/mailnews/local/src/nsLocalMailFolder.cpp:2148
1 xul.dll nsCopyMessageStreamListener::EndCopy(nsISupports*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:107
2 xul.dll nsCopyMessageStreamListener::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:145
3 xul.dll nsMsgProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsMsgProtocol.cpp:386
4 xul.dll nsMailboxProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/local/src/nsMailboxProtocol.cpp:384
5 xul.dll nsInputStreamPump::OnStateStop() netwerk/base/nsInputStreamPump.cpp:649
6 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) netwerk/base/nsInputStreamPump.cpp:397
7 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp:94
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1211

Flags: needinfo?(mkmelin+mozilla)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

The root cause may be that this is in MOZ_CAN_RUN_SCRIPT_BOUNDARY (see https://searchfox.org/comm-central/source/mozilla/build/clang-plugin/CanRunScriptChecker.cpp)

Flags: needinfo?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #5)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

Ben, does a patch based on that make sense? Or is more needed?

The root cause may be that this is in MOZ_CAN_RUN_SCRIPT_BOUNDARY (see https://searchfox.org/comm-central/source/mozilla/build/clang-plugin/CanRunScriptChecker.cpp)

bp-f7583133-9e03-4ed8-a394-426110210610 is one of the simpler stacks.

Most of the others are messier, like bp-57720592-0238-4a56-aee7-bfd110210617a and bp-99f4be67-8e93-49ad-94ab-7b5000210628

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #6)

(In reply to Magnus Melin [:mkmelin] from comment #5)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

Ben, does a patch based on that make sense? Or is more needed?

mCopyState is used a few lines earlier, and I don't see any way for it to get nulled out, so I think it's something else.
My guess is it's crashing inside FinishNewLocalMessage(), but that's being inlined and so it's not showing up in the callstack. I think one of the params being passed in to it might be null. Maybe some asserts would help catch it in the wild? If it's that bad it'll crash anyway, so a fatal assert doesn't make the user experience any worse.
Magnus: I was sure there was some macro which would fatal in release builds... or was I imagining it?

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

Flags: needinfo?(benc) → needinfo?(mkmelin+mozilla)

MOZ_RELEASE_ASSERT?

Flags: needinfo?(mkmelin+mozilla)

(In reply to Ben Campbell from comment #7)

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

It would be insanely great if you can add error code check for low-level I/O routine, mostly write() and close() for output stream.
Currently you can see that write(), close(), etc. are used without any checking of error code at all.

In my attempt to enable buffering for writing in TB (which is a huge win since this cuts down the number of system calls by leaps and bounds, thus reducing context switching. I am sorry that I have kept this on back burner for a while due to family matters and such.),
I realized this omission of error checking and needed to check them all as far as message writing goes. (Enabling buffering means write error would be postponed until the next buffer flushing and at closing time, and thus I had to make sure that close() calls are all properly checked, etc.)
bug 1242030

(In reply to Ben Campbell from comment #7)

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

bug 1308335 comment 9?
Additional bug number?

Crash ranks #190 for 78.14.0 and #260 for 92.2.1. (signature my have shifted some in version 91)

Flags: needinfo?(benc)
See Also: → 1749276
Flags: needinfo?(benc)

benc, comment 10

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #11)

benc, comment 10

No, I've been stepping carefully around the message-copying code for now, just trying not to break anything.
So there's been no sweeping-everything-clean I'm afraid. (it'll be a big job)
I'll keep on at this crash in Bug 1749276.

Flags: needinfo?(benc)
Severity: critical → S2
Depends on: 1749276
See Also: 1749276

Version 115 crash rate is roughly 1/4 that of 102. But not so for the signatures cited in bug 1749276

Severity: S2 → S3
Attached image 6 month crash rate
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: