Closed Bug 818325 Opened 13 years ago Closed 4 months ago

crash in nsMsgLocalMailFolder::EndCopy

Categories

(MailNews Core :: Backend, defect)

x86
All
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: wsmwk, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [regression:TB12][rare])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-502d97f1-54e8-4cb5-86ff-debd52121203 . ============================================================= 0 xul.dll nsMsgLocalMailFolder::EndCopy mailnews/local/src/nsLocalMailFolder.cpp:2257 1 xul.dll nsCopyMessageStreamListener::EndCopy mailnews/base/src/nsCopyMessageStreamListener.cpp:121 2 xul.dll nsMailboxProtocol::OnStopRequest mailnews/local/src/nsMailboxProtocol.cpp:297 3 xul.dll nsInputStreamPump::OnStateStop netwerk/base/src/nsInputStreamPump.cpp:555 4 xul.dll nsInputStreamPump::OnInputStreamReady netwerk/base/src/nsInputStreamPump.cpp:373 5 xul.dll nsOutputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:82 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:624 7 xul.dll NS_ProcessNextEvent_P objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:217 8 xul.dll nsXULWindow::ShowModal xpfe/appshell/src/nsXULWindow.cpp:378 9 xul.dll nsContentTreeOwner::ShowAsModal xpfe/appshell/src/nsContentTreeOwner.cpp:529 10 xul.dll nsWindowWatcher::OpenWindowJSInternal embedding/components/windowwatcher/src/nsWindowWatcher.cpp:995 11 xul.dll nsWindowWatcher::OpenWindowJS embedding/components/windowwatcher/src/nsWindowWatcher.cpp:446 12 xul.dll nsGlobalWindow::OpenInternal dom/base/nsGlobalWindow.cpp:9217 13 xul.dll nsGlobalWindow::OpenDialog dom/base/nsGlobalWindow.cpp:5941 14 xul.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:70 Other stacks are not so long. lilke bp-9d56da9d-090a-4d0d-92c7-ed9552121204 bp-502d97f1-54e8-4cb5-86ff-debd52121203 bp-1d4ee2fc-5fd6-4824-bc8e-31e382121204 Mac
This crash begsin in version 12, so I believe it is a regression of bug 402392 (maildir) bp-cc8b6ee5-315f-4906-a075-580382140901 TB31 0 xul.dll nsMsgLocalMailFolder::EndCopy(bool) mailnews/local/src/nsLocalMailFolder.cpp 1 xul.dll nsCopyMessageStreamListener::EndCopy(nsISupports*, tag_nsresult) mailnews/base/src/nsCopyMessageStreamListener.cpp 2 xul.dll nsMailboxProtocol::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) mailnews/local/src/nsMailboxProtocol.cpp 3 xul.dll nsInputStreamPump::OnStateStop() netwerk/base/src/nsInputStreamPump.cpp http://hg.mozilla.org/releases/comm-esr31/annotate/6219e5a9a8d8/mailnews/base/src/nsCopyMessageStreamListener.cpp#l114 bienvenu@10291 109 nsresult rv; bienvenu@10291 110 nsCOMPtr<nsIURI> uri = do_QueryInterface(url, &rv); hg@0 111 bienvenu@10291 112 NS_ENSURE_SUCCESS(rv, rv); bienvenu@10291 113 bool copySucceeded = (aStatus == NS_BINDING_SUCCEEDED); bienvenu@10291 114 rv = mDestination->EndCopy(copySucceeded); http://hg.mozilla.org/releases/comm-esr31/annotate/6219e5a9a8d8/mailnews/local/src/nsLocalMailFolder.cpp#l2447 hg@0 2434 if (mCopyState->m_isFolder) mconley@13187 2435 CopyAllSubFolders(srcFolder, nullptr, nullptr); //Copy all subfolders then notify completion hg@0 2436 hg@0 2437 if (mCopyState->m_msgWindow && mCopyState->m_undoMsgTxn) hg@0 2438 { hg@0 2439 nsCOMPtr<nsITransactionManager> txnMgr; hg@0 2440 mCopyState->m_msgWindow->GetTransactionManager(getter_AddRefs(txnMgr)); hg@0 2441 if (txnMgr) hg@0 2442 txnMgr->DoTransaction(mCopyState->m_undoMsgTxn); hg@0 2443 } kent@3538 2444 hg@0 2445 // enable the dest folder bugzilla@10318 2446 EnableNotifications(allMessageCountNotifications, true, false /*dbBatching*/); //dest folder doesn't need db batching kent@3538 2447 if (srcFolder && !mCopyState->m_isFolder) bp-a031c5eb-ae6f-406a-aa6e-c6c732140903 TB12 http://hg.mozilla.org/releases/comm-beta/annotate/33ee8cf6630f/mailnews/base/src/nsCopyMessageStreamListener.cpp#l154 bienvenu@10291 152 NS_ENSURE_SUCCESS(rv, rv); bienvenu@10291 153 bool copySucceeded = (aStatus == NS_BINDING_SUCCEEDED); bienvenu@10291 154 rv = mDestination->EndCopy(copySucceeded); http://hg.mozilla.org/releases/comm-beta/annotate/33ee8cf6630f/mailnews/local/src/nsLocalMailFolder.cpp#l2287 bienvenu@10291 2281 if (mCopyState->m_parseMsgState) bienvenu@10291 2282 mCopyState->m_parseMsgState->ParseAFolderLine(CRLF, MSG_LINEBREAK_LEN); bienvenu@10291 283 } bienvenu@10291 2284 // flush the copied message. We need a close at the end to get the bienvenu@10291 2285 // file size and time updated correctly. bienvenu@10291 2286 rv = mCopyState->m_msgStore->FinishNewMessage(mCopyState->m_fileStream, bienvenu@10291 2287 mCopyState->m_newHdr);
Blocks: 402392
Keywords: regression
Whiteboard: [regression:TB12]
Removing myslef on all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.

Pretty steady at 20-25 crashes per week (and equal number of users)
bp-9ec2f96d-ad12-4038-9917-8889e0200105 is a current example (and using Avast)
bp-a527365d-db1c-4e3c-9ade-e53f50200104 is not using avast
0 xul.dll nsMsgLocalMailFolder::EndCopy(bool) comm/mailnews/local/src/nsLocalMailFolder.cpp:2135 context
1 xul.dll nsCopyMessageStreamListener::EndCopy(nsISupports*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:107 cfi
2 xul.dll nsCopyMessageStreamListener::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:145 cfi
3 xul.dll nsMsgProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/util/nsMsgProtocol.cpp:384 cfi
4 xul.dll nsMailboxProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/local/src/nsMailboxProtocol.cpp:384 cfi
5 xul.dll nsInputStreamPump::OnStateStop() netwerk/base/nsInputStreamPump.cpp:655 cfi
6 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) netwerk/base/nsInputStreamPump.cpp:403 cfi
7 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp:91 cfi
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1175 cfi
9 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486 cfi
10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:881 cfi
11 xul.dll mozilla::storage::Connection::shutdownAsyncThread() storage/mozStorageConnection.cpp:1000 cfi
12 xul.dll nsresult mozilla::detail::RunnableMethodImpl<(anonymous namespace)::HangMonitorChild*, void ((anonymous namespace)::HangMonitorChild::)() attribute((thiscall)), 0, mozilla::RunnableKind::Standard>::Run() xpcom/threads/nsThreadUtils.h:1174 cfi
13 xul.dll nsThread::ProcessNextEvent(bool, bool
) xpcom/threads/nsThread.cpp:1175 cfi

Crash Signature: [@ nsMsgLocalMailFolder::EndCopy(bool)] [@ nsMsgLocalMailFolder::EndCopy ] → [@ nsMsgLocalMailFolder::EndCopy ]
OS: Windows NT → All
Whiteboard: [regression:TB12] → [regression:TB12][rare]

50% increase over the last six months, so higher crash rate for 78.

bp-2c4d7f1f-6460-47a9-8d56-3b9f40201231
0 xul.dll nsMsgLocalMailFolder::EndCopy(bool) comm/mailnews/local/src/nsLocalMailFolder.cpp:2148
1 xul.dll nsCopyMessageStreamListener::EndCopy(nsISupports*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:107
2 xul.dll nsCopyMessageStreamListener::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsCopyMessageStreamListener.cpp:145
3 xul.dll nsMsgProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/base/src/nsMsgProtocol.cpp:386
4 xul.dll nsMailboxProtocol::OnStopRequest(nsIRequest*, nsresult) comm/mailnews/local/src/nsMailboxProtocol.cpp:384
5 xul.dll nsInputStreamPump::OnStateStop() netwerk/base/nsInputStreamPump.cpp:649
6 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) netwerk/base/nsInputStreamPump.cpp:397
7 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp:94
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1211

Flags: needinfo?(mkmelin+mozilla)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

The root cause may be that this is in MOZ_CAN_RUN_SCRIPT_BOUNDARY (see https://searchfox.org/comm-central/source/mozilla/build/clang-plugin/CanRunScriptChecker.cpp)

Flags: needinfo?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #5)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

Ben, does a patch based on that make sense? Or is more needed?

The root cause may be that this is in MOZ_CAN_RUN_SCRIPT_BOUNDARY (see https://searchfox.org/comm-central/source/mozilla/build/clang-plugin/CanRunScriptChecker.cpp)

bp-f7583133-9e03-4ed8-a394-426110210610 is one of the simpler stacks.

Most of the others are messier, like bp-57720592-0238-4a56-aee7-bfd110210617a and bp-99f4be67-8e93-49ad-94ab-7b5000210628

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #6)

(In reply to Magnus Melin [:mkmelin] from comment #5)

https://hg.mozilla.org/releases/comm-esr78/file/tip/mailnews/local/src/nsLocalMailFolder.cpp#l2148
Possibly making mCopyState a local reference could help.

Ben, does a patch based on that make sense? Or is more needed?

mCopyState is used a few lines earlier, and I don't see any way for it to get nulled out, so I think it's something else.
My guess is it's crashing inside FinishNewLocalMessage(), but that's being inlined and so it's not showing up in the callstack. I think one of the params being passed in to it might be null. Maybe some asserts would help catch it in the wild? If it's that bad it'll crash anyway, so a fatal assert doesn't make the user experience any worse.
Magnus: I was sure there was some macro which would fatal in release builds... or was I imagining it?

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

Flags: needinfo?(benc) → needinfo?(mkmelin+mozilla)

MOZ_RELEASE_ASSERT?

Flags: needinfo?(mkmelin+mozilla)

(In reply to Ben Campbell from comment #7)

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

It would be insanely great if you can add error code check for low-level I/O routine, mostly write() and close() for output stream.
Currently you can see that write(), close(), etc. are used without any checking of error code at all.

In my attempt to enable buffering for writing in TB (which is a huge win since this cuts down the number of system calls by leaps and bounds, thus reducing context switching. I am sorry that I have kept this on back burner for a while due to family matters and such.),
I realized this omission of error checking and needed to check them all as far as message writing goes. (Enabling buffering means write error would be postponed until the next buffer flushing and at closing time, and thus I had to make sure that close() calls are all properly checked, etc.)
bug 1242030

(In reply to Ben Campbell from comment #7)

Related: I've got some pretty big changes planned which will probably affect (i.e. replace!) this code in the coming weeks. The current copy system has gotten very convoluted and I'm aiming to straighten it all out as part of the ongoing maildir work.

bug 1308335 comment 9?
Additional bug number?

Crash ranks #190 for 78.14.0 and #260 for 92.2.1. (signature my have shifted some in version 91)

Flags: needinfo?(benc)
See Also: → 1749276
Flags: needinfo?(benc)

benc, comment 10

Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #11)

benc, comment 10

No, I've been stepping carefully around the message-copying code for now, just trying not to break anything.
So there's been no sweeping-everything-clean I'm afraid. (it'll be a big job)
I'll keep on at this crash in Bug 1749276.

Flags: needinfo?(benc)
Severity: critical → S2
Depends on: 1749276
See Also: 1749276

Version 115 crash rate is roughly 1/4 that of 102. But not so for the signatures cited in bug 1749276

Severity: S2 → S3
Attached image 6 month crash rate

No version 128 crashes.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: