Closed Bug 1797616 Opened 2 years ago Closed 2 years ago

Startup crash in [@ std::_Func_impl_no_alloc<T>::_Do_call | nsFolderCompactState::OnStopRunningUrl]

Categories

(MailNews Core :: Networking: IMAP, defect)

Thunderbird 107
x86
All
defect

Tracking

(thunderbird_esr102+ verified, thunderbird107 wontfix, thunderbird108 verified)

VERIFIED FIXED
108 Branch
Tracking Status
thunderbird_esr102 + verified
thunderbird107 --- wontfix
thunderbird108 --- verified

People

(Reporter: wsmwk, Assigned: benc)

References

()

Details

(4 keywords, Whiteboard: [TM:102.6.1])

Crash Data

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1766101 +++

Sharp uptick of crashes in 107.0b1. Regression of bug 1793599 ?

Crash report: https://crash-stats.mozilla.org/report/index/84695245-098b-415a-a750-9a6190221026

Reason: EXCEPTION_ACCESS_VIOLATION_READTop 10 frames of crashing thread:

0  xul.dll  std::_Func_impl_no_alloc<`lambda at /builds/worker/checkouts/gecko/comm/mailnews/base/src/nsMsgFolderCompactor.cpp:1331:25', void, nsresult>::_Do_call  mailnews/base/src/nsMsgFolderCompactor.cpp:103
1  xul.dll  nsFolderCompactState::FinishCompact  mailnews/base/src/nsMsgFolderCompactor.cpp:644
2  xul.dll  nsFolderCompactState::OnStopRequest  mailnews/base/src/nsMsgFolderCompactor.cpp:684
3  xul.dll  nsMsgProtocol::OnStopRequest  mailnews/base/src/nsMsgProtocol.cpp:390
4  xul.dll  nsMailboxProtocol::OnStopRequest  mailnews/local/src/nsMailboxProtocol.cpp:389
5  xul.dll  nsInputStreamPump::OnStateStop  netwerk/base/nsInputStreamPump.cpp:670
6  xul.dll  nsInputStreamPump::OnInputStreamReady  netwerk/base/nsInputStreamPump.cpp:395
7  xul.dll  CallbackHolder::CallbackHolder::<lambda_1>::operator const  xpcom/io/nsPipe3.cpp:73
7  xul.dll  NS_NewCancelableRunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:71:35'>::FuncCancelableRunnable::Run  xpcom/threads/nsThreadUtils.h:650
8  xul.dll  mozilla::RunnableTask::Run  xpcom/threads/TaskController.cpp:538

Mac Crash report: https://crash-stats.mozilla.org/report/index/b18025bb-cb36-43fa-9f6d-9d5a50221026

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESSTop 10 frames of crashing thread:

0  XUL  nsFolderCompactState::ExpungedBytes const  mailnews/base/src/nsMsgFolderCompactor.cpp:103
0  XUL  nsMsgFolderCompactor::NextFolder const  mailnews/base/src/nsMsgFolderCompactor.cpp:1340
0  XUL  std::__1::__invoke<nsMsgFolderCompactor::NextFolder  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/type_traits:3747
0  XUL  std::__1::__invoke_void_return_wrapper<void>::__call<nsMsgFolderCompactor::NextFolder  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/__functional_base:348
0  XUL  std::__1::__function::__alloc_func<nsMsgFolderCompactor::NextFolder  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/functional:1553
0  XUL  std::__1::__function::__func<nsMsgFolderCompactor::NextFolder  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/functional:1727
1  XUL  std::__1::__function::__value_func<void  const  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/functional:1880
1  XUL  std::__1::function<void  const  /builds/worker/fetches/MacOSX11.3.sdk/usr/include/c++/v1/functional:2555
1  XUL  nsFolderCompactState::FinishCompact  mailnews/base/src/nsMsgFolderCompactor.cpp:644
2  XUL  nsFolderCompactState::OnStopRequest  mailnews/base/src/nsMsgFolderCompactor.cpp:684
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(gds)
See Also: → 1766101

Note, this is a topcrash, whereas Bug 1766101 is not.

https://searchfox.org/comm-central/rev/58c2ab35a3003f751d0a9ab2886c3e580e5f259b/mailnews/base/src/nsMsgFolderCompactor.cpp#1329

Seems it for some reason doesn't like the call to uint64_t ExpungedBytes() const { return m_totalExpungedBytes; }?

Flags: needinfo?(mkmelin+mozilla) → needinfo?(benc)

The crash address is always 0x48 on 32-bit builds and 0x70 on 64-bit ones, it's likely that the nsFolderCompactState instance ExpungedBytes() is being called upon is NULL.

Assignee: nobody → benc
Status: NEW → ASSIGNED

My best guess is that a Compact() call is failing, but still (partially) kicking off iteration over the messages (which is handed by the message copy system). So when the Compact() returns the failure, it releases it's reference to the nsFolderCompactState:

https://hg.mozilla.org/releases/comm-beta/file/8b303f5a6ea43fae822fcb2a3eeeacbd48dda058/mailnews/base/src/nsMsgFolderCompactor.cpp#l1352

... but the nsFolderCompactState is still there (probably held in existence by the message copy system, for which it is a listener). When the copy completes (or fails), the completion function is invoked, which tries to get the expunged bytes from the nsFolderCompactState, using the (now null!) pointer in nsMsgFolderCompactor. The nsFolderCompactState still exists (it's the thing that is calling the completion fn), but we're trying to access it via a null pointer.

This patch bypasses all that by passing in the expunged bytes as a parameter to completionFn, so there is no need to access the nsFolderCompactState in any case. Even if this patch doesn't fix things, I think it's more logical.

SO...

I say we land this patch and see if it helps.

The proper solution is to rewrite the folder compaction code. It's waaaay too convoluted and complex, and there are so many places where the error handling will just fall flat on it's face. I do plan to do this once I've sorted out the rest of the mbox code properly (Bug 1719121).

Flags: needinfo?(benc)
Target Milestone: --- → 108 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/5837935c3864
Pass expungedBytes as parameter to nsFolderCompactState completion function. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

This is crashing comm/mailnews/imap/test/unit/test_offlineStoreLocking.js

Flags: needinfo?(benc)
Backout by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/422819efbe1f
Backed out changeset 5837935c3864
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Patch tweaked and uploaded. Passes test_offlineStoreLocking.js fine locally now, but I just kicked off a try run to go with it:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=4fe001322da9e1090032e0242ebf2b735cbf3a3b

Flags: needinfo?(benc)

Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/4a5e88f6613a
Pass expungedBytes as parameter to nsFolderCompactState completion function. r=mkmelin

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

Backed out again for build bustages across the board.
Backout: https://hg.mozilla.org/comm-central/rev/9daf437f80b8cc28195d8127ceb69a3489156f53

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This fix is cursed :-)
This bustage totally confused me (it worked fine in a try build a day or two ago)...
Turns out to be a simple case of 24 hour self-sabotage bitrot: I did a fix in Bug 1798181 around the same time which clashed with this patch.

Updated in phab, pending this new try build:

https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=4510aed9c4bc1f1471dc02bd5f94a68d62474318

Third time a charm? :) Try looks ok so will land it now.

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/9ef9ebcf17b5
Pass expungedBytes as parameter to nsFolderCompactState completion function. r=mkmelin

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED

(In reply to Wayne Mery (:wsmwk) from comment #1)

Note, this is a topcrash, whereas Bug 1766101 is not.

I think that should have been written as "this is a topcrash for BETA (starting at 107.0b1), whereas Bug 1766101 is not."

However,

  1. we haven't identified the cause of the big increase at 107.0b1, the regressing bug/patch for this bug, unless Bug 1793599 - IMAP to fastmail fails after erasing ImapMail folder - is the primary suspect and which BTW was uplifted to 102.4.2
  2. Bug 1766101 is a topcrash for 102 (but it is not marked as such - somwhere along the way I goofed) - ranking #7, #34, etc - which means we really should do something for 102 if possible
  3. #2 above is an entaglement which makes it more difficult to use crash-stats to determine both the cause of #1 and the effectiveness of the patch in this bug

Ben, Gene, thoughts?

(In reply to Ben Campbell from comment #5)

My best guess is that a Compact() call is failing, but still (partially) kicking off iteration over the messages (which is handed by the message copy system). So when the Compact() returns the failure, it releases it's reference to the nsFolderCompactState:
...
The proper solution is to rewrite the folder compaction code. It's waaaay too convoluted and complex, and there are so many places where the error handling will just fall flat on it's face. I do plan to do this once I've sorted out the rest of the mbox code properly (Bug 17191210).

You mean Bug 1719121 ?

Flags: needinfo?(benc)
See Also: → 1719121

But if beta crashes significantly drop starting with 108.0b1 (nightly crashes are virtually zero, so nightly stats are not helpful), then perhaps we have a winner that can be uplifted to 102.

Flags: needinfo?(gds)

We have a winner. Crashes end starting with 108 beta for at least three signatures:

Average user crashes 5-10 times.
Unfortunately the reporter of https://support.mozilla.org/en-US/questions/1381616 never replied.
A new report https://support.mozilla.org/en-US/questions/1399437

Flags: needinfo?(benc)

Let's get this for 102.6.1

Status: RESOLVED → VERIFIED
Flags: needinfo?(benc)
Whiteboard: [TM:102.6.1]

Comment on attachment 9300670 [details]
Bug 1797616 - Pass expungedBytes as parameter to nsFolderCompactState completion function. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): was never clearly labeled as a regression
User impact if declined: user crashes
Testing completed (on c-c, etc.): on beta for over 30 days
Risk to taking this patch (and alternatives if risky): ?

Attachment #9300670 - Flags: approval-comm-esr102?

Comment on attachment 9300670 [details]
Bug 1797616 - Pass expungedBytes as parameter to nsFolderCompactState completion function. r=mkmelin

[Triage Comment]
Approved for esr102

Risk to taking this patch (and alternatives if risky): ?

FWIW, no major new crashes or other regressions have been reported on beta.

Attachment #9300670 - Flags: approval-comm-esr102? → approval-comm-esr102+
Crash Signature: nsFolderCompactState::FinishCompact] [@ nsFolderCompactState::Compact ] → nsFolderCompactState::FinishCompact] [@ nsFolderCompactState::Compact ] [@ nsFolderCompactState::OnStopRunningUrl]
Flags: needinfo?(benc)

Also gone in same general time frame is Mac+Linux crash signature nsFolderCompactState::ExpungedBytes, which is why I am noting it this bug report. But it probably doesn't have the same cause because

Regardless, nsFolderCompactState::ExpungedBytes is gone.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: