Closed Bug 1568095 Opened 5 months ago Closed 4 months ago

Crash in [@ morkAtom::GetYarn]

Categories

(MailNews Core :: Database, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

Tracking

(thunderbird_esr6069+ fixed, thunderbird_esr6868+ fixed, thunderbird69 fixed, thunderbird70 fixed)

RESOLVED FIXED
Thunderbird 70.0
Tracking Status
thunderbird_esr60 69+ fixed
thunderbird_esr68 68+ fixed
thunderbird69 --- fixed
thunderbird70 --- fixed

People

(Reporter: wsmwk, Assigned: jorgk-bmo)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-f45967bb-ec18-4ef4-8b24-5a2160190722 which was reported in http://forums.mozillazine.org/viewtopic.php?f=39&t=3052150

Top 10 frames of crashing thread:

0 xul.dll morkAtom::GetYarn comm/db/mork/src/morkAtom.cpp:40
1 xul.dll morkStore::CopyToken comm/db/mork/src/morkStore.cpp:959
2 xul.dll morkRow::SetRow comm/db/mork/src/morkRow.cpp:689
3 xul.dll morkRowObject::SetRow comm/db/mork/src/morkRowObject.cpp:430
4 xul.dll nsMsgDatabase::CopyHdrFromExistingHdr comm/mailnews/db/msgdb/src/nsMsgDatabase.cpp:3531
5 xul.dll nsFolderCompactState::EndCopy comm/mailnews/base/src/nsMsgFolderCompactor.cpp:1262
6 xul.dll nsMailboxProtocol::OnStopRequest comm/mailnews/local/src/nsMailboxProtocol.cpp:291
7 xul.dll nsInputStreamPump::OnStateStop netwerk/base/nsInputStreamPump.cpp:654
8 xul.dll nsInputStreamPump::OnInputStreamReady netwerk/base/nsInputStreamPump.cpp:402
9 xul.dll nsInputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:91

https://support.mozilla.org/en-US/questions/1245569#answer-1186355 reports
"The error returned when I tried to compact folders. I believe you are right the error seems to be related to compacting of folders. // I then reran Thunderbird in safe mode and it was able to complete successfully without crashing. Now everything seems to be working as normal (knocking on wood). // I believe this all stemmed from my faulty RAM issues which somehow ended up corrupting files."

Most of the users with whom I've been in contact since 2017 report database related issues such as addressbook search, confirmed corrupt address book, confirmed folder corruption found during compact or viewing messages. The source of corruption is unknown.

In one case (John H) "Turned out that the issue was that connections to these services: Icloud, GoogleDrive, and MicroSoftOneDrive were turned off causing thunderbird to crash.
I uninstalled these services and Thunderbird was happy."

So this started at TB 52?

Current code at comm/db/mork/src/morkAtom.cpp:40 is:

  if (this->IsWeeBook()) {
    morkWeeBookAtom* weeBook = (morkWeeBookAtom*)this;
40  source = weeBook->mWeeBookAtom_Body;

So this can't be null otherwise it would have crashed at line 38, and weeBook is just a cast. Strange.

Magnus, Ben, any ideas?

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(benc)

(In reply to Jorg K (GMT+2) from comment #1)

So this started at TB 52?

Not 100% certain, but I don't find any crashes from the last 6 months at version 45 or earlier.

Two older examples FWIW
bp-c9af0c28-4d00-43f2-aae8-095c00190710 60.5.0
bp-ce80a34b-cc97-4236-a3a3-9b6880190129 52.0

To clarify further about this not being a new crash, and whether it started in 52 or not:

  • I intended to file this bug a year ago, and in fact made many user contacts going back to 2017 - it just never happened for a variety of reasons
  • crash may have existed prior to version 52 but as a different signature - impossible to determine now two years later

(In reply to Jorg K (GMT+2) from comment #1)

So this can't be null otherwise it would have crashed at line 38, and weeBook is just a cast. Strange.

I concur. Looking at the current code, it seems like it should crash earlier if this is null.

But I think the source has changed since then (The source links in the crash reports don't seem to work right now).

I think GetYarn() is being called from morkStore::TokenToString(), which has the slightly concerning comment (comm/db/mork/src/morkStore.cpp:719):

    atom->GetYarn(outTokenName);  // note this is safe even when atom==nil

So it seems like there was some expectation in the past that GetYarn() could be called with a null this pointer (eek!).
That's certainly not the case in the current GetYarn()!

Flags: needinfo?(benc)

Damn, you're right. bp-f45967bb-ec18-4ef4-8b24-5a2160190722 is for 60.8 before the big reformatting. Looking at the esr60 source, we're here:
https://dxr.mozilla.org/comm-esr60/source/db/mork/src/morkAtom.cpp#40
40 if ( this->IsWeeBook() )

So clearly this is null. This reminds me of bug 1278795 which worked around the fact that a null this was used. So it converted the function in question to be static.

Right now, I can't understand why this doesn't crash constantly since we're calling it with a null this all the time. What am I missing?
https://searchfox.org/comm-central/search?q=GetYarn&case=true&regexp=false&path=
https://searchfox.org/comm-central/search?q=GetYarn&case=true&regexp=false&path=morkRow.cpp

Also note that there used to be an if (this) protection here at least until TB 45
https://dxr.mozilla.org/comm-esr45/source/db/mork/src/morkAtom.cpp#40
and was removed here:
https://hg.mozilla.org/releases/comm-esr52/rev/3e4c9e3a9099b0bb077255abb75463784ee929e5#l5.12

Flags: needinfo?(mkmelin+mozilla)

This should have been done in bug 1278795 :-( - but I was a newbie back then.

We should review the changes in
https://hg.mozilla.org/comm-central/rev/3e4c9e3a9099b0bb077255abb75463784ee929e5
to see whether more null-checks need to be re-instated.
That said, a lot of time has passed since then, and apart from this bug here, I haven't heard of other issues.

[Approval Request Comment]
Regression caused by (bug #): 1294260
User impact if declined: They crash.
Risk to taking this patch (and alternatives if risky):
Not risky, just some C++ magic reshuffle which in the end re-instates a null-check that has existed until bug 1294260 removed it :-(

Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #9080844 - Flags: review?(benc)
Attachment #9080844 - Flags: approval-comm-esr68?
Attachment #9080844 - Flags: approval-comm-beta?
Comment on attachment 9080844 [details] [diff] [review]
1568095-GetYarn-static.patch

Review of attachment 9080844 [details] [diff] [review]:
-----------------------------------------------------------------

All looks sensible to me.
Attachment #9080844 - Flags: review?(benc) → review+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/a71fdbcf8f60
Make morkAtom::GetYarn() static. r=benc

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED

Thanks, Ben. Landed with additional casts here to avoid compile problems:
https://hg.mozilla.org/comm-central/rev/a71fdbcf8f60#l5.17
https://hg.mozilla.org/comm-central/rev/a71fdbcf8f60#l5.43

Wayne, how do you feel about taking this straight to TB 68.0 ESR without running through a TB 69 beta first? It fixes a crash and is "risk free" since all it does is change the way arguments are passed to morkAtom::GetYarn() and it reinstates a null check that present until TB 45. I would take it, but I want to avoid discussions about the pro/contras. Although this hidden "time bomb" has been present since TB 52, I've never crashed on it myself, so I don't have a strong opinion here.

The option is to ship it in TB 69 beta 2 and then TB 68.1 or TB 68.0.x if we do such releases.

Flags: needinfo?(vseerror)
Target Milestone: --- → Thunderbird 70.0
Attachment #9080844 - Flags: approval-comm-esr68?
Attachment #9080844 - Flags: approval-comm-esr68+
Attachment #9080844 - Flags: approval-comm-beta?
Attachment #9080844 - Flags: approval-comm-beta+
You need to log in before you can comment on or make changes to this bug.