Crash in nsMsgDatabase::GetUint32Property

VERIFIED FIXED in Thunderbird 55.0

Status

--
critical
VERIFIED FIXED
a year ago
a year ago

People

(Reporter: wsmwk, Assigned: mkmelin+mozilla)

Tracking

({crash, regression, topcrash-thunderbird})

55 Branch
Thunderbird 55.0
Unspecified
Windows 10
crash, regression, topcrash-thunderbird

Thunderbird Tracking Flags

(thunderbird_esr5254+ fixed, thunderbird53 wontfix, thunderbird54 fixed, thunderbird55 fixed)

Details

(Whiteboard: [regression:TB55], crash signature)

Attachments

(2 attachments)

(Reporter)

Description

a year ago
New #1 crash for nightly builds, avg 15 per day, starting with 20170422030207 bp-b70bed9e-e467-4ba8-97d7-45a310170427. 
most are this user who I am attempting to contact bp-4bbd7cdc-0bc1-4e7f-9de1-d5b160170429

All crash comments:
bp-f4b846fb-2d07-4266-bad2-898e50170424  Was activating an auto compacting fllowing message deletion.
bp-c61732f1-fc68-43b1-93d5-682b70170424  I just renamed a folder. It moved accordingly to its new position, as alphabetically determined. I then noticed another folder I hadn't peered into in a while. The app crashed when I clicked bp-b5444f5c-2d8e-4887-bcc1-fdfff0170423  Now I was reading emails in Local email sent to my POP3 address, pfd@torfree.net
bp-65aeb6ce-2732-460e-81f2-334e30170423  I was clicking on and briefly examing the contents of IMAP gmail folders. 

Anything in https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-04-20+03%3A05%3A00&enddate=2017-04-22+03%3A05%3A00 ?

All the Windows crashes I examined have ucrtbase.dll on stack.  But there is one linux crash bp-73c9bcb4-7d30-4ae0-b66c-d167e0170428

Comment 1

a year ago
You can't win. We've just fixed the #1 top crash, IMAP compaction, so here's another one. I suspect this might be caused by bug 853881 where we reworked the hash table in nsMsgDatabase.cpp. That landed on 2017-04-22. It's not in the range above, but if you extend it by a few hours, you'll see it, so:
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-04-20+03%3A05%3A00&enddate=2017-04-22+10%3A05%3A00

Comment 2

a year ago
Kent, any insight here, since we both broke this together with Joshua in absentia :-(
Flags: needinfo?(rkent)
(Reporter)

Comment 3

a year ago
Yeah, looking at build times, it's pretty clear they are starting later than 03:00
Blocks: 853881
status-thunderbird54: --- → unaffected
Keywords: regression
Whiteboard: [regression:TB55]
(Assignee)

Comment 4

a year ago
Created attachment 8863223 [details] [diff] [review]
bug1360873_Crash_in_nsMsgDatabase___GetUint32Property.patch

Don't know the actual cause, but this should help
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #8863223 - Flags: review?(jorgk)

Comment 5

a year ago
Created attachment 8863264 [details] [diff] [review]
1360873-diagnostic.patch

How about instead of papering over the problem, we make an attempt to see where it comes from. We let this run in the wild for a while, then back it out and land your patch instead with or without addressing the cause.
Attachment #8863264 - Flags: review?(mkmelin+mozilla)

Comment 6

a year ago
Landed the diagnostic patch which should be backed out in a few days:
https://hg.mozilla.org/comm-central/rev/6f64b91e9b8ca636e98f9d05998e8d22e420f75b

Wayne, the crash signature will disappear now and be replaced by another one coming from a programmed crash. Should be easy to recognise as coming from nsMsgDatabase.cpp:3799. Can you please keep an eye out for it and let me know.
Flags: needinfo?(vseerror)
There is code in nsMsgDatabase, ForceClosed(), which nulls m_mdbStore. Any time there is code such as that, it is important that we always check for null m_mdbStore before access. At least two of the comments concern situations where I would expect ForceClosed() to be called (compact and rename).

So I do not consider the checks in Magnus' patch to be papering over issues, but necessary. After all any addon can keep the database alive easily enough.

The whole mork/msgDatabase model is really inadequate, so checks such as these are the only approach we can take. If you can identify a particular path that causes the issues, fine fix that, but the checks are needed anyway.
Flags: needinfo?(rkent)

Comment 8

a year ago
Yes, the intention is to land Magnus' patch. However, this will then return an error to a caller which might not be expecting it, so I'd like to know who that caller is. All the crash reports indicate that this is called more or less directly from JS, so I added some code to dump out the JS call stack. I tested it locally and I did get the stack.
(Assignee)

Comment 9

a year ago
(In reply to Jorg K (GMT+2) from comment #8)
> Y However, this will then return an error to a caller which might not be expecting it

Callers should always be expecting it :)

Anyway, since this is a top crash, I don't know if it's very appropriate to have users crash for one cycle just to gather some info (which we partly know already).

Comment 10

a year ago
"One cycle"? I'd like to collect the information for a few days, then land your patch. With 15 crashes per day we should be done here mid-week.
(Reporter)

Comment 11

a year ago
I agree experimentation is worthwhile. It's a topcrash but based on crash rate [1] and comments it is not debilitating [2]. 

[1] https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=55.0a1

[2] https://crash-stats.mozilla.com/signature/?signature=nsMsgDatabase%3A%3AGetUint32Property&date=%3E%3D2017-04-24T19%3A23%3A00.000Z&date=%3C2017-05-01T19%3A23%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#comments


(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #0)
> New #1 crash for nightly builds, avg 15 per day, starting with
> 20170422030207 bp-b70bed9e-e467-4ba8-97d7-45a310170427. 
> most are this user who I am attempting to contact
> bp-4bbd7cdc-0bc1-4e7f-9de1-d5b160170429

This same user sandpk crashed with today's build bp-444ddfaa-cd89-4fd9-b9d3-404cc0170501
Flags: needinfo?(vseerror) → needinfo?(jorgk)
(Assignee)

Comment 13

a year ago
Comment on attachment 8863264 [details] [diff] [review]
1360873-diagnostic.patch

Review of attachment 8863264 [details] [diff] [review]:
-----------------------------------------------------------------

Ok!
Attachment #8863264 - Flags: review?(mkmelin+mozilla) → review+

Comment 14

a year ago
Comment on attachment 8863223 [details] [diff] [review]
bug1360873_Crash_in_nsMsgDatabase___GetUint32Property.patch

Already landed ;-)
Attachment #8863223 - Flags: review?(jorgk) → review+

Comment 15

a year ago
https://hg.mozilla.org/comm-central/rev/6f64b91e9b8ca636e98f9d05998e8d22e420f75b (diagnostic patch, comment #6)
https://hg.mozilla.org/comm-central/rev/866a0a77cc0bef57a6cb43ec246d1f5a310cea61 (backout of diagnostic patch)
https://hg.mozilla.org/comm-central/rev/95a8c829e1d69b5c729f6fa6d0318a29693be982 (real fix)

So sad, my diagnostic patch was really nice ;-)
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 55.0

Comment 16

a year ago
Comment on attachment 8863223 [details] [diff] [review]
bug1360873_Crash_in_nsMsgDatabase___GetUint32Property.patch

Simple patch to avoid crashes by dereferencing null pointers. Good to have on other branches, too, although we don't have crash reports there.
Attachment #8863223 - Flags: approval-comm-esr52?
Attachment #8863223 - Flags: approval-comm-beta+

Comment 17

a year ago
Beta (TB 54):
https://hg.mozilla.org/releases/comm-beta/rev/d1895c8d37042ceae39ac3d8cd39daff46f36e34
status-thunderbird53: --- → wontfix
status-thunderbird54: unaffected → fixed
status-thunderbird55: --- → fixed
status-thunderbird_esr52: --- → affected
tracking-thunderbird_esr52: --- → +
(Reporter)

Comment 18

a year ago
(In reply to Jorg K (GMT+2) from comment #12)
> ...
> 
> All show the very same cause:
> MOZ_CRASH Reason - gloda-id: 0 PendingCommitTracker_commitCallback()
> ["resource:///modules/gloda/index_msg.js":165]
> 
> https://dxr.mozilla.org/comm-central/rev/
> fa94fa8799570d1cad75f2f44a667ff3f514779c/mailnews/db/gloda/modules/index_msg.
> js#165

I looked at a few more crashes submitted today and they all show the same reason.
Enough info to file a new bug?

Also, no crashes reported so far with today's nightly build 20170502, so v.fixed
Status: RESOLVED → VERIFIED

Comment 19

a year ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #18)
> I looked at a few more crashes submitted today and they all show the same
> reason.
> Enough info to file a new bug?
I don't understand.

The diagnostic patch that produced the reason stated above was live for one day in the Daily of May 1st. So the only binary ever to report this has now been replaced with one that doesn't crash any more, as you confirmed.

So which bug would you like to report now? Or give me a crash ID and I'll look into it.
(Reporter)

Comment 20

a year ago
The premise of your diagnostic patch was to learn the "actual cause". And you pointed to the code in comment 12. No?

Comment 21

a year ago
We found the cause, somewhat. Gloda accesses a stale header, and there's a long comment explaining that:
https://dxr.mozilla.org/comm-central/rev/fa94fa8799570d1cad75f2f44a667ff3f514779c/mailnews/db/gloda/modules/index_msg.js#158
The JS code in question is wrapped in a try/catch, so now we return an error from the C++ code instead of crashing and Gloda deals with it. It will forever remain a mystery how bug 853881 slightly changed code paths so now that potential hibernating crash cause was exposed. There's nothing else we can do, we're done here. For more bad news, re-read the last paragraph of Kent's comment #7.
(Reporter)

Comment 22

a year ago
We can remove tracking esr52?
Flags: needinfo?(jorgk)

Comment 23

a year ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #22)
> We can remove tracking esr52?
Does it bother you? It's not an issue in TB 2, but I'd like to uplift it eventually for good house-keeping.
tracking-thunderbird_esr52: + → ---
Flags: needinfo?(jorgk)

Updated

a year ago
Attachment #8863223 - Flags: approval-comm-esr52? → approval-comm-esr52+

Comment 24

a year ago
TB 52 ESR:
https://hg.mozilla.org/releases/comm-esr52/rev/b48ca07ea45ff07428ddc0c898e7971d2681abfb
status-thunderbird_esr52: affected → fixed
tracking-thunderbird_esr52: --- → 54+
You need to log in before you can comment on or make changes to this bug.