Closed
Bug 1360873
Opened 8 years ago
Closed 8 years ago
Crash in nsMsgDatabase::GetUint32Property
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr5254+ fixed, thunderbird53 wontfix, thunderbird54 fixed, thunderbird55 fixed)
VERIFIED
FIXED
Thunderbird 55.0
People
(Reporter: wsmwk, Assigned: mkmelin)
References
Details
(Keywords: crash, regression, topcrash-thunderbird, Whiteboard: [regression:TB55])
Crash Data
Attachments
(2 files)
3.12 KB,
patch
|
jorgk-bmo
:
review+
jorgk-bmo
:
approval-comm-beta+
jorgk-bmo
:
approval-comm-esr52+
|
Details | Diff | Splinter Review |
2.20 KB,
patch
|
mkmelin
:
review+
|
Details | Diff | Splinter Review |
New #1 crash for nightly builds, avg 15 per day, starting with 20170422030207 bp-b70bed9e-e467-4ba8-97d7-45a310170427.
most are this user who I am attempting to contact bp-4bbd7cdc-0bc1-4e7f-9de1-d5b160170429
All crash comments:
bp-f4b846fb-2d07-4266-bad2-898e50170424 Was activating an auto compacting fllowing message deletion.
bp-c61732f1-fc68-43b1-93d5-682b70170424 I just renamed a folder. It moved accordingly to its new position, as alphabetically determined. I then noticed another folder I hadn't peered into in a while. The app crashed when I clicked bp-b5444f5c-2d8e-4887-bcc1-fdfff0170423 Now I was reading emails in Local email sent to my POP3 address, pfd@torfree.net
bp-65aeb6ce-2732-460e-81f2-334e30170423 I was clicking on and briefly examing the contents of IMAP gmail folders.
Anything in https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-04-20+03%3A05%3A00&enddate=2017-04-22+03%3A05%3A00 ?
All the Windows crashes I examined have ucrtbase.dll on stack. But there is one linux crash bp-73c9bcb4-7d30-4ae0-b66c-d167e0170428
Comment 1•8 years ago
|
||
You can't win. We've just fixed the #1 top crash, IMAP compaction, so here's another one. I suspect this might be caused by bug 853881 where we reworked the hash table in nsMsgDatabase.cpp. That landed on 2017-04-22. It's not in the range above, but if you extend it by a few hours, you'll see it, so:
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-04-20+03%3A05%3A00&enddate=2017-04-22+10%3A05%3A00
Comment 2•8 years ago
|
||
Kent, any insight here, since we both broke this together with Joshua in absentia :-(
Flags: needinfo?(rkent)
Reporter | ||
Comment 3•8 years ago
|
||
Yeah, looking at build times, it's pretty clear they are starting later than 03:00
Blocks: 853881
status-thunderbird54:
--- → unaffected
Keywords: regression
Whiteboard: [regression:TB55]
Assignee | ||
Comment 4•8 years ago
|
||
Don't know the actual cause, but this should help
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #8863223 -
Flags: review?(jorgk)
Comment 5•8 years ago
|
||
How about instead of papering over the problem, we make an attempt to see where it comes from. We let this run in the wild for a while, then back it out and land your patch instead with or without addressing the cause.
Attachment #8863264 -
Flags: review?(mkmelin+mozilla)
Comment 6•8 years ago
|
||
Landed the diagnostic patch which should be backed out in a few days:
https://hg.mozilla.org/comm-central/rev/6f64b91e9b8ca636e98f9d05998e8d22e420f75b
Wayne, the crash signature will disappear now and be replaced by another one coming from a programmed crash. Should be easy to recognise as coming from nsMsgDatabase.cpp:3799. Can you please keep an eye out for it and let me know.
Flags: needinfo?(vseerror)
Comment 7•8 years ago
|
||
There is code in nsMsgDatabase, ForceClosed(), which nulls m_mdbStore. Any time there is code such as that, it is important that we always check for null m_mdbStore before access. At least two of the comments concern situations where I would expect ForceClosed() to be called (compact and rename).
So I do not consider the checks in Magnus' patch to be papering over issues, but necessary. After all any addon can keep the database alive easily enough.
The whole mork/msgDatabase model is really inadequate, so checks such as these are the only approach we can take. If you can identify a particular path that causes the issues, fine fix that, but the checks are needed anyway.
Flags: needinfo?(rkent)
Comment 8•8 years ago
|
||
Yes, the intention is to land Magnus' patch. However, this will then return an error to a caller which might not be expecting it, so I'd like to know who that caller is. All the crash reports indicate that this is called more or less directly from JS, so I added some code to dump out the JS call stack. I tested it locally and I did get the stack.
Assignee | ||
Comment 9•8 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #8)
> Y However, this will then return an error to a caller which might not be expecting it
Callers should always be expecting it :)
Anyway, since this is a top crash, I don't know if it's very appropriate to have users crash for one cycle just to gather some info (which we partly know already).
Comment 10•8 years ago
|
||
"One cycle"? I'd like to collect the information for a few days, then land your patch. With 15 crashes per day we should be done here mid-week.
Reporter | ||
Comment 11•8 years ago
|
||
I agree experimentation is worthwhile. It's a topcrash but based on crash rate [1] and comments it is not debilitating [2].
[1] https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=55.0a1
[2] https://crash-stats.mozilla.com/signature/?signature=nsMsgDatabase%3A%3AGetUint32Property&date=%3E%3D2017-04-24T19%3A23%3A00.000Z&date=%3C2017-05-01T19%3A23%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#comments
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #0)
> New #1 crash for nightly builds, avg 15 per day, starting with
> 20170422030207 bp-b70bed9e-e467-4ba8-97d7-45a310170427.
> most are this user who I am attempting to contact
> bp-4bbd7cdc-0bc1-4e7f-9de1-d5b160170429
This same user sandpk crashed with today's build bp-444ddfaa-cd89-4fd9-b9d3-404cc0170501
Flags: needinfo?(vseerror) → needinfo?(jorgk)
Comment 12•8 years ago
|
||
This is super-interesting, my stuff actually worked ;-) Four crashes right now:
https://crash-stats.mozilla.com/signature/?signature=nsMsgDatabase%3A%3AGetUint32Property&date=%3E%3D2017-04-24T19%3A23%3A00.000Z&date=%3C2017-05-01T19%3A23%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
All show the very same cause:
MOZ_CRASH Reason - gloda-id: 0 PendingCommitTracker_commitCallback() ["resource:///modules/gloda/index_msg.js":165]
https://dxr.mozilla.org/comm-central/rev/fa94fa8799570d1cad75f2f44a667ff3f514779c/mailnews/db/gloda/modules/index_msg.js#165
OK, let's put the users out of their misery.
Flags: needinfo?(jorgk)
Assignee | ||
Comment 13•8 years ago
|
||
Comment on attachment 8863264 [details] [diff] [review]
1360873-diagnostic.patch
Review of attachment 8863264 [details] [diff] [review]:
-----------------------------------------------------------------
Ok!
Attachment #8863264 -
Flags: review?(mkmelin+mozilla) → review+
Comment 14•8 years ago
|
||
Comment on attachment 8863223 [details] [diff] [review]
bug1360873_Crash_in_nsMsgDatabase___GetUint32Property.patch
Already landed ;-)
Attachment #8863223 -
Flags: review?(jorgk) → review+
Comment 15•8 years ago
|
||
https://hg.mozilla.org/comm-central/rev/6f64b91e9b8ca636e98f9d05998e8d22e420f75b (diagnostic patch, comment #6)
https://hg.mozilla.org/comm-central/rev/866a0a77cc0bef57a6cb43ec246d1f5a310cea61 (backout of diagnostic patch)
https://hg.mozilla.org/comm-central/rev/95a8c829e1d69b5c729f6fa6d0318a29693be982 (real fix)
So sad, my diagnostic patch was really nice ;-)
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 55.0
Comment 16•8 years ago
|
||
Comment on attachment 8863223 [details] [diff] [review]
bug1360873_Crash_in_nsMsgDatabase___GetUint32Property.patch
Simple patch to avoid crashes by dereferencing null pointers. Good to have on other branches, too, although we don't have crash reports there.
Attachment #8863223 -
Flags: approval-comm-esr52?
Attachment #8863223 -
Flags: approval-comm-beta+
Comment 17•8 years ago
|
||
Beta (TB 54):
https://hg.mozilla.org/releases/comm-beta/rev/d1895c8d37042ceae39ac3d8cd39daff46f36e34
status-thunderbird53:
--- → wontfix
status-thunderbird55:
--- → fixed
status-thunderbird_esr52:
--- → affected
tracking-thunderbird_esr52:
--- → +
Reporter | ||
Comment 18•8 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #12)
> ...
>
> All show the very same cause:
> MOZ_CRASH Reason - gloda-id: 0 PendingCommitTracker_commitCallback()
> ["resource:///modules/gloda/index_msg.js":165]
>
> https://dxr.mozilla.org/comm-central/rev/
> fa94fa8799570d1cad75f2f44a667ff3f514779c/mailnews/db/gloda/modules/index_msg.
> js#165
I looked at a few more crashes submitted today and they all show the same reason.
Enough info to file a new bug?
Also, no crashes reported so far with today's nightly build 20170502, so v.fixed
Status: RESOLVED → VERIFIED
Comment 19•8 years ago
|
||
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #18)
> I looked at a few more crashes submitted today and they all show the same
> reason.
> Enough info to file a new bug?
I don't understand.
The diagnostic patch that produced the reason stated above was live for one day in the Daily of May 1st. So the only binary ever to report this has now been replaced with one that doesn't crash any more, as you confirmed.
So which bug would you like to report now? Or give me a crash ID and I'll look into it.
Reporter | ||
Comment 20•8 years ago
|
||
The premise of your diagnostic patch was to learn the "actual cause". And you pointed to the code in comment 12. No?
Comment 21•8 years ago
|
||
We found the cause, somewhat. Gloda accesses a stale header, and there's a long comment explaining that:
https://dxr.mozilla.org/comm-central/rev/fa94fa8799570d1cad75f2f44a667ff3f514779c/mailnews/db/gloda/modules/index_msg.js#158
The JS code in question is wrapped in a try/catch, so now we return an error from the C++ code instead of crashing and Gloda deals with it. It will forever remain a mystery how bug 853881 slightly changed code paths so now that potential hibernating crash cause was exposed. There's nothing else we can do, we're done here. For more bad news, re-read the last paragraph of Kent's comment #7.
Comment 23•8 years ago
|
||
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #22)
> We can remove tracking esr52?
Does it bother you? It's not an issue in TB 2, but I'd like to uplift it eventually for good house-keeping.
tracking-thunderbird_esr52:
+ → ---
Flags: needinfo?(jorgk)
Updated•7 years ago
|
Attachment #8863223 -
Flags: approval-comm-esr52? → approval-comm-esr52+
Comment 24•7 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•