Crash in mozilla::mailnews::MsgDBReporter::GetPath and mozilla::mailnews::MsgDBReporter::CollectReports - lifetime of reporter object shoud be bound to (and owned by) the nsMsgDatabase
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr52 wontfix, thunderbird_esr60 wontfix, thunderbird_esr68 affected, thunderbird51 wontfix, thunderbird52 wontfix, thunderbird53 wontfix, thunderbird54 wontfix, thunderbird67 wontfix)
People
(Reporter: wsmwk, Assigned: jorgk-bmo)
References
()
Details
(Keywords: crash, regression, Whiteboard: [regression:TB51?])
Crash Data
Attachments
(1 file, 3 obsolete files)
3.31 KB,
patch
|
benc
:
review+
|
Details | Diff | Splinter Review |
Reporter | ||
Comment 1•8 years ago
|
||
Reporter | ||
Comment 3•8 years ago
|
||
Reporter | ||
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
Reporter | ||
Updated•8 years ago
|
Reporter | ||
Comment 6•8 years ago
|
||
Reporter | ||
Comment 7•8 years ago
|
||
Reporter | ||
Comment 8•8 years ago
|
||
Reporter | ||
Comment 9•7 years ago
|
||
Reporter | ||
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
Reporter | ||
Comment 12•7 years ago
|
||
Comment 13•7 years ago
|
||
Comment 14•7 years ago
|
||
Reporter | ||
Comment 15•6 years ago
|
||
Looking at graph for past 6 months, crash rate of version 60 appears to be about the same as version 52.
Reporter | ||
Comment 17•6 years ago
|
||
combined signature makes this a top 30 crash for 60.6.1
bp-848ff6de-158f-4411-bd03-8a8430190504 67.0b
bp-37ab03bc-afaa-44da-bd1c-507d90190504 67.0b (same user)
bp-0d39159e-aabb-4f57-a8c4-02d310190430 67.0b different user
Although for mozilla::mailnews::MsgDBReporter::CollectReports there are no beta crashes for current versions.
https://crash-stats.mozilla.com/signature/?release_channel=%21release&signature=mozilla%3A%3Amailnews%3A%3AMsgDBReporter%3A%3ACollectReports&date=%3E%3D2019-02-04T18%3A25%3A00.000Z&date=%3C2019-05-04T18%3A25%3A00.000Z
Comment 18•5 years ago
|
||
just had https://crash-stats.mozilla.com/report/index/e5b51280-f57a-45e6-851f-1a16b0190715 At the time I was looking for why Thunderbird was sitting there consuming over 1200mb. Opened activity manager and sat for minutes observing it telling me it was indexing message 3 or 4 in an RSS feed folder. Then I asked for an about:memory and crash happened.
Comment 19•5 years ago
|
||
Crashed again today. This time I got a memory report before it crashed. 622mb of 1137mb in maildb.... 266mb on a feed that has a 220mb mbox and a 94mb msf database. If the memory Gz is of interest to anyone. Just ask.
Reporter | ||
Comment 20•5 years ago
|
||
(In reply to Matt from comment #18)
just had https://crash-stats.mozilla.com/report/index/e5b51280-f57a-45e6-851f-1a16b0190715 60.7.2 Crash Report [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown ]
Matt, see bug 1526127.
Comment 21•5 years ago
|
||
Does this also affect 64 bit Thunderbird?
If not, another good reason to start switching over.
Comment 22•5 years ago
|
||
I am using TB 64bit and have been for more than 12 months. Despite Wayne's pointing me to bug 1526127 I really do not see that as the correct bug for my crash as whilst the updater was supposedly running, it completed and post crash I was on the new version.
What I do have are a number of accounts that are "non functional" so I cancel password prompts, but most of my high memory issues appear to be related to a freenode certificate request that displays an incorrect certificate and just stops network functions until it is acknowledged.
Comment 23•5 years ago
|
||
I only have one account, but it keeps asking for password retry/renew. It is a Yahoo account. Can't find specifics on how this should be set up for server names, but I did it the way I found it once in the past. It gets mail, but re-requests frequently, and a popup in lower right hand says yahoo mail not available. Not sure if this is the same problem as you. Let me know if there is a way I can check or help with this. I'm on 69.0b2 (64-bit).
Reporter | ||
Comment 24•5 years ago
|
||
I clicked reply on a message in my gmail account. The compose window didn't come up. (Thunderbird had been running at 800MB and according to task manager there was not a big increase at the time of the crash)
bp-3f039407-0341-49ed-9b4b-458a50190814
Assignee | ||
Comment 25•5 years ago
|
||
Looks like something bad happened, and to add insult to injury, it then crashed in error reporting :-(
void GetPath(nsACString &memoryPath, bool aAnonymize) {
memoryPath.AssignLiteral("explicit/maildb/database(");
nsCOMPtr<nsIMsgFolder> folder;
935 mDatabase->GetFolder(getter_AddRefs(folder));
Looks like the database was null here. We can protect against that.
Assignee | ||
Comment 26•5 years ago
|
||
This will fix the crash in reporting, but it won't fix the horrible thing that caused the report to be requested.
Reporter | ||
Comment 27•5 years ago
•
|
||
(In reply to Jorg K (GMT+2) from comment #26)
Created attachment 9085413 [details] [diff] [review]
1353704-fix-crash.patchThis will fix the crash in reporting, but it won't fix the horrible thing that caused the report to be requested.
Indeed. So with the crash "gone", what kind of behavior might users see, or what types of issues might we have now papered over?
I can say a little more about my environment at the time although not sure it will help 1. many tabs open and several compose windows, 2. TB was open for several days (but less than a week), 3. using ~800MB last I looked, which is more than normal for me but not high enough to be alarming (probably higher than normal because I had chat open and RSS feeds running which I don't normally have), 4. I was starting to reply to a message in my Inbox from cinymini of bug 1422251 and just before the crash I searched for her full address which includes @gmx.de but search would return any results, but searching on only cinymini was successful (so very strange)
Assignee | ||
Comment 28•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #27)
So with the crash "gone", what kind of behavior might users see, or what types of issues might we have now papered over?
We'll find out ;-)
Reporter | ||
Comment 29•5 years ago
|
||
Going beyond my incident, as suggested in earlier comments, I've been in touch with ~6 reporters since 2017. Some anecdotal information from them:
-
"It looks like if the folders are kept with a lower number of records it helps with alleviating the issue but not with fixing it." and (same user) " if I restart TB often it will not crash". (no RSS feeds) Other crashes from this user bp-e5a9a202-8e82-4c52-8d3b-f36f20190520
-
For some users the crash has simply gone away (again, like bug 1353702) after some period of time, and in some cases after switching machines. In the later case, bad memory certainly is a possibility. I'm really looking forward to the breakpad folks or someone hooking up a memory tester to the crash reporter - bug 995652
Assignee | ||
Comment 30•5 years ago
|
||
I can find a reviewer soon and we can ship it in TB 69 beta 3 if you agree to do a new build.
Assignee | ||
Comment 31•5 years ago
|
||
Comment 32•5 years ago
|
||
Assignee | ||
Comment 33•5 years ago
|
||
Hmm, good point. The problem is that there is no ref-counting here:
class MsgDBReporter final : public nsIMemoryReporter {
nsMsgDatabase *mDatabase;
So most likely the database is already destroyed by the time we come to report. Let me try this again ;-)
Assignee | ||
Comment 34•5 years ago
|
||
This should be better.
Assignee | ||
Comment 35•5 years ago
|
||
Oops, commit message should be:
Use a ref-counted reference to the database for the DB reporter
Comment 36•5 years ago
|
||
The question is whether we want to keep a reference to the DB even somehow after its lifetime. This is something for jcranmer ;)
Assignee | ||
Comment 37•5 years ago
|
||
Sorry about the noise. We need to null-out mMemReporter so it gets destroyed or the reference it keeps will stop the database from being destroyed. Here's an example of a one-line change that can upset the entire system.
The funny thing is, the DTOR actually de-registers using UnregisterWeakMemoryReporter(), so after the database was destroyed, there shouldn't have been any more reporting. As many crashes, the whole thing is tricky. I won't take this to ESR in a hurry.
BTW, Joshua won't help us here, if you're lucky, he'll answer general questions on IRC.
Comment 38•5 years ago
|
||
Assignee | ||
Comment 39•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 40•5 years ago
•
|
||
(In reply to Ben Campbell from comment #38)
Will the
~nsMsgDatabase()
dtor ever be called?
Good question, see bug 1342858. Kent's personal prediction was that we'll never solve it unless we get rid of Mork. That does its own ref-counting and other hair-raising stuff :-(
Comment 41•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #25)
Looks like the database was null here. We can protect against that.
Crash address was 0xe5e5e631. Windows' free replaces memory with 0xe5e5e5e5, so this looks like nsMsgDatabase * was used after free, and the vtable pointer was offset by 19 entries before attempting to call it.
Have you tried making the reporter hold an nsWeakPtr<nsMsgDatabase>? That should be legal, and it will be cleared to null after nsMsgDatabase::~nsMsgDatabase is called, allowing you to legitimately check for null (null checks would also have to be added, of course, but weak pointers do remind you to try that).
(In reply to Jorg K (GMT+2) from comment #40)
Good question, see 1342858. Kent's personal prediction was that we'll never solve it unless we get rid of Mork. That does its own ref-counting and other hair-raising stuff :-(
The Mork side of things isn't too bad, if my stale brain cells are any good. The strong references of mdbI* will hold whatever needs to be held, and they won't hold any cyclic references back into nsMsgDatabase and friends. The problem is nsMsgHdr's... funky... approach to ref-counting, IIRC.
Comment 42•5 years ago
|
||
(In reply to Joshua Cranmer [:jcranmer] from comment #41)
Crash address was 0xe5e5e631. Windows' free replaces memory with 0xe5e5e5e5, so this looks like nsMsgDatabase * was used after free, and the vtable pointer was offset by 19 entries before attempting to call it.
Ahh, now that sounds like the core problem to fix. Either somewhere is holding a raw pointer to nsMsgDatabase
and accessing it after free, or perhaps somewhere has screwed-up refcounting (eg forgetting to AddRef() a return value or something?).
Have you tried making the reporter hold an nsWeakPtr<nsMsgDatabase>? That should be legal, and it will be cleared to null after nsMsgDatabase::~nsMsgDatabase is called, allowing you to legitimately check for null (null checks would also have to be added, of course, but weak pointers do remind you to try that).
The more I think about it, the more the holding-the-DB-open-in-the-reporter seems to be the wrong approach. The lifetime of the reporter object is meant to be bound to (and owned by) the nsMsgDatabase
. The reporter is registered/unregistered in the nsMsgDatabase
ctor/dtor.
Comment 43•5 years ago
|
||
Grepping through, it looks like raw nsMsgDatabase
pointers are relatively rare:
https://searchfox.org/comm-central/search?q=nsMsgDatabase%5B+%5D%3F%5B*%5D&case=false®exp=true&path=
There are a couple there that smell a little odd:
nsDBFolderInfo
holds a raw nsMsgDatabase
pointer (no refcounting that I can see):
https://searchfox.org/comm-central/source/mailnews/db/msgdb/public/nsDBFolderInfo.h#125
There's also one in nsMsgOfflineImapOperation
, although it looks like its ctor/dtor does increment/decrement the refcount manually...
https://searchfox.org/comm-central/source/mailnews/db/msgdb/src/nsMsgOfflineImapOperation.h#48
Comment 44•5 years ago
|
||
Raw nsIMsgDatabase
pointers are a little more prevalent:
https://searchfox.org/comm-central/search?q=nsIMsgDatabase%5B+%5D%3F%5B*%5D®exp=true&path=
(I haven't picked through these ones at all)
Reporter | ||
Comment 46•5 years ago
|
||
Noting for the record, I crashed today bp-0b9d399f-ec1c-4995-9264-59e8c0190819. I had just fired up my laptop from sleep and was deleting messages from vseerror Inbox while network was coming up - unclear whether it had yet started to filter the 56 new messages. So completely different circumstances from user POV from comment 24
Assignee | ||
Updated•5 years ago
|
Reporter | ||
Comment 47•5 years ago
|
||
Roughly same rate of version 68 crashes as version 60 for the last 6 months. But at roughly 30 per day (if you include v60 crashes whose users will be upgraded) this is no longer a topcrash.
bp-c20aa3bb-808a-4757-a238-043f60200118 is a 68.4.1 example of mozilla::mailnews::MsgDBReporter::GetPath
What's worth a quick fix? And what's worth filing follow up bugs?
What do you think?
Comment 48•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #29)
In the later case, bad memory certainly is a possibility. I'm really looking forward to the breakpad folks or someone hooking up a memory tester to the crash reporter - bug 995652
I might be willing to try this if not too complicated and you give instructions.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 49•5 years ago
|
||
Comment 50•5 years ago
|
||
Updated•5 years ago
|
Comment 51•5 years ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/e47ba37adeb2
Use a ref-counted reference to the database for the DB reporter. r=BenC
Description
•