Crash in mozilla::mailnews::MsgDBReporter::GetPath and mozilla::mailnews::MsgDBReporter::CollectReports - lifetime of reporter object shoud be bound to (and owned by) the nsMsgDatabase
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird_esr52 wontfix, thunderbird_esr60 wontfix, thunderbird_esr68 affected, thunderbird51 wontfix, thunderbird52 wontfix, thunderbird53 wontfix, thunderbird54 wontfix, thunderbird67 wontfix)
People
(Reporter: wsmwk, Assigned: jorgk-bmo)
References
()
Details
(Keywords: crash, regression, Whiteboard: [regression:TB51?])
Crash Data
Attachments
(1 file, 3 obsolete files)
3.31 KB,
patch
|
benc
:
review+
|
Details | Diff | Splinter Review |
#15 crash for TB52.0. Like bug 1353702, users with this crash signature also tend to report OOM signatures, but in this case not OOM | Small. And again, perhaps addon or some other program involved. Three pairings listed below bp-864a16f5-dd57-43c4-abcb-463d72170404 bp-567ae4a1-3956-4028-a358-9dcb32170404 OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured bp-58dd7fb9-2c28-435b-9b8d-8880b2170404 bp-bfa45f39-3ce7-4904-a25b-c0c6e2170403 OOM | large | NS_ABORT_OOM | PLDHashTable::Add | nsImapFlagAndUidState::AddUidCustomFlagPair I suspect these two crashes are the same user: bp-56435972-9fea-495d-886c-7d9e72170322 bp-4f4d5acf-12fe-463f-ade1-918532170329 0 xul.dll mozilla::mailnews::MsgDBReporter::GetPath(nsACString_internal&, bool) c:/builds/moz2_slave/tb-rel-c-esr45-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:1054 1 xul.dll mozilla::mailnews::MsgDBReporter::CollectReports(nsIMemoryReporterCallback*, nsISupports*, bool) c:/builds/moz2_slave/tb-rel-c-esr45-w32_bld-0000000/build/mailnews/db/msgdb/src/nsMsgDatabase.cpp:1041 2 xul.dll nsRunnableFunction<<lambda_4186443ceb77967b62731b2342d35d74> >::Run() xpcom/glue/nsThreadUtils.h:261 3 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:972 4 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:297 5 xul.dll nsThebesFontEnumerator::EnumerateFonts(char const*, char const*, unsigned int*, wchar_t***) gfx/src/nsThebesFontEnumerator.cpp:44
Reporter | ||
Comment 1•7 years ago
|
||
Earliest crashes per [1] are buildid 20161113004004 bp-f11b5a9a-4837-45d4-83bf-240af2161115 51.0a2 20161115004020 bp-3f6bf4f3-1dff-4285-b605-1950f2161117 52.0a2 20161118004004 bp-df0a00dd-f145-46b3-ba87-7e5f82161119 52.0a2 https://hg.mozilla.org/comm-central/pushloghtml?startdate=2016-11-05+03%3A05%3A00&enddate=2016-11-13+03%3A05%3A00 [1] https://crash-stats.mozilla.com/signature/?release_channel=nightly&release_channel=aurora&signature=mozilla%3A%3Amailnews%3A%3AMsgDBReporter%3A%3AGetPath&date=%3E%3D2016-10-15T23%3A05%3A51.000Z&date=%3C2017-04-15T23%3A05%3A51.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1
Reporter | ||
Comment 3•7 years ago
|
||
now at #13 crash for TB52.0.1. I suspect connection to bug 1158578 Dave's symptoms in Bug 1360046, which I have just duplicated to this bug, are very slow, unstable and constant crashes. Some users who have this crash signature also see - Bug 1354810 - Crash in mozilla::dom::Promise::PerformWorkerMicroTaskCheckpoint - Bug 1353702 - Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during CC - @0x0 | mozilla::mailnews::MsgDBReporter::CollectReports bp-33943c5e-47d7-49cb-936b-ece2a0170426 - CDesktopBrowser::_UpdateLayoutStoreState bp-0bc3b92d-4edb-4a6a-9112-b41120170425
Reporter | ||
Comment 4•7 years ago
|
||
Dave, You can test a draft patch from bug 1158578 using the win32.exe version in the directory at https://mzl.la/2q8E3Ec Backup your Thunderbird profile first as a precaution https://support.mozilla.org/en-US/kb/profiles-where-thunderbird-stores-user-data#w_backing-up-a-profile Does it help?
Comment 5•7 years ago
|
||
Wayne, Yes, thanks! This is the elixir I was looking for. Back to normal operation and no crashes since the install 12 hours ago.
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 6•7 years ago
|
||
52.1.1 is out a bit now and even with bug 1158578 fixed, it's #15 crash. Example https://support.mozilla.org/en-US/questions/1160061#answer-970951
Reporter | ||
Comment 7•7 years ago
|
||
The user's crashes from https://support.mozilla.org/en-US/questions/1160061 - CCGraphBuilder::BuildGraph bp-3a72ac8e-c0cd-4f82-bd22-bc6d70170522 bug 1272230 (and perhaps Bug 1353702) - mozilla::mailnews::MsgDBReporter::GetPath bp-4f81be9e-54f1-4aec-8304-953680170521 - mozilla::mailnews::MsgDBReporter::GetPath bp-cff9cef3-444b-4839-a631-2d1bc0170521 - xul.dll@0x11844d4 bp-ca143aae-e664-4003-a6d0-2cf920170520 - OOM | small bp-a56edee6-1ffb-42ab-a889-b835e0170516
Reporter | ||
Comment 8•7 years ago
|
||
https://support.mozilla.org/en-US/questions/116187 @ mozilla::mailnews::MsgDBReporter::CollectReports bp-b0e13cdb-1d5d-4440-ab9f-a3f1a0170528
Reporter | ||
Comment 9•7 years ago
|
||
#89 and #142 for 52.2.1, so no longer a topcrash. mozilla::mailnews::MsgDBReporter::GetPath dropped substantially ~June 14
Reporter | ||
Comment 10•6 years ago
|
||
many users of current crashes have filtaquilla addon installed
Comment 11•6 years ago
|
||
i just have numerous rss feeds in my profile, no email account (POP3, IMAP or otherwise) i leave thunderbird in background to collect feeds updates i got few counts for thunderbird 59.0b2 just now https://crash-stats.mozilla.com/report/index/d59f7e04-4088-4c8a-9d7d-9c1490180306 https://crash-stats.mozilla.com/report/index/8c29d391-d123-461c-af66-f278a0180306 i don't have filtaquilla installed and the crash happens in safe mode as well https://crash-stats.mozilla.com/report/index/ec0d1dd8-b499-4d2e-8989-7b69b0180306 it consumes 25% CPU (one of the 4 cores) and around 3.5G of physical memory at the time of each crash. it crashed as "OOM | small" in this particular instance https://crash-stats.mozilla.com/report/index/7ff4402a-6653-443a-993d-aeb730180306 given high CPU consumption, it is probably not just a simply memory leak it doesn't even run for 15 mins, falling back to 58.0b3 for now
Reporter | ||
Comment 12•6 years ago
|
||
This user also reports crashes with RSS. bp-e506e7ef-1f5e-4185-9315-fb5b10180226 mozilla::dom::workers::RuntimeService::CrashIfHanging - bug 1439893 bp-dcf4005d-6b60-4696-8130-2b1bb0180227 OOM | small bp-7fb3f27c-0e58-406e-b335-39f2d0180219 @0x0 | mozilla::mailnews::MsgDBReporter::CollectReports
Comment 13•6 years ago
|
||
looks working fine with Thunderbird 60.0b2, up and running for 2 hours with low CPU and steady memory usage :) http://ftp.mozilla.org/pub/thunderbird/releases/60.0b2/
Comment 14•6 years ago
|
||
Lot of this kind of crashes per day with TB 52.8.0, after remove RSS account, it's fine.
Reporter | ||
Comment 15•5 years ago
|
||
Looking at graph for past 6 months, crash rate of version 60 appears to be about the same as version 52.
Reporter | ||
Comment 17•5 years ago
|
||
combined signature makes this a top 30 crash for 60.6.1
bp-848ff6de-158f-4411-bd03-8a8430190504 67.0b
bp-37ab03bc-afaa-44da-bd1c-507d90190504 67.0b (same user)
bp-0d39159e-aabb-4f57-a8c4-02d310190430 67.0b different user
Although for mozilla::mailnews::MsgDBReporter::CollectReports there are no beta crashes for current versions.
https://crash-stats.mozilla.com/signature/?release_channel=%21release&signature=mozilla%3A%3Amailnews%3A%3AMsgDBReporter%3A%3ACollectReports&date=%3E%3D2019-02-04T18%3A25%3A00.000Z&date=%3C2019-05-04T18%3A25%3A00.000Z
Comment 18•5 years ago
|
||
just had https://crash-stats.mozilla.com/report/index/e5b51280-f57a-45e6-851f-1a16b0190715 At the time I was looking for why Thunderbird was sitting there consuming over 1200mb. Opened activity manager and sat for minutes observing it telling me it was indexing message 3 or 4 in an RSS feed folder. Then I asked for an about:memory and crash happened.
Comment 19•5 years ago
|
||
Crashed again today. This time I got a memory report before it crashed. 622mb of 1137mb in maildb.... 266mb on a feed that has a 220mb mbox and a 94mb msf database. If the memory Gz is of interest to anyone. Just ask.
Reporter | ||
Comment 20•5 years ago
|
||
(In reply to Matt from comment #18)
just had https://crash-stats.mozilla.com/report/index/e5b51280-f57a-45e6-851f-1a16b0190715 60.7.2 Crash Report [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown ]
Matt, see bug 1526127.
Comment 21•5 years ago
|
||
Does this also affect 64 bit Thunderbird?
If not, another good reason to start switching over.
Comment 22•5 years ago
|
||
I am using TB 64bit and have been for more than 12 months. Despite Wayne's pointing me to bug 1526127 I really do not see that as the correct bug for my crash as whilst the updater was supposedly running, it completed and post crash I was on the new version.
What I do have are a number of accounts that are "non functional" so I cancel password prompts, but most of my high memory issues appear to be related to a freenode certificate request that displays an incorrect certificate and just stops network functions until it is acknowledged.
Comment 23•5 years ago
|
||
I only have one account, but it keeps asking for password retry/renew. It is a Yahoo account. Can't find specifics on how this should be set up for server names, but I did it the way I found it once in the past. It gets mail, but re-requests frequently, and a popup in lower right hand says yahoo mail not available. Not sure if this is the same problem as you. Let me know if there is a way I can check or help with this. I'm on 69.0b2 (64-bit).
Reporter | ||
Comment 24•5 years ago
|
||
I clicked reply on a message in my gmail account. The compose window didn't come up. (Thunderbird had been running at 800MB and according to task manager there was not a big increase at the time of the crash)
bp-3f039407-0341-49ed-9b4b-458a50190814
Assignee | ||
Comment 25•5 years ago
|
||
Looks like something bad happened, and to add insult to injury, it then crashed in error reporting :-(
void GetPath(nsACString &memoryPath, bool aAnonymize) {
memoryPath.AssignLiteral("explicit/maildb/database(");
nsCOMPtr<nsIMsgFolder> folder;
935 mDatabase->GetFolder(getter_AddRefs(folder));
Looks like the database was null here. We can protect against that.
Assignee | ||
Comment 26•5 years ago
|
||
This will fix the crash in reporting, but it won't fix the horrible thing that caused the report to be requested.
Reporter | ||
Comment 27•5 years ago
•
|
||
(In reply to Jorg K (GMT+2) from comment #26)
Created attachment 9085413 [details] [diff] [review]
1353704-fix-crash.patchThis will fix the crash in reporting, but it won't fix the horrible thing that caused the report to be requested.
Indeed. So with the crash "gone", what kind of behavior might users see, or what types of issues might we have now papered over?
I can say a little more about my environment at the time although not sure it will help 1. many tabs open and several compose windows, 2. TB was open for several days (but less than a week), 3. using ~800MB last I looked, which is more than normal for me but not high enough to be alarming (probably higher than normal because I had chat open and RSS feeds running which I don't normally have), 4. I was starting to reply to a message in my Inbox from cinymini of bug 1422251 and just before the crash I searched for her full address which includes @gmx.de but search would return any results, but searching on only cinymini was successful (so very strange)
Assignee | ||
Comment 28•5 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #27)
So with the crash "gone", what kind of behavior might users see, or what types of issues might we have now papered over?
We'll find out ;-)
Reporter | ||
Comment 29•5 years ago
|
||
Going beyond my incident, as suggested in earlier comments, I've been in touch with ~6 reporters since 2017. Some anecdotal information from them:
-
"It looks like if the folders are kept with a lower number of records it helps with alleviating the issue but not with fixing it." and (same user) " if I restart TB often it will not crash". (no RSS feeds) Other crashes from this user bp-e5a9a202-8e82-4c52-8d3b-f36f20190520
-
For some users the crash has simply gone away (again, like bug 1353702) after some period of time, and in some cases after switching machines. In the later case, bad memory certainly is a possibility. I'm really looking forward to the breakpad folks or someone hooking up a memory tester to the crash reporter - bug 995652
Assignee | ||
Comment 30•5 years ago
|
||
I can find a reviewer soon and we can ship it in TB 69 beta 3 if you agree to do a new build.
Assignee | ||
Comment 31•5 years ago
|
||
Comment on attachment 9085413 [details] [diff] [review] 1353704-fix-crash.patch First reviewer wins ;-)
Comment 32•5 years ago
|
||
Comment on attachment 9085413 [details] [diff] [review] 1353704-fix-crash.patch Review of attachment 9085413 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp @@ +924,5 @@ > GetPath(path, aAnonymize); > return aCb->Callback( > EmptyCString(), path, nsIMemoryReporter::KIND_HEAP, > nsIMemoryReporter::UNITS_BYTES, > + mDatabase ? mDatabase->SizeOfIncludingThis(GetMallocSize) : 0, mDatabase is passed as 'db' in the MsgDBReporter constructor from mMemReporter = new mozilla::mailnews::MsgDBReporter(this); in nsMsgDatabase constructor. So how can 'this' (the nsMsgDatabase object) be ever null? Is this some memory corruption?
Assignee | ||
Comment 33•5 years ago
|
||
Hmm, good point. The problem is that there is no ref-counting here:
class MsgDBReporter final : public nsIMemoryReporter {
nsMsgDatabase *mDatabase;
So most likely the database is already destroyed by the time we come to report. Let me try this again ;-)
Assignee | ||
Comment 34•5 years ago
|
||
This should be better.
Assignee | ||
Comment 35•5 years ago
|
||
Oops, commit message should be:
Use a ref-counted reference to the database for the DB reporter
Comment 36•5 years ago
|
||
The question is whether we want to keep a reference to the DB even somehow after its lifetime. This is something for jcranmer ;)
Assignee | ||
Comment 37•5 years ago
|
||
Sorry about the noise. We need to null-out mMemReporter so it gets destroyed or the reference it keeps will stop the database from being destroyed. Here's an example of a one-line change that can upset the entire system.
The funny thing is, the DTOR actually de-registers using UnregisterWeakMemoryReporter(), so after the database was destroyed, there shouldn't have been any more reporting. As many crashes, the whole thing is tricky. I won't take this to ESR in a hurry.
BTW, Joshua won't help us here, if you're lucky, he'll answer general questions on IRC.
Comment 38•5 years ago
|
||
Comment on attachment 9085434 [details] [diff] [review] 1353704-fix-crash-using-ref.patch (v2) Review of attachment 9085434 [details] [diff] [review]: ----------------------------------------------------------------- I think this comment is key: > the DTOR actually de-registers using UnregisterWeakMemoryReporter(), so after the database was destroyed, there shouldn't have been any more reporting The fact the reporter is falling down because the db is gone kind of implies something more serious going on... Regarding the patch, I think it'll cause both the `nsMsgDatabase` and `MsgDBReporter` to lock each other into existence - we'll just gradually leak objects over time. It seems like the lifetime of the reporter is tightly scoped to the `nsMsgDatabase` object, so the reporter shouldn't need a `RefPtr`. Are we sure the `nsMsgDatabase` is being destroyed before the reporter callback is called? Or is it being just being corrupted in some way? ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp @@ +1002,4 @@ > m_msgReferences(nullptr), > m_cacheSize(kMaxHdrsInCache) { > mMemReporter = new mozilla::mailnews::MsgDBReporter(this); > mozilla::RegisterWeakMemoryReporter(mMemReporter); So at this point, the `MsgDBReporter` will be holding a RefPtr to `nsMsgDatabase` which will keep it from being destroyed. But `nsMsgDatabase::mMemReporter` is also a RefPtr which prevents the `MsgDBReporter` from being deleted. So aren't the database and reporter now mutually locking each other in existence? (I can't tell if `mozilla::RegisterWeakMemoryReporter` also holds the reporter object in existance or not, but I guess it's a moot point). @@ +1006,5 @@ > } > > nsMsgDatabase::~nsMsgDatabase() { > mozilla::UnregisterWeakMemoryReporter(mMemReporter); > + mMemReporter = nullptr; Will the `~nsMsgDatabase()` dtor ever be called?
Assignee | ||
Comment 39•5 years ago
|
||
Comment on attachment 9085434 [details] [diff] [review] 1353704-fix-crash-using-ref.patch (v2) Right. You have a better idea?
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 40•5 years ago
•
|
||
(In reply to Ben Campbell from comment #38)
Will the
~nsMsgDatabase()
dtor ever be called?
Good question, see bug 1342858. Kent's personal prediction was that we'll never solve it unless we get rid of Mork. That does its own ref-counting and other hair-raising stuff :-(
Comment 41•5 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #25)
Looks like the database was null here. We can protect against that.
Crash address was 0xe5e5e631. Windows' free replaces memory with 0xe5e5e5e5, so this looks like nsMsgDatabase * was used after free, and the vtable pointer was offset by 19 entries before attempting to call it.
Have you tried making the reporter hold an nsWeakPtr<nsMsgDatabase>? That should be legal, and it will be cleared to null after nsMsgDatabase::~nsMsgDatabase is called, allowing you to legitimately check for null (null checks would also have to be added, of course, but weak pointers do remind you to try that).
(In reply to Jorg K (GMT+2) from comment #40)
Good question, see 1342858. Kent's personal prediction was that we'll never solve it unless we get rid of Mork. That does its own ref-counting and other hair-raising stuff :-(
The Mork side of things isn't too bad, if my stale brain cells are any good. The strong references of mdbI* will hold whatever needs to be held, and they won't hold any cyclic references back into nsMsgDatabase and friends. The problem is nsMsgHdr's... funky... approach to ref-counting, IIRC.
Comment 42•5 years ago
|
||
(In reply to Joshua Cranmer [:jcranmer] from comment #41)
Crash address was 0xe5e5e631. Windows' free replaces memory with 0xe5e5e5e5, so this looks like nsMsgDatabase * was used after free, and the vtable pointer was offset by 19 entries before attempting to call it.
Ahh, now that sounds like the core problem to fix. Either somewhere is holding a raw pointer to nsMsgDatabase
and accessing it after free, or perhaps somewhere has screwed-up refcounting (eg forgetting to AddRef() a return value or something?).
Have you tried making the reporter hold an nsWeakPtr<nsMsgDatabase>? That should be legal, and it will be cleared to null after nsMsgDatabase::~nsMsgDatabase is called, allowing you to legitimately check for null (null checks would also have to be added, of course, but weak pointers do remind you to try that).
The more I think about it, the more the holding-the-DB-open-in-the-reporter seems to be the wrong approach. The lifetime of the reporter object is meant to be bound to (and owned by) the nsMsgDatabase
. The reporter is registered/unregistered in the nsMsgDatabase
ctor/dtor.
Comment 43•5 years ago
|
||
Grepping through, it looks like raw nsMsgDatabase
pointers are relatively rare:
https://searchfox.org/comm-central/search?q=nsMsgDatabase%5B+%5D%3F%5B*%5D&case=false®exp=true&path=
There are a couple there that smell a little odd:
nsDBFolderInfo
holds a raw nsMsgDatabase
pointer (no refcounting that I can see):
https://searchfox.org/comm-central/source/mailnews/db/msgdb/public/nsDBFolderInfo.h#125
There's also one in nsMsgOfflineImapOperation
, although it looks like its ctor/dtor does increment/decrement the refcount manually...
https://searchfox.org/comm-central/source/mailnews/db/msgdb/src/nsMsgOfflineImapOperation.h#48
Comment 44•5 years ago
|
||
Raw nsIMsgDatabase
pointers are a little more prevalent:
https://searchfox.org/comm-central/search?q=nsIMsgDatabase%5B+%5D%3F%5B*%5D®exp=true&path=
(I haven't picked through these ones at all)
Reporter | ||
Comment 46•5 years ago
|
||
Noting for the record, I crashed today bp-0b9d399f-ec1c-4995-9264-59e8c0190819. I had just fired up my laptop from sleep and was deleting messages from vseerror Inbox while network was coming up - unclear whether it had yet started to filter the 56 new messages. So completely different circumstances from user POV from comment 24
Assignee | ||
Updated•5 years ago
|
Reporter | ||
Comment 47•4 years ago
|
||
Roughly same rate of version 68 crashes as version 60 for the last 6 months. But at roughly 30 per day (if you include v60 crashes whose users will be upgraded) this is no longer a topcrash.
bp-c20aa3bb-808a-4757-a238-043f60200118 is a 68.4.1 example of mozilla::mailnews::MsgDBReporter::GetPath
What's worth a quick fix? And what's worth filing follow up bugs?
What do you think?
Comment 48•4 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #29)
In the later case, bad memory certainly is a possibility. I'm really looking forward to the breakpad folks or someone hooking up a memory tester to the crash reporter - bug 995652
I might be willing to try this if not too complicated and you give instructions.
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 49•4 years ago
|
||
Comment on attachment 9085680 [details] [diff] [review] 1353704-fix-crash-using-ref.patch Thanks Jorg for the updated patch. Changing NI to review request
Comment 50•4 years ago
|
||
Comment on attachment 9085680 [details] [diff] [review] 1353704-fix-crash-using-ref.patch Review of attachment 9085680 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. The ptr-weakening 'feels' like it's happening in the right place :-)
Updated•4 years ago
|
Comment 51•4 years ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/e47ba37adeb2
Use a ref-counted reference to the database for the DB reporter. r=BenC
Description
•