Closed Bug 1261702 Opened 8 years ago Closed 8 years ago

crash in nsPerformanceGroup::Dispose

Categories

(Toolkit :: Performance Monitoring, defect)

45 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox45 --- wontfix
firefox46 - wontfix
firefox47 - wontfix
firefox48 + fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: philipp, Assigned: Yoric)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-f8551137-082c-435a-8d29-a13952151107.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsPerformanceGroup::Dispose() 	toolkit/components/perfmonitoring/nsPerformanceStats.cpp
1 	xul.dll 	nsPerformanceStatsService::Dispose() 	toolkit/components/perfmonitoring/nsPerformanceStats.cpp
2 	xul.dll 	nsPerformanceStatsService::Observe(nsISupports*, char const*, wchar_t const*) 	toolkit/components/perfmonitoring/nsPerformanceStats.cpp
3 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverList.cpp
4 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverService.cpp
5 	xul.dll 	nsXREDirProvider::DoShutdown() 	toolkit/xre/nsXREDirProvider.cpp
6 	xul.dll 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp
7 	xul.dll 	mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(ScopedXPCOMStartup*) 	mfbt/UniquePtr.h
8 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
9 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp
10 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp
11 	firefox.exe 	NS_internal_main(int, char**) 	browser/app/nsBrowserApp.cpp
12 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp
13 	firefox.exe 	__tmainCRTStartup 	f:/dd/vctools/crt/crtw32/startup/crt0.c:255
14 	kernel32.dll 	BaseThreadInitThunk 	
15 	ntdll.dll 	__RtlUserThreadStart 	
16 	ntdll.dll 	_RtlUserThreadStart

this crash signature is occurring in pre-release builds since 45 and may be caused by the work in bug 1208747.
sorry, that first flag was set by accident...
Quick remarks:
- the crash takes place during shutdown phase "profile-before-change", so XPCOM should still be available;
- there are two paths that call nsPerformanceGroup::Dispose() from nsPerformanceStatsService():
  - mTopGroup->Dispose() at https://hg.mozilla.org/mozilla-central/annotate/cc48981c026c/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#l451;
  - group->Dispose() at https://hg.mozilla.org/mozilla-central/annotate/cc48981c026c/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#l462.

I just noticed that we do not check whether mTopGroup is null before calling `mTopGroup->Dispose(); mTopGroup = nullptr`. This should not be able to cause the issue, though, as we shouldn't be able to call this method more than once.

Other dereferences in or around Dispose():
- https://hg.mozilla.org/mozilla-central/annotate/cc48981c026c/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#l1015, we have a `service.mGroups`, however, this shouldn't be the problem, as mGroups is deleted during destruction of `service`, never set to null;
- a few lines below, `service->mAddonIdToGroup`, but `mAddonIdToGroup` is deleted during destruction of `service`, never set to null;
- a few lines below, `service->mWindowIdToGroup`, but `mWindowIdToGroup` is deleted during destruction, of `service`, never set to null;
- at https://hg.mozilla.org/mozilla-central/annotate/cc48981c026c/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#l458, we take a raw pointer and dereference it. However, these raw pointers are always added to `this` and always removed from the hashtable by the destructor of the corresponding class, so that should also not cause the problem.

None of these lines seem to be actually able to cause a segfault. If I'm right, the segfault comes from some memory corruption, somewhere else, which won't make it easy to find.

As a side-note, it would seem useful for peace of mind to rewrite `mGroups` to take as key a WeakPtr instead of a raw pointer.
Hello David, this was mentioned as a concern for Fx47 release. This is consistently showing up on all 47 Beta builds. Can you investigate the latest crash reports? Or help find an alternate owner. Thanks!
Flags: needinfo?(dteller)
Although I haven't been able to pinpoint why, it looks like
nsPerformanceStatsService::Dispose() may be called twice, which in
turn causes crashes. This patch makes sure that calling the method
twice is idempotent.

Also, just in case this was due to a typo in
AddObserver/RemoveObserver, this patch replaces the literal strings
used in both with constants.

Review commit: https://reviewboard.mozilla.org/r/56230/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56230/
Attachment #8757847 - Flags: review?(nfroyd)
Flags: needinfo?(dteller)
Comment on attachment 8757847 [details]
Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,

https://reviewboard.mozilla.org/r/56230/#review53178

The only way I can think of this happening is that profile-before-change somehow triggers (the profile manager?) early on in the program lifecycle, and then when it's called again, bad things happen.  Or maybe this is some weirdness with multiple content processes?  Regardless, r=me
Attachment #8757847 - Flags: review?(nfroyd) → review+
Assignee: nobody → dteller
Comment on attachment 8757847 [details]
Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56230/diff/1-2/
Attachment #8757847 - Attachment description: MozReview Request: Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,r=froydnj → Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,
Pushed by dteller@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/66e0240f8c06
Make nsPerformanceStatsService::Dispose() idempotent,r=froydnj
Ah, conflict with bug 1270628. This should fix it.
Flags: needinfo?(dteller)
Comment on attachment 8757847 [details]
Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56230/diff/2-3/
Pushed by dteller@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4c8b7d0abed
Make nsPerformanceStatsService::Dispose() idempotent,r=froydnj
https://hg.mozilla.org/mozilla-central/rev/e4c8b7d0abed
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8757847 [details]
Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,

Approval Request Comment
[Feature/regressing bug #]: Unleashed by bug 1243706. Actual cause is still unclear.
[User impact if declined]: A number of shutdown crashes.
[Describe test coverage new/current, TreeHerder]: Currently on m-c.
[Risks and why]: I don't think that there is any risk. Worst case scenario, we'll exchange one shutdown crash for another one.
[String/UUID change made/needed]:
Attachment #8757847 - Flags: approval-mozilla-beta?
Attachment #8757847 - Flags: approval-mozilla-aurora?
Comment on attachment 8757847 [details]
Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,

Fix a shutdown issue, taking it.
Should be in 48 beta 2
Attachment #8757847 - Flags: approval-mozilla-beta?
Attachment #8757847 - Flags: approval-mozilla-beta+
Attachment #8757847 - Flags: approval-mozilla-aurora?
Attachment #8757847 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/2e5e0318f6e5f7db9f026dc4c9b656c64722ced6

but has problems to apply to beta:

grafting 350360:2e5e0318f6e5 "Bug 1261702 - Make nsPerformanceStatsService::Dispose() idempotent,r=froydnj, a=sylvestre"
merging toolkit/components/perfmonitoring/nsPerformanceStats.cpp
warning: conflicts while merging toolkit/components/perfmonitoring/nsPerformanceStats.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(dteller)
Attached patch Backport to betaSplinter Review
Flags: needinfo?(dteller)
Attachment #8763958 - Flags: review+
Restoring the tracking flag to + since it was set to "-"
Flags: needinfo?(cbook)
Keywords: checkin-needed
I'm here testing a new script which could potentially update bugs automatically when we uplift a crash fix, but the crash signature still is being reported on that channel. There are still a couple of crash reports with this signature, coming in on aurora 49, since the uplift. Do you want to open a followup bug?
Flags: needinfo?(dteller)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #19)
> I'm here testing a new script which could potentially update bugs
> automatically when we uplift a crash fix, but the crash signature still is
> being reported on that channel. There are still a couple of crash reports
> with this signature, coming in on aurora 49, since the uplift. Do you want
> to open a followup bug?

Yes, please, that would be great.
Flags: needinfo?(dteller) → needinfo?(lhenry)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: