Profiler leaks data with DEBUG build, causes shutdown crash or leaks

RESOLVED FIXED in Firefox 60

Status

()

Core
Gecko Profiler
RESOLVED FIXED
a year ago
4 months ago

People

(Reporter: sfink, Assigned: mstange)

Tracking

(Blocks: 1 bug, {perf})

unspecified
mozilla60
Points:
---

Firefox Tracking Flags

(firefox60 fixed)

Details

(Whiteboard: [qf:investigate])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(5 attachments)

When running with the profiler, I get frequent shutdown crashes. When I don't crash, I get a long list of things being leaked instead. Typical output:

[7778] WARNING: Fonts still alive while shutting down gfxFontCache: 'mFonts.Count() == 0', file /home/sfink/src/mozilla/gfx/thebes/gfxFont.cpp, line 206
Assertion failed at /home/sfink/src/mozilla/gfx/cairo/cairo/src/cairo-hash.c:196: hash_table->live_entries == 0
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[7778] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/sfink/src/mozilla/xpcom/build/XPCOMInit.cpp, line 1045
[7778] WARNING: '!compMgr', file /home/sfink/src/mozilla/xpcom/components/nsComponentManagerUtils.cpp, line 63
firefox: fccache.c:548: FcCacheFini: Assertion `fcCacheChains[i] == NULL' failed.
Whiteboard: [qf]
This doesn't look like a GC issue, although it might be.  Maybe profiler people will have a better idea.
Component: JavaScript: GC → Gecko Profiler
Assignee: nobody → mstange
Whiteboard: [qf] → [qf:investigate]
Julian, have you seen these warnings? I think you're the one of us who has been running the profiler in debug builds the most.
Blocks: 1329181
Flags: needinfo?(jseward)
(In reply to Markus Stange [:mstange] from comment #2)
I've been running the profiler mostly in non-debug builds, so I haven't seen
these assertions.  That said, if I now do a debug build, it asserts at exit
every time.
Flags: needinfo?(jseward)
Created attachment 8857808 [details]
Assertion stack and its preamble
And if I disable the profiler and re-run, it doesn't assert.  Tested twice.
So yes, it seems somehow profiler related.
This probably has to do with the way the profiler holds on to JS functions so that it can collect information about them when the profile is collected. profiler_clear_js_context() is supposed to release these references, but maybe that's happening too late or not working correctly.

I also get warnings like these:
ERROR: GC found live Cell 0x109194040 of kind FUNCTION at shutdown
[...]
ERROR: GC found live SharedScriptData 0x10994a200 with ref count 2 at shutdown
[Child 12169] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /Users/mstange/code/mozilla/xpcom/build/XPCOMInit.cpp, line 1045

I don't get them if I stop the profiler before I quit Firefox.

The right people to look at this bug would be Shu or Kannan, or anybody with time to do some digging.

Kannan, can you find somebody for this bug?
Assignee: mstange → nobody
Flags: needinfo?(kvijayan)
(In reply to Markus Stange [:mstange] from comment #6)
> [Child 12169] ###!!! ASSERTION: Component Manager being held past XPCOM
> shutdown.: 'cnt == 0', file
> /Users/mstange/code/mozilla/xpcom/build/XPCOMInit.cpp, line 1045

The remaining component manager reference should be easy to find with rr (using rr record -n due to https://github.com/mozilla/rr/issues/1930 ).
The component manager warning is probably a symptom of a big shutdown leak, not a cause. The profiler_clear_js_context() call is happening too late, as you guessed. It needs to happen before the cycle collector is shutdown (nsCycleCollector_shutdown()) or cycles involving JS and C++ won't get destroyed. Maybe you could just move the profiler_clear_js_context() call in ShutdownXPCOM right before the call to nsCycleCollector_shutdown(). (The ordering looks okay to me for workers.)
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Maybe you could just move the profiler_clear_js_context() call in
> ShutdownXPCOM right before the call to nsCycleCollector_shutdown(). (The
> ordering looks okay to me for workers.)

Doing so gets rid of the "GC found live ..." warnings, but not of the "YOU ARE LEAKING THE WORLD" warning.
(In reply to Markus Stange [:mstange] from comment #9)
> Doing so gets rid of the "GC found live ..." warnings, but not of the "YOU
> ARE LEAKING THE WORLD" warning.

If you run with XPCOM_MEM_LEAK_LOG=1 does it say anything is leaked? There must be something else that needs to be cleared.
Summary: Profiler leaks data, causes shutdown crash or leaks → Profiler leaks data with DEBUG build, causes shutdown crash or leaks
Created attachment 8859688 [details]
XPCOM_MEM_LEAK_LOG output

(In reply to Andrew McCreight [:mccr8] from comment #10)
> (In reply to Markus Stange [:mstange] from comment #9)
> > Doing so gets rid of the "GC found live ..." warnings, but not of the "YOU
> > ARE LEAKING THE WORLD" warning.
> 
> If you run with XPCOM_MEM_LEAK_LOG=1 does it say anything is leaked? There
> must be something else that needs to be cleared.

"Everything"?

I don't have an rr machine handy at the moment, I can try debugging this properly some other time.
See Also: → bug 1360280
Flags: needinfo?(kvijayan)
The leaks go away if I make sure that js::EnableContextProfilingStack(cx, false) is called before the shutdown cycle collection.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Comment hidden (mozreview-request)

Comment 14

5 months ago
mozreview-review
Comment on attachment 8949548 [details]
Bug 1355566 - Make sure js::EnableContextProfilingStack(cx, false) is called before shutdown collection.

https://reviewboard.mozilla.org/r/218902/#review224738

Nice fix!
Attachment #8949548 - Flags: review?(n.nethercote) → review+
Created attachment 8949696 [details] [diff] [review]
0001-Add-more-mochitest.patch

Hey Markus,

I wrote a mochitest for this. Could you look at it and import it in your patch if you will?
Attachment #8949696 - Flags: review?(mstange)
This is a try with this mochitest on *central*: https://treeherder.mozilla.org/#/jobs?repo=try&revision=79aba8604fe8f9c6a0fbc6608f8df8e469a62a67

I haven't tried with your patch, so please do it :-)
> This is a try with this mochitest on *central*: https://treeherder.mozilla.org/#/jobs?repo=try&revision=79aba8604fe8f9c6a0fbc6608f8df8e469a62a67

Note that the attached patch is a slightly modified version to fix the reported ESLint issues.
I ran your test locally on Linux, stand-alone, and it works as expected: leaks without the patch and no leaks with the patch.
Hooray!

But I don't think we can land the test as-is because it will keep the profiler running until a later mochitest stops it. We would need to trigger a restart of the browser somehow, and I don't know if there's a good way to do that within the mochitest harness.

Comment 19

5 months ago
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/57f10073ac60
Make sure js::EnableContextProfilingStack(cx, false) is called before shutdown collection. r=njn
Attachment #8949696 - Flags: review?(mstange)

Comment 20

4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/57f10073ac60
Status: ASSIGNED → RESOLVED
Last Resolved: 4 months ago
status-firefox60: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Created attachment 8950202 [details] [diff] [review]
marionette-test.diff

Here is a marionette test to check for this issue. But it doesn't work properly as it passes. Well, enough time spent on this...
You need to log in before you can comment on or make changes to this bug.