Open Bug 1699681 Opened 4 years ago Updated 2 months ago

Enabling the profiler with the JS feature to profile tests leaks windows until shutdown

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- unaffected
firefox87 --- wontfix
firefox88 --- fix-optional
firefox89 --- fix-optional

People

(Reporter: florian, Assigned: canova)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, Whiteboard: [fxp])

Attachments

(1 obsolete file)

Step to reproduce:
MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_FEATURES="stackwalk,js" ./mach test browser/components/customizableui/test/browser_885530_showInPrivateBrowsing.js --headless

Result:
ERROR TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_885530_showInPrivateBrowsing.js | leaked 8 window(s) until shutdown [url = chrome://browser/content/browser.xhtml]
ERROR TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_885530_showInPrivateBrowsing.js | leaked 8 window(s) until shutdown [url = about:blank]

This behavior is not specific to the browser_885530_showInPrivateBrowsing.js test, many tests show this bug, but I've not verified if all of them reproduce. Some tests also leak docShells.

I'm using the MOZ_PROFILER_STARTUP_FEATURES environment variable here to reduce the feature set used, because using the I/O interposer triggers another leak.

Here are the various shutdown outcomes depending on the features used:

  • "nostacksampling,js": Clean shutdown.
  • "js": Shutdown leak of windows.
  • "stackwalk", "nostacksampling": Shutdown crash on an assertion: MOZ_ASSERT(!mBufferPositionWhenReceivedJSContext) (JSContext should have been cleared before the thread was unregistered)
See Also: → 1699683
Severity: -- → S3
Priority: -- → P2

A local backout of bug 1691504 fixes (or hides) this problem.

Keywords: regression
Regressed by: 1691504
Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1691504

I can still reproduce, and it's annoying as it makes the debugging automated tests with the profiler in debug builds very painful.

The "using the I/O interposer triggers another leak." part from comment 0 was fixed in bug 1699683.

Actually, the leak also exists in non-debug builds, and almost made me blame the test I was profiling for a leak.
Profile with the default features: https://share.firefox.dev/3J2Fv2H an increasingly large amount of time is spent GC/CC'ing as the test (which uses a lot of windows) progresses.
Profile without the JS feature: https://share.firefox.dev/3QT6LCJ The same test now takes 30s instead of 46, and most of the GC/CC time is gone.

Summary: Enabling the profiler to profile tests in debug builds leaks windows until shutdown → Enabling the profiler with the JS feature to profile tests leaks windows until shutdown

I spent some time to investigate this leak. Here are my findings:

It looks like the leak is not entirely Bug 1691504's fault. Before that bug was fixed, we weren't profiling the JIT samples properly, that's why probably all the JIT frames were missing before that patch.
After going down on that code path a bit, I realized that before that bug this branch was always executed instead of the code below. entry->isSampled(*rangeStart) was always producing false because of the wrong sample position in buffer. And since we negate that, it was always resulting in true and going in there.

So, it looks like it's about not expiring some of the JitcodeGlobalEntries properly. When I looked where we expire everything after the profiling session, I came across the profiler_clear_js_context API. This API calls JitcodeGlobalTable::setAllEntriesAsExpired method eventually by following the path below:
profiler_clear_js_context -> (StopJSSampling + PollJSSampling) -> EnableContextProfilingStack(mJSContext, false) -> GeckoProfilerRuntime::enable(false) -> JitcodeGlobalTable::setAllEntriesAsExpired.

In the end, my guess it is that either we are not calling the profiler_clear_js_context for some cases, or we are not calling it early enough during the shutdown.

When I looked at some JSContext destructors, I found that we are not calling this function inside nsJSContext::~nsJSContext. After adding it there, it seems like the leak is fixed!
But it's good to mention that my JS engine knowledge is really not that deep. I would like it if someone who knows JS engine better than me verifies that (probably I will send a WIP patch and ping people there).

Assignee: nobody → canaltinova
Status: NEW → ASSIGNED

During shutdown profiling, test suites were reporting window leaks on shutdown.
After investigating, I realized that we are not setting some
JitcodeGlobalEntries as expired. PROFILER_CLEAR_JS_CONTEXT sets these entries
as expired but it wasn't called in this case where we are destructing
nsJSContext.

See Also: → 1809581
Attachment #9317302 - Attachment description: WIP: Bug 1699681 - Clear the JS context for profiler during nsJSContext destruction → Bug 1699681 - Clear the JS context for profiler during nsJSContext destruction r?florian!,tcampbell!
Pushed by canaltinova@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/409858b7bb4a
Clear the JS context for profiler during nsJSContext destruction r=florian,sfink
Flags: needinfo?(canaltinova)
Whiteboard: [fxp]

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:canova, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(florian)
Flags: needinfo?(canaltinova)
Flags: needinfo?(florian)
Flags: needinfo?(canaltinova)
Attachment #9317302 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: