Regression: Latest Nightly needs a longer time to Quit (Spinning Beach Ball appears)

RESOLVED FIXED in Firefox 59

Status

()

Core
DOM
--
major
RESOLVED FIXED
2 months ago
a month ago

People

(Reporter: Mehmet, Assigned: bkelly)

Tracking

(Blocks: 1 bug, {nightly-community, perf, regression})

59 Branch
mozilla59
nightly-community, perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox57 unaffected, firefox58 unaffected, firefox59 fixed)

Details

Attachments

(6 attachments, 8 obsolete attachments)

550.43 KB, video/quicktime
Details
1.07 MB, video/quicktime
Details
7.68 KB, image/png
Details
6.78 KB, patch
baku
: review+
Details | Diff | Splinter Review
8.12 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
928 bytes, patch
bkelly
: review+
Details | Diff | Splinter Review
(Reporter)

Description

2 months ago
macOS 10.12.6
FF Nightly 59.0a1 (2017-11-25) (64-Bit)

1.) Start Nightly
2.) Quit Nightly via CMD-Q

Actual: Nightly needs a longer time to Quit. The Spinning Beach Ball appears.

Expected: Nightly should quit quickly.

This is a regression.
(Reporter)

Updated

2 months ago
Summary: Regression: Lastet Nightly needs a longer time to Quit (Spinning Beach Ball) → Regression: Latest Nightly needs a longer time to Quit (Spinning Beach Ball appears)
(Reporter)

Comment 1

2 months ago
I did a bisect, and it looks like I found the regression range:

Good: 2017-11-22-10-31-38

 Bad: 2017-11-22-22-00-56

May be you can help to find the culprit in this range that could have cause this regression.

Thanks.
Can you please use https://mozilla.github.io/mozregression/ to determine a specific bug introduced the regression?
Keywords: nightly-community, regression, regressionwindow-wanted
(Reporter)

Comment 3

2 months ago
Mozregression seems to be a Windows GUI. Since this is a macOS bug, is it possible to use it on macOS too? If yes, do you have a documentation? I found the regression range above via downloading them manually from https://ftp.mozilla.org/pub/firefox/nightly/2017/.

Thank you.
(Reporter)

Comment 4

2 months ago
Created attachment 8931938 [details]
Expected.mov

Here is a expected Screencast. Nightly quits in 1 sec.
(Reporter)

Comment 5

2 months ago
Created attachment 8931940 [details]
actual.mov

And here is a actual Screencast. Nightly quits in 7 sec. :-(


If you need more information please let me know.

Comment 6

2 months ago
I can reproduce this on Windows 10, Ubuntu 17.04 and macOS 10.13.1.

Regression range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e6ace8b5531cd2f50b6685ef8fdb180afcd9344e&tochange=dbbca05aa64a14580d3ab23950e148019a097218
Blocks: 1419536
Has Regression Range: --- → yes
status-firefox58: --- → unaffected
OS: Mac OS X → All
Hardware: Unspecified → All
(Reporter)

Comment 7

2 months ago
(In reply to magicp from comment #6)
> I can reproduce this on Windows 10, Ubuntu 17.04 and macOS 10.13.1.
> 
> Regression range:
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=e6ace8b5531cd2f50b6685ef8fdb180afcd9344e&tochange=dbbc
> a05aa64a14580d3ab23950e148019a097218

Thanks for the regression range.

bkelly@, Could you please take a look at this regression?

Thank you in advance!
Flags: needinfo?(bkelly)
(Assignee)

Comment 8

2 months ago
I'll investigate first thing tomorrow morning.
(Reporter)

Comment 9

2 months ago
(In reply to Ben Kelly [:bkelly] from comment #8)
> I'll investigate first thing tomorrow morning.

Okay, thank you very much.
(Assignee)

Comment 10

2 months ago
My theory is my changes introduced too much IPC traffic late in shutdown.  My plan is to try eagerly shutting down ClientManagerService and all its actors at the first signs of shutdown.

A back up plan is to filter out window/worker clients we don't technically need yet.  I want to avoid this since it limits the usefulness of this code going forward, but it might be a reasonable back up plan for now.

Of course, it could also be something completely different that I broke.  I'll see what works tomorrow.
(Assignee)

Comment 11

2 months ago
(In reply to magicp from comment #6)
> I can reproduce this on Windows 10, Ubuntu 17.04 and macOS 10.13.1.

What were you looking for on windows?  I can't reproduce on that platform.  Watching task manager it seems to go away immediately.
Flags: needinfo?(magicp.jp)
(Assignee)

Comment 12

2 months ago
Hmm, I think I might be able to reproduce something on linux.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(magicp.jp)
Flags: needinfo?(bkelly)
(Reporter)

Comment 13

2 months ago
Ben: If you need some information from the Mac side (Repro steps etc.) I can provide. Thanks.
(Assignee)

Comment 14

2 months ago
And I can confirm at rev e6ace8b5531c the delay I see is gone.  I'll investigate.

(In reply to Mehmet from comment #13)
> Ben: If you need some information from the Mac side (Repro steps etc.) I can
> provide. Thanks.

Thanks, but I think I have enough information to proceed now.

Comment 15

2 months ago
Created attachment 8932232 [details]
close-firefox.png

(In reply to Ben Kelly [:bkelly] from comment #11)
> What were you looking for on windows?  I can't reproduce on that platform. 
> Watching task manager it seems to go away immediately.

It is easy steps, quit and restart Firefox. Then a message will be shown.
"Firefox is already running, but is not responding. The old Firefox process must be closed to open a new window [Close Firefox][Cancel]"
(Assignee)

Comment 16

2 months ago
Created attachment 8932617 [details] [diff] [review]
P1 Make ClientManagerService track active ClientManagerParent actors. r=baku
(Assignee)

Comment 17

2 months ago
Created attachment 8932619 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

This fixes the problem for me locally.  Lets see if it blows anything up on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=62269b6c4492abff89d8c1778474e83433dcbff6
(Assignee)

Comment 18

2 months ago
Created attachment 8932628 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

Fix some static analysis bustage.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ef960c14cd7314ad8c65b2397d83aeacef0c16e
Attachment #8932619 - Attachment is obsolete: true
(Assignee)

Comment 19

2 months ago
Does this test build fix the issue for you?

https://queue.taskcluster.net/v1/task/I9BZpgUkR9KMa8liN1aAjg/runs/0/artifacts/public/build/target.dmg

Note, you have to temporarily disable application code signing to run it on mac.  For example:

https://apple.stackexchange.com/a/294016
Flags: needinfo?(mehmet.sahin)
(Assignee)

Comment 20

2 months ago
Hmm, lots of orange.  It seems before-profile-change does not always fire.  I'll try switching to NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID tomorrow.
(Assignee)

Comment 21

2 months ago
Nevermind.  This binary crashes immediately on my mac.
Flags: needinfo?(mehmet.sahin)
(Reporter)

Comment 22

2 months ago
Hello Ben,

I was able to install the target.dmg on my MacBook Air 11 (Mid 2012) without a crasher. Quitting the app worked also fine and quick again as expected.

Okay, since you noticed the crasher in that build I will wait then for a new build from you to test again.

Thanks.
(Assignee)

Comment 23

2 months ago
Created attachment 8932922 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=81625730215c5e96d1d5b5947543d96cbabfff0e
Attachment #8932628 - Attachment is obsolete: true
(Assignee)

Comment 24

2 months ago
Running into problems because of bug 1421712.  I'll have to refactor this a bit.
(Assignee)

Comment 25

2 months ago
Created attachment 8933008 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

Switch to using nsIAsyncShutdown service since it tells me if the observer topic has already fired.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc4b733cdc4add341d19d92dd51d057a1f34194a
(Assignee)

Comment 26

2 months ago
Created attachment 8933037 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=43dcbfa70d21fdadee10ee16ed5ef5ffef339179
Attachment #8932922 - Attachment is obsolete: true
Attachment #8933008 - Attachment is obsolete: true
(Assignee)

Comment 27

2 months ago
Comment on attachment 8932617 [details] [diff] [review]
P1 Make ClientManagerService track active ClientManagerParent actors. r=baku

Andrea, this patch makes us track the top level PClientManagerParent actors in a list on the ClientManagerService.  This list will be used in the next patch to kill these actors at the start of the shutdown process.

I also added an Init() method so I can kill the actor immediately if shutdown has already begun.
Attachment #8932617 - Flags: review?(amarchesini)
(Assignee)

Comment 28

2 months ago
Comment on attachment 8933037 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

This patch adds an nsIAsyncShutdownBlocker that resolves a MozPromise when xpcom-shutdown is reached.  This then causes the ClientManagerService on the background thread to start deleting the PClientManagerParent actors which in turn stops all Client IPC actors.

We don't actually block shutdown, though.  We don't currently leak without listening to shutdown since the service is gracefully ref-counted away as windows are destroyed.  We only really care about shutdown as a way to jump start the teardown to avoid IPC traffic late in the process.

Note, this does hold the ClientManagerService alive until shutdown, but we effectively do that anyway since every window object in the system holds it alive.
Attachment #8933037 - Flags: review?(amarchesini)
Please correct the component if this does not fit. Thanks.
Component: Untriaged → General
Product: Firefox → Core
(Assignee)

Updated

2 months ago
Component: General → DOM
Attachment #8932617 - Flags: review?(amarchesini) → review+
Attachment #8933037 - Flags: review?(amarchesini) → review+

Comment 30

2 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff3ee0cc91ab
P1 Make ClientManagerService track active ClientManagerParent actors. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/007fde92382e
P2 Eagerly shutdown ClientManagerService. r=baku
(Assignee)

Comment 32

2 months ago
Created attachment 8933358 [details] [diff] [review]
P3 Actually block shutdown until ClientManagerService is destroyed. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=24fcee31168b75b358452e8bf43b8cc01314941d
Flags: needinfo?(bkelly)
(Assignee)

Comment 33

2 months ago
Created attachment 8933360 [details] [diff] [review]
P3 Actually block shutdown until ClientManagerService is destroyed. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=485de16183c06b43d75f282d80b5f6c2726ea586
Attachment #8933358 - Attachment is obsolete: true
(Assignee)

Updated

2 months ago
Attachment #8933360 - Attachment is obsolete: true
(Assignee)

Comment 34

2 months ago
Created attachment 8933411 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc974aa8737462bc17cf4296a5af97d73e43088e
Attachment #8933037 - Attachment is obsolete: true
(Assignee)

Updated

2 months ago
Attachment #8933411 - Flags: review+
(Assignee)

Comment 35

2 months ago
Created attachment 8933414 [details] [diff] [review]
P2 Eagerly shutdown ClientManagerService. r=baku

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b033d1ee63fc927de73399351b96414ce4dfc6d0
Attachment #8933411 - Attachment is obsolete: true
Attachment #8933414 - Flags: review+
(Assignee)

Comment 37

2 months ago
These leaks are so weird.  Its like something landed that depends on clients API delaying shutdown and when I fix that its uncovering that new problem.
(Assignee)

Comment 38

2 months ago
Let's see if just disabling ClientSource creation added in bug 1419536 also triggers these leaks.

Comment 39

2 months ago
I am having a similar problem but I don't have any visual indication that Fx hasn't quit.

Please look at the bug report I opened regarding my problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1420995
(Assignee)

Comment 40

2 months ago
(In reply to Gary [:streetwolf] from comment #39)
> I am having a similar problem but I don't have any visual indication that Fx
> hasn't quit.
> 
> Please look at the bug report I opened regarding my problem:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1420995

I'm going to dupe it to here.  Thanks for pointing it out.
(Assignee)

Updated

2 months ago
Duplicate of this bug: 1420995
(Assignee)

Comment 43

2 months ago
Jan-Ivar, can you help me figure out why content/test/webrtc tests are leaking in my try builds?

Some context:

1. Last week I landed bug 1419536
2. This bug is because that landing introduced a delay in shutdown.  The browser now takes many extra seconds to stop.
3. When I fix the shutdown delay I get leaks in webrtc tests.
4. When I disable the code from bug 1419536 I get leaks in webrtc tests.

Did something land in webrtc in the last week?  Is it possible that its relying on the slow shutdown to cleanup properly?  Right now I don't even know where to look.
Flags: needinfo?(jib)
I see HTMLMediaElement::ShutdownObserver and MediaStreamGraph in comment 44 and on Windows as well in https://treeherder.mozilla.org/logviewer.html#?job_id=148926257&repo=try&lineNumber=10253

My initial suspicion given the 5 day time window was the CamerasParent reorg in bug 1399413 and bug 1388667, which backends getUserMedia calls, which are being exercised in the problematic tests in browser/base/content/test/webrtc/ (UI permission prompt tests that call getUserMedia).

But with these leaks happening on Windows, where real camera and mics are not involved in these tests, it's unlikely.

A more likely suspect might be bug 1412394 which landed 10 days ago, as that made changes to MSG and shutdown. I'm ni?ing people involved with that bug.
Flags: needinfo?(jib) → needinfo?(apehrson)
(Assignee)

Comment 46

2 months ago
Try with bug backed 1412394 out:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3763225c80c7ff94bfc94c15cd0e58381e639a06

The time frames still don't seem to line up with any of these bugs landing, though.

Comment 48

2 months ago
Ben.  Do you want me to run Process Manager and give you the data during the 6 seconds that it takes FX to startup?  My bug report mentions that I saw under the Network data that for these 6 seconds it was executing some profiling task.  This I thought might be Fx sending data to Mozilla.
(Assignee)

Comment 49

2 months ago
(In reply to Gary [:streetwolf] from comment #48)
> Ben.  Do you want me to run Process Manager and give you the data during the
> 6 seconds that it takes FX to startup?  My bug report mentions that I saw
> under the Network data that for these 6 seconds it was executing some
> profiling task.  This I thought might be Fx sending data to Mozilla.

Thanks but the problems I'm running into are not really related to the actual delay your experienced.  Don't worry, we'll get it sorted out.
(Assignee)

Comment 50

2 months ago
(In reply to Ben Kelly [:bkelly] from comment #46)
> Try with bug backed 1412394 out:
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3763225c80c7ff94bfc94c15cd0e58381e639a06

Backing out bug 1412394 did not help.
(Assignee)

Comment 51

2 months ago
Results are still coming in, but so far its looking like:

last good: 3f6b9aaed8cd
first bad: 7de1ab07869e
Flags: needinfo?(apehrson)
(Assignee)

Comment 53

2 months ago
last good: b0a56e8287da
first bad: 583311c68376

Only about 10 changesets in here.  This one looks like a probably candidate:

changeset:   394231:bbb02c1fccd6
user:        Andrea Marchesini <amarchesini@mozilla.com>
date:        Wed Nov 29 09:40:16 2017 +0100
summary:     Bug 1420419 - Postpone the removing of BlobURL for 5 seconds in order to allow the loading of them in a remote process, r=smaug
(Assignee)

Comment 54

2 months ago
I'm going to do a trigger for each of these revisions on try.

Andrea, in the meantime, do you think bug 1420419 could be causing this leak?

Again, the problem is that there is a bug in the tree which makes us delay shutdown for 5 to 10 seconds.  When I fix this shutdown delay I get a perma-leak in some tests.

This change in the bisect range to hold stuff alive for 5 seconds seems suspicious.  Does it listen for shutdown at all?
Flags: needinfo?(amarchesini)
Severity: normal → major
Has STR: --- → yes
status-firefox57: --- → unaffected
status-firefox-esr52: --- → unaffected
Keywords: regressionwindow-wanted → perf
Version: Trunk → 59 Branch
Depends on: 1422314
(Assignee)

Comment 57

2 months ago
Yea, my last set of try pushes are pointing at bug 1420419 as the conclusive cause of the leak.
(Assignee)

Comment 59

2 months ago
Lets see if we can get some other platforms to run these tests:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c05550bb0ac6814e8d902f509f857f579a62b478
My local mac build consistently takes >5s to shutdown. I got a shutdown profile (https://perfht.ml/2j2lqwM) that shows we are blocked for 5s at https://searchfox.org/mozilla-central/rev/7f45cb7cc0229398fc99849bdc150cb6462d6966/dom/workers/RuntimeService.cpp#2249

Not sure if this helps, only confirms what you already knew, or is an entirely different bug.
(Assignee)

Comment 61

2 months ago
Please make sure to push this after bug 1422314.
Flags: needinfo?(amarchesini)
Keywords: checkin-needed

Comment 62

2 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7908b821ad3f
P1 Make ClientManagerService track active ClientManagerParent actors. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5a3054a4c38
P2 Eagerly shutdown ClientManagerService. r=baku a=aryx on CLOSED TREE
Keywords: checkin-needed

Comment 63

2 months ago
I'm running on the latest inbound which includes this patch as well as 1422314.  Happy to say that the delay is gone.  Kudos.

Comment 64

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7908b821ad3f
https://hg.mozilla.org/mozilla-central/rev/f5a3054a4c38
Status: ASSIGNED → RESOLVED
Last Resolved: 2 months ago
status-firefox59: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
(Reporter)

Comment 65

2 months ago
Thanks so much! Looks fixed for me in FF Nightly 59.0a1 (2017-12-03) (64-Bit) on macOS 10.12.6

Comment 66

2 months ago
Backed out 3 changesets (bug 1422314, bug 1420594) for failing xpcshell/test_ext_contentScripts_register.js on Android debug r=backout a=backout

https://treeherder.mozilla.org/logviewer.html#?job_id=149165468&repo=mozilla-inbound&lineNumber=1874

https://hg.mozilla.org/mozilla-central/rev/195bb467e6cb5c8c5f5fb2858c0a55b2d0b9552d

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0a559782a53a0d27424d3ec0c7f8a942ead597d7&selectedJob=149271085
Status: RESOLVED → REOPENED
status-firefox59: fixed → ---
Resolution: FIXED → ---
Target Milestone: mozilla59 → ---
(Assignee)

Comment 67

2 months ago
I'm kind of upset this was backed out for a single failing test when the backout creates the risk of more leaks being added.
(Assignee)

Comment 68

2 months ago
I'm relanding with the test disabled on android for now.
(Assignee)

Updated

2 months ago
Blocks: 1422823
(Assignee)

Comment 69

2 months ago
Created attachment 8934174 [details] [diff] [review]
P3 Disable test_ext_contentScripts_register.js on android since it fails without the shutdown delay bug. r=baku

Andrea gave me r+ for this in IRC.
Attachment #8934174 - Flags: review+

Comment 70

2 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ba9a363b3d2
P1 Make ClientManagerService track active ClientManagerParent actors. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3e3a096629c
P2 Eagerly shutdown ClientManagerService. r=baku
https://hg.mozilla.org/integration/mozilla-inbound/rev/b0f2a1cfbd45
P3 Disable test_ext_contentScripts_register.js on android since it fails without the shutdown delay bug. r=baku
(Assignee)

Comment 71

2 months ago
Please talk with me before backing out this bug.  Its very critical that it stick since backing it out will create the opportunity for more erroneous code to be added.  In the short time frame we had the extra shutdown delay we already had a late shutdown leak added and a problematic test added.  We need to try to fix any new problems detected instead of backing this out.

Thank you.

Comment 72

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7ba9a363b3d2
https://hg.mozilla.org/mozilla-central/rev/a3e3a096629c
https://hg.mozilla.org/mozilla-central/rev/b0f2a1cfbd45
Status: REOPENED → RESOLVED
Last Resolved: 2 months ago2 months ago
status-firefox59: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59

Updated

2 months ago
Depends on: 1423554

Updated

a month ago
See Also: → bug 1419605
You need to log in before you can comment on or make changes to this bug.