Closed Bug 1648224 Opened 5 years ago Closed 2 years ago

Sometimes Firefox has a 6s startup delay before the first top-level-window is ready

Categories

(Core :: Widget: Cocoa, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: whimboo, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 obsolete file)

I noticed this problem when running the Webdriver spec tests on MacOS. These tests start Firefox a lot and in some cases the startup takes very long. The delay specifically happens after evaluating the command line arguments and before opening the first top-level window:

 0:07.44 pid:28455 1593026170711	geckodriver::marionette	DEBUG	Waiting 60s to connect to browser on 127.0.0.1:2828
 0:08.34 pid:28455 1593026171613	Marionette	TRACE	Marionette enabled
 0:14.44 pid:28455 1593026177720	Marionette	TRACE	Received observer notification toplevel-window-ready

In this case the delay is about 6s. Here the appropriate gecko profile:

https://share.firefox.dev/31cvbPZ

As it looks like during that time there is basically no activity and all the time is spend in mach_msg_trap.

To reproduce use the following mach command:

mach wpt --webdriver-arg=-vv testing/web-platform/tests/webdriver/tests/

If the geckodriver binary cannot be found change into testing/geckodriver and run cargo build. Afterward use --webdriver-binary to specify the path to th binary.

Markus, given that seems to be MacOS related, do you have an idea what's going wrong here?

Flags: needinfo?(mstange.moz)

It's blocking inside -[IMKInputSessionXPCInvocation invocationAwaitXPCReply], which means that it's waiting for some system process. It's unexpected that this would take 6 seconds, but it's also outside our control.

You could try to get a full system profile ("all processes") with Instruments and see if anything stands out.

Flags: needinfo?(mstange.moz)

I tried to use Instruments but it didn't actually give me some helpful information. The WindowServer process stands out for prepare core animation update state, but that only consumes about 600ms CPU time during that 6s period.

So when I checked the main thread of the Firefox profile above the stack shows that it's all under ProcessPendingGetURLAppleEvents. By searching for this method I just found bug 1433358, and I wonder if I might see something similar.

Component: General → Widget: Cocoa
Severity: -- → S3
Priority: -- → P3

I still see this in 1 of 6 cases when running specific Marionette harness dependent tests (including wpt) in a loop, where Firefox gets restarted pretty often. Stephen is there something else I could provide to actually make some progress on this bug? It's kinda frustrating to me to have to wait those extra 6s over and over again when running tests.

Flags: needinfo?(spohl.mozilla.bugs)

It is conceivable that we could be smarter about the kinds of events that we're processing (i.e. change NSAnyEventMask[1] to something more specific to GetURL events), but I wouldn't know off-hand what that could be. And there is no guarantee that it isn't the GetURL events themselves that ultimately lead or contribute to this slowdown. Another approach might be to skip processing of GetURL events when run as part of this test suite, assuming that those aren't needed for the tests.

[1] https://searchfox.org/mozilla-central/rev/5a1a34953a26117f3be1a00db20c8bbdc03273d6/toolkit/xre/MacApplicationDelegate.mm#113

Flags: needinfo?(spohl.mozilla.bugs)

Would you mind explaining what those GetURL events are, and are used for? Not sure if I get this right from the code comment as referenced in your last comment. So what I think... When a user clicks a URL eg in the Terminal and Firefox is not running yet, a new instance is created and those GetURL events are processed to append possible URLs as command line args for being opened in individual tabs?

Flags: needinfo?(spohl.mozilla.bugs)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] (away 12/07 - 12/11) from comment #5)

Would you mind explaining what those GetURL events are, and are used for? Not sure if I get this right from the code comment as referenced in your last comment. So what I think... When a user clicks a URL eg in the Terminal and Firefox is not running yet, a new instance is created and those GetURL events are processed to append possible URLs as command line args for being opened in individual tabs?

Yes, that's the gist of it.

Flags: needinfo?(spohl.mozilla.bugs)

I observed this for a while when running mochitests, marionette tests, and webdriver tests. As it looks like this delay only happens for me when running the webdriver (web platform) tests. I could try with Instruments the next days in case this would give more details?

(In reply to Stephen A Pohl [:spohl] from comment #4)

Another approach might be to skip processing of GetURL events when run as part of this test suite, assuming that those aren't needed for the tests.

Could this be done via a preference?

Flags: needinfo?(spohl.mozilla.bugs)

I added a pref of widget.macos.disable-geturl-processing-at-startup and kicked off a try build. Are you able to check if you are no longer seeing this delay with the try build?

https://treeherder.mozilla.org/jobs?repo=try&revision=20ed7b38ff8a9a1644c5f4b63d3b7c1cbafc92c1

Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(hskupin)

So I tried to do a before/after check and it's strange but I'm not able to reproduce the problem today. I will have a look early next week again. As such keeping the needinfo.

After such a long time I have seen this delay yesterday again, but only once. Maybe it depends on how long a machine hasn't been shutdown or restarted. I will keep watching and once it happens more often I will check your try build.

Blocks: 1650025

Maybe there is a better chance to detect that when we got bug 1726465 fixed.

Depends on: 1726465
Flags: needinfo?(hskupin)

I think that I actually got a gecko profile for this particular scenario for a Firefox startup on Windows. It's not a 5s delay but still around 3s of inactivity before the browsing window is shown.

https://share.firefox.dev/3ecqAG7

Stephen and Florian, could you maybe have a look? Note that this profile is still with a build with the background hang monitor changes from bug 965392 not being backed out. Thanks.

Flags: needinfo?(spohl.mozilla.bugs)
Flags: needinfo?(florian)

Also please note that the browser-idle-startup-tasks-finished notification is send after 13s! So it's a quite slow startup of Firefox.

Finally I was also able to get a profile with startup filters set to * for all the threads: https://share.firefox.dev/3wKkEdE

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #13)

I think that I actually got a gecko profile for this particular scenario for a Firefox startup on Windows. It's not a 5s delay but still around 3s of inactivity before the browsing window is shown.

https://share.firefox.dev/3ecqAG7

Stephen and Florian, could you maybe have a look?

There's a composite taking 3.7s on the Renderer thread on the GPU process: https://share.firefox.dev/3pYh5wH
The machine has ⁨⁨2⁩ physical cores⁩ (4⁩ logical cores⁩). The parent process main thread has a higher priority than the GPU process Renderer thread, and during that time we create 4 content processes and have activity in the WebExtension process, all at the same priority as the GPU process Renderer thread, so it's likely the composite could happen significantly faster if we increased the priority of the renderer thread.

Please note that such a delay doesn't always happen. It's very infrequent so not sure what's going on. I'm happy to also record a profile when I do not see such a delay in case of it would help to identify the cause.

I talked to Florian and as he told me Bas might have a patch that I could try soon that should fix this issue. Moving the needinfo flag to him.

Flags: needinfo?(spohl.mozilla.bugs)
Flags: needinfo?(florian)
Flags: needinfo?(bas)

Here's the build and the patch I promised Florian. Bit of a hack but should do the trick.

https://treeherder.mozilla.org/jobs?repo=try&revision=6fd11a4056cd3c4270470623cde0e4b1f3cd42a8

Flags: needinfo?(bas)

(In reply to Bas Schouten (:bas.schouten) from comment #19)

Here's the build and the patch I promised Florian. Bit of a hack but should do the trick.

https://treeherder.mozilla.org/jobs?repo=try&revision=6fd11a4056cd3c4270470623cde0e4b1f3cd42a8

Here is a warm startup profile of the try build on the 2018 reference laptop (the slow Dell one) https://share.firefox.dev/3EmHexv

As you can see, the "Current thread priority" is 12 or 13 on the Renderer thread. The parent process main thread has a current thread priority of 12 once it has a window (before that, the priority varies between 9 and 12).

There's a very big gap in the sampling in the parent process, the WebExtension process and the preallocated content process, but we can see from the awake markers that they ran anyway, it's just the profiler sampler that likely didn't get any CPU. I have never seen a gap of this size in the sampling, so it's likely a side effect of the GPU process having higher priority. Might still be an acceptable tradeoff if it means things gets painted faster.

Flags: needinfo?(bas)

Florian I assume that I do not have to test anything before your questions for Bas haven't been answered?

Flags: needinfo?(florian)

Henrik, is this bug still relevant?

Flags: needinfo?(hskupin)

We actually switched in Marionette to use the final-ui-startup notification. Hereby we can also see delays at least for CI jobs. This is covered by bug 1794745.

At least on MacOS I haven't seen a delay of such type anymore for quite some time. So I would propose to close this bug as WFM. I'll reopen in case I have some reproducible testcase or a recorded profile from such a delay during startup.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(hskupin)
Flags: needinfo?(florian)
Flags: needinfo?(bas)
Resolution: --- → WORKSFORME
See Also: → 1794745
Attachment #9202708 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: