Talos reports 50% slower content process startup on MacOS 14.70 compared to MacOS 10.15
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
People
(Reporter: whimboo, Unassigned, NeedInfo)
References
(Depends on 1 open bug, )
Details
(Keywords: perf)
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from bug 967525 comment #107)
Haik, one thing I noticed is that on our Mac minis in CI we still have a higher content process creation time of 200ms compared to the 60ms locally on my M4.
With bug 967525 fixed we can now check on a CI machine like the existing loaner for bug 967525 and create a Gecko profile.
Also note that when we switched from macOS 10.15 to 14.7 we as well saw a 50% increase of the content process creation time.
| Reporter | ||
Updated•2 months ago
|
| Reporter | ||
Comment 1•2 months ago
|
||
I'll get access to the machine within the next couple of days. Ryan will take care of it.
| Reporter | ||
Comment 2•2 months ago
|
||
We have macOS workers with 15.30 (M4 chipset) for CI and as well run various tests. It might be good to actually trigger some tests with the custom binary from bug 1967525. That way we can better compare the values with my MBP M4 locally.
| Reporter | ||
Comment 3•2 months ago
|
||
I pushed a try build for macOS 15.30 and 14.7 to get a comparision for mochitest and browser-chrome tests:
https://treeherder.mozilla.org/jobs?repo=try&revision=20682552afa439db5d533394a9df2be2042b39a0
Joel, are there plans to also run Talos jobs on the 15.30 machines? That would be good to get some graphs on perfherder.
yes, when we get more capacity. I think browsertime will be first unless we do get an extra 50 machines online for testing, then we can do browsertime and talos.
| Reporter | ||
Comment 5•2 months ago
|
||
Just some numbers as recorded by the try build...
- The content process startup time for the M4 machines is blazingly fast! Based on the log entries these are around 15ms only!! I wonder if that is true, but cannot verify right now and we do not have Talos tests to compare with. Here an excerpt:
[task 2025-08-14T08:46:52.805+00:00] 08:46:52 INFO - GECKO(1073) | 1755161212803 XXX pid: 1079, type: 2, nsAppShell init time: 8 milliseconds
[task 2025-08-14T08:46:53.033+00:00] 08:46:53 INFO - GECKO(1073) | 1755161213032 XXX pid: 1081, type: 2, nsAppShell init time: 8 milliseconds
[task 2025-08-14T08:46:54.598+00:00] 08:46:54 INFO - GECKO(1073) | 1755161214593 XXX pid: 1085, type: 2, nsAppShell init time: 12 milliseconds
[task 2025-08-14T08:46:54.654+00:00] 08:46:54 INFO - GECKO(1073) | 1755161214651 XXX pid: 1086, type: 2, nsAppShell init time: 23 milliseconds
- When I check the content process creation time for a mochitest media job on macOS 14.7 I see around 50ms:
[task 2025-08-14T08:50:53.376+00:00] 08:50:53 INFO - GECKO(1298) | 1755161453374 XXX pid: 1309, type: 2, nsAppShell init time: 46 milliseconds
[task 2025-08-14T08:50:53.621+00:00] 08:50:53 INFO - GECKO(1298) | 1755161453618 XXX pid: 1312, type: 2, nsAppShell init time: 44 milliseconds
[task 2025-08-14T08:50:55.329+00:00] 08:50:55 INFO - GECKO(1298) | 1755161455326 XXX pid: 1317, type: 2, nsAppShell init time: 53 milliseconds
[task 2025-08-14T08:50:55.374+00:00] 08:50:55 INFO - GECKO(1298) | 1755161455367 XXX pid: 1318, type: 2, nsAppShell init time: 56 milliseconds
[task 2025-08-14T08:50:55.429+00:00] 08:50:55 INFO - GECKO(1298) | 1755161455427 XXX pid: 1319, type: 2, nsAppShell init time: 58 milliseconds
[task 2025-08-14T08:50:55.442+00:00] 08:50:55 INFO - GECKO(1298) | 1755161455439 XXX pid: 1320, type: 2, nsAppShell init time: 53 milliseconds
[task 2025-08-14T08:50:56.723+00:00] 08:50:56 INFO - GECKO(1298) | 1755161456719 XXX pid: 1323, type: 2, nsAppShell init time: 47 milliseconds
[task 2025-08-14T08:51:00.910+00:00] 08:51:00 INFO - GECKO(1298) | 1755161460908 XXX pid: 1337, type: 2, nsAppShell init time: 40 milliseconds
[task 2025-08-14T08:51:01.861+00:00] 08:51:01 INFO - GECKO(1298) | 1755161461858 XXX pid: 1338, type: 2, nsAppShell init time: 48 milliseconds
Now I wonder why Talos tests show such a high value for the content process creation. This doesn't match with what I can see above for other tests. Maybe there is a bug with the calculation?
| Reporter | ||
Comment 6•2 months ago
|
||
I pushed a Wd job for the same shippable build and surprisingly it showed timings of around 50ms only:
https://treeherder.mozilla.org/logviewer?job_id=522388310&repo=try&lineNumber=977
[task 2025-08-14T14:57:59.438+00:00] 14:57:59 INFO - PID 1320 | 1755183479435 XXX pid: 1336, type: 2, nsAppShell init time: 43 milliseconds
[task 2025-08-14T14:57:59.638+00:00] 14:57:59 INFO - PID 1320 | 1755183479636 XXX pid: 1339, type: 2, nsAppShell init time: 39 milliseconds
[task 2025-08-14T14:58:01.647+00:00] 14:58:01 INFO - PID 1320 | 1755183481645 XXX pid: 1357, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:01.650+00:00] 14:58:01 INFO - PID 1320 | 1755183481647 XXX pid: 1358, type: 2, nsAppShell init time: 37 milliseconds
[task 2025-08-14T14:58:01.849+00:00] 14:58:01 INFO - PID 1320 | 1755183481832 XXX pid: 1359, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:03.776+00:00] 14:58:03 INFO - PID 1320 | 1755183483774 XXX pid: 1363, type: 2, nsAppShell init time: 41 milliseconds
[task 2025-08-14T14:58:04.293+00:00] 14:58:04 INFO - PID 1320 | 1755183484291 XXX pid: 1366, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:04.333+00:00] 14:58:04 INFO - PID 1320 | 1755183484331 XXX pid: 1367, type: 2, nsAppShell init time: 38 milliseconds
[task 2025-08-14T14:58:04.863+00:00] 14:58:04 INFO - PID 1320 | 1755183484861 XXX pid: 1369, type: 2, nsAppShell init time: 39 milliseconds
[task 2025-08-14T14:58:05.049+00:00] 14:58:05 INFO - PID 1320 | 1755183485018 XXX pid: 1370, type: 2, nsAppShell init time: 49 milliseconds
[task 2025-08-14T14:58:05.051+00:00] 14:58:05 INFO - PID 1320 | 1755183485048 XXX pid: 1371, type: 2, nsAppShell init time: 42 milliseconds
[task 2025-08-14T14:58:05.346+00:00] 14:58:05 INFO - PID 1320 | 1755183485345 XXX pid: 1373, type: 2, nsAppShell init time: 34 milliseconds
Could it be that Talos jobs set some specific preference other test jobs don't do or vise versa? We should check that.
| Reporter | ||
Updated•2 months ago
|
| Reporter | ||
Comment 7•2 months ago
|
||
When running checks locally I saw the following in the logs:
15:33:31 INFO - PID 80148 | 1755272011815 XXX pid: 80172, type: 2, nsAppShell init time: 6 milliseconds
15:33:31 INFO - PID 80148 | Cycle 1(13): loaded http://127.0.0.1:63727/tests/cpstartup/cpstartup.html (next: http://127.0.0.1:63727/tests/cpstartup/cpstartup.html)
[..]
15:33:38 INFO - PID 80148 | [#0] content-process-startup Cycles:20 Average:66.95 Median:68.00 stddev:4.15 (6.1%) stddev-sans-first:3.13
15:33:38 INFO - PID 80148 | Values: 55.0 70.0 68.0 73.0 69.0 69.0 68.0 69.0 57.0 68.0 68.0 67.0 66.0 67.0 68.0 68.0 70.0 67.0 64.0 68.0
It shows that the real content process startup time is about 6 milliseconds, but for Talos we measure the 10th amount of it. This made me wonder what Talos actually measures. I found the documentation of this kind of test at https://firefox-source-docs.mozilla.org/testing/perfdocs/talos.html#cpstartup. And reading what it actually measures explains why it takes longer:
Time from opening a new tab (which creates a new content process) to having that new content process be ready to load URLs.
So that value is different and includes as well all the extra time as needed to open a tab. That now means that we cannot compare the values from Talos with the real timing for content process creation.
As such lets use this bug (similar to bug 1938328 for the parent process startup regression) to handle the 50% slow down when we upgraded our macOS workers from 10.15 to 14.70.
I'll try to get a gecko profile for the Talos job, which may help us to see what's taking longer. Sadly we don't have a 10.15 machine anymore to compare with.
| Reporter | ||
Comment 8•2 months ago
|
||
I triggered a Talos profiler run as a CI job.
Two profiles were created due to 2 cycles were run:
https://share.firefox.dev/3UAWPAF
https://share.firefox.dev/3HEZWo9
Haik, or Markus could one of you may take a look please? I do not see anything outstanding. Maybe some timings for the tab opening or other stuff has been changed?
Comment 9•2 months ago
|
||
Clearing the needinfo. Lmk if you need anything else :whimboo
Comment 10•2 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #8)
I triggered a Talos profiler run as a CI job.
Two profiles were created due to 2 cycles were run:
https://share.firefox.dev/3UAWPAF
https://share.firefox.dev/3HEZWo9Haik, or Markus could one of you may take a look please? I do not see anything outstanding. Maybe some timings for the tab opening or other stuff has been changed?
The things that stick out to me are the NSApplication init and sandbox_init_with_parameters calls. Bug 1983178 may help with NSApplication init runtime. For the sandbox_init_with_parameters call, I haven't investigated how we might speed that up. We might be able to by making the policy shorter by removing old rules we no longer need and comments. When I've tested locally on more modern machines, the runtime was negligible.
| Reporter | ||
Updated•1 month ago
|
| Reporter | ||
Comment 11•1 month ago
|
||
(In reply to Ryan Curran from comment #9)
Clearing the needinfo. Lmk if you need anything else :whimboo
Hi Ryan, please put back the macOS worker assigned to me into the pool. I don't need it at the moment. Thanks.
Description
•