Profiler should handle timeout when gathering child process profiles
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
()
People
(Reporter: mozbugz, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Some tests expect for a full async profile, e.g.: https://searchfox.org/mozilla-central/rev/0688ffdef223dac527c2fcdb25560118c4e4df51/browser/components/tests/startupRecorder.js#212,217
Sometimes the test times out.
:florian was able to force-capture a shutdown profile of the main process: https://perfht.ml/3cAAZp8
Notice the 3s gap in samples, starting around the 10s mark, very visible in the Compositor track, this is (currently) normal while the main process is processing the profile buffer in the main thread.
Now, looking at the Marker Table, filtered with "1476":
- At 10.334s, there is a "PProfiler::Msg_GatherProfile — sent to 1476", this is the main process requesting a profile from sub-process 1476.
- At 6.853s, this is the last message received from 1476.
So the profiler waits for a profile that will never arrives!
Filtering the marker table with "GatherProfile" shows that 3 other processes provided their profile around 13.4s.
Some possible actions I can think of:
A. The test itself could have a timeout, in which case it could do what :florian did manually, i.e., set the MOZ_PROFILER_SHUTDOWN=...
env-var and force a Firefox shutdown to capture some profiling for later analysis.
Bug 1528859 should help capturing other processes than just the parent; though if one process is really frozen it probably won't respond anyway -- but it would be really nice if there was a way to force a shutdown-profile dump, maybe with a signal?
B. The profile-gathering logic inside Firefox could have its own timeout, so it would provide a partial profile and the test would end normally. However this may hide freezing issues! So it would be good to somehow record that some processes did not respond.
(Other suggestions welcome.)
Comment 1•5 years ago
|
||
(In reply to Gerald Squelart [:gerald] (he/him) from comment #0)
B. The profile-gathering logic inside Firefox could have its own timeout, so it would provide a partial profile
I think this would be really nice for cases where Firefox is misbehaving and a content process is not responding, outside of tests.
and the test would end normally. However this may hide freezing issues! So it would be good to somehow record that some processes did not respond.
We would probably still want tests to fail, so it would be nice to at least have a warning or an error reported somehow. But one test in the folder reads the startup profile to look at main thread I/O done by content processes, so if there's no content process at all in the profile, we'll already have a test failure.
Assignee | ||
Comment 2•3 years ago
|
||
This seems to happen more frequently these days, so I'm increasing the priority.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
(In reply to Gerald Squelart [:gerald] (he/him) from comment #0)
B. The profile-gathering logic inside Firefox could have its own timeout, so it would provide a partial profile and the test would end normally. However this may hide freezing issues! So it would be good to somehow record that some processes did not respond.
Bug 1673513 just landed, it added a better timeout mechanism (waiting for as long as child processes keep progressing in their profile-gathering work).
For now, frozen processes are just silently dropped. Bug 1752930 will add information about these.
Assignee | ||
Comment 4•3 years ago
|
||
Bug 1673513 added good handling of timeouts when gathering child profiles.
Bug 1673513 added logging about child profiles that are discarded.
(And bug 1752930 now focuses on exit profiles, which are not relevant to this bug's initial goals.)
So I'm calling this bug effectively fixed.
Description
•