Missing content process data when profiling speedometer
Categories
(Core :: Gecko Profiler, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox84 | --- | fixed |
People
(Reporter: agi, Assigned: mozbugz)
References
Details
Attachments
(1 file)
I'm profiling on Android, not sure if it matters.
Run speedometer:
./mach raptor -t raptor-speedometer --app=geckoview --binary=org.mozilla.geckoview_example --activity=org.mozilla.geckoview_example.GeckoViewActivity --no-conditioned-profile
(or navigate to https://browserbench.org/Speedometer2.0/)
and profile for about a minute. The resulting profile will only have main process data.
Assignee | ||
Comment 1•4 years ago
|
||
Agi, would you have an example profile you can share?
And could you please give some more details about how you're profiling? (Is it remote profiling from desktop Firefox? etc.)
cc: Nazim, who knows more about Android&Java profling.
Comment 2•4 years ago
|
||
This might be caused by the timeout for subprocess profiles. Speedometer creates a lot of data in the content process, and almost nothing in the parent process. And mobile devices are slower at serializing the data. So the current timeout heuristics might not work well for this case.
Agi is currently checking if a longer timeout fixes this.
Reporter | ||
Comment 3•4 years ago
|
||
This is an example profile: https://share.firefox.dev/3ovRg5i
I'm profiling a Samsung S10e connected by USB (so remote debugging/profiling) using about:debugging
, latest Firefox Desktop Nightly on MacOS for the client debugger.
Assignee | ||
Comment 4•4 years ago
•
|
||
(In reply to Markus Stange [:mstange] from comment #2)
This might be caused by the timeout for subprocess profiles...
I see, it makes sense.
Yes, the current timeout is a bit crude: 2x the parent's gathering time + 1s, reset each time we receive a child profile.
I do want to improve this, e.g., with dialog-oriented IPCs, so we'd only timeout if sub-processes don't reply at all while we're waiting for their profile (assuming a process could still handle IPCs while it's serializing its profile).
Agi is currently checking if a longer timeout fixes this.
Great, thank you Agi, this will help confirm the issue.
If correct, I guess a quick fix here would be to extend the timeout as needed.
I will file a proper bug to improve the profile-gathering IPCs.
Reporter | ||
Comment 5•4 years ago
|
||
This fixes my problem:
--- a/tools/profiler/gecko/nsProfiler.cpp
+++ b/tools/profiler/gecko/nsProfiler.cpp
@@ -909,11 +909,7 @@ RefPtr<nsProfiler::GatheringPromise> nsProfiler::StartGathering(
// We know how long it took this parent process to stream its profile, give
// the slowest child twice as long, plus a bit more. (The timer will be
// restarted after each response.)
- const uint32_t streamingTimeoutMs =
- static_cast<uint32_t>(
- (TimeStamp::NowUnfuzzed() - streamingStart).ToMilliseconds()) *
- 2 +
- 1000;
+ const uint32_t streamingTimeoutMs = 1000000;
Unused << NS_NewTimerWithFuncCallback(
getter_AddRefs(mGatheringTimer), GatheringTimerCallback, this,
streamingTimeoutMs, nsITimer::TYPE_ONE_SHOT_LOW_PRIORITY, "",
So this indeed seems like a timeout problem.
Assignee | ||
Comment 6•4 years ago
|
||
Agi, do you remember roughly how long the profile capture actually took? (I'd prefer to tweak the current timeout numbers rather than removing it.)
And can you work around this bug for now in your own work (using your local fix), or do you need a better built-in fix soon?
Note to self: :florian suggested that the parent process could use its knowledge of how much profile-buffer memory other processes use, so that in cases where content processes have much more data than the parent, the (currently) * 2
factor could be changed to a more appropriate guess.
Reporter | ||
Comment 7•4 years ago
•
|
||
Agi, do you remember roughly how long the profile capture actually took? (I'd prefer to tweak the current timeout numbers rather than removing it.)
I think about 2 minutes. Could we have an about:config pref so I don't have to recompile Firefox to debug? :)
Note to self: :florian suggested that the parent process could use its knowledge of how much profile-buffer memory other processes use, so that in cases where content processes have much more data than the parent, the (currently)
* 2
factor could be changed to a more appropriate guess.
FYI I tried to change the factor to 200
and it didn't help, the ratio between main and content process in this case is extreme.
Assignee | ||
Comment 8•4 years ago
•
|
||
(In reply to Agi Sferro | :agi | ⏰ PST | he/him from comment #7)
I think [the capture took] about 2 minutes.
Oh wow, that's quite a long time indeed!
Could we have an about:config pref so I don't have to recompile Firefox to debug? :)
Excellent idea, will do.
FYI I tried to change the factor to
200
and it didn't help, the ratio between main and content process in this case is extreme.
Wow again. If handy, could you please share that profile? -- So I can get an idea whether the data ratio between processes approaches this number.
Assignee | ||
Comment 9•4 years ago
|
||
Instead of a repeating timeout of only twice the parent's serialization time + 1s, use that double parent time and multiply it by the number of children, and add the number of seconds from the about:config preference "devtools.performance.recording.child.timeout_s" (still 1s by default).
Comment 10•4 years ago
|
||
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d2e96c2c6bfb Tweak profile gathering timeout, with pref "devtools.performance.recording.child.timeout_s" - r=gregtatum
Reporter | ||
Comment 11•4 years ago
|
||
(In reply to Gerald Squelart [:gerald] (he/him) from comment #8)
Wow again. If handy, could you please share that profile? -- So I can get an idea whether the data ratio between processes approaches this number.
Sorry for the delay, here's a profile like that: https://share.firefox.dev/2Tr37mX
Reporter | ||
Comment 12•4 years ago
|
||
Thanks for adding the pref!
Comment 13•4 years ago
|
||
bugherder |
Assignee | ||
Comment 14•4 years ago
|
||
Thank you Agi.
A quick&naive comparison of the main threads in:
Parent: JSON.stringify(profile.threads[0]).length -> 5,081,059
Web Content: JSON.stringify(profile.threads[3]).length -> 29,758,344
Indeed there is a lot more data in the sub-process, but not 200+ times 🤔 So unfortunately it means that the relative size of profiler buffers won't help as hoped in comment 6.
Bug 1673513 is really needed! But in the meantime, hopefully the new pref can be a useful workaround in these situations...
Description
•