[meta] LUL initialization takes too long, slows down startup profiling and content process startup
Categories
(Core :: Gecko Profiler, defect, P2)
People
(Reporter: mstange, Unassigned)
References
(Depends on 2 open bugs, Blocks 2 open bugs)
Details
(Keywords: meta, Whiteboard: [fxp])
On Linux and Android arm64, we use LUL for stackwalking. LUL's initialization takes a very long time. This manifests in the following ways:
- Manually starting the profiler for the first time causes the browser to be unresponsive for a short time.
- Content processes that start up during profiling are delayed. This is more serious with Fission.
- It's most serious when profiling Firefox startup: During startup, many processes are launched, and each of them hits the initialization overhead: parent, GPU, network, add-ons, content. With multiple processes interacting, this delay distorts the profiles; we're now profiling an unrealistic scenario.
Let's use this bug as a meta bug and file individual bugs for specific mitigations of this problem.
Reporter | Updated•4 years ago
Based on a Pernosco trace, in bug 1653473 comment 17 I believe I've shown that LUL initialization can take tens of seconds in some tests, causing them to fail intermittently. This could affect almost any test on Linux that relies on profiles from web content processes.
Reporter | Comment 2•3 years ago
Hi Julian, it looks like we're running into a fairly fundamental assumption of LUL that's turning out to be problematic: the assumption that the initial conversion time into the optimized LUL format doesn't matter, and that we only need to optimize the time it takes to walk the stack during sampling. I think there are a number of ways forward, and I'd love to hear your input on this!
I can imagine the following solutions:
- Accept that we need to do the conversion once, but try to really do it only once per library, for example by caching the LUL representation on disk (bug 1635811).
- Find ways to optimize the initial conversion.
- Switch to a model at the other end of the spectrum: do no work upfront and parse DWARF on demand during unwinding.
- Switch to a hybrid model where we gradually build up an optimized representation of the DWARF data as we encounter new functions during sampling. This would do memory allocation during sampling (but after the sampled thread has been resumed; we're already copying the sampled thread's stack into a buffer anyway).
Thoughts?
Reporter | Comment 3•3 years ago
Profile of LUL initialization on a local Firefox build: https://share.firefox.dev/3If1ekq
Comment 4•3 years ago
Work to improve this is in progress. See in particular bug 1754932.