Open Bug 1635810 Opened 4 years ago Updated 1 year ago

[meta] LUL initialization takes too long, slows down startup profiling and content process startup

Categories

(Core :: Gecko Profiler, defect, P2)

All
Linux
defect

People

(Reporter: mstange, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [fxp])

On Linux and Android arm64, we use LUL for stackwalking. LUL's initialization takes a very long time. This manifests in the following ways:

  • Manually starting the profiler for the first time causes the browser to be unresponsive for a short time.
  • Content processes that start up during profiling are delayed. This is more serious with Fission.
  • It's most serious when profiling Firefox startup: During startup, many processes are launched, and each of them hits the initialization overhead: parent, GPU, network, add-ons, content. With multiple processes interacting, this delay distorts the profiles; we're now profiling an unrealistic scenario.

Let's use this bug as a meta bug and file individual bugs for specific mitigations of this problem.

Depends on: 1635811
Blocks: 1329212
Summary: [meta] LUL initialization takes too long, slows down startup profiling → [meta] LUL initialization takes too long, slows down startup profiling and content process startup

Based on a Pernosco trace, in bug 1653473 comment 17 I believe I've shown that LUL initialization can take tens of seconds in some tests, causing them to fail intermittently. This could affect almost any test on Linux that relies on profiles from web content processes.

Depends on: 1709123

Hi Julian, it looks like we're running into a fairly fundamental assumption of LUL that's turning out to be problematic: the assumption that the initial conversion time into the optimized LUL format doesn't matter, and that we only need to optimize the time it takes to walk the stack during sampling. I think there are a number of possible ways forward, and I'd love to hear your input on this!

I can imagine the following solutions:

  • Accept that we need to do the conversion once, but try to really do it only once per library, for example by caching the LUL representation on disk (bug 1635811).
  • Find ways to optimize the initial conversion.
  • Switch to a model at the other end of the spectrum: do no work upfront and parse DWARF during unwinding.
  • Switch to a hybrid model where we gradually build up an optimized representation of the DWARF data as we encounter new functions during sampling. This would do memory allocation during sampling (but after the sampled thread has been resumed; we're already copying the sampled thread's stack into a buffer anyway).

Thoughts?

Flags: needinfo?(jseward)

Profile of LUL initialization on a local Firefox build: https://share.firefox.dev/3If1ekq

Depends on: 1754932
See Also: → 1635812

Work to improve this is in progress. See in particular bug 1754932.

Flags: needinfo?(jseward)