Open Bug 1899618 Opened 4 months ago Updated 2 months ago

Consider supporting thread creation/destruction callbacks to register/unregister threads when building in Gecko

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: chutten, Unassigned, NeedInfo)

Details

The GeckoProfiler is pretty great, but it doesn't know about any threads that haven't been registered to it. Or, well, it doesn't get to know enough for true awesomeness to occur.

To register and deregister a thread we need to call PROFILER_REGISTER_THREAD(aThreadName) and PROFILER_UNREGISTER_THREAD(), or the register_thread/unregister_thread fns from the profiler's rust-api crate. Since Glean doesn't always build with gecko (in fact, it mostly doesn't), and the profiler's rust-api crate isn't published to crates.io, I think that means we need to call the C++ functions instead.

Wrinkles I foresee:

  • Threads created before init. Pretty sure dispatcher does it, not sure what other threads might be kicking around before init (which I'm treating as the obvious place to tell Glean a) That it should register threads, and b) How to register threads)
  • Android. Kotlin inits the SDK in Android, and I don't know if it can find those symbols to pass into init.

This means you need to store the callback, make it side-effect free, and call it in various locations, it's can be a bit of plumbing, but it's essentially trivial. If need to have this working faster, because it is side-effect free, and because the profiler is thread safe, you can shove the two function pointer into a global in your program and get this working in a few minutes.

For java, we need to expose the register thread function somehow, it's fine, Nazım will know what path to take. It's possible that it's just a matter of exposing it like https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoJavaSampler.java#783-784, since it's a free function with a C ABI.

Flags: needinfo?(canaltinova)

Indeed the Rust API is not published as a crate but there is no blockers for us to do that. I think it should be possible to do so. Rust-API crate has this enabled feature that you can disable when it's not linked to Gecko, which will make all the calls no-op. But if you wanna be faster, you can use these raw FFI calls: https://searchfox.org/mozilla-central/rev/23efe2c8c5b3a3182d449211ff9036fb34fe0219/tools/profiler/public/ProfilerBindings.h#39-40 (usage)

For JVM threads, we don't have a way to register them. But the good thing is you shouldn't need to register them :). JVM sampler works a bit differently. When we are starting it, we are getting all the active threads and then filter the ones that match the profiler thread filter that's inside about:profiling. So you can try adding the thread names you would like to profile to this thread filter which is a comma separated list.

There is one caveat though, the thread might not be initialized at the time of profiler start. After the profiler start we don't really check them again to see if there are new threads. I think that was because it was quite a bit overhead to get all the threads, but we might want to still do that every 10-100interval or so. Do you know if that's the case for these threads?

Flags: needinfo?(canaltinova)

NI chutten for Nazım's questions.

Flags: needinfo?(chutten)

Glean doesn't have JVM threads independent from Fenix, to my knowledge, so we're safe on that side.

As for whether we start and stop threads while the profiler is running... well, I'm not sure what that means. I can audit for you the entirety of all the thread behaviour in the Glean SDK if that helps. We start six threads in the SDK:

  • glean.upload is created and started when there are things to upload and allowed to end after there's nothing more to do. This comes up near Glean init (which is near startup) and also occasionally as pings are submitted at runtime.
  • glean.dispatcher is created when the global Dispatcher is created which is the first time it's referenced which is the first time something is launched to it which is the first time any metric has any operation taken on it. As a result, this thread is probably created very very early in startup as people like recording things during startup (since it's so important). It shouldn't exit until shutdown.
    • This might be the most important Glean SDK thread to profile
  • glean.init is somewhat short-lived and is kicked off early during startup, running until Glean finishes init (loading from disk, flushing the preinit queue).
    • This thread might be important for startup perf
  • glean.shutdown is somewhat short-lived and is kicked off during shutdown to ensure we don't go more than 30s waiting for pending upload tasks to complete.
    • This thread is not all that important for profiles, I'll bet.
  • glean.mps is a thread responsible for scheduling the "metrics" ping. It starts during init and doesn't exit until the process dies.
    • It does very little and is only presently in use on Desktop, so it's probably not all that interesting in profiles.
  • glean.ping_directory_manager.process_dir is another thread spawned during init. This one lives long enough to read the data storage dir into memory, and then ends.
    • Not as important to startup perf as glean.init

If I understand "the thread might not be initialized at the time of profiler start" to mean any threads started after some user has turned on profiling, then any of these threads could've been started after the profiler started (if e.g. the profiler starts as Fenix does, to profile startup). I'm not sure that interpretation is correct, though.

Flags: needinfo?(chutten) → needinfo?(canaltinova)

Fwiw I recently modified Glean to call gecko_profiler_register_thread for all thread creations when build within Gecko (didn't push the code yet),
but this then crashes Fenix as soon as Glean starts, most likely because we load libxul and some state is not yet set up correctly (bug 1892234)

Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.