runUiThreadCallback blocks first frame & visual completeness for 44ms in COLD MAIN start up
Categories
(Core :: Graphics, defect, P3)
Tracking
()
People
(Reporter: mcomella, Unassigned)
References
Details
(Whiteboard: [geckoview:m89][geckoview:m90][geckoview:m91])
Attachments
(1 file)
280.98 KB,
image/png
|
Details |
In this COLD MAIN start up (to homescreen) profile from the Moto G5 https://share.firefox.dev/3k7tNW0, we see that GeckoView.runUiThreadCallback
runs on the main thread for 44ms calling native code. This occurs before the first frame is drawn, delaying it and eventual visual completeness. See the screenshot for where this occurs in the profile: runOnUiThreadCallback
is circled and the first frame is the rightmost point.
Agi has a few ideas what's happening here:
Agi Sferro
mcomella: looks like it's some sort of synchronization method? https://searchfox.org/mozilla-central/rev/281009d3b7e1e1496b9524d5478ff4f0b7369573/widget/android/AndroidUiThread.cpp#251-256 it predates me but I can look into why and what that's doing
from the comment it sounds like it needs to run before everything but I don't understand why yetAgi Sferro
interesting
it sounds like it was introduced in 2016 for the GPU process https://bugzilla.mozilla.org/show_bug.cgi?id=1319850 but we never ended up supporting that
heh yeah it looks like it's not used: https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla29GetAndroidUiThreadMessageLoopEv&redirect=false
mcomella: I think we can rip it out,
44ms is a big win for us so we should address this if we can. In the best case, addressing it would mean either removing it, delaying it until after visual completeness (either the homescreen or the first page load), or not executing it on the main thread.
Reporter | ||
Comment 1•3 years ago
|
||
Tracking for fenix in https://github.com/mozilla-mobile/fenix/issues/18093
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Essentially we're seeing the following:
- The app starts a new
GeckoSession
which creates thensWindow
&c; - On the Gecko main thread,
nsWindow::Create
callsCreateLayerManager
, which eventually takes us intoUiCompositorControllerChild::CreateForSameProcess
, which enqueuesUiCompositorControllerChild::OpenForSameProcess
onto the Android UI thread; - The Android UI thread is busy doing a bunch of Fenix UI stuff, and doesn't get a chance to run
UiCompositorControllerChild::OpenForSameProcess
until the UI thread processes enough events to reach theGeckoThread.runUiThreadCallback
, which finally executes the runnable; UiCompositorControllerChild::OpenForSameProcess
synchronously waits on a monitor while itsMessageChannel
is set up to communicate with the compositor thread. This is what's janking the UI thread.
Jamie, do either you or somebody else in gfx have any ideas about how we can avoid this?
Comment 3•3 years ago
|
||
The UiCompositorController is needed to communicate between the android UI thread and the compositor thread. For pausing/resuming composition, resizing the surface, etc. So creating it before first paint seems unavoidable. But 44ms does seem excessive. What is the compositor thread doing during that time?
Comment 4•3 years ago
|
||
I'll look that up (leaving a ni? for myself, but in the meantime, I think we should move this over to gfx).
Comment 5•3 years ago
|
||
Hmmm... Comment 2 is accurate, but I only see that when directly loading a page, and not when loading the homescreen. And it isn't taking 40-some ms either.
Starting today's Fenix Nightly on my Pixel 4A, I obtained this profile while cold starting to the homescreen:
https://share.firefox.dev/35hVi91
I do see a few samples in runUiThreadCallback
, but nothing like what we see in the original profile in the above bug description. Michael, are you still seeing this on your end?
Reporter | ||
Comment 6•3 years ago
|
||
On my Moto G5, I still see this in the latest nightly. Here are profiles:
- 35ms COLD VIEW https://share.firefox.dev/2SkLiZA
- 25ms COLD MAIN https://share.firefox.dev/3d8hiat
The timing is noticeably shorter (though still a good improvement for us). Ideally we could benchmark the time runUIThreadCallback
to get a consistent, trustworthy value rather than relying on one-off profiles which don't handle variance well. Unfortunately, I'm not too familiar with the C++ code so I can't write this benchmark myself but let me know if you want to work together to do that.
Starting today's Fenix Nightly on my Pixel 4A,
The Pixel 4A would likely be considered a high-end device – do you have any slower devices to test on? For context, the Pixel 2 is also considered high end-ish from device distributions in our user base.
I obtained this profile while cold starting to the homescreen:
https://share.firefox.dev/35hVi91
I notice the runUIThreadCallback
segment here doesn't block the first frame being drawn (you can see where the first frame is drawn with the onPreDraw
marker on the Android UI thread). The equivalent segment appears in my COLD VIEW profile and it takes 294ms. Since there's nothing else happening on the main thread at this time, it seems okay though concerningly long – I wonder what's running then.
Comment 7•3 years ago
|
||
I notice in the COLD VIEW profile in comment 6 that runUIThreadCallback
calls android.graphics.Bitmap.copyPixelsFromBuffer
, which I believe means we are in LayerViewSupport::RecvScreenPixels()
. Immediately prior to this I can see that the webrender renderer thread renders its first frame, which is always slow on a cold start due to having to compile all the shaders. Then it renders a frame and does a readback (which makes sense since we have requested a screenshot).
This readback could be made more efficient on the GPU/renderer thread:
- I believe we are running in to an Adreno slow path to do with the stride of the data being read
- We could also do the whole thing asynchronously using a PBO, rather than reading synchronously in to a CPU buffer
The profiler screenshots path has worked around both of those issues, for example. However, it seems like we care about the UI thread here.
It's a bit tricky to see what is taking all the time in runUIThreadCallback
since it seems to only show the Java functions. (Is there a way to show both C++ and java functions for a thread in the profiler?). I hope that it's not waiting for the renderer thread to perform the readback, but looking at the timings on the profile thankfully I don't think that's the case. So if I had to guess, I would say it's probably copying the screenshot pixels from one buffer to another in RecvScreenPixels
, and perhaps flipping them in FlipScreenPixels()
. And perhaps when we "complete" the GeckoResult, does that synchronously do some work in Fenix, or is that done asynchronously?
We could probably improve this by doing the Y-Flip and any required downscaling on the GPU/renderer thread, rather than on the CPU on the UI thread. This is also already done for the profiler screenshots. However, the UI thread will still have to copy some pixels in to the java Bitmap object. Are we deliberately taking a screenshot at this time? Can we delay the screenshot until the main thread is not as busy or doing less important work?
Comment 8•3 years ago
|
||
Oh, and can you verify that not taking a screenshot makes this issue disappear? Because the above comment is purely conjecture, so if we can verify it that would be a good start!
Reporter | ||
Comment 9•3 years ago
|
||
I notice in the COLD VIEW profile in comment 6 that runUIThreadCallback calls android.graphics.Bitmap.copyPixelsFromBuffer,
There are three places runUiThreadCallback
is called:
- Before the first frame (let's discuss in this bug)
- At an arbitrary point for 295ms (I just filed bug 1720406 for this)
- At an even later point for 126ms (seems similar to the one above so I didn't file separately)
I think the two bugs are of different priorities. I'm more concerned about the first frame bug because it blocks visual completeness, making it longer for the user to see any content from Firefox and making it feel slower. I think :jnicol is talking about the second bug – let's further the discussion around that bug in the newly filed issue.
(Is there a way to show both C++ and java functions for a thread in the profiler?).
:canova, do you know why the GeckoView C++ isn't shown in profiles? Have we not instrumented that thread before?
Clearing NI: I hope I answered all the questions! If not, NI me again.
Reporter | ||
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Oh right, sorry I missed that first call. Without knowing what is actually happening during that first callback it's hard to see how this is actionable. Perhaps we could add a profiler annotation whenever anything is scheduled on the UI thread.
Comment 11•3 years ago
|
||
The native stack for the Android UI is treated distinctly from the Java stack. You need to include AndroidUI
in the list of threads to profile when you're starting the profiler.
Updated•2 years ago
|
Description
•