Open Bug 1693911 Opened 3 years ago Updated 2 years ago

runUiThreadCallback blocks first frame & visual completeness for 44ms in COLD MAIN start up

Categories

(Core :: Graphics, defect, P3)

Unspecified
All
defect

Tracking

()

People

(Reporter: mcomella, Unassigned)

References

Details

(Whiteboard: [geckoview:m89][geckoview:m90][geckoview:m91])

Attachments

(1 file)

In this COLD MAIN start up (to homescreen) profile from the Moto G5 https://share.firefox.dev/3k7tNW0, we see that GeckoView.runUiThreadCallback runs on the main thread for 44ms calling native code. This occurs before the first frame is drawn, delaying it and eventual visual completeness. See the screenshot for where this occurs in the profile: runOnUiThreadCallback is circled and the first frame is the rightmost point.

Agi has a few ideas what's happening here:

Agi Sferro
mcomella: looks like it's some sort of synchronization method? https://searchfox.org/mozilla-central/rev/281009d3b7e1e1496b9524d5478ff4f0b7369573/widget/android/AndroidUiThread.cpp#251-256 it predates me but I can look into why and what that's doing
from the comment it sounds like it needs to run before everything but I don't understand why yet

Agi Sferro
interesting
it sounds like it was introduced in 2016 for the GPU process https://bugzilla.mozilla.org/show_bug.cgi?id=1319850 but we never ended up supporting that
heh yeah it looks like it's not used: https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla29GetAndroidUiThreadMessageLoopEv&redirect=false
mcomella: I think we can rip it out,

44ms is a big win for us so we should address this if we can. In the best case, addressing it would mean either removing it, delaying it until after visual completeness (either the homescreen or the first page load), or not executing it on the main thread.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [geckoview:m89]
Priority: P2 → P1
Rank: 7
Whiteboard: [geckoview:m89] → [geckoview:m89][geckoview:m90]
Rank: 7 → 5
Whiteboard: [geckoview:m89][geckoview:m90] → [geckoview:m89][geckoview:m90][geckoview:m91]
Assignee: nobody → aklotz
Status: NEW → ASSIGNED

Essentially we're seeing the following:

  • The app starts a new GeckoSession which creates the nsWindow &c;
  • On the Gecko main thread, nsWindow::Create calls CreateLayerManager, which eventually takes us into UiCompositorControllerChild::CreateForSameProcess, which enqueues UiCompositorControllerChild::OpenForSameProcess onto the Android UI thread;
  • The Android UI thread is busy doing a bunch of Fenix UI stuff, and doesn't get a chance to run UiCompositorControllerChild::OpenForSameProcess until the UI thread processes enough events to reach the GeckoThread.runUiThreadCallback, which finally executes the runnable;
  • UiCompositorControllerChild::OpenForSameProcess synchronously waits on a monitor while its MessageChannel is set up to communicate with the compositor thread. This is what's janking the UI thread.

Jamie, do either you or somebody else in gfx have any ideas about how we can avoid this?

Flags: needinfo?(jnicol)

The UiCompositorController is needed to communicate between the android UI thread and the compositor thread. For pausing/resuming composition, resizing the surface, etc. So creating it before first paint seems unavoidable. But 44ms does seem excessive. What is the compositor thread doing during that time?

Flags: needinfo?(jnicol)

I'll look that up (leaving a ni? for myself, but in the meantime, I think we should move this over to gfx).

Assignee: aklotz → nobody
Status: ASSIGNED → NEW
Component: General → Graphics
Flags: needinfo?(aklotz)
Product: GeckoView → Core

Hmmm... Comment 2 is accurate, but I only see that when directly loading a page, and not when loading the homescreen. And it isn't taking 40-some ms either.

Starting today's Fenix Nightly on my Pixel 4A, I obtained this profile while cold starting to the homescreen:
https://share.firefox.dev/35hVi91

I do see a few samples in runUiThreadCallback, but nothing like what we see in the original profile in the above bug description. Michael, are you still seeing this on your end?

Flags: needinfo?(aklotz) → needinfo?(michael.l.comella)

On my Moto G5, I still see this in the latest nightly. Here are profiles:

The timing is noticeably shorter (though still a good improvement for us). Ideally we could benchmark the time runUIThreadCallback to get a consistent, trustworthy value rather than relying on one-off profiles which don't handle variance well. Unfortunately, I'm not too familiar with the C++ code so I can't write this benchmark myself but let me know if you want to work together to do that.

Starting today's Fenix Nightly on my Pixel 4A,

The Pixel 4A would likely be considered a high-end device – do you have any slower devices to test on? For context, the Pixel 2 is also considered high end-ish from device distributions in our user base.

I obtained this profile while cold starting to the homescreen:
https://share.firefox.dev/35hVi91

I notice the runUIThreadCallback segment here doesn't block the first frame being drawn (you can see where the first frame is drawn with the onPreDraw marker on the Android UI thread). The equivalent segment appears in my COLD VIEW profile and it takes 294ms. Since there's nothing else happening on the main thread at this time, it seems okay though concerningly long – I wonder what's running then.

Flags: needinfo?(michael.l.comella)

I notice in the COLD VIEW profile in comment 6 that runUIThreadCallback calls android.graphics.Bitmap.copyPixelsFromBuffer, which I believe means we are in LayerViewSupport::RecvScreenPixels(). Immediately prior to this I can see that the webrender renderer thread renders its first frame, which is always slow on a cold start due to having to compile all the shaders. Then it renders a frame and does a readback (which makes sense since we have requested a screenshot).

This readback could be made more efficient on the GPU/renderer thread:

  • I believe we are running in to an Adreno slow path to do with the stride of the data being read
  • We could also do the whole thing asynchronously using a PBO, rather than reading synchronously in to a CPU buffer

The profiler screenshots path has worked around both of those issues, for example. However, it seems like we care about the UI thread here.

It's a bit tricky to see what is taking all the time in runUIThreadCallback since it seems to only show the Java functions. (Is there a way to show both C++ and java functions for a thread in the profiler?). I hope that it's not waiting for the renderer thread to perform the readback, but looking at the timings on the profile thankfully I don't think that's the case. So if I had to guess, I would say it's probably copying the screenshot pixels from one buffer to another in RecvScreenPixels, and perhaps flipping them in FlipScreenPixels(). And perhaps when we "complete" the GeckoResult, does that synchronously do some work in Fenix, or is that done asynchronously?

We could probably improve this by doing the Y-Flip and any required downscaling on the GPU/renderer thread, rather than on the CPU on the UI thread. This is also already done for the profiler screenshots. However, the UI thread will still have to copy some pixels in to the java Bitmap object. Are we deliberately taking a screenshot at this time? Can we delay the screenshot until the main thread is not as busy or doing less important work?

Flags: needinfo?(michael.l.comella)

Oh, and can you verify that not taking a screenshot makes this issue disappear? Because the above comment is purely conjecture, so if we can verify it that would be a good start!

I notice in the COLD VIEW profile in comment 6 that runUIThreadCallback calls android.graphics.Bitmap.copyPixelsFromBuffer,

There are three places runUiThreadCallback is called:

  1. Before the first frame (let's discuss in this bug)
  2. At an arbitrary point for 295ms (I just filed bug 1720406 for this)
  3. At an even later point for 126ms (seems similar to the one above so I didn't file separately)

I think the two bugs are of different priorities. I'm more concerned about the first frame bug because it blocks visual completeness, making it longer for the user to see any content from Firefox and making it feel slower. I think :jnicol is talking about the second bug – let's further the discussion around that bug in the newly filed issue.

(Is there a way to show both C++ and java functions for a thread in the profiler?).

:canova, do you know why the GeckoView C++ isn't shown in profiles? Have we not instrumented that thread before?

Clearing NI: I hope I answered all the questions! If not, NI me again.

Flags: needinfo?(michael.l.comella) → needinfo?(canaltinova)

Oh right, sorry I missed that first call. Without knowing what is actually happening during that first callback it's hard to see how this is actionable. Perhaps we could add a profiler annotation whenever anything is scheduled on the UI thread.

The native stack for the Android UI is treated distinctly from the Java stack. You need to include AndroidUI in the list of threads to profile when you're starting the profiler.

Flags: needinfo?(canaltinova)
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: