Open Bug 1608826 Opened 5 years ago Updated 9 months ago

[meta] Geckoview load initiation performance

Categories

(GeckoView :: General, task, P3)

Unspecified
All

Tracking

(Performance Impact:low)

Performance Impact low

People

(Reporter: jesup, Unassigned)

References

(Depends on 4 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, perf:pageload)

Metabug for anything impacting the initiation of pageload in geckoview browsers. This includes processing between receipt of an intent or a keypress and when the network request to load the page is on the network.

Moto G5 profile of loading https://accounts.google.com after a fresh start of a local-built GVE. No cache clear.
https://perfht.ml/2FJBPB4
Used a YAML file with:

env:
MOZ_LOG: nsHttp:5,profilermarkers
prefs:
geckoview.logging: Warn

preliminary analysis:

In the child, before we ask the master process to start a load:
About 20ms was spent in nsDocShell::MaybeHandleLoadDelegate(), most in LoadURI for resource:///actors/LoadURIDelegateChild.jsm.
another 8-12ms was spend in WindowsGlobalActor construction; about half of it in initLogging(), the rest in ChromeUtils::Import resource://gre/modules/LoadURIDelegate.jsm.

We then have a delay, and spend 28m in shouldLoadURI(), and some more bits: https://perfht.ml/2t5fWtA

There's then an about 40ms delay from that point until the master process indicates the network load starts.

All in all, there's about 120ms from the start of this process until the master process indicates a load starts (and perhaps more time until it hits the network with anything - from that point to http-on-before-connect is another 49ms: https://perfht.ml/36Lspky)

And then there's a delay (especially in the child) waiting for data (the redirect); the child is largely idle and has time to run a 100ms GC (including 18, 34, and 12ms slices).

On a reload, much less time is spent in content before initiating the request (discounting beforeunload). However, on reload it took ~300ms to load the redirect - far longer than it took the first time. See https://perfht.ml/39YU17M, where you see a lot of processing on SocketThread during that delay. At least 100ms+ of it is just in the SSL handshakes: https://perfht.ml/30ipXQb. After that, it takes another ~85ms to do some SPDY stuff and get data: https://perfht.ml/2NorCOZ

Adding jcj, mayhemer and mt for this part of it. Note: there are nsHttp:5 log messages you can look at in the profile which may be especially useful on the Socket Thread; see the Marker Chart. Note that some of the log messages on MainThread have artificially "long" times; I was testing a patch and set a few of the message to have a marker-start-time of the start of the http connection; you can ignore when those log messages start; only the end is relevant.

Note: Moto G5 (8 identical cores), arm32, local m-c build of geckoview_example

Flags: needinfo?(jjones)
Flags: needinfo?(honzab.moz)

What version of NSS was this? Or version of m-c?

This was m-c rev 989bbd83ccd5 (monday Jan 6th)

The profiled timings seem reasonable to us for ECDH on p256 for 32-bit ARM. We have no acceleration for p256 on any platform, it's all integer math. We do have p256 acceleration in the pipeline, but I'm not sure if it's going to cover ARM at this point.

Flags: needinfo?(jjones)

Thanks, jcj - that's unfortunate, but I suppose we don't control that. I do wonder if Fennec is seeing the same cipher...

Also I want to look more closely at why the reload was slower than the load, and if the NSS code took more time there for some reason.

yaml file

To install the yaml file, use adb push /tmp/org.mozilla.geckoview_example-geckoview-config.yaml /data/local/tmp

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)

yaml file

To install the yaml file, use adb push /tmp/org.mozilla.geckoview_example-geckoview-config.yaml /data/local/tmp

FYI: in your own builds, mach run --setenv ... is supported. I'm pretty sure that there's no --setpref support (yet), though :(

Depends on: 1609682

Quick look at https://perfht.ml/39YU17M shows that we do mostly ciphering operations. This has been answered by :jcj before. Note that logging itself adds a lot of significant overhead too. The rest seems reasonable and expected.

~85ms to do some SPDY stuff and get data: https://perfht.ml/2NorCOZ

I only see 36ms. Split to 4+4+4+4+4+4+4=ms of logging time, 8ms of a PSM work (nssTrustDomain_FindCertificatesBySubject et al). So, nothing unexpected.

No idea what socket_local_client is, where from it's called.

The rest is not significant or known to me what it means.

I cannot say what exactly happens under the hood and why the second load (reload) takes so long time, we are missing in-depth tooling for that.

Flags: needinfo?(honzab.moz)
Blocks: 1604248
Depends on: 1613484

(I'm merging bug 1613484 into this one. From that bug:)

Here's a profile of what happens when I enter "google.com" in the Fenix URL bar and press enter, on the way to the first network request:
https://perfht.ml/2S2b567 (with sampling, easier to read but slowed down by profiler overhead)
https://perfht.ml/31yaXOD (without sampling, much less overhead)

This is in an existing about:blank tab. (I entered the URL about:blank manually.)

You can see that there's activity in the parent process a long time before the network chart in the parent process shows the network request starting.

I can see a number of problems:

  • The parent process knows about the URL we want to go to first, but it initiates the navigation through the content process which will bounce the network request back to the parent process. I think we do the same on desktop, so this is probably not straightforward to fix. However, maybe we can start a speculative connect in the parent and absorb some of the latency that way.
  • Bug 1613488: While the URL bar is focused, the browser has a different size. When enter is pressed, the browser is first resized to its regular size, and then the navigation is initiated.
  • This resize triggers at least two sets of restyle+reflow+paint in the content process (~24ms). These delay the initiation of the network request. (The first paint is from UpdateDimensions+RenderLayers, the second from UpdateEffects once the parent process has painted and knows how much of the browser is revealed on screen.)
  • AsyncOpen in the content process takes another 16ms before it propagates the request to the parent process. Half of that (8ms) seem to be spent notifying "http-on-opening-request" observers inside the content process.
Depends on: 1602318, 1613488
No longer depends on: 1613484
Depends on: 1617750
Depends on: 1613227

New profile for pressing enter in the URL bar in Fenix: https://perfht.ml/2VteuN4

Problems in the content process:

  • Issue #8782: The first load is delayed by 500ms of work on the Java UI thread, via a nested event loop. (This is a debug Fenix build so this delay is hopefully not as long in release builds. Update: 240ms in a release build)
  • ~50ms of LoadRemoteScript work: https://perfht.ml/2wa2DZJ

Problems in the parent process:

  • Bug 1617979: 13ms in synchronous message to start a new WebRender renderer
  • 8ms in Window_Binding::_resolve due to new geckoView.xhtml
  • 80ms in geckoview.js's startup function, https://perfht.ml/3c7z77B

Startup profiles for Fenix App Link load initiation: https://perfht.ml/32tJVsg (5ms profiling interval) / https://perfht.ml/2T5UX45 (10ms)

Depends on: 1620079
Depends on: 1619796
No longer depends on: 1620079
Depends on: 1626389
Depends on: 1628449
Depends on: 1634480
Depends on: 1641129
Whiteboard: [qf:p1:pageload] → [qf:p3:pageload]
Performance Impact: --- → P3
Keywords: perf:pageload
Whiteboard: [qf:p3:pageload]
Severity: normal → N/A
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.