Visual metrics: Fenix is slower than Fennec for the site google.com
Categories
(Core :: Performance, defect, P1)
Tracking
()
Performance Impact | high |
People
(Reporter: mstange, Assigned: mstange)
References
(Blocks 1 open bug)
Details
(Keywords: perf:pageload)
Attachments
(4 files, 2 obsolete files)
Based on the results in this spreadsheet, google.com loads more slowly in Fenix than in Fennec.
Assignee | ||
Comment 1•5 years ago
|
||
Assignee | ||
Comment 2•5 years ago
|
||
Assignee | ||
Comment 3•5 years ago
|
||
Assignee | ||
Comment 4•5 years ago
|
||
Assignee | ||
Comment 5•5 years ago
|
||
SpeedIndex is about 95% worse:
- Mean: 242 -> 473
- Median: 235 -> 460
Assignee | ||
Comment 6•5 years ago
|
||
Assignee | ||
Comment 7•5 years ago
|
||
Assignee | ||
Comment 8•5 years ago
|
||
I captured some profiles from Fennec and Fenix, where I triggered the page loads by navigating manually using the URL bar. This might show different perf characteristics than when running the comparison in an automated fashion via browsertime, but it'll still be a valid comparison.
Fennec:
Uncached: https://perfht.ml/2SkKBM2
Cached: https://perfht.ml/393o6ls
Fenix:
Cached (I think): https://perfht.ml/2ueb2uV
I can see two major differences:
- In Fennec, one of the network requests takes advantage of a preconnect / speculative load. In the Fennec:Cached profile you can see this preconnect in the network chart and in the activity on the Socket thread. The Fenix profiles do not have preconnects, but they don't show as much activity on the Socket thread overall. It seems like most of the CPU time during the preconnect was taken up by the slow ec_Curve25519_mul implementation, which was fixed in Firefox 69, so the fix didn't make it into Fennec which is based on 68.
- More importantly, the ordering of the major paint vs one of the script evaluations is different! In the Fennec profiles, the very first paint already includes the page content, and in the cached version even the Google logo. In the Fenix profiles, the paints at the beginning are all empty, and the important paint waits for this script to execute:
<script async src="https://www.google.com/xjs/_/js/k=xjs.mhp.en.e-LvQEPFNvE.O/m=sb_mobh,hjsa,d,csi/am=gAABNgI/d=1/rs=ACT90oH3gbN6EpRRvnasTlE8Ax7j7FnC0w">
In Fennec, this script also executes, but it does so after the paint. I don't know the reason for the different ordering yet. There definitely is a window for painting in the Fenix profiles when the thread goes idle briefly while the script loads and compiles.
Assignee | ||
Comment 9•5 years ago
•
|
||
The values in the CSV also indicate that this regression is made up of two pieces: a network piece (~140ms difference) and some other piece (~100ms difference).
The mean values in the CSV regress as follows:
- firstVisualChange: 234ms -> 471ms (~240ms regression)
- requestStart: 18ms -> 157ms (~140ms regression)
- responseStart: 123ms -> 256ms (takes about 100ms for the server to react in both cases)
- responseEnd: 124ms -> 257ms (and 1ms to transmit the response)
The straightforward cause of the "network difference" would be a speculative connection that Fennec makes but Fenix does not make.
I'm not sure yet how to find out where this speculative connection is triggered during our automated perf testing. I probably need to do some code inspection, or push a Fennec try build with some extra debug logging.
The remaining 100ms of difference might be the ordering issue described in the previous comment, where the paint is happening before vs after the execution of a certain script. On my Moto G5, that script takes about 400ms to execute, but the measurements in the spreadsheet were taken on a Pixel 3, which presumably is faster at executing this script, so the 100ms might make sense.
I'm going to look into the paint ordering issue first because I'm more familiar with that part of our platform.
Assignee | ||
Comment 10•5 years ago
|
||
The late paint problem does not occur in GeckoView example! Proof: https://perfht.ml/2S5Swhl
So I don't know why Fenix shows this problem but GeckoView example is fine. I suspect that something is suppressing painting in Fenix, maybe the tab thinks it's in the background.
Assignee | ||
Comment 11•5 years ago
|
||
More surprises: With a locally-built GeckoView + (debug) Fenix, I consistently get early paints.
With Firefox Preview Nightly from the play store, I mostly get late paints, with an occasional early paint.
Local Fenix profiles: https://perfht.ml/3888PQn https://perfht.ml/2H3e087 https://perfht.ml/2S8Hzvx https://perfht.ml/379awf1 (all early)
Playstore Fenix profiles: https://perfht.ml/31xuTRU https://perfht.ml/2vcE0LB https://perfht.ml/2vcE0LB (all late) https://perfht.ml/31ADTFH (half-half) https://perfht.ml/2S8Fn6Z (early paint that took forever to reach the screen)
Assignee | ||
Comment 12•5 years ago
|
||
I found the cause for the late paint. It's because something's blocking the Java main thread, so the vsync notifications are delayed, so Gecko is never notified that it should paint. See bug 1614063.
Comment 13•5 years ago
|
||
I would have been bumping into the Android-Components regex compile bug in this case:
https://github.com/mozilla-mobile/android-components/issues/5880
This was fixed in Android-Components a few weeks back but is not yet in Firefox Preview (only Firefox Preview Nightly).
This may also be delaying the load:
https://github.com/mozilla-mobile/fenix/issues/8291
Assignee | ||
Comment 14•5 years ago
|
||
The late paint has been fixed by the change for Fenix issue #8238.
Before: https://perfht.ml/31xuTRU
After: https://perfht.ml/3bwT6fJ
Updated•5 years ago
|
Assignee | ||
Comment 15•5 years ago
|
||
To provide some closure on this bug: I only saw two problems in the profiles, a network delay and a paint delay. The paint delay is fixed. For the network delay I'm putting my hopes on bug 1612575 - triggering a speculative connection to google.com during autocomplete should reduce the network delay by a lot. Until speculative connections are implemented in Fenix, I will pause further investigations on this bug.
Assignee | ||
Comment 16•5 years ago
|
||
I'll check if the recent speculative connection fixes in Fenix (bug 1612575 comment 7) have made a difference here.
Assignee | ||
Comment 17•5 years ago
|
||
Recent profile: https://bit.ly/3d278Wt
I don't see much evidence of a speculative connection. The biggest problems seem to be:
- there's a gap during which the network request is delayed for unknown reasons (bug 1618687), 289ms in this profile
- there are two redirects: http://google.com/ -> http://www.google.com/ -> https://www.google.com/?gws_rd=ssl
- the second redirect never seems to get cached; the tooltip in the profiler says "Cache: Unresolved".
Comment 18•4 years ago
|
||
We are no longer this in recent visual metrics tests:
(i.e. fenix is now faster than fennec68)
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=1689990234&range=44:44
Comment 19•4 years ago
|
||
Closing this bug due to it's no longer an issue.
Updated•3 years ago
|
Description
•