Closed Bug 1613205 Opened 5 years ago Closed 4 years ago

Visual metrics: Fenix is slower than Fennec for the site google.com

Categories

(Core :: Performance, defect, P1)

All
Android
defect

Tracking

()

RESOLVED WONTFIX
Performance Impact high

People

(Reporter: mstange, Assigned: mstange)

References

(Blocks 1 open bug)

Details

(Keywords: perf:pageload)

Attachments

(4 files, 2 obsolete files)

Based on the results in this spreadsheet, google.com loads more slowly in Fenix than in Fennec.

Attached file fenix_www.google.com.csv (obsolete) —
Attached file fennec_www.google.com.csv (obsolete) —
Attachment #9124335 - Attachment is obsolete: true
Attachment #9124336 - Attachment is obsolete: true

SpeedIndex is about 95% worse:

  • Mean: 242 -> 473
  • Median: 235 -> 460
Attached video fenix1.mp4
Attached video fennec1.mp4

I captured some profiles from Fennec and Fenix, where I triggered the page loads by navigating manually using the URL bar. This might show different perf characteristics than when running the comparison in an automated fashion via browsertime, but it'll still be a valid comparison.

Fennec:
Uncached: https://perfht.ml/2SkKBM2
Cached: https://perfht.ml/393o6ls

Fenix:
Cached (I think): https://perfht.ml/2ueb2uV

I can see two major differences:

  • In Fennec, one of the network requests takes advantage of a preconnect / speculative load. In the Fennec:Cached profile you can see this preconnect in the network chart and in the activity on the Socket thread. The Fenix profiles do not have preconnects, but they don't show as much activity on the Socket thread overall. It seems like most of the CPU time during the preconnect was taken up by the slow ec_Curve25519_mul implementation, which was fixed in Firefox 69, so the fix didn't make it into Fennec which is based on 68.
  • More importantly, the ordering of the major paint vs one of the script evaluations is different! In the Fennec profiles, the very first paint already includes the page content, and in the cached version even the Google logo. In the Fenix profiles, the paints at the beginning are all empty, and the important paint waits for this script to execute: <script async src="https://www.google.com/xjs/_/js/k=xjs.mhp.en.e-LvQEPFNvE.O/m=sb_mobh,hjsa,d,csi/am=gAABNgI/d=1/rs=ACT90oH3gbN6EpRRvnasTlE8Ax7j7FnC0w">
    In Fennec, this script also executes, but it does so after the paint. I don't know the reason for the different ordering yet. There definitely is a window for painting in the Fenix profiles when the thread goes idle briefly while the script loads and compiles.

The values in the CSV also indicate that this regression is made up of two pieces: a network piece (~140ms difference) and some other piece (~100ms difference).

The mean values in the CSV regress as follows:

  • firstVisualChange: 234ms -> 471ms (~240ms regression)
  • requestStart: 18ms -> 157ms (~140ms regression)
  • responseStart: 123ms -> 256ms (takes about 100ms for the server to react in both cases)
  • responseEnd: 124ms -> 257ms (and 1ms to transmit the response)

The straightforward cause of the "network difference" would be a speculative connection that Fennec makes but Fenix does not make.

I'm not sure yet how to find out where this speculative connection is triggered during our automated perf testing. I probably need to do some code inspection, or push a Fennec try build with some extra debug logging.

The remaining 100ms of difference might be the ordering issue described in the previous comment, where the paint is happening before vs after the execution of a certain script. On my Moto G5, that script takes about 400ms to execute, but the measurements in the spreadsheet were taken on a Pixel 3, which presumably is faster at executing this script, so the 100ms might make sense.

I'm going to look into the paint ordering issue first because I'm more familiar with that part of our platform.

The late paint problem does not occur in GeckoView example! Proof: https://perfht.ml/2S5Swhl
So I don't know why Fenix shows this problem but GeckoView example is fine. I suspect that something is suppressing painting in Fenix, maybe the tab thinks it's in the background.

More surprises: With a locally-built GeckoView + (debug) Fenix, I consistently get early paints.
With Firefox Preview Nightly from the play store, I mostly get late paints, with an occasional early paint.

Local Fenix profiles: https://perfht.ml/3888PQn https://perfht.ml/2H3e087 https://perfht.ml/2S8Hzvx https://perfht.ml/379awf1 (all early)
Playstore Fenix profiles: https://perfht.ml/31xuTRU https://perfht.ml/2vcE0LB https://perfht.ml/2vcE0LB (all late) https://perfht.ml/31ADTFH (half-half) https://perfht.ml/2S8Fn6Z (early paint that took forever to reach the screen)

Depends on: 1614063

I found the cause for the late paint. It's because something's blocking the Java main thread, so the vsync notifications are delayed, so Gecko is never notified that it should paint. See bug 1614063.

I would have been bumping into the Android-Components regex compile bug in this case:
https://github.com/mozilla-mobile/android-components/issues/5880

This was fixed in Android-Components a few weeks back but is not yet in Firefox Preview (only Firefox Preview Nightly).

This may also be delaying the load:
https://github.com/mozilla-mobile/fenix/issues/8291

The late paint has been fixed by the change for Fenix issue #8238.
Before: https://perfht.ml/31xuTRU
After: https://perfht.ml/3bwT6fJ

Priority: -- → P1
Whiteboard: [qf:p1:pageload]

To provide some closure on this bug: I only saw two problems in the profiles, a network delay and a paint delay. The paint delay is fixed. For the network delay I'm putting my hopes on bug 1612575 - triggering a speculative connection to google.com during autocomplete should reduce the network delay by a lot. Until speculative connections are implemented in Fenix, I will pause further investigations on this bug.

Assignee: mstange → nobody
Status: ASSIGNED → NEW
Depends on: 1612575

I'll check if the recent speculative connection fixes in Fenix (bug 1612575 comment 7) have made a difference here.

Assignee: nobody → mstange
Status: NEW → ASSIGNED

Recent profile: https://bit.ly/3d278Wt

I don't see much evidence of a speculative connection. The biggest problems seem to be:

We are no longer this in recent visual metrics tests:
(i.e. fenix is now faster than fennec68)
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=1689990234&range=44:44

Closing this bug due to it's no longer an issue.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: