Closed Bug 1613205 Opened 5 years ago Closed 4 years ago

Visual metrics: Fenix is slower than Fennec for the site google.com

Tracking

()

Status:

RESOLVED WONTFIX

Project Flags:

Performance Impact

high

People

(Reporter: mstange, Assigned: mstange)

References

(Blocks 1 open bug)

Details

(Keywords: perf:pageload)

Attachments

(4 files, 2 obsolete files)

fenix_www.google.com.csv 5 years ago Markus Stange [:mstange] 2.06 KB, text/csv		Details
fennec_www.google.com.csv 5 years ago Markus Stange [:mstange] 2.21 KB, text/csv		Details
fenix_www.google.com.csv 5 years ago Markus Stange [:mstange] 2.95 KB, text/plain		Details
fennec_www.google.com.csv 5 years ago Markus Stange [:mstange] 3.07 KB, text/plain		Details
fenix1.mp4 5 years ago Markus Stange [:mstange] 36.55 KB, video/mp4		Details
fennec1.mp4 5 years ago Markus Stange [:mstange] 55.48 KB, video/mp4		Details

Markus Stange [:mstange]

Assignee

Description

•

5 years ago

•

Edited

Based on the results in this spreadsheet, google.com loads more slowly in Fenix than in Fennec.

Markus Stange [:mstange]

Assignee

Comment 1

•

5 years ago

Attached file fenix_www.google.com.csv (obsolete) — Details

Markus Stange [:mstange]

Assignee

Comment 2

•

5 years ago

Attached file fennec_www.google.com.csv (obsolete) — Details

Markus Stange [:mstange]

Assignee

Comment 3

•

5 years ago

Attached file fenix_www.google.com.csv — Details

Attachment #9124335 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 4

•

5 years ago

Attached file fennec_www.google.com.csv — Details

Attachment #9124336 - Attachment is obsolete: true

Markus Stange [:mstange]

Assignee

Comment 5

•

5 years ago

SpeedIndex is about 95% worse:

Mean: 242 -> 473
Median: 235 -> 460

Markus Stange [:mstange]

Assignee

Comment 6

•

5 years ago

Attached video fenix1.mp4 — Details

Markus Stange [:mstange]

Assignee

Comment 7

•

5 years ago

Attached video fennec1.mp4 — Details

Markus Stange [:mstange]

Assignee

Comment 8

•

5 years ago

I captured some profiles from Fennec and Fenix, where I triggered the page loads by navigating manually using the URL bar. This might show different perf characteristics than when running the comparison in an automated fashion via browsertime, but it'll still be a valid comparison.

Fennec:
Uncached: https://perfht.ml/2SkKBM2
Cached: https://perfht.ml/393o6ls

Fenix:
Cached (I think): https://perfht.ml/2ueb2uV

I can see two major differences:

In Fennec, one of the network requests takes advantage of a preconnect / speculative load. In the Fennec:Cached profile you can see this preconnect in the network chart and in the activity on the Socket thread. The Fenix profiles do not have preconnects, but they don't show as much activity on the Socket thread overall. It seems like most of the CPU time during the preconnect was taken up by the slow ec_Curve25519_mul implementation, which was fixed in Firefox 69, so the fix didn't make it into Fennec which is based on 68.
More importantly, the ordering of the major paint vs one of the script evaluations is different! In the Fennec profiles, the very first paint already includes the page content, and in the cached version even the Google logo. In the Fenix profiles, the paints at the beginning are all empty, and the important paint waits for this script to execute: <script async src="https://www.google.com/xjs/_/js/k=xjs.mhp.en.e-LvQEPFNvE.O/m=sb_mobh,hjsa,d,csi/am=gAABNgI/d=1/rs=ACT90oH3gbN6EpRRvnasTlE8Ax7j7FnC0w">
In Fennec, this script also executes, but it does so after the paint. I don't know the reason for the different ordering yet. There definitely is a window for painting in the Fenix profiles when the thread goes idle briefly while the script loads and compiles.

Markus Stange [:mstange]

Assignee

Comment 9

•

5 years ago

•

Edited

The values in the CSV also indicate that this regression is made up of two pieces: a network piece (~140ms difference) and some other piece (~100ms difference).

The mean values in the CSV regress as follows:

firstVisualChange: 234ms -> 471ms (~240ms regression)
requestStart: 18ms -> 157ms (~140ms regression)
responseStart: 123ms -> 256ms (takes about 100ms for the server to react in both cases)
responseEnd: 124ms -> 257ms (and 1ms to transmit the response)

The straightforward cause of the "network difference" would be a speculative connection that Fennec makes but Fenix does not make.

I'm not sure yet how to find out where this speculative connection is triggered during our automated perf testing. I probably need to do some code inspection, or push a Fennec try build with some extra debug logging.

The remaining 100ms of difference might be the ordering issue described in the previous comment, where the paint is happening before vs after the execution of a certain script. On my Moto G5, that script takes about 400ms to execute, but the measurements in the spreadsheet were taken on a Pixel 3, which presumably is faster at executing this script, so the 100ms might make sense.

I'm going to look into the paint ordering issue first because I'm more familiar with that part of our platform.

Markus Stange [:mstange]

Assignee

Comment 10

•

5 years ago

The late paint problem does not occur in GeckoView example! Proof: https://perfht.ml/2S5Swhl
So I don't know why Fenix shows this problem but GeckoView example is fine. I suspect that something is suppressing painting in Fenix, maybe the tab thinks it's in the background.

Markus Stange [:mstange]

Assignee

Comment 11

•

5 years ago

More surprises: With a locally-built GeckoView + (debug) Fenix, I consistently get early paints.
With Firefox Preview Nightly from the play store, I mostly get late paints, with an occasional early paint.

Local Fenix profiles: https://perfht.ml/3888PQn https://perfht.ml/2H3e087 https://perfht.ml/2S8Hzvx https://perfht.ml/379awf1 (all early)
Playstore Fenix profiles: https://perfht.ml/31xuTRU https://perfht.ml/2vcE0LB https://perfht.ml/2vcE0LB (all late) https://perfht.ml/31ADTFH (half-half) https://perfht.ml/2S8Fn6Z (early paint that took forever to reach the screen)

Markus Stange [:mstange]

Assignee

Updated

•

5 years ago

Depends on: 1614063

Markus Stange [:mstange]

Assignee

Comment 12

•

5 years ago

I found the cause for the late paint. It's because something's blocking the Java main thread, so the vsync notifications are delayed, so Gecko is never notified that it should paint. See bug 1614063.

Andrew Creskey [:acreskey]

Comment 13

•

5 years ago

I would have been bumping into the Android-Components regex compile bug in this case:
https://github.com/mozilla-mobile/android-components/issues/5880

This was fixed in Android-Components a few weeks back but is not yet in Firefox Preview (only Firefox Preview Nightly).

This may also be delaying the load:
https://github.com/mozilla-mobile/fenix/issues/8291

Markus Stange [:mstange]

Assignee

Comment 14

•

5 years ago

The late paint has been fixed by the change for Fenix issue #8238.
Before: https://perfht.ml/31xuTRU
After: https://perfht.ml/3bwT6fJ

Patricia Lawless

Updated

•

5 years ago

Priority: -- → P1

Whiteboard: [qf:p1:pageload]

Markus Stange [:mstange]

Assignee

Comment 15

•

5 years ago

To provide some closure on this bug: I only saw two problems in the profiles, a network delay and a paint delay. The paint delay is fixed. For the network delay I'm putting my hopes on bug 1612575 - triggering a speculative connection to google.com during autocomplete should reduce the network delay by a lot. Until speculative connections are implemented in Fenix, I will pause further investigations on this bug.

Assignee: mstange → nobody

Status: ASSIGNED → NEW

Depends on: 1612575

Markus Stange [:mstange]

Assignee

Comment 16

•

5 years ago

I'll check if the recent speculative connection fixes in Fenix (bug 1612575 comment 7) have made a difference here.

Assignee: nobody → mstange

Status: NEW → ASSIGNED

Markus Stange [:mstange]

Assignee

Comment 17

•

5 years ago

Recent profile: https://bit.ly/3d278Wt

I don't see much evidence of a speculative connection. The biggest problems seem to be:

there's a gap during which the network request is delayed for unknown reasons (bug 1618687), 289ms in this profile
there are two redirects: http://google.com/ -> http://www.google.com/ -> https://www.google.com/?gws_rd=ssl
the second redirect never seems to get cached; the tooltip in the profiler says "Cache: Unresolved".

Andrew Creskey [:acreskey]

Comment 18

•

4 years ago

We are no longer this in recent visual metrics tests:
(i.e. fenix is now faster than fennec68)
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=1689990234&range=44:44

Sean Feng [:sefeng]

Comment 19

•

4 years ago

Closing this bug due to it's no longer an issue.

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Resolution: --- → WONTFIX

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Keywords: perf:pageload

Whiteboard: [qf:p1:pageload]

You need to log in before you can comment on or make changes to this bug.