Closed Bug 1506099 Opened 6 years ago Closed 2 months ago

GeckoView testing: current “cold start” test is inadequate. I propose a more severe test - cold start to page fully loaded.

Categories

(Testing :: Performance, defect, P3)

64 Branch
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mark.paxman99, Unassigned)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0.2 Safari/604.4.7

Steps to reproduce:

Your current cold start test for GeckoView is shown on https://health.graphics/android. It says around 1.2 seconds for Klar/Focus, but that's quite different from the actual experience. Your current "cold start" test didn't pick up https://github.com/mozilla-mobile/focus-android/issues/3853.

I propose a more severe test of the browser's end-to-end performance: from cold start until www.mozilla.org is fully loaded. When Fenix reaches maturity, that test might include loading representative add-ons as well. You might want to use Chrome or a Chromium browser as a benchmark.

I did that test just now on my BQ Aquaris X5, which I think performs like your Moto G5 reference phone. Cold start to mozilla.org fully loaded:
• Chrome Beta: 3 sec
• Samsung Internet Beta + ABP: 3 sec [including Adblock Plus add-on]
• Fennec Nightly + ABP: 7 sec [including Adblock Plus add-on]
• geckoview_example.apk: 8 sec
• Klar + GeckoView: 15 sec

Yuck. Klar is supposed to be your "quick look" browser, and it takes a hideous time to start up. And Samsung Internet can load a "full fat" Chrome browser + a substantial add-on + mozilla.org in much less time than the barebones geckoview_example.apk.

I think you need to do a test like this to get your cold start times down to Chromesque levels. Of course, you need other tests too, to find out where any slowness is coming from. I know the Klar issue will be addressed in https://github.com/mozilla-mobile/focus-android/issues/3853, but the point is to try to catch other similar issues early on by doing this kind of benchmark.
Component: General → Testing

I'm not sure what component this should go in, or who to direct it to. I don't think this cold start test is run against Firefox for Android, only GeckoView products. Maybe GeckoView::General? Let's see if it finds a home in Testing::Performance first...

Component: Testing → Performance
Product: Firefox for Android → Testing
Version: Firefox 64 → 64 Branch

:njpark this sounds like it's related to the NimbleDroid tests. Can you comment on what these tests are measuring?

Flags: needinfo?(npark)
Priority: -- → P3

Currently NimbleDroid's "cold start" test is defined by NimbleDroid, and we can't alter the test. What we can do is launch the app with an intent that contains the URL, and wait until the site is loaded. That is what we currently do for the site load test with NimbleDroid (you can see the numbers in the detailed view). It currently doesn't include www.mozilla.org, because those sites were selected from the Alexa top 100 sites, but we can either add the Mozilla site or use one of the existing ones.

Flags: needinfo?(npark)

It doesn't need to be mozilla.org, it was just an example.

I think we might be using different terminology. I propose killing the app, forcing it to restart, and then loading the URL each time, without resetting the app (i.e. without doing App Info > Storage > Clear Data). Is that what happens on https://health.graphics/android?

For example, your tests show https://en.m.wikipedia.org/wiki/Main_Page loading in ~1 second on GeckoView Example on a Moto G5 class device, whereas bug 1506471 says there are ~3 seconds of "dead time" after cold start where the app appears to be doing nothing, with a total load time of ~6 seconds; see e.g. bug 1506471 comment 56. It depends on how long it has been since the app was last reset.

So I don't understand the difference between my observed ~6 second load time from app start and your ~1 second figure from NimbleDroid. We are probably talking about different things :)

Anyway, the point I was trying to make is that the current "cold start" testing didn't pick up bug 1506471, which is IMHO quite a big bug, so perhaps the testing needs to be more stringent?

(In reply to Mark from comment #4)

> It doesn't need to be mozilla.org, it was just an example.

> I think we might be using different terminology. I propose killing the app, forcing it to restart, and then loading the URL each time, without resetting the app (i.e. without doing App Info > Storage > Clear Data). Is that what happens on https://health.graphics/android?

There are two page load results shown on that page. The first is under the Nimbledroid section in the top right. This is directly above the "Cold start", which is the app startup and not a page load. If you click "SHOW DETAILED VIEW" you'll see the page load measurements. The second is the Raptor (TP6m) section, below.

> For example, your tests show https://en.m.wikipedia.org/wiki/Main_Page loading in ~1 second on GeckoView Example on a Moto G5 class device, whereas bug 1506471 says there are ~3 seconds of "dead time" after cold start where the app appears to be doing nothing, with a total load time of ~6 seconds; see e.g. bug 1506471 comment 56. It depends on how long it has been since the app was last reset.

I think you're referring to the Nimbledroid results here, a detailed view of which can be found at https://health.graphics/android/graph?site=https://en.m.wikipedia.org/wiki/Main_Page. Note that these tests are run against Nexus 5, not Moto G5. I'll defer to No-Jun for details of how the device is prepared between each page load.

> So I don't understand the difference between my observed ~6 second load time from app start and your ~1 second figure from NimbleDroid. We are probably talking about different things :)

How are you measuring when you see a 6 second load time? Is this with a fresh profile? I realise this bug was opened before many optimisations were made. Are you still able to replicate such a disparity?

> Anyway, the point I was trying to make is that the current "cold start" testing didn't pick up bug 1506471, which is IMHO quite a big bug, so perhaps the testing needs to be more stringent?

The cold start does not load a website, and many page load tests have been added since you opened this bug. We now have Raptor TP6m tests, some of which can be seen here: https://health.graphics/android/tp6m?test=cold-loadtime&platform=geckoview-g5&past=month&ending=2019-07-22. Each of those graphs also has a link to a more detailed graph, such as Wikipedia's at https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=mozilla-central,2007663,1,10&series=mozilla-central,2007753,1,10.

The Raptor TP6m tests run in warm load and cold load variations, with the cold load being our primary metric. When running cold load, the application is killed after each load, and the browser settles for 30 seconds before initiating the page load. The app storage is cleared between each load, and a fresh profile is used each time.
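To make the difference between the two flows concrete, here is a minimal Python sketch that expresses one cold-load iteration as a sequence of adb commands. It is not Raptor's actual implementation. Assumptions: the package id is hypothetical, launching through `monkey` and idling with `adb shell sleep` are just one way to reproduce "launch, settle, then load", and `am start -W` only waits for the activity launch, so detecting page-load completion would need a separate signal from the browser. With `settle_seconds=0` and no storage clearing, the same function approximates the flow proposed earlier in this bug (kill the app, then open the URL directly, without resetting it).

```python
import subprocess
from typing import List

# Hypothetical package id, for illustration only.
PACKAGE = "org.mozilla.geckoview_example"


def cold_load_plan(package: str, url: str, settle_seconds: int = 0,
                   clear_storage: bool = False) -> List[List[str]]:
    """Build the adb commands for one cold-load iteration (sketch only)."""
    # Kill the app so the next start is a cold process start.
    plan = [["adb", "shell", "am", "force-stop", package]]
    if clear_storage:
        # Wipe app data, so the next launch starts from a fresh profile.
        plan.append(["adb", "shell", "pm", "clear", package])
    if settle_seconds > 0:
        # Raptor-like: start the launcher activity without a URL, then idle on the device.
        plan.append(["adb", "shell", "monkey", "-p", package,
                     "-c", "android.intent.category.LAUNCHER", "1"])
        plan.append(["adb", "shell", "sleep", str(settle_seconds)])
    # Open the URL with a VIEW intent; -W waits for the activity launch to finish.
    plan.append(["adb", "shell", "am", "start", "-W",
                 "-a", "android.intent.action.VIEW", "-d", url, package])
    return plan


def run_plan(plan: List[List[str]]) -> None:
    """Execute each command in order (needs adb and a connected device)."""
    for cmd in plan:
        subprocess.run(cmd, check=True)


raptor_like = cold_load_plan(PACKAGE, "https://www.mozilla.org",
                             settle_seconds=30, clear_storage=True)
user_like = cold_load_plan(PACKAGE, "https://www.mozilla.org")
```

In the user-like plan the URL intent itself triggers the cold launch, so any post-launch pause ends up inside the measured interval instead of being absorbed by a settle period.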

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #5)

> The Raptor TP6m tests run in warm load and cold load variations, with the cold load being our primary metric. When running cold load, the application is killed after each load, and the browser settles for 30 seconds before initiating the page load. The app storage is cleared between each load, and a fresh profile is used each time.

I think this explains why you didn't pick up bug 1506471 in your tests - because the "3 second delay" of that bug gets absorbed in your 30 second settling time.

I understand why you do what you do, but the 30 second settling time concerns me because it doesn't reflect common user behavior. Most often the user will (e.g.) click on a URL in an email, and then her main concern is how long she waits until she can start reading page content. If <browser> has been killed by the Android memory manager, the total time is time to launch the app + X + time to load the page. As far as I can see, your current tests don't account for X, and for Fennec/GV X is currently a long time because of bug 1506471 and perhaps other stuff.

On a more user-focused test like this, Mozilla browsers perform pretty badly, as I said in comment 0. On a cheap, slow phone with low memory, Mozilla browsers are not fun: they keep getting killed and then take much longer to restart than Chrome, mostly because of X. I think you need a test that represents that kind of real user behavior, so you can work on making the total time from tapping a URL to reading the page content as snappy as it is in Chrome, even when the app has to cold launch.

[correction, I meant Nexus 5 not Moto G5, sorry]

To put it more simply:

If <browser> has been killed by the Android memory manager, the total time [to display a URL] is time to launch the app + X + time to load the page

I don't think you currently measure X. X seems complicated; there's a lot going on (see e.g. bug 1506471 & bug 1529044). If in future you introduce bugs or regressions which increase X, you could degrade the user experience, particularly on entry-level phones. I don't think you have an automated test to detect such degradation. I think you need an automated test which measures X.
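The decomposition "total time = time to launch the app + X + time to load the page" can be written down as a small Python sketch. The four milestone names are hypothetical: they assume the harness can timestamp the tapped intent, the app's first drawn activity, the start of navigation, and the page load event, which this bug does not specify. The sample numbers are illustrative only, loosely shaped after the ~3 s pause and ~6 s total mentioned for bug 1506471. The `x_within_budget` helper and its budget value are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ColdLoadTimestamps:
    """Hypothetical milestones, in ms, measured from when the user taps a link."""
    intent_sent: int       # the VIEW intent carrying the URL is dispatched
    app_launched: int      # app process is up and its first activity is drawn
    navigation_start: int  # the engine starts loading the requested page
    load_complete: int     # the page's load event fires


def decompose(ts: ColdLoadTimestamps) -> Dict[str, int]:
    """Split the total time into launch + X + page load."""
    parts = {
        "launch": ts.app_launched - ts.intent_sent,
        "X": ts.navigation_start - ts.app_launched,
        "page_load": ts.load_complete - ts.navigation_start,
    }
    # The three phases tile the whole interval, so they sum to the total.
    parts["total"] = ts.load_complete - ts.intent_sent
    return parts


def x_within_budget(parts: Dict[str, int], budget_ms: int) -> bool:
    """A regression check on X alone, with a hypothetical budget."""
    return parts["X"] <= budget_ms


example = decompose(ColdLoadTimestamps(0, 1200, 4200, 6000))
print(example)  # {'launch': 1200, 'X': 3000, 'page_load': 1800, 'total': 6000}
```

Checking X separately, rather than only the total, is what would let an automated test flag a regression in the post-launch pause even when page load itself gets faster. In this sketch, the app-link tests described later in this bug (launch until navigation begins) would correspond to `launch + X`.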

Case in point, on bug 1506471, Andrew has just realised that the pause time X has dramatically reduced on GeckoView (but not Fennec). From around 3 seconds to... near zero??? So now GeckoView is much faster than a couple of months ago in this (IMHO) very important app-launch-to-page-loaded metric. Why? When? Is the improvement stable, or is it a temporary regression? Could it be improved still further? No idea.

I bet Andrew is manually going back through versions of GeckoView to try to find when & how the improvement occurred. A tedious job. If only there was an automated test...

app-launch-to-page-loaded time, mozilla.org, slow phone
GeckoView (few months ago) ~8 seconds
GeckoView (today) ~5 seconds
Chromium ~3 seconds

Hooray! But, why & how?
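Answering "when?" without manually walking through every GeckoView version is a bisection problem. The Python sketch below is a minimal illustration, assuming builds are in chronological order and that X changed only once (all builds before the change slow, all after it fast). `measure_x_ms` is a hypothetical callable that would run the cold-load flow on a device and return X; the build labels and X values in the usage lines are made up. Because cold-start timings are noisy, each probe compares the median of several runs against the threshold.

```python
import statistics
from typing import Callable, Sequence, TypeVar

Build = TypeVar("Build")


def first_fast_build(builds: Sequence[Build],
                     measure_x_ms: Callable[[Build], float],
                     threshold_ms: float,
                     repeats: int = 5) -> int:
    """Return the index of the first build whose median X is <= threshold_ms.

    Returns len(builds) if no build is fast.
    """
    def is_fast(build: Build) -> bool:
        samples = [measure_x_ms(build) for _ in range(repeats)]
        return statistics.median(samples) <= threshold_ms

    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fast(builds[mid]):
            hi = mid       # the first fast build is at mid or earlier
        else:
            lo = mid + 1   # the first fast build is after mid
    return lo


# Hypothetical nightlies with made-up X values (ms).
fake_x = {"n1": 3000, "n2": 3100, "n3": 2900, "n4": 200, "n5": 150, "n6": 100}
builds = list(fake_x)
print(builds[first_fast_build(builds, fake_x.__getitem__, threshold_ms=1000)])  # n4
```

This needs only about log2(N) measured builds instead of N, which matters when each probe means several cold launches on a real phone.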

(In reply to Mark from comment #8)

> Case in point, on bug 1506471, Andrew has just realised that the pause time X has dramatically reduced on GeckoView (but not Fennec). From around 3 seconds to... near zero??? So now GeckoView is much faster than a couple of months ago in this (IMHO) very important app-launch-to-page-loaded metric. Why? When? Is the improvement stable, or is it a temporary regression? Could it be improved still further? No idea.

> I bet Andrew is manually going back through versions of GeckoView to try to find when & how the improvement occurred. A tedious job. If only there was an automated test...

I am very curious but I don't know if I'll be able to find the time to go back and trace that down. Root cause analysis on a browser can be surprisingly time consuming!

I think this scenario might be best captured in "user journey" tests.
That is, a test that captures a complete flow, e.g. logging in, adding items to a shopping cart, etc.
(I don't think we have these yet)

> app-launch-to-page-loaded time, mozilla.org, slow phone
> GeckoView (few months ago) ~8 seconds
> GeckoView (today) ~5 seconds
> Chromium ~3 seconds
>
> Hooray! But, why & how?

:sparky do you think the new applink tests with visual metrics covers the scenario described here? I wonder if we could close this bug as a duplicate.

Flags: needinfo?(gmierz2)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #10)

> :sparky do you think the new applink tests with visual metrics covers the scenario described here? I wonder if we could close this bug as a duplicate.

I think :acreskey would be best to answer this.

Flags: needinfo?(gmierz2) → needinfo?(acreskey)

(In reply to Greg Mierzwinski [:sparky] from comment #11)

> (In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #10)
>
>> :sparky do you think the new applink tests with visual metrics covers the scenario described here? I wonder if we could close this bug as a duplicate.
>
> I think :acreskey would be best to answer this.

Yes, our app-link tests now cover the area in the initially-listed bug: https://github.com/mozilla-mobile/focus-android/issues/3853
These measure the time from application launch (cold process) up until the navigation begins.

We have nightly tests running for Geckoview_example, Fenix, and (shortly) the Reference Browser.

There is still work to do:
• We don't have visual metrics for them, as it's not currently possible to capture vismets while browsertime is starting up
• Dashboarding is not yet in place
• It would be useful to also capture the timing up until page load is complete (although that is noisier)

But yes, I think this bug can now be closed.

Flags: needinfo?(acreskey)
Severity: normal → S3

We now have 3 cold startup tests (perfdocs link here), which are the same ones Andrew is referring to.

Status: UNCONFIRMED → RESOLVED
Closed: 2 months ago
Resolution: --- → WONTFIX