Closed Bug 1771430 Opened 3 years ago Closed 3 years ago

20-30% regression in "cnn-ampstories SpeedIndex warm live" page load test on Android

Categories

(Core :: Performance: General, defect)

Unspecified
Android
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox100 --- unaffected
firefox101 --- wontfix
firefox102 --- wontfix
firefox103 --- wontfix

People

(Reporter: cpeterson, Unassigned)

Details

(Keywords: perf, regression, Whiteboard: [fenix:p1?])

Attachments

(1 file)

Attached image screenshot.png

There was a 20-30% page load regression in GeckoViewExample and Fenix for the "CNN AMP Stories" test (but not other sites?) on April 26. Chrome's results are unchanged on that day. The regression only seems to affect the warm live tests, not warm recorded tests or cold live tests:

Unfortunately, we don't run the AMP test on desktop, so I don't know if this is a Gecko or GeckoView regression.

Here's the autoland pushlog for the regressing build:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=65f792b26245a61a2ad29388727b72250ef0a463&tochange=976c7f10f616d57b72c94d1e0fa90cebfd19f1bf

Did GeckoView update any of its dependencies that are vendored from outside mozilla-central on April 26?

Yulia, Tooru, and Bob: do you think your April 26 changes in ScriptLoadInfo bug 1764596, subscript loader bug 1608276, or remote canvas bug 1766402 could cause a page load regression on Android? The regression only affected the "CNN AMP Stories" test, no other sites. Maybe AMP uses a lot of JavaScript?

Your commits are in the regressing pushlog and, AFAICT, none of the other commits look like changes that might affect performance:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=65f792b26245a61a2ad29388727b72250ef0a463&tochange=976c7f10f616d57b72c94d1e0fa90cebfd19f1bf

Flags: needinfo?(ystartsev)
Flags: needinfo?(bobowencode)
Flags: needinfo?(arai.unmht)
Component: General → Performance
Keywords: perf
Product: GeckoView → Core
Summary: 20-30% regression in cnn-amptories SpeedIndex warm live page load test → 20-30% regression in "cnn-ampstories SpeedIndex warm live" page load test on Android

(In reply to Chris Peterson [:cpeterson] from comment #1)

Yulia, Tooru, and Bob: do you think your April 26 changes in ScriptLoadInfo bug 1764596, subscript loader bug 1608276, or remote canvas bug 1766402 could cause a page load regression on Android? The regression only affected the "CNN AMP Stories" test, no other sites. Maybe AMP uses a lot of JavaScript?

Your commits are in the regressing pushlog and, AFAICT, none of the other commits look like changes that might affect performance:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=65f792b26245a61a2ad29388727b72250ef0a463&tochange=976c7f10f616d57b72c94d1e0fa90cebfd19f1bf

I think my change should only ever have been a performance win, because previously we were preparing and caching DataSourceSurfaces in the GPU process even when we weren't using them.
The only scenario I can think of would be if a page was getting canvas data mid-frame and then switching to getting it at the start of the frame. The first one at the start of the frame might be quicker, but I doubt that would have a big impact.

Flags: needinfo?(bobowencode)

Heh, I just noticed that this is Android; my change was Windows-only.

I triggered the job on all intermediate autoland builds, so we should know soon which one regressed (nothing jumped out at me either).
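Retriggering the test on every intermediate push turns the search for the regressing commit into a scan over pushes in order. A minimal sketch, assuming each push comes with a list of retriggered SpeedIndex values and using a hypothetical 10% threshold over the first (baseline) push; the function name and threshold are illustrative only, and this does not reproduce Perfherder's own alert detection.

```python
from statistics import median


def first_regressing_push(pushes, threshold_pct=10.0):
    """Return the index of the first push whose median SpeedIndex exceeds the
    baseline median (push 0) by more than threshold_pct percent, or None.

    pushes: list of lists of SpeedIndex values in ms (hypothetical input),
    one inner list of retriggered runs per push, oldest push first.
    """
    if not pushes or not pushes[0]:
        raise ValueError("need at least one baseline run")
    baseline = median(pushes[0])
    limit = baseline * (1 + threshold_pct / 100.0)
    for i, runs in enumerate(pushes[1:], start=1):
        # Compare the median of the retriggers, not a single run, so one
        # noisy sample is less likely to be blamed on a push.
        if runs and median(runs) > limit:
            return i
    return None
```

With only one or two runs per push the median is no better than the raw values, so a single outlier can still be misattributed; more retriggers per push make the scan more trustworthy.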

My changes only affect Workers, mostly off-thread worker tasks. I wouldn't have expected them to affect page load performance unless the page used service workers, and even then I would expect a performance improvement. If the retriggers point to my change, I will take a look the week after next, as I am on PTO next week.

Flags: needinfo?(ystartsev)

The change in bug 1608276 mostly affects chrome-privileged backend code, so it's unlikely to affect content code performance by 20%.
But given that the regression happens in only one test case and one metric, it could be an unexpected side effect, and so far I cannot rule out my changes.

I'll wait for the benchmark results for each push.

Flags: needinfo?(arai.unmht)

(In reply to Agi Sferro | :agi | [slow ni? rn sorry] | ⏰ PST | he/him from comment #8)

The retry shows two different values for the same push, which strongly implies that this is a change in the website rather than a regression in Gecko(View).

That would explain why the "warm live" results regressed and not the "warm recorded" results.

If the website changed on April 26, do GeckoViewExample builds from April 25 or earlier now reproduce the slower SpeedIndex result?

(In reply to Chris Peterson [:cpeterson] from comment #10)

If the website changed on April 26, do GeckoViewExample builds from April 25 or earlier now reproduce the slower SpeedIndex result?

I retried an older build to verify this: https://treeherder.mozilla.org/jobs?repo=autoland&selectedJob=375797393&group_state=expanded&revision=7cae87d1af62136459d8c0e3610ec0f8fdb8af03&searchStr=cnn-amp&selectedTaskRun=KVedOiIVS7y-GnmirMeO8g.0. We should know in about a day. For now I think it's safe to assume this is a website change.

Chris noted in a meeting that, even if this is a website change, Chrome did not regress, so the website may now be hitting a slow code path in Fenix/Gecko that we should investigate.

(In reply to Agi Sferro | :agi | [slow ni? rn sorry] | ⏰ PST | he/him from comment #11)

I retried an older build to verify this: https://treeherder.mozilla.org/jobs?repo=autoland&selectedJob=375797393&group_state=expanded&revision=7cae87d1af62136459d8c0e3610ec0f8fdb8af03&searchStr=cnn-amp&selectedTaskRun=KVedOiIVS7y-GnmirMeO8g.0. We should know in about a day. For now I think it's safe to assume this is a website change.

Here's a table summarizing the Btime-live(cnn-amp-vismet) results from the test runs in comments 8 and 11:

| Build Date | Device | Treeherder Link | SpeedIndex opt cold live webrender | SpeedIndex opt live warm webrender |
|---|---|---|---|---|
| Tue, Apr 26, 06:21:17 | Android 7.0 MotoG5 Shippable WebRender | Link | 1725 ms | 687 ms |
| Tue, Apr 26, 06:21:17 | Android 7.0 MotoG5 Shippable WebRender | Link (retry) | 1788 ms | 957 ms |
| Tue, Apr 26, 06:21:17 | Android 8.0 Pixel2 AArch64 Shippable WebRender | Link | 826 ms | 249 ms |
| Mon, Apr 25, 23:33:31 | Android 7.0 MotoG5 Shippable WebRender | Link | 1690 ms | 693 ms |
| Mon, Apr 25, 23:33:31 | Android 8.0 Pixel2 AArch64 Shippable WebRender | Link | 742 ms | 255 ms |

I don't know if we can draw any solid conclusions from these results without more retries, but this does not seem to be a client regression, since two runs of the same "Tue, Apr 26, 06:21:17" build produced different "SpeedIndex opt live warm webrender" results (687 ms and 957 ms on the MotoG5).
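To put numbers on that observation, the arithmetic below uses only the MotoG5 warm-live values from the table above. It shows that the run-to-run spread within a single build is far larger than the change between the April 25 and April 26 builds. This is a rough sketch, not a statistical test: with a single retry per build it cannot separate noise from a real shift.

```python
# SpeedIndex "opt live warm webrender" values (ms) for the Android 7.0 MotoG5,
# copied from the table above.
apr25 = 693        # Mon, Apr 25, 23:33:31 build
apr26_first = 687  # Tue, Apr 26, 06:21:17 build, first run
apr26_retry = 957  # Tue, Apr 26, 06:21:17 build, retry


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Across builds (Apr 25 -> Apr 26 first run): essentially unchanged.
across_builds = pct_change(apr25, apr26_first)
# Within one build (first run -> retry): a large jump.
within_build = pct_change(apr26_first, apr26_retry)

print(f"Apr 25 -> Apr 26 (first run): {across_builds:+.1f}%")  # -0.9%
print(f"Apr 26 first run -> retry:    {within_build:+.1f}%")   # +39.3%
```

A jump of roughly 39% between two runs of the same build, against under 1% between builds, is consistent with the live site changing under the test rather than with a Gecko(View) code change.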

Either way, someone should profile Fenix on the new website to see where the new hot spot is.

Even though Fenix regressed, it's still faster than Chrome on this test.

That said, Fenix's WARM LIVE regression appears to have disappeared on June 8:

https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&series=autoland,3386338,1,13&series=mozilla-central,3526232,1,13&series=autoland,3634413,1,13&series=mozilla-central,3767360,1,13&series=mozilla-central,3644543,1,13&timerange=5184000&zoom=1649579957679,1654808232179,132.22222222222229,2118.777777777778

Here is the pushlog for that June 8 build:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c76671d99573988a7033a5ab4253a06dfed716c1&tochange=c5d74a64aa20c86db46d3c3473d8c8e611afdf65

WARM RECORDED results also improved, even though they hadn't regressed on April 26 (comment #0):

https://treeherder.mozilla.org/perfherder/graphs?series=fenix,3491777,1,13&timerange=5184000&series=autoland,3392220,1,13&series=autoland,3634965,1,13

The pushlog includes some preload fixes (bug 1744822, bug 1761242, and bug 1761252). Maybe preload helped? The pushlog also disables WebRender for some Android devices (bug 1773128).

I'll check back in a week to see if the June 8 improvement sticks. If it does, then we can close this bug.

Whiteboard: [fenix:p1?]
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME