20-30% regression in "cnn-ampstories SpeedIndex warm live" page load test on Android
Categories
(Core :: Performance: General, defect)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox-esr102 | --- | unaffected |
| firefox100 | --- | unaffected |
| firefox101 | --- | wontfix |
| firefox102 | --- | wontfix |
| firefox103 | --- | wontfix |
People
(Reporter: cpeterson, Unassigned)
Details
(Keywords: perf, regression, Whiteboard: [fenix:p1?])
Attachments
(1 file: 87.69 KB, image/png)
There was a 20-30% page load regression in GeckoViewExample and Fenix for the "CNN AMP Stories" test (but not other sites?) on April 26. Chrome's results are unchanged on that day. The regression only seems to affect the warm live tests, not warm recorded tests or cold live tests:
- WARM LIVE results show a regression: https://treeherder.mozilla.org/perfherder/graphs?series=autoland,3386338,1,13&timerange=5184000&series=mozilla-central,3526232,1,13&series=autoland,3634413,1,13&series=mozilla-central,3767360,1,13&series=mozilla-central,3644543,1,13
- COLD LIVE results do NOT show a regression: https://treeherder.mozilla.org/perfherder/graphs?series=autoland,3386332,1,13&timerange=5184000&series=autoland,3634407,1,13&series=mozilla-central,3526226,1,13&series=mozilla-central,3767354,1,13&series=mozilla-central,3644537,1,13
- WARM RECORDED results do NOT show a regression: https://treeherder.mozilla.org/perfherder/graphs?series=fenix,3491777,1,13&timerange=5184000&series=autoland,3392220,1,13&series=autoland,3634965,1,13
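The three graph links above differ mainly in their `series` parameters. As a minimal sketch (not part of Perfherder itself), the following Python pulls the series entries out of such a URL so the repositories and series IDs behind each graph can be compared side by side. The helper name `perfherder_series` is hypothetical, and reading the second comma-separated field as a series identifier is an assumption based only on the shape of the URLs.

```python
from urllib.parse import urlparse, parse_qs

def perfherder_series(url):
    """Return (repository, series_id) pairs from a Perfherder graphs URL.

    Assumption: each `series` value is a comma-separated tuple whose first
    field is the repository and whose second field identifies the series.
    The remaining fields are ignored.
    """
    query = parse_qs(urlparse(url).query)
    pairs = []
    for value in query.get("series", []):
        fields = value.split(",")
        pairs.append((fields[0], int(fields[1])))
    return pairs

# Shortened form of the WARM LIVE link above (first two series only).
warm_live = (
    "https://treeherder.mozilla.org/perfherder/graphs"
    "?series=autoland,3386338,1,13&timerange=5184000"
    "&series=mozilla-central,3526232,1,13"
)
print(perfherder_series(warm_live))
```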
Unfortunately, we don't run the AMP test on desktop, so I don't know if this is a Gecko or GeckoView regression.
Here's the mozilla-central pushlog for the regressing build:
Did GeckoView update any of its dependencies that are vendored from outside mozilla-central on April 26?
Comment 1•3 years ago (Reporter)
Yulia, Tooru, and Bob: do you think your April 26 changes in ScriptLoadInfo bug 1764596, subscript loader bug 1608276, or remote canvas bug 1766402 could cause a page load regression on Android? The regression only affected the "CNN AMP Stories" test, no other sites. Maybe AMP uses a lot of JavaScript?
Your commits are in the regressing pushlog and, AFAICT, none of the other commits look like changes that might affect performance:
Comment 2•3 years ago
(In reply to Chris Peterson [:cpeterson] from comment #1)
> Yulia, Tooru, and Bob: do you think your April 26 changes in ScriptLoadInfo bug 1764596, subscript loader bug 1608276, or remote canvas bug 1766402 could cause a page load regression on Android? The regression only affected the "CNN AMP Stories" test, no other sites. Maybe AMP uses a lot of JavaScript?
> Your commits are in the regressing pushlog and, AFAICT, none of the other commits look like changes that might affect performance:
I think that my change should only have ever been a performance win, because before we were preparing and caching DataSourceSurfaces in the GPU process when we weren't using them.
The only scenario I can think of would be if a page was getting canvas data mid frame and then switching to getting it at the start of the frame. The first one at the start of the frame might be quicker, but I doubt that would have a big impact.
Comment 3•3 years ago
Heh, just noticed that this is Android, my change was Windows only.
Comment 4•3 years ago
Triggered the job on all intermediate autoland builds; we should know soon which one regressed (nothing jumped out at me either).
Comment 5•3 years ago (Reporter)
Looks like the SpeedIndex metric regressed, but not "PerceptualSpeedIndex" or "ContentfulSpeedIndex" for the same autoland pushlog on the same device. I don't know how they are measured.
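For context, SpeedIndex is conventionally defined (per WebPageTest's documentation) as the area above the visual-completeness curve, i.e. the integral over time of (1 - visual completeness). The sketch below computes that from sampled frames. The sample data is made up for illustration, the function name is hypothetical, and this is not the code Browsertime runs. It also does not cover how completeness is derived for the Perceptual and Contentful variants.

```python
def speed_index(samples):
    """Approximate SpeedIndex in ms from (time_ms, completeness) samples.

    completeness is a fraction in [0, 1]; samples must be sorted by time.
    Each interval contributes its duration times the incompleteness at the
    start of the interval (a step-function integral).
    """
    total = 0.0
    for (t0, c0), (t1, _) in zip(samples, samples[1:]):
        total += (t1 - t0) * (1.0 - c0)
    return total

# Illustrative frames: blank at 0 ms, half painted at 500 ms, done at 1000 ms.
frames = [(0, 0.0), (500, 0.5), (1000, 1.0)]
print(speed_index(frames))
```

Because the metric weights how long the viewport stays incomplete, a page that paints most content early scores better than one that paints everything late, even when both finish at the same time.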
Comment 6•3 years ago
My changes only impact Workers, and for the most part off-thread worker tasks. I wouldn't have expected them to affect page load performance unless the page had service workers, and even then I would expect a performance improvement. If the retrigger does point to my change, I will need to take a look the week after next, as I am out on PTO next week.
Comment 7•3 years ago
The change in bug 1608276 is mostly for the chrome-privileged backend code, so it's unlikely to affect content code performance by 20%. But given that the regression happens in only one test case and one metric, it could be an unexpected side effect, and so far I cannot rule out my changes.
I'll wait for the benchmark result for each push.
Comment 8•3 years ago
The retry shows two different values for the same push, which strongly implies that this is a change in the website rather than a regression in Gecko(View).
687ms: https://treeherder.mozilla.org/jobs?repo=autoland&revision=65f792b26245a61a2ad29388727b72250ef0a463&group_state=expanded&selectedTaskRun=HUnuIc-JSvmuS8lkPZxgIg.0
957ms: https://treeherder.mozilla.org/jobs?repo=autoland&revision=65f792b26245a61a2ad29388727b72250ef0a463&group_state=expanded&selectedTaskRun=GMbHkemrR-iVL5zwt-wBGA.0
Both runs above are for the same push: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=567e7a511a30141b4a634ddb5d90f510acfc695e&tochange=65f792b26245a61a2ad29388727b72250ef0a463
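To put the two runs in perspective, here is a small sketch of the arithmetic, using only the 687 ms and 957 ms values quoted above. The helper name `relative_change` is hypothetical. With just two runs this is a rough indication rather than a statistical test.

```python
def relative_change(baseline_ms, new_ms):
    """Percentage change from baseline_ms to new_ms."""
    return (new_ms - baseline_ms) / baseline_ms * 100.0

# SpeedIndex values from the two runs of revision 65f792b26245.
spread = relative_change(687, 957)
print(f"same-push spread: {spread:.1f}%")  # same-push spread: 39.3%
```

The spread between two runs of identical code is larger than the 20-30% regression being investigated, which is why a difference between pushes cannot be attributed to the commits in between.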
Comment 9•3 years ago
Comment 10•3 years ago (Reporter)
(In reply to Agi Sferro | :agi | [slow ni? rn sorry] | ⏰ PST | he/him from comment #8)
> The retry shows two different values for the same push, which strongly implies that this is a change in the website rather than a regression in Gecko(View).
That would explain why the "warm live" results regressed and not the "warm recorded" results.
If the website changed on April 26, do GeckoViewExample builds from April 25 or earlier now reproduce the slower SpeedIndex result?
Comment 11•3 years ago
(In reply to Chris Peterson [:cpeterson] from comment #10)
> If the website changed on April 26, do GeckoViewExample builds from April 25 or earlier now reproduce the slower SpeedIndex result?
I retried an older build to verify this: https://treeherder.mozilla.org/jobs?repo=autoland&selectedJob=375797393&group_state=expanded&revision=7cae87d1af62136459d8c0e3610ec0f8fdb8af03&searchStr=cnn-amp&selectedTaskRun=KVedOiIVS7y-GnmirMeO8g.0
We should know in about a day. For now I think it's safe to assume this is a website change.
Chris noted in a meeting that, even if this is a website change, Chrome did not regress, so the website is now hitting a slow code path in Fenix/Gecko which we should/could investigate.
Comment 12•3 years ago (Reporter)
(In reply to Agi Sferro | :agi | [slow ni? rn sorry] | ⏰ PST | he/him from comment #11)
> I retried an older build to verify this: https://treeherder.mozilla.org/jobs?repo=autoland&selectedJob=375797393&group_state=expanded&revision=7cae87d1af62136459d8c0e3610ec0f8fdb8af03&searchStr=cnn-amp&selectedTaskRun=KVedOiIVS7y-GnmirMeO8g.0 we should know in about a day. For now I think it's safe to assume this is a website change.
Here's a table summarizing the Btime-live(cnn-amp-vismet) results from the test runs in comment 8 and 11:
| Build Date | Device | Treeherder Link | SpeedIndex opt cold live webrender | SpeedIndex opt live warm webrender |
|---|---|---|---|---|
| Tue, Apr 26, 06:21:17 | Android 7.0 MotoG5 Shippable WebRender | Link | 1725 ms | 687 ms |
| Tue, Apr 26, 06:21:17 | Android 7.0 MotoG5 Shippable WebRender | Link (retry) | 1788 ms | 957 ms |
| Tue, Apr 26, 06:21:17 | Android 8.0 Pixel2 AArch64 Shippable WebRender | Link | 826 ms | 249 ms |
| Mon, Apr 25, 23:33:31 | Android 7.0 MotoG5 Shippable WebRender | Link | 1690 ms | 693 ms |
| Mon, Apr 25, 23:33:31 | Android 8.0 Pixel2 AArch64 Shippable WebRender | Link | 742 ms | 255 ms |
I don't know if we can draw any solid conclusion from these results without more retries, but it does seem like this is not a client regression since the retries of the "Tue, Apr 26, 06:21:17" build produced two different results for "SpeedIndex opt live warm webrender".
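As a rough illustration of that reasoning, the sketch below groups the warm SpeedIndex values from the table by device and build, then reports the range seen for each. The row data is copied from the table (device names shortened); the function name `warm_range_by_build` is hypothetical.

```python
from collections import defaultdict

# (build, device, cold_ms, warm_ms) rows copied from the table above.
rows = [
    ("Apr 26", "MotoG5", 1725, 687),
    ("Apr 26", "MotoG5", 1788, 957),
    ("Apr 26", "Pixel2", 826, 249),
    ("Apr 25", "MotoG5", 1690, 693),
    ("Apr 25", "Pixel2", 742, 255),
]

def warm_range_by_build(rows):
    """Map (device, build) to the (min, max) warm SpeedIndex seen."""
    seen = defaultdict(list)
    for build, device, _cold, warm in rows:
        seen[(device, build)].append(warm)
    return {key: (min(v), max(v)) for key, v in seen.items()}

ranges = warm_range_by_build(rows)
for (device, build), (lo, hi) in sorted(ranges.items()):
    print(f"{device} {build}: warm {lo}-{hi} ms")
```

On the MotoG5, the single Apr 25 value (693 ms) falls inside the range produced by the Apr 26 build alone (687 to 957 ms), so these runs do not separate the two builds.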
Either way, someone should profile Fenix on the new website to see where the new hot spot is.
Comment 13•3 years ago (Reporter)
Even though Fenix regressed, it's still faster than Chrome on this test.
That said, Fenix's WARM LIVE regression appears to have disappeared on June 8:
Here is the pushlog for that June 8 build:
WARM RECORDED results also improved, even though they hadn't regressed on April 26 (comment #0):
The pushlog includes some preload fixes (bug 1744822, bug 1761242, and bug 1761252). Maybe preload helped? The pushlog also disables WebRender for some Android devices (bug 1773128).
I'll check back in a week to see if the June 8 improvement sticks. If it does, then we can close this bug.
Comment 14•3 years ago (Reporter)
> I'll check back in a week to see if the June 8 improvement sticks. If it does, then we can close this bug.
Closing as WORKSFORME because the June 8 improvement has stuck: