Closed Bug 1796163 Opened 8 months ago Closed 5 months ago

2.71% google-mail LastVisualChange (Linux) regression on Thu October 13 2022

Categories

(WebExtensions :: General, defect, P2)

Firefox 108
defect

Tracking

(firefox-esr102 unaffected, firefox106 unaffected, firefox107 disabled, firefox108 disabled, firefox109 wontfix, firefox110 wontfix)

RESOLVED WONTFIX

People

(Reporter: alexandrui, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [addons-jira])

Attachments

(3 files)

Perfherder has detected a browsertime performance regression from push 82de653dd2d9c8e9901f18993258d91a836a1a9a. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio: 4%
Test: google-mail LastVisualChange
Platform: linux1804-64-shippable-qr
Options: cold fission webrender
Absolute values (old vs new): 4,238.33 -> 4,386.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(wdurand)

Set release status flags based on info from the regressing bug 1793626

:mixedpuppy could this be triaged for impact on the 107 release?

Flags: needinfo?(mixedpuppy)

The associated change is flipping a pref for a feature on nightly. There should be no effect on release.

Flags: needinfo?(mixedpuppy)

(In reply to Shane Caraveo (:mixedpuppy) from comment #3)

The associated change is flipping a pref for a feature on nightly. There should be no effect on release.

Sorry for the unnecessary ping, I should have spotted the regressing change was nightly only. Thanks.

Severity: -- → S4
Priority: -- → P2
Whiteboard: [addons-jira]

:alexandrui looking at the recent trend, it looks like the performance numbers are a lot better than they used to be, so I am wondering if you could help here. Is there still a perf regression caused by the unified extensions UI work? Thanks!

Flags: needinfo?(wdurand) → needinfo?(aionescu)

The improvement is because of bug 1794153. The perf regression caused by the unified extensions UI work is still there.

Flags: needinfo?(aionescu)

:alexandrui ok, thanks. Now... I am sorry but could you help me understand what this perftest is measuring?

I clicked all the links in comment 0 but that didn't help. I don't think I've seen "4%" mentioned anywhere and loading 60 days of data shows a pretty stable and flat curve to me, until recently (but that is explained by your comment 6). The commit that regressed this perftest enables a feature in Nightly but the feature itself does not do much (it shows a new button in the toolbar basically).

Flags: needinfo?(aionescu)

Here is the regression. You can close this as wontfix, as it affects nightly only.

Flags: needinfo?(aionescu)

This feature will likely be enabled in all channels during 109 so I don't think wontfixing it will solve the problem.

Adding Mike who has some context around the changes in this project as well as knowing a lot of stuff about front-end perf. :-)

I ended up looking at this link which is zoomed in on the regression (link might stop working in another 20 or so days) and it seems there were retriggers in the range of commits to establish which commit moved the needle here. But each of the measurement levels of 4400 or 4360ms (which are all rounded to multiples of 40ms, it seems, so presumably something in the test does that or we're multiplying by 40 or whatever) has values from multiple revisions in it. So it seems to me the test is reasonably noisy. Looking at the commits in this range I'm also seeing way more retriggers than I'm seeing in that graph, which further confuses things.

Given this is just a pref flip, ISTM there's a pretty easy test to see if this was the real regressor: if you push a perf try run for both the autoland revision identified in comment 0 and for that same revision with the nightly-only pref flipped back to false, does the regression disappear? (Maybe Alexandru already did this?)
It'd also be useful to check this against current central - as noted in comment #5 / comment #6, current measurements are around half of what they were so any regression is necessarily going to be much smaller. I can't tell if comment 6 means this was already done or not.
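
For concreteness, a hypothetical sketch of that comparison might look like this (the try task selection, rebuild count, and pref location here are assumptions, not details taken from this bug):

  # Push the suspected regressor as-is and schedule the google-mail pageload tests:
  hg update 82de653dd2d9c8e9901f18993258d91a836a1a9a
  ./mach try fuzzy -q "'browsertime 'google-mail 'linux1804-64-shippable-qr" --rebuild 5

  # Then flip the nightly-only pref default back to false locally, commit, and
  # push the same task selection again:
  ./mach try fuzzy -q "'browsertime 'google-mail 'linux1804-64-shippable-qr" --rebuild 5

  # Compare the two pushes in Perfherder; if LastVisualChange no longer differs,
  # the pref flip is confirmed as the trigger.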

If there is a real regression here, I'd probably try doing perf profile compares, though it may be hard to attribute a difference as small as 2% to something in those. Maybe Mike has better/other ideas here...

Flags: needinfo?(mconley)

So thankfully someone has gone to the trouble of doing a bunch of retriggers on the original jobs, and it looks like browsertime profiles were generated for both the previous revision, and the one that enabled the Unified Extensions UI.

Here are those two builds:
https://treeherder.mozilla.org/jobs?repo=autoland&fromchange=4007caecc5ff616ed4c9ae76d8bfa218a1177444&tochange=82de653dd2d9c8e9901f18993258d91a836a1a9a&searchStr=browsertime&selectedTaskRun=cv9c4TIcQQmditA8NAeN4g.0

But I'm having a pretty hard time making heads or tails of what I'm seeing. At least looking at the screenshots / thumbnails in those profiles, the loading of GMail doesn't look significantly different in the content area. Example:

Baseline: https://share.firefox.dev/3OMtkbe
Regressor: https://share.firefox.dev/3VDOquM

Maybe I'm not understanding LastVisualChange correctly, but I thought reading this doc that it had to do with the last change in the content area, and I'm not seeing much of a difference between these.

Hey Bas, do you happen to know who the current in-house expert is on Browsertime who could help us see in these profiles where the LastVisualChange difference is?

Flags: needinfo?(mconley) → needinfo?(bas)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #11)

So thankfully someone has gone to the trouble of doing a bunch of retriggers on the original jobs, and it looks like browsertime profiles were generated for both the previous revision, and the one that enabled the Unified Extensions UI.

Here are those two builds:
https://treeherder.mozilla.org/jobs?repo=autoland&fromchange=4007caecc5ff616ed4c9ae76d8bfa218a1177444&tochange=82de653dd2d9c8e9901f18993258d91a836a1a9a&searchStr=browsertime&selectedTaskRun=cv9c4TIcQQmditA8NAeN4g.0

But I'm having a pretty hard time making heads or tails of what I'm seeing. At least looking at the screenshots / thumbnails in those profiles, the loading of GMail doesn't look significantly different in the content area. Example:

Baseline: https://share.firefox.dev/3OMtkbe
Regressor: https://share.firefox.dev/3VDOquM

Maybe I'm not understanding LastVisualChange correctly, but I thought reading this doc that it had to do with the last change in the content area, and I'm not seeing much of a difference between these.

Hey Bas, do you happen to know who the current in-house expert is on Browsertime who could help us see in these profiles where the LastVisualChange difference is?

Sorry for the slow reply here. Sparky is probably a good initial point of contact. Nazim would know most about how the profiles are made.

Flags: needinfo?(bas) → needinfo?(gmierz2)

This is where the regression seems to be coming from (see arrow). I'll upload a zip with more information in the next comment.

Attached file side-by-sides-gmail.7z

I produced these locally using the following command: ./mach perftest-tools side-by-side --base-platform test-linux1804-64-shippable-qr/opt -t browsertime-tp6-firefox-google-mail --new-revision 82de653dd2d9c8e9901f18993258d91a836a1a9a --base-revision 4007caecc5ff616ed4c9ae76d8bfa218a1177444 --metric LastVisualChange

The before videos are from this push: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=browsertime&revision=4007caecc5ff616ed4c9ae76d8bfa218a1177444

The after videos are from this push: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=browsertime&revision=82de653dd2d9c8e9901f18993258d91a836a1a9a

Flags: needinfo?(gmierz2)

sparky! This is awesome! Maybe I used to know this and I forgot - but WOW that's a useful command to know! These videos are great.

And I think I see what's going wrong... I actually think there might be something wrong with how LastVisualChange is being calculated.

You highlighted in comment 13 the region of the screen that is still updating at the end of the test, and I agree - this is the problem area. What I'm noticing when scrubbing the video is that in the "before" video, the LastVisualChange is recorded when those sidebar items increase their opacity and the underline below them appears. It seems to ignore the "+" symbol that appears below them. In the "after" video, LastVisualChange is recorded when those things occur and the "+" symbol appears.

Any idea why LastVisualChange for "before" wouldn't care about the "+" symbol?

Flags: needinfo?(gmierz2)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #16)

sparky! This is awesome! Maybe I used to know this and I forgot - but WOW that's a useful command to know! These videos are great.

Glad you enjoy it! We'll have these available in the regression/improvement summaries in the very near future too (:alexandrui has been working on integrating this tool into the workflows).

You highlighted in comment 13 the region of the screen that is still updating at the end of the test, and I agree - this is the problem area. What I'm noticing when scrubbing the video is that in the "before" video, the LastVisualChange is recorded when those sidebar items increase their opacity and the underline below them appears. It seems to ignore the "+" symbol that appears below them. In the "after" video, LastVisualChange is recorded when those things occur and the "+" symbol appears.

Any idea why LastVisualChange for "before" wouldn't care about the "+" symbol?

Good catch! I have a strong feeling this is because of the fuzz factor that the visual-metrics script uses to compare frames and find ones that are different. The plus sign is likely too small of a change and falls into the noise: https://github.com/sitespeedio/browsertime/blob/main/browsertime/visualmetrics-portable.py#L1053
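
To make the fuzz idea concrete, here's a rough, hypothetical illustration using ImageMagick's fuzz option directly (the visualmetrics script is Python and may compute the per-frame diff differently; the fuzz value below is made up):

  # Count pixels that differ between two consecutive video frames, ignoring
  # per-pixel color differences smaller than 10% ("fuzz"):
  compare -metric AE -fuzz 10% frame_0041.png frame_0042.png null: 2>&1

  # If the reported count stays below the script's noise threshold, the frame is
  # treated as unchanged, so a change as small as the "+" icon never registers
  # as the LastVisualChange.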

I could change the factor to capture this but it'll have some unwanted side effects in other tests (gslides for instance). We can fix this with image segmentation, but I haven't had a chance to implement that yet.

Flags: needinfo?(gmierz2)

This feature is riding the 109 train to release, which goes to RC next week. I'm having a bit of trouble following this bug at this point - are we saying this likely isn't a real regression but instead a harness issue? Is there anything actionable here that we can do before we ship 109?

Flags: needinfo?(mconley)
Flags: needinfo?(gmierz2)

I'll let :mconley answer this. Based on the videos I provided, there is definitely a portion that was slower (the opacity change towards the end).

Flags: needinfo?(gmierz2)

There's definitely a difference in those side by side videos, but I don't think it should block the rollout of the Unified Extensions UI in 109. I suggest we WONTFIX this and try to find some bigger and more obvious fish to fry for page load optimizations.

Status: NEW → RESOLVED
Closed: 5 months ago
Flags: needinfo?(mconley)
Resolution: --- → WONTFIX