Closed Bug 1796163 Opened 8 months ago Closed 5 months ago

2.71% google-mail LastVisualChange (Linux) regression on Thu October 13 2022

Categories

(WebExtensions :: General, defect, P2)

Firefox 108
defect

Tracking

(firefox-esr102 unaffected, firefox106 unaffected, firefox107 disabled, firefox108 disabled, firefox109 wontfix, firefox110 wontfix)

RESOLVED WONTFIX

People

(Reporter: alexandrui, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [addons-jira])

Attachments

(3 files)

Perfherder has detected a browsertime performance regression from push 82de653dd2d9c8e9901f18993258d91a836a1a9a. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio: 4%
Test: google-mail LastVisualChange
Platform: linux1804-64-shippable-qr
Options: cold fission webrender
Absolute values (old vs new): 4,238.33 -> 4,386.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(wdurand)

Set release status flags based on info from the regressing bug 1793626

:mixedpuppy could this be triaged for impact on the 107 release?

Flags: needinfo?(mixedpuppy)

The associated change is flipping a pref for a feature on nightly. There should be no effect on release.

Flags: needinfo?(mixedpuppy)

(In reply to Shane Caraveo (:mixedpuppy) from comment #3)

The associated change is flipping a pref for a feature on nightly. There should be no effect on release.

Sorry for the unnecessary ping, I should have spotted the regressing change was nightly only. Thanks.

Severity: -- → S4
Priority: -- → P2
Whiteboard: [addons-jira]

:alexandrui looking at the recent trend, it looks like the performance numbers are a lot better than they used to be, so I am wondering if you could help here. Is there still a perf regression caused by the unified extensions UI work? Thanks!

Flags: needinfo?(wdurand) → needinfo?(aionescu)

The improvement is because of bug 1794153. The perf regression caused by the unified extensions UI work is still there.

Flags: needinfo?(aionescu)

:alexandrui ok, thanks. Now... I am sorry but could you help me understand what this perftest is measuring?

I clicked all the links in comment 0 but that didn't help. I don't think I've seen "4%" mentioned anywhere and loading 60 days of data shows a pretty stable and flat curve to me, until recently (but that is explained by your comment 6). The commit that regressed this perftest enables a feature in Nightly but the feature itself does not do much (it shows a new button in the toolbar basically).

Flags: needinfo?(aionescu)

Here is the regression. You can close this as wontfix, as it affects nightly only.

Flags: needinfo?(aionescu)

This feature will likely be enabled in all channels during 109 so I don't think wontfixing it will solve the problem.

Adding Mike who has some context around the changes in this project as well as knowing a lot of stuff about front-end perf. :-)

I ended up looking at this link which is zoomed in on the regression (link might stop working in another 20 or so days) and it seems there were retriggers in the range of commits to establish which commit moved the needle here. But each of the measurement levels of 4400 or 4360ms (which are all rounded to multiples of 40ms, it seems, so presumably something in the test does that or we're multiplying by 40 or whatever) has values from multiple revisions in it. So it seems to me the test is reasonably noisy. Looking at the commits in this range I'm also seeing way more retriggers than I'm seeing in that graph, which further confuses things.

Given this is just a pref flip, ISTM there's a pretty easy test to see if this was the real regressor: if you push a perf try run for both the autoland revision identified in comment 0 and for that same revision with the nightly-only pref flipped back to false, does the regression disappear? (Maybe Alexandru already did this?)
It'd also be useful to check this against current central - as noted in comment #5 / comment #6, current measurements are around half of what they were so any regression is necessarily going to be much smaller. I can't tell if comment 6 means this was already done or not.
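
For concreteness, a hypothetical sketch of that comparison might look like this (the try task selection, rebuild count, and pref location here are assumptions, not details taken from this bug):

  # Push the suspected regressor as-is and schedule the google-mail pageload tests:
  hg update 82de653dd2d9c8e9901f18993258d91a836a1a9a
  ./mach try fuzzy -q "'browsertime 'google-mail 'linux1804-64-shippable-qr" --rebuild 5

  # Then flip the nightly-only pref default back to false locally, commit, and
  # push the same task selection again:
  ./mach try fuzzy -q "'browsertime 'google-mail 'linux1804-64-shippable-qr" --rebuild 5

  # Compare the two pushes in Perfherder; if LastVisualChange no longer differs,
  # the pref flip is confirmed as the trigger.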

If there is a real regression here, I'd probably try doing perf profile compares, though it may be hard to attribute a difference as small as 2% to something in those. Maybe Mike has better/other ideas here...

Flags: needinfo?(mconley)

So thankfully someone has gone to the trouble of doing a bunch of retriggers on the original jobs, and it looks like browsertime profiles were generated for both the previous revision, and the one that enabled the Unified Extensions UI.

Here are those two builds:
https://treeherder.mozilla.org/jobs?repo=autoland&fromchange=4007caecc5ff616ed4c9ae76d8bfa218a1177444&tochange=82de653dd2d9c8e9901f18993258d91a836a1a9a&searchStr=browsertime&selectedTaskRun=cv9c4TIcQQmditA8NAeN4g.0

But I'm having a pretty hard time making heads or tails of what I'm seeing. At least looking at the screenshots / thumbnails in those profiles, the loading of GMail doesn't look significantly different in the content area. Example:

Baseline: https://share.firefox.dev/3OMtkbe
Regressor: https://share.firefox.dev/3VDOquM

Maybe I'm not understanding LastVisualChange correctly, but I thought reading this doc that it had to do with the last change in the content area, and I'm not seeing much of a difference between these.

Hey Bas, do you happen to know who the current in-house expert is on Browsertime who could help us see in these profiles where the LastVisualChange difference is?

Flags: needinfo?(mconley) → needinfo?(bas)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #11)

So thankfully someone has gone to the trouble of doing a bunch of retriggers on the original jobs, and it looks like browsertime profiles were generated for both the previous revision, and the one that enabled the Unified Extensions UI.

Here are those two builds:
https://treeherder.mozilla.org/jobs?repo=autoland&fromchange=4007caecc5ff616ed4c9ae76d8bfa218a1177444&tochange=82de653dd2d9c8e9901f18993258d91a836a1a9a&searchStr=browsertime&selectedTaskRun=cv9c4TIcQQmditA8NAeN4g.0

But I'm having a pretty hard time making heads or tails of what I'm seeing. At least looking at the screenshots / thumbnails in those profiles, the loading of GMail doesn't look significantly different in the content area. Example:

Baseline: https://share.firefox.dev/3OMtkbe
Regressor: https://share.firefox.dev/3VDOquM

Maybe I'm not understanding LastVisualChange correctly, but I thought reading this doc that it had to do with the last change in the content area, and I'm not seeing much of a difference between these.

Hey Bas, do you happen to know who the current in-house expert is on Browsertime who could help us see in these profiles where the LastVisualChange difference is?

Sorry for the slow reply here. Sparky is probably a good initial point of contact. Nazim would know most about how the profiles are made.

Flags: needinfo?(bas) → needinfo?(gmierz2)

This is where the regression seems to be coming from (see arrow). I'll upload a zip with more information in the next comment.

Attached file side-by-sides-gmail.7z

I produced these locally using the following command: ./mach perftest-tools side-by-side --base-platform test-linux1804-64-shippable-qr/opt -t browsertime-tp6-firefox-google-mail --new-revision 82de653dd2d9c8e9901f18993258d91a836a1a9a --base-revision 4007caecc5ff616ed4c9ae76d8bfa218a1177444 --metric LastVisualChange

The before videos are from this push: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=browsertime&revision=4007caecc5ff616ed4c9ae76d8bfa218a1177444

The after videos are from this push: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=browsertime&revision=82de653dd2d9c8e9901f18993258d91a836a1a9a

Flags: needinfo?(gmierz2)

sparky! This is awesome! Maybe I used to know this and I forgot - but WOW that's a useful command to know! These videos are great.

And I think I see what's going wrong... I actually think there might be something wrong with how LastVisualChange is being calculated.

You highlighted in comment 13 the region of the screen that is still updating at the end of the test, and I agree - this is the problem area. What I'm noticing when scrubbing the video is that in the "before" video, the LastVisualChange is recorded when those sidebar items increase their opacity and the underline below them appears. It seems to ignore the "+" symbol that appears below them. In the "after" video, LastVisualChange is recorded when those things occur and the "+" symbol appears.

Any idea why LastVisualChange for "before" wouldn't care about the "+" symbol?

Flags: needinfo?(gmierz2)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #16)

sparky! This is awesome! Maybe I used to know this and I forgot - but WOW that's a useful command to know! These videos are great.

Glad you enjoy it! We'll have these available in the regression/improvement summaries in the very near future too (:alexandrui has been working on integrating this tool into the workflows).

You highlighted in comment 13 the region of the screen that is still updating at the end of the test, and I agree - this is the problem area. What I'm noticing when scrubbing the video is that in the "before" video, the LastVisualChange is recorded when those sidebar items increase their opacity and the underline below them appears. It seems to ignore the "+" symbol that appears below them. In the "after" video, LastVisualChange is recorded when those things occur and the "+" symbol appears.

Any idea why LastVisualChange for "before" wouldn't care about the "+" symbol?

Good catch! I have a strong feeling this is because of the fuzz factor that the visual-metrics script uses to compare frames and find ones that are different. The plus sign is likely too small of a change and falls into the noise: https://github.com/sitespeedio/browsertime/blob/main/browsertime/visualmetrics-portable.py#L1053
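
To make the fuzz idea concrete, here's a rough, hypothetical illustration using ImageMagick's fuzz option directly (the visualmetrics script is Python and may compute the per-frame diff differently; the fuzz value below is made up):

  # Count pixels that differ between two consecutive video frames, ignoring
  # per-pixel color differences smaller than 10% ("fuzz"):
  compare -metric AE -fuzz 10% frame_0041.png frame_0042.png null: 2>&1

  # If the reported count stays below the script's noise threshold, the frame is
  # treated as unchanged, so a change as small as the "+" icon never registers
  # as the LastVisualChange.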

I could change the factor to capture this but it'll have some unwanted side effects in other tests (gslides for instance). We can fix this with image segmentation, but I haven't had a chance to implement that yet.

Flags: needinfo?(gmierz2)

This feature is riding the 109 train to release, which goes to RC next week. I'm having a bit of trouble following this bug at this point - are we saying this likely isn't a real regression but instead a harness issue? Is there anything actionable here that we can do before we ship 109?

Flags: needinfo?(mconley)
Flags: needinfo?(gmierz2)

I'll let :mconley answer this. Based on the videos I provided, there is definitely a portion that was slower (the opacity change towards the end).

Flags: needinfo?(gmierz2)

There's definitely a difference in those side by side videos, but I don't think it should block the rollout of the Unified Extensions UI in 109. I suggest we WONTFIX this and try to find some bigger and more obvious fish to fry for page load optimizations.

Status: NEW → RESOLVED
Closed: 5 months ago
Flags: needinfo?(mconley)
Resolution: --- → WONTFIX