Closed Bug 1704795 Opened 3 years ago Closed 3 years ago

17.4 - 8.01% tart / tart + 3 more (Linux, OSX, Windows) regression on Sat April 10 2021

Categories

(Firefox :: Address Bar, defect, P3)

Firefox 89
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- unaffected
firefox87 --- unaffected
firefox88 --- unaffected
firefox89 --- wontfix

People

(Reporter: alexandrui, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [proton-address-bar])

Attachments

(8 files)

Perfherder has detected a talos performance regression from push 135c5397086dac5623bc45d92bd6e7709806d89d. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
17% tart windows10-64-shippable e10s stylo 2.24 -> 2.63
17% tart linux1804-64-shippable e10s stylo 2.74 -> 3.21
15% tart linux1804-64-shippable e10s stylo 2.81 -> 3.23
14% tart macosx1015-64-shippable e10s stylo 2.08 -> 2.38
8% tart windows10-64-shippable-qr e10s stylo webrender 2.58 -> 2.78

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
9% tresize macosx1015-64-shippable e10s stylo 9.71 -> 8.87
7% tp5o_scroll macosx1015-64-shippable e10s stylo 1.52 -> 1.41
7% tresize macosx1015-64-shippable-qr e10s stylo webrender-sw 7.82 -> 7.28

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(mak)

It's possible this is partially undoing what graduating the main proton pref did, and may be related to to the pixel rounding problem that made us first mark WPT tests passing when the main pref graduated, and then failing again when urlbar graduated, there must be an half pixel varying the toolbar size.
At least some of the graphs show us going back to previous values.

It's also possible tart doesn't appreciate much the wider shadow around the urlbar, though we also removed the expansion of the urlbar.
This affects mostly Linux and Windows, on the Mac we just seem to go back to the status-quo.
We should probably run some experiments related to the shadow and focus ring.

Let's move this to the right Epic.

Flags: needinfo?(mak)
Whiteboard: [proton-address-bar]

== Change summary for alert #29679 (as of Tue, 13 Apr 2021 11:40:36 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
23% youtube SpeedIndex windows10-64-shippable warm 792.62 -> 972.50
22% youtube SpeedIndex windows10-64-shippable warm 789.29 -> 962.17
18% youtube SpeedIndex windows10-64-shippable cold 1,240.92 -> 1,469.58
10% paypal ContentfulSpeedIndex windows10-64-shippable-qr cold webrender 688.31 -> 758.17
5% google-docs SpeedIndex windows10-64-shippable warm 1,103.17 -> 1,153.33
4% bing-search ContentfulSpeedIndex linux1804-64-shippable-qr cold webrender 334.25 -> 347.33
4% bing-search ContentfulSpeedIndex linux1804-64-shippable cold 323.69 -> 335.42
3% bing-search ContentfulSpeedIndex linux1804-64-shippable-qr warm webrender 252.29 -> 260.33

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
19% youtube SpeedIndex windows10-64-shippable-qr warm webrender 1,009.12 -> 813.25
16% youtube SpeedIndex windows10-64-shippable-qr cold webrender 1,524.38 -> 1,275.33
7% ebay ContentfulSpeedIndex linux1804-64-shippable warm 468.88 -> 434.83

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29679

all these tests graph seem to pretty much mimic the tart one.

Set release status flags based on info from the regressing bug 1703274

Making it a P1 to reflect the fact we need to investigate in order to better understand the issue

Priority: -- → P1

adding ni? on myself, I'll do some Try pushes with parts disabled (like the shadow, updating engines...) to see if numbers change.

Flags: needinfo?(mak)

Regarding TART, I first tried the simplest things: removing the larger shadow, removing the border, avoiding search engines update work. Those didn't help. Then I generated 2 perf profiles (sorry, bit.ly not working atm):
before: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FWcqw-rNyQfGLgi99-cvFmw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tart.zip
after: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FepnJQbFwSxGPFaqEYCI1SA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tart.zip

Looking through them, and more specifically looking at the Marker Chart, there's one obvious difference; in the good case there is a long urlbar-grow css animation, that is taking up most of the initial time and likely delaying other things. This is the old urlbar expansion animation.
In the bad case, that animation doesn't exist, so things should be a bit cheaper but, due to that, other things enter the measurement bracket before the TART - End event, for example I see a BookmarksToolbarVisibilityUpdated event.
Most of the additional events seem to be related to the bookmarks toolbar now being visible sooner.
Thus, I suspect we are effectively measuring different things now, maybe we should further delay the toolbars update? Though that may also introduce visual "flicker".
All in all, it doesn't look like the urlbar changes are directly causing a tart regression, but rather making things happen sooner than usual.
Mike, is this a plausible interpretation based on your experience?

Next step, I'll look at Youtube speedindex.

Flags: needinfo?(mconley)

I have no idea how to trigger perf profiles on vismet tests, thus I'm asking on Matrix (and waiting...). Without profiles it may not be trivial to understand where we spend time.
I assumed maybe updating opensearch engines made a difference, but my patch disabling that didn't change much, so it's likely unrelated.
Since this is another test that measures how long it takes for something to appear on screen, I wonder if it's somehow related to the tart problem, thus things happen sooner and delay page painting. Maybe I should try to disable the bookmarks toolbar completely and see what happens.

Unfortunately, those shortlinks to the profiles aren't working for me. :/ Can you post the long links?

So one of the bad things about TART is that what it's measuring is the average cost to paint frames during tab animations. It's possible that by getting rid of the URL bar grow animation, we've reduced the number of frames we need to paint (that's good), which reduces the size of the set of numbers that we're taking the average of, which can then shift the numbers (especially if that grow animation required little time to paint).

However, I highly recommend that we ignore the TART regressions for now. The percentages look bad, but the actual magnitude in milliseconds is small (less than a millisecond for each). I think we should focus our energies elsewhere, like looking at the SpeedIndex regressions.

Flags: needinfo?(mconley)

I updated the above comment with links to the profiles.

(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos) from comment #9)

So one of the bad things about TART is that what it's measuring is the average cost to paint frames during tab animations. It's possible that by getting rid of the URL bar grow animation, we've reduced the number of frames we need to paint (that's good), which reduces the size of the set of numbers that we're taking the average of, which can then shift the numbers (especially if that grow animation required little time to paint).

It was showing as a large entry on the profile, but I'm not expert in reading graphics costs on these profiles.

However, I highly recommend that we ignore the TART regressions for now. The percentages look bad, but the actual magnitude in milliseconds is small (less than a millisecond for each). I think we should focus our energies elsewhere, like looking at the SpeedIndex regressions.

I'm on those, comparing videos based on your idea in the other bug.

Attached video test.mp4

While many of the test runs show no difference, as visible here there is some cases where content appears slightly sooner before the urlbar change.
Also the urlbar contents get drawn slightly later.
Though the video also shows how the urlbar is pretty much doing nothing fancy.

I'll run more experiments on Try changing some padding here and there to see if that influences the score.

at this point, I'll take the bug since I spent quite some time investigating it already.

Assignee: nobody → mak
Status: NEW → ASSIGNED
Flags: needinfo?(mak)
See Also: → 1704463

I ran an experiment increasing the padding around the urlbar to 10px (from 4px) to check if it had any effect.
It actually improved youtube SpeedIndex opt cold by 17%... but not effect on warm. It is still interesting how the test is sensitive to changes of the toolbox height.

On the profiles I've noticed this so far:

  1. before reports things like setURI, setTabTitle, that come earlier in the navigation, while after reports things like updateReaderButton, onSecurityStateChange. This seems to suggest there is a shift in times, and maybe the test measurement is suffering an initial delay.
  2. There is a block of telemetry markers that in the before profile happen around 2.4s, in the after profile they happen at 1.3s, so they happen much sooner. This comes from profile-after-change TelemetryControllerParent.jsm::setupTelemetry() that contains a DeferredTask base d on toolkit.telemetry.initDelay, with a default time of 60s. It also creates a new thread and seems to also cause a couple of writeAtomics. It is happening about 1s sooner in after and more or less that time coincides with the time where we see a paint delay. should telemetry setup time be included in this measurement?

I'll run a further experiment delaying that telemetry block and checking its effect.

my telemetry experiment brought some improvements in Bing Search Speedindex, but not in Youtube.

Let me sum up a bit the status here:

Experiments:

  1. the shadow around the urlbar doesn't seem to have a measurable effect
  2. the bookmarks toolbar updates don't seem to have a measurable effect
  3. the transparent/thick border doesn't seem to have a measurable effect
  4. the toolbar size seems to have a measurable effect, increasing padding improved some cold speedindex measurement
  5. delaying telemetry seems to have a measurable effect, it cleaned up considerably the profile and improved some speedindex but had no effect on others

Notes:

  1. Things happen sooner in the after profile. Some examples are tab-burst-animation, reload-to-stop animation, network events and Navigation events all tend to happen sooner. Some of these even 100ms sooner. Paints though seem to happen later. May we be seeing a shift in workload?
  2. in general, chrome doesn't seem to do additional work, or at least I cannot clearly identity any of it
  3. since this bug is about the urlbar more in specific, I cannot find any additional work added by the urlbar in the profile

My conclusions:

  1. I'll file a bug to evaluate delaying telemetry init, so it doesn't happen during these tests and in profiles. It doesn't seem to have a huge impact, but it causes I/O, GC and thread creation, that can create noise.
  2. there is a time shift in events, certain events happen sooner, while graphic events seem to happen later. In general most of the chrome things seem to happen sooner, that seems counter intuitive
  3. The toolbar padding seems to also have an effect on delays.
  4. it may be useful to have someone from layout/graphics to look at the profiles
Depends on: 1706180
See Also: → 1706333

Bug 1706333 is a further indication that changing animation duration in the chrome toolbar affects these tests. As a reminder here we removed a quite long animation due to the urlbar expanding.

Hey mak, I wonder if the fact that we're getting to certain animations earlier is, as you say, reordering some work and priority. What happens if you hide the stop/reload and tab throbber elements in the before and after cases? Does the "after" case continue to regress?

Flags: needinfo?(mak)

An experiment I want to run is to remove the urlbar expansion in the before case, and see the effect of doing that.
Removing all the animations is another possibility we should test, even if ideally those didn't change when we enabled the proton urlbar.

Flags: needinfo?(mak)

And, to be clear, I didn't find any obvious new jank or explanations in the profiles that I looked at. I'm reasonably certain this is a work re-scheduling thing, and that we should just accept this regression.

I tried running a test on pre-proton-urlbar with ui.prefersReducedMotion = 1, it didn't have interesting effects on SpeedIndex.

Here's a side-by-side comparison, comparing 6542eba3a6dfb79c7c979f6ec7eea9315e6824c4 on autoland to 3a4f4d99eb21080af52e408da7b60885140136d2 using sparky's script here.

I honestly don't see a real difference here. I don't know what SpeedIndex is talking about.

python3 generate_side_by_side.py --base-revision 6542eba3a6dfb79c7c979f6ec7eea9315e6824c4 --base-branch autoland --new-revision 3a4f4d99eb21080af52e408da7b60885140136d2 --new-branch autoland --platform test-windows10-64-shippable/opt --test-name browsertime-tp6-firefox-youtube-e10s --warm --output ./output

Same deal as above, except for cold load.

So I think the YouTube regression report is bogus.

Attachment #9217803 - Attachment description: warm-side-by-side.mp4 → YouTube warm-side-by-side.mp4
Attachment #9217804 - Attachment description: cold-side-by-side.mp4 → YouTube cold-side-by-side.mp4

Again, no visible regression.

In this case, it actually looks like the after case is faster to get the table of contents listed. For some reason, the before case never gets that far.

So, again, I'm not convinced there's a regression for this one too.

There is a slight difference on Bing with WebRender enabled. It's very slight.

Very slight difference on Bing here.

Very slight difference here, too.

Okay, here's my conclusion:

Enabling the Proton URL bar resulted in a small change to the content area. This has resulted in some changes to our browsertime scores, and I think we can chalk that up as a re-baselining (similar to bug 1704463).

The change has also resulted in slight differences in scores for some of our other tests, but there's really nothing actionable from the profiles as a single "culprit". If anything, the change in the URL bar has changed the environment slightly, as well as shifting some scheduling around. That's what these tests are detecting. I don't think this bug should block MR1. I think we should at least deprioritize this bug, if not closing it as WONTFIX.

Flags: needinfo?(rtestard)

Thanks for the thorough investigations! Agreed that if this is most likely a re-baselining it should not block Proton.
Leaving this opened as a P3 since it seems like a follow-up is needed to ensure that the proton changes are accounted for moving forward (set a new baseline?)

Flags: needinfo?(rtestard)
Priority: P1 → P3

Not actively working on this atm, and I did all I could here.

Assignee: mak → nobody
Status: ASSIGNED → NEW

I'm going to close this as WFM.
I already explained above how tart is now measuring stuff that didn't enter its measurement bracket before.
And both me and mconley were unable to find a clear cause for the SpeedIndex regression. Now today this happens: https://bugzilla.mozilla.org/show_bug.cgi?id=1707949#c9. That bug only changed opacity of urlbar-icons, and shows a huge 18% win on youtube SpeedIndex.
At this point keeping this open doesn't make sense.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: