Closed Bug 1272357 Opened 8 years ago Closed 8 years ago

7.06 - 20.12% tps (linux64, osx-10-10, windows8-64, windowsxp) regression on push 719d6d5d9d21 (Wed May 11 2016, visibility API changes)

Categories

(Core :: Layout, defect)

47 Branch
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from push 719d6d5d9d21. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=1168

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tps

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win64,linux64,macosx64,win32 -u none -t g2[Windows 8,10.10,Windows XP],g2-e10s[Windows 8,10.10,Windows XP] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a tps

(add --e10s to run tests in e10s mode)

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Seth, can you look at this?  we have a decent set of tps regressions and we need to understand these or fix them- if you need me to bisect the commit range on try, I would be happy to.
Flags: needinfo?(mfowler)
Summary: 7.06 - 20.12% tps (linux64, osx-10-10, windows8-64, windowsxp) regression on push 719d6d5d9d21 (Wed May 11 2016) → 7.06 - 20.12% tps (linux64, osx-10-10, windows8-64, windowsxp) regression on push 719d6d5d9d21 (Wed May 11 2016, visibility API changes)
Wow, this is a shockingly large regression for such a small change. I need to verify which bug this regression came from as I pushed a large number of commits in the same push. Let me push some try jobs to try to bisect.
First step of bisection:

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=16f2407070fe
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=16f2407070fe

Presumably the baseline revision here is 6fe85a187e70.
Sigh. Build failed due to -Werror. Trying again:

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=140267859d2b
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=140267859d2b
My mistake, the baseline revision is 1db715403fbe.
Alright, I think we can safely say that the following bugs are excluded as possible causes of the regression:

Bug 1269935 - Replace PresShell::DecVisibleCount() with a general map function. r=mstange
Bug 1269934 - Handle visible frame sets more generically in PresShell. r=mstange
Bug 1269931 - Send visible region updates for pres shells associated with nested views. r=botond
Bug 1268348 - Pass the previous visibility state to OnVisibilityChange(). r=mstange

That leaves these two:

Bug 1269937 - Manage updating visible frames and regions using RAII. r=mstange
Bug 1259281 - Mark frames NONVISIBLE if their pres shell is inactive or frozen. r=mstange
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=d77f0bb482fd
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=d77f0bb482fd
Am I interpreting this correctly that the evidence is pointing at bug 1259281 as the regressor here?
Flags: needinfo?(seth)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #9)
> Am I interpreting this correctly that the evidence is pointing at bug
> 1259281 as the regressor here?

Yes, I believe so. So the next step is to bisect within the patch. This is the one I suspected (in that it's the only one that makes a substantial change) but the regression is *not* expected so I need to try to figure out exactly what may have caused it.
No longer blocks: 1267562, 1268348, 1269931, 1269934, 1269935, 1269937
Flags: needinfo?(seth)
An experiment to test one theory as to the cause of the regression:

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=726ff937543f
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=726ff937543f
remote: recorded changegroup in replication log in 0.102s
(In reply to Joel Maher (:jmaher) from comment #13)
> checking out the results, we see a slight improvement, but not in the range
> of the regressions:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=a884b96685aa&newProject=try&newRevision=726ff937543f
> &framework=1&filter=tps&showOnlyImportant=0

Yep, not enough to do the job. (Though it's good to know we can get a 1% improvement for those changes, since they're coming anyway in an upcoming patch. 1% isn't much, but I'll take it!)

The try job in comment 14 is testing a second theory as to the origin of the regression.

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=33356aa1f3a7
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=33356aa1f3a7
Hmm, curiouser and curiouser. Let's try this again:

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca9a99310939
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ca9a99310939
(In reply to Joel Maher (:jmaher) from comment #18)
> I did some retriggers on the base and helped kick start the win8 jobs- still
> waiting on results, but this link should help:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=f3f2fa1d7eed&newProject=try&newRevision=ca9a99310939
> &framework=1&filter=tps&showOnlyImportant=0

Based on what I'm seeing here, it looks like the latest patch eliminates the regression. What that patch did was make us trigger decodes for images in the approximate frame visibility region (as we did before) but *not* in the displayport (which we started doing in bug 1259281). It's puzzling to me that this causes a regression, as the frame in the displayport should (usually) be a subset of the frames in the approximate region, and the code is designed to not trigger duplicate decodes in that scenario.

My best guess to the cause is that the approximate frame visibility region does not contain all the frames in the displayport, so we're decoding more frames. That's unexpected; I'll push a job in a little bit that will add some logging to help debug this.
Hmm. I'm honestly not sure we want to "fix" this. I suspect we're just seeing the patch working as intended here.

So when we switch to an inactive tab, making it active, we run UnsuppressVisibility() to let all the frames in that tab know that they're visible again. (This applies to any type of frame we track visibility for, but we're really only concerned about images for the purposes of this bug.) Here's an example log of what UnsuppressVisibility() is doing, from the try job in comment 20:

22:31:04     INFO -  PROCESS | 1682 | *** BEGIN UnsuppressVisibility
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x14fa7f0c8 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x13cfe8ab0 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x13cfdad38 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x14fa7e290 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x13cfe9778 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x13cfe62d0 MAY_BECOME_VISIBLE
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x1539c1c08 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d87908 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142ece6d0 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x1422dc9b0 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153a37498 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d85420 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142c0c978 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d89a28 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153a37b38 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142eee020 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d89438 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153e19c38 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142c0b718 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153f09f00 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142ece2b0 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153f09cc8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d87558 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142eeaa88 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153a14e58 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153a37fa0 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x150cc71b8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d87080 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x150cc52d8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x1422db098 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d88020 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153f03498 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x142ecfdc8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d85ca8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x1539cad10 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d89730 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153f038b8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d858f8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x15401ca80 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x141d890b8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x150cc5cc8 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x15401f548 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** UnsuppressVisibility: frame 0x153a145f0 IN_DISPLAYPORT
22:31:04     INFO -  PROCESS | 1682 | *** END UnsuppressVisibility

As you can see, there are a lot more IN_DISPLAYPORT frames (which are frames, unsurprisingly, that are in the displayport and will be painted on the initial paint) than there are MAY_BECOME_VISIBLE frames (which are frames in the approximate visibility region). The approximate visibility region is normally larger than the displayport, but doesn't contain all the IN_DISPLAYPORT frames in this case. This is probably because we only update it at most once a second in response to layout or style changes, which given how frequently this benchmark is switching between tabs is an eternity.

Here's the thing, though. If we don't trigger decodes for those IN_DISPLAYPORT images at this point, we'll just trigger them later, as we're doing the first paint, because the images won't be decoded at that point and hence won't be ready to paint. And indeed we do:

22:31:04     INFO -  PROCESS | 1682 | *** BEGIN Painting frame 0x1539c1c08
22:31:04     INFO -  PROCESS | 1682 | *** Decoding image 0x147717180
22:31:04     INFO -  PROCESS | 1682 | *** END Painting frame 0x1539c1c08
22:31:04     INFO -  PROCESS | 1682 | *** BEGIN Painting frame 0x153a145f0
22:31:04     INFO -  PROCESS | 1682 | *** Decoding image 0x14749abc0
22:31:04     INFO -  PROCESS | 1682 | *** END Painting frame 0x153a145f0
22:31:04     INFO -  PROCESS | 1682 | *** BEGIN Painting frame 0x153a14e58
22:31:04     INFO -  PROCESS | 1682 | *** Decoding image 0x147717600
22:31:04     INFO -  PROCESS | 1682 | *** END Painting frame 0x153a14e58
22:31:04     INFO -  PROCESS | 1682 | *** BEGIN Painting frame 0x153e19c38
22:31:04     INFO -  PROCESS | 1682 | *** Decoding image 0x147717780
22:31:04     INFO -  PROCESS | 1682 | *** END Painting frame 0x153e19c38

That's just a small sample. For every IN_DISPLAYPORT image we didn't start decoding during UnsuppressVisibility(), we start a decode during the first paint.

The try job in comment 16 made us not start decodes for IN_DISPLAYPORT images during UnsuppressVisibility() anymore. Instead, we only start decodes for MAY_BECOME_VISIBLE images (i.e. images in the approximate visibility region), which is what we did before the regressing bug landed. This eliminated the regression in the tps benchmark. So we know that this is where the regression stems from: we do better on tps if we wait until the first paint to start decoding most of the images. The primary reason for this, I'd expect, is that if virtually none of the images in the document are decoding on the first paint, *we don't have to do the work of drawing them*.

Here's the tradeoff, though. If we trigger image decoding early, we have a better chance of having images ready to paint by the time we do the first paint, which means we reduce flashing. We also won't need to repaint later when the images get decoded. (And several repaints could be required, since the images won't finish decoding all at once.) On the other hand, if we wait, we'll have flashing and have to do additional repaints later, but because we didn't have to do the work of drawing those images on the first paint, we'll do better on the tps benchmark.

It's probably also worth noting that if we had lingered on each tab longer before switching (as would happen during real usage) the set of MAY_BECOME_VISIBLE images would generally be a *superset* of the IN_DISPLAYPORT images. So this wouldn't be an issue at all; we'd have triggered decodes for the same images no matter what. The only reason we see a behavior change is because IN_DISPLAYPORT is updated on every paint, while MAY_BECOME_VISIBLE has a slower update speed, and this can matter if tabs are switched very fast during initial page load.

So my conclusion is this: because starting decoding earlier leads to less flashing (which makes for a better user experience) and fewer repaints (which, since repaints are expensive, results in better overall performance and energy usage), we should accept this regression, since in the real world this is not a regression but an improvement. (Of course, as mentioned in the preceding paragraph, most of the time there really won't be any change at all. But when there is, it should be an improvement in practical terms.)

Joel, are you alright with this?
Flags: needinfo?(jmaher)
(In reply to Seth Fowler [:seth] [:s2h] from comment #21)
> So my conclusion is this: because starting decoding earlier leads to less
> flashing (which makes for a better user experience) and fewer repaints
> (which, since repaints are expensive, results in better overall performance
> and energy usage), we should accept this regression, since in the real world
> this is not a regression but an improvement. (Of course, as mentioned in the
> preceding paragraph, most of the time there really won't be any change at
> all. But when there is, it should be an improvement in practical terms.)

Thanks for the analysis, sounds good to me.

Is there an implied suggestion there on how to improve tps to be more representative of real-world use cases? If yes, would you be able to spell it out and maybe someone could followup?
Seth, this is a great analysis- I am fine closing this out.  As :avih pointed out, it would be nice to fix tps to be even better at measuring issues that affect the end user vs its own measurements.
Flags: needinfo?(jmaher)
Another thing to consider: During tab switch, we briefly reduce the display port size to be just the viewport, so that the first paint when switching to a tab is faster. (Look for "suppressDisplayport" in the code.) Are we only triggering decoding for in-viewport images for the first tab paint, or are we triggering decodes for all the images in the unsuppressed displayport?
ni'ing seth for comment 24
Flags: needinfo?(seth)
(In reply to Markus Stange [:mstange] from comment #24)
> Another thing to consider: During tab switch, we briefly reduce the display
> port size to be just the viewport, so that the first paint when switching to
> a tab is faster. (Look for "suppressDisplayport" in the code.) Are we only
> triggering decoding for in-viewport images for the first tab paint, or are
> we triggering decodes for all the images in the unsuppressed displayport?

Right now we can't distinguish between the viewport and the displayport, because bug 1270389 hasn't landed that. That's something we'll be able to do soon, but can't do yet.

The set of IN_DISPLAYPORT frames we're using is updated each time we paint. So if the previous time we painted, we used an unsuppressed displayport, the IN_DISPLAYPORT frames would be all the frames in the unsuppressed displayport.

In the past what we've been doing is triggering decoding for all images in MAY_BECOME_VISIBLE frames, the idea being that we'll catch things that are visible as well as things that the user may scroll to soon. We may want to get more clever about this in the future (by prioritizing things in the viewport), and in fact that's been on my to-do list for a while - it's one of the things the new visibility API should let us implement easily, once the switchover to it is complete. I'll make sure there's a bug filed about it.

For now, though, sticking to the status quo means that triggering image decoding for all IN_DISPLAYPORT frames serves as a safeguard; if MAY_BECOME_VISIBLE is up to date, then it makes no difference, but if it's out of date as we're seeing in this benchmark, then it's beneficial because we don't delay image decoding until the first paint. In other words, where we are now is strictly better than where we were, but you're absolutely right that we can do even better and we should. =)
Flags: needinfo?(seth)
Flags: needinfo?(mfowler)
Sounds good, thanks.
(In reply to Avi Halachmi (:avih) from comment #22)
> Is there an implied suggestion there on how to improve tps to be more
> representative of real-world use cases? If yes, would you be able to spell
> it out and maybe someone could followup?

Well, I can tell you what would make things more representative with respect to this specific issue: just make sure that when we open a tab for the first time, we leave it open for more than a second, to ensure that the set of MAY_BECOME_VISIBLE frames is able to settle. (The problem is that we won't update it for background tabs.)

On the other hand, this obviously makes the test take longer, which costs Mozilla money and may not have any benefits other than for visibility-related code. Some alternatives:

1. We could just start updating the set of MAY_BECOME_VISIBLE frames, even for background tabs. I have mixed feelings about whether this is a good idea. If JS code in background tabs is making dramatic changes to layout or scroll position, this would reduce flashing, but it makes us do additional work.

2. We could add a pref that would make us do #1 just for Talos tests. This is easy to do, but I suspect you don't want to do it, as you probably don't want to test a configuration that's different from the one that users actually use.
I'm going to go ahead and mark this resolved.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
Component: Untriaged → Layout
Product: Firefox → Core
You need to log in before you can comment on or make changes to this bug.