Closed Bug 990212 Opened 10 years ago Closed 10 years ago

8% tart regression on fx-team for most all platforms

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox31 - wontfix
firefox32 + wontfix

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

as seen on dev.tree-management:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/_Mz5PR7K_aI

we have a nice bump on all platforms except for osx 10.6:
http://graphs.mozilla.org/graph.html#tests=[[293,132,33],[293,132,31],[293,132,37],[293,132,35],[293,132,25],[293,64,24]]&sel=none&displayrange=7&datatype=running

here is the datazilla view (w7):
https://datazilla.mozilla.org/?start=1395679254&stop=1396284054&product=Firefox&repository=Fx-Team-Non-PGO&os=win&os_version=6.1.7601&test=tart&graph_search=3f245df4a3cd&project=talos

newtab-open-preload-no.all.TART
newtab-open-preload-no.half.TART
newtab-open-preload-yes.all.TART
newtab-open-preload-yes.error.TART
newtab-open-preload-yes.half.TART



right now it is leaning towards:
http://hg.mozilla.org/integration/fx-team/rev/3f245df4a3cd

I have done some retriggers to see if this is really the problem or not:
https://tbpl.mozilla.org/?tree=Fx-Team&jobname=fx-team%20talos%20svgr&fromchange=382f676d0ed9&tochange=dc04e14d954b
most of the results are in, it appears to be what I originally suspected.

maxim, can you take a look at this?
Flags: needinfo?(mzhilyaev)
Depends on: 975228
It should be bug 975228 causing the increase for tab animation tests. It causes the newtab to show 9 tiles with images instead of 9 (8?) empty tiles.
good call.  If this is a feature that is optimized and we all acknowledge the regression, then we can be good.  Quite possibly there are a few optimizations we can do to make this slightly better.
mconley walked me through some steps to create ( https://dl.dropboxusercontent.com/u/2921989/tart.xpi -> prefs -> chrome://tart/content/tart.html ):
https://people.mozilla.org/~elee/tart-tiles.mov
https://people.mozilla.org/~elee/tart-blank.mov

He also suggested looking at the raw numbers, and it does seem like pretty much all of the increase is from the first frame. Here are numbers of the average of 5 repeats across 3 runs.

first frame:
without bug 975228 first frame:    53.64 53.79 53.54 -> 53.66
with bug first frame:              70.00 67.16 67.35 -> 68.17
with, thumbnail visibility hidden: 55.92 54.61 56.28 -> 55.60

half of frames:
without bug 975228 first frame:    17.70 17.15 17.33 -> 17.39
with bug first frame:              17.15 17.57 17.61 -> 17.44
with, thumbnail visibility hidden: 17.03 17.33 17.41 -> 17.26

So the animation seems to be the same as before except for the very first frame as the first frame will try to render a bunch of images instead of empty background.
And this might be more representative of the situation our user-base is in; if they're showing the newtab page tiles, almost certainly they're showing thumbnails (unless something's gone wrong there).

As you're all probably already aware of, the "empty" tiles scenario originally measured by TART is only likely on the first open of the new-tab page, and goes away as soon as some history is added.

So we might be OK here. We should let Avi weigh in, though.
Flags: needinfo?(avihpit)
We need to look into this IMO.

First, a summary:

The newtab page introduced a real performance hit to tab animation at the time due to loading the thumbnails while animating the tab. Because opening a new empty tab happens frequently and we wanted this specific scenario to look good, this case was addressed partially by preloading the newtab page - implemented by ttaubert.

TART measures newtab animation performance with and without this preload, but the users see only the preload performance ("...-preload-yes" at the TART numbers).

The preload-no results are more of an indication to us about how the performance would be like if we weren't preloading, so other than understanding what the numbers mean (which we do in this case), they shouldn't be reflected in real-world numbers.

It's also true that the TART runs are (were - after bug 975228) with unpopulated thumbs - which has better performance than populated thumbs/tiles, although this should be at least partially mitigated by the preload.

It's important to remember that talos in general doesn't try to measure real-world numbers or cover every real-world case - we have telemetry for more global numbers. Instead, talos numbers indicate relative improvements/regressions, which are likely to be represented in real world numbers as well.

The specific case of measuring newtab animation performance while the thumbs are populated is indeed not covered by TART, due to technical reasons.


Back to this bug.

This indicated regression has 2 faces:

1. After the thumbs are populated with history, this regression would _probably_ not be manifested.

2. On initial usage of Firefox, where tiles do come in, previously it had empty thumbs and now it has tiles which apparently degrade the animation performance (as the TART regression indicate).


I'd suggest to look into the following:

- Make sure that loading the tiles and related logic happens before the page is presented on screen, i.e. as part of the preload.

- At the time, there were also also other performance issues - which were related to the complexity of the newtab XUL page rendering (e.g. some background image resizing IIRC). These hurt newtab animation smoothness regardless of preload because they manifest when the page is presented to screen. So it could be worth looking into such issues as well.

Beyond making sure that preload is utilized effectively also with the tiles, and that we didn't introduce some high rendering complexity to the newtab page, I don't think we should act further for now.
Flags: needinfo?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #6)
> 1. After the thumbs are populated with history, this regression would
> _probably_ not be manifested.
I just populated the history with 9 pages, restarted, opened a new tab and waited for it to fetch thumbnails. Ran the test with 5 repeats then did that 3 times to get these numbers:

history first frame:  74.85 87.85 74.99 -> 79.23 (47% more than 53.66)
history half frames:  17.78 17.11 17.73 -> 17.54 (0.9% more than 17.39)

Compare that to the numbers with the directory links:

with bug first frame: 70.00 67.16 67.35 -> 68.17 (27% more than 53.66)
with bug half frames: 17.15 17.57 17.61 -> 17.44 (0.3% more than 17.39)

This regression *is* manifested with actual thumbnails, and it's probably worse because the thumbnails are captured at higher resolutions than what is displayed, so resizing happens as it's rendered. The directoryLinks images are sized to what it'll be shown at (243x150).

(That last point does seem to indicate we could optimize thumbnail caching to grab that specific size.)
Flags: needinfo?(mzhilyaev)
Blocks: 990416
From avih on irc:

let's prioritize the regressions: least significant: the preload-no numbers; slightly more relevant but not THAT important, depending on magnitude are the error values and then the important ones are all and half

we should remember that there are limits to how much we can optimize stuff. as i noted: 1. make sure the logic and loading happens before the directory links are presented to screen, i.e. at the preload stage. and 2. that the directory links don't have unnecessary high rendering complexity.

you could talk to ttaubert on both issues, as he implemented preloading and also simplified some rendering complexities and also filed some platform bugs in inherent platform inefficiencies

generally, these are tough calls on how much effort we want to put into regressions which come from introducing new features. but these specific two suggestions are worth the time because we've been there before, so these are the relatively low hanging fruits of this regression.
Flags: firefox-backlog+
the uplift to Aurora happens in 2 weeks, lets start making decisions.
Mardak, what's the plan here?
Flags: needinfo?(edilee)
The regression was reverted when bug 993581 turned off tiles. It'll come back when bug 990713 lands and reenables tiles. The regression also happens when there are thumbnails as opposed to just tiles, so it's the fact that images needs to be rendered that slows things down.

The packaged tiles can change over time with different image complexity that would cause undesired tart swings.

Also, with bug 993904 we'll use cached remotely-fetched (bug 986521) tiles instead of packaged tiles, and I would assume we wouldn't want our tiles server getting pinged by these tests to fetch tiles. So implicitly tart will be running with empty tiles again instead of prepopulated images.

We have fixed some bugs related to making sure things run when preloading before the new tab is displayed, such as bug 991210.

I'm not sure at what point we make the call that we want thumbnails/tile images rendered on showing of the page as opposed to potentially delaying the image rendering to avoid tart regressions. ?
Flags: needinfo?(edilee)
I suppose an alternative is to have tart run with some fixed sites and thumbnails to be more in line with what most users will be experiencing?
(In reply to Ed Lee :Mardak from comment #12)
> I suppose an alternative is to have tart run with some fixed sites and
> thumbnails to be more in line with what most users will be experiencing?

That's an option which we couldn't do back then, but maybe can now (and it seems we do measure it when directory tiles are enabled).

I think that directory tiles don't introduce a new regression, but rather make TART measure a (slow) case which wasn't measured before (tab animation while thumb/tiles are displayed).

So, while improving this case would definitely be nice, I don't think we this "Regression" should stop us with going on with directory tiles.

At the very least, tab animation with directory tiles is not much different, if at all, than without directory tiles but after the history thumbs got populated at the user's profile.
(In reply to Avi Halachmi (:avih) from comment #13)
> At the very least, tab animation with directory tiles is not much different,
> if at all, than without directory tiles but after the history thumbs got
> populated at the user's profile.

I think we should:
- verify the assumption above
- ensure that TART is measuring the most "realistic" case across the board (new tab page with thumbnails displayed)
- investigate ways to improve animation performance in that case (delaying load/showing of the images?)

Those can be three separate bugs, and we can WONTFIX this one. Does that seem reasonable?
Flags: needinfo?(edilee)
Flags: needinfo?(avihpit)
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #14)
> I think we should:
> - verify the assumption above

The numbers on comment 7 support this, and on a chat with :mardak on IRC he presented more numbers which support this, IIRC.


> - ensure that TART is measuring the most "realistic" case across the board

That's always a good goal, and more realistic cases are always better than less realistic ones.


> (new tab page with thumbnails displayed)

However back then we didn't manage to make it work, and now that we have directory tiles which produce similar behavior, I think that's covered.

Also, talos tests are meant to let us know when performance change. It's impossible to cover every real world case out there, and more sub tests also makes the results harder to analyse (and TART already produces 30 sub results per run which it reports to graphserver and datazilla - and these are only the averages we had to trim the results to. The run logs contain individual frame intervals of everything which was too much for the infrastructure to store and process).

Bottom line is that TART already proved highly sensitive (in a good way) to even minor changes, and adding directory tiles produces a behavior which was hard to test until now, but now it gets tested "for free", so IMO this is already very good coverage.


> - investigate ways to improve animation performance in that case (delaying
> load/showing of the images?)

Sure. If you got resources for that...
Flags: needinfo?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #15)
> > (new tab page with thumbnails displayed)
> 
> However back then we didn't manage to make it work, and now that we have
> directory tiles which produce similar behavior, I think that's covered.

Yes, I know. There was some discussion about disabling tiles in tests to avoid this impact, and all I'm saying is that that's a bad idea. We should explicitly ensure that TART covers the tiles/thumbnails case.

> > - investigate ways to improve animation performance in that case (delaying
> > load/showing of the images?)
> 
> Sure. If you got resources for that...

I do! Ed, can you file a bug and put it in the backlog?
Depends on: 1004911
Depends on: 1004912
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #16)
> Ed, can you file a bug and put it in the backlog?
Filed both bug 1004911 for improving performance and bug 1004912 for making TART not depend on what directory links are packaged.
Flags: needinfo?(edilee)
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #16)
> There was some discussion about disabling tiles in tests to
> avoid this impact, and all I'm saying is that that's a bad idea.

Out of interest, where there any arguments other than "so that the test numbers don't show regression"?
(In reply to Avi Halachmi (:avih) from comment #18)
> were there any arguments other than "so that the test numbers don't show regression"?
It's the consistency part of bug 1004912, where the test numbers would change every time we packaged different images that have different rendering complexity.
(In reply to Ed Lee :Mardak from comment #19)
> It's the consistency part of bug 1004912, where the test numbers would
> change every time we packaged different images that have different rendering
> complexity.

And wouldn't you like to know if you make a change which affects performance for users?

It's not hard to make it consistent. just do if (testingNow()) {do_nothing()} and voila, perfectly consistent, but is this what we want?
(In reply to Avi Halachmi (:avih) from comment #20)
> And wouldn't you like to know if you make a change which affects performance
> for users?
If it's a change in how we render the page, yes. If it's a change because we switched from a simple logo to something complex, probably not? Unless we're okay with swings in TART just because we changed up the list of links.
Do you argue that we shouldn't care-about/stay-blind-to user-visible perf changes due to reason A, but should care/get-noticed if they're due to reasons B/C/D?

Or maybe that not all the changes which TART currently measure are user-visible? (we do know of at least one such set of results - the newtab-preload-no.* results, and we do ignore them because users can't see them, but IMO we'd still like to be aware of them).

If it's the latter then we should fix it, but I don't think I've seen such claim so far.
as a result of http://hg.mozilla.org/integration/fx-team/pushloghtml?changeset=83ae68a12f75

I see these regressions (http://graphs.mozilla.org/graph.html#tests=[[293,132,33],[293,132,25],[293,132,31],[293,132,37],[293,132,35],[293,64,24],[293,64,21]]&sel=1399035720004,1399640520004&displayrange=7&datatype=running):
ubuntu 12.04: 2% regression
win xp:      10% regression
win 7:       13% regression
osx 10.8:     2% regression

Bug 990713 landed which is most likely the cause of this, I know we have work underway to make the test measure tiles more *dynamically* or *realistic*.

Please let me know if you want me to file a new bug- this is pretty much the same bug and related to the same code.
Removing from Backlog based on team review/discussion during estimation meeting.
Flags: firefox-backlog+ → firefox-backlog-
Untracking. Removed from the backlog, beta 6 should be released today and we won't block the release on this bug.
this is still a problem for:
http://graphs.mozilla.org/graph.html#tests=[[293,53,33],[293,53,37],[293,53,35],[293,53,21],[293,53,25]]&sel=1397234509172,1405010509172&displayrange=90&datatype=running
osx 10.6 (appears fixed by ggc backout)
ubuntu 12.04 (appears half fixed by ggc backout)
ubuntu 12.04 x64 (appears half fixed by ggc backout)
win xp (appears fixed by ggc backout)
win 7

so win7 is the nastiest of all as it stands :)
Is there anything left to do in this bug or is the work to be done in bug 1004911 and bug 1004912?
It sounds like it is just those two bugs and this case be closed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.