5% win7/linux64 e10s CART regression on fx-team (v.45) on Dec 2, 2015 from push 661a531f5dee

RESOLVED WONTFIX

Status

()

defect
RESOLVED WONTFIX
4 years ago
4 years ago

People

(Reporter: jmaher, Unassigned)

Tracking

({perf, regression})

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s-)

Details

(Whiteboard: [talos_regression][e10s])

Talos has detected a Firefox performance regression from your commit 661a531f5dee in bug 1201331.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=d37627931b2ba7e4de8d3059e4b62500d3ba97ea&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64,win64,win32 -u none -t svg  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a cart

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
keep in mind we didn't generate data for the push prior which is mcomella's push, but that is android related, so high confidence this is the right bug to look at:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=de8bc7c996fe&tochange=661a531f5dee

here is a view on treeherder with retriggers:
https://treeherder.mozilla.org/#/jobs?repo=fx-team&filter-searchStr=talos%20svg&fromchange=0ae21184dbe7&tochange=30f9c10a9e5b

and here is cart like a knight in shining armour:
https://treeherder.allizom.org/perf.html#/compare?originalProject=fx-team&originalRevision=de8bc7c996fe&newProject=fx-team&newRevision=661a531f5dee&filter=cart

looking at the subtests, almost all of them regress:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=de8bc7c996fe&newProject=fx-team&newRevision=661a531f5dee&originalSignature=eb05ab320cf3b9d062e624276d42337a95bd24bf&newSignature=eb05ab320cf3b9d062e624276d42337a95bd24bf

Mark, can you take a look at this?  it would be great to understand what is going on and resolve this one way or another early next week.
Flags: needinfo?(standard8)
Flags: needinfo?(standard8) → needinfo?(markh)
I strongly suspect this is due to importing sync earlier than necessary and can be fixed with a sprinkling of lazy imports - I'll look ASAP.
Thanks Mark!
I've a try push that I expect to solve the problem - I didn't get an email, and I'm afraid I can't find a revision which I can compare against the try push at https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=e2256e. Joel, can you help here?

remote: View your changes here:
remote:   https://hg.mozilla.org/try/rev/ac2dd96c2668
remote:   https://hg.mozilla.org/try/rev/e2256ef64c96
remote:
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2256ef64c96
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=e2256e f64c96
remote: recorded changegroup in replication log in 0.021s

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2256ef64c96
Flags: needinfo?(markh) → needinfo?(jmaher)
prior to the regression we had values posting in the 35.xx range, with a few stragglers in the 36.xx range.  This is for win7 cart e10s, I did some retriggers on the try push and I see:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2256ef64c96&filter-searchStr=svg&selectedJob=14457703

you can see the trend on inbound:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,37db67737f62fa118938ef7dddd94d24a9e8d418,1]

I had a hard time finding the base revision here, I think this was pushed on top of another try push, etc.  Either way, I don't see other increases/decreases in that specific test which aids in me having more confidence to say this push doesn't improve the regression.
Flags: needinfo?(jmaher)
I've created a new try push that leaves the button in place but completely neuters its functionality - all "entry points" for the widget early-return. Best I can tell, the regression remains, which implies to me that the regression is caused simply because another button exists and not due to what the button does. A quick look implies bug 1004430 had the same issue (with "Open e10s window" being the offending button there) but I failed to find one for any other buttons.

Gijs suggested the problem may be that the tests are performed before the widgets are initially created, so we are measuring button creation time *and* animation - but a quick look at the test implementation left me as confused as when I started. Joel, any thoughts on how to proceed here?

Try:
Thank you for your try submission. It's the best!

Results will be displayed on Treeherder as they come in:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=33f71af5e287

It looks like this submission has talos jobs. You can compare the performance of your push against a baseline revision here:
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=33f71af5e287

Once completed, builds and logs will be available at:
https://archive.mozilla.org/pub/firefox/try-builds/mhammond@skippinet.com.au-33f71af5e2876c251ff2a7cae60b78fba79c9583/
Flags: needinfo?(jmaher)
here is the compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=d5fd77678f2c&newProject=try&newRevision=33f71af5e287&showOnlyImportant=0&showOnlyConfident=1

I don't see anything changing TART numbers.  It could be as simple as a button in the UI causing this.  Do we need to modify the test or accept this regression?  I would like to hear from :avih on this.
Flags: needinfo?(jmaher) → needinfo?(avihpit)
(In reply to Joel Maher (:jmaher) from comment #7)
> here is the compare view:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-
> team&originalRevision=d5fd77678f2c&newProject=try&newRevision=33f71af5e287&sh
> owOnlyImportant=0&showOnlyConfident=1
> 
> I don't see anything changing TART numbers.  It could be as simple as a
> button in the UI causing this.  Do we need to modify the test or accept this
> regression?  I would like to hear from :avih on this.

This is a CART regression, not a TART one.

I wonder what the window size is for our CART runs and if we now have a scrollbar in customize mode where we wouldn't have done so previously, assuming the addition of this button actually created a new row (which I'm not sure it did/didn't).
(In reply to Joel Maher (:jmaher) from comment #7)
> ... Do we need to modify the test or accept this regression?

I didn't see anything at the comments above which suggests the test does something incorrectly, so for now let's keep the test as is.

Whether or not we should accept the regression, IMO not yet.


(In reply to Mark Hammond [:markh] from comment #6)
> Gijs suggested the problem may be that the tests are performed before the
> widgets are initially created, so we are measuring button creation time
> *and* animation - but a quick look at the test implementation left me as
> confused as when I started.

Could you please explain what you're confused about? The basic CART description is here: https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART , and you can ask further questions if you have some.

The CART test just enters and exits the customize mode and measures how fast we can animate. This apparently regressed, as well as the overall duration till we identify that the animation is over (300ms to 345ms for cart 1-customize-enter.error.TART at https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=de8bc7c996fe&newProject=fx-team&newRevision=661a531f5dee&originalSignature=eb05ab320cf3b9d062e624276d42337a95bd24bf&newSignature=eb05ab320cf3b9d062e624276d42337a95bd24bf ).

Before we decide to accept it, I'd like to see a minimal patch which demonstrates the regression (with try pushes etc). I.e. a patch which only adds the button, to confirms this:

> ... which implies to me that the regression is caused simply because another
> button exists and not due to what the button does.

Once we have that confirmed, we'll have better background before we decide.


(In reply to :Gijs Kruitbosch from comment #8)
> I wonder what the window size is for our CART runs and if we now have a
> scrollbar in customize mode where we wouldn't have done so previously,
> assuming the addition of this button actually created a new row (which I'm
> not sure it did/didn't).

Interesting Idea, but as far as I can tell with a new profile (though outside of talos), it adds a third icon at the 4th row (from 11 to 12 icons), so I don't think there's a new scrollbar there.
Flags: needinfo?(avihpit) → needinfo?(markh)
I added a try push that completely removed the sync button from the panel UI. As expected (and if I'm reading the results correctly) we saw around 6% improvement (I didn't retry these jobs, but this is as expected - it takes us roughly back to prior this regression.)

I then added another try push that added a new "test" button that is as small as possible - it does nothing other than show a panel with a single label. This took us back to the current status quo (ie, the regression returned) - https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=3a206e2652fc&newProject=try&newRevision=46b64e145e0c&framework=1&showOnlyImportant=0

Comparing these 2 try revisions shows that simple test button did indeed cause a 5% regression - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=cdde38063fd0&newProject=try&newRevision=46b64e145e0c&framework=1&showOnlyImportant=0

I believe this demonstrates that simply adding a new "do nothing" button caused the regression. I've no idea why this only shows up for e10s - but either way, I don't see any action I can take here.

> I didn't see anything at the comments above which suggests the test does something incorrectly, so for now let's keep the test as is.

Gijs speculated that because the test enters customize mode before showing the panel, the time for widget creation is also being captured by this test. Because users will typically enter customize mode via the panel, the widgets will have been created by the time customize mode is entered, so they will be paying that penalty at the panel's first open rather than at customize time. Thus, it might make more sense to have the test open and close the panel before starting the tests so this time isn't counted and so better reflects what the user actually sees. However, it's not clear we want to avoid measuring widget creation time completely.
Flags: needinfo?(markh) → needinfo?(avihpit)
(In reply to Mark Hammond [:markh] from comment #11)
> Gijs speculated that because the test enters customize mode before showing
> the panel, the time for widget creation is also being captured by this test.
> Because users will typically enter customize mode via the panel, the widgets
> will have been created by the time customize mode is entered, so they will
> be paying that penalty at the panel's first open rather than at customize
> time. Thus, it might make more sense to have the test open and close the
> panel before starting the tests so this time isn't counted and so better
> reflects what the user actually sees. However, it's not clear we want to
> avoid measuring widget creation time completely.

Not sure what "the panel" is, but the test does wait reasonably long after starting the browser and before the first test is executed.

Also CART, like many other tests, ignore first test value and then takes the median of the rest. So the result should not reflect an issue with the firs run.

However, one might argue that especially for CART, only the first test result is actually meaningful since typically one would not use customize more than once per session, so if the test should be improved, I'd say it would need to also include the first run (and use something other than median since median would also "ignore" outliers). Though such improvement is unlikely to make this regression go away, but rather possibly make it worse.


> I believe this demonstrates that simply adding a new "do nothing" button caused
> the regression. I've no idea why this only shows up for e10s - but either way,
> I don't see any action I can take here.

Thanks, interesting. While personally I tend to accept 5% regression on this issue, it's still important IMO to understand further why does adding a button regresses the animation in e10s.

Mike, any idea what might be affecting it? in a nutshell, apparently adding an icon to the burger menu (even without "connecting" it to any code) seems to regress CART consistently in e10s. As far as we can tell the new icon doesn't change the size of the panel (and so supposedly not adding a scrollbar).
Flags: needinfo?(avihpit) → needinfo?(mconley)
Looks like we're spending more time calculating reflows, which makes sense - we do this thing where for every toolbar item that's customizable, we remove from the DOM, wrap in a toolbarpaletteitem, and then re-insert into the DOM. By adding a new default button, we've added another item to do this operation to.

So that's, I believe, where the cost is coming from.
Flags: needinfo?(mconley)
glob, can you add the e10s tracking flag to this product?
Flags: needinfo?(glob)
Blocks: 1239084
ni'ing myself to post the actual profiles for analysis
Flags: needinfo?(mconley)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #15)
> glob, can you add the e10s tracking flag to this product?

done.
Flags: needinfo?(glob)
No longer blocks: 1239084
This looks like it's not Synced Tabs related at all, so moving to a new category.
tracking-e10s: --- → ?
Component: Firefox Sync → Toolbars and Customization
Product: Tracking → Firefox
Version: --- → unspecified
thanks all for keeping this bug moving along!
Baseline profiling with 5 runs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=72d87bba3204

Regression profiling with 5 runs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5dc4ae98955f

Keeping the needinfo to help do the analysis when these come in.
These profiles tell me spending more time doing reflow, likely in the panel menu when doing the transition. I suspect just having yet another item in the menu panel causes this.

The sync button isn't doing anything particularly special - this sort of regression will occur anytime we add anything new to the menu panel.

If we could move away from our current mechanism of transitioning (transitioning the padding, which causes reflow), and instead used something like a CSS translation to get the effect we're looking for, we could really cut down on the work required to do this effect.
Flags: needinfo?(mconley)
Unfortunately, it's not clear to me why this would affect only e10s and not non-e10s.
This isn't related to e10s other than the fact that e10s adds another button to the menu (which will be removed for release)
I actually suspect that this is being caused by the "Open new non-e10s window" button that e10s adds, which is probably pushing the new button in the panel to another row, which _maybe_ makes it scrollable, but non-the-less makes it more expensive.

Needinfo'ing myself to prove this by removing the "Open new non-e10s window" button in a talos test run.
Flags: needinfo?(mconley)
Hooray, I think I was right!

"Baseline" in the following analysis is the revision before the "Synced tabs" button / panel landed. "Regression" is the patch that landed that button / panel.

If you compare Baseline (normal) with Baseline (without the "New Non-e10s Window" button), there is no significant difference:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5b0268f08b34&newProject=try&newRevision=047c757c8c8f&framework=1

If you compare Baseline (normal) with Regression (normal), we see our ~5% regression still, as expected:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5b0268f08b34&newProject=try&newRevision=5a98ab055d03&framework=1

So that covers our expectations. Now, to test whether removing the "New Non-e10s Window" affected the Regression patch...

Here's Regression (normal) vs Regression (without the "New Non-e10s Window" button):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5a98ab055d03&newProject=try&newRevision=552188708106&framework=1

Hey, there's our 5% win! And if we compare Baseline (normal) vs Regression (without the "New Non-e10s Window" button):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5b0268f08b34&newProject=try&newRevision=552188708106&framework=1

Conclusion:

I'm highly confident that the fact that e10s windows have the "New Non-e10s Window" button by default in the panel is causing this, since the new button that the Regression patch introduced caused the customize panel to have a new row (and a vertical scroll bar after wrapping all items and showing the placeholders) while entering customize mode.

I think we can close the books on this one, especially since the "New Non-e10s Window" button is not going to ship to release.

We should keep in mind, however, that increasing the number of rows inside the customization panel will cause CART regressions. We should be mindful about what we put in there by default. Is there a good place to put information about this regression somewhere so that if folks run into something similar in the future, they'll have a good sense of what probably happened?
Flags: needinfo?(mconley) → needinfo?(jmaher)
we really should store this somewhere- how about a simple wiki style FAQ/info, maybe linked from:
https://wiki.mozilla.org/Buildbot/Talos#Resources_.2F_How-to

I would imagine a page of "common causes of regressions" and something like:
cart:
* brief summary with link to above comment.
* another brief summary if we have a different reason

tart:
* ...



I am open to other suggestions.
Flags: needinfo?(jmaher)
How about each test in https://wiki.mozilla.org/Buildbot/Talos/Tests gets a section about "Usual Suspects" or something?

Then maybe we should link to those immediately when a new Talos regression bug is filed?

I'll happily fill out this CART one (and add the "Usual Suspects"[1] section to each test if you'd like).

Let me know if you'd like me to do that. Are we good to close this one off?

[1]: Happy to bikeshed on that name.
Flags: needinfo?(jmaher)
yeah, lets leave it in the main wiki outlining what the test is.  Already we try to link to each test description in each bug (see comment 0), so this would require no extra links.

instead of usual suspects, how about "possible regression causes"

We should close this bug as wontfix, and then move on.  Do let me know how far you get with the wiki editing and I will work towards adding common info from bugs as we find fixes to the wiki.
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #29)
> yeah, lets leave it in the main wiki outlining what the test is.  Already we
> try to link to each test description in each bug (see comment 0), so this
> would require no extra links.
> 
> instead of usual suspects, how about "possible regression causes"
> 
> We should close this bug as wontfix, and then move on.  Do let me know how
> far you get with the wiki editing and I will work towards adding common info
> from bugs as we find fixes to the wiki.

Added: https://wiki.mozilla.org/Buildbot/Talos/Tests#Possible_regression_causes_26

I think we're done here.

https://media.giphy.com/media/upg0i1m4DLe5q/giphy.gif
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
https://hg.mozilla.org/integration/fx-team/rev/73ebed82822cb54d3f85319b557a69be3483677f
Bug 1230671 - add documentation to CustomizableUI for being careful about adding default panel items due to possible CART regressions. rs=jaws, DONTBUILD.
You need to log in before you can comment on or make changes to this bug.