Closed Bug 1161593 Opened 4 years ago Closed 4 years ago

3-8% Win7/linux32 tpaint regression on Mozilla-Inbound-Non-PGO on May 03, 2015 from push 1e8d30cb367e

Categories

(Testing :: Talos, defect, P2)

defect

Tracking

(firefox38.0.5+ wontfix, firefox39+ wontfix, firefox40 unaffected, firefox41 unaffected)

RESOLVED WORKSFORME
Tracking Status
firefox38.0.5 + wontfix
firefox39 + wontfix
firefox40 --- unaffected
firefox41 --- unaffected

People

(Reporter: vaibhav1994, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Talos has detected a Firefox performance regression from your commit 1e8d30cb367e  in bug 1159744.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=81650de20da1&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t other  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tpaint

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Friday, or the offending patch will be backed out! ***

Our wiki page oulines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Hi Jared, I have re-triggered some jobs in 
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=6fa4bb578765&tochange=b8af7eb3f513&filter-searchStr=Windows%207%2032-bit%20mozilla-inbound%20talos%20other

and tpaint test in Windows 7 opt seems to regress on your patch in  81650de20da1. Could you look into what may be possibly causing this? One way to solve this would be to make 2 pushes to try, one with the patch and one without. Thanks
Flags: needinfo?(jaws)
It's not clear to me that this regression is real. The values before and after are not significantly outside the usual range, and the regression is a small fraction of the usual noise in the test. 

The feature that landed is pref'd off, and effectively doesn't execute any code other than having 2 new <script> loads in browser.xul. So if this is measuring anything, it's likely that, and it isn't avoidable here.

Also note this test has steadily risen by 30% over the last year, so there's a baseline upwards trend to the data anyway.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
:dolske,  this is a real regression- it isn't random noise.  the old range (noise) of values is less than the new range of values here.

This seems like weak justification to close a bug as invalid- if we have new <script> tags in browser.xul, then there is a chance of a perf regression.  Do we need these?  Can we optimize these?  If it is unavoidable that is just fine- please don't mark something as invalid because you think there is noise in the results.
Status: RESOLVED → REOPENED
Flags: needinfo?(jaws) → needinfo?(dolske)
Resolution: INVALID → ---
(In reply to Justin Dolske [:Dolske] from comment #2)

> The feature that landed is pref'd off, and effectively doesn't execute any
> code other than having 2 new <script> loads in browser.xul. So if this is
> measuring anything, it's likely that, and it isn't avoidable here.

Couldn't these 2 scripts be loaded only once the Pocket panel is shown?
:dolske, I would like to back this out today, please let me know if I shouldn't and indicate what the plan is to fix this with a timeline.
Backing this out today is not acceptable.
Flags: needinfo?(dolske)
:vladan, I believe you are travelling, but this is an issue you expressed interest in earlier this week.  Could you weigh in here.  We can make exceptions as needed.
Flags: needinfo?(vdjeric)
Tracking as it is a regression.
this is seen on linux as well, updating title to include that.

From chatting on irc with :gavin, this will be shipped in firefox 38 even if we have not fixed or reduced the perf regression- we are adding a work item to address this in the next release.

If we don't fix this in firefox 39, I will be happy to mark this bug officially as 'wontfix'.

Gavin, can you add context about who will be a contact for looking at this and a general timeline of when that will happen.
Flags: needinfo?(gavin.sharp)
Summary: 3.6% Win7 tpaint regression on Mozilla-Inbound-Non-PGO on May 03, 2015 from push 1e8d30cb367e → 3-8% Win7/linux32 tpaint regression on Mozilla-Inbound-Non-PGO on May 03, 2015 from push 1e8d30cb367e
Attached patch Patch (obsolete) — Splinter Review
This should fix the regression by only loading the scripts when the widget is created (which post-enabling of the feature should still delay the loading of the scripts enough to not interfere with window creation).

Note that this patch also includes some clean-up in the onLocationChange handler. This needed to get tweaked to check for a null window.pktApi, and while I was there I fixed up the usage of the widgetGroupWrapper.
Assignee: nobody → jaws
Status: REOPENED → ASSIGNED
Attachment #8602932 - Flags: review?(gijskruitbosch+bugs)
Attachment #8602932 - Flags: review?(dolske)
Attached patch Patch (qref'd) (obsolete) — Splinter Review
(of course, it would help if I qrefreshed the patch before uploading)
Attachment #8602932 - Attachment is obsolete: true
Attachment #8602932 - Flags: review?(gijskruitbosch+bugs)
Attachment #8602932 - Flags: review?(dolske)
Attachment #8602933 - Flags: review?(gijskruitbosch+bugs)
Attachment #8602933 - Flags: review?(dolske)
oh, those don't have -t in them.  The never ending saga of getting the right try syntax.  I have pushed to try with the patch a bit ago:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=73d544b27a68

Next I will get a compare view...
this is a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=98d3519ab098&newProject=try&newRevision=73d544b27a68

a baseline from Friday on inbound.  Since pocket landed after last Friday, then this would be pre-regression.  If we want to compare to recent times, I pushed a talos push yesterday afternoon and we can show a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=beb62e8bce0d&newProject=try&newRevision=73d544b27a68

Of course we are still waiting on data- probably in the next hour.
Comment on attachment 8602933 [details] [diff] [review]
Patch (qref'd)

Review of attachment 8602933 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me assuming this actually fixes the perf issue.
Attachment #8602933 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8602933 [details] [diff] [review]
Patch (qref'd)

Review of attachment 8602933 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/customizableui/CustomizableWidgets.jsm
@@ +1090,5 @@
>        type: "view",
>        viewId: "PanelUI-pocketView",
>        label: PocketBundle.GetStringFromName("pocket-button.label"),
>        tooltiptext: PocketBundle.GetStringFromName("pocket-button.tooltiptext"),
> +      onBeforeCreated: Pocket.onBeforeCreated,

This is going to be called multiple times if the button is removed/readded to chrome, right?

::: browser/components/pocket/Pocket.jsm
@@ +19,5 @@
> +    let subscriptLoader = Cc["@mozilla.org/moz/jssubscript-loader;1"]
> +                            .getService(Ci.mozIJSSubScriptLoader);
> +    let win = doc.defaultView;
> +    subscriptLoader.loadSubScript("chrome://browser/content/pocket/pktApi.js", win);
> +    subscriptLoader.loadSubScript("chrome://browser/content/pocket/main.js", win);

Will be interesting to see if this is actually an improvement, since it would seem we're still loading the scripts fairly early, as the button is created.

@@ +57,4 @@
>        return;
>      }
> +    let widgetGroupWrapper = CustomizableUI.getWidget("pocket-button");
> +    let widget = widgetGroupWrapper.forWindow(win);

Won't |widgetGroupWrapper| be null when the button is removed or not present? Or |widget| in similar cases?
Enabling the button created a 20% ts paint regression on non-PGO windows. :-(
Attachment #8602933 - Flags: review?(dolske)
Attachment #8602933 - Flags: review-
Attachment #8602933 - Flags: review+
So there are still a few regressions left? What's causing tsvgx to regress 12.56% on winxp opt and tart to regress 6.09% on win8-64 opt in that compare-talos link?
Flags: needinfo?(vdjeric)
This patch can be switched to trigger off of browser-delayed-startup-finished instead of the timeout here.
Discussed with Jared, going to take over this patch.
Assignee: jaws → gijskruitbosch+bugs
Flags: needinfo?(gavin.sharp)
Blocks: 1163512
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #23)
> This patch can be switched to trigger off of
> browser-delayed-startup-finished instead of the timeout here.

As a sort of side note, and while I'm not fully aware of when browser-delayed-startup-finished actually happens, our ultimate goal with the tpaint test is to make sure, as much as we can, that the user experience during Firefox initialization doesn't get hurt - typically to let us know if we've just made the startup take longer than before.

While the first paint is indeed some indication for the startup duration, Firefox is typically unresponsive even after first paint and until things "settle down" and then the user can actually start interacting with it.

My fear with such solutions for tpaint regressions is that we don't really improve the user experience, but rather just make the test not see the extra delay.

The issue is that right now we don't really have tests to measure when Firefox is actually usable for the user, and we do have the tpaint test, so people end up "optimizing" for this test, which is the best we can expect, I guess.

Still, I think that as much as possible, things should get initialized lazily on first use rather then triggered by the browser startup - which ultimately does hurt the startup user experience.

Ant thoughts?
(In reply to Avi Halachmi (:avih) from comment #25)
> ...
> Ant thoughts?

Hmm.. ants.. let's make that "any" :)
I see this on aurora, as a 4.74% win8 regression, not sure what other regressions are related- just working through the list.
(In reply to Joel Maher (:jmaher) from comment #27)
> I see this on aurora, as a 4.74% win8 regression, not sure what other
> regressions are related- just working through the list.

If you mean that the aurora uplift displayed a 4.74% win8 regression, that in and of itself is not necessarily due to this bug - perf changes slowly during a cycle generally - and probably shouldn't be used as evidence without verification with trypushes.

(In reply to Avi Halachmi (:avih) from comment #25)
> (In reply to Jared Wein [:jaws] (please needinfo? me) from comment #23)
> > This patch can be switched to trigger off of
> > browser-delayed-startup-finished instead of the timeout here.
> 
> As a sort of side note, and while I'm not fully aware of when
> browser-delayed-startup-finished actually happens, our ultimate goal with
> the tpaint test is to make sure, as much as we can, that the user experience
> during Firefox initialization doesn't get hurt - typically to let us know if
> we've just made the startup take longer than before.

This bug is about tpaint, not ts_paint - tpaint opens windows, it does not start a new browser, that's what ts_paint is for.
Depends on: 1164940
Depends on: 1164942
Priority: -- → P2
Going to step away from this one to let fresh eyes have a look. There's a summary in bug 1163512 comment 21. There may or may not be things left to do in this bug considering the patches that have landed already.
Assignee: gijskruitbosch+bugs → nobody
Status: ASSIGNED → NEW
Blocks: Pocket
Blocks: 1168408
Too late to do anything for 38.0.5 but tracking for 39 in case we have a fix.
The investigation in bug 1163512 found that these regressions only exist on Firefox 38.0.5 and Firefox 39. They are fixed in Firefox 40 and onward.

With respect to tpaint, the reported regression in this bug is from 3-8%. In Firefox 40 and ownward, this regression should be reduced but the reduction is within the margin of error range[1]. More data can be found at [2]. There are some costs with placing another button in the toolbar that are unavoidable. I have stopped investigating this regression at this point.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6944212d73b5&newProject=try&newRevision=c300f6d88f07
[2] https://docs.google.com/spreadsheets/d/1-s4-VKWt-__8BM8XIO5f78BYKsPWA8s0GG6Zc5ydoUQ/edit?usp=sharing
Status: NEW → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → WORKSFORME
Jaws, dolske, just to be clear, are you wontfixing this for 39?
Flags: needinfo?(jaws)
Flags: needinfo?(dolske)
(In reply to Liz Henry (:lizzard) from comment #32)
> Jaws, dolske, just to be clear, are you wontfixing this for 39?

Yes, that's correct.
Flags: needinfo?(jaws)
Flags: needinfo?(dolske)
You need to log in before you can comment on or make changes to this bug.