Closed Bug 1229518 Opened 9 years ago Closed 8 years ago

4.4-6.4% Tart regression on Inbound (v.45) seen Dec 1, 2015 on revision c9bc8f46ac3e672aa768912361c17a661b4e8b5e

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])

Talos has detected a Firefox performance regression from your commit c9bc8f46ac3e672aa768912361c17a661b4e8b5e.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=c9bc8f46ac3e672aa768912361c17a661b4e8b5e&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win64,win32 -u none -t svgr  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tart


Our wiki page oulines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
if changeset c9bc8f46ac3e672aa768912361c17a661b4e8b5e is known-bad, then bug 1197307 can't be implicated, because it was backed out before that commit (for causing build bustage).

--> Removing bug 1197307 from "Blocks" list.
No longer blocks: 1197307
Bug 1229518 does seem likely-relevant, though, since this is a tab-animation performance regression, and bug 1229518 was about changing our animation code.
this bug wasted an hour of my time thanks to a broken build :(  I still don't have an answer, but tomorrow morning and 15 more minutes of my time I will have the culprit figured out.  "mach build" is awesome, all developers should use this BEFORE LANDING.  /rant

we have tart regressions across the board (still waiting on osx results) but windows* on e10s and regular:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=76c20ad43874&newProject=mozilla-inbound&newRevision=c9bc8f46ac3e&filter=tart&showOnlyConfident=1

looking at the subtests, this is almost every subtest with a regression.

to help narrow this down, I have pushed to try:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=5757b52f01ad&tochange=cb04bcfb821c

tomorrow we will have results and the patch in question should be uncovered.  I have cc'd all patch authors and reviewers who have bugs in this range- in case someone chimes in and says "hey, I was expecting a tart regression, it is my patch".  Otherwise tomorrow, I will get the correct component, bugs, cc list sorted out.
thanks dholbert, you are fast!  I have left the cc list alone, and it makes sense that bug 1197307 wouldn't be related as it was backed out!
(In reply to Daniel Holbert [:dholbert] from comment #2)
> Bug 1229518 does seem likely-relevant, though, since this is a tab-animation
> performance regression, and bug 1229518 was about changing our animation
> code.

Did you mean a different bug?  Bug 1229518 is this bug.
Flags: needinfo?(dholbert)
(In reply to Joel Maher (:jmaher) from comment #3)
> this bug wasted an hour of my time thanks to a broken build :(  I still
> don't have an answer, but tomorrow morning and 15 more minutes of my time I
> will have the culprit figured out.  "mach build" is awesome, all developers
> should use this BEFORE LANDING.  /rant

(The developer in question said he did use mach build, but he's a new contributor, and didn't have enable-debug in his mozconfig, so the debug-only code ended up being accidentally horked without him noticing.)

(In reply to Nathan Froyd [:froydnj] from comment #5)
> Did you mean a different bug?  Bug 1229518 is this bug.

Yes, I meant to say bug 1219236.
Flags: needinfo?(dholbert)
yeah, we all make mistakes (I have broken the build a few times), in looking at the builds all opt/debug/pgo builds are failing- so this isn't just a debug only issue.  Lets hope my try push did the backouts correctly- either way we are moving forward!
Removing bug 1214377 from "Blocks" list.

In bug 1214377 some code inside a #ifdef MOZ_THUNDERBIRD ... #endif block was changed.
No code that is compiled for Firefox changed. So therefore, this cannot cause any change in behaviour.

http://alertmanager.allizom.org:8080/alerts.html?rev=c9bc8f46ac3e672aa768912361c17a661b4e8b5e&showAll=1&testIndex=0&platIndex=0
points to
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=76c20ad438741358103f3a09c3faf19fbf695e6e&tochange=c9bc8f46ac3e672aa768912361c17a661b4e8b5e

Maybe it's caused by bug 1219236.
(In reply to Joel Maher (:jmaher) from comment #7)
> yeah, we all make mistakes (I have broken the build a few times), in looking
> at the builds all opt/debug/pgo builds are failing- so this isn't just a
> debug only issue.

(Yup, the new contributor didn't have --enable-warnings-as-errors locally, either, and his patch introduced a build warning, which treeherder promoted to a build error, so that's why opt builds were busted as well. Really, we should have done an additional Try run before adding checkin-needed; my bad for missing that. Anyway, we now return you to your regularly-scheduled discussion of this perf regression.)
awesome!  Thanks for the reply.  :hiro, thought I would go ahead and needinfo you here if you think your patch could affect the tart test "./mach talos-test -a tart", no need to reply right away until we confirm via try.
Flags: needinfo?(hiikezoe)
Daniel is right. I am confident that this regression is caused by the patch for bug 1219236. But it's an inevitable result because restyling for animations in descendant documents and *the document itself* which receives events has not been processed correctly before the patch.

Rest of fixes for bug 1219236 and 1179535 will reduce the restyling(paint) process in some cases, but even if all of these bugs is fixed, talos regression will be caused.

Joel, what should we do in this case? Should we land all of patches at the same time?
Flags: needinfo?(hiikezoe) → needinfo?(jmaher)
No longer blocks: 1202087, 1214377, 1215774, 1218027
Component: Talos → CSS Parsing and Computation
Flags: needinfo?(jmaher)
Product: Testing → Core
try has confirmed bug 1219236.  We have an uplift occurring in <2 weeks, will we have all the other patches landed then?  Could we test these patches on try to determine if it reduces the tart numbers:
try: -b o -p win32 -u none -t svgr[Windows 7]

Ideally the goal is to understand why this is happening and then make a decision as to whether or not:
* we need to live with this as the benefits outweigh the regression
* we can fix this fully or partially in timeline X
* this is causing too much trouble and is not needed and we should back it out

My goal is to make this data available bring awareness to this.  Ideally if this can be resolved one way or another prior to the uplift that would be great!  The concern is after this week most people will be in Orlando which will distract from progress.  Immediately following Orlando will be an uplift on Dec 14th.
this regression is now seen on mozilla-aurora for win7, win7-e10s, and win8.

:hiro, what is the expected time to land the related patches for this?
Flags: needinfo?(hiikezoe)
The related patches now depends on a big bug (bug 1230040).  I don't think that bug will land soon.  We should back the change for bug 1219236 out from mozilla-aurora.
Joel, do you mind backing the change out from mozilla-aurora on mybehalf?
Thank you,
Flags: needinfo?(hiikezoe)
I will be glad to back out the two patches from bug 1219236 and have them queued up.  Currently mozilla-aurora is closed and has been for over a week!  Just to double check, do we want to backout and fix this in trunk (v.46), or let this slide for v.45 on aurora and known that the following release will have the fix?

Looking forward to the real fix!
Thanks, Joel! I think we should leave them in trunk as it is, we should back them out from v.45. The real fix will be coming in v.46. I hope!
backout on aurora only:
* https://hg.mozilla.org/releases/mozilla-aurora/rev/146a3bd10420
* https://hg.mozilla.org/releases/mozilla-aurora/rev/1e98cb16806b

looking forward to the fixes on trunk in the coming weeks!
(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> Daniel is right. I am confident that this regression is caused by the patch
> for bug 1219236. But it's an inevitable result because restyling for
> animations in descendant documents and *the document itself* which receives
> events has not been processed correctly before the patch.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #14)
> The related patches now depends on a big bug (bug 1230040).  I don't think
> that bug will land soon.

Not sure I quite get it. On one hand you state that the regression should be acceptable because it's inevitable, but OTOH you wanted to back it out on Aurora.

Is it in effect only causing regression but without any real benefits before bug 1230040 is resolved? If not, and there's value also in bug 1219236 regardless of bug 1230040, then I don't quite get why it should have been backed out on Aurora...

(sorry for not asking it earlier, but better late than never, and it would provide some potentially useful info regardless if it stays backed out or not)
Flags: needinfo?(hiikezoe)
(In reply to Avi Halachmi (:avih) from comment #18)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> > Daniel is right. I am confident that this regression is caused by the patch
> > for bug 1219236. But it's an inevitable result because restyling for
> > animations in descendant documents and *the document itself* which receives
> > events has not been processed correctly before the patch.
> 
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #14)
> > The related patches now depends on a big bug (bug 1230040).  I don't think
> > that bug will land soon.
> 
> Not sure I quite get it. On one hand you state that the regression should be
> acceptable because it's inevitable, but OTOH you wanted to back it out on
> Aurora.
> 
> Is it in effect only causing regression but without any real benefits before
> bug 1230040 is resolved? If not, and there's value also in bug 1219236
> regardless of bug 1230040, then I don't quite get why it should have been
> backed out on Aurora...

That's because v45 will be an ESR release.  I don't think the real fix (I mean it will be less performance regression) can be uplifted to the ESR because the real fix will depend on some other bunch of patches.
Moreover as far as I know, the problem which is fixed by the commit <https://hg.mozilla.org/mozilla-central/rev/58b0f5d557a6> has never reported.  So I think nobody notices it.  I just noticed the problem when I tried to fix bug 1219236 and the commit is also necessary to fix bug 1219236.

I hope this answers your question.
Flags: needinfo?(hiikezoe)
as per https://bugzilla.mozilla.org/show_bug.cgi?id=1219236#c68, this is now fixed!  I verified the improvement on all windows tart tests on the graph and am marking this as resolved fixed!

Thanks :hiro :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Joel, I did forget to say "thank you" here.  If I did verify the improvement, it would take a lot of time!
Thanks for the verification!
no problem- I enjoy working with you and do what I can do best to help out the process!
You need to log in before you can comment on or make changes to this bug.