Closed
Bug 1015572
Opened 10 years ago
Closed 10 years ago
8% linux32 tpaint regression on fx32, fx31 around May 21
Categories
(Testing :: Talos, defect)
Tracking
(firefox31- wontfix, firefox32+ wontfix, firefox33+ wontfix, firefox34+ wontfix)
People
(Reporter: jmaher, Assigned: dao)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression][dzalerts] [qa-])
Attachments
(1 file)
222.39 KB,
image/png
|
Details |
I found this from the datazilla alerts we are working on- there is no tpaint talos alert in the last month. Looking at the graph this looks real: http://graphs.mozilla.org/graph.html#tests=[[82,52,33],[82,63,33]]&sel=1400321808476,1400926608476&displayrange=7&datatype=running I have done a lot of retriggers, it is so hard with the noise to determine what is going on: https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=Ubuntu%20HW%2012.04%20mozilla-aurora%20pgo%20talos%20other&startdate=2014-05-21&enddate=2014-05-22 Looking at graph server and the retriggers the first point where we cross over the 160 line and continue to report numbers above it (for the most part) is: https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=Ubuntu%20HW%2012.04%20mozilla-aurora%20pgo%20talos%20other&startdate=2014-05-21&enddate=2014-05-22&rev=6c7ac16fbaaa There are a lot of changes there, I am not sure why we have landed so many changes at once. My thoughts are to look at Aurora and then see what landed there around the same time we saw the overall range bump up on Inbound. Cross referencing those patches we should be able to narrow this down. Good news is that we don't see this on beta, so any of these patches that landed on beta can be removed from the suspect list.
Reporter | ||
Comment 1•10 years ago
|
||
and the datazilla link: https://datazilla.mozilla.org/?start=1400267691&stop=1400868224&product=Firefox&repository=Mozilla-Aurora&os=linux&os_version=Ubuntu%2012.04&test=tpaint&graph_search=6c7ac16fbaaa,a0779ff10844&tr_id=5638063&graph=tpaint&x86=true&x86_64=false&project=talos
Reporter | ||
Comment 2•10 years ago
|
||
push some backouts to try: efe8c330b742: https://tbpl.mozilla.org/?tree=Try&rev=911a6c1770eb 39922bbe9a08: https://tbpl.mozilla.org/?tree=Try&rev=3c908cdb38c5 d71874cd9efa: https://tbpl.mozilla.org/?tree=Try&rev=00f5c426bf18 f6660863c0aa: https://tbpl.mozilla.org/?tree=Try&rev=54b9234424d8 I will see how these look tomorrow!
Reporter | ||
Comment 3•10 years ago
|
||
and the winner is: https://hg.mozilla.org/releases/mozilla-aurora/rev/f6660863c0aa I cannot access the bug, so I will just guess on who to cc. Here is more information about the test: https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint Alexandre, I see that you pushed this patch, it appears to have caused a 8% regression. Do you have any idea why this might have done it? another rant- if you are going to push code to Mozilla and make bugs private please push with your email address that you use for bugzilla so I can copy/paste your author email and contact you in bugzilla. That is a major fail! and another rant- since this bug is private I cannot link to it which makes it even more difficult to link the root cause of this. The root cause is the patch from bug 1000337. This is on Aurora, so please respond promptly.
Flags: needinfo?(lissyx+mozillians)
Summary: 8% tpaint regression on fx32, fx31 around May 21 → 8% linux32 tpaint regression on fx32, fx31 around May 21
Reporter | ||
Updated•10 years ago
|
tracking-firefox31:
--- → ?
tracking-firefox32:
--- → ?
Comment 4•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #3) > and the winner is: > https://hg.mozilla.org/releases/mozilla-aurora/rev/f6660863c0aa > > I cannot access the bug, so I will just guess on who to cc. > > Here is more information about the test: > https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint > > Alexandre, I see that you pushed this patch, it appears to have caused a 8% > regression. Do you have any idea why this might have done it? I'm a bit surprised, this is code related to the NotificationDB/NotificationStorage; how can it have an impact on paint tests ?! > > another rant- if you are going to push code to Mozilla and make bugs private > please push with your email address that you use for bugzilla so I can > copy/paste your author email and contact you in bugzilla. That is a major > fail! Sorry, I was not aware of this. > > and another rant- since this bug is private I cannot link to it which makes > it even more difficult to link the root cause of this. The root cause is > the patch from bug 1000337. It's a security bug. > > This is on Aurora, so please respond promptly.
Flags: needinfo?(lissyx+mozillians)
Reporter | ||
Comment 5•10 years ago
|
||
Thanks for the quick response. I am not sure why this is the one which caused the regression. I do know that with a series of retriggers done on the 4 backouts this was the only one that had numbers back to normal and less noise. The paint test opens a new application window 25 times, ignores the first 5 and reports the median number of the remaining 20. Could there be anything that affects new window creation?
Comment 6•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #5) > Thanks for the quick response. > > I am not sure why this is the one which caused the regression. I do know > that with a series of retriggers done on the 4 backouts this was the only > one that had numbers back to normal and less noise. > > The paint test opens a new application window 25 times, ignores the first 5 > and reports the median number of the remaining 20. Could there be anything > that affects new window creation? Does your test involves any Notification API use?
Reporter | ||
Comment 7•10 years ago
|
||
Unfortunately this test doesn't do any intentional use of the notification api. We call window.open: http://hg.mozilla.org/build/talos/file/2b78501af8d0/talos/startup_test/tpaint.html#l102 and the uri we open is a small blob of html: http://hg.mozilla.org/build/talos/file/2b78501af8d0/talos/startup_test/tpaint.html#l88
Comment 8•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #7) > Unfortunately this test doesn't do any intentional use of the notification > api. > > We call window.open: > http://hg.mozilla.org/build/talos/file/2b78501af8d0/talos/startup_test/ > tpaint.html#l102 > > and the uri we open is a small blob of html: > http://hg.mozilla.org/build/talos/file/2b78501af8d0/talos/startup_test/ > tpaint.html#l88 I really have no idea how those two things may relates :(
Flags: needinfo?(mhenretty)
Reporter | ||
Comment 9•10 years ago
|
||
interesting, I have access to the bug now and see that it landed on beta- there is no regression on beta. This could be one of two things: 1) this is the wrong patch 2) other code which is on aurora/trunk is causing this patch to show the regression
Comment 10•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #9) > interesting, I have access to the bug now and see that it landed on beta- > there is no regression on beta. This could be one of two things: > 1) this is the wrong patch > 2) other code which is on aurora/trunk is causing this patch to show the > regression My feeling would be much more on 1, but until proved otherwise, I cannot assert anything. Do you have any way to get the diff between the two trees ? Maybe Michael will be able to have an idea, also. He did most of the Notification API work.
Reporter | ||
Comment 11•10 years ago
|
||
Lets see what Michael says, there have been a few times in the past where it looks the the changeset, but it ends up not being it. It is rare, but if the code doesn't seem related at all we need to not close that door.
Comment 12•10 years ago
|
||
Looking at the changeset [1], there is nothing that should change the timing of the first paint. In the patch we added a few lines of pure JS, but no additional asynchronous calls, no additional loops, and no calls into xp/webidl. Even if the startup code from [2] did use the notification API (which it doesn't), the should be no significant additional running time added due to this patch. Joel, are there any other dependencies that could cause this regression unrelated to the changeset from bug 1000337? 1.) https://hg.mozilla.org/releases/mozilla-aurora/rev/f6660863c0aa 2.) http://hg.mozilla.org/build/talos/file/2b78501af8d0/talos/startup_test/tpaint.html#l88
Flags: needinfo?(mhenretty)
Updated•10 years ago
|
status-firefox31:
--- → affected
status-firefox32:
--- → affected
Reporter | ||
Comment 13•10 years ago
|
||
ok, I am starting to think it is more likely this push: https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=Ubuntu%20HW%2012.04%20mozilla-aurora%20pgo%20talos%20other&startdate=2014-05-21&enddate=2014-05-22&rev=5239876497e2 quite possibly from bug 1012629: https://hg.mozilla.org/releases/mozilla-aurora/rev/1684d49f5650 I will do a backout on try server and see where that gets us!
Comment 14•10 years ago
|
||
How delightful. Those patches were actually pushed to try to check perf impact and nothing showed up.
Reporter | ||
Comment 15•10 years ago
|
||
Gijs, maybe they are not the culprit, let me see how it turns out: https://tbpl.mozilla.org/?tree=Try&rev=77c6493f3987
Reporter | ||
Comment 16•10 years ago
|
||
hmm, over 10 retriggers and all are showing the old value of 148-152 (except one high point of 189). The previous work in this bug was just trying to get this down <160. Gijs, any thoughts on this?
Flags: needinfo?(gijskruitbosch+bugs)
Comment 17•10 years ago
|
||
(I'm out until Monday, unfortunately, and won't have time to look at this before then, sorry - perhaps Dão can help as it was his patch?)
Flags: needinfo?(dao)
Comment 18•10 years ago
|
||
Yeah, http://compare-talos.mattn.ca/?oldRevs=5239876497e2&newRev=77c6493f3987&server=graphs.mozilla.org&submit=true looks serious. Dão, can you look into this, please? It would be interesting, for instance, if you could trypush pgo builds with your other idea and see if that fixes our linux startup time issue. :-\ Otherwise, Joel, can you get a recent screenshot of the browser windows on those machines, so we can look at their toolbar colors and see how they would/wouldn't be affected by the patch, just so we can keep that in mind? I'm not sure where to look for something like that, but I'm hoping you know. :-)
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jmaher)
Reporter | ||
Comment 19•10 years ago
|
||
I don't have a method for getting screenshots mid test. I do have a loaner linux64 machine where I could run the test locally and get a screenshot. This regression is on linux32, but I suspect the colors would be the same. Gijs, would that be useful?
Flags: needinfo?(jmaher)
Comment 20•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #19) > I don't have a method for getting screenshots mid test. I do have a loaner > linux64 machine where I could run the test locally and get a screenshot. > This regression is on linux32, but I suspect the colors would be the same. > > Gijs, would that be useful? Anything's better than nothing, so yes please. :-)
Comment 21•10 years ago
|
||
Untracking. beta 6 should be released today and we won't block the release on this bug.
Updated•10 years ago
|
Comment 22•10 years ago
|
||
I'm marking this bug as won't fix for 32 as there's been no progress. Joel - Can you confirm that the regression is still present and as severe as when you reported this 3 months ago?
Reporter | ||
Comment 23•10 years ago
|
||
we still have this regression on beta, aurora and trunk: http://graphs.mozilla.org/graph.html#tests=[[82,52,33],[82,63,33],[82,53,33]]&sel=1376993234671,1408529234671&displayrange=365&datatype=running
Flags: needinfo?(jmaher)
Comment 24•10 years ago
|
||
Thanks for confirming Joel. Reading through the comment history, this bug was left off with : - a request to Dão to look into this (comment 18) - Joel needing to get a screenshot on a loaner linux64 machine (comment 20) We're about to ship this regression in Firefox 32. I would like to see this picked back up in the next couple of weeks to see if progress can be made on 33 while on Aurora.
tracking-firefox33:
--- → +
tracking-firefox34:
--- → +
Flags: needinfo?(jmaher)
Flags: needinfo?(dao)
Flags: firefox-backlog?
Reporter | ||
Comment 25•10 years ago
|
||
took a screen shot (vnc -> machine) while the test was running. I am not sure I am getting anything useful, but now that I have this setup, I can easily try other ones Gijs, does this help at all?
Flags: needinfo?(jmaher) → needinfo?(gijskruitbosch+bugs)
Comment 26•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #25) > Created attachment 8477397 [details] > tpaint.png > > took a screen shot (vnc -> machine) while the test was running. I am not > sure I am getting anything useful, but now that I have this setup, I can > easily try other ones > > Gijs, does this help at all? I mean, from the looks of it we'd find a dark toolbar for some cases (i.e. tab bar, menubar (assuming we're not ignoring collapsed toolbars?)) and not for other cases (i.e. nav-bar). I still think Dão is the best person to investigate here, as it was his patch. I'm not sure why the needinfo was cleared earlier.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(dao)
Updated•10 years ago
|
Flags: firefox-backlog? → firefox-backlog+
Updated•10 years ago
|
Whiteboard: [talos_regression][dzalerts] → [talos_regression][dzalerts] [qa?]
Updated•10 years ago
|
Whiteboard: [talos_regression][dzalerts] [qa?] → [talos_regression][dzalerts] [qa-]
Assignee | ||
Comment 27•10 years ago
|
||
Last week I sent a few diagnosis patches to try using this syntax: try: -b o -p linux,linux64 -u none -t other but no talos jobs were run. :( If somebody can spot a mistake in the syntax or knows what else could have gone wrong, please let me know.
Reporter | ||
Comment 28•10 years ago
|
||
for linux specifically, you need: -t other_l64
Assignee | ||
Comment 29•10 years ago
|
||
Thanks, I'll try again. Why does http://trychooser.pub.build.mozilla.org/ not take care of this?
Reporter | ||
Comment 30•10 years ago
|
||
an oversight on my part, I have a bug to work on now: https://bugzilla.mozilla.org/show_bug.cgi?id=1059689
Assignee | ||
Comment 31•10 years ago
|
||
So, how would I get tpaint runs on linux32 as opposed to x64?
Reporter | ||
Comment 32•10 years ago
|
||
try: -b o -p linux -u none -t all if we get trychooser fixed, it could be '-t other_nol64', maybe that works, I am not sure without attempting it though.
Updated•10 years ago
|
Points: --- → 8
Updated•10 years ago
|
Assignee: nobody → dao
Status: NEW → ASSIGNED
Iteration: --- → 35.1
Assignee | ||
Comment 33•10 years ago
|
||
The patch in bug 1061947 should cut off about half of the regression. Filed as a separate bug because I'll want to land and uplift that patch now while also investigating the other half of the regression, which might lead to more patches.
Flags: needinfo?(dao)
Assignee | ||
Comment 34•10 years ago
|
||
I think this will have to stay half fixed/wontfix. Like I said, bug 1061947 eliminated some inefficiency, but the rest of the regression appears to be inevitable because we're doing more work on startup. We could delay this until after the first paint, but then users would see the wrong icons being used for that first paint. I've nominated bug 1061947's patch for aurora and beta uplift.
Updated•10 years ago
|
Iteration: 35.1 → ---
Flags: firefox-backlog+ → firefox-backlog-
You need to log in
before you can comment on or make changes to this bug.
Description
•