Closed Bug 1218779 Opened 9 years ago Closed 9 years ago

150% android tcheck2 regression on Mozilla-Inbound (v.44) on Oct 26, 2015 from push 51036998fbc8

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

RESOLVED WONTFIX
Tracking Status
firefox44 --- affected

People

(Reporter: jmaher, Unassigned)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][gfx-noted])

Talos has detected a Firefox performance regression from your commit cae63161b689624cf55c4391c42b8e7683a72654 in bug 1217617.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=cae63161b689624cf55c4391c42b8e7683a72654&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#robocop

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p android-api-11 -u none -t remote-trobocheck2  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tcheck2

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Friday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
you can see the regression on a graph:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=[mozilla-inbound,fdad6ae27544b0dd52113fce3184968100190e76,1]

and in compare mode:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=67770106a029&newProject=mozilla-inbound&newRevision=cae63161b689&originalSignature=fdad6ae27544b0dd52113fce3184968100190e76&newSignature=fdad6ae27544b0dd52113fce3184968100190e76

Overall, this test is almost deprecated due to upcoming APZ changes, but this is a pretty major regression.

:kats, is this something we should look into? If so, we can see if Yury has ideas on what could be the cause or what we could change.
Flags: needinfo?(bugmail.mozilla)
The change in bug 1217617 is needed to keep the C++ code in a correct state; we were just lucky not to have hit this issue in the past. There is a huge chance that the next contributor to the netwerk/base/ directory will encounter the same issue, and it would be a waste of that contributor's time to track down the real cause of the problem.

AFAIK this issue affects only the Android platform, so there may be a problem with how the C++ build infrastructure works with it. I recommend also seeking help from people who have dealt with the Android build system and/or UNIFIED_SOURCES problems in the past.
Sorry, this regression doesn't make any kind of sense. The changeset that is being blamed here is a compile fix and should have no functional effect.
Flags: needinfo?(bugmail.mozilla)
(In reply to Joel Maher (:jmaher) from comment #1)
> and in compare mode:
> https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=67770106a029&newProject=mozilla-inbound&newRevision=cae63161b689&originalSignature=fdad6ae27544b0dd52113fce3184968100190e76&newSignature=fdad6ae27544b0dd52113fce3184968100190e76

This range has two changesets, yet only one is listed in comment 0. Also, this link shows only a 72% regression; where did the 325% come from?
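For readers following the numbers: a regression percentage depends on which baseline push the new score is compared against, so two compare ranges can report different figures for the same new push. The sketch below only illustrates that arithmetic. The scores are hypothetical, not taken from this bug's Talos data, and `percent_change` is a helper name invented for the example rather than anything Perfherder provides.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical scores, chosen only to keep the arithmetic simple.
new_score = 10.0
older_baseline = 4.0   # assumed score a few pushes earlier
nearer_baseline = 8.0  # assumed score on the immediately preceding push

# Same new score, two different baselines, two different percentages.
print(percent_change(older_baseline, new_score))
print(percent_change(nearer_baseline, new_score))
```

Whether a positive change counts as a regression depends on whether the test is lower-is-better or higher-is-better, which this sketch does not try to model.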
From graphs.m.o [1] (it's still way better than perfherder for this sort of thing), it's obvious that the regressing changeset is actually the one from bug 1210351, which makes a lot more sense.

[1] http://graphs.mozilla.org/graph.html#tests=[[201,63,29]]&sel=1445797128356.3076,1445856279125.5386,0.40095490885805063,15.10262555324946&displayrange=7&datatype=geo
Probably unrelated, but I received a different notification about "(Improvement) Mozilla-Inbound - tscroll-ASAP - MacOSX 10.10 - 5.25%", and in that case the changeset range includes more than 2 commits:

http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=67770106a029c671f6af7a6b9653eedef3ef08aa&tochange=51036998fbc8f650a2e8d6800b53ed58ab7887ad

Why is inbound measured twice with the same changeset?
Also, comment 0 links to [1], which has a changeset link to [2], which is a different set of patches from the "compare mode" link in comment 1.

Which patches are really involved here?

And if bug 1196654 is the one suspected here, why was I cc'ed but not the patch author or reviewer? Though I really don't think 1196654 is at fault here given that all of the code touched there is B2G specific.

[1] http://alertmanager.allizom.org:8080/alerts.html?rev=cae63161b689624cf55c4391c42b8e7683a72654&showAll=1
[2] http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=67770106a029c671f6af7a6b9653eedef3ef08aa&tochange=cae63
Ah, I hit the wrong push. Yes, bug 1210351 is the culprit; bad alert verification! In general, I find the suspect revision and cc the author and reviewers. The range I mistakenly picked earlier contained 2 pushes: one was a test-only change, the other was what I called the root cause.
Blocks: 1210351
No longer blocks: 1217617
and here is the compare view:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=11b00fe66b65&newProject=mozilla-inbound&newRevision=51036998fbc8&originalSignature=fdad6ae27544b0dd52113fce3184968100190e76&newSignature=fdad6ae27544b0dd52113fce3184968100190e76
Summary: 325% anroid tcheck2 regression on Mozilla-Inbound (v.44) on Oct 26, 2015 from push cae63161b689 → 150% android tcheck2 regression on Mozilla-Inbound (v.44) on Oct 26, 2015 from push 51036998fbc8
Component: Networking → Graphics
I believe this patch should stay. It partially undoes a fairly recent commit (https://hg.mozilla.org/mozilla-central/rev/9fea88097171). So this is only a regression from nightly, and we should still be performing better than on any release.

And most importantly the gigantic tile sizes are using *a lot* of memory, and we're currently having major issues with memory consumption on some android devices (bug 1165951, bug 1164027, and others). They're not the root cause of these crashes but they're certainly not helping.
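As a rough illustration of why tile size matters for memory: tiles are allocated whole, so a tile that only partly overlaps the painted area still costs its full buffer. The sketch below assumes 4 bytes per pixel and uses a hypothetical 1280x800 area with hypothetical 256x256 and 1024x1024 tiles. None of these numbers come from this bug or from the Gecko source, and the function names are invented for the example.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA tile buffers


def tile_count(area_w: int, area_h: int, tile_w: int, tile_h: int) -> int:
    """Number of whole tiles needed to cover the area (ceiling division)."""
    cols = -(-area_w // tile_w)
    rows = -(-area_h // tile_h)
    return cols * rows


def tiled_bytes(area_w: int, area_h: int, tile_w: int, tile_h: int) -> int:
    """Bytes allocated when the area is covered with whole tiles."""
    per_tile = tile_w * tile_h * BYTES_PER_PIXEL
    return tile_count(area_w, area_h, tile_w, tile_h) * per_tile


# Hypothetical painted area and tile sizes, for illustration only.
area = (1280, 800)
exact = area[0] * area[1] * BYTES_PER_PIXEL
small = tiled_bytes(*area, 256, 256)
large = tiled_bytes(*area, 1024, 1024)

print(f"exact area:      {exact / 2**20:.2f} MiB")
print(f"256x256 tiles:   {small / 2**20:.2f} MiB")
print(f"1024x1024 tiles: {large / 2**20:.2f} MiB")
```

Under these assumptions the larger tiles allocate noticeably more than the area strictly needs, and that overhead is multiplied by every tiled layer being kept alive.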
Thanks Jamie! That sounds logical and like a good tradeoff. I will see if anyone else chimes in with concerns about this.
Whiteboard: [talos_regression] → [talos_regression][gfx-noted]
shall we close this as wontfix?
Flags: needinfo?(jnicol)
I would say so, yes
Flags: needinfo?(jnicol)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX