1371214 - 18.77% Resident Memory (android-api-15-gradle) regression on push 1c66da2b1e88e56a1b4de45a0721808eacb16338 (Wed Jun 7 2017)

Reporter

Description

•

7 years ago

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=1c66da2b1e88e56a1b4de45a0721808eacb16338

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 19%  Resident Memory summary android-api-15-gradle opt      179,889,816.92 -> 213,650,557.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7112

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

7 years ago

:smaug, I see you authored bug 1367905, could you take a look at this regression and help figure out if this is expected and what we can do if anything to reduce or remove the regression?

Component: Untriaged → XPCOM

Flags: needinfo?(bugs)

Product: Firefox → Core

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

7 years ago

as a note, on the perfherder graph you can see the regression, then improvement on the backout.  Now that the patch landed again we see the regression sustained.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 3

•

7 years ago

odd, I see this is the mochitest-chrome-2 job, not awsy- the data is posted to my awsy dashboard.

It appears this was added as awsy_lite 11 months ago for android in bug 1233220- in case that helps understand what we are testing.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 4

•

7 years ago

Changing CC/GC timings affects easily to memory reporting.

When is the measurement done? After running what kinds of tests?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 5

•

7 years ago

ah, looking bug 1233220

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

7 years ago

How do I see which of the numbers regressed? The test does several checkpoints, and I could easily see some of those being affected (because we're trying to not GC/CC while browser is busy with other stuff).

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 7

•

7 years ago

I am not familiar with the script, possibly :erahm could help?

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 8

•

7 years ago

Geoff wrote that test, I'm not particularly familiar with android testing. The test follows roughly the same logic as the standard AWSY test but uses slightly different methods.

Looking at the sub-tests we see startup regressed the most, even after a 30 second delay which seems to imply fennec never gets idle enough after startup to run GC/CC:

> Med confidence:
> RSS After tabs [+30s, forced GC] opt   204,187,648.00 ± 1.69%   <   214,956,032.00 ± 1.87%  5.27%   
> RSS After tabs [+30s] opt              206,264,320.00 ± 1.28%   <   214,933,504.00 ± 1.88%  4.20%   
> RSS After tabs opt                     208,576,512.00 ± 0.74%   <   217,415,680.00 ± 1.99%  4.24%   
> 
> High confidence:
> **RSS Fresh start [+30s] opt           153,581,568.00 ± 1.69%   <   231,287,808.00 ± 0.85%  50.60%  
> **RSS Fresh start opt                  179,819,520.00 ± 5.05%   <   289,598,464.00 ± 1.21%  61.05%  
> RSS Tabs closed [+30s, forced GC] opt  163,103,744.00 ± 1.27%   <   179,809,280.00 ± 1.72%  10.24%  
> RSS Tabs closed [+30s] opt             163,103,744.00 ± 1.27%   <   179,809,280.00 ± 1.72%  10.24%  
> RSS Tabs closed opt                    190,893,056.00 ± 1.28%   <   202,011,648.00 ± 1.39%  5.82%   

That we're seeing regressions after "forcing" a GC [1] is interesting as well. smaug, how legit does that function look? Did you changes make explicitly running GC no longer work?

[1] http://searchfox.org/mozilla-central/source/mobile/android/tests/browser/chrome/test_awsy_lite.html#66-105

Flags: needinfo?(erahm) → needinfo?(gbrown)

Eric Rahm [:erahm]

Comment 9

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> How do I see which of the numbers regressed? The test does several
> checkpoints, and I could easily see some of those being affected (because
> we're trying to not GC/CC while browser is busy with other stuff).

Hover over the result name in the alert and then click 'subtests'.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 10

•

7 years ago

Well, we do run GC/CC also using timers if there isn't idle time.

I didn't change the code which is used to explicitly trigger GC.
Can RSS stay higher if we at some point used more memory and then it got fragmented?

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

7 years ago

Where do I see those individual test numbers?
Treeherder shows just "Resident Memory summary opt: 211770140"

Flags: needinfo?(erahm)

Geoff Brown [:gbrown]

Comment 12

•

7 years ago

There's more info in the test log, if that's convenient:

https://public-artifacts.taskcluster.net/SHEUkTJpTs-X2OyGbSp-bg/0/public/logs/live_backing.log

09:25:08     INFO -  101 INFO TEST-START | mobile/android/tests/browser/chrome/test_awsy_lite.html
09:25:08     INFO -  102 INFO Fresh start | Resident Memory: 290934784
09:25:52     INFO -  103 INFO Fresh start [+30s] | Resident Memory: 231944192
09:25:52     INFO -  104 INFO opening tab with url [http://mochi.test:8888/chrome/mobile/android/tests/browser/chrome/tp5/baidu.com/www.baidu.com/s@wd=mozilla.html]
09:26:02     INFO -  105 INFO opening tab with url [http://mochi.test:8888/chrome/mobile/android/tests/browser/chrome/tp5/twitter.com/twitter.com/ICHCheezburger.html]
09:26:24     INFO -  106 INFO opening tab with url [http://mochi.test:8888/chrome/mobile/android/tests/browser/chrome/tp5/msn.com/www.msn.com/index.html]
09:26:46     INFO -  107 INFO opening tab with url [http://mochi.test:8888/chrome/mobile/android/tests/browser/chrome/tp5/163.com/www.163.com/index.html]
09:27:51     INFO -  108 INFO opening tab with url [http://mochi.test:8888/chrome/mobile/android/tests/browser/chrome/tp5/bbc.co.uk/www.bbc.co.uk/news/index.html]
09:28:13     INFO -  109 INFO After tabs | Resident Memory: 221622272
09:28:44     INFO -  110 INFO After tabs [+30s] | Resident Memory: 218329088
09:28:44     INFO -  111 INFO After tabs [+30s, forced GC] | Resident Memory: 218345472
09:28:44     INFO -  112 INFO Tabs closed | Resident Memory: 203350016
09:29:17     INFO -  113 INFO Tabs closed [+30s] | Resident Memory: 181518336
09:29:17     INFO -  114 INFO Tabs closed [+30s, forced GC] | Resident Memory: 181518336
09:29:17     INFO -  115 INFO PERFHERDER_DATA: {"framework": {"name": "awsy"}, "suites": [{"name": "Resident Memory", "subtests": [{"name": "Fresh start", "value": 290934784}, {"name": "Fresh start [+30s]", "value": 231940096}, {"name": "After tabs", "value": 221622272}, {"name": "After tabs [+30s]", "value": 218329088}, {"name": "After tabs [+30s, forced GC]", "value": 218345472}, {"name": "Tabs closed", "value": 203350016}, {"name": "Tabs closed [+30s]", "value": 181518336}, {"name": "Tabs closed [+30s, forced GC]", "value": 181518336}], "value": 216217447}]}
09:29:17     INFO -  116 INFO TEST-PASS | mobile/android/tests/browser/chrome/test_awsy_lite.html | memory logging complete -- view results in Perfherder
09:29:17     INFO -  117 INFO TEST-OK | mobile/android/tests/browser/chrome/test_awsy_lite.html | took 244042ms

Eric Rahm [:erahm]

Comment 13

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #11)
> Where do I see those individual test numbers?
> Treeherder shows just "Resident Memory summary opt: 211770140"

See comment 9, here's the full link: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=6fd10955e89903caf8f92e2c600aa2aa4a2be496&newProject=mozilla-inbound&newRevision=1c66da2b1e88e56a1b4de45a0721808eacb16338&originalSignature=5d7ccb5c36cbfb7c60a7b06e92f07eaa9e59b122&newSignature=5d7ccb5c36cbfb7c60a7b06e92f07eaa9e59b122&framework=4

Eric Rahm [:erahm]

Updated

•

7 years ago

Flags: needinfo?(erahm)

Geoff Brown [:gbrown]

Comment 14

•

7 years ago

I don't see any other unanswered questions for me...let me know if I can be of help.

Also cc'd snorp in case he is interested / has something to add.

Flags: needinfo?(gbrown)

Ben Kelly [:bkelly, not reviewing]

Comment 15

•

7 years ago

Am I reading this right that we are running a performance-like test on android emulator hardware?  Or does this somehow run on a real device?

FWIW I don't think we should tune GC/CC for android emulator.

Geoff Brown [:gbrown]

Comment 16

•

7 years ago

This test is indeed running on an android emulator, which is running on an aws host. The previous iteration of awsy-for-android ran on real hardware but was always breaking. The thinking was something like "even though the emulator is limited, it is reliable and scalable".

It occurs to me that today, another option is to run something like this on Autophone, using a real device.

Ben Kelly [:bkelly, not reviewing]

Comment 17

•

7 years ago

The android emulator is something like an order of magnitude slower than our slowest real physical device.  While memory seems like it shouldn't be timing dependent, it definitely is given we have to choose when to run GC/CC.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 18

•

7 years ago

overall this is pretty stable.  All of the AWSY desktop tests run on AWS VMs (I know emulators are another factor of slow on top of that).

Autophone might be a real option.  I don't want to discount what we have as it is pretty stable (if you accept a 15MB range of values):
https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=%5Bmozilla-inbound,5d7ccb5c36cbfb7c60a7b06e92f07eaa9e59b122,1,4%5D

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 19

•

7 years ago

Testing what might cause this
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85bbdd696b82e433b3df31ecf4b7afdd1dbdcb84

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 20

•

7 years ago

I see bug 1367905 backed out and have seen a few improvements, working on analyzing them.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 21

•

7 years ago

I'll probably land bug 1367905 today again.

Ionuț Goldan [:igoldan]

Comment 22

•

7 years ago

The re-land considerably dimmed down the original regressions, but 4-5% from them still remained:

== Change summary for alert #7193 (as of June 10 2017 00:34 UTC) ==

Regressions:

  5%  Resident Memory summary android-4-2-x86 opt      182,034,354.17 -> 190,508,672.67
  4%  Resident Memory summary android-api-15-gradle opt 179,586,539.42 -> 186,667,111.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7193

Ionuț Goldan [:igoldan]

Comment 23

•

7 years ago

I noticed you pushed a bigger patch, which involved 3 bugs, including this one. If either bug 1371224 or bug 1371438 is related to this issue, please let me know in a comment.

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 24

•

7 years ago

This is still from bug 1367905. The difference between the first landing and the current one is
https://bug1367905.bmoattachments.org/attachment.cgi?id=8876336
Especially 
-  int64_t budget = sActiveIntersliceGCBudget;
+  // We use longer budgets when timer runs since that means
+  // there hasn't been idle time recently and we may have significant amount
+  // garbage to collect.
+  int64_t budget = sActiveIntersliceGCBudget * 2;

Before bug 1367905 we collected 10ms after tick and 40ms after timer (which run only when refresh driver wasn't ticking), 
now 5-50ms during idle time, and 10ms when timer runs (it runs if there hasn't been idle time).

Perhaps I need to bring back some of the old behavior that when there isn't idle time, we collect up to 10ms after tick, though, the test doesn't seem to load pages too fast after previous ones. There should be idle time.

Flags: needinfo?(bugs)

Ben Kelly [:bkelly, not reviewing]

Comment 25

•

7 years ago

Can we measure if this shows a regression on a real device?  I really don't think we should be tuning for the emulator. (Sorry for repeating myself.)

Nathan Froyd [:froydnj]

Updated

•

7 years ago

Priority: -- → P3

BMO Automation

Updated

•

2 years ago

Severity: normal → S3