Closed Bug 1368114 Opened 2 years ago Closed 2 years ago

3.25 - 7.5% Heap Unclassified / Images (linux64, windows10-64-vm, windows7-32-vm) regression on push 38efc433e5dd777d472287b3ff9899b9f6a768ac (Tue May 23 2017)

Categories

(Firefox :: Search, defect, P1)

53 Branch
defect

Tracking

()

RESOLVED FIXED
Firefox 55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: jmaher, Assigned: mak)

References

Details

(Keywords: perf, regression, Whiteboard: [fxsearch])

Attachments

(1 file)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=38efc433e5dd777d472287b3ff9899b9f6a768ac

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  Images summary linux64 opt      6,274,792.17 -> 6,745,597.05
  5%  Images summary windows10-64-vm opt 6,816,928.69 -> 7,165,536.05
  3%  Heap Unclassified summary windows7-32-vm opt 37,485,449.31 -> 38,757,713.89
  3%  Heap Unclassified summary windows10-64-vm opt 42,344,482.78 -> 43,721,792.91


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6808

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/AWSY
I have done a lot of retriggers/backfilling, this looks to be the right root cause.

:florian, I see you are the patch author of bug 1344928, can you take a look at these memory regressions to see if there is an easy way to explain this or fix this?
Flags: needinfo?(florian)
If it's related to search suggestion it may be the same as bug 1367351 (but I don't know where we set prefs for AWSY). Off-hand I don't think this memory increase may be related to that though.

The pushlog link in comment 0 looks wrong?
This is more likely:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=a8d70b131270b07c448f4f32a888d34c1ac546e0&tochange=38efc433e5dd777d472287b3ff9899b9f6a768ac
the pushlog shows where the alert shows up, I backfilled data, if you look at the graphs, it is easy to see where we increased.
do these tests use the prefs from testing/talos/talos/config.py?
Flags: needinfo?(florian) → needinfo?(jmaher)
Priority: -- → P1
Whiteboard: [fxsearch]
thanks bc; possibly we need to add prefs for AWSY?
Flags: needinfo?(jmaher)
erahm: thoughts before you go on pto?
Flags: needinfo?(erahm)
(In reply to Joel Maher ( :jmaher) from comment #6)
> thanks bc; possibly we need to add prefs for AWSY?

Flipping prefs to hide memory regressions is a bad idea. We should probably disable this feature until we have an idea what's going on.
Flags: needinfo?(erahm)
(In reply to Eric Rahm (Out 5/27 - 6/4) [:erahm] from comment #8)
> Flipping prefs to hide memory regressions is a bad idea. We should probably
> disable this feature until we have an idea what's going on.

no.
Measuring memory should be done in a realistic way, beneficial to our users, in this case we are showing a new notification 4 times, then it disappears in nothing.
Wasting months of work just because we can't regress memory in 5 minutes of the browser's life sounds completely wrong.

We should check through prefs.json if this is indeed the cause of the regression, if it is, the only reasonable thing is to set the pref and have AWSY measure something that makes sense.
Fwiw, some of the prefs in prefs.json are worse (that doesn't mean 2 wrongs make a right, just that we should have a clear policy):
    "startup.homepage_welcome_url": "",
    "startup.homepage_override_url": "",

this is sort-of in the same category as the new notification, a page that is shown once. But it is actually shown far more than the new notification, since it's likely once per session.

    "browser.newtab.url": "about:blank",

This completely screws up the measurement, most of our users won't bother changing the newtab page, and the newtab page is likely to take up memory. Now that we'll move to Activity Stream it's likely to be even more.

    "browser.displayedE10SNotice": 1000,

This looks old, there's no displayedE10SNotice in code. But looks like another case of a notice shown just a few times.
(In reply to Marco Bonardo [::mak] from comment #10)
>     "browser.newtab.url": "about:blank",

And now that I think about that, it doesn't even work anymore! We are fine for measuring newtab, but it's clear in the past we were ok skipping its measurement...

Both "browser.newtab.url" and "browser.displayedE10SNotice" can be removed with no consequences afaict.
what's the trychooser syntax to run AWSY on Try? Does perherder compare support it?
Flags: needinfo?(jmaher)
note, I tried running AWSY locally but I keep getting this failure:

 0:12.14 LOG: MainThread ERROR Checkpoint JavaScript error: [Exception... "Component returned failure code: 0x80520001 (
NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIMemoryInfoDumper.dumpMemoryReportsToNamedFile]"  nsresult: "0x80520001 (NS_ERROR_FI
LE_UNRECOGNIZED_PATH)"  location: "JS frame :: test_memory_usage.py :: <TOP_LEVEL> :: line 5"  data: no]
stacktrace:
        execute_script @test_memory_usage.py, line 194
        inline javascript, line 5
        src: "            dumper.dumpMemoryReportsToNamedFile("
        Stack:
        @test_memory_usage.py:5:13
        @test_memory_usage.py:0:59
        evaluate.sandbox/promise<@chrome://marionette/content/evaluate.js:154:13
        evaluate.sandbox@chrome://marionette/content/evaluate.js:105:17
        GeckoDriver.prototype.execute_@chrome://marionette/content/driver.js:894:29
        GeckoDriver.prototype.executeAsyncScript@chrome://marionette/content/driver.js:869:27
        TaskImpl_run@resource://gre/modules/Task.jsm:321:42
        TaskImpl@resource://gre/modules/Task.jsm:279:3
        asyncFunction@resource://gre/modules/Task.jsm:254:14
        Task_spawn@resource://gre/modules/Task.jsm:168:12
        TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:391:16
        TaskImpl_run@resource://gre/modules/Task.jsm:329:15
        TaskImpl@resource://gre/modules/Task.jsm:279:3
        asyncFunction@resource://gre/modules/Task.jsm:254:14
        Task_spawn@resource://gre/modules/Task.jsm:168:12
        execute@chrome://marionette/content/server.js:510:15
        onPacket@chrome://marionette/content/server.js:481:7
        _onJSONObjectReady/<@chrome://marionette/content/transport.js:480:9
try: -b o -p linux64,win64 -u awsy-e10s -t none

https://treeherder.mozilla.org/#/jobs?repo=try&revision=493b08553c146446c8b55754c0fbf337855a900e is an example and it is submitted to perfherder.

How are you trying to run it? Is this in a local build directory? What is the command line? What platform?

./mach awsy-test --quick 

will do a quick check and works for me locally on linux.
mid air collision with :bc and he answered it just as well as I did!
Flags: needinfo?(jmaher)
(In reply to Bob Clary [:bc:] from comment #14)
> How are you trying to run it? Is this in a local build directory? What is
> the command line? What platform?
> 
> ./mach awsy-test --quick 

This is exactly what I'm doing. Win10 x64, my local full build (though it's an --enable-optimized --enable-debug) out of the latest mozilla-central.
I tested on windows 7 64 bit originally but that was some time ago and with an opt build and vs 2015. My qemu vm takes forever to build though and I'm in the middle of something else at the moment. As soon as I can I'll refresh my opt build to make sure it still works and then try a debug build, but that will take many hours.

You are sure your mozconfig matches that particular build configuration?

Can you try an opt build quickly and see if that works or not?

You have mozilla-build 2.2.0? Which compiler did you use? 2015 or 2017? You performed the mach command in the proper msys environment?
And you path doesn't include spaces does it?
(In reply to Bob Clary [:bc:] from comment #18)
> And you path doesn't include spaces does it?

nope, I'm in D:\src\
regarding the rest VS 2015, mozillaBuild-latest (2.2.0 iirc). Didn't try an opt build yet (maybe tomorrow).
(In reply to Marco Bonardo [::mak] from comment #9)
> (In reply to Eric Rahm (Out 5/27 - 6/4) [:erahm] from comment #8)
> > Flipping prefs to hide memory regressions is a bad idea. We should probably
> > disable this feature until we have an idea what's going on.
> 
> no.
> Measuring memory should be done in a realistic way, beneficial to our users,
> in this case we are showing a new notification 4 times, then it disappears
> in nothing.
> Wasting months of work just because we can't regress memory in 5 minutes of
> the browser's life sounds completely wrong.
> 
> We should check through prefs.json if this is indeed the cause of the
> regression, if it is, the only reasonable thing is to set the pref and have
> AWSY measure something that makes sense.

Oh sorry, that was a misunderstanding! If it's just a first time notification I'm fine disabling the prompt, just not the entire feature :)
(In reply to Marco Bonardo [::mak] from comment #16)
> (In reply to Bob Clary [:bc:] from comment #14)
> > How are you trying to run it? Is this in a local build directory? What is
> > the command line? What platform?
> > 
> > ./mach awsy-test --quick 
> 
> This is exactly what I'm doing. Win10 x64, my local full build (though it's
> an --enable-optimized --enable-debug) out of the latest mozilla-central.

Perhaps try specifying the resultsDir (that's what it's choking on). Maybe:

> ./mach awsy-test --quick --resultsDir .

or maybe it's unhappy with your D drive, maybe try:

> ./mach awsy-test --quick --resultsDir 'C:/foo/'
(In reply to Eric Rahm (Out 5/27 - 6/4) [:erahm] from comment #22)
> > ./mach awsy-test --quick --resultsDir .

It looks like you passed an unrecognized argument into mach.
The awsy-test command does not accept the arguments: --resultsDir

The command also seems to create temp files in the src dir (in a tooltool-cache folder) and then mercurial notices those files as untracked... it should probably not create files in the src dir by default, but rather use the tmp dir.

Btw, I resolved by running the test on Try, for now I don't have the time to dig deeper.
In the patch I removed the "browser.newtab.url" and "browser.displayedE10SNotice" prefs that are no more used by Firefox, and sorted the list alphabetically.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
(In reply to Marco Bonardo [::mak] from comment #13)
> note, I tried running AWSY locally but I keep getting this failure:

filed bug 1368751
Comment on attachment 8872289 [details]
Bug 1368114 - Don't mesure memory added by a notification shown only a few times to the user in AWSY.

https://reviewboard.mozilla.org/r/143792/#review147974

thanks for fixing this and looking over the other preferences.
Attachment #8872289 - Flags: review?(jmaher) → review+
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/9309326b2723
Don't mesure memory added by a notification shown only a few times to the user in AWSY. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/9309326b2723
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
and the improvements are in:
== Change summary for alert #6972 (as of May 30 2017 18:08 UTC) ==

Improvements:

  6%  Images summary windows10-64-vm opt      7,099,886.30 -> 6,649,327.20
  6%  Images summary windows7-32-vm opt       5,491,150.84 -> 5,171,798.56
  5%  Images summary linux64 opt              6,699,775.09 -> 6,353,229.92
  3%  Heap Unclassified summary windows7-32-vm opt 40,057,014.85 -> 38,709,302.16
  2%  Heap Unclassified summary windows10-64-vm opt 44,941,237.42 -> 43,888,684.37

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6972
You need to log in before you can comment on or make changes to this bug.