Closed Bug 1230661 Opened 10 years ago Closed 10 years ago

4% win8 e10s ts_paint regression on fx-team (v.45) on dec 2, 2015 from revision 2d33fa3346b2

Tracking

()

Status:

RESOLVED FIXED

Milestone:

Firefox 46

Tracking Flags:

Tracking

Status

e10s

---

firefox45

fixed

firefox46

---

fixed

People

(Reporter: jmaher, Assigned: florian)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])

Attachments

(2 files)

avoid calling the search service's currentEngine setter during startup, r=? 10 years ago Florian Quèze [:florian] 5.62 KB, patch	mak : review+	Details \| Diff \| Splinter Review
Patch (as landed) 10 years ago Florian Quèze [:florian] 4.52 KB, patch	Sylvestre : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Description

•

10 years ago

Talos has detected a Firefox performance regression from your commit 2d33fa3346b2e4bfd7720a9f1f19e4a59b84e143 in bug 1227045. We need you to address this regression. This is a list of all known regressions and improvements related to your bug: http://alertmanager.allizom.org:8080/alerts.html?rev=2d33fa3346b2e4bfd7720a9f1f19e4a59b84e143&showAll=1 On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_paint Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p win64 -u none -t other # add "mozharness: --spsProfile" to generate profile data To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e <path>/firefox -a ts_paint Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans by Monday, or the offending patch will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling here is a compareview to show the raw numbers on retrigger: https://treeherder.allizom.org/perf.html#/compare?originalProject=fx-team&originalRevision=872043399490&newProject=fx-team&newRevision=2d33fa3346b2&showOnlyImportant=0&showOnlyConfident=1 there is a lot of noise in the compare view, but the ts_paint does show a distinct difference.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

10 years ago

I have done the retriggers on fx-team: https://treeherder.mozilla.org/#/jobs?repo=fx-team&filter-searchStr=Windows%207%20opt%20Talos%20Performance%20e10s%20Talos%20other%20e10s%20T-e10s%28o%29&fromchange=efc976f76ff0&tochange=ed6fd38d7eb2 I pushed a backout to try since this test is a noisy one and has false positives sometimes, in a few hours results should be here: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e00ff88c3366&newProject=try&newRevision=abf3339045e9

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Summary: 4% win8 e10s ts_paint regression on fx-team (v.445) on dec 2, 2015 from revision 2d33fa3346b2 → 4% win8 e10s ts_paint regression on fx-team (v.45) on dec 2, 2015 from revision 2d33fa3346b2

Marco Bonardo [:mak]

Comment 2

•

10 years ago

it would be more likely if the regression is due to bug 1228627 or panorama removal than the search change... But I didn't check if those landed close enough, so I'm just guessing. The only other possibility I may think of is that on startup we read currentEngine and that now causes us to write to disk... But it sounds unlikely, especially cause Ts runs multiple times but this may happen just once. Florian would know better.

Marco Bonardo [:mak]

Comment 3

•

10 years ago

ok looks like the suspect changes I mentioned landed far enough from this to be connected with the regression, so they are off the hook.

Florian Quèze [:florian]

Assignee

Comment 4

•

10 years ago

(In reply to Marco Bonardo [::mak] from comment #2) > it would be more likely if the regression is due to bug 1228627 or panorama > removal than the search change... But I didn't check if those landed close > enough, so I'm just guessing. > > The only other possibility I may think of is that on startup we read > currentEngine and that now causes us to write to disk... Any disk write of the cache file is delayed by a 1000ms timer, and the file is written asynchronously with OS.File. Also, if the tests are using a clean profile, we were already writing the cache file to disk 1000ms after the search service's initialization. I don't see how that search change could regress performance, but I also don't know well what is being run for ts_paint.

Florian Quèze [:florian]

Assignee

Comment 5

•

10 years ago

Hmm... random guess: we may now be calling the currentEngine setter in a case when we were not before; this could send a notification that causes about:home to update/redraw something.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

10 years ago

Thanks for the quick comments! The backout on try does show the improvement, so it looks like we found the right two patches. Now to figure out why that is the case. It sounds like there are some ideas about what might be happening. ts_paint shouldn't be going to about:home, it loads a simple page on the commandline when launching firefox: https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tspaint_test.html then again, it could be loading about:home, then the page requested.

Florian Quèze [:florian]

Assignee

Comment 7

•

10 years ago

It would probably be interesting to run this code locally with the search service logging enabled, to compare a log of before/after and see if there are obvious differences.

Jim Mathies [:jimm]

Comment 8

•

10 years ago

[Tracking Requested - why for this release]: impacts fx45 release

Blocks: e10s-perf

status-firefox45: --- → affected

tracking-e10s: --- → +

tracking-firefox45: --- → ?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 9

•

10 years ago

this is on aurora now, :florian, can you update this bug of when this will be fixed?

Assignee: nobody → florian

Flags: needinfo?(florian)

Florian Quèze [:florian]

Assignee

Comment 10

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #9) > this is on aurora now, :florian, can you update this bug of when this will > be fixed? I'll try to look at this early next week, but I'm not really sure how to get started on debugging this. Running the test by following the instructions in comment 0 only gives me a set of numbers in the output. Is there an easy way to run the same test but with the console output of the Firefox process visible, so that I can use dump() to check if some code within the search service is reached?

Flags: needinfo?(florian) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

10 years ago

when you run this locally, it creates a local.out and local.json. this has the raw data. local.json has more raw data, is parsable (this is what perfherder uses), but might take a bit to get used to. local.out is exactly what would be sent to graph server. Pick what you like, then you can run- save off a copy, run again, and compare the two files. while running locally, it should print out all the output from the console, if not, add --debug flag to the command line and it will print out the console info. Then dump(...) should work. I know I depended on dump() recently while tracking down a common timeout in talos earlier this week.

Flags: needinfo?(jmaher)

Florian Quèze [:florian]

Assignee

Comment 12

•

10 years ago

I added some dumps in the currentEngine setter. A surprise there is that when running ts_paint, the setter is called 1 to 9 times, and this doesn't seem deterministic at all. I need to investigate more to decide if the way forward is to try to make this more deterministic, or to prevent the setter from being called at all by the getter when we are starting with a fresh profile and not customizing the search engine.

Florian Quèze [:florian]

Assignee

Comment 13

•

10 years ago

I looked at the stacks. The code that's called consistently is BrowserGlue.prototype._syncSearchEngines from nsBrowserGlue.js (I'm looking forward to removing this code as part of my next wave or search service cleanups, likely during Q1). Sometimes there's code from ContentSearch.jsm that is called from the notification fired by the currentEngine setter. This generates a message for the content process. Sometimes there's also code from the searchbar (search.xml) that runs as a result of the notification from the currentEngine setter. This can lead to initializing the some of the searchbar visible UI elements twice: once from the notification, and once from the normal initialization code path that can happen later. Rarely, there's code from BrowserUITelemetry.jsm that joins the party. I think this will happen only if the search service is already fully initialized when BrowserUITelemetry is getting initialized. All of these are potential candidates for the perf hit, but the double initialization of the searchbar UI seems more likely to have a significant impact. I still think the safest solution is to avoid firing notifications from the search service's currentEngine setter during startup.

Florian Quèze [:florian]

Assignee

Comment 14

•

10 years ago

Attached patch avoid calling the search service's currentEngine setter during startup, r=? — Details — Splinter Review

This changes the implementation of the currentEngine getter to avoid calling the setter unless it's really necessary. With this patch applied, the currentEngine setter is no longer called during ts_paint (and the currentEngine getter is still called 1 to 4 times... but that's definitely not a regression caused by my previous patch). The change I had to make to the test_geodefaults.js test seems to indicate that even before my previous patch we were already calling the setter (and so updating the cache on disk) more often than needed.

Attachment #8701089 - Flags: review?(mak77)

Florian Quèze [:florian]

Assignee

Comment 15

•

10 years ago

Comment on attachment 8701089 [details] [diff] [review] avoid calling the search service's currentEngine setter during startup, r=? Please ignore the unrelated aboutDialog.xul changes.

Florian Quèze [:florian]

Assignee

Comment 16

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=23920e3a3b73

Florian Quèze [:florian]

Assignee

Comment 17

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0188fcefc3b

Florian Quèze [:florian]

Assignee

Comment 18

•

10 years ago

(In reply to Florian Quèze [:florian] [:flo] from comment #17) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0188fcefc3b Looks good to me: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c0188fcefc3b&newProject=try&newRevision=abf3339045e9&framework=1&showOnlyImportant=0 The other try run also looks good (all the oranges seem intermittent).

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 19

•

10 years ago

great stuff!

Marco Bonardo [:mak]

Updated

•

10 years ago

Attachment #8701089 - Flags: review?(mak77) → review+

Florian Quèze [:florian]

Assignee

Comment 20

•

10 years ago

https://hg.mozilla.org/integration/fx-team/rev/a6a1f948a4665f48205e819c3defa7adf1de1b00 Bug 1230661 - avoid calling the search service's currentEngine setter during startup, r=mak.

Florian Quèze [:florian]

Assignee

Comment 21

•

10 years ago

Attached patch Patch (as landed) — Details — Splinter Review

Approval Request Comment [Feature/regressing bug #]: bug 1227045 [User impact if declined]: startup performance regression [Describe test coverage new/current, TreeHerder]: tested the fix on try. [Risks and why]: low risk, as the code is well covered by unit tests. [String/UUID change made/needed]: none. (note: I would usually wait for the patch to reach mozilla-central before requesting approval, but I'll likely be off by the time this happens, so requesting now.)

Attachment #8701475 - Flags: approval-mozilla-aurora?

Ryan VanderMeulen [:RyanVM]

Comment 22

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a6a1f948a466

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox46: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 46

Sylvestre Ledru [:Sylvestre]

Comment 23

•

10 years ago

Comment on attachment 8701475 [details] [diff] [review] Patch (as landed) Fix a perf regression, taking it.

Attachment #8701475 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

tracking-firefox45: ? → +

Carsten Book [:Tomcat]

Comment 24

•

10 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/31a67ea4e0e8

status-firefox45: affected → fixed

Mayur Patil

Comment 25

•

9 years ago

[bugday-20160323] Status: RESOLVED,FIXED -> UNVERIFIED Comments: STR: Not clear. Developer specific testing Component: Name Firefox Version 46.0b9 Build ID 20160322075646 Update Channel beta User Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0 OS Windows 7 SP1 x86_64 Expected Results: Developer specific testing Actual Results: As expected

You need to log in before you can comment on or make changes to this bug.