1464058 - 51.6 - 159.95% sessionrestore_many_windows (windows10-64, windows10-64-qr) regression on push 4269e02632264e158ac9d7cfc12cb33ab0dc8bf5 (Thu May 24 2018)

Reporter

Description

•

7 years ago

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=4269e02632264e158ac9d7cfc12cb33ab0dc8bf5 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 160% sessionrestore_many_windows windows10-64 pgo e10s stylo 834.42 -> 2,169.08 52% sessionrestore_many_windows windows10-64-qr opt e10s stylo 2,609.17 -> 3,955.42 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13405 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Ionuț Goldan [:igoldan]

Reporter

Updated

•

7 years ago

Component: General → Session Restore

Product: Testing → Firefox

Ionuț Goldan [:igoldan]

Reporter

Updated

•

7 years ago

Flags: needinfo?(jkt)

Jonathan Kingston [:jkt] he/him

Assignee

Comment 1

•

7 years ago

Hey so I haven't run this locally yet as I'm doing another build. The code itself changed only a few lines: 1. entry.principalToInherit_base64 will potentially now be null instead of undefined 2. entry.triggeringPrincipal_base64 will potentially now be null instead of undefined 3. We check for no triggering principal and generate a debug log and null principal 1. and 2. I can't see there is a perf hit here that is significant, especially was we removed some other statements in exchange for this. 3. This makes more sense as the test doesn't have principals in the file: testing/talos/talos/startup_test/sessionrestore/profile-manywindows/sessionstore.js This should actually be an edge case so I personally think it's reasonable we increased startup time here (I suspect creating more null principals is expensive). Are we sure that the previous code was even loading these tabs at all given the triggering principal check that was added in which prevented them from loading without a principal. I personally think we should be serializing a principal into this test and reset the counters to get a more realistic profile. > Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! I'm out of the office until Tuesday and I am not sure if I will be able to work on it before then. Thanks Jonathan

Flags: needinfo?(jkt)

Ionuț Goldan [:igoldan]

Reporter

Comment 2

•

7 years ago

(In reply to Jonathan Kingston [:jkt] from comment #1) > Are we sure that the previous code was even loading these tabs > at all given the triggering principal check that was added in which > prevented them from loading without a principal. If this is a question related to the test, I'm not able to answer that. :mdeboer, can you give more details for Jonathan?

Flags: needinfo?(mdeboer)

Jonathan Kingston [:jkt] he/him

Assignee

Comment 3

•

7 years ago

The test itself certainly doesn't have triggering principals, another patch I am working on made every tab crash without this change (as I'm asserting it's always set on tab creation also). We should treat a missing principal as an edge case, I'm fine for it to be in the perf testing however this doesn't seem to be in the spirit of what we are checking here. I'm still rebuilding a clean build that can run talos, however I'm 99% certain those tabs will be blank. This accounts for why the perf increase over the creation of null principals. This means we haven't been checking session restore perf at all.

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

7 years ago

gecko profiles: before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FJuGe7jK9QbuULn3fTxk7LA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_sessionrestore_many_windows.zip after: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FUGvHH1wnSZuRdyinDkDK6w%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_sessionrestore_many_windows.zip

Mike de Boer [:mikedeboer]

Comment 5

•

7 years ago

Good point to update the profile that this test uses - it basically needs a fresh sessionstore.jsonlz4. Now we _do_ need to think about what has happened when users restored their session after the Firefox update that required the principal data to be present. I mean, following the same logic this would mean that all these users would all have ended up with blank tabs, unresponsive tabs. Also, wouldn't it make more sense that when the user hits the refresh button or Enter in the address bar, that the document load would be initiated with the current principal, including updating the navigation history entries in that case? Because right now the 'blank tabs' repeat the same thing with invalid principal data over and over again...

Flags: needinfo?(mdeboer)

Boris Zbarsky [:bzbarsky]

Comment 6

•

7 years ago

> Now we _do_ need to think about what has happened when users restored their session after the Firefox update > that required the principal data to be present. This was multiple releases after we started writing out the principal data to the session store file, if I recall events correctly. > that the document load would be initiated with the current principal What current principal? > I suspect creating more null principals is expensive Creating nullprincipals is somewhat expensive (have to create a UUID for each one). As you note, actually loading stuff is even more expensive. ;)

Mike de Boer [:mikedeboer]

Comment 7

•

7 years ago

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #6) > This was multiple releases after we started writing out the principal data > to the session store file, if I recall events correctly. Ah true, I remember this too now. Thanks :) > What current principal? Déjà vu! We discussed this already way back when with :ckerschb, I think. Nothing to do here.

Rares Doghi, Desktop QA

Updated

•

7 years ago

status-firefox60: --- → unaffected

status-firefox61: --- → unaffected

status-firefox62: --- → affected

Jonathan Kingston [:jkt] he/him

Assignee

Comment 8

•

7 years ago

> Good point to update the profile that this test uses - it basically needs a fresh sessionstore.jsonlz4. I'm not sure how these files got generated. Even loading the JS files doesn't seem to generate a lz4 file with the principals. Any tips?

Flags: needinfo?(mdeboer)

Jonathan Kingston [:jkt] he/him

Assignee

Comment 9

•

7 years ago

Actually I managed to get the lz4, any advice on how to get the json back out? (I assume we want this checked in also)

Mike de Boer [:mikedeboer]

Comment 10

•

6 years ago

Attached file mozlz4 — Details

Cool that you could make it work! I was about to suggest to select and open each restored tab, so that they are fully loaded to get proper sessionstore data flushed to disk. You can un-lz4 the file using the `mozlz4` script I posted above - which I got off a StackOverflow answer: `mozlz4 -d < sessionstore.jsonlz4 > sessionstore.js`

Flags: needinfo?(mdeboer)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 11

•

6 years ago

Tracking for 62 to make sure we follow up here.

tracking-firefox62: --- → +

Ionuț Goldan [:igoldan]

Reporter

Comment 12

•

6 years ago

:jkt What's the status here? Have you opened separate bugs for this? If so, please link them to this one.

Flags: needinfo?(jkt)

Comment hidden (mozreview-request)

Jonathan Kingston [:jkt] he/him

Assignee

Comment 14

•

6 years ago

There is a test fail on try but it seems unrelated: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9880bd5aab92f829a8f59dcef30049950a42b7a1

Flags: needinfo?(jkt)

Ionuț Goldan [:igoldan]

Reporter

Comment 15

•

6 years ago

(In reply to Jonathan Kingston [:jkt] from comment #14) > There is a test fail on try but it seems unrelated: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=9880bd5aab92f829a8f59dcef30049950a42b7a1 If you're referring to the T-e10s(x) failures from Windows 10, those are expected. Currently, their not functional (see bug 1451682 and bug 1439588 for reasons).

Jonathan Kingston [:jkt] he/him

Assignee

Comment 16

•

6 years ago

Hey :mythmon, Did you manage to look at the patch at all? I think it would be great to get this land in this release. Thanks Jonathan

Flags: needinfo?(mcooper)

Michael Cooper [:mythmon]

Comment 17

•

6 years ago

I don't think I'm the right person to be reviewing this patch. I think you meant mdeboer. I've ni?ed him instead.

Flags: needinfo?(mcooper) → needinfo?(mdeboer)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 18

•

6 years ago

If this doesn't land before next Monday, please request uplift to beta 62 next week after the merge.

Assignee: nobody → jkt

Flags: needinfo?(jkt)

Mike de Boer [:mikedeboer]

Comment 19

•

6 years ago

mozreview-review

Comment on attachment 8984389 [details] Bug 1464058 - Rebuild profile for session restore talos test to add principals to the storage. https://reviewboard.mozilla.org/r/250214/#review257600 rs=me. Thanks Jonathan and my apologies for reviewing this after such a long delay...

Attachment #8984389 - Flags: review?(mdeboer) → review+

Mike de Boer [:mikedeboer]

Comment 20

•

6 years ago

Jonathan, did you see the perma-orange of the xperf talos suite in your latest try run? Did you have time to check that out?

Flags: needinfo?(mdeboer)

Ryan VanderMeulen [:RyanVM]

Comment 21

•

6 years ago

Per comment 15, those are safe to ignore.

Pulsebot

Comment 22

•

6 years ago

Pushed by mdeboer@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fe3e2484074b Rebuild profile for session restore talos test to add principals to the storage. r=mikedeboer

Cristian Brindusan [:cbrindusan]

Comment 23

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/fe3e2484074b

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox62: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 62

Jonathan Kingston [:jkt] he/him

Assignee

Comment 24

•

6 years ago

Thanks all!

Flags: needinfo?(jkt)

Ryan VanderMeulen [:RyanVM]

Comment 25

•

6 years ago

Ionut, can you please verify that the regression is fixed?

status-firefox-esr52: --- → unaffected

status-firefox-esr60: --- → unaffected

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Reporter

Comment 26

•

6 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #25) > Ionut, can you please verify that the regression is fixed? I'm afraid very little has changed, since the push from comment 23. There are some *very* small improvements, on some of the builds. Still, even for these, I cannot confirm they're real. I'm reopening the bug.

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

status-firefox62: fixed → affected

Jonathan Kingston [:jkt] he/him

Assignee

Comment 27

•

6 years ago

Hey :mikedeboer, I'm really struggling to work out what this test is supposed to be doing. It currently doesn't use the mitm proxy so appears to fail to load the tab with "unable to connect" error pages, this likely is less deterministic than we might like. The urls appear to be like "http://localhost:8080/?httpsurl=www.facebook.com/" Is this something the proxy should be handling? When using the proxy I get the following errors: 400 Bad Request HttpException('Invalid HTTP request form (expected: authority or absolute, got: relative)',) It looks like we need to specify recordings / web pages that are in the proxy? Thanks

Flags: needinfo?(mdeboer)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 28

•

6 years ago

Emailing Mike with the question from comment 27.

Mike de Boer [:mikedeboer]

Comment 29

•

6 years ago

(In reply to Jonathan Kingston [:jkt] from comment #27) > Hey :mikedeboer, > > I'm really struggling to work out what this test is supposed to be doing. It > currently doesn't use the mitm proxy so appears to fail to load the tab with > "unable to connect" error pages, this likely is less deterministic than we > might like. > > The urls appear to be like > "http://localhost:8080/?httpsurl=www.facebook.com/" Is this something the > proxy should be handling? > > When using the proxy I get the following errors: > 400 Bad Request > > HttpException('Invalid HTTP request form (expected: authority or absolute, > got: relative)',) > > It looks like we need to specify recordings / web pages that are in the > proxy? > > Thanks Sorry for the delay, Jonathan, I've been a bit caught up in, well, many things! :) You're right. This test is essentially useless if we don't have proper proxy responses. However, relying on proxy responses for consistent perf measurement is problematic as well, but it's the best we have. Thus, what needs to be done is to fix up the test to get proper responses and '200 OK' page loads. Until then it's safe to ignore this regression, because it doesn't signify trouble for product, but merely trouble for its own trustworthyness. I'm not saying that this is part of your responsibilities at all, that'd be odd. Instead, it's mine as the owner of this test/ module.

Flags: needinfo?(mdeboer)

Joel Maher ( :jmaher ) (UTC -8)

Comment 30

•

6 years ago

we should consider: 1) fixing the test asap (in the next month) 2) reducing the test to tier-2 (m-c only, no sheriffing) 3) turning off the test 4) other options? not sure what would be best, :mikedeboer, do you have a preference?

Flags: needinfo?(mdeboer)

Mike de Boer [:mikedeboer]

Comment 31

•

6 years ago

Let's try 1. I'll reach out to you when I've got time to work on it, to learn how to best fix the test.

Flags: needinfo?(mdeboer)

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1471918

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox63: --- → affected

Liz Henry (:lizzard) (relman/hg->git project)

Comment 32

•

6 years ago

Mike, have you had a chance to look into fixing this test? Thanks!

Flags: needinfo?(mdeboer)

Mike de Boer [:mikedeboer]

Comment 33

•

6 years ago

Liz, thanks for the ping. I haven't had a chance to look at this yet, which is caused by other things on my schedule taking priority over fixing this. So I do want to fix this, but I can't give a guaranteed timeline for it, so we might want to look into disabling it in the interim.

Flags: needinfo?(mdeboer)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 34

•

6 years ago

Joel, can you disable the test on beta for now, as discussed in comment 30 and 33? And maybe we can fix it in 63 (I'm leaving the bug tracked for 63.) Thanks!

tracking-firefox63: --- → +

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 35

•

6 years ago

:igoldan- can you pick this up sometime in the near future: 1) disable sessionrestore_many_windows on mozilla-beta (tweak a config and land it on mozilla-beta) 2) make sessionrestore_many_windows run on mozilla-central/try only for now; I assume this means breaking it out of the existing suite and creating a new job (yay taskcluster) that only run on mc/try.

Flags: needinfo?(jmaher) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Depends on: 1478295

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Depends on: 1478297

Ionuț Goldan [:igoldan]

Reporter

Comment 36

•

6 years ago

(In reply to Joel Maher ( :jmaher ) (UTC+2) (PTO: back Aug 2) from comment #35) > :igoldan- can you pick this up sometime in the near future: > 1) disable sessionrestore_many_windows on mozilla-beta (tweak a config and > land it on mozilla-beta) > 2) make sessionrestore_many_windows run on mozilla-central/try only for now; > I assume this means breaking it out of the existing suite and creating a new > job (yay taskcluster) that only run on mc/try. Yes, I've prepared bug 1478295 and bug 1478297 for that.

Flags: needinfo?(igoldan)

Mike Taylor [:miketaylr]

Comment 37

•

6 years ago

Ionut, can we close this now?

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Reporter

Comment 38

•

6 years ago

(In reply to Mike Taylor [:miketaylr] (62 Regression Engineering Owner) from comment #37) > Ionut, can we close this now? Yes, we can close it.

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Status: REOPENED → RESOLVED

Closed: 6 years ago → 6 years ago

Resolution: --- → WONTFIX

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox62: affected → wontfix

status-firefox63: affected → wontfix

mozlz4 6 years ago Mike de Boer [:mikedeboer] 552 bytes, text/x-python		Details
Bug 1464058 - Rebuild profile for session restore talos test to add principals to the storage. 6 years ago Jonathan Kingston [:jkt] he/him 59 bytes, text/x-review-board-request	mikedeboer : review+	Details