Closed
Bug 1464058
Opened 7 years ago
Closed 6 years ago
51.6 - 159.95% sessionrestore_many_windows (windows10-64, windows10-64-qr) regression on push 4269e02632264e158ac9d7cfc12cb33ab0dc8bf5 (Thu May 24 2018)
Categories
(Firefox :: Session Restore, defect)
Tracking
()
RESOLVED
WONTFIX
Firefox 62
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | --- | unaffected |
firefox62 | + | wontfix |
firefox63 | + | wontfix |
People
(Reporter: igoldan, Assigned: jkt)
References
(Depends on 1 open bug)
Details
(Keywords: perf, regression, talos-regression)
Attachments
(2 files)
Talos has detected a Firefox performance regression from push:
https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=4269e02632264e158ac9d7cfc12cb33ab0dc8bf5
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
160% sessionrestore_many_windows windows10-64 pgo e10s stylo 834.42 -> 2,169.08
52% sessionrestore_many_windows windows10-64-qr opt e10s stylo 2,609.17 -> 3,955.42
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13405
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Updated•7 years ago
|
Component: General → Session Restore
Product: Testing → Firefox
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(jkt)
Assignee | ||
Comment 1•7 years ago
|
||
Hey so I haven't run this locally yet as I'm doing another build.
The code itself changed only a few lines:
1. entry.principalToInherit_base64 will potentially now be null instead of undefined
2. entry.triggeringPrincipal_base64 will potentially now be null instead of undefined
3. We check for no triggering principal and generate a debug log and null principal
1. and 2. I can't see there is a perf hit here that is significant, especially was we removed some other statements in exchange for this.
3. This makes more sense as the test doesn't have principals in the file: testing/talos/talos/startup_test/sessionrestore/profile-manywindows/sessionstore.js
This should actually be an edge case so I personally think it's reasonable we increased startup time here (I suspect creating more null principals is expensive). Are we sure that the previous code was even loading these tabs at all given the triggering principal check that was added in which prevented them from loading without a principal.
I personally think we should be serializing a principal into this test and reset the counters to get a more realistic profile.
> Please let us know your plans within 3 business days, or the offending patch(es) will be backed out!
I'm out of the office until Tuesday and I am not sure if I will be able to work on it before then.
Thanks
Jonathan
Flags: needinfo?(jkt)
Reporter | ||
Comment 2•7 years ago
|
||
(In reply to Jonathan Kingston [:jkt] from comment #1)
> Are we sure that the previous code was even loading these tabs
> at all given the triggering principal check that was added in which
> prevented them from loading without a principal.
If this is a question related to the test, I'm not able to answer that. :mdeboer, can you give more details for Jonathan?
Flags: needinfo?(mdeboer)
Assignee | ||
Comment 3•7 years ago
|
||
The test itself certainly doesn't have triggering principals, another patch I am working on made every tab crash without this change (as I'm asserting it's always set on tab creation also).
We should treat a missing principal as an edge case, I'm fine for it to be in the perf testing however this doesn't seem to be in the spirit of what we are checking here.
I'm still rebuilding a clean build that can run talos, however I'm 99% certain those tabs will be blank. This accounts for why the perf increase over the creation of null principals. This means we haven't been checking session restore perf at all.
Comment 4•7 years ago
|
||
gecko profiles:
before:
https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FJuGe7jK9QbuULn3fTxk7LA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_sessionrestore_many_windows.zip
after:
https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FUGvHH1wnSZuRdyinDkDK6w%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_sessionrestore_many_windows.zip
Comment 5•7 years ago
|
||
Good point to update the profile that this test uses - it basically needs a fresh sessionstore.jsonlz4.
Now we _do_ need to think about what has happened when users restored their session after the Firefox update that required the principal data to be present. I mean, following the same logic this would mean that all these users would all have ended up with blank tabs, unresponsive tabs.
Also, wouldn't it make more sense that when the user hits the refresh button or Enter in the address bar, that the document load would be initiated with the current principal, including updating the navigation history entries in that case? Because right now the 'blank tabs' repeat the same thing with invalid principal data over and over again...
Flags: needinfo?(mdeboer)
Comment 6•7 years ago
|
||
> Now we _do_ need to think about what has happened when users restored their session after the Firefox update
> that required the principal data to be present.
This was multiple releases after we started writing out the principal data to the session store file, if I recall events correctly.
> that the document load would be initiated with the current principal
What current principal?
> I suspect creating more null principals is expensive
Creating nullprincipals is somewhat expensive (have to create a UUID for each one). As you note, actually loading stuff is even more expensive. ;)
Comment 7•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #6)
> This was multiple releases after we started writing out the principal data
> to the session store file, if I recall events correctly.
Ah true, I remember this too now. Thanks :)
> What current principal?
Déjà vu! We discussed this already way back when with :ckerschb, I think. Nothing to do here.
Updated•7 years ago
|
status-firefox60:
--- → unaffected
status-firefox61:
--- → unaffected
status-firefox62:
--- → affected
Assignee | ||
Comment 8•7 years ago
|
||
> Good point to update the profile that this test uses - it basically needs a fresh sessionstore.jsonlz4.
I'm not sure how these files got generated. Even loading the JS files doesn't seem to generate a lz4 file with the principals. Any tips?
Flags: needinfo?(mdeboer)
Assignee | ||
Comment 9•7 years ago
|
||
Actually I managed to get the lz4, any advice on how to get the json back out? (I assume we want this checked in also)
Comment 10•6 years ago
|
||
Cool that you could make it work! I was about to suggest to select and open each restored tab, so that they are fully loaded to get proper sessionstore data flushed to disk.
You can un-lz4 the file using the `mozlz4` script I posted above - which I got off a StackOverflow answer:
`mozlz4 -d < sessionstore.jsonlz4 > sessionstore.js`
Flags: needinfo?(mdeboer)
Comment 11•6 years ago
|
||
Tracking for 62 to make sure we follow up here.
tracking-firefox62:
--- → +
Reporter | ||
Comment 12•6 years ago
|
||
:jkt What's the status here? Have you opened separate bugs for this? If so, please link them to this one.
Flags: needinfo?(jkt)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 14•6 years ago
|
||
There is a test fail on try but it seems unrelated: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9880bd5aab92f829a8f59dcef30049950a42b7a1
Flags: needinfo?(jkt)
Reporter | ||
Comment 15•6 years ago
|
||
(In reply to Jonathan Kingston [:jkt] from comment #14)
> There is a test fail on try but it seems unrelated:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=9880bd5aab92f829a8f59dcef30049950a42b7a1
If you're referring to the T-e10s(x) failures from Windows 10, those are expected. Currently, their not functional (see bug 1451682 and bug 1439588 for reasons).
Assignee | ||
Comment 16•6 years ago
|
||
Hey :mythmon,
Did you manage to look at the patch at all? I think it would be great to get this land in this release.
Thanks
Jonathan
Flags: needinfo?(mcooper)
Comment 17•6 years ago
|
||
I don't think I'm the right person to be reviewing this patch. I think you meant mdeboer. I've ni?ed him instead.
Flags: needinfo?(mcooper) → needinfo?(mdeboer)
Comment 18•6 years ago
|
||
If this doesn't land before next Monday, please request uplift to beta 62 next week after the merge.
Assignee: nobody → jkt
Flags: needinfo?(jkt)
Comment 19•6 years ago
|
||
mozreview-review |
Comment on attachment 8984389 [details]
Bug 1464058 - Rebuild profile for session restore talos test to add principals to the storage.
https://reviewboard.mozilla.org/r/250214/#review257600
rs=me. Thanks Jonathan and my apologies for reviewing this after such a long delay...
Attachment #8984389 -
Flags: review?(mdeboer) → review+
Comment 20•6 years ago
|
||
Jonathan, did you see the perma-orange of the xperf talos suite in your latest try run? Did you have time to check that out?
Flags: needinfo?(mdeboer)
Comment 21•6 years ago
|
||
Per comment 15, those are safe to ignore.
Comment 22•6 years ago
|
||
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fe3e2484074b
Rebuild profile for session restore talos test to add principals to the storage. r=mikedeboer
Comment 23•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
Comment 25•6 years ago
|
||
Ionut, can you please verify that the regression is fixed?
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → unaffected
Flags: needinfo?(igoldan)
Reporter | ||
Comment 26•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #25)
> Ionut, can you please verify that the regression is fixed?
I'm afraid very little has changed, since the push from comment 23. There are some *very* small improvements, on some of the builds. Still, even for these, I cannot confirm they're real. I'm reopening the bug.
Flags: needinfo?(igoldan)
Reporter | ||
Updated•6 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•6 years ago
|
Assignee | ||
Comment 27•6 years ago
|
||
Hey :mikedeboer,
I'm really struggling to work out what this test is supposed to be doing. It currently doesn't use the mitm proxy so appears to fail to load the tab with "unable to connect" error pages, this likely is less deterministic than we might like.
The urls appear to be like "http://localhost:8080/?httpsurl=www.facebook.com/" Is this something the proxy should be handling?
When using the proxy I get the following errors:
400 Bad Request
HttpException('Invalid HTTP request form (expected: authority or absolute, got: relative)',)
It looks like we need to specify recordings / web pages that are in the proxy?
Thanks
Flags: needinfo?(mdeboer)
Comment 28•6 years ago
|
||
Emailing Mike with the question from comment 27.
Comment 29•6 years ago
|
||
(In reply to Jonathan Kingston [:jkt] from comment #27)
> Hey :mikedeboer,
>
> I'm really struggling to work out what this test is supposed to be doing. It
> currently doesn't use the mitm proxy so appears to fail to load the tab with
> "unable to connect" error pages, this likely is less deterministic than we
> might like.
>
> The urls appear to be like
> "http://localhost:8080/?httpsurl=www.facebook.com/" Is this something the
> proxy should be handling?
>
> When using the proxy I get the following errors:
> 400 Bad Request
>
> HttpException('Invalid HTTP request form (expected: authority or absolute,
> got: relative)',)
>
> It looks like we need to specify recordings / web pages that are in the
> proxy?
>
> Thanks
Sorry for the delay, Jonathan, I've been a bit caught up in, well, many things! :)
You're right. This test is essentially useless if we don't have proper proxy responses. However, relying on proxy responses for consistent perf measurement is problematic as well, but it's the best we have.
Thus, what needs to be done is to fix up the test to get proper responses and '200 OK' page loads. Until then it's safe to ignore this regression, because it doesn't signify trouble for product, but merely trouble for its own trustworthyness.
I'm not saying that this is part of your responsibilities at all, that'd be odd. Instead, it's mine as the owner of this test/ module.
Flags: needinfo?(mdeboer)
Comment 30•6 years ago
|
||
we should consider:
1) fixing the test asap (in the next month)
2) reducing the test to tier-2 (m-c only, no sheriffing)
3) turning off the test
4) other options?
not sure what would be best, :mikedeboer, do you have a preference?
Flags: needinfo?(mdeboer)
Comment 31•6 years ago
|
||
Let's try 1. I'll reach out to you when I've got time to work on it, to learn how to best fix the test.
Flags: needinfo?(mdeboer)
Updated•6 years ago
|
status-firefox63:
--- → affected
Comment 32•6 years ago
|
||
Mike, have you had a chance to look into fixing this test? Thanks!
Flags: needinfo?(mdeboer)
Comment 33•6 years ago
|
||
Liz, thanks for the ping. I haven't had a chance to look at this yet, which is caused by other things on my schedule taking priority over fixing this.
So I do want to fix this, but I can't give a guaranteed timeline for it, so we might want to look into disabling it in the interim.
Flags: needinfo?(mdeboer)
Comment 34•6 years ago
|
||
Joel, can you disable the test on beta for now, as discussed in comment 30 and 33? And maybe we can fix it in 63 (I'm leaving the bug tracked for 63.) Thanks!
tracking-firefox63:
--- → +
Flags: needinfo?(jmaher)
Comment 35•6 years ago
|
||
:igoldan- can you pick this up sometime in the near future:
1) disable sessionrestore_many_windows on mozilla-beta (tweak a config and land it on mozilla-beta)
2) make sessionrestore_many_windows run on mozilla-central/try only for now; I assume this means breaking it out of the existing suite and creating a new job (yay taskcluster) that only run on mc/try.
Flags: needinfo?(jmaher) → needinfo?(igoldan)
Reporter | ||
Comment 36•6 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC+2) (PTO: back Aug 2) from comment #35)
> :igoldan- can you pick this up sometime in the near future:
> 1) disable sessionrestore_many_windows on mozilla-beta (tweak a config and
> land it on mozilla-beta)
> 2) make sessionrestore_many_windows run on mozilla-central/try only for now;
> I assume this means breaking it out of the existing suite and creating a new
> job (yay taskcluster) that only run on mc/try.
Yes, I've prepared bug 1478295 and bug 1478297 for that.
Flags: needinfo?(igoldan)
Reporter | ||
Comment 38•6 years ago
|
||
(In reply to Mike Taylor [:miketaylr] (62 Regression Engineering Owner) from comment #37)
> Ionut, can we close this now?
Yes, we can close it.
Flags: needinfo?(igoldan)
Reporter | ||
Updated•6 years ago
|
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•