6.05 - 10.6% raptor-tp6-tumblr-firefox loadtime regression on push d52a0108ed8ef2593db6fe0dafedbc2790941fd0 (Mon September 30 2019)
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | --- | unaffected |
firefox70 | --- | unaffected |
firefox71 | --- | wontfix |
firefox72 | --- | wontfix |
firefox73 | --- | fixed |
People
(Reporter: marauder, Assigned: jkt)
References
(Regression)
Details
(Keywords: perf, perf-alert, regression, Whiteboard: [necko-triaged])
Attachments
(5 files)
Raptor has detected a Firefox performance regression from push:
(linux64-shippable, linux64-shippable-qr, macosx1014-64-shippable, windows10-64-shippable, windows10-64-shippable-qr, windows7-32-shippable)
As you are the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
11% raptor-tp6-tumblr-firefox loadtime macosx1014-64-shippable opt 942.85 -> 1,042.75
9% raptor-tp6-tumblr-firefox loadtime windows7-32-shippable opt 599.27 -> 654.92
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable opt 614.17 -> 665.42
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable-qr opt 611.15 -> 661.29
7% raptor-tp6-tumblr-firefox loadtime linux64-shippable-qr opt 632.46 -> 673.58
6% raptor-tp6-tumblr-firefox loadtime linux64-shippable opt 613.27 -> 650.38
Improvements:
3% raptor-tp6-google-mail-firefox fcp linux64-shippable-qr opt 241.98 -> 235.75
2% raptor-tp6-google-mail-firefox linux64-shippable-qr opt 268.75 -> 262.12
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23284
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.
To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/TestEngineering/Performance/Raptor
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling
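As a quick sanity check, the regression percentages in the summary can be recomputed from the before/after loadtime values listed above; a minimal sketch in Python, using only the numbers from this alert:

```python
# Recompute the reported regression percentages from the before/after
# loadtime values (ms) listed in the alert above.
cases = {
    "macosx1014-64-shippable": (942.85, 1042.75),
    "windows7-32-shippable": (599.27, 654.92),
    "windows10-64-shippable": (614.17, 665.42),
    "windows10-64-shippable-qr": (611.15, 661.29),
    "linux64-shippable-qr": (632.46, 673.58),
    "linux64-shippable": (613.27, 650.38),
}

for platform, (before, after) in cases.items():
    change = (after - before) / before * 100
    print(f"{platform}: {change:+.1f}%")  # e.g. macosx1014-64-shippable: +10.6%
```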
Reporter
Updated•5 years ago
Assignee
Comment 1•5 years ago
:marauder how confident are we that my patch caused this? The graph doesn't have many data points on it.
I can't see that tumblr loads a manifest at all, which, granted, is what would probably cause this regression.
The zip file appears to contain everything that it loads; I'm not sure if I am reading it correctly, but again I can't see an appcache manifest file being loaded here.
Comment 2•5 years ago
:jkt as you can see from the graph, your commit is the one that introduced the regression.
It is also adding some bimodal results that are affecting our graphs.
Updated•5 years ago
Updated•5 years ago
Assignee
Comment 3•5 years ago
To me this test looks broken: I compared a reversion of the patch (on macOS) to the current central, and the reversion was slower.
I have a patch that changes all the code to use static prefs and also prevents the browser from even spinning up the parser, and again it was slower locally. The test seems flaky in its results to me, and perhaps another patch that landed in central around the time of mine caused this.
I'll update this bug when my series of patches has passed through try. I'm confident it would fix any regression that could be fixed this way, given that I turned everything static. I think we should probably consider landing these changes regardless, as they vastly simplify the cache service... However, I'm not confident we will see a perf improvement in this particular test, and I still think we should be looking for another cause.
Assignee
Comment 4•5 years ago
Assignee
Comment 5•5 years ago
Depends on D47958
Assignee
Comment 6•5 years ago
Depends on D47959
Assignee
Comment 7•5 years ago
Depends on D47960
Comment 8•5 years ago
To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
Also, when we confirm a performance alert (improvement or regression) we re-trigger and run the same tests on the commits before and after the suspected alert, to make sure we are detecting the correct commit.
From your comment I see you are comparing try with mozilla-central, which is a moving target.
I would suggest testing your changes on try: first push a commit with the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see if your changes introduce any performance improvements/regressions.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try
The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed with a proxy service with no outbound connections, to make sure we are in an "air tight" environment.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load
As your initial bug makes changes to appcache, I suspect tumblr uses this technology in their implementation and that's why our test identified a regression.
Also, if you take a look, your changes introduce a "bimodality" to the test results, for a test that historically was quite stable.
If you have any issues running our tests on try, or questions about performance, you can find us on the #perftest channel on Slack.
Assignee
Comment 9•5 years ago
Hey :)
(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #8)
> To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
> Also, when we confirm a performance alert (improvement or regression) we re-trigger and run the same tests on the commits before and after the suspected alert, to make sure we are detecting the correct commit.
> From your comment I see you are comparing try with mozilla-central, which is a moving target.
> I would suggest testing your changes on try: first push a commit with the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see if your changes introduce any performance improvements/regressions.
I will test them now against autoland. I think they are worthwhile anyway in reducing complexity, but we can always move them into another bug.
> https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try
Sorry, my previous comment wasn't clear, as I had both try and central on the graph I linked to (https://treeherder.mozilla.org/perf.html?selectMulti=raptor-tp6-tumblr-firefox+opt#/graphs?timerange=1209600&series=try,2014435,1,10&series=mozilla-central,1872186,1,10&highlightedRevisions=16891d6bd2d99&highlightedRevisions=a1c23ce146b21&highlightAlerts=0&zoom=1569630724365,1570030096005,403.84859259259247,896.3651851851853).
The highlighted commits were both from try: the latest central at the time, pushed to try as-is, vs the latest central with my patch reverted. The push with the patch reverted was slower.
> The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed with a proxy service with no outbound connections, to make sure we are in an "air tight" environment.
> https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load
> As your initial bug makes changes to appcache, I suspect tumblr uses this technology in their implementation and that's why our test identified a regression.
I downloaded the zip that the proxy service uses and from what I can tell nothing was using appcache.
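A rough way to double-check that is to scan the recording archive for appcache markers: the manifest attribute on the <html> tag, and the "CACHE MANIFEST" header that every appcache manifest file must start with. A minimal sketch, assuming a local copy of the zip; the file name is hypothetical, and compressed response bodies inside the archive would not be caught by a plain byte scan:

```python
import re
import zipfile

# Hypothetical local copy of the recording archive used by the replay proxy.
ARCHIVE = "raptor-tp6-tumblr.zip"

# Heuristic appcache markers: <html manifest=...> and the mandatory first
# line of an appcache manifest file.
PATTERNS = [
    re.compile(rb"<html[^>]*\bmanifest\s*="),
    re.compile(rb"CACHE MANIFEST"),
]

with zipfile.ZipFile(ARCHIVE) as archive:
    for name in archive.namelist():
        data = archive.read(name)
        for pattern in PATTERNS:
            if pattern.search(data):
                print(f"possible appcache reference in {name}: {pattern.pattern!r}")
```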
> Also, if you take a look, your changes introduce a "bimodality" to the test results, for a test that historically was quite stable.
> If you have any issues running our tests on try, or questions about performance, you can find us on the #perftest channel on Slack.
Some questions:
- If the problem is indeed that tumblr or a subresource used appcache, can we update the test, as it doesn't look like they still do?
- If they do use it on the live site (I don't have an account to verify their account pages right now), can we accept the performance hit?
- Is there anything else that might introduce this bimodality?
I ran the tests again a few times:
Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56
Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04
Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55
I don't see anything obvious in any of these; maybe try isn't stable enough for perf-testing this?
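For what it's worth, the three sets of replicates above overlap heavily; a minimal summary sketch, using only the numbers from this comment:

```python
from statistics import mean, stdev

# Loadtime replicates (ms) copied from the three try pushes above.
runs = {
    "inbound": [667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56],
    "inbound + patches": [738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04],
    "inbound + backout": [697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55],
}

for label, values in runs.items():
    print(f"{label}: mean {mean(values):.1f} ms, stdev {stdev(values):.1f} ms, "
          f"range {min(values):.1f}-{max(values):.1f} ms")
```

The means come out within roughly 30 ms of each other with standard deviations of around 40-60 ms, so with this few replicates the three conditions are hard to tell apart, which matches the "nothing obvious" reading.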
Thanks
Comment 10•5 years ago
Jonathan, can we get an update on the status of this bug? (I see that you have approved patches that could land and we are close to merge day) Thanks!
Assignee
Comment 11•5 years ago
Sorry, I moved on to another bug. I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land those and we can see if there is an improvement?
Comment 12•5 years ago
(In reply to Jonathan Kingston [:jkt] from comment #11)
> Sorry, I moved on to another bug. I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land those and we can see if there is an improvement?
That sounds good to me.
Comment 13•5 years ago
Comment 14•5 years ago
Comment 15•5 years ago
Backed out 3 changesets (bug 1585582) for bc failures in browser/base/content/test/sanitize/browser.ini
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=271919748&repo=autoland&lineNumber=1473
Backout: https://hg.mozilla.org/integration/autoland/rev/2afde003f4b1ff56511fe58226711cdb442df950
Comment 16•5 years ago
bugherder
Updated•5 years ago
Comment 17•5 years ago
Comment 18•5 years ago
Backed out for perma fails on appcache.tentative.https.sub.html.
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=272083200&repo=autoland&lineNumber=31477
Backout: https://hg.mozilla.org/integration/autoland/rev/2b0f526cdab1f1a6f93a86ed51a961f40bb98641
Updated•5 years ago
Comment 19•5 years ago
Comment 20•5 years ago
(In reply to Jonathan Kingston [:jkt] from comment #9)
> [...]
> I downloaded the zip that the proxy service uses and from what I can tell nothing was using appcache.
> [...]
> Some questions:
> - If the problem is indeed that tumblr or a subresource used appcache, can we update the test, as it doesn't look like they still do?
If the downloaded zip didn't hint at our recordings using appcache, then this probably isn't something trivial to check. Still, we need to know for sure whether these recordings use appcache or not. Please try to simulate something at the Firefox source code level, so we can tell if the appcache mechanics are hit when running the affected perf tests.
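One lighter-weight alternative to instrumenting the appcache code paths directly might be to scan the Raptor job log, or an HTTP log captured via the MOZ_LOG environment variable, for appcache-related strings while the test runs. A rough sketch; the log file name and marker strings are assumptions, and a clean result would not be conclusive on its own:

```python
import re

# Hypothetical path to a Raptor job log (or an HTTP log captured locally
# with MOZ_LOG/MOZ_LOG_FILE) from a run of the affected test.
LOG_FILE = "raptor-tp6-tumblr.log"

# Heuristic markers suggesting the appcache machinery was exercised.
MARKERS = re.compile(r"appcache|offline.?cache|cache manifest", re.IGNORECASE)

hits = 0
with open(LOG_FILE, errors="replace") as log:
    for lineno, line in enumerate(log, start=1):
        if MARKERS.search(line):
            hits += 1
            print(f"{lineno}: {line.rstrip()}")

print(f"{hits} candidate appcache line(s) found")
```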
> - If they do use it on the live site (I don't have an account to verify their account pages right now), can we accept the performance hit?
If we can properly confirm that the appcache mechanics are hit, then we can consider accepting the performance hit.
> - Is there anything else that might introduce this bimodality?
Not really. We've done many retriggers & concluded that bug 1237782 alone brought the bimodality.
> I ran the tests again a few times:
> Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
> 667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56
> Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
> 738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04
> Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
> 697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55
> I don't see anything obvious in any of these; maybe try isn't stable enough for perf-testing this?
I don't think there are issues with simulating this on Try. We checked the noise levels, which are similar to what we have on the trunk branches.
Comment 21•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/43eaaab0cec7
https://hg.mozilla.org/mozilla-central/rev/8e5bb8f2f831
https://hg.mozilla.org/mozilla-central/rev/341d72ef4bf5
Comment 22•5 years ago
Is this something which needs Beta uplift consideration? Looks like a pretty sizeable regression.
Assignee
Comment 23•5 years ago
:igoldan do we know if the patches fixed the regression and if we should uplift?
Comment 24•5 years ago
(In reply to Jonathan Kingston [:jkt] from comment #23)
> :igoldan do we know if the patches fixed the regression and if we should uplift?
I checked the push from comment 19 and it didn't bring any kind of improvement.
Updated•5 years ago
Comment 25•5 years ago
:jkt please make sure to follow the steps described in comment 20. Otherwise, we cannot conclude this bug and close it.
Updated•5 years ago
Assignee
Comment 26•5 years ago
:igoldan sorry, I have been looking at some Fission blockers, so I have been unable to give any time to this. I will say that I haven't been able to see the code being triggered under the Raptor tests; however, I also wasn't able to confirm whether the test actually notices when a content process crashes... that's about where I got to last time I looked.
So until I can 100% confirm that this isn't appcache, I suspect this should remain open. I'm pretty sure it must be, as otherwise we would be seeing regressions across the board if, say, the new tab page had some reason to spin up the appcache service even when it wasn't being used.
If it is urgent that we investigate further to ensure we don't produce a stable regression, are we able to test whether the pref I added fixes the regression?
Comment 27•5 years ago
Ionuț, could you answer Jonathan's question above, please? Thanks!
Comment 28•5 years ago
(In reply to Jonathan Kingston [:jkt] from comment #26)
> If it is urgent that we investigate further to ensure we don't produce a stable regression
If we don't investigate now, we'll lose the ability to conclude this. This is because we'll update our tests and their new baselines won't tell us whether the regressions went away or not.
And in time, this ticket will be closed as WONTFIX or INCOMPLETE.
> are we able to test whether the pref I added fixes the regression?
Are you referring to a new fix?
Assignee
Comment 29•5 years ago
> Are you referring to a new fix?
Sorry, no. I was interested in whether the regression is removed by flipping back the pref that the patch adds. That of course means we can chase it again in the future too.
Comment 30•5 years ago
(In reply to Jonathan Kingston [:jkt] from comment #29)
> > Are you referring to a new fix?
> Sorry, no. I was interested in whether the regression is removed by flipping back the pref that the patch adds.
Could you check that via a Try push and share the results?
Comment 31•5 years ago
We have shipped our last beta for 71, so this is wontfix for 71.
Updated•5 years ago
Comment 32•5 years ago
Even though the baselines have returned to normal since around Oct 22, that is attributed solely to bug 1583700, which caused those perf changes.
If bug 1591864's changes are somewhat related to bug 1237782, please let us know.
Comment 33•5 years ago
Does this bug still need to stay open? Are we accepting the perf regression here?
In any case, it is likely too late for a fix in 72.
Comment 34•5 years ago
Should I try outreach to tumblr?
Assignee
Comment 35•5 years ago
When changing the pref I didn't see any difference in raptor: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1259e9c66e5295cba3543c2bc0e0830e037af518&newProject=try&newRevision=8f32be0fa95103fd28c36cdcef7be7f3097b02d1&framework=10
This might mean that something which changed has an impact outside of the pref. I was looking into this but got put onto other work.
I would like to keep this open to investigate further; however, I can't guarantee set time on this to fix it. For me, anyway, this at least blocks rolling this pref change out further to the release channel.
I'm not even sure we should reach out to tumblr, given that it isn't even clear the regression comes from appcache.
Assignee
Comment 36•5 years ago
Assignee
Comment 37•5 years ago
OK, with the latest patch there isn't any difference in performance; however, with the latest patch and the pref disabled I'm seeing a ~6% improvement on the test:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=14e686d8583a0a7ac15856dad4572d0cad44f45e&newProject=try&newRevision=b4a869f06939ef24ec4c5e0ceb0f505f7ceca2e0&framework=10
It does still seem bimodal though, which is odd.
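If it helps to quantify the "still bimodal" impression, one rough heuristic is Sarle's bimodality coefficient, where values above about 0.555 (the value for a uniform distribution) hint at bimodality. A minimal sketch; the input shown is just the replicate set from comment 9, so the replicates from this push would need to be substituted in:

```python
from scipy.stats import kurtosis, skew

def bimodality_coefficient(values):
    """Sarle's bimodality coefficient; values above ~0.555 hint at bimodality."""
    n = len(values)
    g1 = skew(values)       # sample skewness
    g2 = kurtosis(values)   # excess kurtosis
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (g1 ** 2 + 1) / (g2 + correction)

# Illustrative input only: loadtime replicates (ms) from one of the try
# pushes in comment 9; replace with the replicates from this push.
samples = [738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04]
print(f"bimodality coefficient: {bimodality_coefficient(samples):.3f}")
```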
Comment 38•5 years ago
Comment 39•5 years ago
bugherder
Updated•3 years ago