0.33 - 6.33% Base Content Explicit / Base Content Heap Unclassified / Base Content JS / Base Content Resident Unique Memory regression on push 35db4533f11f8d1a5bdf2d2dbafdef5c0646a4d1 (Tue February 9 2021)
Categories
(Core :: Performance, defect)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr78 | --- | unaffected
firefox85 | --- | unaffected
firefox86 | --- | unaffected
firefox87 | --- | wontfix
firefox88 | --- | fixed
People
(Reporter: aesanu, Assigned: sefeng)
References
(Regression)
Details
(Keywords: perf, perf-alert, regression)
Attachments
(2 files)
Perfherder has detected an awsy (Are We Slim Yet) performance regression from push 35db4533f11f8d1a5bdf2d2dbafdef5c0646a4d1. As the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Platform | Absolute values (old vs new)
---|---|---|---
6% | Base Content Heap Unclassified | windows10-64-shippable | 1,387,717.00 -> 1,475,622.00
3% | Base Content Explicit | windows10-64-shippable | 8,389,786.67 -> 8,624,794.67
2% | Base Content Resident Unique Memory | windows10-64-shippable | 10,490,368.00 -> 10,743,125.33
1% | Base Content JS | windows10-64-shippable | 2,505,936.00 -> 2,519,440.00
1% | Base Content JS | windows10-64-shippable-qr | 2,505,936.00 -> 2,519,440.00
0.44% | Base Content JS | linux1804-64-shippable | 2,504,917.33 -> 2,515,943.33
0.43% | Base Content JS | linux1804-64-shippable-qr | 2,505,985.33 -> 2,516,879.33
0.33% | Base Content JS | macosx1014-64-shippable-qr | 2,511,376.00 -> 2,519,688.00
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Assignee
Comment 1•3 years ago
Thanks! This is probably due to the event timing entries that were introduced in those patches.
We should probably change the buffer to an nsTArray instead of an AutoTArray to avoid preallocating unused memory.
I'll do some testing and make a patch.
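To illustrate the trade-off being described, here is a minimal Gecko-style sketch; the stand-in type, field names, and inline capacity are made up for illustration and are not taken from the actual patch:

```cpp
// Sketch: AutoTArray<T, N> embeds inline storage for N elements in the
// owning object itself, so every instance pays for N slots up front, while
// nsTArray<T> starts empty and heap-allocates only on the first append.
#include "nsTArray.h"

struct EventEntry {  // stand-in for PerformanceEventTiming
  double mStartTime = 0.0;
  double mDuration = 0.0;
};

class PerformanceSketch {
  // Before: 64 inline slots (the capacity here is made up) are baked into
  // every instance, paid in each content process even if no event timing
  // entry is ever recorded.
  AutoTArray<EventEntry, 64> mPreallocatedBuffer;

  // After: just an empty array header until the first AppendElement call.
  nsTArray<EventEntry> mLazyBuffer;
};
```

The saving comes purely from dropping the inline storage: the AutoTArray's inline slots are part of every owning object, while the nsTArray costs only a small header until something is appended.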
Updated•3 years ago
Comment 2•3 years ago
== Change summary for alert #28765 (as of Fri, 12 Feb 2021 08:56:23 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
7% | raptor-tp6-yandex-firefox-cold | | windows10-64-shippable | nocondprof | 610.77 -> 650.91
6% | raptor-tp6-wikipedia-firefox-cold | fcp | windows10-64-shippable | nocondprof | 1,378.75 -> 1,457.50
5% | raptor-tp6-wikipedia-firefox-cold | fcp | windows10-64-shippable | nocondprof | 1,395.96 -> 1,467.58
5% | raptor-tp6-wikipedia-firefox-cold | fcp | windows10-64-shippable | nocondprof | 1,389.31 -> 1,459.25
5% | raptor-tp6-netflix-firefox-cold | loadtime | windows10-64-shippable-qr | nocondprof webrender | 803.71 -> 842.17
3% | raptor-tp6-outlook-firefox-cold | fcp | windows10-64-shippable-qr | nocondprof webrender | 247.83 -> 255.17
3% | raptor-tp6-outlook-firefox-cold | | windows10-64-shippable-qr | nocondprof webrender | 352.33 -> 362.02
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
3% | raptor-tp6-instagram-firefox-cold | fcp | windows10-64-shippable-qr | nocondprof webrender | 216.17 -> 208.83
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28765
Assignee
Comment 3•3 years ago
AutoTArray gives us some memory overhead, and the benefits it provides
might not be that useful since PerformanceEventTiming only works
for certain event types.
Updated•3 years ago
Assignee
Comment 4•3 years ago
I'm not 100% sure the posted patch will solve the memory usage regression; however, based on the graph, it looks like it will: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightedRevisions=6d3b16b66e92&highlightedRevisions=2fa5c179202b&selected=1961446,1306158978&series=try,1961446,1,4&timerange=86400&zoom=1613177775517,1613181954118,1407259.7706226483,1994246.052104132
Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9eddc74e3970 Use nsTArray for PerformanceEventTiming buffer instead of AutoTArray r=smaug
Assignee
Comment 6•3 years ago
Added leave-open because we need to verify the regressions are fixed.
Comment 7•3 years ago
bugherder
Assignee
Comment 8•3 years ago
Currently, whenever the algorithm needs a timestamp, it
gets the corresponding precision-reduced timestamp. However,
this creates some memory overhead because of the calculation of
precision-reduced timestamps. So instead of always generating
precision-reduced timestamps, we use the raw timestamps internally
to avoid generating precision-reduced timestamps unnecessarily.
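A minimal sketch of this idea in plain C++ (the names and the clamping grain are illustrative; Gecko's real timer-precision reduction logic is more involved than this):

```cpp
// Sketch of deferring precision reduction: keep raw timestamps while entries
// sit in the internal buffer, and pay for the reduction only at the point
// where a value is actually exposed.
#include <chrono>
#include <cmath>

using Clock = std::chrono::steady_clock;

// Stand-in for the real clamping/jittering: round down to a coarse grain.
double ReducePrecisionMs(double aRawMs, double aGrainMs = 1.0) {
  return std::floor(aRawMs / aGrainMs) * aGrainMs;
}

struct EventTimingEntry {
  Clock::time_point mRawStart;  // stored raw; no reduction at record time

  // The reduction cost is paid here, once, when content reads the value.
  double StartTimeForContent(Clock::time_point aOrigin) const {
    double rawMs =
        std::chrono::duration<double, std::milli>(mRawStart - aOrigin).count();
    return ReducePrecisionMs(rawMs);
  }
};
```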
Updated•3 years ago
Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/125d2bb7680f Use raw timestamp internally for PerformanceEventTiming r=smaug
Comment 10•3 years ago
bugherder
Assignee
Comment 11•3 years ago
I just checked the latest graphs, and I think the regressions are fixed now. So I am closing this bug.
Comment 12•3 years ago
One of the patches in this bug is in 87, the other in 88. Do we want to uplift the latter to beta or should it ride the trains?
Assignee
Comment 13•3 years ago
It's okay to let it ride the trains, since these are Nightly-only regressions; we don't have the feature enabled in Beta.
Updated•3 years ago
Comment 14•3 years ago
FYI, the regression below was not fixed.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
7% | raptor-tp6-yandex-firefox-cold | | windows10-64-shippable | nocondprof | 610.77 -> 650.91
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28765
Comment 15•3 years ago
I'm reopening this, because the change from comment #10 caused the other regressions listed below.
== Change summary for alert #28890 (as of Wed, 24 Feb 2021 00:07:36 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
4% | raptor-tp6-bing-firefox-cold | fcp | windows10-64-shippable | nocondprof | 296.42 -> 308.75
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
7% | raptor-tp6-wikipedia-firefox-cold | | windows10-64-shippable-qr | nocondprof webrender | 1,637.07 -> 1,524.38
5% | raptor-tp6-wikipedia-firefox-cold | fcp | windows10-64-shippable-qr | nocondprof webrender | 1,556.50 -> 1,479.33
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28890
Assignee
Comment 16•3 years ago
igoldan, I checked the 4 regressions in comment 4; three of them (all except raptor-tp6-bing-firefox-cold) seem to have an improvement around Feb 9th. Do you know which patch caused the improvements? I deduce it was my regression bug that introduced the improvements, and my fixes in this patch reverted them.
However, raptor-tp6-yandex-firefox-cold and raptor-tp6-bing-firefox-cold are indeed odd. I'll do some work on them.
Reporter
Comment 17•3 years ago
== Change summary for alert #28904 (as of Wed, 24 Feb 2021 12:59:15 GMT) ==
Improvements:
Ratio | Suite | Platform | Absolute values (old vs new)
---|---|---|---
5% | Base Content Heap Unclassified | windows10-64-shippable | 1,463,726.00 -> 1,385,983.33
5% | Base Content Heap Unclassified | windows10-64-shippable | 1,463,132.33 -> 1,388,992.00
2% | Base Content Explicit | windows10-64-shippable | 8,604,058.67 -> 8,394,224.00
2% | Base Content Resident Unique Memory | windows10-64-shippable | 10,751,146.67 -> 10,519,893.33
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28904
Comment 18•3 years ago
(In reply to Sean Feng [:sefeng] from comment #16)
> igoldan, I checked the 4 regressions in comment 4; three of them (all except raptor-tp6-bing-firefox-cold) seem to have an improvement around Feb 9th. Do you know which patch caused the improvements? I deduce it was my regression bug that introduced the improvements, and my fixes in this patch reverted them.

Oh, sorry about that. You are right. Let me update comment 15.
Comment 19•3 years ago
The patch landed in nightly and beta is affected.
:sefeng, is this bug important enough to require an uplift?
If not, please set status_beta to wontfix.
For more information, please visit the auto_nag documentation.
Assignee
Comment 20•3 years ago
I believe the regressions are behind a pref, which is only enabled in Nightly, so no need to uplift the patch.
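For illustration, this is roughly what such pref gating looks like in Gecko; the pref getter name below is an assumption based on StaticPrefs naming conventions and is not taken from the patch:

```cpp
// Hypothetical sketch of Nightly-only pref gating; the getter name is an
// assumption, not confirmed from the actual code.
#include "mozilla/StaticPrefs_dom.h"

void MaybeRecordEventTiming(/* event details */) {
  // If the pref defaults to false outside Nightly, non-Nightly channels
  // never reach the event timing code paths, so they cannot regress.
  if (!mozilla::StaticPrefs::dom_enable_event_timing()) {
    return;
  }
  // ... record the PerformanceEventTiming entry ...
}
```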
Assignee
Updated•3 years ago
Assignee
Comment 21•3 years ago
I have some new thoughts on this.
raptor-tp6-yandex-firefox-cold: The regression was the geomean of multiple subtests, so I checked each of them individually (here's the link). Clearly, dcf was the subtest that caused the regression. Looking further into it, I noticed the first regression came from this merge, which had nothing to do with my patches (bug 1667836). In addition, the test started to show some improvements from this merge on Feb 22, which included the second patch I landed in this bug. So I think what happened is that 89c5f958a3ac4795109acf1f9dff1c8026bb82fe caused the majority of the regression, my patches in bug 1667836 regressed it a bit further, and the fixes I landed in this bug then addressed the part caused by my patches, leaving the majority of the regression in place.
raptor-tp6-bing-firefox-cold: Link to the graph. I also have concerns about this regression, because it seems it had already existed before the push that contained my patches.
Ionut, do you mind checking the above hypothesis? At the moment, I am leaning towards it not being bug 1667836 that caused these regressions.
Comment 22•3 years ago
(In reply to Sean Feng [:sefeng] from comment #21)
> I have some new thoughts on this.
> raptor-tp6-yandex-firefox-cold: The regression was the geomean of multiple subtests, so I checked each of them individually (here's the link). Clearly, dcf was the subtest that caused the regression. Looking further into it, I noticed the first regression came from this merge, which had nothing to do with my patches (bug 1667836).

I'm checking this theory using alert 28727.
Updated•3 years ago
Comment 23•3 years ago
I can confirm the theory. Closing this bug.
Comment 24•3 years ago
Looks like the push from comment #10 brought a CNN regression:
== Change summary for alert #29030 (as of Wed, 03 Mar 2021 12:06:18 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---|---
11% | raptor-tp6-cnn-ampstories-firefox-cold | dcf | windows10-64-shippable-qr | live nocondprof webrender | 50.25 -> 55.76
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29030
Assignee
Comment 25•3 years ago
This still looks like something improved it on Feb 9 (presumably my old patches), and then the fix I landed reverted that improvement.
What do you think, Ionut?
Comment 26•3 years ago
Redirecting to Florin, as I'm not on perf sheriff duty anymore.
Comment 27•3 years ago
[:sefeng] is right: it's something that improved on Feb 9 and then came back to its original value.
Let's close this one for good.
Updated•3 years ago
Updated•2 years ago