Closed Bug 1692213 Opened 2 months ago Closed 1 month ago

0.33 - 6.33% Base Content Explicit / Base Content Heap Unclassified / Base Content JS / Base Content Resident Unique Memory regression on push 35db4533f11f8d1a5bdf2d2dbafdef5c0646a4d1 (Tue February 9 2021)

Categories

(Core :: Performance, defect)

defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- unaffected
firefox87 --- wontfix
firefox88 --- fixed

People

(Reporter: aesanu, Assigned: sefeng)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [qf-])

Attachments

(2 files)

Perfherder has detected a awsy performance regression from push 35db4533f11f8d1a5bdf2d2dbafdef5c0646a4d1. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
6% Base Content Heap Unclassified windows10-64-shippable 1,387,717.00 -> 1,475,622.00
3% Base Content Explicit windows10-64-shippable 8,389,786.67 -> 8,624,794.67
2% Base Content Resident Unique Memory windows10-64-shippable 10,490,368.00 -> 10,743,125.33
1% Base Content JS windows10-64-shippable 2,505,936.00 -> 2,519,440.00
1% Base Content JS windows10-64-shippable-qr 2,505,936.00 -> 2,519,440.00
0.44% Base Content JS linux1804-64-shippable 2,504,917.33 -> 2,515,943.33
0.43% Base Content JS linux1804-64-shippable-qr 2,505,985.33 -> 2,516,879.33
0.33% Base Content JS macosx1014-64-shippable-qr 2,511,376.00 -> 2,519,688.00

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(sefeng)

Thanks! This is probably due to the event timing entries that were introduced in the patches.

We probably should change that to a nsTArray instead of AutoTArray to avoid preallocating unused memory.

I'll do some testing and make a patch

== Change summary for alert #28765 (as of Fri, 12 Feb 2021 08:56:23 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
7% raptor-tp6-yandex-firefox-cold windows10-64-shippable nocondprof 610.77 -> 650.91
6% raptor-tp6-wikipedia-firefox-cold fcp windows10-64-shippable nocondprof 1,378.75 -> 1,457.50
5% raptor-tp6-wikipedia-firefox-cold fcp windows10-64-shippable nocondprof 1,395.96 -> 1,467.58
5% raptor-tp6-wikipedia-firefox-cold fcp windows10-64-shippable nocondprof 1,389.31 -> 1,459.25
5% raptor-tp6-netflix-firefox-cold loadtime windows10-64-shippable-qr nocondprof webrender 803.71 -> 842.17
3% raptor-tp6-outlook-firefox-cold fcp windows10-64-shippable-qr nocondprof webrender 247.83 -> 255.17
3% raptor-tp6-outlook-firefox-cold windows10-64-shippable-qr nocondprof webrender 352.33 -> 362.02

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
3% raptor-tp6-instagram-firefox-cold fcp windows10-64-shippable-qr nocondprof webrender 216.17 -> 208.83

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28765

AutoTArray gives us some memory overhead and the benefits it provides
might not be that useful since PerformanceEventTiming only works
for certain event types.

Assignee: nobody → sefeng
Status: NEW → ASSIGNED
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9eddc74e3970
Use nsTArray for PerformanceEventTiming buffer instead of AutoTArray r=smaug

Added leave-open because we need to verify the regressions are fixed.

Keywords: leave-open

Currently, whenever the algorithm requires to get timestamps, it'll
try to get the corresponding precision reduced timestamps. However,
this creates some memory overhead because the calculation of precision
reduced timestamps. So instead of always generating the precision
reduced timestamps, we use the raw timestamps internally to avoid
generating precision reduced timestamps unnecessarily.

Attachment #9204351 - Attachment description: Use raw timestamp internally for PerformanceEventTiming → Bug 1692213 - Use raw timestamp internally for PerformanceEventTiming
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/125d2bb7680f
Use raw timestamp internally for PerformanceEventTiming r=smaug

I just checked the latest graphs, and I think the regressions are fixed now. So I am closing this bug.

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Flags: needinfo?(sefeng)
Resolution: --- → FIXED

One of the patches in this bug is in 87, the other in 88. Do we want to uplift the latter to beta or should it ride the trains?

Flags: needinfo?(sefeng)
Target Milestone: --- → 88 Branch

It's okay to let it ride the train since they are nightly only regressions because we don't have the feature enabled in beta.

Flags: needinfo?(sefeng)

FYI regression bellow was not fixed.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
7% raptor-tp6-yandex-firefox-cold windows10-64-shippable nocondprof 610.77 -> 650.91

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28765

I'm reopening this, because change from comment #10 caused other regressions I've listed bellow.

== Change summary for alert #28890 (as of Wed, 24 Feb 2021 00:07:36 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
4% raptor-tp6-bing-firefox-cold fcp windows10-64-shippable nocondprof 296.42 -> 308.75

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
7% raptor-tp6-wikipedia-firefox-cold windows10-64-shippable-qr nocondprof webrender 1,637.07 -> 1,524.38
5% raptor-tp6-wikipedia-firefox-cold fcp windows10-64-shippable-qr nocondprof webrender 1,556.50 -> 1,479.33

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28890

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

igoldan, I check the 4 regression comment 4, three of them (except raptor-tp6-bing-firefox-cold) seem to have an improvement around Feb 9th. Do you know which patch caused the improvements? I deduce it was my regression bug that introduced the improvements, and my fixes in this patch reverted those improvements.

However, indeed raptor-tp6-yandex-firefox-cold and raptor-tp6-bing-firefox-cold are odd. I'll do some work for them.

Flags: needinfo?(igoldan)

== Change summary for alert #28904 (as of Wed, 24 Feb 2021 12:59:15 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
5% Base Content Heap Unclassified windows10-64-shippable 1,463,726.00 -> 1,385,983.33
5% Base Content Heap Unclassified windows10-64-shippable 1,463,132.33 -> 1,388,992.00
2% Base Content Explicit windows10-64-shippable 8,604,058.67 -> 8,394,224.00
2% Base Content Resident Unique Memory windows10-64-shippable 10,751,146.67 -> 10,519,893.33

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=28904

(In reply to Sean Feng [:sefeng] from comment #16)

igoldan, I check the 4 regression comment 4, three of them (except raptor-tp6-bing-firefox-cold) seem to have an improvement around Feb 9th. Do you know which patch caused the improvements? I deduce it was my regression bug that introduced the improvements, and my fixes in this patch reverted those improvements.

Oh, sorry about that. You are right. Let me update comment 15.

Flags: needinfo?(igoldan)

The patch landed in nightly and beta is affected.
:sefeng, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(sefeng)

I believe the regressions are behind a pref, which is only enabled in Nightly, so no need to uplift the patch.

Flags: needinfo?(sefeng)
Whiteboard: [qf-]

I have some new thoughts on this.

raptor-tp6-yandex-firefox-cold: The regression was the geomean of multiple subtests, so I checked each of them indivisually. Here's the link. So clearly, dcf was the one that caused the regression. Then I looked further into this subtest, and I noticed the first regression was came from this merge which nothing to do with my patches (bug 1667836). In addition to this, I found the test started to have some improvements since this merge at Feb 22 which included the second patch I landed in this bug. So I think what happend was 89c5f958a3ac4795109acf1f9dff1c8026bb82fe caused the majority of regression, and my patches in bug 1667836 regressed them a bit further. Then the fixes I landed in this patch fixed the regression that was caused by my patches, however there was still the marjoity one left.

raptor-tp6-bing-firefox-cold: Link to the graph. I also have concerns about this regression because it seemed that the regression had already exited before the push that contained my patches.

Ionut, do you mind to check my above hypothesis? At the moment, I am leaning towards that it wasn't bug 1667836 caused these regressions.

Flags: needinfo?(igoldan)

(In reply to Sean Feng [:sefeng] from comment #21)

I have some new thoughts on this.

raptor-tp6-yandex-firefox-cold: The regression was the geomean of multiple subtests, so I checked each of them indivisually. Here's the link. So clearly, dcf was the one that caused the regression. Then I looked further into this subtest, and I noticed the first regression was came from this merge which nothing to do with my patches (bug 1667836).

I'm checking this theory using alert 28727.

Flags: needinfo?(igoldan)
Flags: needinfo?(igoldan)

I can confirm the theory. Closing this bug.

Status: REOPENED → RESOLVED
Closed: 2 months ago2 months ago
Flags: needinfo?(igoldan)
Resolution: --- → FIXED

Looks like push from comment #10 brought a CNN regression:

== Change summary for alert #29030 (as of Wed, 03 Mar 2021 12:06:18 GMT) ==

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
11% raptor-tp6-cnn-ampstories-firefox-cold dcf windows10-64-shippable-qr live nocondprof webrender 50.25 -> 55.76

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29030

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This still looks like something improved it on Feb 9 (presumably my old patches), and then the fix I did revert the regression.

What do you think Ionut?

Flags: needinfo?(igoldan)

Redirecting to Florin, as I'm not on perf sheriff duty anymore.

Flags: needinfo?(igoldan) → needinfo?(fstrugariu)

[:sefeng] is right it's something that improved on feb9 and then came back to it's original value

Let's close this one for good

Status: REOPENED → RESOLVED
Closed: 2 months ago1 month ago
Flags: needinfo?(fstrugariu)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.