Closed Bug 1655438 Opened 5 years ago Closed 5 years ago

6.65 - 30.02% Base Content Explicit / Base Content JS / Base Content Resident Unique Memory (linux1804, macosx1014-64-shippable, windows10-64-shippable-qr) regression on push 6a3cc8878240b50d1b50d5979ad86e2078de1b4e (Thu July 23 2020)

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox79 unaffected, firefox80 fixed)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox-esr68

---

unaffected

firefox-esr78

---

unaffected

firefox79

---

unaffected

firefox80

---

fixed

People

(Reporter: Bebe, Assigned: whimboo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [marionette-fission-mvp])

Florin Strugariu [:Bebe]

Reporter

Description

•

5 years ago

Perfherder has detected a awsy performance regression from push 6a3cc8878240b50d1b50d5979ad86e2078de1b4e. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

30% Base Content JS linux1804-64-shippable opt 3,509,410.67 -> 4,563,002.67
30% Base Content JS linux1804-64-shippable-qr opt 3,515,964.67 -> 4,562,896.67
30% Base Content JS windows10-64-shippable-qr opt 3,523,836.00 -> 4,568,433.33
28% Base Content JS macosx1014-64-shippable opt 3,515,592.67 -> 4,490,755.33
7% Base Content Explicit linux1804-64-shippable opt 13,849,941.33 -> 14,873,088.00
7% Base Content Explicit windows10-64-shippable-qr opt 11,151,957.33 -> 11,939,498.67
7% Base Content Explicit linux1804-64-shippable-qr opt 14,176,768.00 -> 15,166,293.33
7% Base Content Resident Unique Memory windows10-64-shippable-qr opt 12,906,069.33 -> 13,764,437.33

Improvements:

23% Base Content JS linux1804-64-shippable opt 4,561,273.33 -> 3,507,794.67
23% Base Content JS linux1804-64-shippable-qr opt 4,561,296.00 -> 3,507,882.67
23% Base Content JS windows10-64-shippable-qr opt 4,567,276.67 -> 3,513,520.00
23% Base Content JS windows10-64-shippable-qr opt 4,566,749.33 -> 3,513,477.33

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Florin Strugariu [:Bebe]

Reporter

Updated

•

5 years ago

Component: Performance → Marionette

Florin Strugariu [:Bebe]

Reporter

Updated

•

5 years ago

Flags: needinfo?(hskupin)

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Comment 1

•

5 years ago

I'm not sure what effect Henrik's changes could actually have on the BaseContentJS measure, given that they are disabled by a pref by default. Can we backfill the awsy tests perhaps to narrow down the culprit?

Flags: needinfo?(fstrugariu)

Florin Strugariu [:Bebe]

Reporter

Comment 2

•

5 years ago

we can't backfill on this patch but I will try to bisect and identify the culprit bug

Flags: needinfo?(fstrugariu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 3

•

5 years ago

•

Edited

Bisection:

baseline - the commit right before the regressing pushlog
8fc969622469 Bug 1654453
57e7ed65f92e Bug 1654454
6a3cc8878240 Bug 1654609

When the builds finish I will add the conclusions in case somebody else didn't found them yet.

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 4

•

5 years ago

Seems like Bug 1654453 is the culprit of the regression. I will do a local back-out and push again to try to make sure that its absence fixes the regression.

Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me)

Comment 5

•

5 years ago

Thank you for doing these extra checks.

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 6

•

5 years ago

The comparison reveals that Bug 1654453 introduced the regression.

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Comment 7

•

5 years ago

The only thing I could imagine here is the usage of the useActor getter, which does an extra read of a preference. But it's unlikely that it would cause such a massive regression. Alexandru, given that I just came back from PTO would you mind pushing a change which both uses with just if (false)? If that fixes the issue we have the culprit, and I hope that it is. All the other code is present some days longer and was only moved into this if condition. Thanks a lot.

Flags: needinfo?(hskupin) → needinfo?(aionescu)

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Updated

•

5 years ago

Severity: -- → S3

Priority: -- → P3

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 8

•

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6e66c3f8f26267d5220a159d247eed9545abcd1

Flags: needinfo?(aionescu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 9

•

5 years ago

Henrik, I am not sure what comparison should I make so please look into the results. For Base Content JS opt on windows10-64-shippable-qr I see no change, but I see on other Base Content tests. Not sure what conclusion should I take here.

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Comment 10

•

5 years ago

So with your patch from the latest try build (comment 8) we no longer have to read the preference and also don't actually register the JSWindowActor pair at all. From the baseline perspective we actually removed code and run less. As such I would expect to always see improvements. But that's not the case as you can see in this comparison:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=679d4361a15eb6f945e2b0896228a6c6ca0943cc&newProject=try&newRevision=d6e66c3f8f26267d5220a159d247eed9545abcd1&framework=4

Why does the Base Content Heap Unclassified opt value grow by 38.84%? Andrew, do you have an idea why that could happen when running lesser code? Or know at least who could answer?

Flags: needinfo?(hskupin) → needinfo?(continuation)

Andrew McCreight [:mccr8]

Updated

•

5 years ago

Blocks: memshrink-content

Flags: needinfo?(continuation)

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Comment 11

•

5 years ago

Why are regressions and improvements for the same platform shown together? See the initial comment:

Regressions:

30% Base Content JS linux1804-64-shippable opt 3,509,410.67 -> 4,563,002.67

Improvements:

23% Base Content JS linux1804-64-shippable opt 4,561,273.33 -> 3,507,794.67

It looks like there is something wrong with the reporting.

Flags: needinfo?(fstrugariu)

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Comment 12

•

5 years ago

So I just had a look at the perfherder graph for the regression on Windows. And as it shows there was indeed a regression when I landed the initial minimal JSWindowActor implementation (bug 1653911). But with the conditional patch all numbers went back to normal.

So I assume that no perfherder alert checks were done between those two commits, and then when reporting this bug both have been lumped together. That would explain those double listings in regressions and improvements.

Based on that I would call this bug fixed, and we have to remember to reduce the memory overhead of the Marionette actor before enabling it by default. As Kmag mentioned we would have to move the event handlers to a separate actor, so not all the code gets loaded at once with the first initial page load.

Assignee: nobody → hskupin

Status: NEW → RESOLVED

Closed: 5 years ago

Depends on: 1654453

Regressed by: 1653911
No longer regressed by: 1654453, 1654454, 1654609

Resolution: --- → FIXED

Florin Strugariu [:Bebe]

Reporter

Updated

•

5 years ago

Flags: needinfo?(fstrugariu)

Ryan VanderMeulen (PTO, back July 27)

Updated

•

5 years ago

status-firefox79: --- → unaffected

status-firefox80: affected → fixed

status-firefox-esr68: --- → unaffected

status-firefox-esr78: --- → unaffected

Henrik Skupin [:whimboo][⌚️UTC+2] (away July 24th - August 16th)

Assignee

Updated

•

5 years ago

Updated

•

5 years ago

Whiteboard: [marionette-fission-mvp]

BMO Automation

Updated

•

3 years ago

Product: Testing → Remote Protocol

You need to log in before you can comment on or make changes to this bug.

Bugzilla

6.65 - 30.02% Base Content Explicit / Base Content JS / Base Content Resident Unique Memory (linux1804, macosx1014-64-shippable, windows10-64-shippable-qr) regression on push 6a3cc8878240b50d1b50d5979ad86e2078de1b4e (Thu July 23 2020)

Categories

(Remote Protocol :: Marionette, defect, P3)

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox79 unaffected, firefox80 fixed)

People

(Reporter: Bebe, Assigned: whimboo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [marionette-fission-mvp])

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Updated

Comment 11

Comment 12

Updated

Updated

Updated

Updated

Updated