Closed Bug 1655438 Opened 4 years ago Closed 4 years ago

6.65 - 30.02% Base Content Explicit / Base Content JS / Base Content Resident Unique Memory (linux1804, macosx1014-64-shippable, windows10-64-shippable-qr) regression on push 6a3cc8878240b50d1b50d5979ad86e2078de1b4e (Thu July 23 2020)

Categories

(Remote Protocol :: Marionette, defect, P3)

Other Branch
defect

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox79 unaffected, firefox80 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox79 --- unaffected
firefox80 --- fixed

People

(Reporter: Bebe, Assigned: whimboo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [marionette-fission-mvp])

Perfherder has detected a awsy performance regression from push 6a3cc8878240b50d1b50d5979ad86e2078de1b4e. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

30% Base Content JS linux1804-64-shippable opt 3,509,410.67 -> 4,563,002.67
30% Base Content JS linux1804-64-shippable-qr opt 3,515,964.67 -> 4,562,896.67
30% Base Content JS windows10-64-shippable-qr opt 3,523,836.00 -> 4,568,433.33
28% Base Content JS macosx1014-64-shippable opt 3,515,592.67 -> 4,490,755.33
7% Base Content Explicit linux1804-64-shippable opt 13,849,941.33 -> 14,873,088.00
7% Base Content Explicit windows10-64-shippable-qr opt 11,151,957.33 -> 11,939,498.67
7% Base Content Explicit linux1804-64-shippable-qr opt 14,176,768.00 -> 15,166,293.33
7% Base Content Resident Unique Memory windows10-64-shippable-qr opt 12,906,069.33 -> 13,764,437.33

Improvements:

23% Base Content JS linux1804-64-shippable opt 4,561,273.33 -> 3,507,794.67
23% Base Content JS linux1804-64-shippable-qr opt 4,561,296.00 -> 3,507,882.67
23% Base Content JS windows10-64-shippable-qr opt 4,567,276.67 -> 3,513,520.00
23% Base Content JS windows10-64-shippable-qr opt 4,566,749.33 -> 3,513,477.33

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Component: Performance → Marionette
Flags: needinfo?(hskupin)

I'm not sure what effect Henrik's changes could actually have on the BaseContentJS measure, given that they are disabled by a pref by default. Can we backfill the awsy tests perhaps to narrow down the culprit?

Flags: needinfo?(fstrugariu)

we can't backfill on this patch but I will try to bisect and identify the culprit bug

Flags: needinfo?(fstrugariu)

Bisection:

When the builds finish I will add the conclusions in case somebody else didn't found them yet.

Seems like Bug 1654453 is the culprit of the regression. I will do a local back-out and push again to try to make sure that its absence fixes the regression.

Thank you for doing these extra checks.

The comparison reveals that Bug 1654453 introduced the regression.

The only thing I could imagine here is the usage of the useActor getter, which does an extra read of a preference. But it's unlikely that it would cause such a massive regression. Alexandru, given that I just came back from PTO would you mind pushing a change which both uses with just if (false)? If that fixes the issue we have the culprit, and I hope that it is. All the other code is present some days longer and was only moved into this if condition. Thanks a lot.

Flags: needinfo?(hskupin) → needinfo?(aionescu)
Severity: -- → S3
Priority: -- → P3

Henrik, I am not sure what comparison should I make so please look into the results. For Base Content JS opt on windows10-64-shippable-qr I see no change, but I see on other Base Content tests. Not sure what conclusion should I take here.

Flags: needinfo?(hskupin)

So with your patch from the latest try build (comment 8) we no longer have to read the preference and also don't actually register the JSWindowActor pair at all. From the baseline perspective we actually removed code and run less. As such I would expect to always see improvements. But that's not the case as you can see in this comparison:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=679d4361a15eb6f945e2b0896228a6c6ca0943cc&newProject=try&newRevision=d6e66c3f8f26267d5220a159d247eed9545abcd1&framework=4

Why does the Base Content Heap Unclassified opt value grow by 38.84%? Andrew, do you have an idea why that could happen when running lesser code? Or know at least who could answer?

Flags: needinfo?(hskupin) → needinfo?(continuation)
Flags: needinfo?(continuation)

Why are regressions and improvements for the same platform shown together? See the initial comment:

Regressions:

30% Base Content JS linux1804-64-shippable opt 3,509,410.67 -> 4,563,002.67

Improvements:

23% Base Content JS linux1804-64-shippable opt 4,561,273.33 -> 3,507,794.67

It looks like there is something wrong with the reporting.

Flags: needinfo?(fstrugariu)

So I just had a look at the perfherder graph for the regression on Windows. And as it shows there was indeed a regression when I landed the initial minimal JSWindowActor implementation (bug 1653911). But with the conditional patch all numbers went back to normal.

So I assume that no perfherder alert checks were done between those two commits, and then when reporting this bug both have been lumped together. That would explain those double listings in regressions and improvements.

Based on that I would call this bug fixed, and we have to remember to reduce the memory overhead of the Marionette actor before enabling it by default. As Kmag mentioned we would have to move the event handlers to a separate actor, so not all the code gets loaded at once with the first initial page load.

Assignee: nobody → hskupin
Status: NEW → RESOLVED
Closed: 4 years ago
Depends on: 1654453
Regressed by: 1653911
No longer regressed by: 1654453, 1654454, 1654609
Resolution: --- → FIXED
Flags: needinfo?(fstrugariu)
See Also: → 1660781
Whiteboard: [marionette-fission-mvp]
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.