11.07 - 10.15% Images / Images (OSX) regression on Wed April 17 2024
Categories
(Testing :: Marionette Client and Harness, defect, P2)
Tracking
(firefox-esr115 unaffected, firefox125 unaffected, firefox126 unaffected, firefox127 fixed, firefox128 fixed)
| Tracking | Status | |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox125 | --- | unaffected |
| firefox126 | --- | unaffected |
| firefox127 | --- | fixed |
| firefox128 | --- | fixed |
People
(Reporter: fbilt, Assigned: whimboo)
References
(Regression)
Details
(Keywords: perf, perf-alert, regression)
Attachments
(1 file)
|
78.83 KB,
text/plain
|
Details |
Perfherder has detected a awsy performance regression from push e96fce323a1bee6228bfcb97988b7fbf37681f10. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 11% | Images | macosx1015-64-shippable-qr | fission tp6 | 6,004,909.82 -> 6,669,934.43 |
| 10% | Images | macosx1015-64-shippable-qr | fission tp6 | 6,032,871.24 -> 6,645,367.10 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 42353
For more information on performance sheriffing please see our FAQ.
Comment 1•2 years ago
|
||
Set release status flags based on info from the regressing bug 1884401
| Assignee | ||
Comment 2•2 years ago
|
||
I can already see a higher Images value for a landing before my code reached autoland/central.
My change only modifies how Marionette starts the application but doesn't make any changes to Firefox itself. So I doubt that my change could have actually made this change.
Daisuke, given that you worked on bug 1891602, do you think that your patch might have caused this regression?
| Assignee | ||
Comment 3•2 years ago
|
||
Nevertheless I've triggered a perf job with a backout of my patch:
Base: https://treeherder.mozilla.org/jobs?repo=try&revision=5c313d96a02c6e793f5612065f0b43fe8cd8d315
Backout: https://treeherder.mozilla.org/jobs?repo=try&revision=b737d6f7835a98e32db6ffb2133e48f9c7f081f4
| Assignee | ||
Comment 4•2 years ago
|
||
Actually as it looks like it is indeed a regression as caused by my patch.
I'll take a look how image memory is recorded and how starting Firefox via psutil will cause this change in memory usage.
| Assignee | ||
Comment 5•2 years ago
|
||
The Images perf suite seems to sample memory from explicit/images. And the subtest that is actually affected here is Images closed extra processes opt fission tp6 and especially the Fresh start checkpoint. It increased by 173%.
Running ./mach awsy-test --tp6 --quick locally I can as well see an increase for Fresh Start from around 1MB to 3MB.
Andrew, do you have an idea what this specifically could mean?
Comment 6•2 years ago
|
||
I'd recommend downloading the actual memory reports from the checkpoint in question, before and after your patch, and then using the "load and diff" option in about:memory to see where the actual difference lies.
That Fresh Start link doesn't make much sense to me. "Fresh start" and "closed extra processes" are separate checkpoints, I thought, so I have no idea what it means for something to be both.
If we ignore the "closed extra processes" here and just treat this as a "Fresh start" regression, one thing that I can think of is that we drop the decoded images a certain amount of time after we're idle. Maybe your change causes the browser to start up a little later and we end up not dropping images quite as fast. If "fresh start" is the only checkpoint that regressed (and not, say, "fresh start [+30s]", and your patch just messes with test harness stuff, then this is probably okay. But it would be good to understand what regressed a bit better.
| Assignee | ||
Comment 7•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #6)
I'd recommend downloading the actual memory reports from the checkpoint in question, before and after your patch, and then using the "load and diff" option in about:memory to see where the actual difference lies.
Ah that's a good idea. Let me do that.
That Fresh Start link doesn't make much sense to me. "Fresh start" and "closed extra processes" are separate checkpoints, I thought, so I have no idea what it means for something to be both.
Yes, there is a bug in the perfherder UI. This is really about Fresh Start and not closed extra processes. Greg filed a bug to get this fixed.
If we ignore the "closed extra processes" here and just treat this as a "Fresh start" regression, one thing that I can think of is that we drop the decoded images a certain amount of time after we're idle. Maybe your change causes the browser to start up a little later and we end up not dropping images quite as fast. If "fresh start" is the only checkpoint that regressed (and not, say, "fresh start [+30s]", and your patch just messes with test harness stuff, then this is probably okay. But it would be good to understand what regressed a bit better.
Yes, Fresh Start [+30s] stays the same. It's only the memory used right after startup. I'll attach a diff for the both runs.
| Assignee | ||
Comment 8•2 years ago
|
||
Could you maybe have a look at this diff? Maybe we can really ignore it?
Comment 9•2 years ago
|
||
It looks like the change is from about:newtab being alive in the privilegedabout process. Is that present in the before memory report? I'm guessing not. It could even be the case that your changes caused things to load faster and about:newtab wasn't loaded before. I don't know what the expected thing is. Honestly, it wouldn't be a bad idea to ignore the FreshStart data point, but I haven't noticed it being a problem before. The general way we average a few data points evenly in AWSY is a bit odd.
It seems a little odd, but yeah I think that is fine to ignore it. It sounds like this is a test harness change that is causing a massive increase in one data point.
| Assignee | ||
Comment 10•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #9)
It looks like the change is from about:newtab being alive in the privilegedabout process. Is that present in the before memory report?
Yes, it is. At least it's shown by these lines:
privilegedabout (pid 1188)
Explicit Allocations
[..]
├───4.87 MB (19.79%) -- window-objects
│ ├──4.81 MB (19.53%) -- top(about:newtab, id=27)
│ │ ├──2.53 MB (10.29%) -- active
│ │ │ ├──2.47 MB (10.03%) -- window(about:newtab)
│ │ │ │ ├──1.54 MB (06.26%) -- js-realm(about:newtab)
But with Marionette we disable about:newtab and by default load about:blank instead.
I've done some more checks and looked at parts that are different between subprocess.Popen and psutil.Popen, and after commenting out certain parts of the Marionette's process handler I noticed that the performance regression is visible when running the following code:
When processOutput is not called when creating the process it works fine. As such it looks like that we have an issue with reading stdout/stderr and maybe we do not read everything and leave logs in the buffer which is accumulated? Where would this be reflected in the memory logs?
I'm off tomorrow but it looks like I should further check to find the exact reason and fix it.
Comment 11•2 years ago
|
||
The memory logs rely on reading the memory reports that get dumped to disk, to specific file names. I can't imagine how failing to write something to stdout would cause those to change.
| Assignee | ||
Comment 12•2 years ago
|
||
Here some more findings but it's still unclear to me what's causing it:
- The whole ProcessReader code as used by Marionette is a 1:1 copy from mozprocess without any changes
- Running the process reader and not using
stdout=subprocess.PIPEwhen starting the Firefox process the memory usage drops - Not running the process reader and using
stdout=subprocess.PIPEwhen starting the Firefox process the memory usage drops as well
I further tried to remove all the code that is related to handling stdout but without any change.
I've then also checked again the Images perf results for MacOS for the last 90 days and noticed that there was quite a fluctuation including a lot of cases when the Fresh start was already at this level. With my patch landed the results now look way more stable and values do not have a range of 1.7MB but just 0.4MB now.
That that I may agree to Andrew and that we should accept the new value as base for the future.
Updated•2 years ago
|
Comment 13•2 years ago
|
||
I'm marking this wontfix because it sounds like we're accepting the regression, rather than it went away for an unknown reason.
| Assignee | ||
Comment 14•2 years ago
|
||
Actually today I found another regression in Marionette related to psutil and reading the process output (see bug 1896106). As it turned out for the Marionette's new session command we are using poll() which forwards to wait() including waiting for the reader to finish. This causes a 1s delay before any Marionette consumer including AWSY can continue. But that also means that in this 1s Firefox has acquired more memory, which basically ends up in this regression.
As the try build including the awsy tp6 jobs show the images measurement returns to the previous value, and fresh start drops by 63%!
| Assignee | ||
Updated•2 years ago
|
Comment 15•2 years ago
|
||
Set release status flags based on info from the regressing bug 1884401
| Assignee | ||
Comment 16•2 years ago
|
||
Florin, could you please check if my patch from bug 1896106 actually fixed this regression? Or do we have to wait some more days to get better numbers?
| Reporter | ||
Comment 17•2 years ago
|
||
The graph seems ti have returned to its initial values. Good job!
Perfherder has detected a awsy performance change from push 78ea7783f16f564aa8886dd8e51b42c2971c508b.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 10% | Images | macosx1015-64-shippable-qr | fission tp6 | 6,642,001.27 -> 5,979,633.87 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 221
For more information on performance sheriffing please see our FAQ.
| Assignee | ||
Comment 18•2 years ago
|
||
That's great to hear. In that case lets close this bug as fixed!
Updated•2 years ago
|
Description
•