Closed Bug 1892940 Opened 2 years ago Closed 2 years ago

11.07 - 10.15% Images / Images (OSX) regression on Wed April 17 2024

Categories

(Testing :: Marionette Client and Harness, defect, P2)

defect

Tracking

(firefox-esr115 unaffected, firefox125 unaffected, firefox126 unaffected, firefox127 fixed, firefox128 fixed)

RESOLVED FIXED
128 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox125 --- unaffected
firefox126 --- unaffected
firefox127 --- fixed
firefox128 --- fixed

People

(Reporter: fbilt, Assigned: whimboo)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a awsy performance regression from push e96fce323a1bee6228bfcb97988b7fbf37681f10. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
11% Images macosx1015-64-shippable-qr fission tp6 6,004,909.82 -> 6,669,934.43
10% Images macosx1015-64-shippable-qr fission tp6 6,032,871.24 -> 6,645,367.10

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 42353

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hskupin)

Set release status flags based on info from the regressing bug 1884401

I can already see a higher Images value for a landing before my code reached autoland/central.

My change only modifies how Marionette starts the application but doesn't make any changes to Firefox itself. So I doubt that my change could have actually made this change.

Daisuke, given that you worked on bug 1891602, do you think that your patch might have caused this regression?

Flags: needinfo?(hskupin) → needinfo?(daisuke)

Actually as it looks like it is indeed a regression as caused by my patch.

I'll take a look how image memory is recorded and how starting Firefox via psutil will cause this change in memory usage.

Flags: needinfo?(daisuke)

The Images perf suite seems to sample memory from explicit/images. And the subtest that is actually affected here is Images closed extra processes opt fission tp6 and especially the Fresh start checkpoint. It increased by 173%.

Running ./mach awsy-test --tp6 --quick locally I can as well see an increase for Fresh Start from around 1MB to 3MB.

Andrew, do you have an idea what this specifically could mean?

Flags: needinfo?(continuation)

I'd recommend downloading the actual memory reports from the checkpoint in question, before and after your patch, and then using the "load and diff" option in about:memory to see where the actual difference lies.

That Fresh Start link doesn't make much sense to me. "Fresh start" and "closed extra processes" are separate checkpoints, I thought, so I have no idea what it means for something to be both.

If we ignore the "closed extra processes" here and just treat this as a "Fresh start" regression, one thing that I can think of is that we drop the decoded images a certain amount of time after we're idle. Maybe your change causes the browser to start up a little later and we end up not dropping images quite as fast. If "fresh start" is the only checkpoint that regressed (and not, say, "fresh start [+30s]", and your patch just messes with test harness stuff, then this is probably okay. But it would be good to understand what regressed a bit better.

Flags: needinfo?(continuation)

(In reply to Andrew McCreight [:mccr8] from comment #6)

I'd recommend downloading the actual memory reports from the checkpoint in question, before and after your patch, and then using the "load and diff" option in about:memory to see where the actual difference lies.

Ah that's a good idea. Let me do that.

That Fresh Start link doesn't make much sense to me. "Fresh start" and "closed extra processes" are separate checkpoints, I thought, so I have no idea what it means for something to be both.

Yes, there is a bug in the perfherder UI. This is really about Fresh Start and not closed extra processes. Greg filed a bug to get this fixed.

If we ignore the "closed extra processes" here and just treat this as a "Fresh start" regression, one thing that I can think of is that we drop the decoded images a certain amount of time after we're idle. Maybe your change causes the browser to start up a little later and we end up not dropping images quite as fast. If "fresh start" is the only checkpoint that regressed (and not, say, "fresh start [+30s]", and your patch just messes with test harness stuff, then this is probably okay. But it would be good to understand what regressed a bit better.

Yes, Fresh Start [+30s] stays the same. It's only the memory used right after startup. I'll attach a diff for the both runs.

Could you maybe have a look at this diff? Maybe we can really ignore it?

Flags: needinfo?(continuation)

It looks like the change is from about:newtab being alive in the privilegedabout process. Is that present in the before memory report? I'm guessing not. It could even be the case that your changes caused things to load faster and about:newtab wasn't loaded before. I don't know what the expected thing is. Honestly, it wouldn't be a bad idea to ignore the FreshStart data point, but I haven't noticed it being a problem before. The general way we average a few data points evenly in AWSY is a bit odd.

It seems a little odd, but yeah I think that is fine to ignore it. It sounds like this is a test harness change that is causing a massive increase in one data point.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(continuation) → needinfo?(hskupin)
Resolution: --- → WONTFIX

(In reply to Andrew McCreight [:mccr8] from comment #9)

It looks like the change is from about:newtab being alive in the privilegedabout process. Is that present in the before memory report?

Yes, it is. At least it's shown by these lines:

privilegedabout (pid 1188)
Explicit Allocations 
[..]
├───4.87 MB (19.79%) -- window-objects
│   ├──4.81 MB (19.53%) -- top(about:newtab, id=27)
│   │  ├──2.53 MB (10.29%) -- active
│   │  │  ├──2.47 MB (10.03%) -- window(about:newtab)
│   │  │  │  ├──1.54 MB (06.26%) -- js-realm(about:newtab)

But with Marionette we disable about:newtab and by default load about:blank instead.

I've done some more checks and looked at parts that are different between subprocess.Popen and psutil.Popen, and after commenting out certain parts of the Marionette's process handler I noticed that the performance regression is visible when running the following code:

https://searchfox.org/mozilla-central/rev/b41bb321fe4bd7d03926083698ac498ebec0accf/testing/marionette/client/marionette_driver/processhandler.py#231-250

When processOutput is not called when creating the process it works fine. As such it looks like that we have an issue with reading stdout/stderr and maybe we do not read everything and leave logs in the buffer which is accumulated? Where would this be reflected in the memory logs?

I'm off tomorrow but it looks like I should further check to find the exact reason and fix it.

Status: RESOLVED → REOPENED
Flags: needinfo?(hskupin) → needinfo?(continuation)
Resolution: WONTFIX → ---

The memory logs rely on reading the memory reports that get dumped to disk, to specific file names. I can't imagine how failing to write something to stdout would cause those to change.

Flags: needinfo?(continuation)

Here some more findings but it's still unclear to me what's causing it:

  • The whole ProcessReader code as used by Marionette is a 1:1 copy from mozprocess without any changes
  • Running the process reader and not using stdout=subprocess.PIPE when starting the Firefox process the memory usage drops
  • Not running the process reader and using stdout=subprocess.PIPE when starting the Firefox process the memory usage drops as well

I further tried to remove all the code that is related to handling stdout but without any change.

I've then also checked again the Images perf results for MacOS for the last 90 days and noticed that there was quite a fluctuation including a lot of cases when the Fresh start was already at this level. With my patch landed the results now look way more stable and values do not have a range of 1.7MB but just 0.4MB now.

That that I may agree to Andrew and that we should accept the new value as base for the future.

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → WORKSFORME
Resolution: WORKSFORME → WONTFIX

I'm marking this wontfix because it sounds like we're accepting the regression, rather than it went away for an unknown reason.

Actually today I found another regression in Marionette related to psutil and reading the process output (see bug 1896106). As it turned out for the Marionette's new session command we are using poll() which forwards to wait() including waiting for the reader to finish. This causes a 1s delay before any Marionette consumer including AWSY can continue. But that also means that in this 1s Firefox has acquired more memory, which basically ends up in this regression.

As the try build including the awsy tp6 jobs show the images measurement returns to the previous value, and fresh start drops by 63%!

Assignee: nobody → hskupin
Status: RESOLVED → REOPENED
Depends on: 1896106
Resolution: WONTFIX → ---
Severity: -- → S3
Priority: -- → P2

Set release status flags based on info from the regressing bug 1884401

Florin, could you please check if my patch from bug 1896106 actually fixed this regression? Or do we have to wait some more days to get better numbers?

Flags: needinfo?(fbilt)

The graph seems ti have returned to its initial values. Good job!

Perfherder has detected a awsy performance change from push 78ea7783f16f564aa8886dd8e51b42c2971c508b.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
10% Images macosx1015-64-shippable-qr fission tp6 6,642,001.27 -> 5,979,633.87

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 221

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(fbilt)

That's great to hear. In that case lets close this bug as fixed!

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: