Add an AWSY test for parent process overhead
Categories
(Testing :: AWSY, task, P3)
Tracking
(Fission Milestone: Future)
People
(Reporter: mccr8, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fxp])
Attachments
(2 files, 1 obsolete file)
patch, 15.41 KB
text/plain, 45.23 KB
The AWSY base test measures how much memory we use for an empty content process. However, there is also per-content process overhead in other processes. We should have a test for this, for at least the parent process, so that we can avoid regressions.
My current plan is something like this:
a) Open a web page with a bunch of same-origin iframes, and record an about:memory report.
b) Open a web page with a bunch of different-origin iframes, and record an about:memory report.
c) Extract the relevant measurements from the reports, and report the difference (unlike the current tests, which report a median).
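Roughly, steps a) through c) would look something like the sketch below. This is just an outline; open_page_and_settle() and save_memory_report() are hypothetical stand-ins for the existing AWSY harness machinery, not real AWSY APIs.

def measure_iframe_overhead(open_page_and_settle, save_memory_report,
                            same_origin_url, cross_origin_url):
    """Return paths to the two saved about:memory reports."""
    open_page_and_settle(same_origin_url)           # a) same-origin iframes
    same = save_memory_report("same-origin")        # dump about:memory
    open_page_and_settle(cross_origin_url)          # b) different-origin iframes
    cross = save_memory_report("different-origin")  # dump about:memory
    return same, cross                              # c) diff these afterwards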
I'm not entirely sure what the relevant measurements should be. Here are some interesting values in a diff I made locally:
1.30 MB (100.0%) -- explicit
├──1.08 MB (83.25%) ── heap-unclassified
├──0.15 MB (11.54%) -- heap-overhead
│ ├──0.12 MB (09.28%) ── bin-unused
│ ├──0.02 MB (01.36%) ── bookkeeping
│ └──0.01 MB (00.90%) ── page-cache
0.98 MB ── heap-allocated
1.00 MB ── heap-mapped
1.58 MB ── resident
0.88 MB ── resident-peak
0.23 MB ── resident-unique
0.59 MB ── shmem-mapped
16.52 MB ── vsize
The current base process measurement tracks resident-unique, heap-unclassified, js-main-runtime and explicit, which seems like a reasonable starting point for thinking about what to track.
We might want to use resident instead of resident-unique, because the general way we've been doing things is to attribute the non-unique part of resident to the parent. shmem-mapped might be interesting, but I guess it gets included in one of the other measurements.
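For reference, here's a rough sketch of pulling a few of these parent-process scalars out of a saved memory report (the gzipped JSON that about:memory writes). The report format here is an assumption, and the real AWSY code already has helpers for this; note also that explicit and heap-unclassified are derived by about:memory rather than appearing as single entries in the JSON.

import gzip
import json

SCALARS = ("resident", "resident-unique", "heap-allocated", "vsize")

def parent_scalars(report_path):
    with gzip.open(report_path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    values = {}
    for report in data["reports"]:
        # Only look at parent process entries.
        if not report["process"].startswith("Main Process"):
            continue
        if report["path"] in SCALARS:
            values[report["path"]] = report["amount"]  # bytes
    return values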
Note that js-main-runtime isn't included in the diff above. It is actually lower with Fission enabled in my single test sample (realms is larger, runtime and zones are lower):
-0.04 MB (100.0%) -- js-main-runtime
├──-0.04 MB (113.01%) ── runtime
├───0.02 MB (-49.83%) ++ realms
└──-0.01 MB (36.82%) ++ zones
Conceptually, there could be JS overhead per content process, but if the difference is this low, maybe we don't want to track it because it would be very noisy.
Reporter
Comment 1 • 4 years ago
I talked to kmag a bit about this, and it sounds like resident, heap-unclassified, js-main-runtime, and explicit are a good set of measurements to try. The JS measurement isn't showing much now, but if we add a lot of JS process actors it could start being a factor. Using resident instead of resident-unique will let us get a measurement for shared memory. shmem-mapped is specific to IPC shared memory, so it might not capture everything.
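As a rough illustration using the sample diff from comment 0 (treating the deltas as directly comparable, which is an assumption), the shared portion would just be resident minus resident-unique:

resident = 1.58         # MB, from the diff in comment 0
resident_unique = 0.23  # MB
shared = resident - resident_unique
print(f"~{shared:.2f} MB of the resident diff is shared with other processes")
# prints ~1.35 MB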
Reporter
Comment 4 • 4 years ago
There's some remaining work here, but this at least works as a proof of concept. It does roughly the following: start the browser with Fission enabled, wait for a while, open a web page with 9 same-origin iframes, wait, take a measurement, close that page, open a web page with 9 different-origin iframes, wait, take another measurement. Because both measurements are taken with Fission enabled, any overhead that applies even to a single origin under Fission (due to SHIP or whatever) won't be measured.
- It needs to be split into its own test. Right now, I just hacked up the existing test.
- Whatever scalars are interesting need to be extracted from the reports, their difference taken, and then divided by the expected number of processes (see the sketch after this list). There's some existing code for extracting scalars, so the actual programming of this shouldn't be hard.
- There's some stuff in the diff that doesn't look related to process overhead, maybe mostly this loading.svg thing? The diff looked better than I remembered.
- The stability of this should be assessed.
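Here's a sketch of the second item above. per_process_overhead() would consume dicts like the ones the parent_scalars() sketch earlier produces, and extra_processes is whatever number of additional content processes the different-origin page is expected to spawn; all names here are hypothetical.

def per_process_overhead(same_origin, different_origin, extra_processes):
    """Take per-scalar differences and normalize to bytes per extra process."""
    return {name: (different_origin[name] - same_origin[name]) / extra_processes
            for name in same_origin if name in different_origin}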
Here's an extract of some non-trivial bits of the diff of parent process memory. There are 10 "web" content processes involved, so this is about 146 KB per process, which is around 1% of the per-child-process overhead.
1.46 MB (100.0%) -- explicit
├──1.55 MB (106.71%) ── heap-unclassified
├──-0.33 MB (-22.59%) ++ heap-overhead
├──0.13 MB (08.67%) -- images
│ ├──0.13 MB (08.64%) -- chrome/vector/used/progress=18f
│ │ ├──0.12 MB (08.06%) ++ image(480x16, chrome://browser/skin/tabbrowser/loading.svg)/locked/types=1/surface(960x32, svgContext:[ viewport=(480x16) contextPaint=( fill=ff0d0c0c fillOpa=1 strokeOpa=1 ) ])
├──0.06 MB (04.34%) -- js-non-window
│ ├──0.06 MB (04.32%) -- zones/zone(0xNNN)
│ │ ├──0.06 MB (03.88%) ++ realm([System Principal], shared JSM global)/classes
I also have a patch in my stack for bug 1683911 that stops the browser from discarding the OS file worker after 30 seconds.
Given that the bulk of the heap overhead is heap-unclassified, it would probably be good to look at what that is, which will require using DMD. I'm not sure how clean DMD diffs will be. I'd expect it's mostly IPC overhead, but who knows. The same-origin case has 33 IPC channels, compared to 132 in the different-origin case.
Reporter
Comment 6 • 4 years ago
Here's a diff of the heap-unclassified. I limited it to 3 stack frames to try to collapse things together as much as possible.
The total was about 1.5 MB of heap-unclassified. 256 KB of stuff wasn't sampled, though I could redo this with full stacks to get better numbers. About 500 KB of that is IPC channels, and another few hundred KB is IPC actors. There's also a bit of APZ/layers kind of stuff, which makes sense.
Reporter
Comment 7 • 4 years ago
Bug 1639922 is an existing bug we have on file for adding memory reporting for IPC.
Comment 8 • 4 years ago
This isn't needed right now, but we should get the test ready at some point, so moving to MVP.
Reporter
Comment 10 • 4 years ago
Not actively. It seems like a good idea to have the test around in case people want to run it given that I already wrote most of the test, but the parent process overhead is so low I don't think we'll want to run it regularly. I got bogged down in trying to untangle the different variants of AWSY that sort of but not entirely share code.
Comment 11 • 4 years ago
(In reply to Andrew McCreight [:mccr8] from comment #10)
> Not actively. It seems like a good idea to have the test around in case people want to run it given that I already wrote most of the test, but the parent process overhead is so low I don't think we'll want to run it regularly. I got bogged down in trying to untangle the different variants of AWSY that sort of but not entirely share code.
In that case, we can move this bug from Fission MVP to Future.
Reporter
Comment 12 • 21 days ago
Probably not worth the effort. It was kind of neat to see the overhead, but I don't think we ever had any concerted effort to remove the parent overhead.