Bug 1336398 (Closed) - Opened 7 years ago, Closed 7 years ago

[e10s-multi] Enable 4 content processes on nightly

Categories

(Core :: DOM: Content Processes, defect; priority not set; severity normal)

Tracking

(Status: RESOLVED FIXED; Target Milestone: mozilla55; Tracking Status: firefox55 --- fixed)

People

(Reporter: gkrizsanits, Assigned: mrbkap)

References

(Depends on 1 open bug)

Details

(Whiteboard: [e10s-multi:+])

Attachments

(3 files)

Currently we have 2, and there is an experiment with 4 on Ash. The goal here is to make sure that the tests stay stable with 4 so we can enable it on nightly.
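For reference, the number of web content processes is controlled by the dom.ipc.processCount pref, so the change here will look roughly like the sketch below (the exact file and any Nightly-only guard are still to be decided; this is not the final patch):

> // browser/app/profile/firefox.js (illustrative location only)
> // Bump the default number of web content processes from 2 to 4.
> pref("dom.ipc.processCount", 4);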
Whiteboard: [e10s-multi:+]
Depends on: 1340512
Whiteboard: [e10s-multi:+] → [e10s-multi:+][MemShrink]
I'm going to take a few initial memory measurements on Windows, Linux, and Mac to see how much overhead there is going from 2 to 4.
Flags: needinfo?(erahm)
Whiteboard: [e10s-multi:+][MemShrink] → [e10s-multi:+]
I wrote up a post [1] covering measurements and analysis against other browsers; e10s-multi looks pretty good. We beat Chrome on all platforms with 4 content processes. Here's the breakdown for 2, 4, and 8 content processes:

> OS             Browser Config   Total Memory
> ============================================
> Ubuntu 16.04   Firefox 55 – 2 CP     765 MB
> Ubuntu 16.04   Firefox 55 – 4 CP     817 MB
> Ubuntu 16.04   Firefox 55 – 8 CP     990 MB
> macOS 10.12.3  Firefox 55 – 2 CP   1,113 MB
> macOS 10.12.3  Firefox 55 – 4 CP   1,215 MB
> macOS 10.12.3  Firefox 55 – 8 CP   1,399 MB
> Windows 10     Firefox 55 – 2 CP     587 MB
> Windows 10     Firefox 55 – 4 CP     839 MB
> Windows 10     Firefox 55 – 8 CP     905 MB

[1] http://www.erahm.org/2017/03/10/are-they-slim-yet-round-2/
Flags: needinfo?(erahm)
I guess it's good that we're beating Chrome, but these numbers are terrible! In the past, I've measured content process overhead on Windows to be about 25MB. These numbers show us growing by 252MB when going from 2 to 4 processes, which is 5x as much. Can you get about:memory reports comparing the two configurations?
Flags: needinfo?(erahm)
(In reply to Bill McCloskey (:billm) from comment #3)
> I guess it's good that we're beating Chrome, but these numbers are terrible!
> In the past, I've measured content process overhead on Windows to be about
> 25MB. These numbers show us growing by 252MB when going from 2 to 4
> processes, which is 5x as much. Can you get about:memory reports comparing
> the two configurations?

I can do a run with memory reports on Windows. Leaving ni? for now.
Flags: needinfo?(erahm)
So from the diff:

> 43.32 MB (100.0%) -- explicit
> 78.82 MB ── resident-unique [4]

|explicit| went up ~22MB per process, close to your numbers.
|resident-unique| went up ~40MB per process (which is how I measure). I'm not sure what your methodology for testing or summing memory was, so it's hard to tell whether this is an apples-to-apples comparison. I loaded the first 30 pages from tp5 and then measured.
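For reference, here's a minimal sketch of the kind of external summation I mean (parent RSS plus per-child USS, measured from outside the browser with psutil; the function name and process lookup are illustrative, not my actual harness):

> import psutil
> 
> def total_browser_memory_mb(parent_pid):
>     # Resident set size for the parent process...
>     parent = psutil.Process(parent_pid)
>     total = parent.memory_info().rss
>     # ...plus unique set size (USS) for each child/content process.
>     for child in parent.children(recursive=True):
>         total += child.memory_full_info().uss
>     return total / (1024.0 * 1024.0)  # bytes -> MB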
Is there a lot of variation between runs? When I sum resident(parent) + resident-unique(the rest) for the memory reports you posted, I get:
2 processes: 743MB
4 processes: 822MB

The 822 number is pretty similar to comment 2, but the 743MB number is much higher.
(In reply to Bill McCloskey (:billm) from comment #8)
> Is there a lot of variation between runs? When I sum resident(parent) +
> resident-unique(the rest) for the memory reports you posted, I get:
> 2 processes: 743MB
> 4 processes: 822MB
> 
> The 822 number is pretty similar to comment 2, but the 743MB number is much
> higher.

Hmmm, locally I got 588 for 2 processes; it's possible I'm missing something (I sum with psutil outside of the processes). I'll double check.
(In reply to Eric Rahm [:erahm] from comment #9)
> (In reply to Bill McCloskey (:billm) from comment #8)
> > Is there a lot of variation between runs? When I sum resident(parent) +
> > resident-unique(the rest) for the memory reports you posted, I get:
> > 2 processes: 743MB
> > 4 processes: 822MB
> > 
> > The 822 number is pretty similar to comment 2, but the 743MB number is much
> > higher.
> 
> Hmmm, locally I got 588 for 2 processes; it's possible I'm missing something
> (I sum with psutil outside of the processes). I'll double check.

So strangely psutil is getting a rather low USS for one of the content processes (like 50MB vs. 160MB in the memory report). The timing's a bit different, but it shouldn't be *that* drastic.
Okay, it looks like mozbuild on Windows installs a 32-bit Python interpreter, which is what I was using. It appears this causes psutil to give flaky measurements for 64-bit processes. Setting up a virtualenv with a 64-bit interpreter seems to have fixed the issue. I'll re-run the tests once I confirm the values from about:memory match the values from psutil across a few test runs.
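For anyone reproducing this, a quick sanity check is to confirm the interpreter is actually 64-bit before trusting its psutil USS readings for 64-bit processes (a sketch, not the exact check in my harness):

> import struct
> import sys
> 
> # A 32-bit interpreter has 4-byte pointers; the USS it reported for
> # 64-bit Firefox processes was unreliable here.
> if struct.calcsize("P") * 8 != 64:
>     sys.exit("Use a 64-bit Python (e.g. a 64-bit virtualenv) for these measurements")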
New measurements (only the Windows numbers changed):

> OS             Browser Config   Total Memory
> ============================================
> Ubuntu 16.04   Firefox 55 – 2 CP     765 MB
> Ubuntu 16.04   Firefox 55 – 4 CP     817 MB
> Ubuntu 16.04   Firefox 55 – 8 CP     990 MB
> macOS 10.12.3  Firefox 55 – 2 CP   1,113 MB
> macOS 10.12.3  Firefox 55 – 4 CP   1,215 MB
> macOS 10.12.3  Firefox 55 – 8 CP   1,399 MB
> Windows 10     Firefox 55 – 2 CP     993 MB
> Windows 10     Firefox 55 – 4 CP   1,098 MB
> Windows 10     Firefox 55 – 8 CP   1,312 MB

So it looks like ~50MB per content process (e.g. (1,098 - 993) / 2 ≈ 52MB going from 2 to 4 CP).
Comment on attachment 8848395 [details]
Bug 1336398 - Enable 4 content processes on Nightly.

https://reviewboard.mozilla.org/r/121296/#review123390

(In reply to Blake Kaplan (:mrbkap) from comment #13)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=ba0ee263ee3b81a9b07a30f88a6618b4599ba43f

I still see some assertion failures like "0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292" on the try push... Was this push on top of your pthread patch? The rest looks really promising; let's try this :)
Attachment #8848395 - Flags: review?(gkrizsanits) → review+
I can't believe I forgot to push on top of that patch. I just confirmed that the tree I pushed from did not have that fix :-(
Pushed by mrbkap@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e664d9cc14d
Enable 4 content processes on Nightly. r=krizsa
https://hg.mozilla.org/mozilla-central/rev/5e664d9cc14d
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1348547
Hi! Thanks for your help, but I'm a bit confused... You're adding dependency bugs that are not regressions to an already closed bug, and I'm not sure what your intention is. I do appreciate your efforts, but I think it would be better if you just needinfo-ed me or :mrbkap if you're worried that we overlooked a bug. In most cases we have already triaged them. Alternatively, you can put [e10s-multi:?] in the whiteboard if there isn't an e10s-multi marker there already, and then it will show up on our triage list.
Flags: needinfo?(Virtual)
I was thinking it would be better to track these issues as blockers of this bug, since with the default settings they are now real regressions due to the increased number of content processes from this bug and bug #1303113.
But it looks like there's an alternative method for this, as you said ("[e10s-multi:?]"), so should I use that one instead of blocking this bug?
Maybe we should also use "e10s-multi" as a tracking flag, the same as was done with e10s?
Flags: needinfo?(Virtual) → needinfo?(gkrizsanits)
Depends on: 1353945
Depends on: 1350324
(In reply to Virtual_ManPL [:Virtual] - (please needinfo? me - so I will see your comment/reply/question/etc.) from comment #20)
> I was thinking it would be better to track these issues as blockers of this
> bug, since with the default settings they are now real regressions due to the
> increased number of content processes from this bug and bug #1303113.

If a regression was introduced by the patch in this bug, then it should block this one. For general e10s-multi-related bugs/features, that should not be the case.

> But it looks like there's an alternative method for this, as you said
> ("[e10s-multi:?]"), so should I use that one instead of blocking this bug?

Yes, that's the best approach. But we've already triaged all the older bugs, probably multiple times, to decide whether they should make it into the first released version, so there is no need to flag those for now. Any new incoming bug should be marked, though. The earlier they get on our radar, the better.

> Maybe we should also use "e10s-multi" as a tracking flag, the same as was
> done with e10s?

There was some pushback on that. New tracking flags are hard to get approved for some reason, so we just went with the [e10s-multi:?] whiteboard marker approach. Once we triage a bug it gets either a + or a - marker depending on whether it is e10s-multi related or not, and if we decide it should block the release, we track it. That last part is currently not reflected in Bugzilla, and we haven't always done the best job of reflecting the current state of the project in Bugzilla because of some recent project revamps, but once things settle a bit I will try to do that.
Flags: needinfo?(gkrizsanits)