Closed Bug 666058 Opened 13 years ago Closed 13 years ago

Don't share chunks for system compartments

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla7

People

(Reporter: gwagner, Assigned: gwagner)

References

Details

(Keywords: memory-footprint, Whiteboard: [MemShrink:P1][Fragmentation])

Attachments

(3 files, 2 obsolete files)

Some of our memory bloat could just be fragmentation caused by long-lived system objects that keep a whole chunk (1 MB) alive. The problem is that we don't use this information during allocation.

A simple solution would be to use dedicated chunks for system objects or objects created by other origins.
This should also reduce the workload for the coming "mark-and-compact" collector.
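
To make the idea concrete, here is a minimal sketch of the routing, with all names invented (the patch in this bug ended up using a two-element pool array instead; see the nit in comment 30):

    // Hypothetical sketch: keep two chunk lists and route each
    // allocation by the origin of the request. Long-lived system
    // objects then land in their own chunks and can no longer pin
    // chunks that are otherwise full of dead user arenas.
    #include <stddef.h>

    struct Chunk {
        Chunk* next;
        unsigned freeArenas;
    };

    static Chunk* userChunks = NULL;    // content compartments
    static Chunk* systemChunks = NULL;  // chrome/system compartments

    Chunk*& ChunkListFor(bool isSystemAllocation) {
        return isSystemAllocation ? systemChunks : userChunks;
    }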

The patch shouldn't be too complicated but I don't have time to look into this right now. Maybe somebody wants to give it a try.
(In reply to comment #0)
> Some of our memory bloat could just be fragmentation caused by long-lived
> system objects that keep a whole chunk (1 MB) alive.

Measurements confirming this hypothesis would be nice before anyone starts implementing this :)
(In reply to comment #1)
> (In reply to comment #0)
> > Some of our memory bloat could just be fragmentation caused by long-lived
> > system objects that keep a whole chunk (1 MB) alive.
> 
> Measurements confirming this hypothesis would be nice before anyone starts
> implementing this :)

I agree. I forgot to mention that atoms allocation would also be a good target for separate chunks.
I did a quick measurement for this. After surfing for about 30 min I see the following:
chunks: 150, used: 150, empty: 0, 1: 53, 2: 16

So I have 150 chunks allocated (all of them in use); 53 have only one used arena and 16 have two. I will try a quick hack for my idea and compare the results.
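For scale: assuming 1 MB chunks and the engine's 4 KB arenas, those 53 single-arena chunks pin about 53 MB of heap for roughly 212 KB of live arena data.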
Good idea. That's far more fragmentation than I would have expected.
I implemented a first version that puts the following GC things into separate chunks:
cx->compartment == rt->atomsCompartment || cx->compartment->principals == NULL

The first results look very promising.

Almost all remaining single arenas that keep a chunk alive hold Shapes.
Cool.  I landed bug 661474 yesterday, which adds a global gc-heap-chunk-unused reporter and per-compartment gc-heap/arena-used reporters.  I haven't looked at them much, but the numbers seem high in some cases.

I'm also thinking about doing some visualization of the heap, to see how bad fragmentation is.
(In reply to comment #3)
> I did a quick measurement for this. After surfing for about 30 min I see
> the following:
> chunks: 150, used: 150, empty: 0, 1: 53, 2: 16
> 
> So I have 150 chunks allocated (all of them in use); 53 have only one
> used arena and 16 have two. I will try a quick hack for my idea and
> compare the results.

Are chunks freed as soon as they become empty?  Even if they are, jemalloc will likely not release them back to the OS.
Blocks: mlk-fx7
Keywords: footprint
Whiteboard: [MemShrink]
(In reply to comment #7)
> Are chunks freed as soon as they become empty?  Even if they are, jemalloc
> will likely not release them back to the OS.

No, we don't free chunks right away (except with the new SHRINK GC call). They survive 3 GCs because of allocation-heavy workloads: freeing and immediately reallocating hurts benchmark performance (bug 541140).
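
Roughly, the expiration policy looks like this (a sketch with invented names, not the real jsgc.cpp code):

    // Empty chunks age by one at each GC and are only unmapped once
    // they have stayed empty for a few collections, so allocation-heavy
    // workloads can reuse them cheaply.
    static const unsigned MAX_EMPTY_CHUNK_AGE = 3;

    struct Chunk {
        Chunk* next;
        unsigned age;  // number of GCs this chunk has been empty
    };

    void ReleaseChunkToOS(Chunk* chunk);  // hypothetical: unmaps the 1 MB region

    // Called at the end of each GC; returns the new list head.
    Chunk* ExpireEmptyChunks(Chunk* emptyList) {
        Chunk** link = &emptyList;
        while (Chunk* chunk = *link) {
            if (++chunk->age > MAX_EMPTY_CHUNK_AGE) {
                *link = chunk->next;
                ReleaseChunkToOS(chunk);
            } else {
                link = &chunk->next;
            }
        }
        return emptyList;
    }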
Oh the "system principal" is not null. I made a hack to get the reference to it.
So whenever we pick a new chunk I do: 

    bool isSystem = cx->compartment == rt->atomsCompartment ||
                    cx->compartment->isSystemCompartment ||
                    thingKind == JSTRACE_SHAPE;

I open several tabs (Gmail, FOTN, v8 bench, ro.me, TechCrunch, GDocs, ...), close them again, and wait for the GC to clean up everything.

With my patch I now get the following for the same workload as before:
usr chunks: 1, used: 1, empty: 0, 1: 0, 2: 0
sys chunks: 37, used: 37, empty: 0, 1: 0, 2: 0

I tried several times without the patch and I always end up with between 80 and 150 chunks after I close all tabs. That's a long way from 38 chunks! WOW!
Nick, I guess you know the gc-heap numbers from about:memory better. What do you usually see when you close all tabs and wait for the GC to clean up everything?
Attached file about:memory results
I opened gmail, FOTN, TechCrunch, v8bench.  (ro.me crashed for me :(

Attached is about:memory after closing them all down and running GC+CC twice.  Notable entries:

209,154,438 B (100.0%) -- explicit
├──139,320,483 B (66.61%) -- js
│  ├──113,413,312 B (54.22%) -- gc-heap-chunk-unused
...
│  ├────2,155,328 B (01.03%) -- gc-heap-chunk-admin

I then did GC+CC another 10 times or so, and got down to this:

136,855,550 B (100.0%) -- explicit
├───74,317,837 B (54.30%) -- js
│   ├──50,626,368 B (36.99%) -- gc-heap-chunk-unused
...
│   ├───1,032,384 B (00.75%) -- gc-heap-chunk-admin


So a big chunk of the unused space is in completely empty chunks that take a few GCs to be released, but there's still 50MB in unused chunk space, which is presumably in chunks that have one or more arenas in use.
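
As a back-of-the-envelope formula (this ignores the per-chunk admin overhead that the bug 661474 reporter accounts for separately as gc-heap-chunk-admin, and assumes the engine's 1 MB chunks and 4 KB arenas):

    #include <stddef.h>

    // Committed chunk space minus what live arenas actually occupy.
    size_t ChunkUnusedBytes(size_t numChunks, size_t usedArenas) {
        const size_t ChunkSize = 1024 * 1024;  // 1 MB GC chunks
        const size_t ArenaSize = 4096;         // 4 KB GC arenas
        return numChunks * ChunkSize - usedArenas * ArenaSize;
    }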
Why are shapes so long-lived?
(In reply to comment #13)
> Bug 653817

I meant to say: bug 653817 is one example where the GC heap was bigger after browsing for a while (even after closing everything), so this could help a lot.

Bug 66101 comment 6 is another example.

And it's not hard to replicate similar examples yourself.  This is a very promising bug report!
(In reply to comment #12)
> Why are shapes so long-lived?

This might have been a bug on my side. I am running tests now without it.
Assignee: general → anygregor
Attached patch WiP (obsolete) — Splinter Review
A first implementation.
Yeah, the Shape stuff was not necessary; I don't have to put Shapes into separate chunks. I now tested with a huge workload that needed almost 3 GB of RAM:
sys chunks: 28, used: 28, empty: 0, 1: 0, 2: 0
usr chunks: 1264, used: 1264, empty: 0, 1: 0, 2: 0

after closing all tabs I get:
sys chunks: 28, used: 28, empty: 0, 1: 0, 2: 0
usr chunks: 1, used: 1, empty: 0, 1: 0, 2: 0

So it went back to the original 29 MB I had at startup.
Oh that's very impressive!  And such a simple change, too.  FF7 patches have to land by Tuesday July 5, do you think this has a chance?
And the same without the patch:
chunks: 92, used: 92, empty: 0, 1: 26, 2: 14
So 26 chunks are kept alive by just one arena each.

Yeah Tuesday should be possible. I don't wanna start another Aurora discussion :)

The only thing missing is a good way to get the "system principal" compartment.
Andreas, Blake, any ideas?
What exactly do you want to get, Gregor?
(In reply to comment #20)
> What exactly do you want to get, Gregor?

There's a system principal compartment which is always present, right?  Its codebase is "[System Principal]".  I think he wants to have a separate pointer to it, much like there is a pointer to the atoms compartment.
Maybe just check whether principals->codebase is "[System Principal]"?  And assert if it isn't found?
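Something like this sketch, assuming (as in the JSPrincipals of this era) that the codebase is available as a C string; the stand-in struct is only there to keep the sketch self-contained:

    #include <string.h>

    struct JSPrincipals { char* codebase; /* ... */ };  // stand-in

    // The suggested check: match a compartment's principals against
    // the well-known system principal codebase.
    bool IsSystemPrincipals(const JSPrincipals* principals) {
        return principals &&
               principals->codebase &&
               strcmp(principals->codebase, "[System Principal]") == 0;
    }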
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to comment #7)
> Are chunks freed as soon as they become empty?  Even if they are, jemalloc
> will likely not release them back to the OS.

Chunks are 1M in size?  1M is the smallest "huge" jemalloc allocation.  A huge chunk is allocated by a call to mmap and immediately deallocated on free.
Attached patch patch (obsolete) — Splinter Review
I tried to add an API that communicates the system principal to the JS engine, but when we create the first compartment I ran into cyclic dependencies creating securityManagers in xpconnect... So I now have a strcmp with "[System Principal]" instead.
Attachment #543326 - Attachment is obsolete: true
Do you assert that the system compartment is found?  I couldn't find one in a quick look.  I don't want this to break silently if the name changes in the future.
(In reply to comment #25)
> Do you assert that the system compartment is found?  I couldn't find one in
> a quick look.  I don't want this to break silently if the name changes in
> the future.

I was thinking about that, but many shell tests and jsapi-tests run without a system principal. So the question is: how/where should I test it? I can put a big warning right next to the naming in xpconnect :)
(In reply to comment #26)
>
> I can put a big warning right next to the naming in xpconnect :)

How about a warning *and* a static assertion?  (Is that possible?)  So if someone misses the comment, they'll get a compile error.
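
Gecko of this era is C++03, so a true compile-time string comparison isn't available; a hypothetical alternative is to define the name once in a header visible to both xpconnect (which names the principal) and the engine (which matches against it), so a rename breaks the build instead of the heuristic:

    // Hypothetical shared header; the macro name is invented.
    #define SYSTEM_PRINCIPAL_CODEBASE "[System Principal]"

    // With C++11 one could even compare at compile time:
    //   constexpr bool StrEq(const char* a, const char* b) {
    //       return *a == *b && (*a == '\0' || StrEq(a + 1, b + 1));
    //   }
    //   static_assert(StrEq(SYSTEM_PRINCIPAL_CODEBASE, "[System Principal]"),
    //                 "system principal codebase changed");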
Attached patch patch — Splinter Review
Attachment #543516 - Attachment is obsolete: true
Attachment #543544 - Flags: review?(gal)
Comment on attachment 543544 [details] [diff] [review]
patch

Asking review from billm as well, in an attempt to get to r+ faster (I don't need two reviews, just whoever can get to it first)... it'd be really good if this made FF7.
Attachment #543544 - Flags: review?(wmccloskey)
Drive-by nit:  using pool[0] and pool[1] is ugly.  Defining SYSTEM and USER constants or an enum would be nicer.
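For instance (constant names are only a suggestion):

    // Named pool indices instead of bare 0/1.
    enum ChunkPoolKind {
        SYSTEM_CHUNK_POOL = 0,  // atoms + system-principal compartments
        USER_CHUNK_POOL = 1     // everything else
    };

    // usage: pool[isSystem ? SYSTEM_CHUNK_POOL : USER_CHUNK_POOL]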
I just did some testing.  I opened two browsers, one without this patch and one with it (changeset 72249:b7f03b37cf0c).  I used two separate but identical new profiles and did a bunch of browsing with the two browsers side-by-side.  I did every action in the first browser and then immediately in the second browser.

- Opened about:memory?verbose
- Opened http://videos.mozilla.org/serv/mozhacks/flight-of-the-navigator/
- Opened http://v8.googlecode.com/svn/data/benchmarks/current/run.html
- Opened http://gmail.com, opened a few folders and emails
- Opened http://techcrunch.com, rolled over all the stories, opened a few, tabbed through them
- Opened http://www.cad-comic.com/cad/
- Opened http://www.businessinsider.com, opened a few stories, tabbed through them
- Close all tabs except about:memory?verbose
- Refreshed about:memory?verbose
- Clicked "minimize memory usage" three times.


Start-up (both browsers were similar)
-------------------------------------
40,702,558 B (100.0%) -- explicit
├──20,727,114 B (50.92%) -- heap-unclassified
├──15,084,679 B (37.06%) -- js
...
│  ├─────172,224 B (00.42%) -- gc-heap-chunk-unused
...
  7,340,032 B -- js-gc-heap


After closing all tabs, but before minimizing memory usage, without patch
-------------------------------------------------------------------------
506,326,510 B (100.0%) -- explicit
├──262,992,556 B (51.94%) -- js
│  ├──224,271,616 B (44.29%) -- gc-heap-chunk-unused
...
  239,075,328 B -- js-gc-heap


After closing all tabs, but before minimizing memory usage, with patch
----------------------------------------------------------------------
454,850,070 B (100.0%) -- explicit
├──213,958,387 B (47.04%) -- js
│  ├──176,048,704 B (38.70%) -- gc-heap-chunk-unused
...
  189,792,256 B -- js-gc-heap


After minimizing memory usage, without patch
--------------------------------------------
194,560,122 B (100.0%) -- explicit
├──124,508,206 B (63.99%) -- js
│  ├───97,605,824 B (50.17%) -- gc-heap-chunk-unused
...
  108,003,328 B -- js-gc-heap


After minimizing memory usage, with patch
--------------------------------------------
105,101,674 B (100.0%) -- explicit
...
├───37,379,477 B (35.57%) -- js
...
│   ├──12,237,056 B (11.64%) -- gc-heap-chunk-unused
...
   20,971,520 B -- js-gc-heap


With the patch applied:
- prior to minimizing memory usage, the JS heap was about 40MB smaller (1.26x)
- after minimizing memory usage, the JS heap was about 80MB smaller (5.15x)

And this was a short browsing session.  It's possible the improvement could become even greater over a longer browsing session.

A 5x reduction in JS heap size is just ridiculously good;  we have to get this in to FF7.
Full measurements for comment 31.  The google.com compartment went away when I refreshed about:memory a few minutes later.

Oh, I forgot to highlight the "resident" improvements:

Before minimization:
  550,207,488 B -- resident (without patch)
  494,604,288 B -- resident (with patch)

After minimization:
  310,890,496 B -- resident (without patch)
  219,856,896 B -- resident (with patch)
Comment on attachment 543544 [details] [diff] [review]
patch

Adding a 3rd review request for someone who presumably won't be taking July 4th  as a holiday :)
Attachment #543544 - Flags: review?(igor)
I am really opposed to the [System Principal] hack. This needs a clean interface.
(In reply to comment #34)
> I am really opposed to the [System Principal] hack. This needs a clean
> interface.

Can you live with it now if a follow-up bug is filed?
And can you suggest what the clean interface would look like?
If you own the follow-up bug and promise a reasonably quick followup, I am ok with taking the patch. It's a major win.

We should have some sort of allocation groups: each chunk is assigned to an allocation group, and only compartments in that group may allocate from it. Plus some clean JSAPI to create those allocation groups.
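
A possible shape for that interface, with every name below invented (the follow-up is bug 669123; assumes the usual jsapi.h types):

    // A chunk belongs to exactly one allocation group, and only
    // compartments in that group may allocate from it.
    typedef struct JSAllocationGroup JSAllocationGroup;

    extern JSAllocationGroup* JS_NewAllocationGroup(JSRuntime* rt);
    extern void JS_DestroyAllocationGroup(JSAllocationGroup* group);

    // Bind a compartment to a group; its GC things are then allocated
    // only from chunks owned by that group.
    extern JSBool JS_SetCompartmentAllocationGroup(JSCompartment* compartment,
                                                   JSAllocationGroup* group);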
Attachment #543544 - Flags: review?(gal) → review+
(In reply to comment #37)
> If you own the follow-up bug and promise a reasonably quick followup, I am
> ok with taking the patch. It's a major win.

No problem:  Bug 669123, assigned to me :)  Thanks for the review.

Gregor, if you don't have time, I can land this tomorrow (with the 0/1 constants replaced with named constants as mentioned in comment 30).  Let me know!  Thanks.
Blocks: 669123
(In reply to comment #38)
> 
> Gregor, if you don't have time, I can land this tomorrow (with the 0/1
> constants replaced with named constants as mentioned in comment 30).  Let me
> know!  Thanks.

And if you do want to land, it's probably best to do so on mozilla-inbound (which is merged at least once per day), so you're not relying on the timing of a TM-to-mc merge.
Attachment #543544 - Flags: review?(igor) → review+
I am also not too happy about the system-principal detection, and allocation groups would be nice to have, but getting this out to users ASAP seems more important.

(In reply to comment #31)
> 
> 
> With the patch applied:
> - prior to minimizing memory usage, the JS heap was about 40MB smaller
> (1.26x)
> - after minimizing memory usage, the JS heap was about 80MB smaller (5.15x)
> 
> And this was a short browsing session.  It's possible the improvement could
> become even greater over a longer browsing session.
> 
> A 5x reduction in JS heap size is just ridiculously good;  we have to get
> this in to FF7.

Thanks for the measurements. I kind of see the same numbers on my machine. 
We should communicate these improvements (together with those from bug 656120) to people at Mozilla (especially mobile) and to Nightly users.


(In reply to comment #39)
> 
> And if you do want to land, it's probably best to do so on mozilla-inbound
> (which is merged at least once per day), so you're not relying on the timing
> of a TM-to-mc merge.

I don't think I have the rights for mozilla-inbound. I haven't tried, but I remember that I only got TM rights 2 years ago. Feel free to make your changes and land it, since I will be busy with the usual 4th of July stuff today :)

Thanks for your help Nick!
CCing blizzard so the improvement here is on his radar.
Attachment #543544 - Flags: review?(wmccloskey)
http://hg.mozilla.org/mozilla-central/rev/0c94f01f53af
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to comment #41)
> CCing blizzard so the improvement here is on his radar.

Just to clarify:  bug 656120 + bug 666058 = huge GC improvements in FF7.

Let's hope both patches stick.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][FragmentationFrag]
Excellent!  And I agree with Andreas' suggestion for a post-checkin/post-FF7 cleanup and follow-on.
Target Milestone: --- → mozilla7
Whiteboard: [MemShrink:P1][FragmentationFrag] → [MemShrink:P1][Fragmentation]
mrbkap pointed out that you may be interested that bug 667915 is adding the notion of a "trusted principal" == system principal for the purpose of giving chrome a bigger stack than content (so content can't use it up and then effectively OOM the slow-script dialog).
(In reply to comment #45)
> mrbkap pointed out that you may be interested that bug 667915 is adding the
> notion of a "trusted principal" == system principal for the purpose of
> giving chrome a bigger stack than content (so content can't use it up and
> then effectively OOM the slow-script dialog).

Will this be an additional principal? It's easy to let other principals allocate objects within the system chunks. We already do it with the atomsCompartment. Bug 669123 should make it pretty as well :)
(In reply to comment #46)
> Will this be an additional principal?

I don't understand your question; the principal will be stored in a new field (rt->trustedPrincipal) which will equal the singleton &nsSystemPrincipal::mJSPrincipals.
> I don't wanna start another Aurora discussion :)

Please, start it. Reducing memory footprint is the major feature of Firefox 5, 6, and 7 for a great many users. High RAM usage pushes people to move to Google Chrome.
(In reply to comment #48)
> > I don't wanna start another Aurora discussion :)
> 
> Please, start it. Reducing memory footprint is the major feature of
> Firefox 5, 6, and 7 for a great many users. High RAM usage pushes
> people to move to Google Chrome.

I don't have to, because it's already in there. Nick was kind enough to land the patch on Monday for me. It will be available in the new Aurora builds soon!