1499570 - Nursery chunks should be decommitted as they're recycled

Assignee

Description

•

6 years ago

      No description provided.

Paul Bone [:pbone]

Assignee

Comment 1

•

6 years ago

Also we could consider recycling chunks if not all of the allocated nursery size was used.

Paul Bone [:pbone]

Assignee

Comment 2

•

6 years ago

Attached patch Decommit nursery chunks as we recycle them (obsolete) — Details — Splinter Review

Assignee: nobody → pbone

Status: NEW → ASSIGNED

Attachment #9017729 - Flags: review?(sphink)

Paul Bone [:pbone]

Assignee

Comment 3

•

6 years ago

This does improve memory usage:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=fbd30bc607bc&framework=4&showOnlyComparable=1&showOnlyConfident=1&selectedTimeRange=172800

But it also slows down some tests:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=fbd30bc607bc&framework=1&showOnlyComparable=1&showOnlyConfident=1&selectedTimeRange=172800

IMHO it's not worth the performance hit.  (The gain is smaller than the loss).

I want to try doing this offthread and see if we get the gain without the cost.

Steve Fink [:sfink] [:s:]

Comment 4

•

6 years ago

Comment on attachment 9017729 [details] [diff] [review]
Decommit nursery chunks as we recycle them

Review of attachment 9017729 [details] [diff] [review]:
-----------------------------------------------------------------

As I recall, we've had issues with decommit speed in the past. I don't know if doing it in the background would help, or if it would just stall everything while the kernel is playing page table games and the slowdown would be swept under the rug but still present.

If you can get it performance neutral, then it seems like a clear win. Otherwise, I guess we'd want to look closer to see whether it's worth the perf cost.

Attachment #9017729 - Flags: review?(sphink)

Jon Coppeard (:jonco)

Updated

•

6 years ago

Priority: -- → P3

Paul Bone [:pbone]

Assignee

Comment 5

•

6 years ago

Hi Emanuel,

I saw you working with madvise on Bug 1502733 and I'd like to pick your brain about madvise.

I was working on this and wasn't aware of the difference between MADV_DONTNEED and MADV_FREE. When we decommit pages, or advise the OS that we no longer need them, we usually do it on a separate thread. I'd like to check that I understand madvise correctly. My understanding is that if I use MADV_DONTNEED then the kernel needs to interrupt all of the program's threads, unmap those pages from their page tables, invalidate TLBs, etc. Does this need to be synchronous? That seems like it might defeat the purpose of doing it off-thread, except that we do seem to get a performance boost from moving it off-thread.

My guess is it needs to be synchronous because:

Thread 1: madvise(MADV_DONTNEED)
Thread 1 kernel: Unmap pages
Thread 2 kernel: Unmap pages
Thread 2: touches memory, writes in some important data. Causes a page fault, the kernel gives us new pages and the important data is saved and now the pages are paged in again.

However, if thread 2 didn't unmap the pages until later, then the write would have gone without a page fault, then the pages would have been unmapped and the data lost.

Next, with MADV_FREE we kind-of have that problem with a single thread:

Thread: madvise(MADV_FREE)
Thread: Wants these pages back, so begins using them again, saves some data.
Kernel: memory pressure, unmaps the MADV_FREE pages, loses data.

So does MADV_FREE have some corresponding UNFREE so you can tell the kernel you care about these pages again? And is it then not the page tables that need to be tracked synchronously, but some flag on them telling the kernel it can free them if it needs to?

Maybe my google-fu hasn't been strong here, but I've not found this kind of information yet. It sounds like you might know the answer and it might be at the top of your mind at the moment. Thanks.

Flags: needinfo?(emanuel.hoogeveen)

Emanuel Hoogeveen [:ehoogeveen]

Comment 6

•

6 years ago

Hi Paul, I'm afraid I don't have any expert knowledge in this area either. I found the article I linked in bug 1502733 while confirming the API differences between madvise on Linux and OSX and it struck me as an interesting performance comparison, but I haven't looked at the implementation of either MADV_DONTNEED or MADV_FREE in the kernel source.

Flags: needinfo?(emanuel.hoogeveen)

Steve Fink [:sfink] [:s:]

Comment 7

•

6 years ago

(In reply to Paul Bone [:pbone] from comment #5)
> Hi Emanuel,
> 
> I saw you working with madvise on Bug 1502733 and I'd like to pick your
> brain about madvise.
> 
> I was working on this and wasn't aware of the difference between
> MADV_DONTNEED and MADV_FREE.  For decommitting/advising the OS we don't need
> some pages anymore we usually do this in a separate thread.  I'd like to
> check that I understand madvise correctly.  My understanding is that if I
> use MADV_DONTNEED then the kernel will need to interrupt all the program
> threads and unmap those pages from their page tables, invalidate TLBs etc,
> does this need to be synchronous?  Seems like it might defeat the purpose of
> doing it off-thread, except that we seem to get a performance boost from
> moving that offthread.
> 
> My guess is it needs to be synchronous because:
> 
> Thread 1: madvise(MADV_DONTNEED)
> Thread 1 kernel: Unmap pages
> Thread 2 kernel: Unmap pages
> Thread 2: touches memory, writes in some important data. Causes a page
> fault, the kernel gives us new pages and the important data is saved and now
> the pages are paged in again.

I'm not sure I understand the example. Are these threads of different processes? I don't think so, since you're talking about an madvise thread and a main thread. If they're threads in the same process, then they would share the TLB, so there is no distinction between the middle two steps.

Then again, I don't think it matters.

> However, if thread 2 didn't unmap the pages until later, then the write
> would have gone without a page fault, then the pages would have been
> unmapped and the data lost.

I think you're right, it seems like the MMU-side unmapping needs to happen immediately because the semantics of MADV_DONTNEED say that for a private mapping the effect of writing into the page should be to zero-fill everything you didn't write to. Which would seem to require setting the protection bits on that entry, at least. (Releasing that entry could happen later, and the man page says that it can be delayed.) Whatever processor-side cache hangs onto the protection bits would need to be flushed, though that may be less expensive than updating the TLB virt->physical mapping?

This is based off of:

       MADV_DONTNEED
              Do not expect access in the near future.  (For the time being, the  application  is  finished  with  the  given
              range, so the kernel can free resources associated with it.)

              After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed:
              subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory  con‐
              tents  from  the  up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous
              mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages  for
              anonymous private mappings.

              Note  that,  when applied to shared mappings, MADV_DONTNEED might not lead to immediate freeing of the pages in
              the range.  The kernel is free to delay freeing the pages until an appropriate moment.  The resident  set  size
              (RSS) of the calling process will be immediately reduced however.

              MADV_DONTNEED  cannot  be  applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.  (Pages marked with the
              kernel-internal VM_PFNMAP flag are special memory areas that are not managed by the virtual  memory  subsystem.
              Such pages are typically created by device drivers that map the pages into user space.)
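
A minimal sketch of the private-anonymous case from the excerpt above, assuming Linux. The helper name `byte_after_dontneed` is made up for illustration and is not SpiderMonkey code. It dirties one page, calls madvise(MADV_DONTNEED), and reads the first byte back; per the man page text, the read should succeed and see a zero-filled page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of a private anonymous page
 * as read back after MADV_DONTNEED, or -1 on a setup error. */
int byte_after_dontneed(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    memset(p, 0xAB, len);      /* dirty the page */
    assert(p[0] == 0xAB);      /* sanity check: the data is there before the advice */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int value = p[0];          /* faults in a fresh zero-fill-on-demand page */
    munmap(p, len);
    return value;
}
```

Calling `byte_after_dontneed()` from a small `main` and printing the result is enough to try it. It only shows the semantics seen by the calling thread; it says nothing about how the kernel coordinates with other cores.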


> Next, with MADV_FREE we kind-of have that problem with a single thread:
> 
> Thread: madvise(MADV_FREE)
> Thread: Wants these pages back, so begins using them again, saves some data.
> Kernel: memory pressure, unmaps the MADV_FREE pages, loses data.
> 
> So does MADV_FREE have some corresponding UNFREE so you can tell the kernel
> you care about these pages again, and it's not the page tables that need to
> be tracked synchronously but some flag on them telling the kernel it can free
> them if it needs to?

I think writing to those pages is the intended way to UNFREE them.

       MADV_FREE (since Linux 4.5)
              The  application no longer requires the pages in the range specified by addr and len.  The kernel can thus free
              these pages, but the freeing could be delayed until memory pressure occurs.  For each of  the  pages  that  has
              been  marked  to  be freed but has not yet been freed, the free operation will be canceled if the caller writes
              into the page.  After a successful MADV_FREE operation, any stale data (i.e., dirty, unwritten pages)  will  be
              lost  when  the kernel frees the pages.  However, subsequent writes to pages in the range will succeed and then
              kernel cannot free those dirtied pages, so that the caller can always see just written data.  If  there  is  no
              subsequent  write,  the  kernel  can  free the pages at any time.  Once pages in the range have been freed, the
              caller will see zero-fill-on-demand pages upon subsequent page references.

              The MADV_FREE operation can be applied only to private anonymous pages (see mmap(2)).  On  a  swapless  system,
              freeing pages in a given range happens instantly, regardless of memory pressure.

So again it seems like the protection bits would need to be updated immediately, but the TLB entry modifications can happen later. 

I'm not really following the exact differences here. It seems like MADV_FREE would set the entry to PROT_NONE and the dirty flag would be cleared. MADV_DONTNEED would just set the entry to PROT_NONE. On a quick write (before the page has been freed from the TLB), MADV_FREE memory would just unprotect the page (the dirty bit would also get set by the write). On a quick write, MADV_DONTNEED private memory would be zeroed, unprotected, and the write would happen. On a slow write for either, a new PTE would be established and then the memory would be zeroed, unprotected, and the write would happen.

Maybe? That's the understanding I've cobbled together from wading through the man page, at least.
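
One way to poke at the "writing is the UNFREE" reading of the MADV_FREE text is sketched below, assuming Linux. The helper name is hypothetical, and the fallback to MADV_DONTNEED on kernels or headers without MADV_FREE is my own assumption for portability, not something from this bug. After the advice, a byte that is written must read back as written, while an untouched byte may be either the stale value or zero, depending on whether the kernel has freed the page yet.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a write made after the advice is visible
 * and an untouched byte is either stale (0xAB) or zero, 0 if not,
 * and -1 on a setup error. */
int write_after_free_survives(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = (size_t)page;
    assert(len >= 2);          /* the check below reads bytes 0 and 1 */

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    memset(p, 0xAB, len);      /* dirty the page */

    int rc = -1;
#ifdef MADV_FREE
    rc = madvise(p, len, MADV_FREE);
#endif
    /* Assumption: fall back to MADV_DONTNEED where MADV_FREE is unavailable. */
    if (rc != 0) rc = madvise(p, len, MADV_DONTNEED);
    if (rc != 0) {
        munmap(p, len);
        return -1;
    }

    /* volatile so the compiler really performs the store and the loads */
    volatile unsigned char *v = p;
    v[0] = 0x42;               /* per the man page, this write cancels a pending free */
    int ok = (v[0] == 0x42) && (v[1] == 0xAB || v[1] == 0x00);

    munmap(p, len);
    return ok;
}
```

This does not reveal which mechanism the kernel uses (protection bits, dirty bits, or something else); it only checks the behaviour the man page promises to the caller.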

Emanuel Hoogeveen [:ehoogeveen]

Comment 8

•

6 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #7)
> It seems like MADV_FREE
> would set the entry to PROT_NONE and the dirty flag would be cleared.

Note that - at least on OSX - we know from experience that this is not done synchronously. In fact the way we get an accurate RSS in about:memory on OSX is by calling mprotect on all MADV_FREEd pages to set them to PROT_NONE, then back to PROT_READ | PROT_WRITE [1] (pages_decommit() and pages_commit() do the toggling). Right now we don't do this for GC chunks as far as I know - we could add something similar for more "accurate" about:memory numbers (the idea is that the OS can release these pages whenever it wants, it just doesn't bother unless we force it to).

On Linux, as far as I know MADV_DONTNEED does result in an immediate reduction in RSS. I don't know if MADV_FREE does as well or if what I wrote above also applies on Linux.

[1] https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/memory/build/mozjemalloc.cpp#4486

Emanuel Hoogeveen [:ehoogeveen]

Comment 9

•

6 years ago

Apologies for spam, but one correction: jemalloc seems to use mmap with MAP_FIXED rather than mprotect. I don't know if there's a reason for this - perhaps simply replacing the mapping as mmap with MAP_FIXED does is faster than mprotecting a region that might not cover the whole mapping.
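
A rough sketch of the "replace the mapping with mmap and MAP_FIXED" idea, assuming Linux or another POSIX system with MAP_ANONYMOUS. The names `sketch_decommit` and `sketch_commit` are hypothetical and this is not the mozjemalloc code, only an illustration of the technique: mapping fresh PROT_NONE pages over the range drops the old pages, and mapping read-write pages over it again gives back zero-filled memory at the same address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: replace [addr, addr+len) with an inaccessible mapping. */
static int sketch_decommit(void *addr, size_t len) {
    void *r = mmap(addr, len, PROT_NONE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return r == addr ? 0 : -1;
}

/* Hypothetical: replace [addr, addr+len) with a fresh read-write mapping. */
static int sketch_commit(void *addr, size_t len) {
    void *r = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return r == addr ? 0 : -1;
}

/* Returns the first byte seen after a decommit/commit cycle, or -1 on error. */
int byte_after_recommit(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = 4 * (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    memset(p, 0xAB, len);      /* dirty the pages */
    assert(p[0] == 0xAB);

    if (sketch_decommit(p, len) != 0 || sketch_commit(p, len) != 0) {
        munmap(p, len);
        return -1;
    }

    int value = p[0];          /* a brand new anonymous mapping reads as zero */
    munmap(p, len);
    return value;
}
```

Whether this is faster than mprotect, as speculated above, is not something the sketch measures.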

Steve Fink [:sfink] [:s:]

Comment 10

•

6 years ago

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #8)
> On Linux, as far as I know MADV_DONTNEED does result in an immediate
> reduction in RSS. I don't know if MADV_FREE does as well or if what I wrote
> above also applies on Linux.

That's what the man page on Linux says. Though I seem to remember that when I tried using it once, it did *not* appear to update RSS. But that's just an anecdote; don't trust my memory.
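
A small sketch for checking this rather than relying on memory, assuming Linux. It uses mincore(2) to count resident pages in one test mapping before and after MADV_DONTNEED. This looks at residency of a single range, not the process-wide RSS figure, and the helper names are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr+len) via mincore(2); -1 on error. */
static long resident_pages(void *addr, size_t len, size_t page) {
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    if (vec == NULL) return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long count = 0;
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1;   /* the low bit means "resident" */
    free(vec);
    return count;
}

/* Hypothetical helper: fills *before and *after with the resident page
 * counts around a MADV_DONTNEED call. Returns 0 on success, -1 on error. */
int dontneed_residency(long *before, long *after) {
    assert(before != NULL && after != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = 16 * (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    memset(p, 1, len);         /* touch every page so it becomes resident */
    *before = resident_pages(p, len, (size_t)page);
    int rc = madvise(p, len, MADV_DONTNEED);
    *after = resident_pages(p, len, (size_t)page);

    munmap(p, len);
    return (rc == 0 && *before >= 0 && *after >= 0) ? 0 : -1;
}
```

If the man page is right, `after` should come out lower than `before`. The same harness could be pointed at MADV_FREE to see whether the drop is immediate there too.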

Paul Bone [:pbone]

Assignee

Comment 11

•

6 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #7)
> (In reply to Paul Bone [:pbone] from comment #5)
> > Hi Emanuel,
> > 
> > I saw you working with madvise on Bug 1502733 and I'd like to pick your
> > brain about madvise.
> > 
> > I was working on this and wasn't aware of the difference between
> > MADV_DONTNEED and MADV_FREE.  For decommitting/advising the OS we don't need
> > some pages anymore we usually do this in a separate thread.  I'd like to
> > check that I understand madvise correctly.  My understanding is that if I
> > use MADV_DONTNEED then the kernel will need to interrupt all the program
> > threads and unmap those pages from their page tables, invalidate TLBs etc,
> > does this need to be synchronous?  Seems like it might defeat the purpose of
> > doing it off-thread, except that we seem to get a performance boost from
> > moving that offthread.
> > 
> > My guess is it needs to be synchronous because:
> > 
> > Thread 1: madvise(MADV_DONTNEED)
> > Thread 1 kernel: Unmap pages
> > Thread 2 kernel: Unmap pages
> > Thread 2: touches memory, writes in some important data. Causes a page
> > fault, the kernel gives us new pages and the important data is saved and now
> > the pages are paged in again.
> 
> I'm not sure I understand the example. Are these threads of different
> processes? I don't think so, since you're talking about an madvise thread
> and a main thread. If they're threads in the same process, then they would
> share the TLB, so there is no distinction between the middle two steps.
> 
> Then again, I don't think it matters.

They're different threads in the same process, so a main thread and a decommit thread as we have in SpiderMonkey.

They don't share the TLB: the TLB is some transistors inside the CPU core/HT-thread. If two threads run on the same core then they do share the TLB, and it's invalidated with each context switch; if they run on different cores then they have separate TLBs.  It may help to ignore the TLB and think only of the page tables, which are maintained by the kernel.

No, it doesn't matter in this case. This is the happy case where everything works correctly; I introduced it first so the reader can compare it to the next case.

> > However, if thread 2 didn't unmap the pages until later, then the write
> > would have gone without a page fault, then the pages would have been
> > unmapped and the data lost.
> 
> I think you're right, it seems like the MMU-side unmapping needs to happen
> immediately because the semantics of MADV_DONTNEED say that for a private
> mapping the effect of writing into the page should be to zero-fill
> everything you didn't write to. Which would seem to require setting the
> protection bits on that entry, at least. (Releasing that entry could happen
> later, and the man page says that it can be delayed.)

Oh, interesting, I was assuming that if you change the bits you might as well unmap the page, since both would need to invalidate (that part of) the TLB (not sure if TLBs are granular at all) and therefore have the same performance impact.  But maybe it doesn't or the bookkeeping costs in the kernel are high enough that it matters.

> Whatever
> processor-side cache hangs onto the protection bits would need to be
> flushed, though that may be less expensive than updating the TLB
> virt->physical mapping?

I'd be surprised, but this is just a guess, it's a bit too deep for me to know off-hand.

> 
> This is based off of:
> 
>        MADV_DONTNEED
>               Do not expect access in the near future.  (For the time being,
> the  application  is  finished  with  the  given
>               range, so the kernel can free resources associated with it.)
> 
>               After a successful MADV_DONTNEED operation, the semantics of
> memory access in the specified region are changed:
>               subsequent accesses of pages in the range will succeed, but
> will result in either repopulating the memory  con‐
>               tents  from  the  up-to-date contents of the underlying mapped
> file (for shared file mappings, shared anonymous
>               mappings, and shmem-based techniques such as System V shared
> memory segments) or zero-fill-on-demand pages  for
>               anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONTNEED
> might not lead to immediate freeing of the pages in
>               the range.  The kernel is free to delay freeing the pages
> until an appropriate moment.  The resident  set  size
>               (RSS) of the calling process will be immediately reduced
> however.
> 
>               MADV_DONTNEED  cannot  be  applied to locked pages, Huge TLB
> pages, or VM_PFNMAP pages.  (Pages marked with the
>               kernel-internal VM_PFNMAP flag are special memory areas that
> are not managed by the virtual  memory  subsystem.
>               Such pages are typically created by device drivers that map
> the pages into user space.)
> 
> 
> > Next, with MADV_FREE we kind-of have that problem with a single thread:
> > 
> > Thread: madvise(MADV_FREE)
> > Thread: Wants these pages back, so begins using them again, saves some data.
> > Kernel: memory pressure, unmaps the MADV_FREE pages, loses data.
> > 
> > So does MADV_FREE have some corresponding UNFREE so you can tell the kernel
> > you care about these pages again, and it's not the page tables that need to
> > be tracked synchronously but some flag on them telling the kernel it can free
> > them if it needs to?
> 
> I think writing to those pages is the intended way to UNFREE them.
> 
>        MADV_FREE (since Linux 4.5)
>               The  application no longer requires the pages in the range
> specified by addr and len.  The kernel can thus free
>               these pages, but the freeing could be delayed until memory
> pressure occurs.  For each of  the  pages  that  has
>               been  marked  to  be freed but has not yet been freed, the
> free operation will be canceled if the caller writes
>               into the page.  After a successful MADV_FREE operation, any
> stale data (i.e., dirty, unwritten pages)  will  be
>               lost  when  the kernel frees the pages.  However, subsequent
> writes to pages in the range will succeed and then
>               kernel cannot free those dirtied pages, so that the caller can
> always see just written data.  If  there  is  no
>               subsequent  write,  the  kernel  can  free the pages at any
> time.  Once pages in the range have been freed, the
>               caller will see zero-fill-on-demand pages upon subsequent page
> references.

My Ubuntu 16.04 didn't have this entry in the man page, so I hadn't read it yet.

Yeah, that sounds like it makes the page read-only so that it gets a page-fault and can cancel the free-able-ness.

Because it loses the data, and MADV_DONTNEED also loses the data, I'm now confused about the difference.  I'll have to read the article that ehoogeveen posted again.

>               The MADV_FREE operation can be applied only to private
> anonymous pages (see mmap(2)).  On  a  swapless  system,
>               freeing pages in a given range happens instantly, regardless
> of memory pressure.

Either there's a semantic difference I'm not getting WRT swap, or there's some implementation detail in the kernel leaking out here.

> So again it seems like the protection bits would need to be updated
> immediately, but the TLB entry modifications can happen later.

Okay, I'll need to check if changing the protection bits means that the TLB (or part of it) needs to be invalidated, and if this is different from changing the mapping.

> I'm not really following the exact differences here. It seems like MADV_FREE
> would set the entry to PROT_NONE and the dirty flag would be cleared.
> MADV_DONTNEED would just set the entry to PROT_NONE. On a quick write
> (before the page has been freed from the TLB), MADV_FREE memory would just
> unprotect the page (the dirty bit would also get set by the write). On a
> quick write, MADV_DONTNEED private memory would be zeroed, unprotected, and
> the write would happen. On a slow write for either, a new PTE would be
> established and then the memory would be zeroed, unprotected, and the write
> would happen.
> 
> Maybe? That's the understanding I've cobbled together from wading through
> the man page, at least.

Could be, I'll keep that in mind.  It looks like on calling madvise both would behave the same for anonymous pages, since freeing dirty anonymous pages just discards them anyway.  However, in your guess there is a difference with the quick write WRT zeroing vs. leaving the page contents, which kind of hints at implementation details again.  I'll keep this in mind after I do some reading and see if our conclusions/guesses match up.

We may need to test to confirm our guesses.
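
As a starting point for that kind of test, here is a rough timing sketch, assuming Linux. The function name and parameters are hypothetical. It measures only the wall-clock cost of the madvise call itself on freshly dirtied pages; it does not capture the later page-fault cost, or any stalls on other threads, which is the part this discussion is really unsure about.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average nanoseconds per madvise(advice) call over
 * `rounds` rounds on `npages` dirtied pages, or -1.0 on error (including
 * an advice value the running kernel rejects). */
double avg_madvise_ns(int advice, size_t npages, int rounds) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0 || rounds <= 0) return -1.0;
    size_t len = npages * (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1.0;

    double total = 0.0;
    for (int i = 0; i < rounds; i++) {
        memset(p, 0xAB, len);  /* re-dirty so there is something to release */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int rc = madvise(p, len, advice);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (rc != 0) {
            munmap(p, len);
            return -1.0;
        }
        total += (double)(t1.tv_sec - t0.tv_sec) * 1e9
               + (double)(t1.tv_nsec - t0.tv_nsec);
    }

    munmap(p, len);
    return total / rounds;
}
```

Comparing `avg_madvise_ns(MADV_DONTNEED, n, r)` against the same call with MADV_FREE (where the headers define it) would give a first, very coarse data point. Timing the memset that follows each call would be the obvious next step for seeing the refault cost.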

Paul Bone [:pbone]

Assignee

Comment 12

•

6 years ago

Hi tcampbell,

Here's another change we could enable when we know all of a process' tabs are in the background.

Flags: needinfo?(tcampbell)

Paul Bone [:pbone]

Assignee

Comment 13

•

6 years ago

Never mind tcampbell, I'll add this as a dependency to our other bug.

Blocks: MinGCMem

Flags: needinfo?(tcampbell)

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Priority: P3 → P2

Paul Bone [:pbone]

Assignee

Comment 14

•

5 years ago

Attached file Bug 1499570 - Decommit nursery chunks when they're recycled r=jonco — Details

Paul Bone [:pbone]

Assignee

Updated

•

5 years ago

Attachment #9017729 - Attachment is obsolete: true

Pulsebot

Comment 15

•

5 years ago

Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53baf823b648
Decommit nursery chunks when they're recycled r=jonco

Paul Bone [:pbone]

Assignee

Comment 16

•

5 years ago

Hi perfherders, here's another test that should reduce memory usage. It should not affect performance, or if it does, only very slightly.

Thanks.

Flags: needinfo?(igoldan)

Noemi Erli[:noemi_erli]

Comment 17

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/53baf823b648

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox69: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla69

Ionuț Goldan [:igoldan]

Comment 18

•

5 years ago

Thanks for the heads up!

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Updated

•

5 years ago

Keywords: perf

Paul Bone [:pbone]

Assignee

Updated

•

5 years ago

Blocks: 1555550

Attachments:
Decommit nursery chunks as we recycle them, 6 years ago, Paul Bone [:pbone], 1.42 KB, patch
Bug 1499570 - Decommit nursery chunks when they're recycled r=jonco, 5 years ago, Paul Bone [:pbone], 47 bytes, text/x-phabricator-request