Temporary data reported in js/runtime/{jaeger-code,gc-marker,etc.} is not cleared completely on minimize-memory-usage

Status: RESOLVED WONTFIX

Product: Core
Component: JavaScript Engine
Opened: 6 years ago
Closed: 6 years ago

People

(Reporter: Justin Lebar (not reading bugmail), Unassigned)

Tracking

Version: Trunk
Points: ---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink])

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
Created attachment 679778 [details]
B2G memory report

I don't know if this is actually a bug, but it might be?

Here are highlights from a memory report I gathered off a device by doing ./get-about-memory --minimize.  It's possible that this isn't actually minimizing memory usage, but I see similar numbers on Firefox desktop.  (That could reflect the fact that, in Firefox desktop, I'm running about:memory's JS code after minimizing.)

Main process js-non-window:

│   ├──1.12 MB (02.95%) -- runtime
│   │  ├──0.29 MB (00.77%) ── script-sources
│   │  ├──0.27 MB (00.72%) ── jaeger-code
│   │  ├──0.25 MB (00.66%) ── atoms-table
│   │  ├──0.13 MB (00.33%) ── gc-marker
│   │  ├──0.11 MB (00.29%) ── runtime-object
│   │  ├──0.04 MB (00.09%) ── unused-code
│   │  ├──0.01 MB (00.03%) ── script-filenames
│   │  ├──0.01 MB (00.02%) ── stack
│   │  ├──0.01 MB (00.02%) ── contexts
│   │  ├──0.00 MB (00.01%) ── regexp-code
│   │  ├──0.00 MB (00.01%) ── dtoa
│   │  ├──0.00 MB (00.01%) ── temporary
│   │  ├──0.00 MB (00.00%) ── ion-code
│   │  └──0.00 MB (00.00%) ── math-cache

Homescreen js-non-window (homescreen is in the background):

│   └──0.76 MB (06.42%) -- runtime
│      ├──0.27 MB (02.26%) -- (11 tiny)
│      │  ├──0.11 MB (00.92%) ── runtime-object
│      │  ├──0.07 MB (00.63%) ── script-sources
│      │  ├──0.05 MB (00.42%) ── unused-code
│      │  ├──0.02 MB (00.16%) ── regexp-code
│      │  ├──0.00 MB (00.03%) ── dtoa
│      │  ├──0.00 MB (00.03%) ── temporary
│      │  ├──0.00 MB (00.03%) ── stack
│      │  ├──0.00 MB (00.02%) ── script-filenames
│      │  ├──0.00 MB (00.02%) ── contexts
│      │  ├──0.00 MB (00.00%) ── ion-code
│      │  └──0.00 MB (00.00%) ── math-cache
│      ├──0.24 MB (02.05%) ── jaeger-code
│      ├──0.13 MB (01.05%) ── atoms-table
│      └──0.13 MB (01.05%) ── gc-marker

And so on.
(Reporter)

Comment 1

6 years ago
I need to figure out here whether get-about-memory.py --minimize is /actually/ doing a minimize.
(In reply to Justin Lebar [:jlebar] from comment #1)
> I need to figure out here whether get-about-memory.py --minimize is
> /actually/ doing a minimize.

Comparing reports taken with and without it showed differences in my testing, with the minimized reports being smaller. It should be easy to verify with gdb whether the right code is actually running; shall I try it?
(Reporter)

Comment 3

6 years ago
I realize now that when we minimize memory usage on desktop, we would expect to see the js-non-window values go to ~0, since the JS we're running in about:memory counts towards a window, not js-non-window.  So it sounds like what you're seeing indicates that --minimize probably works as expected on the target.
Oh.  Here's what the JS engine does on a memory-pressure event:

NS_IMETHODIMP
nsMemoryPressureObserver::Observe(nsISupports* aSubject, const char* aTopic,
                                  const PRUnichar* aData)
{
  if (sGCOnMemoryPressure) {
    nsJSContext::GarbageCollectNow(js::gcreason::MEM_PRESSURE,
                                   nsJSContext::NonIncrementalGC,
                                   nsJSContext::NonCompartmentGC,
                                   nsJSContext::ShrinkingGC);
    nsJSContext::CycleCollectNow();
  }
  return NS_OK;
}


Just a GC and a CC.  While we do drop some stuff on each GC (and I thought jaeger code was among that stuff), there are some things we should additionally drop on memory pressure.
Note that we're doing a shrinking GC here, which is supposed to drop as much as possible.
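To illustrate what "additionally drop" could look like, here is a hypothetical sketch of a memory-pressure dispatcher that, after the shrinking GC/CC, also walks a list of registered purge hooks (e.g. for the atoms table or leftover JIT code). None of these names exist in Gecko; this is only an illustration of the idea, not the actual implementation.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: subsystems register hooks that drop discretionary
// caches a plain GC leaves alone; each hook reports how many bytes it freed.
struct MemoryPressureDispatcher {
    std::vector<std::function<size_t()>> purgeHooks;

    void registerHook(std::function<size_t()> hook) {
        purgeHooks.push_back(std::move(hook));
    }

    size_t onMemoryPressure() {
        // 1. The shrinking GC + cycle collection would run here, as in
        //    nsMemoryPressureObserver::Observe() above.
        // 2. Then give every subsystem a chance to purge its caches.
        size_t freed = 0;
        for (auto& hook : purgeHooks)
            freed += hook();
        return freed;
    }
};
```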
│   │  ├──0.13 MB (00.33%) ── gc-marker

This is mostly due to GCMarker::stack::ballast, which is a never-freed vector of pointers with MARK_STACK_LENGTH (32,768) entries, i.e. 128 KiB on 32-bit platforms.  (If the stack size exceeds 32,768 we start dynamically allocating space for it;  that dynamically-allocated space is reclaimed at the end of GC.)  I wonder if it could be halved without harm.  billm, any idea how big this stack typically gets?
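The ballast arithmetic above can be sketched as follows. This is not the real GCMarker code, just an illustrative model of a mark stack that keeps a fixed resident buffer and only grows (temporarily) past it during marking; the constant matches the 32,768-entry figure above, and `shrinkToBallast` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

const size_t MARK_STACK_LENGTH = 32768;  // entries kept resident ("ballast")

struct MarkStack {
    std::vector<void*> entries;

    MarkStack() { entries.reserve(MARK_STACK_LENGTH); }

    // On a 32-bit platform each entry is a 4-byte pointer, so the
    // never-freed ballast is 32768 * 4 bytes = 128 KiB (256 KiB on 64-bit).
    static size_t ballastBytes(size_t pointerSize) {
        return MARK_STACK_LENGTH * pointerSize;
    }

    // push() may grow the vector beyond the ballast during a big GC.
    void push(void* p) { entries.push_back(p); }

    // At the end of GC, drop the dynamically grown excess but keep the
    // ballast resident for the next GC.
    void shrinkToBallast() {
        entries.clear();
        entries.shrink_to_fit();
        entries.reserve(MARK_STACK_LENGTH);
    }
};
```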


│   │  ├──0.25 MB (00.66%) ── atoms-table

This, like Gecko's corresponding atom table, gradually grows and doesn't ever get cleared, AFAIK.


│   │  ├──0.27 MB (00.72%) ── jaeger-code

I'm not sure about this one.  How aggressively is JaegerMonkey code discarded on GC?  Perhaps it doesn't always all get discarded at once?
│   │  ├──0.11 MB (00.29%) ── runtime-object

This is mostly taken up by the property cache, which has already been targeted for removal (bug 704356), though it's not clear what effect that would have on performance.
Depends on: 811176
I looked into the jaeger-code issue. There are a bunch of problems here.

First, I found that we could be releasing code more aggressively. I filed bug 811176 to fix that.

Second, it looks like the about:memory code actually allocates some regexps (and the corresponding Yarr code) and does some JITing in between the shrinking GC and the memory reporter code. Maybe this is necessary somehow, but it seems like we ought to be able to run them back-to-back and avoid any extra allocations in the middle.

Third, JaegerMonkey always allocates some trampoline code that lives forever. It's a very small piece of code. However, we allocate executable pools in 64K units. So even if we freed everything except the trampoline, we'd still be using 64K of code memory. We could adjust this number downward, although I think it always has to be 64K on Windows.
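The pool-granularity point above is just round-up arithmetic: any live allocation, however small, pins a whole pool. A minimal sketch, assuming the 64K unit from the comment (the helper name is hypothetical):

```cpp
#include <cstddef>

// Executable memory is handed out in fixed-size pools, so even a tiny
// allocation (like the JaegerMonkey trampoline) reserves a whole pool.
const size_t EXEC_POOL_SIZE = 64 * 1024;

size_t execBytesReserved(size_t requestedBytes) {
    // Round the request up to a whole number of pools.
    size_t pools = (requestedBytes + EXEC_POOL_SIZE - 1) / EXEC_POOL_SIZE;
    return pools * EXEC_POOL_SIZE;
}
```

So a trampoline of a few hundred bytes still costs 64 KiB of code memory, which is why shrinking the pool size (where the platform allows it) is the only way to reduce this floor.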

Finally, I looked at the stack size. I ran a 50-tab workload and tracked the highest the stack ever got in a given GC. In most GCs, the max was around 12,000 to 15,000 items. The highest it got was 38,901 items. The only downside of too little ballast is that if we OOM while trying to allocate enough stack via malloc, the GC is forced to take an extremely slow path of iterating over the entire heap many times. We might consider making this a preference and setting it lower for B2G, where we expect to have less to mark.
Is there telemetry for hitting the slow marking path?

Also, being able to set the mark stack size would be nice for testing the slow-path code, though I forget; maybe there's already something in place for that.
> Second, it looks like the about:memory code actually allocates some regexps
> (and the corresponding Yarr code) and does some JITing in between the
> shrinking GC and the memory reporter code. Maybe this is necessary somehow,
> but it seems like we ought to be able to run them back-to-back and avoid any
> extra allocations in the middle.

about:memory's code to read the memory reports is itself JavaScript, and it does do some processing along the way.  However, for B2G we're getting memory reports via a signal-based mechanism (bug 788021) that dumps them to a file and is entirely written in C++.  So we can ignore these here.


> Third, JaegerMonkey always allocates some trampoline code that lives
> forever. It's a very small piece of code. However, we allocate executable
> pools in 64K units. So even if we freed everything except the trampoline,
> we'd still be using 64K of code memory. We could adjust this number
> downward, although I think it always has to be 64K on Windows.

IIRC, on Windows mappings can be smaller than 64 KiB but they must be 64 KiB-aligned, so the scope for making them smaller is minimal.  I suggest we wait and see how much effect bug 811176 has before going further on this.


> Finally, I looked at the stack size. I ran a 50-tab workload and tracked the
> highest the stack ever got in a given GC. In most GCs, the max was around
> 12,000 to 15,000 items. The highest it got was 38,901 items. The only
> downside of too little ballast is that if we OOM while trying to allocate
> enough stack via malloc, the GC is forced to take an extremely slow
> path of iterating over the entire heap many times. We might consider making
> this a preference and setting it lower for B2G, where we expect to have
> less to mark.

Perhaps we should leave this one alone.
Bug 811176 is the only actionable thing here.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX