Closed Bug 765435 Opened 12 years ago Closed 12 years ago

Make JS heap growth factor depend on the heap size after a GC and GC frequency

Categories: Core :: JavaScript Engine, defect
Version: 15 Branch
Platform: x86_64 Windows 7
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla16
People: (Reporter: jon.carlos.rivera, Assigned: gwagner)
References: (Blocks 1 open bug)
Whiteboard: [js:p1:fx16][MemShrink:P2][games:p2]
Attachments: 5 files, 4 obsolete files

Attached file Raytracer Code with Examples (obsolete) —
The URI associated with this bug points to a graph of the memory consumption of all major browsers rendering the same scene with the same settings; the horizontal axis is in seconds.

During a render, memory usage in FF rises rapidly, with allocations sometimes exceeding 100MB a second. If the render completes before the 2GB wall is hit, the browser doesn't crash, but if the target resolution is increased or the model is more detailed, the 2GB wall can be hit before rendering is done. Once this happens, the browser will almost certainly crash (though it may take a minute or two).

Also note the drop-off in memory after rendering at the end of most of the graphs. I shift-reloaded the tabs a few seconds after the rendering completed to show another problem I am having: reloading the tab and then redoing a render often crashes the browser very quickly because the previously used memory is neither freed nor reused.

To help with the diagnosis I have attached the bulk of the source code for the ray tracer. The project is not the cleanest or best code base (I apologize in advance!).

There are two html pages contained within:

render_buddha.html - Renders a scene similar to the one that produced the graph.

render_bunny.html - Shows that increased model complexity causes an out-of-memory condition to occur somewhat earlier.
Attachment #633756 - Attachment mime type: application/octet-stream → application/zip
Jon-Carlos, are you able to get the output of about:memory (just cut and paste) before it crashes?
Whiteboard: [MemShrink]
Managed to get this a few seconds before the browser crashed:

---

Main Process

Explicit Allocations
1,925.90 MB (100.0%) -- explicit
├──1,877.83 MB (97.50%) -- js
│  ├────915.25 MB (47.52%) -- compartment(file:///C:/firefox_testcase/render_buddha.html)
│  │    ├──900.91 MB (46.78%) -- gc-heap
│  │    │  ├──842.61 MB (43.75%) -- arena
│  │    │  │  ├──838.13 MB (43.52%) ── unused
│  │    │  │  └────4.47 MB (00.23%) ++ (2 tiny)
│  │    │  ├───48.37 MB (02.51%) -- objects
│  │    │  │   ├──48.25 MB (02.51%) ── non-function
│  │    │  │   └───0.12 MB (00.01%) ── function
│  │    │  └────9.94 MB (00.52%) ++ (4 tiny)
│  │    └───14.34 MB (00.74%) ++ (7 tiny)
│  ├────512.15 MB (26.59%) ── gc-heap-decommitted
│  ├────364.40 MB (18.92%) ── gc-heap-chunk-dirty-unused
│  ├─────57.37 MB (02.98%) ++ (187 tiny)
│  └─────28.66 MB (01.49%) ── gc-heap-chunk-admin
├─────32.76 MB (01.70%) ── heap-unclassified
└─────15.31 MB (00.79%) ++ (12 tiny)

Other Measurements
    2.44 MB ── canvas-2d-pixel-bytes
1,925.85 MB ── explicit
    0.02 MB ── gfx-d2d-surfacecache
    6.83 MB ── gfx-d2d-surfacevram
    1.38 MB ── gfx-surface-image
          0 ── ghost-windows
   88.17 MB ── heap-allocated
  101.03 MB ── heap-committed
   12.84 MB ── heap-committed-unused
     14.56% ── heap-committed-unused-ratio
    2.90 MB ── heap-dirty
   17.82 MB ── heap-unused
    0.01 MB ── images-content-used-uncompressed
        195 ── js-compartments-system
          4 ── js-compartments-user
1,834.00 MB ── js-gc-heap
    7.34 MB ── js-main-runtime-analysis-temporary
  107.61 MB ── js-main-runtime-gc-heap-allocated
  849.84 MB ── js-main-runtime-gc-heap-arena-unused
    0.00 MB ── js-main-runtime-gc-heap-chunk-clean-unused
  364.40 MB ── js-main-runtime-gc-heap-chunk-dirty-unused
1,321.85 MB ── js-main-runtime-gc-heap-committed
1,214.25 MB ── js-main-runtime-gc-heap-committed-unused
   1128.42% ── js-main-runtime-gc-heap-committed-unused-ratio
  512.15 MB ── js-main-runtime-gc-heap-decommitted
    2.95 MB ── js-main-runtime-mjit
   65.75 MB ── js-main-runtime-objects
    6.46 MB ── js-main-runtime-scripts
   19.56 MB ── js-main-runtime-shapes
    9.46 MB ── js-main-runtime-strings
    0.40 MB ── js-main-runtime-type-inference
          0 ── low-commit-space-events
          0 ── low-memory-events-physical
          0 ── low-memory-events-virtual
1,992.30 MB ── private
2,008.78 MB ── resident
    3.69 MB ── storage-sqlite
2,226.01 MB ── vsize
    0.50 MB ── window-objects-dom
    0.93 MB ── window-objects-layout-arenas
    0.04 MB ── window-objects-layout-pres-contexts
    1.63 MB ── window-objects-layout-style-sets
    0.00 MB ── window-objects-layout-text-runs
    1.69 MB ── window-objects-style-sheets
Thanks, Jon-Carlos.


> ├─────32.76 MB (01.70%) ── heap-unclassified

That's the lowest heap-unclassified I've ever seen :)


> 1,834.00 MB ── js-gc-heap
>     7.34 MB ── js-main-runtime-analysis-temporary
>   107.61 MB ── js-main-runtime-gc-heap-allocated
>   849.84 MB ── js-main-runtime-gc-heap-arena-unused
>     0.00 MB ── js-main-runtime-gc-heap-chunk-clean-unused
>   364.40 MB ── js-main-runtime-gc-heap-chunk-dirty-unused
> 1,321.85 MB ── js-main-runtime-gc-heap-committed
> 1,214.25 MB ── js-main-runtime-gc-heap-committed-unused
>    1128.42% ── js-main-runtime-gc-heap-committed-unused-ratio
>   512.15 MB ── js-main-runtime-gc-heap-decommitted

Those are some ugly numbers.  I wonder why js-main-runtime-gc-heap-committed-unused is so high?
> >   849.84 MB ── js-main-runtime-gc-heap-arena-unused

This is wasted space caused by partially full 4KB arenas.

> >   364.40 MB ── js-main-runtime-gc-heap-chunk-dirty-unused

This is wasted space caused by partially full 1MB chunks.

How can so much fragmentation occur within a single compartment?  Something seems badly wrong.
I tried using the x64 Nightly to render the more detailed scene provided in the examples (the bunny). Though it completed, it did so at the cost of an extreme amount of memory:

---

Main Process

Explicit Allocations
3,955.23 MB (100.0%) -- explicit
├──3,907.08 MB (98.78%) -- js
│  ├──1,921.63 MB (48.58%) ── gc-heap-decommitted
│  ├──1,850.47 MB (46.79%) -- compartment(file:///C:/rtree/firefox_testcase/render_bunny.html)
│  │  ├──1,823.39 MB (46.10%) -- gc-heap
│  │  │  ├──1,540.52 MB (38.95%) -- arena
│  │  │  │  ├──1,511.91 MB (38.23%) ── unused
│  │  │  │  └─────28.61 MB (00.72%) ++ (2 tiny)
│  │  │  ├────256.77 MB (06.49%) -- objects
│  │  │  │    ├──256.01 MB (06.47%) ── non-function
│  │  │  │    └────0.76 MB (00.02%) ── function
│  │  │  └─────26.10 MB (00.66%) ++ (4 tiny)
│  │  └─────27.08 MB (00.68%) ++ (6 tiny)
│  ├─────74.89 MB (01.89%) ++ (182 tiny)
│  └─────60.09 MB (01.52%) ── gc-heap-chunk-admin
└─────48.15 MB (01.22%) ++ (12 tiny)

Other Measurements
190 (100.0%) -- js-compartments
├──187 (98.42%) ── system
└────3 (01.58%) ── user

7.63 MB (100.0%) -- window-objects
├──3.86 MB (50.61%) -- layout
│  ├──2.50 MB (32.75%) ── style-sets
│  ├──0.82 MB (10.78%) ── pres-shell
│  ├──0.23 MB (02.97%) ── rule-nodes
│  ├──0.15 MB (01.97%) ── style-contexts
│  ├──0.10 MB (01.27%) ── frames
│  └──0.07 MB (00.86%) ++ (3 tiny)
├──2.81 MB (36.76%) ── style-sheets
└──0.96 MB (12.64%) -- dom
   ├──0.67 MB (08.80%) ── element-nodes
   ├──0.23 MB (03.03%) ── other
   └──0.06 MB (00.80%) ++ (3 tiny)

    2.44 MB ── canvas-2d-pixel-bytes
3,955.18 MB ── explicit
    0.00 MB ── gfx-d2d-surfacecache
    6.44 MB ── gfx-d2d-surfacevram
    3.26 MB ── gfx-surface-image
          0 ── ghost-windows
  107.16 MB ── heap-allocated
  113.44 MB ── heap-committed
    6.26 MB ── heap-committed-unused
      5.83% ── heap-committed-unused-ratio
    3.38 MB ── heap-dirty
   56.83 MB ── heap-unused
    2.67 MB ── images-content-used-uncompressed
3,846.00 MB ── js-gc-heap
   10.57 MB ── js-main-runtime-analysis-temporary
  399.31 MB ── js-main-runtime-gc-heap-allocated
1,525.07 MB ── js-main-runtime-gc-heap-arena-unused
    0.00 MB ── js-main-runtime-gc-heap-chunk-clean-unused
    0.00 MB ── js-main-runtime-gc-heap-chunk-dirty-unused
1,924.38 MB ── js-main-runtime-gc-heap-committed
1,525.07 MB ── js-main-runtime-gc-heap-committed-unused
    381.93% ── js-main-runtime-gc-heap-committed-unused-ratio
1,921.63 MB ── js-main-runtime-gc-heap-decommitted
    0.87 MB ── js-main-runtime-mjit
  284.00 MB ── js-main-runtime-objects
    7.88 MB ── js-main-runtime-scripts
   42.46 MB ── js-main-runtime-shapes
   13.42 MB ── js-main-runtime-strings
    0.62 MB ── js-main-runtime-type-inference
4,023.54 MB ── private
4,060.45 MB ── resident
    3.75 MB ── storage-sqlite
4,405.37 MB ── vsize

--- 

NOTES:

1) In Chrome (32-bit), this scene consumes roughly 160MB to render.

2) Unlike the previous 32-bit runs, about:memory report didn't _always_ display the bulk of the memory in "unused" categories. Sometimes the memory dump showed ~2.5gb of non-function objects (see below). But a reload would show something similar to above. The memory seemed to fluctuate from fully utilized to mostly unused throughout the course of the render.

---

Main Process

Explicit Allocations
3,868.05 MB (100.0%) -- explicit
├──3,828.27 MB (98.97%) -- js
│  ├──2,584.60 MB (66.82%) -- compartment(file:///C:/rtree/firefox_testcase/render_bunny.html)
│  │  ├──2,528.58 MB (65.37%) -- gc-heap
│  │  │  ├──2,438.09 MB (63.03%) -- objects
│  │  │  │  ├──2,433.00 MB (62.90%) ── non-function
│  │  │  │  └──────5.10 MB (00.13%) ── function
│  │  │  ├─────62.19 MB (01.61%) ++ arena
│  │  │  └─────28.30 MB (00.73%) ++ (4 tiny)
│  │  ├─────42.82 MB (01.11%) ++ objects
│  │  └─────13.21 MB (00.34%) ++ (7 tiny)
│  ├──1,107.01 MB (28.62%) ── gc-heap-chunk-dirty-unused
│  ├─────78.24 MB (02.02%) ++ (184 tiny)
│  └─────58.42 MB (01.51%) ── gc-heap-chunk-admin
└─────39.78 MB (01.03%) ++ (12 tiny)

Other Measurements
192 (100.0%) -- js-compartments
├──190 (98.96%) ── system
└────2 (01.04%) ── user

7.25 MB (100.0%) -- window-objects
├──3.62 MB (49.91%) -- layout
│  ├──2.35 MB (32.37%) ── style-sets
│  ├──0.77 MB (10.60%) ── pres-shell
│  ├──0.22 MB (02.97%) ── rule-nodes
│  ├──0.14 MB (01.88%) ── style-contexts
│  ├──0.10 MB (01.38%) ── frames
│  └──0.05 MB (00.70%) ++ (3 tiny)
├──2.77 MB (38.19%) ── style-sheets
└──0.86 MB (11.90%) -- dom
   ├──0.59 MB (08.20%) ── element-nodes
   ├──0.22 MB (03.06%) ── other
   └──0.05 MB (00.64%) ++ (3 tiny)

    2.44 MB ── canvas-2d-pixel-bytes
3,868.00 MB ── explicit
    0.00 MB ── gfx-d2d-surfacecache
    6.44 MB ── gfx-d2d-surfacevram
    0.43 MB ── gfx-surface-image
          0 ── ghost-windows
  124.00 MB ── heap-allocated
  131.89 MB ── heap-committed
    7.88 MB ── heap-committed-unused
      6.35% ── heap-committed-unused-ratio
    0.89 MB ── heap-dirty
   30.99 MB ── heap-unused
    0.00 MB ── images-content-used-uncompressed
3,740.00 MB ── js-gc-heap
    7.81 MB ── js-main-runtime-analysis-temporary
2,588.42 MB ── js-main-runtime-gc-heap-allocated
   35.91 MB ── js-main-runtime-gc-heap-arena-unused
    1.00 MB ── js-main-runtime-gc-heap-chunk-clean-unused
1,107.01 MB ── js-main-runtime-gc-heap-chunk-dirty-unused
3,732.33 MB ── js-main-runtime-gc-heap-committed
1,143.91 MB ── js-main-runtime-gc-heap-committed-unused
     44.19% ── js-main-runtime-gc-heap-committed-unused-ratio
    7.67 MB ── js-main-runtime-gc-heap-decommitted
    3.27 MB ── js-main-runtime-mjit
2,492.22 MB ── js-main-runtime-objects
    7.86 MB ── js-main-runtime-scripts
   35.47 MB ── js-main-runtime-shapes
   19.95 MB ── js-main-runtime-strings
    0.56 MB ── js-main-runtime-type-inference
3,937.80 MB ── private
3,978.88 MB ── resident
    3.75 MB ── storage-sqlite
4,298.38 MB ── vsize
> 2) Unlike the previous 32-bit runs, about:memory report didn't _always_
> display the bulk of the memory in "unused" categories. Sometimes the memory
> dump showed ~2.5gb of non-function objects (see below). But a reload would
> show something similar to above. The memory seemed to fluctuate from fully
> utilized to mostly unused throughout the course of the render.

Ah, that's a good clue.  So it sounds like Firefox is creating vast numbers of objects that die in groups, or something like that.
Apart from the huge amount of unused space on the heap, there are also lots of non-function objects kept alive during and after the rendering. So it looks like the heap is very fragmented, which explains why this much memory is wasted.

Another question altogether is why Firefox holds onto so many more objects than the other browsers. Maybe there's some engine-specific code in the application that causes it to hold onto these objects in SpiderMonkey only?
(In reply to Till Schneidereit [:till] from comment #8)
> Maybe there's some engine-specific code in the application 
> that causes it to hold onto these objects in SpiderMonkey only?

There is only one place that ever checks the user agent and that is after a tile has rendered and we determine if there is window.postMessage support and, if so, we use that to schedule the next tile instead of a setTimeout.

Any hot code (mostly creating Vectors, Triangles' intersect() function, and the BIH's traversal) is the same between browsers.
> Any hot code (mostly creating Vectors, Triangles' intersect() function, and
> the BIH's traversal) is the same between browsers.

I haven't looked at the code yet, but this reminds me of this nice article:  http://blog.tojicode.com/2012/04/if-i-built-physics-engine.html

The key quote:  "Also, avoid vector objects if at all possible (as in vector2 with x and y properties). While again it may be convenient to have functions return these objects so they can change or return both values at once, you can easily end up with hundreds of these created every frame, resulting in terrible GC performance."

This doesn't excuse Firefox's bad behaviour here, but it's good JS advice in general.
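
To make the advice concrete, here's a minimal sketch (illustrative only; these helpers are made up for the example, not taken from the attached ray tracer) of the allocate-per-call style the article warns about versus reusing a caller-supplied scratch object:

  // Allocates a fresh vector on every call; in a hot loop this produces
  // huge amounts of short-lived garbage for the GC to chew through.
  function addAlloc(a, b) {
    return { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
  }

  // Writes into a vector the caller owns, so the hot loop allocates nothing.
  function addInto(out, a, b) {
    out.x = a.x + b.x;
    out.y = a.y + b.y;
    out.z = a.z + b.z;
    return out;
  }

  var scratch = { x: 0, y: 0, z: 0 };
  // e.g. addInto(scratch, rayDir, offset) inside the per-ray loop

Reusing a small pool of Vector/Color objects like this is one way an application can cut its allocation rate, independent of any GC fix on our side.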
> Another question altogether is why Firefox holds onto so many more objects than
> the other browsers.

The most likely answer would be "because we run GC too rarely", no?
(In reply to Boris Zbarsky (:bz) from comment #11)
> > Another question altogether is why Firefox holds onto so many more objects than
> > the other browsers.
> 
> The most likely answer would be "because we run GC too rarely", no?

We run lots of GCs during a render. Unfortunately, neither those nor using "Minimize Memory Usage" in about:memory frees all of these objects.

The stats in comment 6 show that even after rendering has finished, there's still 256MB of non-function objects kept live. My guess would be that the second set of stats describes a pre-GC state of things, where all of that memory is in fact used by temporarily-created objects, whereas the first set describes a post-GC state.

My thinking is that because 256MB is way more than Chrome ever uses, there must be something causing us to hold onto all these objects that doesn't cause other engines to do so.
Hmm.  Which builds are we testing here?  Any closures involved that close over objects but don't actually use them?
I tested with yesterday's Nightly, but it appears to be a longer-standing problem. Hence, I doubt that it's caused by bug 659577, if that's what you're getting at.
No, I'm thinking about other changes in the somewhat recent past (last few months) that got rid of some of our closure types (null closures? flat closures? something else?) and changed when closures would actually keep things alive.  How long-standing is the issue?
I think you are thinking of bug 730497, which landed in 14.
Yes, that was it.  So just checking whether this is a problem in 13 would deal with that hypothesis. ;)
(In reply to Boris Zbarsky (:bz) from comment #17)
> Yes, that was it.  So just checking whether this is a problem in 13 would
> deal with that hypothesis. ;)

So I tested some older versions: about:memory in FF 8 shows pretty much the same situation as the current Nightly does. I gave up on testing older versions after that, but for completeness' sake I tested in 3.6, too: memory usage is almost exactly the same there.
(In reply to Jon-Carlos Rivera from comment #9)
> (In reply to Till Schneidereit [:till] from comment #8)
> > Maybe there's some engine-specific code in the application 
> > that causes it to hold onto these objects in SpiderMonkey only?
> 
> There is only one place that ever checks the user agent and that is after a
> tile has rendered and we determine if there is window.postMessage support
> and, if so, we use that to schedule the next tile instead of a setTimeout.
> 
> Any hot code (mostly creating Vectors, Triangles' intersect() function, and
> the BIH's traversal) is the same between browsers.

Are you sure about that regarding all of the libraries you're using?
As far as I know, the only library that has separate code paths is jQuery, but it isn't used *in* the renderer itself; it's only used to handle mouse events and retrieve values from the input elements before rendering.
I uploaded a new version of the ray tracer example code. I decided to write a command-line version of the ray tracer so that you guys could test it in JS Shell. The file is "render_object.js" and it takes a couple of command line options to make your life easier:

-sx <target width in pixels>
-sy <target height in pixels>
-obj <path to model> (default is "obj/buddha.obj" but also available is "obj/bunny.obj")

---

All this work turned out to be a big help even though I don't understand anything about debugging SpiderMonkey. I found the exact line that causes the out-of-memory problem.

Commenting out that one line caused the renderer to finish with roughly constant, though comparatively high, memory use. I could even see the GC kicking in and freeing 50-100MB of temporary objects every few seconds. Reloading the page after the render was complete freed lots of memory.

In the new ray tracer code I just uploaded, the problem is line #77 in engine.js

---

I can't even begin to understand *why* this particular line is causing so many problems for the GC. What I do know is that the array is filled in tile order (the tiles are 16x16). This means that in an 800x800-pixel image, array offset 12,015 (the lower-right corner of the first tile) will be filled just before array offset 16 (the upper-left corner of the second tile).

I don't think the order actually matters too much, though. I tried pre-allocating the array but that didn't help. I tried simply pushing elements into the array just to see if the order was the culprit but that didn't help much (if at all).

I have been unable to reproduce the problem in any stripped-down test case but I am way out of my depth!
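
For reference, here's a small sketch of the tile-order fill described above (illustrative only; the helper is made up, not taken from engine.js):

  // Visit an image in 16x16 tiles and record the row-major offset each
  // pixel is written at, in fill order.
  function tileOrder(width, height, tile) {
    var offsets = [];
    for (var ty = 0; ty < height; ty += tile)
      for (var tx = 0; tx < width; tx += tile)
        for (var y = ty; y < ty + tile; y++)
          for (var x = tx; x < tx + tile; x++)
            offsets.push(y * width + x);
    return offsets;
  }

  // For an 800-pixel-wide image: pixel (15,15), offset 15*800+15 = 12015,
  // is written just before pixel (0,16), offset 16, as described above.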
Attachment #633756 - Attachment is obsolete: true
Attachment #634311 - Attachment mime type: application/octet-stream → application/zip
What happens if you leave the "new Flog.RayTracer.SampleBuffer( color, 1 );" part but just don't put it in the array?

If using push() didn't help, then this is probably not a sparse vs dense array issue....

How big is this array getting?  How many of these arrays are around?
(In reply to Boris Zbarsky (:bz) from comment #22)

> What happens if you leave the "new Flog.RayTracer.SampleBuffer( color, 1 );"
> part but just don't put it in the array?

Then everything works wonderfully.

In the JS shell, memory usage with that line peaked at ~1.7GB. With that one line commented out, memory usage peaked at ~109MB.

> If using push() didn't help, then this is probably not a sparse vs dense
> array issue....
> 
> How big is this array getting? 

It eventually gets to (width * height) elements.

In order to account for the memory delta above, each element would be consuming just shy of 2.5KB. Each element is an object (SampleBuffer) that contains a number and an object reference. That other object (Color) contains just three numbers.

> How many of these arrays are around?

Just the one.
Attached file Small Testcases
Added stand-alone test cases for both JS Shell and Firefox.

SpiderMonkey handles making lots of throw-away objects just fine and handles making fairly large arrays (640k elements) just fine. But filling a large array *while* making tons of throw-away objects seems to block the GC.
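
For anyone skimming, the failing pattern boils down to roughly this shape (a paraphrase of the shell test case, not the attached file verbatim):

  // One long-lived object is stored per iteration while a burst of
  // short-lived temporaries is created and immediately dropped.
  var samples = [];
  for (var i = 0; i < 640000; i++) {
    samples.push({ weight: 1, color: { r: 0, g: 0, b: 0 } });  // survives in the array
    for (var j = 0; j < 200; j++) {
      var tmp = { x: i, y: j, z: i + j };                       // garbage right away
    }
  }

Either half on its own behaves fine; interleaved, the heap balloons.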
I looked at the memory usage of the original test case, and there is indeed a ton of fragmentation. Most arenas have only one object in them (with the remaining ~150 cells empty).

Looking at the shell test case shows why. It allocates a long-lived object followed by 200 short-lived objects, and then repeats. The long-lived object ends up in the same arena as the short-lived objects. Once they die, the entire arena is wasted except for the one long-lived object.

Generational or compacting collection would fix this. I'm not sure what the other engines do here to get around this problem. Chrome uses generational collection, but I didn't think Safari was doing that yet. I don't know what IE does.

Thanks for the test case, Jon-Carlos. We're working on adding generational and compacting collection. Unfortunately, I don't think there's much we can do here in the short term. I can give advice about how to work around the issue, if that's helpful.
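
Some rough back-of-the-envelope numbers (assumptions, not measurements): the sample-buffer array holds about 800 x 800 = 640,000 long-lived objects, and if each one ends up pinning an otherwise-empty 4KB arena, that alone accounts for roughly 640,000 x 4KB ≈ 2.4GB of committed-but-mostly-unused GC heap, which is the same ballpark as the arena-unused figures in the reports above.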
> I'm not sure what the other engines do here to get around this problem.

Well, if we gc'ed more often, that would bound the amount of memory we waste, right?
(In reply to Justin Lebar [:jlebar] from comment #26)
> > I'm not sure what the other engines do here to get around this problem.
> 
> Well, if we gc'ed more often, that would bound the amount of memory we
> waste, right?

A good approach/fix would be a non-linear memory increase function.
We might also include the frequency of GC calls in the function.
So if we already use a lot of memory, don't increase the heap by 3x, just by 1.5x or even less.
If the last GC was 4 seconds ago, don't increase the heap at all.
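
To put illustrative numbers on that (just an example, not measured): if a GC leaves 600MB live on the heap, a 1.5x factor would set the next trigger near 900MB, whereas the current fixed 3x factor lets the heap reach about 1.8GB before the next GC.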
Whiteboard: [MemShrink] → [js:p1][MemShrink]
Whiteboard: [js:p1][MemShrink] → [js:p1:fx16][MemShrink]
Attached patch first nonlinear attempt (obsolete) — Splinter Review
Hi Jon-Carlos,

I hacked up a first version of some non-linear behavior. We definitely have to make it nice and tunable, but could you give this patch a try and see what happens? You can also play with the numbers if you want.
Nice! Some interesting results. The new code was triggered quite often (every 5 - 8 seconds).

I tested in release builds with all optimizations enabled (js.exe -a -m -n ...)

Changes to peak memory during a test rendering with otherwise identical settings:

Buddha:
~1780MB to ~800MB @ 800x800
~1390MB to ~622MB @ 400x400

Bunny:
FAILURE to ~995MB @ 800x800
~1772MB to ~641MB @ 400x400

There is a performance hit. In the few tests I did, it ranged from 5% to 12%.

One thing I wanted to mention was that in some tests the new "HFGC" code is at first triggered often but later is triggered more rarely and seems to get into a pattern where it is triggered in pairs:

---
Early:
---

2.76% in 7.4 seconds @ 12542 rays/second.
Tree depth: 12.03 avg  (8 min - 19 max) Intersection tests: 135.91/pixel, 25.89/ray
HIGH FREQUENCY GC 64073728, 3.000000!

2.88% in 8.2 seconds @ 12182 rays/second.
Tree depth: 12.05 avg  (8 min - 19 max) Intersection tests: 145.45/pixel, 26.83/ray

3.08% in 9.1 seconds @ 12358 rays/second.
Tree depth: 12.04 avg  (8 min - 19 max) Intersection tests: 151.84/pixel, 26.75/ray
HIGH FREQUENCY GC 72155136, 3.000000!

3.28% in 10.0 seconds @ 12466 rays/second.
Tree depth: 12.07 avg  (8 min - 19 max) Intersection tests: 156.38/pixel, 26.44/ray
HIGH FREQUENCY GC 96567296, 3.000000!

---
Later - After ~30% this pattern emerges:
---

94.76% in 289.5 seconds @ 14603 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.29/pixel, 19.27/ray

94.96% in 290.3 seconds @ 14607 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.47/pixel, 19.27/ray

95.12% in 291.2 seconds @ 14603 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.74/pixel, 19.29/ray

95.32% in 292.2 seconds @ 14603 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 135.02/pixel, 19.31/ray

95.48% in 293.5 seconds @ 14574 rays/second.
Tree depth: 12.33 avg  (1 min - 19 max) Intersection tests: 135.21/pixel, 19.31/ray

96.48% in 294.6 seconds @ 14648 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.29/pixel, 19.21/ray

96.72% in 295.6 seconds @ 14656 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.50/pixel, 19.22/ray

96.92% in 296.5 seconds @ 14658 rays/second.
Tree depth: 12.32 avg  (1 min - 19 max) Intersection tests: 134.74/pixel, 19.23/ray

97.08% in 297.4 seconds @ 14653 rays/second.
Tree depth: 12.33 avg  (1 min - 19 max) Intersection tests: 135.02/pixel, 19.25/ray

97.28% in 298.4 seconds @ 14652 rays/second.
Tree depth: 12.33 avg  (1 min - 19 max) Intersection tests: 135.30/pixel, 19.27/ray
HIGH FREQUENCY GC 517435392, 1.500000!
HIGH FREQUENCY GC 517435392, 1.500000!

97.40% in 300.2 seconds @ 14589 rays/second.
Tree depth: 12.33 avg  (1 min - 19 max) Intersection tests: 135.44/pixel, 19.28/ray

97.92% in 301.2 seconds @ 14618 rays/second.
Tree depth: 12.33 avg  (1 min - 19 max) Intersection tests: 135.21/pixel, 19.25/ray
Added some tunable parameters to Gregor's patch; the gcHeapGrowthFactor now varies smoothly over the range controlled by the parameters instead of in a few discrete steps.

The memory use is way down now:

Bunny @ 800x800
Original | Gregor's | This Patch
FAILURE  |  995MB   |  404MB

With the current parameters, performance is about the same (sometimes a little worse, sometimes a little better) as with the last patch.
Attachment #634852 - Attachment is patch: true
(In reply to Bill McCloskey (:billm) from comment #25)
> Thanks for the test case, Jon-Carlos. We're working on adding generational
> and compacting collection. Unfortunately, I don't think there's much we can
> do here in the short term. I can give advice about how to work around the
> issue, if that's helpful.

I still wonder why we're holding on to ~256MB of non-function objects even after rendering has finished (and after minimizing memory usage via about:memory), when Chrome never exceeds 166MB. Is that caused by a less efficient representation of the objects, or are we holding on to more objects than they do?
(In reply to Jon-Carlos Rivera from comment #29)
> Nice! Some interesting results. The new code was triggered quite often
> (every 5 - 8 seconds).
> One thing I wanted to mention was that in some tests the new "HFGC" code is
> at first triggered often but later is triggered more rarely and seems to get
> into a pattern where it is triggered in pairs:
> 

Right that's a bug in my hack. We also have to adapt the trigger logic in MaybeGC. Currently we trigger a GC in MaybeGC if we have reached the 75% mark of the max-trigger. This doesn't work any more when we reduce the heap-growth-factor and we end up triggering 2 GCs in a row.
(In reply to Gregor Wagner [:gwagner] from comment #32)
> (In reply to Jon-Carlos Rivera from comment #29)
> > Nice! Some interesting results. The new code was triggered quite often
> > (every 5 - 8 seconds).
> > One thing I wanted to mention was that in some tests the new "HFGC" code is
> > at first triggered often but later is triggered more rarely and seems to get
> > into a pattern where it is triggered in pairs:
> 
> Right that's a bug in my hack. We also have to adapt the trigger logic in
> MaybeGC. Currently we trigger a GC in MaybeGC if we have reached the 75%
> mark of the max-trigger. This doesn't work any more when we reduce the
> heap-growth-factor and we end up triggering 2 GCs in a row.

The results look good so far. Is this going to work if you just adapt the trigger logic to the new growth factor?
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to David Mandelin from comment #33)
> (In reply to Gregor Wagner [:gwagner] from comment #32)
> > (In reply to Jon-Carlos Rivera from comment #29)
> > > Nice! Some interesting results. The new code was triggered quite often
> > > (every 5 - 8 seconds).
> > > One thing I wanted to mention was that in some tests the new "HFGC" code is
> > > at first triggered often but later is triggered more rarely and seems to get
> > > into a pattern where it is triggered in pairs:
> > 
> > Right that's a bug in my hack. We also have to adapt the trigger logic in
> > MaybeGC. Currently we trigger a GC in MaybeGC if we have reached the 75%
> > mark of the max-trigger. This doesn't work any more when we reduce the
> > heap-growth-factor and we end up triggering 2 GCs in a row.
> 
> The results look good so far. Is this going to work if you just adapt the
> trigger logic to the new growth factor?

I was playing a lot with this code and fixed some more problems. The results look pretty promising. I can reduce the heap size for some animations by a factor of 2 or 3. The problem now is that our trigger logic has to be adapted to a much higher GC frequency. For example, we already allocate half the heap size again during the time we spend on incremental marking and background sweeping.
Attached patch patch (obsolete) — Splinter Review
I came up with the following patch:
The heap growth factor depends on the heap size after a GC and the GC frequency.
For low frequency GCs (more than 3sec between GCs) we let the heap grow to 150%.
For high frequency GCs we let the heap grow depending on the heap size:
 -> x < highFrequencyLowLimit: 300%
 -> highFrequencyLowLimit < x < highFrequencyHighLimit: linear decrease from highFrequencyHeapGrowthMax to highFrequencyHeapGrowthMin (3.0 to 1.5)
 -> x > highFrequencyHighLimit: 150%
I also increased the gcSliceBudget to 30 msec for high frequency GCs.
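
For readers following along, here's a rough model of that heuristic in JS (illustrative only; the real code lives in JSCompartment::setGCLastBytes in jsgc.cpp, and the two limit values are whatever the embedding configures, not defaults from this patch):

  // Returns the factor the post-GC heap size is multiplied by to get the
  // next GC trigger.
  function heapGrowthFactor(lastBytes, msSinceLastGC, lowLimit, highLimit) {
    var LOW_FREQ_GROWTH = 1.5;      // low frequency GCs: grow to 150%
    var HIGH_FREQ_MAX = 3.0;        // high frequency, small heap: grow to 300%
    var HIGH_FREQ_MIN = 1.5;        // high frequency, large heap: grow to 150%
    var HIGH_FREQ_TIME_MS = 3000;   // GCs closer together than this are "high frequency"

    if (msSinceLastGC >= HIGH_FREQ_TIME_MS)
      return LOW_FREQ_GROWTH;
    if (lastBytes <= lowLimit)
      return HIGH_FREQ_MAX;
    if (lastBytes >= highLimit)
      return HIGH_FREQ_MIN;
    // Linear decrease from 3.0 to 1.5 between the two limits.
    var k = (HIGH_FREQ_MIN - HIGH_FREQ_MAX) / (highLimit - lowLimit);
    return HIGH_FREQ_MAX + k * (lastBytes - lowLimit);
  }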
Assignee: general → anygregor
Attachment #634717 - Attachment is obsolete: true
Attachment #636330 - Flags: feedback?(wmccloskey)
Gregor: is this ready for review?  Do you need to do more measurement?
Whiteboard: [js:p1:fx16][MemShrink] → [js:p1:fx16][MemShrink:P2]
Comment on attachment 636330 [details] [diff] [review]
patch

Review of attachment 636330 [details] [diff] [review]:
-----------------------------------------------------------------

I like the growth factor changes in this patch.

However, I don't like the slice time changes. For example, my browser does a GC about every 2.5 seconds during the V8 spinning balls benchmark. If we increase the slice time to 30ms there, then there's almost no point in doing incremental GC at all. I'd prefer an approach that increased the slice time based on how many slices we've done so far.

::: js/src/jsgc.cpp
@@ +1359,5 @@
> +    double highFrequencyHeapGrowthMin = 1.5;
> +
> +    double lowFrequencyHeapGrowth = 1.5;
> +    double lowerHeuristicStart = 1; // Triggers are not computed for tiny compartments;
> +    double MB = 1024 * 1024;

I think this code would be cleaner if you converted lastBytes to megabytes at the beginning.

@@ +1364,5 @@
> +
> +    if (lastBytes < lowerHeuristicStart * 1024 * 1024) {
> +        gcHeapGrowthFactor = lowFrequencyHeapGrowth;
> +    } else {
> +      uint64_t currTime = PRMJ_Now();

It's very important that GCs in the shell be deterministic. We'll need a way for the browser to opt into this mode. For the shell, it seems like high frequency GCs are closest to our current behavior, so maybe that could be the default.

@@ +1366,5 @@
> +        gcHeapGrowthFactor = lowFrequencyHeapGrowth;
> +    } else {
> +      uint64_t currTime = PRMJ_Now();
> +      if (rt->gcLastGCTime && rt->gcLastGCTime + highFrequencyTimeLimit * 1000 * 1000 > currTime) {
> +        double k = (highFrequencyHeapGrowthMin - highFrequencyHeapGrowthMax) /

4 space indent.

@@ +1371,5 @@
> +                   (highFrequencyHighLimit - highFrequencyLowLimit);
> +        if (lastBytes < highFrequencyLowLimit * MB) {
> +          gcHeapGrowthFactor = highFrequencyHeapGrowthMax;
> +        } else if (lastBytes < highFrequencyHighLimit * MB) {
> +          gcHeapGrowthFactor = k * (lastBytes / 1024 / 1024 - highFrequencyLowLimit) + highFrequencyHeapGrowthMax;

I think this computation would be cleaner if you do the linear interpolation first, and then clamp it if it's out of bounds. Something like this:

k = (highFrequencyHeapGrowthMax - highFrequencyHeapGrowthMin) / (highFrequencyHighLimit - highFrequencyLowLimit);
gcHeapGrowthFactor = (lastBytes - highFrequencyLowLimit) * k + highFrequencyHeapGrowthMin;
if (gcHeapGrowthFactor > highFrequencyHeapGrowthMax)
    gcHeapGrowthFactor = highFrequencyHeapGrowthMax;
if (gcHeapGrowthFactor < highFrequencyHeapGrowthMin)
    gcHeapGrowthFactor = highFrequencyHeapGrowthMin;
> I think this computation would be cleaner if you do the linear interpolation
> first, and then clamp it if it's out of bounds. Something like this:
> 
> k = (highFrequencyHeapGrowthMax - highFrequencyHeapGrowthMin) /
> (highFrequencyHighLimit - highFrequencyLowLimit);
> gcHeapGrowthFactor = (lastBytes - highFrequencyLowLimit) * k +
> highFrequencyHeapGrowthMin;
> if (gcHeapGrowthFactor > highFrequencyHeapGrowthMax)
>     gcHeapGrowthFactor = highFrequencyHeapGrowthMax;
> if (gcHeapGrowthFactor < highFrequencyHeapGrowthMin)
>     gcHeapGrowthFactor = highFrequencyHeapGrowthMin;

JS_MIN/JS_MAX might be useful here?
Or maybe the slice time could be based on the number of bytes allocated.
Blocks: gecko-games
Summary: Memory usage during ray trace rendering rapidly rises to 2GB before crashing → Make JS heap growth factor depend on the heap size after a GC and GC frequency
(In reply to Nicholas Nethercote [:njn] from comment #36)
> Gregor: is this ready for review?  Do you need to do more measurement?

Thanks Bill! I will update the patch today. I wanted to get some feedback before proceeding here. So it seems that we agree on the basic heuristic and the requested changes won't change the heuristics. We should continue with more testing.

For the slice time: this is very device-dependent and we need a more dynamic approach. For B2G, for example, I saw that GCs take up to 2-3 sec for a 15MB heap (total time, not peak pause time) when the device is otherwise idle.
But we should move the dynamic slice time to another bug.

Both issues are related since less heap growth leads to more GCs. In high-allocation situations we trigger a new GC while the previous GC is still in the marking phase, and we end up with a very long reset pause time.
(In reply to Gregor Wagner [:gwagner] from comment #40)
> Both issues are related since less heap growth leads to more GCs. In high-
> allocation situations we trigger a new GC while the previous GC is still in
> the marking phase, and we end up with a very long reset pause time.

Another thing we could try is to make the GC slices closer together. Normally we do a slice either when a paint occurs or, as a fallback, 100ms after the previous slice. I always thought 100ms was too long, but I was afraid that if I made it shorter then we might double up (i.e., have a slice because of the timer, and then another slice right after when we paint). However, I think we could avoid this by saving a little bit more state and avoiding the paint GC if a slice just happened.

Anyway, if we could change the 100ms timer to, say, a 50ms timer, then we would be doubling the amount of GC work being done in the same period of time. And we would still have about 35ms to respond to user events after each slice.
Attached patch Patch (obsolete) — Splinter Review
I made all the parameters tunable via about:config. The slice time increase is now only 2x for high frequency GCs, but we definitely need to tune this more in another bug.
Please ignore the REMOVE lines; they're useful if other people want to test it, and I will remove them when landing.
Attachment #636330 - Attachment is obsolete: true
Attachment #636330 - Flags: feedback?(wmccloskey)
Attachment #637456 - Flags: review?(wmccloskey)
Comment on attachment 637456 [details] [diff] [review]
Patch

Review of attachment 637456 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, Gregor. Can we do one more round?

::: dom/base/nsJSEnvironment.cpp
@@ +3736,5 @@
> +  // handle overflow and negative pref values
> +  if (pref > 0 && pref < 10000)
> +    JS_SetGCParameter(nsJSRuntime::sRuntime, JSGC_HIGH_FREQUENCY_TIME_LIMIT, pref);
> +  return 0;
> +}

It seems like there's a lot of boilerplate here. Could you pass the parameter (e.g., JSGC_HIGH_FREQUENCY_TIME_LIMIT) in as the aClosure param (casting it to a void*) and just have one function?

::: js/src/jsapi.cpp
@@ +730,5 @@
> +    gcHighFrequencyHighLimit(400),
> +    gcHighFrequencyHeapGrowthMax(300),
> +    gcHighFrequencyHeapGrowthMin(150),
> +    gcLowFrequencyHeapGrowth(150),
> +    gcDynamicHeapGrowth(true),

I think this should be initialized to false so that the shell doesn't do dynamic heap growth.

::: js/src/jsapi.h
@@ +3556,5 @@
>  
>      /* Maximum size the GC mark stack can grow to. */
> +    JSGC_MARK_STACK_LIMIT = 10,
> +
> +    /* Determins high frequency time */

"GCs less than this far apart in time will be considered 'high-frequency GCs'. See setGCLastBytes in jsgc.cpp."

@@ +3559,5 @@
> +
> +    /* Determins high frequency time */
> +    JSGC_HIGH_FREQUENCY_TIME_LIMIT = 11,
> +
> +    /* Start of dynamic heap growth */

The rest of these should end with a period.

@@ +3574,5 @@
> +
> +    /* Heap growth for low frequency GCs */
> +    JSGC_LOW_FREQUENCY_HEAP_GROWTH = 16,
> +
> +    /* Enable/Disable dynamic heap growth */

"If false, the heap growth factor is fixed at 3. If true, it is determined based on whether GCs are high- or low- frequency."

@@ +3577,5 @@
> +
> +    /* Enable/Disable dynamic heap growth */
> +    JSGC_DYNAMIC_HEAP_GROWTH = 17,
> +
> +    /* Enable/Disable dynamic mark slice time for IGC */

"If true, high-frequency GCs will use a longer mark slice."

::: js/src/jscntxt.h
@@ +491,4 @@
>      int64_t             gcJitReleaseTime;
>      JSGCMode            gcMode;
> +    bool                gcHighFrequencyGC;
> +    uint64_t            gcHighFrequencyTimeLimit;

Maybe call this gcHighFrequencyTimeThreshold.

@@ +491,5 @@
>      int64_t             gcJitReleaseTime;
>      JSGCMode            gcMode;
> +    bool                gcHighFrequencyGC;
> +    uint64_t            gcHighFrequencyTimeLimit;
> +    size_t              gcHighFrequencyLowLimit;

Maybe call this gcHighFrequencyLowLimitBytes (and see my later comment about bytes/megabytes).

::: js/src/jsgc.cpp
@@ +115,5 @@
>  /* Perform a Full GC every 20 seconds if MaybeGC is called */
>  static const uint64_t GC_IDLE_FULL_SPAN = 20 * 1000 * 1000;
>  
> +/* Increase the IGC marking slice time if we are in highFrequencyGC mode. */
> +const int IGC_MARK_SLICE_MULTIPLICATOR = 2;

Multiplier seems more colloquial to me than multiplicator.

@@ +1329,5 @@
>  JSCompartment::setGCLastBytes(size_t lastBytes, size_t lastMallocBytes, JSGCInvocationKind gckind)
>  {
> +    /*
> +     * The heap growth factor depends on the heap size after a GC and the GC frequency.
> +     * For low frequency GCs (more than 3sec between GCs) we let the heap grow to 150%.

Change 3sec to 1sec.

@@ +1334,5 @@
> +     * For high frequency GCs we let the heap grow depending on the heap size.
> +     * x < highFrequencyLowLimit : 300%
> +     * highFrequencyLowLimit < x highFrequencyHighLimit: linear decrease from
> +     * highFrequencyHeapGrowthMax to highFrequencyHeapGrowthMin (3.0 to 1.5)
> +     * x > highFrequencyHighLimit: 150%

This would be easier to read with some extra spaces:

For high frequency GCs we let the heap grow depending on the heap size:
    lastBytes < highFrequencyLowLimit: 300%
    lastBytes > highFrequencyHighLimit: 150%
    otherwise: linear interpolation between 150% and 300% based on lastBytes

@@ +1337,5 @@
> +     * highFrequencyHeapGrowthMax to highFrequencyHeapGrowthMin (3.0 to 1.5)
> +     * x > highFrequencyHighLimit: 150%
> +     */
> +
> +    size_t lastBytesInMB = lastBytes / 1024 / 1024;

I realized that, for everything except this code, we store sizes in bytes. So maybe it would be better to have JS_Get/SetGCParameter convert from megabytes to bytes, and then remove this conversion on lastBytes.

@@ +1342,5 @@
> +
> +    if (!rt->gcDynamicHeapGrowth) {
> +        gcHeapGrowthFactor = 3.0;
> +    } else if (lastBytesInMB < 1) {
> +        gcHeapGrowthFactor = rt->gcLowFrequencyHeapGrowth / 100;

Shouldn't this be 100.0, not 100? However, I have an idea. For the parameters that you divide by 100, I think it would be cleaner to store them in JSRuntime as doubles and to convert them in JS_Get/SetGCParameter.

@@ +1344,5 @@
> +        gcHeapGrowthFactor = 3.0;
> +    } else if (lastBytesInMB < 1) {
> +        gcHeapGrowthFactor = rt->gcLowFrequencyHeapGrowth / 100;
> +    } else {
> +        JS_ASSERT(rt->gcHighFrequencyHighLimit > rt->gcHighFrequencyLowLimit);

We should probably handle this being false, since these are preferences.

@@ +1345,5 @@
> +    } else if (lastBytesInMB < 1) {
> +        gcHeapGrowthFactor = rt->gcLowFrequencyHeapGrowth / 100;
> +    } else {
> +        JS_ASSERT(rt->gcHighFrequencyHighLimit > rt->gcHighFrequencyLowLimit);
> +        uint64_t currTime = PRMJ_Now();

I think we usually would call this |now|.

@@ +1346,5 @@
> +        gcHeapGrowthFactor = rt->gcLowFrequencyHeapGrowth / 100;
> +    } else {
> +        JS_ASSERT(rt->gcHighFrequencyHighLimit > rt->gcHighFrequencyLowLimit);
> +        uint64_t currTime = PRMJ_Now();
> +        if (rt->gcLastGCTime && rt->gcLastGCTime + rt->gcHighFrequencyTimeLimit * 1000 > currTime) {

Instead of 1000, please use PRMJ_USEC_PER_MSEC.

@@ +1352,5 @@
> +                       / (double)(rt->gcHighFrequencyHighLimit - rt->gcHighFrequencyLowLimit);
> +            gcHeapGrowthFactor = (k * (lastBytesInMB - rt->gcHighFrequencyLowLimit)
> +                                 + rt->gcHighFrequencyHeapGrowthMax / 100.0);
> +            gcHeapGrowthFactor = JS_MIN(gcHeapGrowthFactor, (rt->gcHighFrequencyHeapGrowthMax / 100.0));
> +            gcHeapGrowthFactor = JS_MAX(gcHeapGrowthFactor, (rt->gcHighFrequencyHeapGrowthMin / 100.0));

js::Min and js::Max are a little shinier.

@@ +3948,5 @@
>  {
> +    Collect(rt, true, 
> +        rt->gcHighFrequencyGC && rt->gcDynamicMarkSlice
> +        ? rt->gcSliceBudget * IGC_MARK_SLICE_MULTIPLICATOR
> +        : rt->gcSliceBudget, gckind, reason);

Could you store the budget in a variable and then pass it into Collect?

::: modules/libpref/src/init/all.js
@@ +695,5 @@
> +pref("javascript.options.mem.gc_high_frequency_heap_growth_max", 300);
> +pref("javascript.options.mem.gc_high_frequency_heap_growth_min", 150);
> +pref("javascript.options.mem.gc_low_frequency_heap_growth", 150);
> +pref("javascript.options.mem.gc_dynamic_heap_growth", true);
> +pref("javascript.options.mem.gc_dynamic_mark_slice", true);

Do we have a way to set this to false on desktop?
Attachment #637456 - Flags: review?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #43)
> Comment on attachment 637456 [details] [diff] [review]
> Patch
> 
> Review of attachment 637456 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks, Gregor. Can we do one more round?

yeah I was pretty jet-lagged when I wrote this version :)


> @@ +695,5 @@
> > +pref("javascript.options.mem.gc_high_frequency_heap_growth_max", 300);
> > +pref("javascript.options.mem.gc_high_frequency_heap_growth_min", 150);
> > +pref("javascript.options.mem.gc_low_frequency_heap_growth", 150);
> > +pref("javascript.options.mem.gc_dynamic_heap_growth", true);
> > +pref("javascript.options.mem.gc_dynamic_mark_slice", true);
> 
> Do we have a way to set this to false on desktop?

I changed it now to 20msec instead of 10. Do you think this is still too much?
Attached patch patchSplinter Review
pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=fa1b89a4323c
Attachment #637456 - Attachment is obsolete: true
Whiteboard: [js:p1:fx16][MemShrink:P2] → [js:p1:fx16][MemShrink:P2][games:p2]
Comment on attachment 638871 [details] [diff] [review]
patch

I changed the values from MB to bytes, but I am seeing some precision problems with division in corner cases, so I am not using the JS_MIN/JS_MAX approach anymore.
Attachment #638871 - Flags: review?(wmccloskey)
Comment on attachment 638871 [details] [diff] [review]
patch

Review of attachment 638871 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +1357,5 @@
> +                gcHeapGrowthFactor = (k * (lastBytes - rt->gcHighFrequencyLowLimitBytes)
> +                                     + rt->gcHighFrequencyHeapGrowthMax);
> +                JS_ASSERT(gcHeapGrowthFactor <= rt->gcHighFrequencyHeapGrowthMax
> +                          && gcHeapGrowthFactor >= rt->gcHighFrequencyHeapGrowthMin);
> +                rt->gcHighFrequencyGC = true;

I think this should be moved outside of the else block so that it's always set for high-frequency GCs.
Attachment #638871 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #47)
> Comment on attachment 638871 [details] [diff] [review]
> patch
> 
> Review of attachment 638871 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsgc.cpp
> @@ +1357,5 @@
> > +                gcHeapGrowthFactor = (k * (lastBytes - rt->gcHighFrequencyLowLimitBytes)
> > +                                     + rt->gcHighFrequencyHeapGrowthMax);
> > +                JS_ASSERT(gcHeapGrowthFactor <= rt->gcHighFrequencyHeapGrowthMax
> > +                          && gcHeapGrowthFactor >= rt->gcHighFrequencyHeapGrowthMin);
> > +                rt->gcHighFrequencyGC = true;
> 
> I think this should be moved outside of the else block so that it's always
> set for high-frequency GCs.

Right!

https://hg.mozilla.org/integration/mozilla-inbound/rev/5bd1aff57cce
https://hg.mozilla.org/mozilla-central/rev/5bd1aff57cce
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla16
Depends on: 842932