Improve octane score by modifying GC settings of B2G.

RESOLVED FIXED in mozilla25

Status

Product: Core
Component: JavaScript Engine
RESOLVED FIXED
Reported: 4 years ago
Modified: 3 years ago
People

(Reporter: nbp, Assigned: nbp)

Tracking

Keywords: qawanted
Version: unspecified
Target Milestone: mozilla25
Hardware: ARM / Gonk (Firefox OS)
Firefox Tracking Flags: Not tracked

Attachments

(2 attachments)

(Assignee)

Description

4 years ago
+++ This bug was initially created as a clone of Bug #855037 +++

Currently the GC settings are extremely low, causing Octane benchmark performance to be far below what we get when preferences are set to prevent GCs.

I reported in Bug 855037 comment 3 that we get a huge improvement on DeltaBlue (~5.2x) [on a Unagi with 256MB] when GCs are disabled.

While running with MOZ_GCTIMER=stdout, I noticed 291 GCs on the Unagi with full access to the RAM, compared to 132 on a desktop.  Also, the "average" cycles are around 7ms on the desktop and around 100ms on the Unagi.

I cannot get detailed information on which phase is taking most of the time or the reasons for the GC, because fopen probably deals badly with the sub-process model used by the b2g browser.

Looking at the results on desktop, it seems that most of the GCs are triggered for ALLOC_TRIGGER or TOO_MANY_ALLOC reasons.  Bill mentioned that for benchmarks it is better, in terms of performance, to have full GCs instead of incremental GCs.

At the moment, I am collecting data to see how high frequency settings influence GC pauses (as reported by MOZ_GCTIMER=stdout) and the octane score.
(Assignee)

Comment 1

4 years ago
Created attachment 739929 [details]
GC logs of automated benchmarks

This log contains the GC details of all processes running during the automated Octane test.  Processes can be distinguished by their relative time-stamps, as the start time of each GC is relative to the startup time of its process.  Note that the 7 DOM_WORKER GCs occur in a separate process.

Based on these logs, I notice that some GCs at the end are spaced a few seconds apart, which makes them low-frequency GCs instead of high-frequency GCs.

Note: I do not know whether this is related to the fact that these results are being dumped, but the score has never been that high with these settings (382 instead of 300).  From the benchmark's perspective, it would make sense to increase the time window between GCs so that they are detected as high-frequency GCs.  Only a few GCs at the beginning of the benchmark are not considered high-frequency GCs.
(Assignee)

Comment 2

4 years ago
I have been experimenting for a while with GC settings, to have a good grasp of what is going on.  So far I have been testing multiple GC configurations on b2g, while trying to figure out a good set of options to keep the phone responsive.

One thing I want to avoid is having to revisit this later because the responsiveness of the phone is damaged in some application X due to the upcoming GC setting changes.

So far, the experiments I made only on the Octane benchmark (score and average GC pauses) suggest that the best compromise score (smallest average GC pauses) is achieved with:

  300  for javascript.options.mem.gc_high_frequency_heap_growth_max
  ~150 for javascript.options.mem.gc_high_frequency_heap_growth_min
  ~30  for javascript.options.mem.gc_high_frequency_high_limit_mb
  ~0   for javascript.options.mem.gc_high_frequency_low_limit_mb

The average GC pauses were computed by using MOZ_GCTIMER=stdout to dump the results.  This has the effect of merging incremental GC slices into one apparently bigger GC, which does not correspond to the actual pauses.  So this measurement was not accurate.

Bill mentioned that the goal we are trying to achieve on benchmarks is to avoid incremental GCs, as we want to minimize the time spent in the GC.  So the smallest average GC pause should still be a good metric to correlate with the benchmark score, but it does not reflect the responsiveness of B2G applications.

So to assess the responsiveness of the phone, I tried to make a benchmark[1] which highlights how an application behaves under small periodic allocations.  This benchmark extracts the average time between 2 frames, as well as the variance of the time between 2 frames.  A score is computed by multiplying (variance + 1) by the average value.  The variance is taken into account to penalize jumpy applications; the mean is taken into account to penalize low refresh rates.  This benchmark is based on the incremental GC example that Bill made[2] and published on the JavaScript blog.
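The scoring just described can be sketched in a few lines (this is my reading of the comment, not the actual snappy source; lower scores are better):

```javascript
// Sketch of the snappy score: (variance + 1) * mean of the frame times.
// Jitter (variance) and slow frames (mean) both worsen the score.
function snappyScore(frameTimesMs) {
  const n = frameTimesMs.length;
  const mean = frameTimesMs.reduce((acc, t) => acc + t, 0) / n;
  const variance =
    frameTimesMs.reduce((acc, t) => acc + (t - mean) ** 2, 0) / n;
  return (variance + 1) * mean;
}

// A steady 16 ms/frame run scores better (lower) than a jumpy run
// with the same average frame time.
const steady = snappyScore([16, 16, 16, 16, 16, 16, 16, 16]);
const jumpy = snappyScore([8, 24, 8, 24, 8, 24, 8, 24]);
```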

Now that I got a good grasp and understanding of the effect of the high frequency settings, I will try changing other GC settings such as the time taken by slices (gc_incremental_slice_ms) and the gap between high frequency GC (high_frequency_time_limit_ms).

The first results on the Octane benchmark score seem to imply that we should increase the time between HF GCs, which makes sense knowing that ARM is slower than desktop, both for collecting and for allocating.  Doing so will cause the benchmark to allocate a larger heap sooner, leaving the program more time to run before the next garbage collection.

[1] http://people.mozilla.org/~npierron/snappy-bench/
[2] https://blog.mozilla.org/javascript/2012/08/28/incremental-gc-in-firefox-16/
Summary: Too many GCs cycles (mark + sweep) while running octane on B2G. → Improve octane score by modifying GC settings of B2G.
(Assignee)

Comment 3

4 years ago
I have been testing a large range of configurations on the Unagi while running both the snappy (comment 2 [1]) and Octane benchmarks.  I have been randomly probing the space of GC settings to see the effect on the benchmark scores as a function of the settings.

I modified the following settings:
 - javascript.options.mem.gc_high_frequency_heap_growth_max: current 150, mostly tried with 300
 - javascript.options.mem.gc_high_frequency_heap_growth_min: current 120, tried with 270, 220, 170, 120
 - javascript.options.mem.gc_high_frequency_high_limit_mb: current 40, tried with 40, 30, 20, 10, 0
 - javascript.options.mem.gc_high_frequency_low_limit_mb: current 10, tried with 40, 30, 20, 10, 0, always below the high limit (1).
 - javascript.options.mem.gc_incremental_slice_ms: current 10, tried with 10, 20, 30, 40, 50
 - javascript.options.mem.gc_high_frequency_time_limit_ms: current 1000, tried with 2000, 1750, 1500, 1250, 1000, 750, 500, 250

I did not touch other settings such as:
 - javascript.options.mem.gc_low_frequency_heap_growth: set to 119 (2)
 - javascript.options.mem.high_water_mark: set to 6
 - javascript.options.mem.gc_allocation_threshold_mb: set to 3 (3)

I tried to get a 2D/3D representation[1] to analyze the inter-dependencies of the variables with the probability density of getting a good score when other parameters are changed.  The raw data are available in a compressed format[2].  These results were checked against ~400 different gc-settings configurations.

This indicates whether the score is somewhat stable when we change other parameters.  Stability is interesting because these settings are tuned on 2 particular benchmarks; if the parameters are not stable, then we are likely to regress performance on another benchmark, another application, or another device.

The 3D representations look at the Octane score divided by the snappy score (octane: higher is better, snappy: lower is better).  I made multiple variants[2], o1s1, o1s2, o2s1 and o2s2, which make it possible to see how the relations between components change if we emphasize either the memory aspect (o1s2) or the benchmark aspect (o2s1).

From these graphs we can see:
 - [g1] The Octane score is highly tied to high_frequency_time_limit_ms (higher => better Octane score)
 - [g2] Snappy performs well with a 30ms incremental_slice_ms
 - [g3] Both Octane & snappy perform well with a high high_frequency_heap_growth_max; based on Bill's recommendation, we should not go above 300 on desktop (multiply the heap size by 3 when the heap size is above the high_limit_mb)
 - [g4] Highlights that the best high_frequency_time_limit_ms is around 1450, and it confirms what was shown by [g2].  Note that by looking at the o2s1 version, we can see that the Octane benchmarks benefit from a higher time_limit.  The same remark applies to many other graphs.
 - [g5] Before interpreting this graph, note that the upper-left part should be ignored due to the interdependency (1).  What is interesting is that this graph suggests that the high_limit should be at 40, and that the low_limit should either be 0 (best for snappy, o1s2) or equal to the high_limit (best for octane, o2s1).
 - [g6] Suggests that the growth_min should either be the minimum (best for snappy) or the maximum (best for octane).

As opposed to most of the previous results, the growth_min setting does not have any simple relation with the other settings; [g6] seems to be an exception, as the o1s2 emphasis shows a new class of settings which seems better for snappy.

Based on the previous analysis, one of the best configurations would be:
  300  for javascript.options.mem.gc_high_frequency_heap_growth_max
  120  for javascript.options.mem.gc_high_frequency_heap_growth_min
  40   for javascript.options.mem.gc_high_frequency_high_limit_mb
  0    for javascript.options.mem.gc_high_frequency_low_limit_mb
  30   for javascript.options.mem.gc_incremental_slice_ms
  1500 for javascript.options.mem.gc_high_frequency_time_limit_ms

and it gives the following set of results on octane:

Richards: 902
DeltaBlue: 303
Crypto: 1050
RayTrace: 178
EarleyBoyer: 434
RegExp: 109
Splay: 286
NavierStokes: 1188
PdfJS: 287
Mandreel: 201
Gameboy: 374 (4)
CodeLoad: 589
Box2D: 230
Score: 370 (23% improvement)

And a good score of 171201 on snappy (lower is better; the default settings give around ~210000), an 18% improvement.

It is interesting to note that the best high_limit is at the maximum value analyzed.  I will do more analysis with a larger range of high_limit and a larger range of low_limit to see if we can find non-limit values.  The same is true for growth_max, but Bill did not recommend raising this one above 300, as it is the multiplier of the heap size when an allocating GC occurs.


(1) If you are looking at the [1] results, you might notice that the graphs cover the full space; this is caused by the Gaussian approximation, which spreads the results across the graph to make it smooth with the current sampling.  In practice, either the upper or the lower diagonal is what matters, and you might notice that the rest is just a fading continuation of what is visible on the diagonal.

(2) The low-frequency heap growth is set to the minimal value (~117) usable during a js::MaybeGC, because the factor needs to be at least 1, so that we don't shrink when the compartment is full.

(3) The allocation threshold is extremely misleading: raising it is a total win for the Octane benchmarks, but it also causes B2G applications to take more memory.  As this value was lowered to improve applications on B2G, I will work with the current value, even though higher ones greatly improve the Octane benchmark results.

(4) The version of Gecko used for this benchmark is the same for all runs, and it does not include the recent changes by jandem which improved GameBoy.

[1] http://people.mozilla.org/~npierron/unagi-gc-data/
[2] http://people.mozilla.org/~npierron/unagi-gc-data/gc-data.tar.bz2

[g1] http://people.mozilla.org/~npierron/unagi-gc-data/octane-gc_high_frequency_time_limit_ms.png
[g2] http://people.mozilla.org/~npierron/unagi-gc-data/snappy-gc_incremental_slice_ms.png
[g3] http://people.mozilla.org/~npierron/unagi-gc-data/octane-gc_high_frequency_heap_growth_max.png and http://people.mozilla.org/~npierron/unagi-gc-data/snappy-gc_high_frequency_heap_growth_max.png
[g4] http://people.mozilla.org/~npierron/unagi-gc-data/o1s2-score-gc_high_frequency_time_limit_ms-gc_incremental_slice_ms.png
[g5] http://people.mozilla.org/~npierron/unagi-gc-data/o1s1-score-gc_high_frequency_high_limit_mb-gc_high_frequency_low_limit_mb.png
[g6] http://people.mozilla.org/~npierron/unagi-gc-data/o1s1-score-gc_high_frequency_heap_growth_min-gc_high_frequency_low_limit_mb.png
(Assignee)

Updated

4 years ago
Depends on: 867779
Comment 4

4 years ago

I don't want to bikeshed on what looks like a pretty rigorous multivariate optimization, but have you considered how these changes will (or won't) affect memory usage on the phone?

If this change is causing us to gc less frequently, I would naively worry that we could OOM more easily.  But if you're not worried about that, then I'm not either.
(Assignee)

Comment 5

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #4)
> I don't want to bikeshed on what looks like a pretty rigorous multivariate
> optimization, but have you considered how these changes will (or won't)
> affect memory usage on the phone?

Yes, I have.  I saw the results of GC parameter changes on b2g back in August, and it was a terrible UX.  This is exactly the reason why I collected results from snappy, which allocates 2048 objects per frame and keeps 1048576 objects alive across 3750 frames.  I hope this benchmark is representative enough of the responsiveness of some B2G games.

The last set of settings gives the following GC stats on Snappy:

GC(T+236.391s) Total Time: 1090.2ms, Compartments Collected: 1, Total Compartments: 7, Total Zones: 4, MMU (20ms): 0%, MMU (50ms): 15%, SCC Sweep Total: 0.3ms, SCC Sweep Max Pause: 0.3ms, Max Pause: 40.2ms, Nonincremental Reason: allocation trigger, Allocated: 60MB,
 +Chunks: 0, -Chunks: 0

> If this change is causing us to gc less frequently, I would naively worry
> that we could OOM more easily.  But if you're not worried about that, then
> I'm not either.

Increasing the high growth factor of the high-frequency GC is meant to make us GC less frequently on benchmarks, as the time spent doing GC is costly and causes a bad Octane score.  I still have to check that high-frequency GCs are not frequent on the phone, both before and after these changes.  As soon as I have checked that myself, I will ask some people from QA to make sure that they cannot find any pitfalls in terms of performance regressions.

High-frequency GCs should be avoided during normal phone usage, as they are a source of GC pauses and unresponsive behavior.  They are also a more likely source of fragmentation, as they increase the heap size by a larger factor and tend to keep more pages alive.
Comment 6

4 years ago

I'm getting mixed signals here.

Are these changes only for benchmarks?  ("Increasing the high growth of the high frequency GC is made to GC less frequently on benchmarks.")

Or do you expect them to have user-perceivable effects? ("I saw the results of GC parameters changes on b2g, back in August and this was a terrible UX. [...] I hope this benchmark is representative enough of the responsiveness of some B2G Games.")

>> have you considered how these changes will (or won't) affect memory usage on the phone?
>
> Yes, I am [...] The last set of settings gives the following GC stats on Snappy:

I don't understand how the statistics here should inform me about how the changes affect memory usage on Snappy.  Should I be looking at the "Allocated" value?  I don't know what to compare it to.  But moreover, what matters in terms of OOMs is max heap size, and these changes, as I understand them, are explicitly allowing us to get a bigger heap.
(Assignee)

Comment 7

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #6)
> I'm getting mixed signals here.
> 
> Are these changes only for benchmarks?  ("Increasing the high growth of the
> high frequency GC is made to GC less frequently on benchmarks.")

Yes, because some people are complaining about JavaScript performance on B2G.

> Or do you expect them to have user-perceivable effects? ("I saw the results
> of GC parameters changes on b2g, back in August and this was a terrible UX.
> [...] I hope this benchmark is representative enough of the responsiveness
> of some B2G Games.")

Not with the way that I have been testing and collecting data to prevent that.

> >> have you considered how these changes will (or won't) affect memory usage on the phone?
> >
> > Yes, I am [...] The last set of settings gives the following GC stats on Snappy:
> 
> I don't understand how the statistics here should inform me about how the
> changes affect memory usage on Snappy.  Should I be looking at the
> "Allocated" value?  I don't know what to compare it to.  But moreover, what
> matters in terms of OOMs is max heap size, and these changes, as I
> understand them, are explicitly allowing us to get a bigger heap.

What is interesting is the Max Pause (40.2ms) and the heap size (60MB), which I consider quite good.  And as the snappy result shows, this is an improvement compared to the current GC settings (see comment 3).

Indeed, these changes allow us to get a larger heap faster when we have high-frequency GCs, as explained by Bill in Bug 867779 comment 3.  This is also the reason why I opened Bug 867779, to address this issue by adding a cap to the JavaScript heap size.
(Assignee)

Comment 8

4 years ago
I did a small test going through multiple steps:
 - boot the phone
 - everything.me > games > tetris
 - play until level 3
 - go back to the home screen
 - open the clock
 - open the calendar, swipe to december, create an event.
 - open the cristal skull app, press right, observe latency for about 30 sec.
 - open the browser, go to ebay.com, select the first promoted product down the page.

I logged all GCs happening during these steps, before and after the changes to the GC settings.  In both cases I didn't notice any *harmful* high-frequency GCs.  So based on this small experiment, I am confident that the changes to the GC settings are fine in most cases.

The only high frequency GCs that I noticed were MEM_PRESSURE GCs, which is kind of ambiguous as they are shrinking GCs.  This might deserve another Bug if the computation is wrong.
Comment 9

4 years ago

> In both cases I didn't noticed any *harmful* high frequency GCs.

What's a harmful high-frequency GC?

I am extremely uncomfortable taking any changes on b2g that will make us more likely to OOM if the only upside is benchmark performance.  This change cannot be safe under all circumstances, since a webpage can clearly trigger the same behavior as a benchmark.

Can you please put me in touch with whoever has been asking for us to improve our benchmark scores?  IMO our stability story is far more important than our benchmark scores on B2G, and I'm happy to argue that with anyone in management.

The first priority should be ensuring that this change cannot possibly cause B2G to crash more often.  Then we can tune for benchmarks to our heart's content.
(Assignee)

Comment 10

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #9)
> > In both cases I didn't noticed any *harmful* high frequency GCs.
> 
> What's a harmful high-frequency GC?

Nothing, forget about it. Nothing is harmful with GCs, they are getting rid of memory. Which is good!

By harmful I meant GCs that lead to allocating more memory; these do not, as they are shrinking GCs.  So they are unlikely to change the fragmentation issue reported recently.

(In reply to Justin Lebar [:jlebar] from comment #9)
> This change
> cannot be safe under all circumstances, since a webpage can clearly trigger
> the same behavior as a benchmark.

Do you have a benchmark to check for that?  If not, I will continue with this work, which so far improves both the responsiveness by 18% (snappy) and the throughput by 23% (octane), without being a source of more fragmentation (comment 8).

Thanks for paying attention to such things, but you are not the only one.

> Can you please put me in touch with whoever has been asking for us to
> improve our benchmark scores?  IMO our stability story is far more important
> than our benchmark scores on B2G, and I'm happy to argue that with anyone in
> management.

CC: Andreas & Naveed

> The first priority should be ensuring that this change cannot possibly cause
> B2G to crash more often.  Then we can tune for benchmarks to our heart's
> content.

The only difference here is that for applications which have high-frequency GCs, we will have fewer JavaScript garbage collection cycles before the application gets killed by the system, which just means fewer chances for the application to reduce its memory usage.  If an application allocates all the memory it needs ahead of time, this will not change anything for it.
Comment 11

4 years ago

can someone summarize the issues please?  I see that we can win 18% on snappy and 23% on octane - but at what costs? (meminfo, about:memory snapshots, per-app rss, etc.)


ftr, we are deathly scared of taking patches that regress memory usage.  and if you plan on uplifting this to b2g18... we all better damn sure that this doesn't cause any more ooms or segfaults.
(Assignee)

Comment 12

4 years ago
(In reply to Doug Turner (:dougt) from comment #11)
> can someone summarize the issues please?

- Bad JS benchmarks results implies bad press.
- Bad GC settings might cause a bad UX, which implies bad press.

The question is how we determine that these GC settings (comment 3) are better than the ones we had before.  It is exactly to address the second aspect that I made the snappy benchmark (comment 2), to balance the Octane benchmark results, which only address the first aspect of the problem.

> I see that we can win 18% on
> snappy and 23% on octane - but at what costs? (meminfo, about:memory
> snapshots, per-app rss, etc.)

I will collect this information and attach it to this bug, but as there are no high-frequency GCs, the data are extremely likely to be similar, modulo the noise.

Is there a clear procedure for running these tests which makes them comparable?  Is that automated somewhere?

> ftr, we are deathly scared of taking patches that regress memory usage.  and
> if you plan on uplifting this to b2g18... we all better damn sure that this
> doesn't cause any more ooms or segfaults.

The current suggestion will not change anything for applications (based on comment 8) which do not allocate JS objects extremely fast, as the Octane benchmark does.
Comment 13

4 years ago

See the first paragraph of bug 867779 comment 3 for an explanation of what I'm concerned about.

In comment 3, nbp suggests that the optimal value for javascript.options.mem.gc_high_frequency_heap_growth_max is 300.  As I understand it, that would mean that we allow pages which allocate many objects quickly to triple their heap size before triggering another GC.

Currently, we trigger a GC after the heap has grown to 1.5x its current size.  This means we run a lot of GCs, but it also means that we're unlikely to OOM the process because of a ton of garbage sitting in the JS heap.

It's easy to see how changing this value from 1.5 to 3 can lead to an improved score on benchmarks and also lead to more OOMs.  Indeed, it's because of concern about OOMs that nbp filed bug 867779.

jld and pnkfelix suggested on IRC that perhaps the heap growth factor could decrease as the size of the heap increased.  This way, we could allow small heaps to grow quickly while restricting the amount of garbage in large heaps.  This is apparently known as an "inverse load factor".  But the trick is, an inverse load factor gives you an extra degree of freedom in which to screw up.  There's apparently a whole horror story about tuning an inverse load factor starting at bug 619885 comment 11, but I haven't read the whole thing.
(Assignee)

Comment 14

4 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> (In reply to Doug Turner (:dougt) from comment #11)
> > I see that we can win 18% on
> > snappy and 23% on octane - but at what costs? (meminfo, about:memory
> > snapshots, per-app rss, etc.)
> 
> I will collect this information and attach it to this bug, but as there is
> no high-frequency GCs, the data are extremely likely to be similar, modulo
> the noise.

I went through the following process, taking an about:memory profile [1] a few seconds after each of the following steps finished loading:
 - re-start the phone.
 - start the calendar app.
 - start the clock app.
 - start cristal skull app & press top right-ish part of the screen.
 - start PipesPuzzle market-place app.
 - start browser & go to http://people.mozilla.org/~npierron/octane (through a recent page visited)
 - start the email app.
 - start the contact app.
 - start the phone app.
 - start the sms app.
 - Close all apps.
 - Go to the everything.me view and come back and start the settings app.

I accidentally erased the email & browser profiles while taking the recent ones.  What I can notice is that, as expected, the memory usage is *highly similar*, as none of the settings change the behavior of these apps.

[1] http://people.mozilla.org/~npierron/unagi-gc-data/app-startup-mem.tar.bz2
(Assignee)

Comment 15

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #13)
> It's easy to see how changing this value from 1.5 to 3 can lead to an
> improved score on benchmarks and also lead to more OOMs.  Indeed, it's
> because of concern about OOMs that nbp filed bug 867779.

I filed Bug 867779 as an enhancement.  An application is not dead after 40 MB because the growth factor is 3.  This just means that if there is no other GC, such as one triggered by a timer or by a MEM_PRESSURE event, then, and only then, the application will be killed by the kernel when it attempts to over-commit.

> jld and pnkfelix suggested on IRC that perhaps the heap growth factor could
> decrease as the size of the heap increased.

How do you define the cap?  This was the idea of Bug 867779: to use the CommitLimit as a limit for the JavaScript memory.  But as you mentioned, this value can only be used when the overcommit_memory setting is 2.
Comment 16

4 years ago

> This just means that if there is no other GC, such as one triggered by a timer, or by a 
> MEM_PRESSURE event, then, and only in such cases the application will be killed by the Kernel when 
> it attempt to over-commit.

But we only got into the high-frequency GC regime because the page was allocating quickly, right?  In that case, it's likely that memory pressure won't be dispatched quickly enough (although changes to the JS engine could improve this), and it's likely that we'll miss the timer.

> How do you define the cap?

Like I said, it seems that tuning this heuristic is particularly nasty because it has an additional degree of freedom.

But perhaps we could say "we can afford to spend X mb of memory on garbage in this process".  We know on our 256mb phones that, on a good day, an app gets 100mb of RSS.  Offhand and completely un-scientifically, I might be OK spending 10% of that on garbage.

We could then tune the heap-growth-factor sequence around whatever value we chose.

> What I can notice, is that as expected, the memory usage is *highly similar* as none of the 
> settings are changing the behavior of these apps.

Yeah, that shouldn't surprise any of us.
Comment 17

4 years ago

(In reply to Justin Lebar [:jlebar] from comment #13)
> jld and pnkfelix suggested on IRC that perhaps the heap growth factor could
> decrease as the size of the heap increased.  [...] This is apparently known
> as an "inverse load factor".  There's apparently a whole horror story about
> tuning an inverse load factor starting at bug 619885 comment 11.

Just to keep terminology clear: The term "inverse load factor" corresponds, I think, to what the people here call the "heap growth factor"; i.e. the ratio that nbp has been playing with, varying it from "1.5x" to "3x".  (There's nothing special about the word "inverse" here in the term; if you were to look at e.g. a hash table, where load factor == #entries/#buckets, then the "inverse load factor" would be #buckets/#entries; it's the same thing here.)  The term "heap growth factor" is probably friendlier.

The MMgc team at Adobe tried making the inverse load factor itself shrink as the heap grew (i.e. imposing an inverse relationship on the two quantities; so we now have two independent occurrences of the word "inverse"; much potential for confusion).

I am not sure varying the factor in this way was a good idea.  Or at least, it's not clear that it really helps any more than the arguably simpler approach of increasing the initial heap size and putting a bound on the max heap size (or just using a fixed- rather than dynamically-sized heap, which is what I think you get if you take the approach to its logical limit).  I don't know; I waver here.  I guess it allows one to explore the continuum between the two extremes of (1) a fixed heap-growth factor and (2) a fixed-size heap, and that freedom might have value.

One other detail: In principle, all Flash memory allocation was supposed to be accounted for by MMgc (regardless of whether the memory itself was managed by MMgc).  I'm inferring from the conversation that the Spidermonkey GC does not necessarily have all of that information, and that lack of info forms a large part of jlebar's worry (and I understand such worry).  So this is one of many ways that the experience from the Flash runtime does not directly apply here.

----

Oh, finally: a two paragraph summary of the horror story, so you don't have to wade through the thread.

Bug 619885 is talking about a problem related to making the heap growth factor too small (on the order of 1.05x, which means you get a mark/cons ratio of 1/(1.05-1) == 20, so that your GC has a goal of doing 20 words of tracing for every 1 word allocated (yikes)).  That was a side-effect of making the inverse load factor itself shrink as the heap grew *plus* carelessness in choosing what the limiting value of the telescoping series would be.

The Tamarin team, after discovering the problem, got around it by changing the limiting value to 1.25 instead of 1.05; a growth factor of 1.25 implies a mark/cons ratio of 1/(1.25 - 1) = 4 words traced for each word allocated, rather than 20, so the time spent in mostly wasted GC cycles went down by 5x in that case.  Most of the analysis I attached to Bug 619885 was an attempt to justify the extra memory usage that resulted from changing the limiting value.
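The arithmetic in the two paragraphs above follows from the 1/(g - 1) relation; a quick check (the constant names here are mine):

```javascript
// Mark/cons ratio implied by a heap growth factor g, per the comment
// above: the GC aims to trace 1 / (g - 1) words per word allocated.
const markConsRatio = (g) => 1 / (g - 1);

const oldRatio = markConsRatio(1.05); // ~20 words traced per word allocated
const newRatio = markConsRatio(1.25); // 4 words traced per word allocated
```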
Comment 18

4 years ago

(In reply to Justin Lebar [:jlebar] from comment #13)
> 
> jld and pnkfelix suggested on IRC that perhaps the heap growth factor could
> decrease as the size of the heap increased.  This way, we could allow small
> heaps to grow quickly while restricting the amount of garbage in large
> heaps.  This is apparently known as an "inverse load factor".  But the trick
> is, an inverse load factor gives you an extra degree of freedom in which to
> screw up.  There's apparently a whole horror story about tuning an inverse
> load factor starting at bug 619885 comment 11, but I haven't read the whole
> thing.

We are already doing this. The growth factor depends on the heap size.
pref("javascript.options.mem.gc_high_frequency_heap_growth_max", 150);
pref("javascript.options.mem.gc_high_frequency_heap_growth_min", 120);
pref("javascript.options.mem.gc_high_frequency_high_limit_mb", 40);
pref("javascript.options.mem.gc_high_frequency_low_limit_mb", 10);

The values in b2g.js mean that for a high allocation scenario we do following:
heap size < 10MB (low_limit): 150% (heap_growth_max)
10MB < heap size < 40MB (high_limit): linear interpolation between 150% (heap_growth_max) and 120% (heap_growth_min)
heap size > 40MB: 120% (growth_min)
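As a sketch, the interpolation described above looks like this (a hypothetical helper, not the actual SpiderMonkey code; the values are the b2g.js defaults quoted above, expressed as percentages):

```javascript
// Heap growth factor (in percent) for the high-frequency GC regime,
// per the b2g.js values quoted above.
function hfHeapGrowth(heapSizeMB, growthMax = 150, growthMin = 120,
                      lowLimitMB = 10, highLimitMB = 40) {
  if (heapSizeMB <= lowLimitMB) return growthMax;
  if (heapSizeMB >= highLimitMB) return growthMin;
  // Linear interpolation between (lowLimit, growthMax) and (highLimit, growthMin).
  const t = (heapSizeMB - lowLimitMB) / (highLimitMB - lowLimitMB);
  return growthMax - t * (growthMax - growthMin);
}
```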
(Assignee)

Comment 19

4 years ago
(In reply to Gregor Wagner [:gwagner] from comment #18)
> (In reply to Justin Lebar [:jlebar] from comment #13)
> > 
> > jld and pnkfelix suggested on IRC that perhaps the heap growth factor could
> > decrease as the size of the heap increased.  This way, we could allow small
> > heaps to grow quickly while restricting the amount of garbage in large
> > heaps.  This is apparently known as an "inverse load factor".  But the trick
> > is, an inverse load factor gives you an extra degree of freedom in which to
> > screw up.  There's apparently a whole horror story about tuning an inverse
> > load factor starting at bug 619885 comment 11, but I haven't read the whole
> > thing.
> 
> We are already doing this. The growth factor depends on the heap size.
> pref("javascript.options.mem.gc_high_frequency_heap_growth_max", 150);
> pref("javascript.options.mem.gc_high_frequency_heap_growth_min", 120);
> pref("javascript.options.mem.gc_high_frequency_high_limit_mb", 40);
> pref("javascript.options.mem.gc_high_frequency_low_limit_mb", 10);
> 
> The values in b2g.js mean that for a high allocation scenario we do the
> following:
> heap size < 10MB (low_limit): 150% (heap_growth_max)
> 10MB < heap size < 40MB (high_limit): linear between 150% (heap_growth_max)
> and 120% (heap_growth_min)
> heap size > 40MB: 120% (heap_growth_min)

Thanks Gregor, the equation is located in:
  Zone::setGCLastBytes(size_t lastBytes, JSGCInvocationKind gckind)

With this formula and the new settings suggested in comment 3, the maximal allocation possible during the linear interpolation is:

  Max = lastBytes * ((lastBytes - 0) * (300 - 120) / (40 - 0) + 120) / 100
  Max = 120 MB [x = 40MB]

To be clear, to prevent additional OOMs, it seems that we should respect the following rule, which ensures that the high frequency GCs are triggering more GCs before we run out of memory:

  mem.max >= growth_max * high_limit  [low_limit = 0]

    when low_limit = 0, the maximal allocation is reached with lastBytes = high_limit.
> To be clear, to prevent additional OOMs, it seems that we should respect the following rule, which 
> ensure that the high frequency GCs are triggering more GCs before reaching the end of the memory:
>
>  mem.max >= growth_max * high_limit  [low_limit = 0]
>
>    when low_limit = 0, the maximal allocation is reached with lastBytes = high_limit.

I don't understand what you mean.
(Assignee)

Comment 21

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #20)
> > To be clear, to prevent additional OOMs, it seems that we should respect the following rule, which 
> > ensure that the high frequency GCs are triggering more GCs before reaching the end of the memory:
> >
> >  mem.max >= growth_max * high_limit  [low_limit = 0]
> >
> >    when low_limit = 0, the maximal allocation is reached with lastBytes = high_limit.
> 
> I don't understand what you mean.

My goal is to prove that there is a GC before the end of the linear interpolation of the growth factor, as after the high_limit the growth factor is the smallest possible.  This means that we would have at least one GC before we run out of memory.

It appears that I made an error in my previous calculation: I inverted growth_max and growth_min.

So, during the linear interpolation, we reserve the following amount of memory:

  G_max = javascript.options.mem.gc_high_frequency_heap_growth_max
  G_min = javascript.options.mem.gc_high_frequency_heap_growth_min
  M_high = javascript.options.mem.gc_high_frequency_high_limit_mb
  M_low = javascript.options.mem.gc_high_frequency_low_limit_mb (taken as 0)

  NextGC = lastBytes * ((G_min - G_max) / (M_high - M_low) * (lastBytes - M_low) + G_max)

The coefficient G_min - G_max is negative, so the parabola opens downward, and its maximum (derivative = 0), if we omit the lastBytes boundaries, is:

  lastBytes_max = G_max * M_high / (2 * (G_max - G_min))  [M_low = 0]

With lastBytes restricted to [M_low .. M_high], the maximum is either:

  if M_high < lastBytes_max:
    NextGC_max = M_high * G_min  [lastBytes = M_high, M_low = 0]
  else
    NextGC_max = M_high * G_max² / (4 * (G_max - G_min))

With the settings suggested in comment 3, we have

  lastBytes_max = ~33 MB
  NextGC_max = 50  [M_high = 40, M_low = 0, G_max = 3, G_min = 1.2]

This means that with the GC settings suggested in comment 3, we will trigger at least one GC before the heap exceeds 50 MB when we are running in the high-frequency GC mode.  Above the linear interpolation, the growth rate is identical to the growth rate of non-high-frequency GCs, which means that this does not cause more issues.

The property highlighted here is that we get one or more GCs after the linear interpolation period, i.e.:

  mem.max >= NextGC_max
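The bound above can be checked numerically with a short sketch (taking M_low = 0 as in this comment; the function name is mine, and this is a model of the formula, not the actual Zone::setGCLastBytes code):

```python
def next_gc_trigger(last_bytes, g_max=3.0, g_min=1.2, m_high=40.0, m_low=0.0):
    """Next-GC threshold during the linear interpolation region.

    Sizes are in MB, growth factors are ratios (3.0 == 300%)."""
    return last_bytes * ((g_min - g_max) / (m_high - m_low)
                         * (last_bytes - m_low) + g_max)

g_max, g_min, m_high = 3.0, 1.2, 40.0

# Vertex of the downward parabola (where the derivative is zero), M_low = 0:
last_bytes_max = g_max * m_high / (2 * (g_max - g_min))    # ~33.3 MB
# Maximal next-GC threshold, M_low = 0:
next_gc_max = m_high * g_max ** 2 / (4 * (g_max - g_min))  # 50.0 MB

print(last_bytes_max, next_gc_max, next_gc_trigger(last_bytes_max))
```

Since the vertex (~33.3 MB) falls inside [0, 40], the parabola case applies, and the threshold never exceeds 50 MB during the interpolation.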
(Assignee)

Comment 22

4 years ago
Naveed suggested giving more data around the current settings:

octane-prefs-150-120-40-0-30-1500.log:18:Score: 277
octane-prefs-200-120-40-0-30-1500.log:18:Score: 329
octane-prefs-300-120-40-0-30-1500.log:18:Score: 370
octane-prefs-300-120-30-0-30-1500.log:18:Score: 355

snappy-prefs-200-120-40-0-30-1500.log:5:357634.96129706554
snappy-prefs-250-120-40-0-30-1500.log:5:317950.21813542163
snappy-prefs-300-120-40-0-30-1500.log:5:171201.53960104787
snappy-prefs-300-120-30-0-30-1500.log:5:208373.5486512545

But I think the proof in comment 21 is enough to claim that this does not cause any of the OOMs we might have believed it would before.
Comment 21 is the kind of analysis I was looking for -- it shows that the changes here only have an effect when the heap size is under 50 MB.  That's not perfect, but it's a lot better than tripling a 40 MB heap to a 120 MB heap, which is what we'd understood these changes to be effecting.

I withdraw any objection to this.  We'll keep an eye on things.
(Assignee)

Comment 24

4 years ago
I made a push to try [1] with the corresponding modifications as well as additional modification dedicated to highlight high-frequency GC in the logcat.

Marcia, can someone from QA experiment for a while with this version [1] (it should appear in pvtbuild soon), which is built on top of the latest m-c changes (as of writing)?  The interesting point would be to notice if there are new critical pauses caused by high-frequency GCs.

In case of such pauses, a line should appear in the logcat which has "High Frequency: true" in it.  There is a known case where MEM_PRESSURE GCs appear as high-frequency GCs; these should not matter here, as they are executed on background applications.

$ adb logcat | grep "High Frequency: true" | grep -v "Reason: MEM_PRESSURE"

If there is an application which has a new noticeable pause which can be correlated to such a message, then I will spawn another build with the same dump logic on top of the current settings, to see if this is a regression or not.

[1] https://tbpl.mozilla.org/?tree=Try&rev=bf9af238e343
Flags: needinfo?(mozillamarcia.knous)
(Assignee)

Updated

4 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 25

4 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #24)
> I made a push to try [1] with the corresponding modifications as well as
> additional modification dedicated to highlight high-frequency GC in the
> logcat.
> 

I've pushed to try again because the pvtbuild directories have been flushed since I pushed.

[1-new] https://tbpl.mozilla.org/?tree=Try&rev=2b2c2a3c8f31
Haven't had a chance to spend much time with the build yet due to Bug 847592 - I hope to be able to do some further testing on this build when the investigation on that bug is complete.
Flags: needinfo?(mozillamarcia.knous)
Nicolas: I had to reflash one of my devices, so I had to blow away the build that you had posted in Comment 25. Can you generate another try build?


(In reply to Nicolas B. Pierron [:nbp] from comment #25)
(Assignee)

Comment 28

4 years ago
(In reply to Marcia Knous [:marcia] from comment #27)
> Can you generate another try build?

https://tbpl.mozilla.org/?tree=Try&rev=f821cd0f97d7

This is the same version as the previous one.  The failure of Mochitest-9 is related to the spew which is produced for the GC.  I don't know how the reftest-6 failures are related to the test branch.  I will try to rebase it later.
(Assignee)

Comment 29

4 years ago
Any news from the QA side?  Can this change land without introducing noticeable regressions?
Flags: needinfo?(mozillamarcia.knous)
(In reply to Nicolas B. Pierron [:nbp] from comment #29)
> Any news from the QA side, can this change land without introducing
> noticeable regressions?

Sadly I do not have much time to test, with all the other things going on related to finishing 1.0.1 and now the 1.1 work.  I still have the build if you want me to give it a quick run.
Flags: needinfo?(mozillamarcia.knous)
During the QA roundtable this was discussed, and due to limited resources we did not want to commit to this as high priority testing, but adding qawanted so someone can pick it up when they have time. A question came up as to whether this was intended to be landed for 1.1.
Keywords: qawanted
(Assignee)

Comment 32

4 years ago
(In reply to Marcia Knous [:marcia] from comment #31)
> […] A question came up as to
> whether this was intended to be landed for 1.1.

I did not measure any performance aspect of it on top of b2g18; it is likely similar, as we are likely GC-ing because of allocations of objects and less likely because of the allocation of Ion-compiled code.

If Justin thinks this might be of interest to prevent too-bad press, I can give it a try with the same settings on top of b2g18.
Flags: needinfo?(justin.lebar+bug)
I think whether we want this on b2g18 is a question for release drivers; I really have no idea.

If you attach a patch and ask for approval-b2g18, that will get that discussion going.
Flags: needinfo?(justin.lebar+bug)
(Assignee)

Comment 34

4 years ago
Created attachment 765486 [details] [diff] [review]
Change GC preferences for octane performances.

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A

User impact if declined: Potentially more GC during games which are JS intensive with small amount of live memory.

Testing completed: waiting for QA, which is waiting on this approval.

Risk to taking this patch (and alternatives if risky): This patch reduces the number of GCs before an OOM, but should still keep the same number of GCs once a huge amount of live memory is present.  This may slightly increase the fragmentation of memory.

String or UUID changes made by this patch: N/A
Attachment #765486 - Flags: review?(justin.lebar+bug)
Attachment #765486 - Flags: approval-mozilla-b2g18?
Assuming we're not going to get any help from QA here, we don't have a lot of good options.

The more I think about this, I think our best option at this point may be to

* Ask product whether they want this for 1.1, and
* Land immediately on trunk.

This bears some risk that we'll regress something on trunk and not notice, because very few people are dogfooding trunk.  Then when we switch to 1.2, we'll have to figure out what's wrong, which will be much more expensive than figuring out now that something is wrong.

But I don't think it's right to hold back engineering because our QA org is unable to test when asked; the result of that would just be that engineers don't ever ask QA to test anything, because

Chris, who's the right person in product to make this call?
Flags: needinfo?(clee)
Comment on attachment 765486 [details] [diff] [review]
Change GC preferences for octane performances.

I'd feel more comfortable if you got a JS peer to sign off on this, but f=me.
Attachment #765486 - Flags: review?(justin.lebar+bug) → feedback+

Updated

4 years ago
Attachment #765486 - Flags: review?(anygregor)
Comment on attachment 765486 [details] [diff] [review]
Change GC preferences for octane performances.

Review of attachment 765486 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is pretty low risk. The change only applies in high allocation situations and we only change the behavior for a small heap.
Looks good!
Attachment #765486 - Flags: review?(anygregor) → review+
(Assignee)

Comment 38

4 years ago
FxOS is currently down on Are we fast yet, I'll fix it before landing this patch.
(Assignee)

Comment 39

4 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #38)
> FxOS is currently down on Are we fast yet, I'll fix it before landing this
> patch.

AWFY is now fixed.

http://hg.mozilla.org/integration/mozilla-inbound/rev/cc255b9797b7
https://hg.mozilla.org/mozilla-central/rev/cc255b9797b7
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
(Assignee)

Comment 41

4 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #39)
> http://hg.mozilla.org/integration/mozilla-inbound/rev/cc255b9797b7

AWFY results are:
 - sunspider: 14.6% (4017 ms -> 3429 ms)
 - kraken: 9.1% (47740 ms -> 43394 ms)
 - octane: 38.1% (328 v8p -> 453 v8p)
Depends on: 887125
Comment on attachment 765486 [details] [diff] [review]
Change GC preferences for octane performances.

At this point we are no longer doing approvals to b2g18 branch unless the bug is a security issue - all other bugs must request leo+ status to be uplifted. If there's a partner champion for this issue, they can push for it to be a blocker otherwise this will have to ride the trains and be in v1.2.
Attachment #765486 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18-
Backed out for causing bug 887519.

https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to David Zbarsky (:dzbarsky) from comment #43)
> Backed out for causing bug 887519.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d

Backout merged:
https://hg.mozilla.org/mozilla-central/rev/6c0642c2904d
(Assignee)

Comment 45

4 years ago
(In reply to David Zbarsky (:dzbarsky) from comment #43)
> Backed out for causing bug 887519.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d

Thanks, this was needed, as the previous config would have caused the GC to always multiply the heap size by 3 in case of high-frequency GCs, which would have introduced the risk of causing more OOMs in games which generate a lot of garbage fast.

Checking the graph while ignoring the cases where either M_low or M_high is 0 shows that the results are not as appealing, but they are still better than the current GC settings.

I think we can still land the same changes for now, as the surrounding data (without 0 prefs) are still good.  The safest move would be to replace 0 by 10, as we have data on it.
I will have to do more runs to make sure we are not making any mistakes here.
(Assignee)

Comment 46

4 years ago
After checking for more than a week on the same old version (with a backported patch for Bug 887125), I refined the results based on a reasonable set of settings, where reasonable means that we cannot afford to have a NextGC_max above 50 MB (see comment 23).

This means that we keep G_max = 3.00 and G_min = 1.20, which seem to be the GC settings providing the most interesting results for both benchmarks.

Among these results, "high frequency time limit" values as high as 2000 ms are found to be good candidates for octane, which is interested in throughput.  Even though these seem to also provide a few good results for snappy, I had to filter them out, because with these high values the snappy benchmark is extremely noisy.

I suggest landing the current GC settings again, as it appears that M_low = 0 is not a terrible choice after all (as long as Bug 887125 is landed).  I will try to improve the snappy benchmark to make it less sensitive to noise, and then run these benchmarks again to see if there is any better choice.

bbouvier suggested using the median and IQR [1] instead of the mean and the variance, but the IQR is a bad choice to account for GC pauses, which are spontaneous events.  The problem with taking the IQR is that it does not integrate all the pauses, and so it does not account for the smoothness of the application, where one pause can have a dramatic effect from the user's perspective.

[1] http://en.wikipedia.org/wiki/Interquartile_range
(Assignee)

Comment 47

4 years ago
https://tbpl.mozilla.org/?tree=Try&rev=bf796cb7174e
https://hg.mozilla.org/integration/mozilla-inbound/rev/f847593613de
https://hg.mozilla.org/mozilla-central/rev/f847593613de
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Going forward, Ravi Dandu (rdandu@mozilla.com) is the product manager for FxOS Performance.
Flags: needinfo?(clee)

Comment 50

3 years ago
Is this the reason why Firefox so aggressively runs GC? Because it's tuned for phones?

On my 16 gig rig, where I don't care about the memory usage, it's running CC every 6 secs and GC either every 6 secs or every 12 secs; every time GC runs the app stutters, which is especially annoying when scrolling or playing HTML5.

To the author: how did you manage to disable GC? I would love a GC-less Firefox.
No, Firefox OS / B2G uses different settings from desktop Firefox. There *is* a bug in Firefox 33 with OMTC that causes long GC slices even while animations are happening: bug 1084672. This should be fixed in Firefox 34. A few other problems were also identified; look at the dependent bugs of bug 1083418. Some of these have been fixed, but as the fixes are more involved they will be in Firefox 36.

In general, the intent is for GC to happen frequently in small bursts. Inhibiting GC could easily exhaust even your 16 GiB in seconds for some workloads.
FWIW, I've done some experiments where I disabled free() (i.e. never freeing heap allocations), and after opening 50 different sites, Firefox's memory usage was over 25 GiB. (The machine I tested on has 32 GiB of RAM.)

Disabling free() isn't the same as disabling GC, but I expect the two things to have effects within the same order of magnitude. A GC-less Firefox would be a bad idea.
(Assignee)

Comment 53

3 years ago
(In reply to Chris from comment #50)
> is this the reason why firefox so aggressively runs GC? because its tuned
> for phones?

Yes, as you can read in this thread.

Bug 898556 is opened to address this issue, but this has not yet become a priority, as we are not targeting phones with 4GB of RAM, but more phones with 128MB of zRAM.

Why are you interested in 16 GB?  Is this for a custom build of Firefox OS?  In that case I suggest you use the Firefox desktop GC settings; otherwise, if the intent is to make a product, then we should properly fix Bug 898556, and never question the GC settings again.
(Assignee)

Comment 54

3 years ago
(In reply to Chris from comment #50)
> is this the reason why firefox so aggressively runs GC? because its tuned
> for phones?

Note, the GC tuning from above was only made for Firefox OS, Firefox Desktop still has the old GC settings.

If you are looking to how to customize your GC settings yourself, then I would simply say:

  Please, don't; there are so many ways to screw things up that would make your Firefox even worse!

If you are not convinced, please take time to understand the ins-n-outs of it, by following similar steps as reported above.

Comment 55

3 years ago
My problems reported here https://bugzilla.mozilla.org/show_bug.cgi?id=1101111 are on the default GC settings; the browser is running horribly, with a GC every 6 or 12 seconds that causes around 100 ms of delay on a very powerful rig.

Why won't you let end users turn GC off? We don't all run tiny-RAM machines.

By the way my default GC settings match what is in this thread.

Why does the web browser run a GC every 12 seconds when, even worse, the GC doesn't recover any memory? It just keeps retrying every 12 seconds.
The changes in this bug did not affect desktop Firefox. Please continue the discussion in the bug you filed.