+++ This bug was initially created as a clone of Bug #855037 +++ Currently the GC settings are extremely low, and are causing Octane benchmark performance to be way below what we get without any preferences set to prevent GCs. I reported in Bug 855037 comment 3 that we get a huge improvement on DeltaBlue (~ x5.2) [on a Unagi with 256MB] when GCs are disabled. While running with MOZ_GCTIMER=stdout, I noticed 291 GCs on the Unagi with full access to the RAM, compared to 132 on a desktop. Also, the "average" cycles are around 7ms on the desktop and around 100ms on the Unagi. I cannot get detailed information on which phase is taking most of the time or the reasons for the GC, because fopen probably deals badly with the sub-process model used by the B2G browser. Looking at the results on desktop, it seems that most of the cases have ALLOC_TRIGGER or TOO_MANY_ALLOC reasons. Bill mentioned that for benchmarks it is better to have full GCs instead of incremental GCs in terms of performance. At the moment, I am collecting data to see how the high-frequency settings influence GC pauses (as reported by MOZ_GCTIMER=stdout) and the Octane score.
Created attachment 739929 [details] GC logs of automated benchmarks

This log contains the GC details of all running processes during the automated test of Octane. Processes can be distinguished by their relative timestamps, as the start time of each GC is based on the startup time of each process. Note that the 7 DOM_WORKER GCs occur in a separate process. Based on these logs, I notice that some GCs at the end are spaced by a few seconds, which makes them low-frequency GCs instead of high-frequency GCs. Note, I do not know if this is related to the fact that these results are dumped, but the score has never been that high with these settings (382 instead of 300). From the benchmark perspective, it would make sense to increase the time window within which GCs are detected as high-frequency GCs. Only a few GCs at the beginning of the benchmark are not considered high-frequency GCs.
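To make the high- vs low-frequency distinction concrete, here is a small sketch (my own illustration, not SpiderMonkey code; the 1500 ms threshold is the "high frequency time limit" value that appears in the pref sweeps in this bug, assumed here to be the classification window):

```python
HIGH_FREQ_LIMIT_MS = 1500  # assumed "high frequency time limit" pref value

def classify_gcs(start_times_ms):
    """Label each GC as high- or low-frequency based on the gap
    since the previous GC in the same process."""
    labels = []
    prev = None
    for t in start_times_ms:
        if prev is not None and t - prev < HIGH_FREQ_LIMIT_MS:
            labels.append("high-frequency")
        else:
            labels.append("low-frequency")
        prev = t
    return labels

# GCs spaced by a few seconds at the end come out low-frequency;
# the first GC has no predecessor, so it is low-frequency too.
print(classify_gcs([0, 400, 900, 5000, 9000]))
# → ['low-frequency', 'high-frequency', 'high-frequency', 'low-frequency', 'low-frequency']
```

A GC only gets the more aggressive high-frequency heap-growth treatment when it falls within this window of the previous one, which is why the spacing observed in the logs matters.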
I don't want to bikeshed on what looks like a pretty rigorous multivariate optimization, but have you considered how these changes will (or won't) affect memory usage on the phone? If this change is causing us to gc less frequently, I would naively worry that we could OOM more easily. But if you're not worried about that, then I'm not either.
(In reply to Justin Lebar [:jlebar] from comment #4)
> I don't want to bikeshed on what looks like a pretty rigorous multivariate
> optimization, but have you considered how these changes will (or won't)
> affect memory usage on the phone?

Yes, I have. I saw the results of GC parameter changes on B2G back in August, and it was a terrible UX. This is exactly why I collected results from Snappy, which allocates 2048 objects per frame and keeps 1048576 objects alive across 3750 frames. I hope this benchmark is representative enough of the responsiveness of some B2G games. The last set of settings gives the following GC stats on Snappy:

GC(T+236.391s) Total Time: 1090.2ms, Compartments Collected: 1, Total Compartments: 7, Total Zones: 4, MMU (20ms): 0%, MMU (50ms): 15%, SCC Sweep Total: 0.3ms, SCC Sweep Max Pause: 0.3ms, Max Pause: 40.2ms, Nonincremental Reason: allocation trigger, Allocated: 60MB, +Chunks: 0, -Chunks: 0

> If this change is causing us to gc less frequently, I would naively worry
> that we could OOM more easily. But if you're not worried about that, then
> I'm not either.

Increasing the heap growth of high-frequency GCs is meant to GC less frequently on benchmarks, as the time spent doing GC is costly and causes bad Octane scores. I still have to check that high-frequency GCs are not frequent on the phone, both before and after these changes. As soon as I can check that myself, I will ask some people from QA to make sure that they cannot find any pitfalls in terms of performance regressions. High-frequency GCs should be avoided during normal phone usage, as they are a source of GC pauses and unresponsive behavior. They are also a more likely source of fragmentation, as they increase the heap size by a larger factor and tend to keep more pages alive.
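For readers unfamiliar with the MMU figures in the log above: MMU (minimum mutator utilization) over a window length is the worst-case fraction of any window of that length that the mutator (the application) gets to run, as opposed to the GC. A simplified sketch (my own, not SpiderMonkey's statistics code; it only checks windows anchored at pause boundaries, which is enough when pauses are the only activity accounted for):

```python
def mmu(pauses, window):
    """Minimum Mutator Utilization: over any time window of the given
    length, the smallest fraction NOT spent inside a GC pause.
    pauses: sorted list of (start, end) intervals, same unit as window."""
    candidates = [s for s, _ in pauses] + [e - window for _, e in pauses]
    worst = 1.0
    for lo in candidates:
        hi = lo + window
        # total pause time overlapping the window [lo, hi]
        gc = sum(max(0, min(e, hi) - max(s, lo)) for s, e in pauses)
        worst = min(worst, max(0.0, (window - gc) / window))
    return worst

# A single 30 ms pause: a 20 ms window can be fully consumed by GC,
# while a 50 ms window always keeps at least 40% for the mutator.
print(mmu([(0, 30)], 20))  # → 0.0
```

This matches the shape of the quoted figures: a Max Pause above the window length forces MMU (20ms) down to 0%, while the larger 50 ms window retains some mutator time.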
I'm getting mixed signals here. Are these changes only for benchmarks? ("Increasing the heap growth of high-frequency GCs is meant to GC less frequently on benchmarks.") Or do you expect them to have user-perceivable effects? ("I saw the results of GC parameter changes on B2G back in August, and it was a terrible UX. [...] I hope this benchmark is representative enough of the responsiveness of some B2G Games.")

>> have you considered how these changes will (or won't) affect memory usage on the phone?
>
> Yes, I have [...] The last set of settings gives the following GC stats on Snappy:

I don't understand how the statistics here should inform me about how the changes affect memory usage on Snappy. Should I be looking at the "Allocated" value? I don't know what to compare it to. But moreover, what matters in terms of OOMs is max heap size, and these changes, as I understand them, are explicitly allowing us to get a bigger heap.
I did a small test going through multiple steps:
- boot the phone
- everything.me > games > tetris
- play until level 3
- go back to the home screen
- open the clock
- open the calendar, swipe to december, create an event.
- open the cristal skull app, press right, observe latency for about 30 sec.
- open the browser, go to ebay.com, select the first promoted product down the page.

I logged all GCs happening during these steps, before and after the changes in the GC settings. In both cases I didn't notice any *harmful* high-frequency GCs. So based on this small experiment I am confident that the changes in the GC settings are fine in most cases. The only high-frequency GCs that I noticed were MEM_PRESSURE GCs, which is kind of ambiguous as they are shrinking GCs. This might deserve another bug if the computation is wrong.
> In both cases I didn't notice any *harmful* high-frequency GCs.

What's a harmful high-frequency GC? I am extremely uncomfortable taking any changes on B2G that will make us more likely to OOM if the only upside is benchmark performance. This change cannot be safe under all circumstances, since a webpage can clearly trigger the same behavior as a benchmark. Can you please put me in touch with whoever has been asking for us to improve our benchmark scores? IMO our stability story is far more important than our benchmark scores on B2G, and I'm happy to argue that with anyone in management. The first priority should be ensuring that this change cannot possibly cause B2G to crash more often. Then we can tune for benchmarks to our heart's content.
can someone summarize the issues please? I see that we can win 18% on snappy and 23% on octane - but at what cost? (meminfo, about:memory snapshots, per-app rss, etc.) ftr, we are deathly scared of taking patches that regress memory usage. And if you plan on uplifting this to b2g18... we'd all better be damn sure that this doesn't cause any more OOMs or segfaults.
(In reply to Doug Turner (:dougt) from comment #11)
> can someone summarize the issues please?

- Bad JS benchmark results imply bad press.
- Bad GC settings might cause a bad UX, which implies bad press.

The question is how we determine that these GC settings (comment 3) are better than the ones we had before. It is exactly to address the second aspect that I made the Snappy benchmark (comment 2), to balance the Octane benchmark results, which only address the first aspect of the problem.

> I see that we can win 18% on
> snappy and 23% on octane - but at what costs? (meminfo, about:memory
> snapshots, per-app rss, etc.)

I will collect this information and attach it to this bug, but as there are no high-frequency GCs, the data are extremely likely to be similar, modulo noise. Is there a clear procedure for running these tests which makes them comparable? Is that automated somewhere?

> ftr, we are deathly scared of taking patches that regress memory usage. and
> if you plan on uplifting this to b2g18... we all better damn sure that this
> doesn't cause any more ooms or segfaults.

The current suggestion will not change anything for applications (based on comment 8), which do not allocate JS objects extremely fast the way the Octane benchmark does.
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> I will collect this information and attach it to this bug, but as there is
> no high-frequency GCs, the data are extremely likely to be similar, modulo
> the noise.

I went through the following process, taking an about:memory profile a few seconds after each of the following steps finished loading:
- re-start the phone.
- start the calendar app.
- start the clock app.
- start the cristal skull app & press the top right-ish part of the screen.
- start the PipesPuzzle market-place app.
- start the browser & go to http://people.mozilla.org/~npierron/octane (through a recently visited page)
- start the email app.
- start the contact app.
- start the phone app.
- start the sms app.
- close all apps.
- go to the everything.me view, come back, and start the settings app.

I accidentally erased the email & browser profiles while taking the recent ones. What I notice is that, as expected, the memory usage is *highly similar*, as none of the settings change the behavior of these apps.

http://people.mozilla.org/~npierron/unagi-gc-data/app-startup-mem.tar.bz2
> This just means that if there is no other GC, such as one triggered by a timer, or by a
> MEM_PRESSURE event, then, and only in such cases, the application will be killed by the Kernel when
> it attempts to over-commit.

But we only got into the high-frequency GC regime because the page was allocating quickly, right? In that case, it's likely that memory pressure won't be dispatched quickly enough (although changes to the JS engine could improve this), and it's likely that we'll miss the timer.

> How do you define the cap?

Like I said, it seems that tuning this heuristic is particularly nasty because it has an additional degree of freedom. But perhaps we could say "we can afford to spend X MB of memory on garbage in this process". We know on our 256MB phones that, on a good day, an app gets 100MB of RSS. Offhand and completely unscientifically, I might be OK spending 10% of that on garbage. We could then tune the heap-growth-factor sequence around whatever value we chose.

> What I can notice, is that as expected, the memory usage is *highly similar* as none of the
> settings are changing the behavior of these apps.

Yeah, that shouldn't surprise any of us.
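A minimal sketch of the garbage-budget idea suggested above. All numbers are hypothetical: the 10 MB budget is the "10% of 100 MB RSS" figure from the comment, and the clamp range is the growth span discussed in this bug; this is an illustration, not SpiderMonkey's heuristic:

```python
GARBAGE_BUDGET_MB = 10          # hypothetical: ~10% of a 100 MB app budget
FACTOR_MIN, FACTOR_MAX = 1.2, 3.0  # clamp to the growth range discussed here

def growth_factor(live_mb):
    """Grow the heap so that at most GARBAGE_BUDGET_MB of garbage can
    accumulate before the next GC triggers, clamped to a sane range.
    Small live heaps grow fast; big live heaps grow slowly."""
    if live_mb <= 0:
        return FACTOR_MAX
    return max(FACTOR_MIN, min(FACTOR_MAX, 1 + GARBAGE_BUDGET_MB / live_mb))

print(growth_factor(10))   # 10 MB live + 10 MB garbage budget → factor 2.0
```

The appeal of this framing is that the extra degree of freedom (the factor sequence) collapses into a single tunable number, the garbage budget, which maps directly onto the OOM question.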
(In reply to Justin Lebar [:jlebar] from comment #13)
> jld and pnkfelix suggested on IRC that perhaps the heap growth factor could
> decrease as the size of the heap increased. [...] This is apparently known
> as an "inverse load factor". There's apparently a whole horror story about
> tuning an inverse load factor starting at bug 619885 comment 11.

Just to keep terminology clear: the term "inverse load factor" corresponds, I think, to what the people here call the "heap growth factor", i.e. the ratio that nbp has been playing with, varying it from "1.5x" to "3x". (There's nothing special about the word "inverse" here; if you were to look at e.g. a hash table, where load factor == #entries/#buckets, then the inverse load factor would be #buckets/#entries; it's the same thing here.) The term "heap growth factor" is probably friendlier.

The MMgc team at Adobe tried making the inverse load factor itself shrink as the heap grew (i.e. imposing an inverse relationship on the two quantities; so we now have two independent occurrences of the word "inverse", with much potential for confusion). I am not sure varying the factor in this way was a good idea. Or at least, it's not clear that it really helps any more than the arguably simpler approach of increasing the initial heap size and putting a bound on the max heap size (or just using a fixed- rather than dynamically-sized heap, which is what I think you get if you take the approach to its logical limit). I don't know, I waver here; I guess it allows one to explore the continuum between the two extremes of (1) a fixed heap growth factor versus (2) a fixed-size heap, and that freedom might have value.

One other detail: in principle, all Flash memory allocation was supposed to be accounted for by MMgc (regardless of whether the memory itself was managed by MMgc).
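One hypothetical shape for such an MMgc-style schedule, where the growth factor decays toward a limiting value as the heap grows (all parameters here are illustrative, not Adobe's actual values):

```python
def shrinking_factor(heap_mb, start=2.0, limit=1.25, scale_mb=32):
    """Inverse relationship between heap size and growth factor:
    an empty heap grows by `start`, and the factor decays toward
    `limit` as the heap gets large, with `scale_mb` setting how fast."""
    return limit + (start - limit) * scale_mb / (scale_mb + heap_mb)

print(shrinking_factor(0))    # empty heap → 2.0
print(shrinking_factor(32))   # at the scale point → 1.625
```

The "horror story" in the next comment is about where `limit` ends up: the schedule itself is harmless, but the limiting value pins the worst-case mark/cons ratio for large heaps.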
I'm inferring from the conversation that the SpiderMonkey GC does not necessarily have all of that information, and that lack of info forms a large part of jlebar's worry (and I understand such worry). So this is one of many ways in which the experience from the Flash runtime does not directly apply here.

----

Oh, finally: a two-paragraph summary of the horror story, so you don't have to wade through the thread. Bug 619885 is talking about a problem related to making the heap growth factor too small (on the order of 1.05x, which means you get a mark/cons ratio of 1/(1.05-1) == 20, so that your GC has a goal of doing 20 words of tracing for every 1 word allocated (yikes)). That was a side effect of making the inverse load factor itself shrink as the heap grew, *plus* carelessness in choosing what the limiting value in the telescoping series would be.

The Tamarin team, after discovering the problem, got around it by changing the limiting value to 1.25 instead of 1.05; a growth factor of 1.25 implies a mark/cons ratio of 1/(1.25 - 1) = 4 words traced for each word allocated, rather than 20, so the time spent in mostly wasted GC cycles went down by 5x in that case. Most of the analysis I attached to Bug 619885 was an attempt to justify the extra memory usage that resulted from changing the limiting value.
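The mark/cons arithmetic above can be written down directly: for an incremental collector that must finish tracing the live heap before the heap grows by the given factor, the required tracing-per-allocation ratio is 1/(factor - 1):

```python
def mark_cons_ratio(growth_factor):
    """Words the GC must trace per word allocated, for a collector that
    must finish marking before the heap grows by `growth_factor`.
    The closer the factor is to 1, the harder the GC has to run."""
    return 1 / (growth_factor - 1)

print(mark_cons_ratio(1.25))  # the Tamarin fix: 4 words traced per word allocated
```

This is why the limiting value matters so much: moving it from 1.05 to 1.25 takes the tracing obligation from 20 down to 4, the 5x reduction described above.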
> To be clear, to prevent additional OOMs, it seems that we should respect the following rule, which
> ensures that the high-frequency GCs trigger more GCs before reaching the end of memory:
>
> mem.max >= growth_max * high_limit [low_limit = 0]
>
> when low_limit = 0, the maximal allocation is reached with lastBytes = high_limit.

I don't understand what you mean.
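For readers following along, here is one possible reading of the quoted rule as a concrete check (the 100 MB per-app budget is hypothetical; the inequality itself is quoted from the previous comment):

```python
MEM_MAX_MB = 100  # hypothetical per-app budget on a 256 MB phone

def settings_are_oom_safe(growth_max, high_limit_mb, mem_max_mb=MEM_MAX_MB):
    """The quoted rule: even the worst case, where the maximal growth
    factor is applied to a heap at the high limit (with low_limit = 0),
    must still trigger another GC before memory is exhausted."""
    return mem_max_mb >= growth_max * high_limit_mb

print(settings_are_oom_safe(3.0, 30))  # 3.0 * 30 MB = 90 MB <= 100 MB
```

Under this reading, growth_max = 3.0 is only OOM-safe if the high limit stays small enough that tripling a heap of that size still fits in the budget.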
Naveed suggested giving more data around the current settings:

octane-prefs-150-120-40-0-30-1500.log:18:Score: 277
octane-prefs-200-120-40-0-30-1500.log:18:Score: 329
octane-prefs-300-120-40-0-30-1500.log:18:Score: 370
octane-prefs-300-120-30-0-30-1500.log:18:Score: 355

snappy-prefs-200-120-40-0-30-1500.log:5:357634.96129706554
snappy-prefs-250-120-40-0-30-1500.log:5:317950.21813542163
snappy-prefs-300-120-40-0-30-1500.log:5:171201.53960104787
snappy-prefs-300-120-30-0-30-1500.log:5:208373.5486512545

But I think the proof in comment 21 is enough to claim that this does not cause the OOMs we might have feared.
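The file names above encode the six pref values being swept (which pref each position maps to is not spelled out in this bug). For anyone redoing the analysis, a tiny parser for these grep lines:

```python
import re

def parse_result(line):
    """Split a sweep log line like
    'octane-prefs-300-120-40-0-30-1500.log:18:Score: 370'
    into (benchmark, pref tuple, score). The snappy lines have no
    'Score: ' prefix, so it is optional."""
    m = re.match(r"(\w+)-prefs-([\d-]+)\.log:\d+:(?:Score: )?([\d.]+)", line)
    bench, prefs, score = m.groups()
    return bench, tuple(int(p) for p in prefs.split("-")), float(score)

print(parse_result("octane-prefs-300-120-40-0-30-1500.log:18:Score: 370"))
```

Grouping the parsed tuples by pref vector makes it easy to compare Octane and Snappy results for the same settings side by side.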
Comment 21 is the kind of analysis I was looking for -- it shows that the changes here only have an effect when the heap size is under 50mb. That's not perfect, but it's a lot better than tripling a 40mb heap to a 120mb heap, which is what we'd understood these changes to be effecting. I withdraw any objection to this. We'll keep an eye on things.
I made a push to try with the corresponding modifications, as well as an additional modification dedicated to highlighting high-frequency GCs in the logcat. Marcia, can someone from QA experiment for a while with this version (it should appear in pvtbuild soon), which is built on top of the latest m-c changes (as of writing)? The interesting point would be to notice whether there are new critical pauses caused by high-frequency GCs. In case of such pauses, a line should appear in the logcat which has "High Frequency: true" in it. There is a known case where MEM_PRESSURE GCs appear as high-frequency GCs; these should not matter here, as they are executed on background applications.

$ adb logcat | grep "High Frequency: true" | grep -v "Reason: MEM_PRESSURE"

If there is an application which has a new noticeable pause which can be correlated to such a message, then I will spawn another build with the same dump logic on top of the current settings, to see whether this is a regression.

https://tbpl.mozilla.org/?tree=Try&rev=bf9af238e343
(In reply to Nicolas B. Pierron [:nbp] from comment #24)
> I made a push to try with the corresponding modifications as well as
> additional modification dedicated to highlight high-frequency GC in the
> logcat.

I've pushed to try again because the pvtbuild directories have been flushed since I pushed.

[1-new] https://tbpl.mozilla.org/?tree=Try&rev=2b2c2a3c8f31
Haven't had a chance to spend much time with the build yet due to Bug 847592 - I hope to be able to do some further testing on this build when the investigation on that bug is complete.
Nicolas: I had to reflash one of my devices, so I had to blow away the build that you had posted in Comment 25. Can you generate another try build? (In reply to Nicolas B. Pierron [:nbp] from comment #25)
(In reply to Marcia Knous [:marcia] from comment #27)
> Can you generate another try build?

https://tbpl.mozilla.org/?tree=Try&rev=f821cd0f97d7

This is the same version as the previous one. The failure of Mochitest-9 is related to the spew which is produced for the GC. I don't know how the reftest-6 failures are related to the test branch. I will try to rebase it later.
Any news from the QA side, can this change land without introducing noticeable regressions?
(In reply to Nicolas B. Pierron [:nbp] from comment #29)
> Any news from the QA side, can this change land without introducing
> noticeable regressions?

Sadly I do not have much time to test, with all the other things going on related to finishing 1.0.1 and now the 1.1 work. I still have the build if you want me to give it a quick run.
During the QA roundtable this was discussed, and due to limited resources we did not want to commit to this as high priority testing, but adding qawanted so someone can pick it up when they have time. A question came up as to whether this was intended to be landed for 1.1.
(In reply to Marcia Knous [:marcia] from comment #31)
> […] A question came up as to
> whether this was intended to be landed for 1.1.

I did not measure any performance aspect of it on top of b2g18. It is likely similar, as we are mostly GC-ing because of object allocations, and less because of the allocation of Ion-compiled code. If Justin thinks this might be of interest to prevent bad press, I can give it a try with the same settings on top of b2g18.
I think whether we want this on b2g18 is a question for release drivers; I really have no idea. If you attach a patch and ask for approval-b2g18, that will get that discussion going.
Created attachment 765486 [details] [diff] [review] Change GC preferences for octane performances.

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Potentially more GCs during games which are JS-intensive with a small amount of live memory.
Testing completed: waiting for QA, which is waiting on this approval.
Risk to taking this patch (and alternatives if risky): This patch reduces the number of GCs before an OOM, but should still keep the same number of GCs once a large amount of live memory is present. This may slightly increase memory fragmentation.
String or UUID changes made by this patch: N/A
Assuming we're not going to get any help from QA here, we don't have a lot of good options. The more I think about this, I think our best option at this point may be to

* Ask product whether they want this for 1.1, and
* Land immediately on trunk.

This bears some risk that we'll regress something on trunk and not notice, because very few people are dogfooding trunk. Then when we switch to 1.2, we'll have to figure out what's wrong, which will be much more expensive than figuring out now that something is wrong. But I don't think it's right to hold back engineering because our QA org is unable to test when asked; the result of that would just be that engineers never ask QA to test anything. Chris, who's the right person in product to make this call?
Comment on attachment 765486 [details] [diff] [review] Change GC preferences for octane performances. I'd feel more comfortable if you got a JS peer to sign off on this, but f=me.
Comment on attachment 765486 [details] [diff] [review] Change GC preferences for octane performances. Review of attachment 765486 [details] [diff] [review]: ----------------------------------------------------------------- This patch is pretty low risk. The change only applies in high allocation situations and we only change the behavior for a small heap. Looks good!
FxOS is currently down on Are We Fast Yet; I'll fix it before landing this patch.
(In reply to Nicolas B. Pierron [:nbp] from comment #38) > FxOS is currently down on Are we fast yet, I'll fix it before landing this > patch. AWFY is now fixed. http://hg.mozilla.org/integration/mozilla-inbound/rev/cc255b9797b7
(In reply to Nicolas B. Pierron [:nbp] from comment #39) > http://hg.mozilla.org/integration/mozilla-inbound/rev/cc255b9797b7 AWFY results are: - sunspider: 14.6% (4017 ms -> 3429 ms) - kraken: 9.1% (47740 ms -> 43394 ms) - octane: 38.1% (328 v8p -> 453 v8p)
Comment on attachment 765486 [details] [diff] [review] Change GC preferences for octane performances. At this point we are no longer doing approvals to b2g18 branch unless the bug is a security issue - all other bugs must request leo+ status to be uplifted. If there's a partner champion for this issue, they can push for it to be a blocker otherwise this will have to ride the trains and be in v1.2.
Backed out for causing bug 887519. https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d
(In reply to David Zbarsky (:dzbarsky) from comment #43) > Backed out for causing bug 887519. > > https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d Backout merged: https://hg.mozilla.org/mozilla-central/rev/6c0642c2904d
(In reply to David Zbarsky (:dzbarsky) from comment #43)
> Backed out for causing bug 887519.
>
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6c0642c2904d

Thanks, this was needed, as the previous config would have caused the GC to always multiply the heap size by 3 in case of high-frequency GCs, which would have introduced the risk of causing more OOMs in games which generate a lot of garbage fast. Checking the graph while ignoring the cases where either M_min or M_high is 0 shows that the results are not as appealing, but they are still better than the current GC settings. I think we can still land the same changes for now, as the surrounding data (without 0 prefs) are still good. The safest move would be to replace 0 with 10, as we have data on it. I will have to do more runs to make sure we are not making any mistakes here.
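For context, the heuristic being tuned interpolates the high-frequency growth factor between G_max for small heaps and G_min for large ones. A simplified sketch from memory (the real logic lives in SpiderMonkey and differs in detail; the parameter values are the ones discussed in this bug, and it does not model the degenerate both-limits-zero configuration that produced the "always multiply by 3" behavior):

```python
def high_freq_growth(last_bytes_mb, g_min=1.2, g_max=3.0,
                     low_mb=10, high_mb=40):
    """High-frequency heap growth factor: small heaps grow aggressively,
    large heaps conservatively, with a linear ramp in between."""
    if last_bytes_mb <= low_mb:
        return g_max
    if last_bytes_mb >= high_mb:
        return g_min
    frac = (last_bytes_mb - low_mb) / (high_mb - low_mb)
    return g_max - frac * (g_max - g_min)

print(high_freq_growth(5))   # small heap, below the low limit → 3.0
print(high_freq_growth(50))  # large heap, above the high limit → 1.2
```

The ramp is what keeps the aggressive G_max growth confined to small heaps; flattening it so G_max applies at every heap size is what made the backed-out configuration risky.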
After checking for more than a week on the same old version (with a backported patch for Bug 887125) and refining the results based on a reasonable set of settings (reasonable meaning that we cannot afford a NextGC_max above 50 MB; see comment 23), it seems that keeping G_max = 3.00 and G_min = 1.20 provides the most interesting results for both benchmarks. Among these results, high values of the "high frequency time limit" such as 2000ms are good candidates for Octane, which is interested in throughput. Even though they also seem to provide a few good results for Snappy, I had to filter them out, because with such high values the Snappy benchmark is extremely noisy.

I suggest landing the current GC settings again, as it appears that M_low equal to 0 is not a terrible choice after all (as long as Bug 887125 is landed). I will try to improve the Snappy benchmark to make it less sensitive to noise, and then run these benchmarks again to see if there is any better choice. bbouvier suggested using the median and IQR instead of the mean and variance, but the IQR is a bad choice to account for GC pauses, which are spontaneous events. The problem with taking the IQR is that it does not integrate all the pauses, and so it does not account for the smoothness of the application, where one pause can have a dramatic effect from the user's perspective.

http://en.wikipedia.org/wiki/Interquartile_range
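To illustrate the objection to the median/IQR suggestion with hypothetical numbers: a single dramatic pause barely moves the IQR, while the mean (and the user) feels it:

```python
import statistics

pauses_ms = [5, 6, 5, 7, 6, 5, 200]  # hypothetical: one dramatic pause

def iqr(data):
    """Interquartile range: Q3 - Q1 (exclusive quartiles)."""
    q = statistics.quantiles(data, n=4)
    return q[2] - q[0]

# The single 200 ms pause dominates the mean but is invisible to the IQR:
print(statistics.mean(pauses_ms))  # ~33.4
print(iqr(pauses_ms))              # 2.0
```

This is the point about spontaneous events: a robust statistic that discards outliers is discarding exactly the pauses that determine perceived smoothness.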
Going forward, Ravi Dandu (firstname.lastname@example.org) is the product manager for FxOS Performance.
Is this the reason why Firefox runs GC so aggressively? Because it's tuned for phones? On my 16 GB rig, where I don't care about memory usage, it's running CC every 6 seconds and GC every 6 or 12 seconds, and every time GC runs the app stutters, which is especially annoying when scrolling or playing HTML5 content. To the author: how did you manage to disable GC? I would love a GC-less Firefox.
No, Firefox OS / B2G uses different settings from desktop Firefox. There *is* a bug in Firefox 33 with OMTC that causes long GC slices even while animations are happening: bug 1084672. This should be fixed in Firefox 34. A few other problems were also identified; look at the dependent bugs of bug 1083418. Some of these have been fixed, but as the fixes are more involved they will be in Firefox 36. In general, the intent is for GC to happen frequently in small bursts. Inhibiting GC could easily exhaust even your 16 GiB in seconds for some workloads.
FWIW, I've done some experiments where I disabled free() (i.e. never freeing heap allocations), and after opening 50 different sites, Firefox's memory usage was over 25 GiB. (The machine I tested on has 32 GiB of RAM.) Disabling free() isn't the same as disabling GC, but I expect the two things to have effects within the same order of magnitude. A GC-less Firefox would be a bad idea.
(In reply to Chris from comment #50)
> is this the reason why firefox so aggressively runs GC? because its tuned
> for phones?

Yes, as you can read in this thread. Bug 898556 is open to address this issue, but it has not yet become a priority, as we are not targeting phones with 4GB of RAM, but rather phones with 128MB of zRAM. Why are you interested in 16 GB? Is this for a custom build of Firefox OS? In that case I can suggest using the Firefox desktop GC settings; otherwise, if the intent is to make a product, then we should properly fix Bug 898556 and never question the GC settings again.
(In reply to Chris from comment #50)
> is this the reason why firefox so aggressively runs GC? because its tuned
> for phones?

Note, the GC tuning above was only made for Firefox OS; Firefox desktop still has the old GC settings. If you are looking for a way to customize your GC settings yourself, then I would simply say: please don't; there are so many ways to screw things up and make your Firefox even worse! If you are not convinced, please take the time to understand the ins and outs of it, by following similar steps to those reported above.
My problems reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1101111 are on the default GC settings. The browser is running horribly: every 6 or 12 seconds there is a GC that causes around 100ms of delay, on a very powerful rig. Why won't you let end users turn GC off? We don't all run tiny-RAM machines. By the way, my default GC settings match what is in this thread. Why does the web browser run a GC every 12 seconds when, even worse, the GC doesn't recover any memory and just keeps retrying every 12 seconds?
The changes in this bug did not affect desktop Firefox. Please continue the discussion in the bug you filed.