Bug 655455 - Long GC pauses on netbooks (50s+)
Status: RESOLVED FIXED
[MemShrink:P2]
Keywords: perf
Product: Core
Classification: Components
Component: JavaScript Engine
Version: unspecified
Platform: x86 Windows 7
Importance: -- normal
Target Milestone: ---
Assigned To: Andrew McCreight [:mccr8]
Depends on: 664291
Blocks: GC
Reported: 2011-05-07 00:31 PDT by Gregor Wagner [:gwagner]
Modified: 2011-08-28 18:42 PDT


Attachments
GCTimer output (2.86 KB, text/plain)
2011-05-07 00:31 PDT, Gregor Wagner [:gwagner]

Description Gregor Wagner [:gwagner] 2011-05-07 00:31:34 PDT
Created attachment 530814
GCTimer output

I am testing with my new netbook: single-core Atom at 1.66GHz, 1GB RAM.

We have a 7sec and a 4sec GC pause in there. Looks like it happens during the RegExp benchmark.
Comment 1 Andreas Gal :gal 2011-05-07 00:33:22 PDT
Is this Windows or Linux? I am dying to see a profile.
Comment 2 Gregor Wagner [:gwagner] 2011-05-07 01:05:14 PDT
It's Win 7 Pro. I already see a 50 sec pause during the realtime raytracer. I guess it's time to make allocation take memory pressure into account.
Comment 3 Andreas Gal :gal 2011-05-07 01:10:14 PDT
Gregor just told me we should get a lot more people netbooks so we see this kind of stuff earlier (probably also a good simulation for cell phone use).
Comment 4 Gregor Wagner [:gwagner] 2011-05-07 01:25:12 PDT
I have 1GB of RAM and it seems like the process gets about 300-400MB of it. After that it starts paging and the GC performance gets much worse.
Comment 5 Andrew McCreight [:mccr8] 2011-05-07 07:45:25 PDT
Is there any way to specify the max heap size for Firefox?  It would be useful for performance testing like this, and for bugs that show up when OOM.
Comment 6 Andreas Gal :gal 2011-05-07 09:24:26 PDT
We have a patch lying around that I made 2 years ago that measures the total process heap size and tries to regulate it. Even better would be to measure paging and GC aggressively as soon as we get near that point.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2011-05-07 18:37:06 PDT
A question: how does the timescale for doing that compare to the timescale for doing GGC (generational GC) and just GCing more often, period? Or would we still want changes like this with a GGC?
Comment 8 Johnathan Nightingale [:johnath] 2011-05-24 14:53:49 PDT
Discussed in triage today - we don't think this is Firefox 6 specific, but rather "ASAP" - would love to approve a safe patch!
Comment 9 Robert Sayre 2011-05-24 14:55:38 PDT
dmandelin, can you find an assignee for this?
Comment 10 Andreas Gal :gal 2011-05-24 14:57:29 PDT
Gregor, want to own this?
Comment 11 David Mandelin [:dmandelin] 2011-05-24 15:13:03 PDT
This doesn't seem to be a well-defined bug yet. What do we want here? To avoid the long GC pause on a 1GB netbook? Don't we need a 1GB netbook in order to test this?

Also, do we know the max heap size required by the application? I.e., is the problem that our GC is allocating too much memory before it does a GC, or is the problem that the workload just uses too much memory?
Comment 12 Andreas Gal :gal 2011-05-24 15:15:48 PDT
Gregor bought a 1GB netbook for testing purposes. The problem is that we exceed the available physical RAM and we have to page to/from disk during GC, which is insanely slow. We need better working set size management.
Comment 13 David Mandelin [:dmandelin] 2011-05-24 15:20:29 PDT
(In reply to comment #12)
> Gregor bought a 1GB netbook for testing purposes. 

+1 to Gregor. If he wants to work on this, that's great, but otherwise, we'd need another one.

> The problem is that we
> exceed the available physical RAM and we have to page to/from disk during
> GC, which is insanely slow. We need better working set size management.

Are you saying that it is because we are not GC'ing soon enough? I know we've talked about detecting and responding to memory pressure before, and that you've been a long-time advocate of it. Do you know of any papers on that subject? The review papers/textbooks from the 90s don't seem to have much to say about it. So it sounds like it's going to need to use OS-specific APIs, and also like it's a research project, i.e., don't expect a quick fix. I fully agree that we should do that research at some point, though.

Or is there a quick fix, something like reading out how much memory the machine has, and if < 2GB, reset some tuning parameters?
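
A minimal sketch of that quick fix, assuming a POSIX system where `os.sysconf` reports physical memory (the netbook in this bug runs Windows, where a different API would be needed). The function names, the tuning keys, and every numeric value other than the 2GB cutoff are hypothetical and only illustrate the shape of the idea:

```python
import os

GIB = 1024 ** 3

def physical_memory_bytes():
    """Total physical RAM in bytes (POSIX sysconf; works on Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")

def choose_gc_tuning(total_ram_bytes):
    """Pick hypothetical GC tuning values from the amount of physical RAM."""
    if total_ram_bytes < 2 * GIB:
        # Low-memory machine: grow the heap more slowly and cap it
        # (assumed values, not taken from any real engine).
        return {"heap_growth_factor": 1.5,
                "max_heap_bytes": total_ram_bytes // 4}
    # Otherwise keep a faster growth factor and no hard cap.
    return {"heap_growth_factor": 3.0, "max_heap_bytes": None}
```

At startup one would call `choose_gc_tuning(physical_memory_bytes())` once and hand the result to the collector.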
Comment 14 Andreas Gal :gal 2011-05-24 15:24:48 PDT
All of the above sounds reasonable. I am all for a quick and dirty hack to make GCs more aggressive depending on physical memory size, and we should work on a more thorough approach that measures the working set and tries to balance it. I don't think academia talks much about this; it's very application-dependent.
Comment 15 Andrew McCreight [:mccr8] 2011-05-24 15:37:53 PDT
MLton attempts to adapt its GC based on the size of the working set vs system memory ( http://mlton.org/GarbageCollection ) but I'm not sure if they do anything beyond switching from a Cheney collector to a Mark-Compact collector when space runs low.
Comment 16 Gregor Wagner [:gwagner] 2011-05-24 15:38:52 PDT
I was already suggesting that many more people should get a netbook. I was testing heap growth parameters on it recently and changes that don't show any regression on my super MBP were a major slowdown on the netbook.
It also works the other way around where smaller memory footprint results in a 2x speedup on the netbook and a 1% regression on the MBP.

This work is very time-consuming because compiling the browser on this device doesn't work very well and it's easier to compile on the try-server.
I am pretty busy right now but we should come up with a strategy first and then I can decide if I have enough time to implement it.
Comment 17 Andreas Gal :gal 2011-05-24 15:40:41 PDT
That's what you get for commenting on bugs with smart comments ... ;)
Comment 18 Andreas Gal :gal 2011-05-24 15:41:37 PDT
Andrew, want to ask IT for a netbook? If they don't have one, just pick one up from Best Buy and expense it. I know that CC can be ridiculously slow on machines with limited RAM, too, so I guess we can check out both issues here.
Comment 19 Andrew McCreight [:mccr8] 2011-05-24 15:57:27 PDT
Well, that's basically the entirety of my knowledge of resource-constrained GC.  But I can look around to see what people do.  How does Firefox Mobile deal with this?  The Nexus S only has half a gig of RAM.  But it looks like its secondary storage is flash memory, so maybe thrashing doesn't hit it quite as hard?
Comment 20 Andreas Gal :gal 2011-05-24 15:59:28 PDT
I
Comment 21 Andreas Gal :gal 2011-05-24 16:00:17 PDT
dougt can explain to us what mobile does, but I think basically on mobile you don't look at as many tabs at the same time, and the OS kills you when you start paging. They could certainly use better working set management too.
Comment 22 Colin Walters 2011-05-24 16:15:21 PDT
In 
https://bugzilla.gnome.org/show_bug.cgi?id=640790
(then later)
https://bugzilla.gnome.org/show_bug.cgi?id=643817

I ended up just parsing /proc/self/stat to look at RSS for this sort of thing.  

gjs is in a tricky place though because we're not in a position where we can easily tell SpiderMonkey how much native malloc() allocation we do.
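
A rough sketch of reading RSS from `/proc/self/stat`, assuming Linux. It is not the gjs code from the bugs linked above; the function names are made up for illustration. The field positions (rss is field 24, counted in pages) follow the proc(5) man page:

```python
import os

def parse_rss_pages(stat_line):
    """Return the resident set size, in pages, from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 24 (rss) is at index 24 - 3.
    return int(rest[24 - 3])

def current_rss_bytes():
    """Resident set size of this process in bytes (Linux only)."""
    with open("/proc/self/stat") as f:
        rss_pages = parse_rss_pages(f.read())
    return rss_pages * os.sysconf("SC_PAGE_SIZE")
```

Splitting on the last closing parenthesis matters because a process name such as `my (app)` would otherwise shift every later field.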
Comment 23 Andreas Gal :gal 2011-05-24 16:17:46 PDT
We are using a custom malloc (jemalloc), we might be able to tell easily what the total count is.
Comment 24 Igor Bukanov 2011-05-24 16:43:20 PDT
Any ideas regarding when to monitor the memory pressure? 

In FF our GC also uses jemalloc to allocate its chunks. If jemalloc is extended with a hook invoked when jemalloc calls mmap to allocate another 1MB of memory, then we can do the monitoring from the hook and schedule the GC accordingly.
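
A toy model of that hook idea, purely illustrative. Neither class corresponds to real jemalloc or SpiderMonkey code; `ChunkAllocator`, `GCScheduler`, the hook signature, and the threshold are all hypothetical:

```python
class ChunkAllocator:
    """Toy allocator that hands out fixed-size chunks and calls a hook."""

    CHUNK_BYTES = 1024 * 1024  # 1MB per chunk, as in the comment above

    def __init__(self, on_chunk_alloc):
        self.mapped_bytes = 0
        self.on_chunk_alloc = on_chunk_alloc  # hypothetical hook

    def alloc_chunk(self):
        # Stand-in for the point where a real allocator would call mmap.
        self.mapped_bytes += self.CHUNK_BYTES
        self.on_chunk_alloc(self.mapped_bytes)


class GCScheduler:
    """Requests a GC once mapped memory crosses an assumed threshold."""

    def __init__(self, pressure_threshold_bytes):
        self.pressure_threshold_bytes = pressure_threshold_bytes
        self.gc_requested = False

    def chunk_hook(self, mapped_bytes):
        # Only set a flag here; the collection itself would run later,
        # at a point where the engine can safely GC.
        if mapped_bytes >= self.pressure_threshold_bytes:
            self.gc_requested = True
```

Keeping the hook down to setting a flag avoids doing GC work from inside the allocator, which is one plausible reason to schedule the collection rather than run it directly.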
Comment 25 Nicholas Nethercote [:njn] 2011-05-29 23:40:05 PDT
(In reply to comment #23)
> We are using a custom malloc (jemalloc), we might be able to tell easily
> what the total count is.

We can, see GetHeapUsed() and GetHeapUnused() in xpcom/base/nsMemoryReporterManager.cpp.
Comment 26 Andrew McCreight [:mccr8] 2011-06-10 16:19:53 PDT
Gregor pointed out an interesting ISMM'11 paper on adapting GC triggers based on memory pressure.  They approximate memory pressure using major page faults since the last GC and resident set sizes.  I guess another thing to consider would be page faults that happen during the GC.  I'll look into that.
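
A small sketch of tracking major page faults since the last GC, assuming a Unix system where `resource.getrusage` exposes `ru_majflt`. This is not the algorithm from the paper: the `PressureMonitor` class, the `fault_budget` threshold, and the decision rule are assumptions made for illustration, and resident set size is left out for brevity:

```python
import resource

def major_faults():
    """Cumulative major page faults for this process (Unix only)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_majflt


class PressureMonitor:
    """Flags memory pressure from major faults seen since the last GC."""

    def __init__(self, read_faults=major_faults, fault_budget=100):
        self.read_faults = read_faults
        self.fault_budget = fault_budget  # assumed threshold, needs tuning
        self.faults_at_last_gc = read_faults()

    def faults_since_last_gc(self):
        return self.read_faults() - self.faults_at_last_gc

    def under_pressure(self):
        # Many major faults since the last GC suggest the working set no
        # longer fits in RAM, so the next GC should come sooner.
        return self.faults_since_last_gc() > self.fault_budget

    def note_gc_finished(self):
        # Restart the count so the next interval is measured separately.
        self.faults_at_last_gc = self.read_faults()
```

Sampling the counter before and after a collection in the same way would give the number of faults that happen during the GC itself.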
Comment 27 Nicholas Nethercote [:njn] 2011-06-26 23:51:08 PDT
(In reply to comment #26)
> Gregor pointed out an interesting ISMM'11 paper on adapting GC triggers
> based on memory pressure.  They approximate memory pressure using major page
> faults since the last GC and resident set sizes.  I guess another thing to
> consider would be page faults that happen during the GC.  I'll look into
> that.

jlebar is working on this in bug 664291.
Comment 28 Nicholas Nethercote [:njn] 2011-06-26 23:53:22 PDT
Gregor, does bug 656120 ameliorate or fix this bug?
Comment 29 Andrew McCreight [:mccr8] 2011-06-27 05:15:15 PDT
In that bug he said "Our memory footprint is a big problem and I filed this bug because FF was/is very painful to use or even unusable (see bug 655455) on my new lowest-end netbook. This patch tries to keep the memory footprint small and therefore is a big win on such devices. Finally I can use FF on my new machine."

The heap sizing mechanism is still a bit funny, so we should still consider tweaking it.
Comment 30 Gregor Wagner [:gwagner] 2011-06-27 10:20:47 PDT
(In reply to comment #28)
> Gregor, does bug 656120 ameliorate or fix this bug?

Bug 656120 makes it possible to leave the browser open overnight with some allocating workload or run it in the background with another application in parallel. Yeah :)

This bug is (most likely) caused by a swapping problem during the GC and we haven't fixed it. Bug 664291 seems to be the right solution. Or maybe we need a hard upper JS-heap limit for memory-constrained devices in addition. Right now we completely ignore this information and let the heap grow without limit. Bug 592907 wanted to go in this direction.

We are going in the right direction but we are not close to a point where I would say that it is a pleasure to use FF on my netbook.
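
A hedged sketch of what a hard upper heap limit could look like next to a growth-based trigger. The function, its parameters, and the default growth factor are hypothetical and do not describe the actual heap sizing code:

```python
def next_gc_trigger(live_bytes, growth_factor=3.0, hard_limit_bytes=None):
    """Heap size (in bytes) at which the next GC should start."""
    # Growth-based trigger: let the heap grow to a multiple of what
    # survived the last collection.
    trigger = int(live_bytes * growth_factor)
    if hard_limit_bytes is not None:
        # On memory-constrained devices, never let the trigger exceed the cap.
        trigger = min(trigger, hard_limit_bytes)
    return trigger
```

With 100MB live and no cap, the trigger is 300MB; with a 256MB cap it is clamped to 256MB, so the collector runs before the heap can push the process into swap.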
Comment 31 Nicholas Nethercote [:njn] 2011-08-28 18:42:12 PDT
Gregor, judging from comment 30, the situation has improved since you filed this bug, and we have other bugs filed for all the remaining ideas we have to improve it further.  So I'll close this one, please reopen if you disagree.