655455 - Long GC pauses on netbooks (50s+)

Reporter

Description

•

13 years ago

Attached file GCTimer output

I am testing with my new netbook: single-core Atom at 1.66GHz, 1GB RAM.

We have a 7sec and a 4sec GC pause in there. Looks like it happens during the RegExp benchmark.

Andreas Gal :gal

Comment 1

•

13 years ago

Is this Windows or Linux? I am dying to see a profile.

Gregor Wagner [:gwagner]

Reporter

Comment 2

•

13 years ago

It's Win 7 Professional. I already see a 50 sec pause during the realtime raytracer. I guess it's time to add some memory pressure handling for the allocation.

Andreas Gal :gal

Updated

•

13 years ago

Summary: Long GC pause time during v8 benchmark → Long GC pauses on netbooks (50s+)

Andreas Gal :gal

Comment 3

•

13 years ago

Gregor just told me we should get a lot more people netbooks so we see this kind of stuff earlier (probably also a good simulation for cell phone use).

Andreas Gal :gal

Updated

•

13 years ago

tracking-fennec: --- → ?

tracking-firefox6: --- → ?

Gregor Wagner [:gwagner]

Reporter

Comment 4

•

13 years ago

I have 1GB RAM and it seems like the process gets about 300-400MB of it. After that it starts paging and the GC performance gets exponentially worse.

Andrew McCreight [:mccr8]

Assignee

Comment 5

•

13 years ago

Is there any way to specify the max heap size for Firefox?  It would be useful for performance testing like this, and for bugs that show up when OOM.

Andreas Gal :gal

Comment 6

•

13 years ago

We have a patch lying around that I made 2 years ago; it measures the total process heap size and tries to regulate that. Even better would be to measure paging and GC aggressively as soon as we get near that.

Boris Zbarsky [:bzbarsky]

Comment 7

•

13 years ago

A question:  how does the timescale for doing that compare to the timescale for doing GGC and just GCing more often period?  Or would we still want changes like this with a GGC?

Tom S [:evilpie]

Updated

•

13 years ago

Blocks: 505308

Doug Turner (:dougt)

Updated

•

13 years ago

tracking-fennec: ? → -

Johnathan Nightingale [:johnath]

Comment 8

•

13 years ago

Discussed in triage today - we don't think this is firefox 6 specific, but rather "ASAP" - would love to approve a safe patch!

tracking-firefox6: ? → -

Robert Sayre

Comment 9

•

13 years ago

dmandelin, can you find an assignee for this?

Andreas Gal :gal

Comment 10

•

13 years ago

Gregor, want to own this?

David Mandelin [:dmandelin]

Comment 11

•

13 years ago

This doesn't seem to be a well-defined bug yet. What do we want here? To avoid the long GC pause on a 1GB netbook? Don't we need a 1GB netbook in order to test this?

Also, do we know the max heap size required by the application? I.e., is the problem that our GC is allocating too much memory before it does a GC, or is the problem that the workload just uses too much memory?

Andreas Gal :gal

Comment 12

•

13 years ago

Gregor bought a 1GB netbook for testing purposes. The problem is that we exceed the available physical RAM and we have to page to/from disk during GC, which is insanely slow. We need better working set size management.

David Mandelin [:dmandelin]

Comment 13

•

13 years ago

(In reply to comment #12)
> Gregor bought a 1GB netbook for testing purposes. 

+1 to Gregor. If he wants to work on this, that's great, but otherwise, we'd need another one.

> The problem is that we
> exceed the available physical RAM and we have to page to/from disk during
> GC, which is insanely slow. We need better working set size management.

Are you saying that it is because we are not GC'ing soon enough? I know we've talked about detecting and responding to memory pressure before, and that you've been a long-time advocate of it. Do you know of any papers on that subject? The review papers/textbooks from the 90s don't seem to have much to say about it. So it sounds like it's going to need to use OS-specific APIs, and also like it's a research project, i.e., don't expect a quick fix. I fully agree that we should do that research at some point, though.

Or is there a quick fix, something like reading out how much memory the machine has, and if < 2GB, reset some tuning parameters?
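
A minimal sketch of that quick fix, for illustration only. Only the 2GB cutoff comes from the comment above; `GcTuning`, `pick_gc_tuning`, and every number in the body are hypothetical and are not SpiderMonkey's actual parameters.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class GcTuning:
    # Hypothetical knobs; not SpiderMonkey's real parameter names.
    heap_growth_factor: float  # how far the heap may grow past the live size before the next GC
    max_heap_bytes: int        # hard ceiling on the JS heap


def pick_gc_tuning(physical_ram_bytes: int) -> GcTuning:
    """Choose more aggressive GC settings on machines with little RAM."""
    if physical_ram_bytes < 2 * GIB:
        # Low-memory machine: collect sooner and cap the heap at a
        # fraction of RAM (the one-quarter share is an arbitrary guess).
        return GcTuning(heap_growth_factor=1.5,
                        max_heap_bytes=physical_ram_bytes // 4)
    # Roomier machine: let the heap grow further between collections.
    return GcTuning(heap_growth_factor=3.0,
                    max_heap_bytes=physical_ram_bytes // 2)
```

With these made-up values, a 1GB netbook would get a 256MB heap ceiling and a smaller growth factor than an 8GB machine. Reading the physical RAM size itself is OS-specific and is left out here.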

Andreas Gal :gal

Comment 14

•

13 years ago

All of the above sounds reasonable. I am all for a quick and dirty hack to make GCs more aggressive depending on physical memory size, and we should work on a more thorough approach that measures the working set and tries to balance it. I don't think academia talks much about this; it's very application-dependent.

Andrew McCreight [:mccr8]

Assignee

Comment 15

•

13 years ago

MLton attempts to adapt its GC based on the size of the working set vs system memory ( http://mlton.org/GarbageCollection ) but I'm not sure if they do anything beyond switching from a Cheney collector to a Mark-Compact collector when space runs low.

Gregor Wagner [:gwagner]

Reporter

Comment 16

•

13 years ago

I was already suggesting that many more people should get a netbook. I was testing heap growth parameters on it recently and changes that don't show any regression on my super MBP were a major slowdown on the netbook.
It also works the other way around where smaller memory footprint results in a 2x speedup on the netbook and a 1% regression on the MBP.

This work is very time-consuming because compiling the browser on this device doesn't work very well and it's easier to compile on the try-server.
I am pretty busy right now but we should come up with a strategy first and then I can decide if I have enough time to implement it.

Andreas Gal :gal

Comment 17

•

13 years ago

That's what you get for commenting on bugs with smart comments ... ;)

Assignee: general → continuation

Andreas Gal :gal

Comment 18

•

13 years ago

Andrew, want to ask IT for a netbook? If they don't have one, just pick one up from Best Buy and expense it. I know that CC can be ridiculously slow on machines with limited RAM, too, so I guess we can check out both issues here.

Andrew McCreight [:mccr8]

Assignee

Comment 19

•

13 years ago

Well, that's basically the entirety of my knowledge of resource-constrained GC.  But I can look around to see what people do.  How does Firefox Mobile deal with this?  The Nexus S only has half a gig of RAM.  But it looks like its secondary storage is flash memory, so maybe thrashing doesn't hit it quite as hard?

Andreas Gal :gal

Comment 20

•

13 years ago

Andreas Gal :gal

Comment 21

•

13 years ago

dougt can explain to us what mobile does, but I think basically on mobile you don't look at as many tabs at the same time, and the OS kills you when you start paging. They could certainly use better working set management too.

Colin Walters

Comment 22

•

13 years ago

In 
https://bugzilla.gnome.org/show_bug.cgi?id=640790
(then later)
https://bugzilla.gnome.org/show_bug.cgi?id=643817

I ended up just parsing /proc/self/stat to look at RSS for this sort of thing.  

gjs is in a tricky place though because we're not in a position where we can easily tell SpiderMonkey how much native malloc() allocation we do.
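
For illustration, a rough Python sketch of reading RSS from /proc/self/stat (Linux only). This is not the gjs code from the bugs linked above; the function names are made up here, and the field positions follow the proc(5) layout, where `majflt` is field 12 and `rss` (in pages) is field 24.

```python
import os


def parse_proc_stat(stat_line: str) -> dict:
    """Extract a few fields from a Linux /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')' characters, so split on the last ')'.
    _, _, tail = stat_line.rpartition(")")
    fields = tail.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3.
    return {
        "major_faults": int(fields[12 - 3]),  # majflt
        "rss_pages": int(fields[24 - 3]),     # rss, counted in pages
    }


def current_rss_bytes() -> int:
    """Resident set size of this process in bytes (Linux only)."""
    with open("/proc/self/stat") as f:
        stats = parse_proc_stat(f.read())
    return stats["rss_pages"] * os.sysconf("SC_PAGE_SIZE")
```

Splitting on the last `)` matters because a process name containing spaces or parentheses would otherwise shift every later field.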

Andreas Gal :gal

Comment 23

•

13 years ago

We are using a custom malloc (jemalloc), we might be able to tell easily what the total count is.

Igor Bukanov

Comment 24

•

13 years ago

Any ideas regarding when to monitor the memory pressure? 

In FF our GC also uses jemalloc to allocate its chunks. If jemalloc is extended with a hook invoked when jemalloc calls mmap to allocate another 1MB of memory, then we can do the monitoring from the hook and schedule the GC accordingly.
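
A toy model of that idea, to make the control flow concrete. Nothing here is jemalloc's API: the comment proposes extending jemalloc with such a hook, so `ChunkAllocator`, `on_new_chunk`, and `GcScheduler` are hypothetical stand-ins, and the fixed budget replaces whatever real pressure measurement the hook would take.

```python
from typing import Callable

CHUNK_BYTES = 1024 * 1024  # the 1MB chunk size mentioned in the comment


class ChunkAllocator:
    """Toy stand-in for an allocator that maps memory in 1MB chunks."""

    def __init__(self, on_new_chunk: Callable[[int], None]):
        self.mapped_bytes = 0
        self._on_new_chunk = on_new_chunk  # hypothetical hook

    def allocate_chunk(self) -> None:
        self.mapped_bytes += CHUNK_BYTES
        # Call the hook every time the allocator grows by one chunk.
        self._on_new_chunk(self.mapped_bytes)


class GcScheduler:
    """Requests a GC once mapped memory crosses a budget."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.gc_requested = False

    def chunk_hook(self, mapped_bytes: int) -> None:
        # A real hook would sample memory pressure here (RSS, page faults);
        # this sketch only compares against a fixed budget.
        if mapped_bytes > self.budget_bytes:
            self.gc_requested = True
```

The point of hooking chunk allocation is that monitoring runs only when the process actually grows, rather than on a timer.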

Nicholas Nethercote [inactive]

Comment 25

•

13 years ago

(In reply to comment #23)
> We are using a custom malloc (jemalloc), we might be able to tell easily
> what the total count is.

We can, see GetHeapUsed() and GetHeapUnused() in xpcom/base/nsMemoryReporterManager.cpp.

Wayne Mery (:wsmwk)

Updated

•

13 years ago

Keywords: perf

Andrew McCreight [:mccr8]

Assignee

Comment 26

•

13 years ago

Gregor pointed out an interesting ISMM'11 paper on adapting GC triggers based on memory pressure.  They approximate memory pressure using major page faults since the last GC and resident set sizes.  I guess another thing to consider would be page faults that happen during the GC.  I'll look into that.
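
A rough sketch of what a fault-driven trigger could look like. This is not the paper's algorithm: `PressureAwareTrigger` and its halve-on-faults, grow-by-25% policy are assumptions made up for illustration, and it uses only the major page fault count, not resident set size.

```python
MB = 1024 * 1024


class PressureAwareTrigger:
    """Adjusts the heap size that triggers the next GC from page-fault counts."""

    def __init__(self, threshold_bytes: int, min_bytes: int, max_bytes: int):
        self.threshold_bytes = threshold_bytes
        self.min_bytes = min_bytes
        self.max_bytes = max_bytes
        self._faults_at_last_gc = 0

    def after_gc(self, total_major_faults: int) -> int:
        """Call after each GC with the cumulative major-fault count."""
        new_faults = total_major_faults - self._faults_at_last_gc
        self._faults_at_last_gc = total_major_faults
        if new_faults > 0:
            # Paging happened since the last GC: halve the trigger so the
            # next collection runs sooner (policy is an assumption).
            self.threshold_bytes = max(self.min_bytes,
                                       self.threshold_bytes // 2)
        else:
            # No paging: cautiously let the heap grow by 25%.
            self.threshold_bytes = min(
                self.max_bytes,
                self.threshold_bytes + self.threshold_bytes // 4)
        return self.threshold_bytes
```

For example, `PressureAwareTrigger(64 * MB, 16 * MB, 256 * MB)` would raise its threshold to 80MB after a fault-free cycle and drop it to 40MB after a cycle that saw new major faults.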

Andrew McCreight [:mccr8]

Assignee

Updated

•

13 years ago

Depends on: 664291

Nicholas Nethercote [inactive]

Comment 27

•

13 years ago

(In reply to comment #26)
> Gregor pointed out an interesting ISMM'11 paper on adapting GC triggers
> based on memory pressure.  They approximate memory pressure using major page
> faults since the last GC and resident set sizes.  I guess another thing to
> consider would be page faults that happen during the GC.  I'll look into
> that.

jlebar is working on this in bug 664291.

Nicholas Nethercote [inactive]

Comment 28

•

13 years ago

Gregor, does bug 656120 ameliorate or fix this bug?

Andrew McCreight [:mccr8]

Assignee

Comment 29

•

13 years ago

In that bug he said "Our memory footprint is a big problem and I filed this bug because FF was/is very painful to use or even unusable (see bug 655455) on my new lowest-end netbook. This patch tries to keep the memory footprint small and therefore is a big win on such devices. Finally I can use FF on my new machine."

The heap sizing mechanism still is a bit funny, so we should still consider tweaking it.

Gregor Wagner [:gwagner]

Reporter

Comment 30

•

13 years ago

(In reply to comment #28)
> Gregor, does bug 656120 ameliorate or fix this bug?

Bug 656120 makes it possible to leave the browser open overnight with some allocating workload or run it in the background with another application in parallel. Yeah :)

This bug is (most likely) caused by a swapping problem during the GC and we haven't fixed it. Bug 664291 seems to be the right solution. Or maybe we need a hard upper JS-heap limit for memory-constrained devices in addition. Right now we completely ignore this information and let the heap grow without limit. Bug 592907 wanted to go in this direction.

We are going in the right direction but we are not close to a point where I would say that it is a pleasure to use FF on my netbook.

Nicholas Nethercote [inactive]

Updated

•

13 years ago

Whiteboard: [MemShrink:P2]

Nicholas Nethercote [inactive]

Comment 31

•

13 years ago

Gregor, judging from comment 30, the situation has improved since you filed this bug, and we have other bugs filed for all the remaining ideas we have to improve it further.  So I'll close this one; please reopen if you disagree.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED