699668 - long garbage collector sweep times due to LifoAlloc destructor in JSCompartment::sweep

Reporter

Description

•

13 years ago

On my browser, sweep times are longer than Bill said is expected. Sometimes they are longer than mark times. Some are bad like this: GC(T+938.9) Type:Glob, Total:233.3, Wait:1.3, Mark:84.0, Sweep:147.0, FinObj:2.1, FinStr:0.4, FinScr:0.7, FinShp:2.3, DisCod:2.4, DisAnl:134.2, XPCnct:1.3, Destry:0.0, End:2.3, +Chu:0, -Chu:0, Reason: API GC(T+1161.0) Type:Glob, Total:273.6, Wait:10.0, Mark:87.9, Sweep:139.9, FinObj:4.1, FinStr:0.6, FinScr:2.8, FinShp:30.2, DisCod:2.3, DisAnl:94.0, XPCnct:1.7, Destry:0.3, End:37.3, +Chu:0, -Chu:0, Reason: API Some are not as bad: GC(T+808.4) Type:Glob, Total:192.5, Wait:0.1, Mark:86.4, Sweep:57.8, FinObj:2.2, FinStr:0.4, FinScr:0.7, FinShp:2.3, DisCod:1.9, DisAnl:45.2, XPCnct:1.3, Destry:0.0, End:49.4, +Chu:0, -Chu:3, Reason: API GC(T+1165.1) Type:Glob, Total:133.0, Wait:0.1, Mark:90.0, Sweep:42.6, FinObj:0.9, FinStr:0.1, FinScr:0.7, FinShp:2.3, DisCod:1.8, DisAnl:31.7, XPCnct:1.2, Destry:0.1, End:1.3, +Chu:0, -Chu:0, Reason: API GC(T+1190.0) Type:Glob, Total:115.8, Wait:0.1, Mark:86.0, Sweep:29.0, FinObj:2.0, FinStr:0.3, FinScr:0.6, FinShp:2.2, DisCod:1.7, DisAnl:17.2, XPCnct:1.4, Destry:0.0, End:1.8, +Chu:0, -Chu:9, Reason: API I don't know if this is relevant or not, but the sweep times seemed worse with TI disabled. This is with gmail, twitter, zimbra, facebook, bugzilla, and a few other misc sites, with a little light browsing, like CNN. Addons: about:telemetry 0.6 true Add-on Compatibility Reporter 0.9.2 true Flashblock 1.5.15.1 true pdf.js PDF Reader 0.1 true Places Maintenance 1.0 true Test Pilot 1.2 true Mozilla Labs: Prospector - AwesomeBar HD 14 false

Gregor Wagner [:gwagner]

Comment 1

•

13 years ago

What is DisAnl?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 2

•

13 years ago

That's added in bug 699298. It times part of JSCompartment::sweep (the part that discards TI analysis data). I guess it would be nice to allow long names in the error console and short names in the MOZ_GCTIMER column headings.

Gregor Wagner [:gwagner]

Comment 3

•

13 years ago

(In reply to Bill McCloskey (:billm) from comment #2) > That's added in bug 699298. It times part of JSCompartment::sweep (the part > that discards TI analysis data). > Oh ok the one I should review. Will do it tomorrow. > I guess it would be nice to allow long names in the error console and short > names in the MOZ_GCTIMER column headings. That's the place where we point people with long GC pauses right? I guess it would save us some time if we make it more verbose. What's so expensive in the DisAnl part? Do we touch every reachable object?

Brian Hackett [Laid off!]

Comment 4

•

13 years ago

The DisAnl part is sweeping type information in scripts and type objects. The times here are way higher than I saw back in July, and the DisAnl times when TI is disabled should be about zero, so something weird seems to be going on.

Andrew McCreight [:mccr8]

Reporter

Comment 5

•

13 years ago

My home computer (Win7) doesn't seem to be showing this problem. I don't have as much stuff open, but it is synced with my office computer.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 6

•

13 years ago

We tracked this down to the LifoAlloc destructor in JSCompartment::sweep. TI allocates memory from the LifoAlloc in 4K chunks. A bunch of them get freed in the destructor. We increased the block size from 4K to 1MB and the sweep times went back to normal. So I think we need to find a better strategy for sizing the blocks. Maybe some kind of doubling or something. Nick, Chris said that he had talked about this with you in the LifoAlloc bugs. Could you guys figure out a solution? We need something that gets rid of these >100ms pauses but doesn't bloat us. And if we do use a little more memory, we need to make sure that the slop is reported in about:memory.

Andrew McCreight [:mccr8]

Reporter

Updated

•

13 years ago

Blocks: 684039

status-firefox10: --- → affected

Andrew McCreight [:mccr8]

Reporter

Updated

•

13 years ago

Summary: long garbage collector sweep times → long garbage collector sweep times due to LifoAlloc destructor in JSCompartment::sweep

Nicholas Nethercote [inactive]

Comment 7

•

13 years ago

I introduced LifoAlloc::sizeOf() in bug 684800 which accounts for unused space and space used by headers, which LifoAlloc::used() doesn't, and I converted the typeLifoAlloc to use sizeOf(). (I probably should have removed used(), too.) So the about:memory side is covered. The simplest thing would be to try a range of different sizes for TYPE_LIFO_ALLOC_PRIMARY_CHUNK_SIZE. 4KB is pretty tiny given that "analysis-temporary" is very often multiple MB. 1MB sounds a little high... maybe something like 128KB or 256KB reduces the sweep times acceptably?

Randell Jesup [:jesup] (needinfo me)

Updated

•

13 years ago

Keywords: perf

Andrew McCreight [:mccr8]

Reporter

Comment 8

•

13 years ago

Is anybody looking into this? I'm currently seeing sweep times that average 250ms on my browser (vs 97ms for marking).

Nicholas Nethercote [inactive]

Comment 9

•

13 years ago

mccr8, since you are seeing the problem manifest, can you try changing TYPE_LIFO_ALLOC_PRIMARY_CHUNK_SIZE to 128KB and see if that helps?

Andrew McCreight [:mccr8]

Reporter

Comment 10

•

13 years ago

Will do.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 11

•

13 years ago

Why don't we do some kind of exponential growth here? We could use a small growth factor to ensure we don't waste too much space. It would be much more general than picking a particular size and crossing our fingers.

Andrew McCreight [:mccr8]

Reporter

Comment 12

•

13 years ago

256KB is works fine, with mark times averaging 90ms and sweep times of 26ms. I thought you said 256 instead of 128, Nick, for some reason. I'll try 128KB now...

Andrew McCreight [:mccr8]

Reporter

Comment 13

•

13 years ago

At 128KB, I have 175ms mark times and 48ms sweep times. Longer mark times are probably just different usage.

Nicholas Nethercote [inactive]

Comment 14

•

13 years ago

Is 26ms reasonable for sweep times? If so, we could just up to 256KB and land. If not, we could think about the exponential growth thing, though that would require some changes to LifoAlloc.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 15

•

13 years ago

Attached patch simple fix — Details — Splinter Review

Maybe it's better just to fix this before we forget about it.

Assignee: general → wmccloskey

Status: NEW → ASSIGNED

Attachment #574938 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Updated

•

13 years ago

Attachment #574938 - Flags: review?(nnethercote) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 16

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/aa05d607c95e

Target Milestone: --- → mozilla11

Marco Bonardo [:mak]

Comment 17

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/aa05d607c95e

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Marco Bonardo [:mak]

Comment 18

•

13 years ago

tree-management reports a 1.5% TraceMalloc MaxHeap on WinNT5.2 increase from this patch, is this expected effect of enlarging the arenas?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 19

•

13 years ago

No, this was not expected. I backed out the change. https://hg.mozilla.org/integration/mozilla-inbound/rev/dd9749c483ce It kind of makes sense that this would cause a regression. Each compartment has one of these allocators, so in the worst case we're wasting ~256K per compartment. That's horrible. I think we're going to have to go with an exponential growth scheme. Do you have time to do this, Chris? We really do need it. Gavin was seeing GC times of over a second caused by this bug.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Nicholas Nethercote [inactive]

Comment 20

•

13 years ago

I was about to say the following when I collided with Bill's comment: "It's quite plausible. But the allocations are fairly short-lived -- this structure is purged on every GC -- so I'm not too worried about it. The improvements in sweep times are definite, any slowdowns from slightly more memory being used are much murkier in comparison." As long as we are allocating larger Lifo chunks, be they all the same size or exponentially increasing, it's possible that the last chunk we allocate will not get filled up. And an exponential approach is probably worse. With the 256KB chunks we waste at most 256KB; with an exponential scheme... 4 + 8 + 16 + 32 + 64 + 128 + 256 == 508, which means we'll potentially waste more if the size of the Lifo allocations exceed 508KB, which is really common -- they're commonly multiple MB. I'd recommend re-landing the existing patch. The cost of a 256KB, mostly-untouched, short-lived allocation isn't that high. Bill, if you're still uncomfortable, maybe you could do more measurements with 128KB or 64KB chunks, see if they still avoid the long sweep times enough to satisfy you.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 21

•

13 years ago

I'm not sure we (or I) understand the problem enough to re-land the patch. My fear is that there are a bunch of compartments with very little TI data, where nearly all of the 256K is wasted. This might not be happening--maybe if we use any space, we use a lot. But it seems foolish to waste memory if we don't have to. Also, we don't know if the memory is short-lived. Some of the data in the arena is preserved across GCs. If a tiny bit is preserved in each GC, then we're potentially wasting a lot of memory permanently. Finally, an exponential growth scheme doesn't have to use doubling. It just needs to ensure that the next block allocated is some fraction of the total size currently allocated. So that allows us to control the amount of wasted memory as a fraction. Doubling uses 1.0, but we could use 0.25 or something like that. I don't mean to be argumentative here. I just want to make sure we understand the workload before proceeding.

Randell Jesup [:jesup] (needinfo me)

Comment 22

•

13 years ago

I'm tempted to say re-land, and see what difference it makes in other cases. If you want to go with exponential, you don't have to let it grow beyond 256KB - it can grow (at whatever rate) up to a cap.

Nicholas Nethercote [inactive]

Comment 23

•

13 years ago

I was thinking that the typeLifoAlloc was global, but it's per-compartment, so the potential for waste is higher than I thought.

Nicholas Nethercote [inactive]

Comment 24

•

13 years ago

(In reply to Bill McCloskey (:billm) from comment #21) > Doubling uses 1.0, but we could use 0.25 or something > like that. The good thing about doubling is that you avoid slop bytes due to awkward request sizes. (We had one case in the JM assembler where the growth factor was 1.5x and we were wasting heaps of memory, so I changed it to 2x.)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 25

•

13 years ago

I did some measurements with and without the patch. I used a pretty simple workload with 8 tabs (nytimes, gmail, etc.). I let them load and then sit for a while, doing a few GCs, before I measured. So this is steady-state behavior. Without the patch, we waste about 31KB and use 2763 blocks. With the patch we waste about 3MB and use 55 blocks. If I reduce the block size to 128K, then we waste about 1.5MB and use 98 blocks. I played around with a bunch of different allocation schemes, but none of them seem to do much better than 128KB blocks. I had been assuming that most of the space was wasted by very small arenas, but it's actually the compartments with lots of type data that waste the most. So if we re-land, I think we definitely should use 128K blocks. However, is 1.5MB waste for 8 tabs acceptable? One thing I noticed is that the amount of type data per compartment seems to be pretty steady. Could we add some kind of trim command that we could call after a GC? It would realloc the last block to be whatever size it needs to be.

Nicholas Nethercote [inactive]

Comment 26

•

13 years ago

> Without the patch, we waste about 31KB and use 2763 blocks. With the patch > we waste about 3MB and use 55 blocks. If I reduce the block size to 128K, > then we waste about 1.5MB and use 98 blocks. With 256KB blocks: - total = 55 * 256KB = 14080KB - waste = 3072KB - useful = 14080KB - 3072KB = 11008KB --> 78.2% With 128KB blocks: - total = 98 * 128KB = 12544KB - waste = 1536KB - useful = 12544KB - 1536KB = 11008KB --> 87.8% With 4KB blocks: - total mem = 2763 * 4KB = 11052KB - waste = 31KB - useful = 11052KB - 31KB = 11021KB --> 99.7% > I played around with a bunch of different allocation schemes, but none of > them seem to do much better than 128KB blocks. I had been assuming that most > of the space was wasted by very small arenas, but it's actually the > compartments with lots of type data that waste the most. Each typeLifoAlloc will have only one block that is not full, right? So the average waste will should be N * B / 2, where N is the number of compartments and B is the block size. Do you know roughly what N was for your test? Hmm, I should be able to figure it out: if W = N * B / 2, then N = W / B * 2... With 256KB blocks: N = 3072KB / 256KB * 2 = 24.0 With 128KB blocks: N = 1536KB / 128KB * 2 = 24.0 With 4KB blocks: N = 31KB / 4KB * 2 = 15.5 Seems reasonable for 8 tabs. Not sure about that 15.5, but this assumes an average waste of B/2 per compartment so it's approximate anyway. > So if we re-land, I think we definitely should use 128K blocks. However, is > 1.5MB waste for 8 tabs acceptable? So that's average waste of 64KB per compartment. That's the kind of question that's really hard to answer, but everyone looks at me when asking it :/ Given that you have definite slow-downs without the patch, I'd say yes. However, this is the kind of thing that will kill us with compartment-per-global.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 27

•

13 years ago

OK, I just relanded this for 128K. I guess if it starts to be a problem, the waste will show up in about:memory and we can fix it then. https://hg.mozilla.org/integration/mozilla-inbound/rev/487d3d5dc47f

Ed Morley [:emorley]

Comment 28

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/487d3d5dc47f

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 29

•

13 years ago

Comment on attachment 574938 [details] [diff] [review] simple fix This is a fairly serious regression in GC pause time for Mac users. It started happening in Firefox 10, and it was fixed on 11, so I think we should try to get this in beta. It's very unlikely that this patch will make us crash or have any other correctness problems. However, it will cause a small increase in memory usage. This is a trade-off to avoid the long pauses (which may be hundreds of milliseconds).

Attachment #574938 - Flags: approval-mozilla-beta?

Alex Keybl [:akeybl]

Comment 30

•

13 years ago

Comment on attachment 574938 [details] [diff] [review] simple fix [Triage Comment] Given our recent CC/GC investigations and how early we are in the cycle, let's take this on beta.

Attachment #574938 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 31

•

13 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/3003c5a4c3a3

status-firefox10: affected → fixed

u279076

Comment 32

•

13 years ago

Is this something QA can verify without access to Andrew's profile?

Whiteboard: [qa?]

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 33

•

13 years ago

(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #32) > Is this something QA can verify without access to Andrew's profile? No.

u279076

Updated

•

13 years ago

Whiteboard: [qa?] → [qa-]

Danial Horton

Comment 34

•

13 years ago

Has anyone noticed an after effect of this patch being lower overall memory use?