Closed Bug 1508327 Opened 6 years ago Closed 4 years ago

Clearing mark bitmaps can take a long time

Categories

(Core :: JavaScript: GC, enhancement, P3)

61 Branch
enhancement

Tracking


RESOLVED DUPLICATE of bug 1677765
Performance Impact low
Tracking Status
firefox66 --- fix-optional

People

(Reporter: jonco, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf:responsiveness)

At the start of a GC we clear all the mark bits for collecting zones. This shows up in our telemetry as a frequent cause of budget overruns, which is plausible since it's linear in the size of the heap. Currently we do this during the first slice, in parallel with a bunch of other work.

One approach to improving this might be to parallelise it further and use multiple threads to do the clearing during the first slice. This is fairly simple, but I'm not sure how much benefit we'd get, since there is already other work going on in this slice and we are limited by the number of cores and memory bandwidth.

Another approach might be to kick this off in the first slice and have it run concurrently with the mutator, only starting the GC for real in the second slice. This adds at least one extra slice to every GC, but is probably preferable. There might be other preparatory work we can do in advance too.

A third approach would be to move to an epoch-based scheme where 'black' for the previous collection is treated as 'white' for the current collection. However, the fact that we also have the extra gray colour makes this more difficult, as does the fact that the epoch would have to be per-zone, because we don't collect all zones in every collection. This would remove the clearing work completely but could impact marking performance if we have to test the mark state against per-zone information every time we want to mark something.
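A minimal sketch of how the epoch-based test could look. All names here (Zone::gcEpoch, CellMark, etc.) are invented for illustration, not SpiderMonkey's actual layout:

    #include <cstdint>

    enum class Color : uint8_t { Black, Gray };

    struct Zone {
        uint32_t gcEpoch = 1;  // bumped when this zone starts a new collection
    };

    struct CellMark {
        uint32_t epoch = 0;    // epoch in which the cell was last marked
        Color color = Color::Black;
    };

    // A mark from an older epoch counts as white, so no bits need clearing
    // at GC start; the cost moves to an extra per-zone load and compare on
    // every mark test, as noted above.
    inline bool isMarkedBlack(const Zone& zone, const CellMark& m) {
        return m.epoch == zone.gcEpoch && m.color == Color::Black;
    }

    inline void markBlack(const Zone& zone, CellMark& m) {
        m.epoch = zone.gcEpoch;
        m.color = Color::Black;
    }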
Is it something that could be done as a background low-priority housekeeping task? The task would be started directly after the previous GC finishes, and if we have to GC before the clearing is done then the GC's first slice must provide an assist for it, but it seems likely that won't be common.
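A minimal sketch of the assist idea, under invented names (Arena, Zone::nextToClear): a shared atomic cursor lets the background task and, if needed, the next GC's first slice split the clearing work.

    #include <atomic>
    #include <cstdint>
    #include <cstring>

    struct Arena { uint64_t markBits[4]; };

    struct Zone {
        Arena* arenas = nullptr;
        size_t numArenas = 0;
        std::atomic<size_t> nextToClear{0};  // shared work cursor

        // Called from a low-priority background task after a GC finishes,
        // and again from the next GC's first slice if clearing isn't done
        // yet (the "assist"). Both callers drain the same cursor.
        void clearSomeMarkBits() {
            size_t i;
            while ((i = nextToClear.fetch_add(1)) < numArenas)
                std::memset(arenas[i].markBits, 0, sizeof(arenas[i].markBits));
        }
    };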
(In reply to Jon Coppeard (:jonco) from comment #0)
> A third approach would be to move to an epoch-based scheme where 'black'
> for the previous collection is treated as 'white' for the current
> collection. However, the fact that we also have the extra gray colour
> makes this more difficult, as does the fact that the epoch would have to
> be per-zone, because we don't collect all zones in every collection. This
> would remove the clearing work completely but could impact marking
> performance if we have to test the mark state against per-zone information
> every time we want to mark something.

I've had this idea in the back of my mind, trying to work out how to do it given these concerns. It ought to be possible.
(In reply to Lars T Hansen [:lth] from comment #1)
> Is it something that could be done as a background low-priority
> housekeeping task? The task would be started directly after the previous
> GC finishes, and if we have to GC before the clearing is done then the
> GC's first slice must provide an assist for it, but it seems likely that
> won't be common.

The cycle collector uses these bits in between GCs.
I had an idea for another approach to this (sketched below):

- Mark state is moved out of the chunk trailer and is instead located via a pointer in the arena header which points to the mark bits for all the cells in that arena. This pointer can be null, meaning all cells are unmarked.
- Mark state is bump-allocated on demand from per-zone blocks.
- All mark state for a collected zone is freed at once when we start a GC, and the pointers in its arena headers are set to null.

This has the following advantages:

- Allocation of mark state is relatively cheap, and mostly happens during marking, which is incremental.
- Clearing the mark bits means overwriting one pointer per arena rather than memset()ing the bits themselves, a 16-fold reduction in memory writes.
- External mark state makes it much easier to change its format later, e.g. to move to one byte per cell.

This has the following disadvantages:

- It adds an extra load and compare when accessing mark bits.
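A minimal sketch of this layout, with invented names and sizes (the real arena geometry differs, and the allocator is simplified to one allocation per arena where the proposal bump-allocates from larger per-zone blocks):

    #include <cstdint>
    #include <memory>
    #include <vector>

    constexpr size_t CellsPerArena = 256;
    constexpr size_t MarkWordsPerArena = CellsPerArena / 64;

    struct ArenaHeader {
        uint64_t* markBits = nullptr;  // null means "all cells unmarked"
    };

    struct Zone {
        // Owns all mark state for the zone; dropping this vector and nulling
        // the arena pointers "clears" every mark bit at GC start.
        std::vector<std::unique_ptr<uint64_t[]>> markBlocks;

        uint64_t* allocMarkBits() {
            auto block = std::make_unique<uint64_t[]>(MarkWordsPerArena);  // zeroed
            uint64_t* p = block.get();
            markBlocks.push_back(std::move(block));
            return p;
        }
    };

    // The extra load and compare mentioned in the disadvantages.
    bool isMarked(const ArenaHeader& a, size_t cell) {
        return a.markBits &&
               (a.markBits[cell / 64] & (uint64_t(1) << (cell % 64)));
    }

    void mark(Zone& zone, ArenaHeader& a, size_t cell) {
        if (!a.markBits)
            a.markBits = zone.allocMarkBits();  // allocated lazily during marking
        a.markBits[cell / 64] |= uint64_t(1) << (cell % 64);
    }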
(In reply to Jon Coppeard (:jonco) from comment #4)

Paul had a better idea, which is to clear the mark bits lazily by having a flag in the arena header to indicate that all cells should be treated as unmarked. At the start of the GC we set the flag for all arenas, and we only clear the mark state when we need to mark a cell in an arena that has the flag set. We can do this immediately with the current mark bit layout.
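A minimal sketch of the lazy-clearing flag, assuming invented names and an in-line mark-bit layout:

    #include <cstdint>
    #include <cstring>

    struct ArenaHeader {
        bool allUnmarked = false;   // set for every arena at GC start
        uint64_t markBits[4] = {};  // in-line mark bits, as in the current layout
    };

    bool isMarked(const ArenaHeader& a, size_t cell) {
        if (a.allUnmarked)
            return false;           // stale bits; treat everything as clear
        return a.markBits[cell / 64] & (uint64_t(1) << (cell % 64));
    }

    void mark(ArenaHeader& a, size_t cell) {
        if (a.allUnmarked) {
            // First mark in this arena this GC: clear the stale bits now,
            // spreading the memset cost across incremental marking.
            std::memset(a.markBits, 0, sizeof a.markBits);
            a.allUnmarked = false;
        }
        a.markBits[cell / 64] |= uint64_t(1) << (cell % 64);
    }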
Priority: -- → P3

(In reply to Jon Coppeard (:jonco) from comment #5)
> (In reply to Jon Coppeard (:jonco) from comment #4)
> Paul had a better idea, which is to clear the mark bits lazily by having a
> flag in the arena header to indicate that all cells should be treated as
> unmarked. At the start of the GC we set the flag for all arenas, and we
> only clear the mark state when we need to mark a cell in an arena that has
> the flag set. We can do this immediately with the current mark bit layout.

I've been thinking more about this.

I propose having two flags in each arena: one for "everything is clear" and one for "everything is marked", aka "this arena was allocated during collection". The second flag will remove the cost of pre-marking (though not of fixing up the mark bits after sweeping). So there are three states we need to represent:

  • Everything is unmarked
  • Everything is marked
  • Check individual mark bits.

We can also do these two changes separately. Except there's currently only one spare bit in the arena header, not two! Whichever we do second (probably pre-marking) will involve changing the usable size of an arena.

Or we can use a single bit to defer to other data:

  • Check the first cell's mark bit; that gives the state of all the other mark bits.
  • Check individual mark bits.

(That's only worth doing if we really care about saving this space. A sketch of that encoding follows.)
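A minimal sketch of the single-spare-bit encoding, under invented names: when the bit is set, the first cell's mark bit stands in for every cell, giving both uniform states plus the per-bit state from one header bit.

    #include <cstdint>

    struct ArenaHeader {
        bool uniform = false;       // the one spare header bit
        uint64_t markBits[4] = {};
    };

    bool isMarked(const ArenaHeader& a, size_t cell) {
        if (a.uniform)
            cell = 0;               // all cells share the first cell's state
        return a.markBits[cell / 64] & (uint64_t(1) << (cell % 64));
    }

    // "Everything is unmarked": uniform set, first bit clear.
    void setAllUnmarked(ArenaHeader& a) {
        a.uniform = true;
        a.markBits[0] &= ~uint64_t(1);
    }

    // "Everything is marked" (arena allocated during collection):
    // uniform set, first bit set.
    void setAllMarked(ArenaHeader& a) {
        a.uniform = true;
        a.markBits[0] |= uint64_t(1);
    }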

I chatted with Jon and I'll take this bug soon.

Priority: P3 → P2
Whiteboard: [qf?]
See Also: → 1465368
See Also: → 1524415
Depends on: 566578
Whiteboard: [qf?] → [qf:p3:responsiveness]
Assignee: nobody → jcoppeard

I've tried several different approaches, but haven't found any that gives a clear benefit. Abandoning this for now.

Assignee: jcoppeard → nobody
Priority: P2 → P3

We parallelised this in bug 1677765.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]