finalisation slice can run late

Status: ASSIGNED
Priority: P3
Severity: normal
Opened: 2 years ago
Last modified: a year ago

People

(Reporter: pbone, Assigned: pbone)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qf:p3])

Attachments

(2 attachments)

(Assignee)

Description

2 years ago
The JS shell runs GC slices when the amount of allocated memory crosses a threshold.  If a slice ends after sweeping has finished but before finalisation (or perhaps part-way through sweeping), the allocated memory will have dropped, so the next slice won't run for some time (not until the allocated memory rises again).  This means that if other zones want to begin a collection, they either have to interrupt this unfinished collection or, due to bug 1367455, they will be delayed.  Barriers may also still be in place (I'm not sure), which could slow throughput.
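
Roughly, the trigger behaviour described above looks like this (a minimal sketch, not SpiderMonkey code; names such as gcBytes, sliceThreshold and runSlice, and the constants, are placeholders):

#include <cstddef>

// Minimal sketch of an allocation-driven slice trigger and why it can
// strand the finalisation slice. Illustrative only, not engine code.
struct Zone {
    size_t gcBytes = 0;                      // currently allocated GC memory
    size_t sliceThreshold = 8 * 1024 * 1024; // allocation level that starts a slice
    bool collecting = false;                 // an incremental GC is in progress

    // Called on allocation; this is the only thing that starts a slice.
    void maybeTriggerSlice() {
        if (gcBytes >= sliceThreshold)
            runSlice();
    }

    void runSlice() {
        collecting = true;
        // ... marking/sweeping within the slice budget ...
        // Sweeping frees memory, so gcBytes falls back below the threshold
        // even though the finalisation slice has not run yet.
        gcBytes /= 2; // stand-in for memory released by sweeping
        // `collecting` stays true, but nothing calls runSlice() again until
        // allocation pushes gcBytes back over sliceThreshold, so the final
        // slice runs late and other zones' collections are held up.
    }
};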
(Assignee)

Updated

2 years ago
Assignee: nobody → pbone
Status: NEW → ASSIGNED
(Assignee)

Updated

2 years ago
See Also: → bug 1367455
Whiteboard: [qf]
(Assignee)

Comment 1

2 years ago
Here are the two regressions that I'm attempting to fix:

https://treeherder.mozilla.org/perf.html#/alerts?id=7016
(Assignee)

Updated

2 years ago
Blocks: 1376904
(Assignee)

Updated

2 years ago
No longer blocks: 1376904
Comment 2

Paul, is this something that only shows up in the shell, or does this pattern happen in the browser as well?  From the description it seems like the browser in general could be affected by this too.
Flags: needinfo?(pbone)
(Assignee)

Comment 3

a year ago
Created attachment 8885084 [details]
Profile timeline

This is a screenshot from the Gecko profiler.  This could be the same behaviour happening in the browser (either from the same or a different cause).  I'm also suspicious about the integration between the GC and the Gecko profiler; I'm currently trying to work out how all this fits together so that I understand _completely_ what it is I'm seeing.
Flags: needinfo?(pbone)
(Assignee)

Comment 4

a year ago
Created attachment 8885085 [details]
Profile from Splay Latency

Here is the Gecko profile file for Splay Latency.
Whiteboard: [qf] → [qf:p3]
(Assignee)

Comment 5

a year ago
Here's what I know:

 + The same behavior, with the delayed final slice, does occur in the browser.

 + The GC following the one with the delayed slice becomes non-incremental due to GCBytesTrigger.  This also happens in
   Beta (I think my earlier patch is now in Beta).  In my tests with the shell this was because the heap filled up during
   the collection, since the start of the collection was delayed.  But there are also other possibilities.

 + My patch, which I discussed with jonco, attempts to avoid this by adding a new monotonic counter for the amount of
   memory used; the counter is left alone as memory is freed.  It is used to trigger the final slice in a reasonable
   time and complete the zonal GC sooner, therefore not delaying the following GC.  (A sketch of this idea appears
   after this list.)

 + My patch works: it does avoid this problem.  However, _all_ the GCs seem to switch to non-incremental (rather than
   only the delayed GCs) due to GCBytesTrigger.  This creates more frequent and longer pauses, lowering the benchmark
   scores further.  This shouldn't happen if my original guess was right that the delayed GC became non-incremental
   because it was delayed.  Now none are being delayed and yet ALL are non-incremental, so this could also have a
   separate cause.

 + My patched version seems to have slices that "look" further apart, but I haven't measured this accurately.  That
   should not be caused by no longer delaying the final slice; that doesn't make sense.  However, it does suggest my
   new counter is not counting as fast as the previous one, which could cause the GC to fail to reclaim memory quickly
   enough and therefore switch to non-incremental mode.

 + Since lowering the slice budget to 5ms, the whole system is possibly more likely to fail to reclaim enough memory
   soon enough, as the trigger for slices wasn't changed to correspond to this.  I propose lowering the heap thresholds
   to begin a collection earlier, and therefore hope to run enough slices before the heap fills up.  Ideally we should
   calculate some of these parameters dynamically, based on a projection of how much work remains during a collection
   and how much space is left in the heap.
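
As a rough illustration of the monotonic-counter idea above (this is not the actual patch; all the names and constants below are placeholders):

#include <cstddef>

// Sketch: a monotonic "allocated so far" counter that, unlike the live-byte
// count, is never decremented when memory is freed, so it can still cross a
// threshold and trigger the finalisation slice after sweeping has shrunk
// the heap. Illustrative only.
struct ZoneTriggers {
    size_t gcBytes = 0;            // live bytes; falls when memory is freed
    size_t bytesAllocatedEver = 0; // monotonic; never decremented
    size_t sliceThreshold = 8 * 1024 * 1024;
    size_t monotonicThreshold = 0; // set when a collection starts

    void noteAllocation(size_t nbytes) {
        gcBytes += nbytes;
        bytesAllocatedEver += nbytes;
    }

    void noteFree(size_t nbytes) {
        gcBytes -= nbytes;
        // bytesAllocatedEver is deliberately left alone.
    }

    void collectionStarted() {
        // Require only a modest amount of further allocation before the
        // next (possibly final) slice must run.
        monotonicThreshold = bytesAllocatedEver + sliceThreshold / 4;
    }

    bool shouldRunSlice(bool collecting) const {
        if (gcBytes >= sliceThreshold)
            return true;
        // The monotonic counter fires even after sweeping has reduced
        // gcBytes, so an in-progress GC is finished promptly.
        return collecting && bytesAllocatedEver >= monotonicThreshold;
    }
};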

There are some things I can test:

 + Adjust the threshold for my new counter: lower it and see if the behavior continues.

 + Search for places where I should be incrementing the counter but am not.

 + Re-increase the GC pause time to 10ms and see if the problem still occurs.

 + Lower the thresholds that start a collection.
(Assignee)

Updated

a year ago
Blocks: 1367455
(Assignee)

Updated

a year ago
See Also: → bug 1386219
Priority: -- → P3
Comment 6

(In reply to Paul Bone [:pbone] from comment #5)
> + My patch works: it does avoid this problem.  However, _all_ the GCs seem to switch to non-incremental (rather than
>   only the delayed GCs) due to GCBytesTrigger.  This creates more frequent and longer pauses, lowering the benchmark
>   scores further.  This shouldn't happen if my original guess was right that the delayed GC became non-incremental
>   because it was delayed.  Now none are being delayed and yet ALL are non-incremental, so this could also have a
>   separate cause.

It seems like it's important to understand what's going on here. If you look at a profile where this is happening, does the information on thresholds make sense? That is, if this is GCBytesTrigger, then timings.trigger_amount and timings.trigger_threshold should be getting set. Are these significantly different from before your patch? (It just strikes me that if your new counter were never incremented at all, due to some bug, then it would have this observed behavior.)
Flags: needinfo?(pbone)
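
As a rough illustration of the check suggested in comment 6 (not the real statistics code; the struct and function names below are hypothetical):

#include <cstdio>
#include <cstddef>

struct TriggerStats {
    size_t triggerAmount = 0;    // counter value when the GC was triggered
    size_t triggerThreshold = 0; // threshold it was compared against
};

// Hypothetical hook at the point where GCBytesTrigger forces a
// non-incremental collection.
void reportNonIncremental(const TriggerStats& stats) {
    std::fprintf(stderr,
                 "non-incremental GC: trigger_amount=%zu trigger_threshold=%zu\n",
                 stats.triggerAmount, stats.triggerThreshold);
    // If trigger_amount is stuck at 0 (or never grows between GCs), the new
    // counter is probably not being incremented, which would explain every
    // GC turning non-incremental.
}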
(Assignee)

Comment 7

a year ago
Good thinking. I'll check that the next time I get to look at this, probably not until Monday the 25th.
(Assignee)

Updated

a year ago
Flags: needinfo?(pbone)