Closed Bug 1724510 Opened 6 months ago Closed 5 months ago

Freezes on 32-bit ARM after bug 1434542

Categories

(Core :: JavaScript: GC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox91 --- unaffected
firefox92 --- wontfix
firefox93 --- fixed

People

(Reporter: sfink, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

fabrice in #spidermonkey reported main-thread freezes on ARM32:

<fabrice> jonco: we are seeing main thread freezing issues on low-memory devices (32 bits ARM) after the changes in bug 1434542. Do you have ideas on what/how to investigate?
<sfink> fabrice: would it be possible to either capture a profile, or run with env var MOZ_GCTIMER=/tmp/gc.txt ?
<fabrice> sfink: Ok, I will use the MOZ_GCTIMER var. Using the profiler at startup is difficult on these devices.
<sfink> you're seeing this during startup?
<fabrice> early enough that it may be hard to profile.

The resulting log shows some ALLOC_TRIGGER GCs that start with a preparation slice with a 30ms budget (which yields after <1ms). The next slice is then given 128ms, which seems far too long. It's enough time for the GC to finish, with pause times of 42ms, 58ms, and 85ms.

So that's one problem: the budget seems to be scaling up too quickly.

But I don't know whether even an 85ms pause would appear as a "freeze".

Attached file MOZ_GCTIMER log
Severity: -- → S3
Priority: -- → P2

Set release status flags based on info from the regressing bug 1434542

Fabrice, how are the JS memory options configured for these devices?

The log shows that GC is happening because the heap has exceeded a threshold of 1.2MB, which is extremely low. We calculate the threshold using a base heap size, which means the threshold would normally not be less than ~35MB or so. These parameters are configurable though, in this case via javascript.options.mem.gc_allocation_threshold_mb.

Flags: needinfo?(fabrice)

Jon, the GC prefs are at https://github.com/kaiostech/gecko-b2g/blob/ba870c16e56bf1e77255368097c5b7a28dc9f21c/b2g/app/b2g.js#L591

Clearly they have not been revised for a very long time, so any help figuring out better values would be greatly appreciated! Feel free to ping me in Matrix if that's easier.

Flags: needinfo?(fabrice)
Depends on: 1725584

(In reply to [:fabrice] Fabrice Desré from comment #4)
Thanks, that does explain what's going on. gc_allocation_threshold_mb is very low and combined with the recently-added urgent threshold (default 16MB) this means that we're going to end up finishing a lot of GCs non-incrementally.

The key with setting these prefs is to test representative workloads and find a tradeoff between the amount of time spent doing GC and the maximum heap size used. Setting these values very low will result in a lot of GC activity but won't improve memory use beyond a certain point.

Looking at the prefs themselves:

// Increase mark slice time from 10ms to 30ms
pref("javascript.options.mem.gc_incremental_slice_ms", 30);

We use 5ms in the browser nowadays. Reducing this could reduce jank, but test it first on your lower-performance devices. It looks like this was intentionally increased from the original 10ms.

pref("javascript.options.mem.gc_high_frequency_heap_growth_max", 300);
pref("javascript.options.mem.gc_high_frequency_heap_growth_min", 120);
pref("javascript.options.mem.gc_high_frequency_high_limit_mb", 40);
pref("javascript.options.mem.gc_high_frequency_low_limit_mb", 0);

These parameters have all been renamed, so I'm not sure what effect setting them has, if any. The new names and defaults are:

pref("javascript.options.mem.gc_high_frequency_small_heap_growth", 300);
pref("javascript.options.mem.gc_high_frequency_large_heap_growth", 150);
pref("javascript.options.mem.gc_small_heap_size_max_mb", 100);
pref("javascript.options.mem.gc_large_heap_size_min_mb", 500);

The growth values are factors that are multiplied by 100 to get an integer, so here 150 means a factor of 1.5.

The purpose of these prefs is to make the high-frequency growth factor vary with heap size (this factor determines the GC trigger when GC is happening frequently). The heap is classified as small, medium, or large by comparing its size to gc_small_heap_size_max_mb and gc_large_heap_size_min_mb (previously gc_high_frequency_low_limit_mb and gc_high_frequency_high_limit_mb respectively). If the heap is small, gc_high_frequency_small_heap_growth is used; if it is large, gc_high_frequency_large_heap_growth is used; otherwise the value is interpolated between the two.
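As a rough illustration of that classification and interpolation, here is a minimal JavaScript sketch (the function is hypothetical, not the SpiderMonkey implementation; only the pref names are real):

function highFrequencyGrowth(heapSizeMB, prefs) {
  const smallMax = prefs.gc_small_heap_size_max_mb;                      // e.g. 100
  const largeMin = prefs.gc_large_heap_size_min_mb;                      // e.g. 500
  const smallGrowth = prefs.gc_high_frequency_small_heap_growth / 100;   // e.g. 3.0
  const largeGrowth = prefs.gc_high_frequency_large_heap_growth / 100;   // e.g. 1.5

  if (heapSizeMB <= smallMax) return smallGrowth;   // small heap
  if (heapSizeMB >= largeMin) return largeGrowth;   // large heap

  // Medium heap: interpolate between the two growth factors (linear here for illustration).
  const t = (heapSizeMB - smallMax) / (largeMin - smallMax);
  return smallGrowth + t * (largeGrowth - smallGrowth);
}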

This heap size classification is also used to determine the non-incremental limit, which is the size at which the GC will finish an ongoing incremental GC non-incrementally. In the browser we have:

pref("javascript.options.mem.gc_small_heap_incremental_limit", 140);
pref("javascript.options.mem.gc_large_heap_incremental_limit", 110);

pref("javascript.options.mem.gc_low_frequency_heap_growth", 120);

This is 150 in the browser but 120 is not unreasonable. This will trigger more GCs as the heap grows but keep the max heap size smaller.

pref("javascript.options.mem.high_water_mark", 6);

This pref has been removed.

pref("javascript.options.mem.gc_allocation_threshold_mb", 1);

This is an important one for the purpose of this bug and is used to calculate the threshold size at which GC is triggered. This is calculated by max(heap_size, gc_allocation_threshold_mb) * growth_factor. The upshot of this is that we don't trigger GC on allocation if the heap size is less than gc_allocation_threshold_mb * growth_factor. Other parts of the system do still trigger GCs for other reasons.

In the browser this is 27MB.
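A minimal JavaScript sketch of that trigger calculation, assuming the effective growth factor is the low-frequency value of 1.2 (the helper function and example heap sizes are for illustration only):

function gcTriggerMB(heapSizeMB, allocationThresholdMB, growthFactor) {
  return Math.max(heapSizeMB, allocationThresholdMB) * growthFactor;
}

// With the current b2g pref of 1MB the trigger floor is ~1.2MB, matching the attached log:
gcTriggerMB(0.5, 1, 1.2);   // 1.2
// With the browser value of 27MB the floor is far higher:
gcTriggerMB(0.5, 27, 1.2);  // 32.4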

Finally, a new pref has been added:

pref("javascript.options.mem.gc_urgent_threshold_mb", 16);

This controls the point at which an incremental GC becomes 'urgent' and the GC increases the budget of GC slices in an effort to finish it before we reach the non-incremental limit (at which point we will finish it synchronously).
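A rough JavaScript sketch of that urgency check, under an assumed simplification (the helper is hypothetical and the real SpiderMonkey logic may differ):

function isUrgent(triggerMB, heapSizeMB, incrementalLimit, urgentThresholdMB) {
  // The non-incremental limit sits at the trigger threshold times the incremental limit.
  const nonIncrementalLimitMB = triggerMB * incrementalLimit;
  // Once the heap is within the urgent threshold of that limit, slice budgets are increased.
  return nonIncrementalLimitMB - heapSizeMB < urgentThresholdMB;
}

// With the current b2g values (trigger ~1.2MB, limit 1.4, urgent default 16MB)
// the collection is urgent from the very first slice:
isUrgent(1.2, 1.2, 1.4, 16); // true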

Writing all this up makes me realise that this system is too complicated to configure easily and we should try and simplify it.

The problem in this bug is that we are hitting the urgent threshold immediately and increasing the slice time, leading to very long slices. I'm not sure why the GC is running repeatedly though.

You roughly want to arrange for growth_factor * heap_size * (non_incremental_limit - 1) to be more than 2 * urgent_threshold (a rough numeric check follows the suggestions below).

I'd suggest:

  • removing the renamed parameters and replacing them with the browser defaults from modules/libpref/init/all.js
  • adding missing prefs such as javascript.options.mem.gc_small_heap_incremental_limit and javascript.options.mem.gc_large_heap_incremental_limit
  • increasing javascript.options.mem.gc_allocation_threshold_mb to 10 MB
  • adding a pref for javascript.options.mem.gc_urgent_threshold_mb and setting it to 8 MB
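
As a quick sanity check of these values against the rule of thumb above, here is a small JavaScript sketch; the heap size and growth factor are assumptions chosen for illustration, not measurements:

const growthFactor = 1.2;          // gc_low_frequency_heap_growth / 100 (assumed)
const heapSizeMB = 35;             // assumed steady-state heap size
const nonIncrementalLimit = 1.4;   // gc_small_heap_incremental_limit / 100
const urgentThresholdMB = 8;       // suggested gc_urgent_threshold_mb

const headroomMB = growthFactor * heapSizeMB * (nonIncrementalLimit - 1); // 16.8
console.log(headroomMB > 2 * urgentThresholdMB); // true: 16.8 > 16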

Hopefully that fixes this problem and you can then tune the parameters further from there.

Thanks Jon, that's very helpful!

If you agree I think we can close this bug and we'll deal with the changes on the b2g side.

Great, let us know if you run into any problems.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED

I've added config based on modules/libpref/init/all.js and the suggestion from Comment 5.

// JSGC_SLICE_TIME_BUDGET_MS
// Override the shell's default of unlimited slice time.
pref("javascript.options.mem.gc_incremental_slice_ms", 5);

// JSGC_COMPACTING_ENABLED
pref("javascript.options.mem.gc_compacting", true);

// JSGC_HIGH_FREQUENCY_TIME_LIMIT
pref("javascript.options.mem.gc_high_frequency_time_limit_ms", 1000);

// JSGC_SMALL_HEAP_SIZE_MAX
pref("javascript.options.mem.gc_small_heap_size_max_mb", 100);

// JSGC_LARGE_HEAP_SIZE_MIN
pref("javascript.options.mem.gc_large_heap_size_min_mb", 500);

// JSGC_HIGH_FREQUENCY_SMALL_HEAP_GROWTH
pref("javascript.options.mem.gc_high_frequency_small_heap_growth", 300);

// JSGC_HIGH_FREQUENCY_LARGE_HEAP_GROWTH
pref("javascript.options.mem.gc_high_frequency_large_heap_growth", 120);

// JSGC_LOW_FREQUENCY_HEAP_GROWTH
pref("javascript.options.mem.gc_low_frequency_heap_growth", 120);

// JSGC_ALLOCATION_THRESHOLD
pref("javascript.options.mem.gc_allocation_threshold_mb", 10);

// JSGC_MALLOC_THRESHOLD_BASE
pref("javascript.options.mem.gc_malloc_threshold_base_mb", 38);

// JSGC_SMALL_HEAP_INCREMENTAL_LIMIT
pref("javascript.options.mem.gc_small_heap_incremental_limit", 140);

// JSGC_LARGE_HEAP_INCREMENTAL_LIMIT
pref("javascript.options.mem.gc_large_heap_incremental_limit", 110);

// JSGC_URGENT_THRESHOLD_MB
pref("javascript.options.mem.gc_urgent_threshold_mb", 8);

// JSGC_MIN_EMPTY_CHUNK_COUNT
pref("javascript.options.mem.gc_min_empty_chunk_count", 1);

// JSGC_MAX_EMPTY_CHUNK_COUNT
pref("javascript.options.mem.gc_max_empty_chunk_count", 30);

// JSGC_HELPER_THREAD_RATIO
pref("javascript.options.mem.gc_helper_thread_ratio", 50);

// JSGC_MAX_HELPER_THREADS
pref("javascript.options.mem.gc_max_helper_threads", 8);


I still see long latency and the reason is:

Invocation Kind: Shrink
Reason: DOM_WORKER
Incremental: no - NonIncrementalRequested

Sometimes the max pause is up to 100ms. I also found that the "Wait Background Thread" time is longer compared to the gecko 86 version. I have seen this symptom even without applying the new GC config.

GC(T+17.775s) =================================================================
Invocation Kind: Shrink
Reason: DOM_WORKER
Incremental: no - NonIncrementalRequested
Zones Collected: 2 of 2 (-0)
Compartments Collected: 1 of 1 (-0)
MinorGCs since last GC: 1
Store Buffer Overflows: 0
MMU 20ms:0.0%; 50ms:0.0%
SCC Sweep Total (MaxPause): 4.139ms (4.139ms)
HeapSize: 0.238 MiB
Chunk Delta (magnitude): +2 (2)
Arenas Relocated: 0.059 MiB
---- Slice 0 ----
Reason: DOM_WORKER
Trigger: n/a
Reset: no
State: NotActive -> NotActive
Page Faults: 1745
Pause: 129.688ms of unlimited budget (@ 0.000ms)
Evict Nursery For Major GC: 35.058ms
Other: 34.994ms
Mark Roots: 0.064ms
Other: 0.059ms
Mark C and JS stacks: 0.005ms
Wait Background Thread: 80.300ms
Prepare For Collection: 1.227ms
Other: 0.150ms
Mark Discard Code: 0.006ms
Relazify Functions: 0.043ms
Purge: 0.856ms
Purge PropMapTables: 0.168ms
Purge Source URLs: 0.002ms
Join Parallel Tasks: 0.002ms
Mark: 1.697ms
Other: 1.477ms
Mark Roots: 0.220ms
Other: 0.057ms
Mark Cross Compartment Wrappers: 0.002ms
Mark C and JS stacks: 0.004ms
Mark Runtime-wide Data: 0.156ms
Mark Embedding: 0.001ms
Sweep: 6.823ms
Other: 0.942ms
Mark During Sweeping: 0.090ms
Mark Incoming Black Pointers: 0.002ms
Mark Weak: 0.026ms
Mark Gray: 0.051ms
Finalize Start Callbacks: 0.004ms
Sweep Atoms Bitmap: 0.046ms
Sweep Atoms Table: 1.733ms
Sweep Compartments: 3.960ms
Other: 0.085ms
Sweep FinalizationRegistries: 0.003ms
Sweep JIT Data: 0.004ms
Sweep Miscellaneous: 0.002ms
Join Parallel Tasks: 3.866ms
Sweep Object: 0.006ms
Sweep String: 0.003ms
Sweep Script: 0.002ms
Sweep Scope: 0.002ms
Sweep RegExpShared: 0.002ms
Sweep Shape: 0.002ms
Sweep PropMap: 0.014ms
Finalize End Callback: 0.003ms
Deallocate: 0.015ms
Compact: 4.289ms
Compact Move: 0.118ms
Compact Update: 4.138ms
Other: 0.390ms
Mark Roots: 0.444ms
Other: 0.435ms
Mark Cross Compartment Wrappers: 0.003ms
Mark C and JS stacks: 0.003ms
Mark Embedding: 0.002ms
Join Parallel Tasks: 3.305ms
Decommit: 0.030ms
---- Totals ----
Total Time: 129.688ms
Max Pause: 129.688ms
Evict Nursery For Major GC: 35.058ms
Other: 34.994ms
Mark Roots: 0.064ms
Other: 0.059ms
Mark C and JS stacks: 0.005ms
Wait Background Thread: 80.300ms
Prepare For Collection: 1.227ms
Other: 0.150ms
Mark Discard Code: 0.006ms
Relazify Functions: 0.043ms
Purge: 0.856ms
Purge PropMapTables: 0.168ms
Purge Source URLs: 0.002ms
Join Parallel Tasks: 0.002ms
Mark: 1.697ms
Other: 1.477ms
Mark Roots: 0.220ms
Other: 0.057ms
Mark Cross Compartment Wrappers: 0.002ms
Mark C and JS stacks: 0.004ms
Mark Runtime-wide Data: 0.156ms
Mark Embedding: 0.001ms
Sweep: 6.823ms
Other: 0.942ms
Mark During Sweeping: 0.090ms
Mark Incoming Black Pointers: 0.002ms
Mark Weak: 0.026ms
Mark Gray: 0.051ms
Finalize Start Callbacks: 0.004ms
Sweep Atoms Bitmap: 0.046ms
Sweep Atoms Table: 1.733ms
Sweep Compartments: 3.960ms
Other: 0.085ms
Sweep FinalizationRegistries: 0.003ms
Sweep JIT Data: 0.004ms
Sweep Miscellaneous: 0.002ms
Join Parallel Tasks: 3.866ms
Sweep Object: 0.006ms
Sweep String: 0.003ms
Sweep Script: 0.002ms
Sweep Scope: 0.002ms
Sweep RegExpShared: 0.002ms
Sweep Shape: 0.002ms
Sweep PropMap: 0.014ms
Finalize End Callback: 0.003ms
Deallocate: 0.015ms
Compact: 4.289ms
Compact Move: 0.118ms
Compact Update: 4.138ms
Other: 0.390ms
Mark Roots: 0.444ms
Other: 0.435ms
Mark Cross Compartment Wrappers: 0.003ms
Mark C and JS stacks: 0.003ms
Mark Embedding: 0.002ms
Join Parallel Tasks: 3.305ms
Decommit: 0.030ms

I'm not sure it's normal to have a max pause of up to 100ms from DOM_WORKER. Is there anything we can improve here? Are there parameters we could tweak a bit?

Flags: needinfo?(jcoppeard)

(In reply to Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor) from comment #8)

I still see long latency and the reason is:

Invocation Kind: Shrink
Reason: DOM_WORKER
Incremental: no - NonIncrementalRequested

This shouldn't be happening on the main thread though, since this is a worker GC.

This sounds like a separate issue since you say it's happening already without the changed config. If this is a problem can you file a separate bug?

Are there parameters we could tweak a bit?

Worker runtimes get collected fairly aggressively when they go idle and this isn't related to the tuning parameters. I'm surprised it's taking so long though.

Flags: needinfo?(jcoppeard)
Attachment #9240013 - Attachment description: MOZ_GCTIMER log after applying suggested gc mem config → (master)MOZ_GCTIMER log after applying suggested gc mem config

(In reply to Jon Coppeard (:jonco) from comment #10)

(In reply to Shawn Huang [:shawnjohnjr][:shuang] (as a happy gecko contributor) from comment #8)

I still see long latency and the reason is:

Invocation Kind: Shrink
Reason: DOM_WORKER
Incremental: no - NonIncrementalRequested

This shouldn't be happening on the main thread though, since this is a worker GC.
This sounds like a separate issue since you say it's happening already without the changed config. If this is a problem can you file a separate bug?

Are there parameters we could tweak a bit?

Worker runtimes get collected fairly aggressively when they go idle and this isn't related to the tuning parameters. I'm surprised it's taking so long though.

Ok, if it's not happening on the main thread then it's not a problem for us now. I'm surprised too, since I don't see a similarly long max pause in the gecko84 log.

Has Regression Range: --- → yes