Closed Bug 1608289 Opened 5 years ago Closed 3 years ago

More GC's occurring during parsing in Fenix compared to Fennec 68

Categories

(Core :: JavaScript: GC, defect, P3)

RESOLVED WONTFIX
Performance Impact high

People

(Reporter: bas.schouten, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf:pageload)

Current Fenix builds are showing GC majors happening during parsing, considerably slowing JS execution during pageload. Examples where this can be seen are:

Fennec: https://perfht.ml/37Jvx0u vs Fenix: https://perfht.ml/30aNdzp

In particular during the execution of <script> inline at line 72 of https://accounts.google.com/ServiceLogin?passive=1209600&continue=https%3A%2F%2Faccounts.google.com%2F&followup=https%3A%2F%2Faccounts.google.com%2F

Or on another site:

Fennec: https://perfht.ml/36DJk8x vs Fenix: https://perfht.ml/2tNXghC

In particular during the execution of <script src="https://support.microsoft.com/bundles/jslibraries?v=7595FctQ4nhDnzr1rSpN4H12FKAoBCwtOJxjwZt9U2s1">

In both of these cases we are also seeing other parts of JS being slower. This could be related, but I'm filing these separately, since a clean pageload having GC majors like that seems to be a problem anyway.

Blocks: 1604248
See Also: → 1608290
Blocks: 1604930
Whiteboard: [qf:p1:pageload]

Bas, could you perhaps tell us how the measurements were done? For example, were the profiles captured while running browsertime or something similar, or during normal browsing, and was the page the first page to be loaded? This would make it easier for other people to reproduce.

I see some marionette code being executed in the child process, so I assume this is some testing framework.

Flags: needinfo?(bas)

This was measured with browsertime. I believe that does stabilize on about:blank, but it should be the first 'real' pageload. I don't think browsertime does the stabilization on example.com.

Flags: needinfo?(bas)

(In reply to Bas Schouten (:bas.schouten) from comment #0)
Hi, I'm having trouble understanding these profiles.

Fennec: https://perfht.ml/37Jvx0u vs Fenix: https://perfht.ml/30aNdzp

These both seem to link to the same profile.

Fennec: https://perfht.ml/36DJk8x vs Fenix: https://perfht.ml/2tNXghC

The Fenix profile shows some GC in the content process, but the Fennec profile doesn't include the content process, so it's hard to compare. These GCs are triggered because scripts are allocating a lot of memory.

Flags: needinfo?(bas)

(In reply to Jon Coppeard (:jonco) from comment #3)

(In reply to Bas Schouten (:bas.schouten) from comment #0)
Hi, I'm having trouble understanding these profiles.

Fennec: https://perfht.ml/37Jvx0u vs Fenix: https://perfht.ml/30aNdzp

These both seem to link to the same profile.

Oh my. Sorry about that. Here's the Fennec profile: https://perfht.ml/2Refi4Y

Fennec: https://perfht.ml/36DJk8x vs Fenix: https://perfht.ml/2tNXghC

The Fenix profile shows some GC in the content process, but the Fennec profile doesn't include the content process, so it's hard to compare. These GCs are triggered because scripts are allocating a lot of memory.

Fennec doesn't use e10s. The main process for Fennec is where you can see the script I linked being executed. You can clearly compare the execution time of the total script for both cases. (As I mentioned, there are non-GC causes as well.)

FWIW, when we run Fenix in single-process mode, we do see the GCs go away (but still see the other slowdown). So this may be assumed to be related to e10s somehow. (Perhaps an initial growing of the JS heap that non-e10s doesn't suffer from.)

Flags: needinfo?(bas)

The browser is conditioned by:
• installing it
• navigating to example.com and waiting 30 seconds after pageload
• navigating to about:blank and waiting 5 seconds after pageload
and then loading the target page.

(In reply to Bas Schouten (:bas.schouten) from comment #4)
Ah, thanks for the clarification.

So yes, it seems we get more GCs on Fenix, possibly because it's using a separate process for content. I wonder if Fennec uses a single zone for everything? That might explain it.

I'm not sure what we can do about this.

(In reply to Jon Coppeard (:jonco) from comment #6)

(In reply to Bas Schouten (:bas.schouten) from comment #4)
Ah, thanks for the clarification.

So yes, it seems we get more GCs on Fenix, possibly because it's using a separate process for content. I wonder if Fennec uses a single zone for everything? That might explain it.

I'm not sure what we can do about this.

Should we grow the JS heap in advance on a new content process for Fenix (after all, we only have one content process for Fenix)?

Or, in some other way, at least defer this GC until after pageload is complete?

Flags: needinfo?(jcoppeard)

(In reply to Bas Schouten (:bas.schouten) from comment #7)

Should we grow the JS heap in advance on a new content process for Fenix (after all, we only have one content process for Fenix)?

Growing the heap itself is not the issue. The problem is that GC is triggered based on heap size. We can change the triggers, but this will increase memory usage.
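
To illustrate the trade-off, here is a toy model, not SpiderMonkey's actual heuristics. It assumes a GC fires whenever the heap reaches a fixed threshold and (unrealistically) that each GC frees the whole heap. The function name, the allocation pattern and the threshold values are all hypothetical.

```python
def simulate(allocations_mb, threshold_mb):
    """Toy model: run a GC whenever the heap reaches threshold_mb.

    Assumption: every GC frees the entire heap.
    Returns (number_of_gcs, peak_heap_mb).
    """
    heap = 0
    gcs = 0
    peak = 0
    for size in allocations_mb:
        heap += size
        peak = max(peak, heap)
        if heap >= threshold_mb:
            gcs += 1
            heap = 0
    return gcs, peak


# Hypothetical pageload: 20 allocations of 4 MB each.
page_load = [4] * 20
print(simulate(page_load, threshold_mb=16))  # (5, 16)
print(simulate(page_load, threshold_mb=64))  # (1, 64)
```

In this model, raising the threshold from 16 MB to 64 MB cuts the GC count from five to one, but the peak heap size grows from 16 MB to 64 MB.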

Although I didn't land any changes to improve it, in Bug 1575943 we saw that it can be the DOM requesting GCs during pageload based on heuristics (e.g. mozilla::dom::Document::HasRecentlyStartedForegroundLoads()).
We could try tuning these again in our current Fenix test environment.

(In reply to Bas Schouten (:bas.schouten) from comment #7)

(after all, we only have one content process for Fenix)?

Note that this will be changing, likely during H1 2020, so I would recommend against making any decisions based on the current behaviour.

It doesn't seem like there is much we can usefully do here. You can experiment with changing the parameters that govern GC, which are mostly defined here:

https://searchfox.org/mozilla-central/source/modules/libpref/init/all.js#1167-1206

However this will probably just result in moving GCs around rather than eliminating them.

How much of a factor is GC in Fenix performance at the moment?

Flags: needinfo?(jcoppeard) → needinfo?(bas)
Priority: -- → P3

(In reply to Jon Coppeard (:jonco) from comment #11)
To be more specific, the javascript.options.mem.gc_allocation_threshold_mb pref will affect when the first GC takes place (it's used as a base when calculating the heap size that will trigger a GC). So increasing this would delay the first GC.

Other parameters that control the heap growth are javascript.options.mem.gc_low_frequency_heap_growth and the javascript.options.mem.gc_high_frequency_* parameters. I don't think GCs are occurring at high frequency here (defined as more than 1 per second, configurable), so the first parameter is probably the relevant one. Increasing this will make us use more memory and GC less often.
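
As a rough sketch of how these parameters could interact, the following assumes the trigger size is the larger of the base threshold and the heap size retained after the last GC, multiplied by a growth factor. That formula, the function names and the numeric values are assumptions for illustration only; the actual calculation in the engine may differ.

```python
def trigger_size_mb(retained_mb, base_mb, growth_factor):
    """Heap size at which the next GC would start (assumed formula).

    base_mb stands in for gc_allocation_threshold_mb and growth_factor
    for gc_low_frequency_heap_growth; the values used below are made up.
    """
    return max(retained_mb, base_mb) * growth_factor


def is_high_frequency(seconds_since_last_gc, limit_s=1.0):
    """GCs less than limit_s apart count as high frequency.

    The 1 second default mirrors "more than 1 per second" above.
    """
    return seconds_since_last_gc < limit_s


# Fresh content process: nothing retained yet, so the base dominates.
print(trigger_size_mb(retained_mb=0, base_mb=30, growth_factor=1.5))    # 45.0
# Doubling the base delays the first GC.
print(trigger_size_mb(retained_mb=0, base_mb=60, growth_factor=1.5))    # 90.0
# Once the retained heap exceeds the base, only the growth factor matters.
print(trigger_size_mb(retained_mb=100, base_mb=30, growth_factor=1.5))  # 150.0
print(is_high_frequency(seconds_since_last_gc=5.0))                     # False
```

Under these assumptions the base only matters while the heap is small, which is why it mainly affects the first GC in a new content process.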

It seems like for now we've accepted this regression.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bas)
Resolution: --- → WONTFIX
Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload]