Bug 1822411 (Open) - Opened 2 years ago - Updated 2 months ago

MajorGC pauses in WebWorker possibly due to lack of incremental GC

Categories

(Core :: JavaScript: GC, defect, P2)

Firefox 112
defect

Tracking


UNCONFIRMED

People

(Reporter: robert, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/112.0

Steps to reproduce:

I'm building a WebGL experience on https://thirdroom.io that uses OffscreenCanvas in a WebWorker driven by requestAnimationFrame, along with setTimeout at a fixed 16ms interval to drive my physics / game loop.

My physics engine (rapier.js) allocates about 4 KB per second of short-lived JS objects.
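
For context, the loop is roughly shaped like this (a minimal illustrative sketch, not Third Room's actual code; the function names and numbers are made up, and requestAnimationFrame inside a worker assumes an OffscreenCanvas-capable build):

// worker.js - illustrative sketch only, not Third Room's real code.
const FIXED_DT_MS = 16;

function stepPhysics() {
  // Stand-in for rapier.js stepping the world: each tick allocates a handful
  // of short-lived objects, roughly matching the allocation pattern above.
  const contacts = [];
  for (let i = 0; i < 64; i++) {
    contacts.push({ body: i, impulse: Math.random() });
  }
}

(function physicsTick() {
  stepPhysics();
  setTimeout(physicsTick, FIXED_DT_MS); // fixed 16ms physics / game loop
})();

(function renderTick() {
  // OffscreenCanvas drawing for the frame would happen here.
  requestAnimationFrame(renderTick);
})();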

Every 5 seconds or so the physics WebWorker hits a ~41ms MajorGC pause, as observed in the Firefox Profiler. This causes the render thread to fall behind and drop frames, and the physics engine to produce incorrect results.

To reproduce:

  1. Go to https://thirdroom.io
  2. Log in as a guest or with your Matrix account (this is a Matrix.org/Element funded project)
  3. Enter your "Home World" room and wait about 5 seconds. Moving around the 3D world with the WASD keys and mouse makes the effect even more visible.

If you want to debug this further, you can run the whole project locally by checking out the code here:
https://github.com/matrix-org/thirdroom

Install dependencies with yarn, then start the dev server with yarn dev. The workers use module imports, so you'll want to test locally on Firefox Nightly. This will also let you properly read the source maps in the debugger / Firefox Profiler.

Actual results:

There are ~41ms GC pauses every 5 seconds in one of our app's WebWorkers. After talking with @jrmuizel in the SpiderMonkey Matrix room, it was suggested that SpiderMonkey may lack incremental GC in WebWorkers, leading to occasional long MajorGC events rather than many smaller MinorGC events.

Expected results:

SpiderMonkey should be scheduling MinorGC events much more often to clean up these short-lived JS objects. This currently affects any app using OffscreenCanvas, or WebWorkers that need consistent timing for multithreaded synchronization.

Clarification: incremental GC would break up major GCs into multiple major GC slices. In the profiler, that would translate into leaving the GCMajor markers unchanged, but adding in a bunch of GCSlice markers within each GCMajor. The worker would then only be pausing for the duration of a single GCSlice at a time.

Minor GCs are unrelated to incremental GC, though they may also be relevant to this scenario. For example, in this profile posted to #spidermonkey, there aren't any GCMinor markers showing up. I'm not sure why not; if you look at the details of the GCMajor markers, there are about 16-17 minor GCs happening in between the major GCs. Maybe it's a problem with reporting worker minor GCs to the profiler?

I'll also note that the major GCs in the profile are long. One is 447ms, and that's a single nonincremental slice and so corresponds to a 447ms pause on the worker.
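
(As an aside, pauses of this size can also be observed from inside the worker without the profiler by watching for oversized gaps between timer callbacks. A minimal sketch using only standard APIs; the 50ms threshold is arbitrary:)

// Illustrative pause watchdog, not part of Third Room: log whenever the worker's
// event loop stalls for much longer than the 16ms timer interval, e.g. during a
// long nonincremental GC slice.
const INTERVAL_MS = 16;
const PAUSE_THRESHOLD_MS = 50; // arbitrary cutoff for this sketch

let last = performance.now();
setInterval(() => {
  const now = performance.now();
  const gap = now - last;
  if (gap > PAUSE_THRESHOLD_MS) {
    console.log(`Worker event loop stalled for ~${Math.round(gap)} ms`);
  }
  last = now;
}, INTERVAL_MS);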

I think we need to come up with a heuristic to decide that a worker's output is latency-sensitive, and enable incremental GCs in that case.

See Also: → 941794

Another issue I see in this profile: these major GCs aren't collecting much, and the heap isn't growing much.

I collected my own profile, and it shows the same thing—the process's memory usage climbs steadily, but the GC heap for the DOM worker reports a pretty much level amount of GC memory used. I will note that my local profile (using Nightly) shows a lot more problems, and my machine was mostly unusable while Third Room was running under the profiler. (It was also updating at about 1 frame every 3 seconds or so.)

Additional notes on my local run: the major GCs came in pairs. There would be 2-4 minor GCs (not shown; inferred from the GCMajor marker), then a major GC that cleaned up nothing, and then immediately after that another GCMajor that also cleaned up nothing. The reason for both is DOM_WORKER.

The first GC discards the JIT code, and there are quite a few bailouts (~35) as the JIT gets going again. So there is a whole sequence of: do two GCs, discard JIT code, then recompile a bunch of stuff.

All GCs are all-zones GCs. Half of all time in the DOM worker is spent doing useless GCs.

Sadly, I am unable to share my local profile because it runs out of memory when I try to upload it.

I opened the browser with a different Firefox profile; it gives me basically good performance and also reproduces the problem clearly.

Depends on: 1822626
Severity: -- → S3
Priority: -- → P2

We have been seeing problems with this as well. We have a number of performance-critical animations driven by code in workers, and they show noticeable jank when the worker GC runs.

I have a simple use case that basically boils down to a worker running:

// Keep a rolling window of ~500 frames' worth of allocations alive: each frame
// retires the oldest batch (turning it into garbage) and prepends a fresh array
// of 500 small single-property objects.
const blocks = Array(500);
requestAnimationFrame(function onFrame() {
    blocks.pop();
    blocks.unshift(Array.from({ length: 500 }).map((_, index) => ({ [index]: index })));
    requestAnimationFrame(onFrame);
});
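
For anyone wanting to try this standalone, the snippet can be loaded as a dedicated worker from the main thread; a minimal harness might look like the following (the file name is made up, and it assumes a browser where requestAnimationFrame is available in dedicated workers, which the demo above relies on):

// main.js - illustrative harness only.
// The worker script contains the snippet above; any GC pause on the worker thread
// shows up as a hitch in its requestAnimationFrame-driven updates.
const worker = new Worker('gc-stress-worker.js');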

A live demo with some extra bells and whistles for adjusting the object-creation count, plus a spinning rectangle to make the issue easy to see visually:

On my machine (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0), I'm seeing two 300-425ms stalls from the worker GC (two, possibly due to 1822626?) around every 5 seconds.

There's also an example of the same animation running (smoothly) on the UI thread:

I am actively working on this, using thirdroom as my test case. Enabling incremental GC on workers is easy; scheduling when the slices should happen and for how long is hard. I have something roughly like what I want working well enough to push a try job for people to experiment with. I've really only looked at one scenario, and even there I'm trying to figure out why my fallback GCs are not triggering, but it seems to give a usable experience for now.

It also includes a bunch of logging that you can enable in about:logging by using the log string "WorkerGC:5". However, in my testing that is too verbose and slows everything down, mostly preventing the worker loop from going idle. "WorkerGC:4" should be more tolerable, though it loses some fine-grained detail.

Now that things are more or less in place, I'll start looking at some other examples. I like using thirdroom as a test, though, because it's really cool.

Robert, I don't know if you're still actively developing thirdroom, but I'd be happy to hear any impressions you have of the trial Firefox version. Sorry it took me a year to do something about this!

thirdroom kind of doesn't create enough garbage to really test things out, though. The < 1ms minor GCs are able to handle pretty much everything, so the whole incremental GC stuff doesn't come into play. (Which is exactly what a realtime app should look like, of course, but I'll need to work with something stupider for testing.) The thing that matters here is a redo of the scheduling that gets rid of the GC every 5 seconds when there's no garbage to speak of.

Flags: needinfo?(robert)

Hey Steve, thanks for checking this out! I'm not actively developing ThirdRoom anymore, however I'm still hoping that the work will live on in some form. I'm also working on a new project at Bezi which will likely need incremental GC in workers at some point.

My initial impressions are great! It looks like the issues I was seeing are fixed in Nightly. I just tested this on an M3 MacBook Pro, and the difference between Firefox Stable 128 and Firefox Nightly 130 at Medium graphics quality is substantial. It seems to be performing on the same level as Chrome now. I don't have time to do an in-depth analysis right now, but in my experience the dropped frames due to worker GC pauses are no longer there.

Thank you so much for looking into this!

Flags: needinfo?(robert)