Closed Bug 1907794 Opened 2 months ago Closed 2 months ago

Very long cycle collection times (4.3s) in parser-worker.js

Categories

(DevTools :: Debugger, defect, P2)


Tracking


RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- fixed

People

(Reporter: jrmuizel, Assigned: smaug)

References

(Depends on 2 open bugs)

Details

Attachments

(2 files)

I saw this while debugging https://fairfaxmedia.pressreader.com/the-dominion-post for bug 1901497

An about:memory report gives:

├──2,522.20 MB (44.08%) -- workers/workers(chrome)
│  ├──2,257.18 MB (39.45%) -- worker(resource://devtools/client/debugger/dist/parser-worker.js, 0x503753200)
│  │  ├──2,220.59 MB (38.81%) -- zone(0x3051f7200)
│  │  │  ├──2,110.67 MB (36.89%) -- realm(web-worker)
│  │  │  │  ├──2,110.10 MB (36.88%) -- classes
│  │  │  │  │  ├──1,909.70 MB (33.37%) ++ class(Object)/objects
│  │  │  │  │  ├────177.39 MB (03.10%) ++ class(Array)/objects
│  │  │  │  │  └─────23.00 MB (00.40%) ++ (4 tiny)
│  │  │  │  └──────0.58 MB (00.01%) ++ (3 tiny)
│  │  │  └────109.92 MB (01.92%) ++ (11 tiny)
│  │  └─────36.59 MB (00.64%) ++ (3 tiny)
│  ├────167.91 MB (02.93%) ++ worker(resource://devtools/client/shared/source-map-loader/worker.js, 0x5f6262400)
│  └─────97.12 MB (01.70%) ++ (9 tiny)
Depends on: 1907989
Depends on: 1908327

We bumped into this before, without being able to fully understand why the GC was so slow (see bug 1843454 comment 6).
Nor was I able to work around this by killing this worker and respawning it; for now, we keep it alive for the whole debugger lifecycle (bug 1854421).
I communicated offline with Andrew about this, but killing the worker causes the same very slow GC pause.
I highlighted this via this test page: http://techno-barje.fr/fission/worker-terminate/

There is also some mention of this worker issue at the end of that gdoc:
https://docs.google.com/document/d/1NRL-8c7o-cQdPwWLon5fu0PzShsFClgeveKxZypLkrM/edit

Today, the plan is to finish the migration of the Debugger from CodeMirror 5 to CodeMirror 6 (bug 1904488); this is an active project coming to its end.
That would then allow us to replace this parser worker (Babel parsing JavaScript sources in a worker) with CodeMirror 6's new parser, Lezer (bug 1908327), which is much faster because it can parse JavaScript in chunks (i.e. only what is displayed in the viewport).

Well, that's an enormous amount of stuff. I feel like you might be getting close to the upper limit on the JS heap size. So, the teardown CC is likely going to be slow no matter what.

As for the normal CCs, if the CC is traversing those objects, that means they are reachable from C++ but not from JS roots (like variables in JS code). As I said in the other bug, the first step would be to capture a CC log for this worker. You'd want to do it in a state where the heap is not so large, because analyzing a log this big will be near impossible. Once we have a CC log, I can take a look and figure out what C++ is holding onto things.

On the main thread, we have a lot of optimizations to remove "obviously alive" things from the CC graph. For instance, if JS is held alive by a DOM node that is in a document that is currently being displayed, then the JS must be alive and the CC does not need to consider it. It is possible that an existing thing here should be removing the JS from the CC graph, but it isn't happening because we don't run the CC cleanup stuff we do on the main thread.

This is the first instance I can recall of a CC being so slow on workers. We've done very little performance work for CCs on workers because it hasn't been an issue.

Attached file cc-parser-worker.zip

Here are gc-edges and cc-edges for parser-worker.js on a page that looks like:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
Flags: needinfo?(continuation)

Running census.py on the CC log, there are over 2000 function objects for functions named "get". That looked like the most common non-generic thing, so I picked a few of those and looked at why they are alive.

0x13893cbc0 [CallbackObject]
    --[mCallback]--> 0x3dcd64249af8 [JS Object (Function - requireWorkerUtils/workerHandler/<)]
    --[**UNKNOWN SLOT 1**]--> 0x3dcd642767b0 [JS Object (Call)]
    --[enclosing_environment]--> 0x3dcd6426b270 [JS Object (Call)]
    --[enclosing_environment]--> 0x31bfed574388 [JS Object (LexicalEnvironment)]
    --[enclosing_environment]--> 0x31bfed5745e8 [JS Object (Call)]
    --[requireUppercase]--> 0x31bfed582510 [JS Object (Function - requireUppercase)]
    --[JSFunction script]--> 0x31bfed516e20 [JS Script]
    --[script-gcthing]--> 0x37b8a09b3468 [JS Object (Function - requireUppercase/<)]
    --[JSFunction script]--> 0x31bfed516e70 [JS Script]
    --[script-gcthing]--> 0x37b8a09b6078 [JS Object (Function - get)]

The half dozen or so I looked at were all like this: a DOM CallbackObject was holding alive a requireWorkerUtils/workerHandler/< function, which then entrained some other functions via enclosing_environment.

They were all doing this via a single callback object, 0x13893cbc0. In fact, in the ones I looked at, the retaining path for all of them was the same through the call object 0x31bfed5745e8. One of the call objects has a property requireSourcemapCodec_umd, so I guess that is somehow related; maybe this is all associated with the giant lambda assigned to Buffer.prototype.fill mentioned earlier.

I'm not sure why there's so much stuff leaking. Maybe these are being assigned repeatedly? We might be able to figure out what is holding onto the callback and skip it in the CC somehow but it also seems kind of bad that so much stuff is being held alive through these function closures.

Flags: needinfo?(continuation)

I guess maybe we're entraining all of this stuff here on this line: self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });? I think that's sticking a reference to one of the objects in this huge lambda into some kind of worker event handler? Olli, do you have any idea how we could reduce the CC size here? We don't run any of the "event handler skippable stuff" on workers, do we?
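To spell out the mechanism, here's a minimal sketch (purely illustrative names, not the real parser-worker.js): any function created inside the bundled module scope and handed to self.onmessage keeps that entire scope reachable through its closure chain, which is exactly what ends up behind the CallbackObject in the CC log.

// Purely illustrative sketch, not the real parser-worker.js.
(function hugeBundledModule() {
  // Stand-in for the caches of ASTs, scopes, symbols, etc. that the
  // real bundle keeps in its single shared closure scope.
  const bigCache = new Map();

  const workerUtilsExports = {
    workerHandler(publicInterface) {
      // The returned handler closes over publicInterface and, through its
      // enclosing environments, over workerUtilsExports, bigCache, ...
      return function onMsg(event) {
        const { method, args } = event.data;
        self.postMessage(publicInterface[method](...args));
      };
    },
  };

  // The native event listener (the CallbackObject in the CC log) now holds
  // this closure, and with it the whole module scope above.
  self.onmessage = workerUtilsExports.workerHandler({
    getMatches: () => bigCache.size,
    findSourceMatches: () => [],
  });
})();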

Flags: needinfo?(smaug)

You could also probably hack around the slow CC times by putting a reference to this big workerUtilsExports object on the worker global. Just be sure to clear it when the event listener is cleared or it'll be a huge leak!
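Concretely, in the context of the generated bundle (where workerUtilsExports and the handler methods already exist) and with a hypothetical teardown function, that hack could look like this. The idea is that a reference from the worker global keeps these objects marked black by the GC, so the CC no longer needs to traverse them:

// Sketch of the suggested workaround, not a committed patch.
// Rooting the exports object on the global means the GC marks it (and
// everything it entrains) black, so the cycle collector can skip it.
globalThis.__workerUtilsExports = workerUtilsExports;
self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });

// Hypothetical teardown path: clear both together, otherwise the global
// reference becomes a permanent leak.
function teardown() {
  self.onmessage = null;
  delete globalThis.__workerUtilsExports;
}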

I grabbed another much bigger CC log for https://fairfaxmedia.pressreader.com/the-dominion-post. Running census on it gives:

Object frequency. Showing no more than 5 classes of objects, with at least 5 objects each.
10241616 JS Object (other)
  155015 JS Object (Function)
  147247 JS GetterSetter
    3679 JS Scope
    3596 JS Script

JS function object frequency. Showing no more than 5 JS function objects, with at least 5 objects each.
  145356 get expression
    2232 get
     825 validate
     294 constructor
     226 message

Objects with highest ref counts. Showing no more than 5 objects, with ref count of at least 0
  rc=1 0x30aa5eb60 ScriptFetchOptions
  rc=1 0x30a4b8cd0 LoadedScript
  rc=1 0x13893cbc0 CallbackObject

Because the number of get functions is around 2000 in both logs, I think they might not be the problem.

The high number of get expression looks like it's coming from things like:

          identifiers.push({
            name: path.node.value,
            get expression() {
              delete this.expression;
              this.expression = getObjectExpressionValue(path.parent);
              return this.expression;
            },
            location: path.node.loc,
          });

Where we build up big arrays of get expressions
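A small standalone sketch of how that self-replacing getter pattern behaves (hypothetical makeIdentifier helper and data, same shape as the snippet above): every identifier gets its own get expression closure, and the closure is only replaced by a plain data property on first access.

// Standalone illustration of the lazy-getter pattern used above.
function getObjectExpressionValue(parent) {
  return `expression for ${parent}`; // placeholder computation
}

function makeIdentifier(name, parent) {
  return {
    name,
    // One "get expression" function object is allocated per identifier.
    get expression() {
      // On first access, drop the accessor and cache a plain value.
      delete this.expression;
      this.expression = getObjectExpressionValue(parent);
      return this.expression;
    },
  };
}

const identifiers = [];
for (let i = 0; i < 1000; i++) {
  identifiers.push(makeIdentifier(`id${i}`, `parent${i}`));
}

// Until .expression is read, every entry still holds its getter closure.
console.log(identifiers[0].expression); // first read replaces the getter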

But yeah, the get expression stuff is making our object graph big, and we should try to reduce its size, but it's not what's causing us to cycle collect.

Your suggestion to keep workerUtilsExports alive seems to fix the problem for me:

diff --git a/devtools/client/debugger/dist/parser-worker.js b/devtools/client/debugger/dist/parser-worker.js
index f120a4753b0ce..8c5eb41efcdbf 100644
--- a/devtools/client/debugger/dist/parser-worker.js
+++ b/devtools/client/debugger/dist/parser-worker.js
@@ -41415,20 +41415,21 @@ Please specify the "importAttributesKeyword" generator option, whose value can b
     }

     var workerUtilsExports = requireWorkerUtils();

     function clearAllHelpersForSources(sourceIds) {
       clearASTs(sourceIds);
       clearScopes(sourceIds);
       clearSources(sourceIds);
       clearSymbols(sourceIds);
     }
+    globalThis.wue = workerUtilsExports;

     self.onmessage = workerUtilsExports.workerHandler({
       findOutOfScopeLocations,
       findBestMatchExpression,
       getSymbols,
       getFunctionSymbols,
       getClassSymbols,
       getClosestFunctionName,
       getScopes,
       clearSources: clearAllHelpersForSources,

That patch makes debugging large script files much more snappy for me.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)

Your suggestion to keep workerUtilsExports alive seems to fix the problem for me:

diff --git a/devtools/client/debugger/dist/parser-worker.js b/devtools/client/debugger/dist/parser-worker.js
index f120a4753b0ce..8c5eb41efcdbf 100644
--- a/devtools/client/debugger/dist/parser-worker.js
+++ b/devtools/client/debugger/dist/parser-worker.js
     var workerUtilsExports = requireWorkerUtils();
+    globalThis.wue = workerUtilsExports;
     self.onmessage = workerUtilsExports.workerHandler({

dist/parser-worker.js is generated with rollup (via devtools/client/debugger/bin/bundle.js ).
workerHandler comes from https://searchfox.org/mozilla-central/rev/b220e40ff2ee3d10ce68e07d8a8a577d5558e2a2/devtools/client/debugger/src/workers/parser/worker.js#19,28

import { workerHandler } from "../../../../shared/worker-utils";
...
self.onmessage = workerHandler({

Rollup seems to create this intermediary workerUtilsExports variable.
I'm not sure this is the right solution here, but I was trying to get rid of this variable, and turning the worker-utils file into a proper ES module does that.
Running our perf tests locally, it doesn't seem to make a big difference; I'm waiting for some TRY results to see if that's really the case.
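For illustration, a simplified ES-module version of the worker entry point (with stub methods, and assuming the same import path as the real source) avoids the bundle-level workerUtilsExports variable entirely, because the binding comes straight from the worker-utils module:

// Simplified sketch of an ES-module worker entry point; the real file in
// devtools/client/debugger/src/workers/parser/worker.js wires up many more methods.
import { workerHandler } from "../../../../shared/worker-utils";

// Stub implementations standing in for the real parser API.
function getSymbols() {}
function getScopes() {}

// No intermediary workerUtilsExports object in a shared bundle scope:
// the handler binding comes directly from the imported module.
self.onmessage = workerHandler({ getSymbols, getScopes });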

Jeff, would you have detailed steps I can go through to see the long CC? You mentioned https://fairfaxmedia.pressreader.com/the-dominion-post, but which script should I open and add breakpoints into? Also, how do you see these long CCs: in the profiler, or by some other means?

Flags: needinfo?(jmuizelaar)

Yeah, I wouldn't expect your change to help, because it's not keeping the world alive via the GC.

Doing the following also seems to help, and it should work in worker.js:

globalThis.wh = workerHandler

My rough STR:

Flags: needinfo?(jmuizelaar)

(In reply to Andrew McCreight [:mccr8] from comment #5)

I guess maybe we're entraining all of this stuff here on this line: self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });? I think that's sticking a reference to one of the objects in this huge lambda into some kind of worker event handler? Olli, do you have any idea how we could reduce the CC size here? We don't run any of the "event handler skippable stuff" on workers do we?

We could trace black JS? On workers there are way fewer EventTarget objects, or native objects in general, so we could perhaps try to go through the certainly alive DETHs (DOMEventTargetHelpers) and call ELM::TraceListeners (EventListenerManager::TraceListeners).

Flags: needinfo?(smaug)

Smaug's patch seems to fix the problem for me.

Assignee: nobody → smaug
Status: NEW → ASSIGNED
Attachment #9413850 - Attachment description: WIP: Bug 1907794, trace certainly black event listeners, r=mccr8 → Bug 1907794, trace certainly black event listeners, r=mccr8

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

The high number of get expression looks like it's coming from things like:

Thanks for digging into this more! I really need some kind of random option for find_roots.py that would pick a certain number of random objects matching a given pattern, as that is often useful for these kinds of leaks.

Thanks Jeff, Andrew and Olli for the help figuring out why the CC was slow!!

We just couldn't understand why the CC was slow and how to fix it.
But now we (the DevTools team) are curious to understand this and to be able to figure out such issues in the future.

Is there some documentation explaining the difference between the CC and the GC?
Something that would help us understand why, when an object isn't collected by the GC, it has to go through the CC, and why that can then be slow (my understanding of comment 2).

Also, is there some documentation about how to capture and browse these CC/GC logs? (I see no mention of census.py in the Firefox source docs.)
And especially, how do we identify that some node is GC-rooted or CC-rooted? Is that something we can tell by reading these logs?
Also, is the memory panel in DevTools of any help? I imagine it may not be, as it only shows JS objects and may not help identify the problematic CC-rooted object?

Otherwise, I would like to make it clear that we knew this code allocates tons and tons of JS objects.
The debugger, because it currently relies on Babel, parses the whole of each JavaScript file opened in the debugger.
You have seen that expressions is very large, because it contains all the expressions of the debugged JS files...
There is nothing wrong with that; it is a direct consequence of being forced to do full-file parsing.
We know about it, and we plan to migrate from Babel to Lezer (bug 1908327), which allows partial parsing. We will stop having to allocate so much data, and stop keeping most of the parsing data in memory!
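As a standalone, Node-style illustration of that constraint (hypothetical file name, not the debugger's actual code path), a Babel parse of a whole file keeps an AST node for every expression in memory, regardless of what is displayed in the viewport:

// Standalone, Node-style illustration, not the debugger's actual code path.
import { readFileSync } from "fs";
import { parse } from "@babel/parser";

const source = readFileSync("some-large-bundle.js", "utf8"); // hypothetical file
const ast = parse(source, { sourceType: "unambiguous" });

// Rough node count: every object with a type string is an AST node,
// and all of them stay in memory as long as the AST is kept around.
let nodeCount = 0;
JSON.stringify(ast, (key, value) => {
  if (value && typeof value === "object" && typeof value.type === "string") {
    nodeCount++;
  }
  return value;
});
console.log(`AST nodes kept in memory: ${nodeCount}`);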

Severity: -- → S3
Priority: -- → P2
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/da675cd57ab1 trace certainly black event listeners, r=mccr8,dom-worker-reviewers,asuth
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch

Perfherder has detected a devtools performance change from push 9d5f8b003b5f4927750e5b57a4db9701cb52ad1e.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
15% damp browser-toolbox.debugger-ready.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 867.67 -> 1,000.75
4% damp custom.jsdebugger.open-large-minified-file.full-selection.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 3,924.00 -> 4,065.02

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
29% damp custom.pretty-print.jsdebugger.reload.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 9,838.06 -> 6,952.87
29% damp custom.jsdebugger.pretty-print.reload-and-pause.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 9,849.11 -> 7,010.48
27% damp custom.pretty-print.jsdebugger.reload.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 10,213.58 -> 7,410.05
27% damp custom.jsdebugger.pretty-print.reload-and-pause.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 10,218.61 -> 7,466.92
9% damp custom.jsdebugger.open-large-minified-file.full-selection.DAMP windows11-64-shippable-qr e10s fission stylo webrender 2,323.90 -> 2,121.25
... ... ... ... ...
3% damp custom.jsdebugger.stepIn.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 3,510.08 -> 3,413.65

You are the author of one of the patches included in that push, and we need your help to address this regression.
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 1497

For more information on performance sheriffing please see our FAQ.

