Closed Bug 1907794 Opened 2 months ago Closed 2 months ago

Very long cycle collection times (4.3s) in parser-worker.js

Categories

(DevTools :: Debugger, defect, P2)


Tracking


RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- fixed

People

(Reporter: jrmuizel, Assigned: smaug)

References

(Depends on 2 open bugs)

Details

Attachments

(2 files)

I saw this while debugging https://fairfaxmedia.pressreader.com/the-dominion-post for bug 1901497

An about:memory report gives:

├──2,522.20 MB (44.08%) -- workers/workers(chrome)
│  ├──2,257.18 MB (39.45%) -- worker(resource://devtools/client/debugger/dist/parser-worker.js, 0x503753200)
│  │  ├──2,220.59 MB (38.81%) -- zone(0x3051f7200)
│  │  │  ├──2,110.67 MB (36.89%) -- realm(web-worker)
│  │  │  │  ├──2,110.10 MB (36.88%) -- classes
│  │  │  │  │  ├──1,909.70 MB (33.37%) ++ class(Object)/objects
│  │  │  │  │  ├────177.39 MB (03.10%) ++ class(Array)/objects
│  │  │  │  │  └─────23.00 MB (00.40%) ++ (4 tiny)
│  │  │  │  └──────0.58 MB (00.01%) ++ (3 tiny)
│  │  │  └────109.92 MB (01.92%) ++ (11 tiny)
│  │  └─────36.59 MB (00.64%) ++ (3 tiny)
│  ├────167.91 MB (02.93%) ++ worker(resource://devtools/client/shared/source-map-loader/worker.js, 0x5f6262400)
│  └─────97.12 MB (01.70%) ++ (9 tiny)
Depends on: 1907989
Depends on: 1908327

We bumped into this before, without being able to fully understand why the GC was so slow (see bug 1843454 comment 6).
Nor was I able to work around this by killing this worker and respawning it; for now, we keep it alive for the whole debugger lifecycle (bug 1854421).
I communicated offline with Andrew about this, but killing the worker causes the same very slow GC pause.
I highlighted this via this test page: http://techno-barje.fr/fission/worker-terminate/

There is also some mention of this worker issue at the end of that gdoc:
https://docs.google.com/document/d/1NRL-8c7o-cQdPwWLon5fu0PzShsFClgeveKxZypLkrM/edit

Today, the plan is to finish the migration of the Debugger from CodeMirror 5 to CodeMirror 6 (bug 1904488); this is an active project coming to its end.
That would then allow us to replace this parser worker (Babel parsing JavaScript sources in a worker) with CodeMirror 6's new parser, Lezer (bug 1908327), which is much faster because it can parse JavaScript in chunks (i.e. only what is displayed in the viewport).

Well, that's an enormous amount of stuff. I feel like you might be getting close to the upper limit on the JS heap size. So, the teardown CC is likely going to be slow no matter what.

As for the normal CCs, if the CC is traversing those objects, that means they are reachable from C++ but not from JS roots (like variables in JS code). As I said in the other bug, the first step would be to capture a CC log for this worker. You'd want to do it in a state where the heap is not so large, because analyzing a log this big will be near impossible. Once we have a CC log, I can take a look and figure out what C++ is holding onto things.

On the main thread, we have a lot of optimizations to remove "obviously alive" things from the CC graph. For instance, if JS is held alive by a DOM node that is in a document that is currently being displayed, then the JS must be alive and the CC does not need to consider it. It is possible that an existing thing here should be removing the JS from the CC graph, but it isn't happening because we don't run the CC cleanup stuff we do on the main thread.

This is the first instance I can recall of a CC being so slow on workers. We've done very little performance work for CCs on workers because it hasn't been an issue.

Attached file cc-parser-worker.zip

Here are gc-edges and cc-edges for parser-worker.js on a page that looks like:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
Flags: needinfo?(continuation)

Running census.py on the CC log, there are over 2000 function objects for functions named "get". That looked like the most common non-generic thing, so I picked a few of those and looked at why they are alive.

0x13893cbc0 [CallbackObject]
    --[mCallback]--> 0x3dcd64249af8 [JS Object (Function - requireWorkerUtils/workerHandler/<)]
    --[**UNKNOWN SLOT 1**]--> 0x3dcd642767b0 [JS Object (Call)]
    --[enclosing_environment]--> 0x3dcd6426b270 [JS Object (Call)]
    --[enclosing_environment]--> 0x31bfed574388 [JS Object (LexicalEnvironment)]
    --[enclosing_environment]--> 0x31bfed5745e8 [JS Object (Call)]
    --[requireUppercase]--> 0x31bfed582510 [JS Object (Function - requireUppercase)]
    --[JSFunction script]--> 0x31bfed516e20 [JS Script]
    --[script-gcthing]--> 0x37b8a09b3468 [JS Object (Function - requireUppercase/<)]
    --[JSFunction script]--> 0x31bfed516e70 [JS Script]
    --[script-gcthing]--> 0x37b8a09b6078 [JS Object (Function - get)]

The half dozen or so I looked at were all like this: a DOM CallbackObject was holding alive a requireWorkerUtils/workerHandler/< function, which then entrained some other functions via enclosing_environment.

They were all doing this via a single callback object, 0x13893cbc0. In fact, in the ones I looked at, the retaining path for all of them was the same through the call object 0x31bfed5745e8. One of the call objects has a property requireSourcemapCodec_umd, so I guess that is somehow related; maybe this is all associated with the giant lambda assigned to Buffer.prototype.fill mentioned earlier.

I'm not sure why there's so much stuff leaking. Maybe these are being assigned repeatedly? We might be able to figure out what is holding onto the callback and skip it in the CC somehow but it also seems kind of bad that so much stuff is being held alive through these function closures.

Flags: needinfo?(continuation)

I guess maybe we're entraining all of this stuff here on this line: self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });? I think that's sticking a reference to one of the objects in this huge lambda into some kind of worker event handler? Olli, do you have any idea how we could reduce the CC size here? We don't run any of the "event handler skippable stuff" on workers, do we?
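To spell out the mechanism, here's a minimal sketch (purely illustrative names, not the real parser-worker.js): any function created inside the bundled module scope and handed to self.onmessage keeps that entire scope reachable through its closure chain, which is exactly what ends up behind the CallbackObject in the CC log.

// Purely illustrative sketch, not the real parser-worker.js.
(function hugeBundledModule() {
  // Stand-in for the caches of ASTs, scopes, symbols, etc. that the
  // real bundle keeps in its single shared closure scope.
  const bigCache = new Map();

  const workerUtilsExports = {
    workerHandler(publicInterface) {
      // The returned handler closes over publicInterface and, through its
      // enclosing environments, over workerUtilsExports, bigCache, ...
      return function onMsg(event) {
        const { method, args } = event.data;
        self.postMessage(publicInterface[method](...args));
      };
    },
  };

  // The native event listener (the CallbackObject in the CC log) now holds
  // this closure, and with it the whole module scope above.
  self.onmessage = workerUtilsExports.workerHandler({
    getMatches: () => bigCache.size,
    findSourceMatches: () => [],
  });
})();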

Flags: needinfo?(smaug)

You could also probably hack around the slow CC times by putting a reference to this big workerUtilsExports object on the worker global. Just be sure to clear it when the event listener is cleared or it'll be a huge leak!
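Concretely, in the context of the generated bundle (where workerUtilsExports and the handler methods already exist) and with a hypothetical teardown function, that hack could look like this. The idea is that a reference from the worker global keeps these objects marked black by the GC, so the CC no longer needs to traverse them:

// Sketch of the suggested workaround, not a committed patch.
// Rooting the exports object on the global means the GC marks it (and
// everything it entrains) black, so the cycle collector can skip it.
globalThis.__workerUtilsExports = workerUtilsExports;
self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });

// Hypothetical teardown path: clear both together, otherwise the global
// reference becomes a permanent leak.
function teardown() {
  self.onmessage = null;
  delete globalThis.__workerUtilsExports;
}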

I grabbed another much bigger CC log for https://fairfaxmedia.pressreader.com/the-dominion-post. Running census on it gives:

Object frequency. Showing no more than 5 classes of objects, with at least 5 objects each.
10241616 JS Object (other)
  155015 JS Object (Function)
  147247 JS GetterSetter
    3679 JS Scope
    3596 JS Script

JS function object frequency. Showing no more than 5 JS function objects, with at least 5 objects each.
  145356 get expression
    2232 get
     825 validate
     294 constructor
     226 message

Objects with highest ref counts. Showing no more than 5 objects, with ref count of at least 0
  rc=1 0x30aa5eb60 ScriptFetchOptions
  rc=1 0x30a4b8cd0 LoadedScript
  rc=1 0x13893cbc0 CallbackObject

Because the number of get functions is around 2000 in both logs, I think they might not be the problem.

The high number of get expression looks like it's coming from things like:

          identifiers.push({
            name: path.node.value,
            get expression() {
              delete this.expression;
              this.expression = getObjectExpressionValue(path.parent);
              return this.expression;
            },
            location: path.node.loc,
          });

Where we build up big arrays of get expressions
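A small standalone sketch of how that self-replacing getter pattern behaves (hypothetical makeIdentifier helper and data, same shape as the snippet above): every identifier gets its own get expression closure, and the closure is only replaced by a plain data property on first access.

// Standalone illustration of the lazy-getter pattern used above.
function getObjectExpressionValue(parent) {
  return `expression for ${parent}`; // placeholder computation
}

function makeIdentifier(name, parent) {
  return {
    name,
    // One "get expression" function object is allocated per identifier.
    get expression() {
      // On first access, drop the accessor and cache a plain value.
      delete this.expression;
      this.expression = getObjectExpressionValue(parent);
      return this.expression;
    },
  };
}

const identifiers = [];
for (let i = 0; i < 1000; i++) {
  identifiers.push(makeIdentifier(`id${i}`, `parent${i}`));
}

// Until .expression is read, every entry still holds its getter closure.
console.log(identifiers[0].expression); // first read replaces the getter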

But yeah, the get expression stuff is making our object graph big, and we should try to reduce its size, but it's not what's causing us to cycle collect.

Your suggestion to keep workerUtilsExports alive seems to fix the problem for me:

diff --git a/devtools/client/debugger/dist/parser-worker.js b/devtools/client/debugger/dist/parser-worker.js
index f120a4753b0ce..8c5eb41efcdbf 100644
--- a/devtools/client/debugger/dist/parser-worker.js
+++ b/devtools/client/debugger/dist/parser-worker.js
@@ -41415,20 +41415,21 @@ Please specify the "importAttributesKeyword" generator option, whose value can b
     }

     var workerUtilsExports = requireWorkerUtils();

     function clearAllHelpersForSources(sourceIds) {
       clearASTs(sourceIds);
       clearScopes(sourceIds);
       clearSources(sourceIds);
       clearSymbols(sourceIds);
     }
+    globalThis.wue = workerUtilsExports;

     self.onmessage = workerUtilsExports.workerHandler({
       findOutOfScopeLocations,
       findBestMatchExpression,
       getSymbols,
       getFunctionSymbols,
       getClassSymbols,
       getClosestFunctionName,
       getScopes,
       clearSources: clearAllHelpersForSources,

That patch makes debugging large script files much more snappy for me.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)

Your suggestion to keep workerUtilsExports alive seems to fix the problem for me:

diff --git a/devtools/client/debugger/dist/parser-worker.js b/devtools/client/debugger/dist/parser-worker.js
index f120a4753b0ce..8c5eb41efcdbf 100644
--- a/devtools/client/debugger/dist/parser-worker.js
+++ b/devtools/client/debugger/dist/parser-worker.js
     var workerUtilsExports = requireWorkerUtils();
+    globalThis.wue = workerUtilsExports;
     self.onmessage = workerUtilsExports.workerHandler({

dist/parser-worker.js is generated with rollup (via devtools/client/debugger/bin/bundle.js ).
workerHandler comes from https://searchfox.org/mozilla-central/rev/b220e40ff2ee3d10ce68e07d8a8a577d5558e2a2/devtools/client/debugger/src/workers/parser/worker.js#19,28

import { workerHandler } from "../../../../shared/worker-utils";
...
self.onmessage = workerHandler({

Rollup seems to create this intermediary workerUtilsExports variable.
I'm not sure this is the right solution here, but I was trying to get rid of this variable, and turning the worker-utils file into a proper ES module does that.
Running our perf tests locally, it doesn't seem to make a big difference; I'm waiting for some TRY results to see if that's really the case.
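For illustration, a simplified ES-module version of the worker entry point (with stub methods, and assuming the same import path as the real source) avoids the bundle-level workerUtilsExports variable entirely, because the binding comes straight from the worker-utils module:

// Simplified sketch of an ES-module worker entry point; the real file in
// devtools/client/debugger/src/workers/parser/worker.js wires up many more methods.
import { workerHandler } from "../../../../shared/worker-utils";

// Stub implementations standing in for the real parser API.
function getSymbols() {}
function getScopes() {}

// No intermediary workerUtilsExports object in a shared bundle scope:
// the handler binding comes directly from the imported module.
self.onmessage = workerHandler({ getSymbols, getScopes });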

Jeff, would you have detailed steps I can go through to see the long CC? You mentioned https://fairfaxmedia.pressreader.com/the-dominion-post, but which script should I open and add breakpoints into? Also, how do you see these long CCs: in the profiler, or by some other means?

Flags: needinfo?(jmuizelaar)

Yeah, I wouldn't expect your change to help, because it's not keeping the world alive via the GC.

Doing the following also seems to help, and it should work in worker.js:

globalThis.wh = workerHandler

My rough STR:

Flags: needinfo?(jmuizelaar)

(In reply to Andrew McCreight [:mccr8] from comment #5)

I guess maybe we're entraining all of this stuff here on this line: self.onmessage = workerUtilsExports.workerHandler({ getMatches, findSourceMatches });? I think that's sticking a reference to one of the objects in this huge lambda into some kind of worker event handler? Olli, do you have any idea how we could reduce the CC size here? We don't run any of the "event handler skippable stuff" on workers do we?

We could trace black JS? On workers there are way fewer EventTarget objects, or native objects in general, so we could perhaps try to go through the certainly alive DETHs (DOMEventTargetHelpers) and call ELM::TraceListeners (EventListenerManager::TraceListeners).

Flags: needinfo?(smaug)

Smaug's patch seems to fix the problem for me.

Assignee: nobody → smaug
Status: NEW → ASSIGNED
Attachment #9413850 - Attachment description: WIP: Bug 1907794, trace certainly black event listeners, r=mccr8 → Bug 1907794, trace certainly black event listeners, r=mccr8

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

The high number of get expression looks like it's coming from things like:

Thanks for digging into this more! I really need some kind of random option for find_roots.py that would pick a certain number of random objects matching a given pattern, as that is often useful for these kinds of leaks.

Thanks Jeff, Andrew and Olli for the help figuring out why the CC was slow!!

We just couldn't understand why the CC was slow and how to fix it.
But now we (the DevTools team) are curious to understand this and to be able to figure out such issues in the future.

Is there some documentation explaining the difference between the CC and the GC?
Something that would help us understand why, when an object isn't collected by the GC, it has to go through the CC, and why that can then be slow (my understanding of comment 2).

Also, is there some documentation about how to capture and browse these CC/GC logs? (I see no mention of census.py in the Firefox source docs.)
And especially, how do we identify that some node is GC-rooted or CC-rooted? Is that something we can tell by reading these logs?
Also, is the memory panel in DevTools of any help? I imagine it may not be, as it only shows JS objects and may not help identify the problematic CC-rooted object?

Otherwise, I would like to make it clear that we knew this code allocates tons and tons of JS objects.
The debugger, because it currently relies on Babel, parses the whole of each JavaScript file opened in the debugger.
You have seen that expressions is very large, because it contains all the expressions of the debugged JS files...
There is nothing wrong with that; it is a direct consequence of being forced to do full-file parsing.
We know about it, and we plan to migrate from Babel to Lezer (bug 1908327), which allows partial parsing. We will stop having to allocate so much data, and stop keeping most of the parsing data in memory!
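As a standalone, Node-style illustration of that constraint (hypothetical file name, not the debugger's actual code path), a Babel parse of a whole file keeps an AST node for every expression in memory, regardless of what is displayed in the viewport:

// Standalone, Node-style illustration, not the debugger's actual code path.
import { readFileSync } from "fs";
import { parse } from "@babel/parser";

const source = readFileSync("some-large-bundle.js", "utf8"); // hypothetical file
const ast = parse(source, { sourceType: "unambiguous" });

// Rough node count: every object with a type string is an AST node,
// and all of them stay in memory as long as the AST is kept around.
let nodeCount = 0;
JSON.stringify(ast, (key, value) => {
  if (value && typeof value === "object" && typeof value.type === "string") {
    nodeCount++;
  }
  return value;
});
console.log(`AST nodes kept in memory: ${nodeCount}`);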

Severity: -- → S3
Priority: -- → P2
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/da675cd57ab1 trace certainly black event listeners, r=mccr8,dom-worker-reviewers,asuth
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch

Perfherder has detected a devtools performance change from push 9d5f8b003b5f4927750e5b57a4db9701cb52ad1e.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
15% damp browser-toolbox.debugger-ready.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 867.67 -> 1,000.75
4% damp custom.jsdebugger.open-large-minified-file.full-selection.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 3,924.00 -> 4,065.02

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
29% damp custom.pretty-print.jsdebugger.reload.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 9,838.06 -> 6,952.87
29% damp custom.jsdebugger.pretty-print.reload-and-pause.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 9,849.11 -> 7,010.48
27% damp custom.pretty-print.jsdebugger.reload.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 10,213.58 -> 7,410.05
27% damp custom.jsdebugger.pretty-print.reload-and-pause.DAMP macosx1015-64-shippable-qr e10s fission stylo webrender 10,218.61 -> 7,466.92
9% damp custom.jsdebugger.open-large-minified-file.full-selection.DAMP windows11-64-shippable-qr e10s fission stylo webrender 2,323.90 -> 2,121.25
... ... ... ... ...
3% damp custom.jsdebugger.stepIn.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 3,510.08 -> 3,413.65

You are the author of one of the patches included in that push, and we need your help to address this regression.
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 1497

For more information on performance sheriffing please see our FAQ.

