Closed Bug 1546513 Opened 5 years ago Closed 5 years ago

Closing the network panel with many entries takes multiple seconds

Categories

(DevTools :: Netmonitor, defect, P2)

defect

Tracking

(Performance Impact:low, firefox68 fixed)

RESOLVED FIXED
Firefox 68
Performance Impact low
Tracking Status
firefox68 --- fixed

People

(Reporter: Harald, Assigned: Honza)

References

(Blocks 1 open bug)

Details

(Keywords: perf:responsiveness)

Attachments

(1 file)

Have had this repeatedly, where closing the Network panel freezes Firefox for a while – even causing slow script.

What were you doing?

  1. Open Network panel on cnn.com, enable Persist Logs and refresh a few times
  2. Close the Network panel

What happened?

Closing takes 3+ seconds of beach balling.

What should have happened?

Instantly close.

Anything else we should know?

https://perfht.ml/2Uzqj04

componentWillUnmount > destroy > setImageElement takes the time in JS, while DOM code it calls into (Document.mozSetImageElement) is hung up on nsContentUtils::RemoveScriptBlocker.

This profile has a multitude more items but also just 3.5+ sec hang; so it doesn't seem linear with the number of resources: https://perfht.ml/2Uwd6W4

P2 as this could potentially affect many users.

Marking for [qf] as the impact of nsContentUtils::RemoveScriptBlocker for removing a background SVG in the main thread seems suspicious.

Whiteboard: [qf]

Honza, do you remember why the background image has to be cleared?

Flags: needinfo?(odvarko)
Whiteboard: [qf] → [qf:p3:responsiveness]

(In reply to :Harald Kirschner :digitarald from comment #2)

Honza, do you remember why the background image has to be cleared?

I think you are referring to this code:
https://searchfox.org/mozilla-central/rev/7944190ad1668a94223b950a19f1fffe8662d6b8/devtools/client/netmonitor/src/widgets/WaterfallBackground.js#147

Not done by me, so not sure if/why it's needed.

Honza

Flags: needinfo?(odvarko)

Alex, teardown of panels was a point during the central meeting. Is this image reset potentially an anti-pattern as the element is destroyed anyways?

Flags: needinfo?(poirot.alex)

It was originaly introduced when converting netmonitor to react in bug 1309866.
I imagine that is just the common practice of trying to cleanup things without any proof that is any useful.

Actually, that can be useful if:

  • mozSetImageElement leaks internaly if you do not nullify it before the element is removed from the DOM and GCed
  • the element on which we set this image somehow leaks when it is GCed

Now, I may miss something, but that would be worth removing and see while watching the memory.
Also, it may hide some other issue. We do update the waterfall frequently, so I would expect the setImageElement to be slow not only on destroy but also on updates? Isn't that the case?

Flags: needinfo?(poirot.alex)

mozSetImageElement leaks internaly if you do not nullify it before the element is removed from the DOM and GCed

I only see tests setting mozSetImageElement to null. document's destroy sequence also clears the references created by using mozSetImageElement, so it seems a sane experiment to just remove it and see if memory leaks.

We do update the waterfall frequently, so I would expect the setImageElement to be slow not only on destroy but also on updates? Isn't that the case?

No, mozSetImageElement is set once with a reference to a canvas, which is then redrawn. This seems fast or at least it hasn't shown up in profiles as worst offender.

Notes from my analysis:

tldr: mozSetImageElement might not be a good way to draw the waterfall lines, due to the expensive teardown (or it can be optimized).

Rapid closing and opening With mozSetImageElement reset: https://perfht.ml/2ZRalTe and without: https://perfht.ml/2UYMAFb

Patterns are pretty similar and I can not see any leakage – because the second set of profiles show that the expensive nsINode::RemoveMutationObserver is executed no matter what as it is part of the document teardown: Closing a long list with reset: https://perfht.ml/2ZU62GO and without: https://perfht.ml/2ZR8AFw (both 2s+)

What would be using mutation observers – DevTools does not depend on them. mozSetImageElement seems to add mutation observers to the element and/or the canvas, as closing without use of that method is instant: https://perfht.ml/2ZTdJg3

–––

Performance panel uses the same -moz-element technique but does not re-use the same Canvas instance, creating a new instance for every background update: https://searchfox.org/mozilla-central/source/devtools/client/performance/modules/waterfall-ticks.js#59

This spreads a massive overhead of RemoveMutationObserver across every update when used in the Network panel: https://perfht.ml/2ZOr7lK . Strangely enough, RemoveMutationObserver does not show up when profiling the Performance panel: https://perfht.ml/2ZQJRRw .

The different is that the -moz-element CSS is applied to every resource row with the requests-list-waterfall class; so every element gets an observer. In the Performance panel on the other side, the styling is applied to an infinite scrolling table, and only ever uses and recycles a few dozen rows.

(Merged into prior comment)

I tried removing the setImageElement call from destroy and pushed on Talos

--- a/devtools/client/netmonitor/src/widgets/WaterfallBackground.js
+++ b/devtools/client/netmonitor/src/widgets/WaterfallBackground.js
@@ -139,13 +139,13 @@ class WaterfallBackground {
   getThemeColorAsRgba(colorName, theme) {
     const colorStr = getColor(colorName, theme);
     const color = new colorUtils.CssColor(colorStr);
     const { r, g, b } = color.getRGBATuple();
     return [r, g, b, REQUESTS_WATERFALL.TICKS_COLOR_OPACITY];
   }
 
   destroy() {
-    this.setImageElement("waterfall-background", null);
+    // this.setImageElement("waterfall-background", null);
   }
 }

Interesting results:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=40cb028ed1582236cb52ab8b18e0c1a4876a8104&originalSignature=1760367&newSignature=1759151&framework=12&selectedTimeRange=172800

  • complicated.netmonitor.close.DAMP +3.28%
  • simple.styleeditor.close.DAMP - 12.33%

I also tried manual testing, but I am still experiencing freeze at closing.

Honza

Avoiding setImageElement on destroy doesn't help much for me, but the following does:

--- a/devtools/client/netmonitor/src/components/RequestListContent.js
+++ b/devtools/client/netmonitor/src/components/RequestListContent.js
@@ -124,16 +124,19 @@ class RequestListContent extends Compone
   }

   componentWillUnmount() {
     this.refs.scrollEl.removeEventListener("scroll", this.onScroll, true);

     // Uninstall the tooltip event handler
     this.tooltip.stopTogglingOnHover();
     window.removeEventListener("resize", this.onResize);
+
+    const { rowGroupEl } = this.refs;
+    rowGroupEl.innerHTML = "";
   }

   /*
    * Removing onResize() method causes perf regression - too many repaints of the panel.
    * So it is needed in ComponentDidMount and ComponentDidUpdate. See Bug 1532914.
    */
   onResize() {
     const parent = this.refs.scrollEl.parentNode;

When profiling the closing, I noticed that a lot of time is taken by layout - removing all the child nodes representing HTTP requests in the list. Removing the entire list upfront using rowGroupEl.innerHTML = ""; makes the close instant.

You can test with this test case: http://janodvarko.cz/tests/bugzilla/1549747/
I have generated 1800 requests and closing the Toolbox happens immediately.

However this technique isn't safe since none of the RequestListItem components below will know they have been unmounted. So, their's componentWillUnmount (if implemented) wouldn't be called. I didn't test this, so needs to be verified. Note that RequestListItem doesn't need the method at the moment, but still I am not sure whether we want to have this kind of workaround in the code base.

Alex, WDYT?

Perhaps there is another way how to clean up the list fast?

Another thing is that DAMP test (especially complicated.netmonitor.close) doesn't big improvement:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=0ad51612b5def855765fbc74d2df665a619b8994&originalSignature=1759151&newSignature=1759151&framework=12&originalRevision=0254166f23ad42f6812e761088c5b87d793fd196

Perhaps, the test isn't doing the right thing?

Honza

Flags: needinfo?(poirot.alex)

First, about DAMP not catching up the improvement. May be that's because we don't have any "custom" document to test netmonitor performance?
We only test netmonitor against complicated (bild.de) and simple (empty document). The complicated document has only 280 requests. May be that's not enough to trigger the issue?
Or there is a timing issue, we are not waiting for something correctly and we should fix DAMP in order to see the improvement that we see locally.

Then, regarding the fix using innerHTML = "", it sounds reasonable to me. This is already something we use in the console.
But this doesn't really highlight an issue with React unmount callbacks here.
Your patch doesn't replace unmount with innerHTML reset. Instead, we still unmount the React elements, but, in parallel of that, you do reset the innerHTML from RequestListContent element.
So, I imagine we somehow limit the image-element overhead by destroying the DOM Elements slightly earlier than before.

To better understand why this code actually speed up things, it would be interesting to know what React is really doing on unmount. It does remove the DOM from the document, but how? Does it only remove the top level element? Or does it remove every single DOM Element? Or only each top level DOM Element of each React Component? If it isn't the first option, if it doesn't only remove the very top level element, it can easily explain any performance issue during unmount!

Flags: needinfo?(poirot.alex)
Summary: Closing the network panel with many entries takes multiple seconds because of clearing canvas → Closing the network panel with many entries takes multiple seconds

Thanks for the response Alex!

  1. Patch attached and ready for review
  2. I filed a bug covering the DAMP test: bug 1551432

Honza

Assignee: nobody → odvarko
Status: NEW → ASSIGNED

When I removed the unmount logic which resets mozImageElement, I still saw the slow-down as the window was teared down by the browser. Would removing the elements really hit a code path that doesn't have the overhead? It seems that there a risk that the mutation observers are somehow retained – did you check that memory isn't ballooning?

Flags: needinfo?(odvarko)

(In reply to :Harald Kirschner :digitarald from comment #14)

When I removed the unmount logic which resets mozImageElement, I still saw the slow-down as the window was teared down by the browser.

I am unsure what exactly you removed, but removing setImageElement doesn't help me at all (see also comment #9)

Would removing the elements really hit a code path that doesn't have the overhead? It seems that there a risk that the mutation observers are somehow retained – did you check that memory isn't ballooning?

Cleaning up the entire list using innerHTML = "" has significant impact on the performance (panel with ~2000 requests closes immediately for me). The innerHTML = "" removes only the list of RequestListItem elements generated here:
https://searchfox.org/mozilla-central/rev/116bd975c30746ddefc3d20e6947d1871469354f/devtools/client/netmonitor/src/components/RequestListContent.js#330-348

RequestListItem doesn't use any mutation observers (nor event listeners), so there shouldn't be memory leaks. Also try is giving clean results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da2cc763f6b768883fce588205d496a4f0369da2

Honza

Flags: needinfo?(odvarko)
Pushed by jodvarko@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b8c66ae8e149
Closing the network panel with many entries takes multiple seconds r=ochameau
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 68

I am unsure what exactly you removed, but removing setImageElement doesn't help me at all (see also comment #9)

Yes, that is what I was trying to say as well :)

RequestListItem doesn't use any mutation observers (nor event listeners), so there shouldn't be memory leaks.

mozSetImageElement uses mutation observers, which is the main function in the hang.

Also try is giving clean results

I didn't think that we really trusted our test suite to be good at finding memory leaks.

Note: bug 1550531 may have been the underlying platform-bug here. (I wouldn't be surprised if its fix [landed a few hours ago] would've independently fixed this, in the absence of the patch that landed here.)

Thanks Daniel!

I tested this and it's true, bug 1550531 fixed that.
I filled bug 1555909 for reverting the patch introduced in this bug.

Honza

Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: