Closed Bug 1026295 Opened 10 years ago Closed 10 months ago

browser permanently hangs when logging tons of stuff

Categories

(DevTools :: Console, defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fitzgen, Unassigned)

References

(Blocks 3 open bugs)

Details

(Keywords: perf, Whiteboard: [console-papercuts][polish-backlog])

Attachments

(3 files)

STR:

1. Load http://fitzgen.github.io/ocean-noise/
2. Open a scratchpad
3. Paste attachment into the scratchpad
4. Run the scratchpad
5. Click the big "play" button

ER: Stuff logs in the console. Failing that, when there are too many messages, some are displayed and the browser is still responsive. Seems like at minimum we could just put some loop inside a DevToolsUtils.yieldingEach [0], or queue up messages to be rendered, so that the browser stays responsive.

[0]: http://dxr.mozilla.org/mozilla-central/source/toolkit/devtools/DevToolsUtils.js#171

AR: Permanent beach ball. Need to force-quit Firefox.
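For illustration, a minimal sketch of the message-queuing idea above (not the actual console code; `renderMessage` is a hypothetical callback that appends one message to the output): render queued messages in small chunks and yield back to the event loop between chunks so the browser stays responsive.

```js
// Sketch only: chunked rendering with yields between chunks.
function renderMessagesInChunks(messages, renderMessage, chunkSize = 50) {
  let index = 0;
  function renderNextChunk() {
    const end = Math.min(index + chunkSize, messages.length);
    for (; index < end; index++) {
      renderMessage(messages[index]);
    }
    if (index < messages.length) {
      // Yield to the event loop before rendering the next chunk,
      // instead of blocking the main thread for the whole backlog.
      setTimeout(renderNextChunk, 0);
    }
  }
  renderNextChunk();
}
```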
Attached file scratchpad
Note that if you modify the scratchpad so that `strace.logging = false;` is set before the `strace(window);` call, there are no issues, so it is definitely a problem with handling that many log messages.
Thanks for the bug report. This might be caused by recent changes/perf regressions. Needs investigation.
Keywords: perf
Priority: -- → P2
Profile data from running this in the Browser Toolbox's profiler.
Screenshot of the profiler showing 35% of the time being spent in the utils.js cloneObject method. An additional 11% is spent in the webconsole.js pruneOutput function.
s/pruneOutput/flushMessageQueue/
Whiteboard: [console-papercuts]
Whiteboard: [console-papercuts] → [console-papercuts][devedition-40]
Whiteboard: [console-papercuts][devedition-40] → [console-papercuts][polish-backlog]
Update: e10s helps a lot here. Instead of the browser beachballing in an unrecoverable way, you get the notificationbox popup and can kill the web content process to get the browser back.
Product: Firefox → DevTools
Using the initial STR, 2 segments from the profile:

Initial longer hangs for large amounts of warnings: with https://perfht.ml/2Bg6HGM
1. Expensive reflow caused by `scrollToBottom`
2. `Frame::render` is the most expensive component at about 16%

Could React fiber methods help split up this work further and delay it?

Batches of logs: https://perfht.ml/2Bgp4v8
1. Suffers from expensive reflows (150ms+).

ni? Daniel if this links to a known bug.
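For illustration only (not the actual webconsole code): one way to cut down the `scrollToBottom` reflow cost would be to coalesce scroll requests so a burst of appended messages forces at most one layout flush per frame. `outputNode` is a hypothetical stand-in for the console's scrollable output element.

```js
// Sketch: coalesce scroll-to-bottom requests into one rAF callback.
let scrollScheduled = false;
function scheduleScrollToBottom(outputNode) {
  if (scrollScheduled) {
    return;
  }
  scrollScheduled = true;
  requestAnimationFrame(() => {
    scrollScheduled = false;
    // Touching scrollTop/scrollHeight here forces at most one reflow per frame,
    // no matter how many messages were appended in the meantime.
    outputNode.scrollTop = outputNode.scrollHeight;
  });
}
```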
Flags: needinfo?(dholbert)
I can't view the profiles in comment 8 -- I get:
====
Could not download the profile.
Error: Unserializing the profile failed: Error: Unable to parse a processed profile of version 19 - are you running an outdated version of perf.html? The most recent version understood by this version of perf.html is version 18.
====
Reloading (including w/ shift) doesn't help. Harald, do you know if there's anything special I need to do for those profiles to load?
Sorry, I was using the staging environment. After resolving the shortened link, replacing https://perf-html.io with https://master.perf-html.io should allow viewing the profiles.
(In reply to :Harald Kirschner :digitarald from comment #8)
> Using the initial STR, 2 segments from the profile:
>
> Initial longer hangs for large amounts of warnings: with https://perfht.ml/2Bg6HGM
> 1. Expensive reflow caused by `scrollToBottom`

Here's a working direct link (fixed per comment 10), w/ "Call Tree" tab, filtered for "scrollToBottom":
https://master.perf-html.io/public/72303b387e1239196dae812877411828b9da0c1b/calltree/?globalTrackOrder=11-0-1-2-3-4-5-6-7-8-9-10&hiddenGlobalTracks=1-2-3-4-5-6-7-8-9-10&hiddenLocalTracksByPid=32522-1-0&localTrackOrderByPid=32522-1-0~32530-0~32526-0~32525-0~32671-0~32524-0~32527-0~&range=141.6189_163.4843&search=scrollToBottom&thread=0&v=3

Looks like we're spending ~2x as much time restyling as we are reflowing here. I'm guessing we're just creating insane amounts of web content? (and when we finally get around to styling/reflowing it, it takes a while)

> Batches of logs: https://perfht.ml/2Bgp4v8
> 1. Suffers from expensive reflows (150ms+).

Working direct link for this one (in Call Tree tab):
https://master.perf-html.io/public/72303b387e1239196dae812877411828b9da0c1b/calltree/?globalTrackOrder=11-0-1-2-3-4-5-6-7-8-9-10&hiddenGlobalTracks=1-2-3-4-5-6-7-8-9-10&hiddenLocalTracksByPid=32522-1-0&localTrackOrderByPid=32522-1-0~32530-0~32526-0~32525-0~32671-0~32524-0~32527-0~&range=141.6189_177.4539~160.0979_177.4539&thread=0&v=3

Indeed, there seem to be repeated 150ms+ reflows for that window, which are the dominant slow thing for most of that period of time. I wonder what's triggering those reflows -- I do see hits for "appendChild" / "contentAppended" between many of the reflows (and you can see the scrollable area changing in the profile's screenshots). So I suspect that between those reflows, we're just adding content to the backscroll area, which is invalidating the cached intrinsic sizes of the backscroll, and causing us to perform a more thorough layout of the flex items than we otherwise would have to do...

> ni? Daniel if this links to a known bug.

I don't know, offhand, but I'll tie this to Bug 1304473 (flexbox-perf-issues) for further investigation when time allows.
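Purely as an illustrative sketch of the "adding content to the backscroll between reflows" point above (assuming a hypothetical `createMessageNode` helper, not the real console code): appending a batch of messages via a single DocumentFragment invalidates the output's cached intrinsic sizes once per batch rather than once per message.

```js
// Sketch: batch DOM insertions so layout is invalidated once per batch.
function appendMessagesBatched(outputNode, messages, createMessageNode) {
  const fragment = document.createDocumentFragment();
  for (const message of messages) {
    fragment.appendChild(createMessageNode(message));
  }
  // One insertion into the live DOM, one invalidation of the scrollable area.
  outputNode.appendChild(fragment);
}
```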
Flags: needinfo?(dholbert)

(retriaging "flexbox-perf-issues" dependencies)

Here's a profile I captured of this today (following STR from comment 0, except using the Web Console instead of the Scratchpad since the scratchpad isn't a thing anymore):
https://share.firefox.dev/3fmCmZw

(Note: I had to close the tab in order to be able to capture a profile and to avoid letting the runaway logging completely lock up the browser.)

Observations:

  • Focusing on 0-30s (which I think is the time up until I closed the tab), there are repeated "batches of work", with each batch progressively more time-consuming than the previous one. (As I recall, this may be due to some webconsole behavior where we clamp the amount of log output, and we repeatedly double the clamp-limit over time, or something like that?)

  • The reflow/restyle time in the longer batches is substantial, but the vast majority of the time is JS (e.g. in the last/longest batch from 12s-27s, we do 1.2s of reflow and 500ms of restyle at the very end, after ~13 seconds of JS/DOM operations).

  • I would guess that this growing reflow/restyle time is consistent with the growing amount of content that we're logging per-"batch".

Conclusion: it doesn't look like the reflow performance here is a significant part of the jank to narrow in on -- and it seems consistent with what I might expect from churning out progressively-larger batches of log lines to be laid out all at once.

To really address this bug, I think we would need to focus in on the more expensive things that are happening in the profile; and to improve the relatively-small-part that is the layout performance, I think we would probably want a frontend change in the web console (stricter clamping on the number of lines that can be logged at once, for example; and/or stricter clamping on the upper limit of lines that we'll allow to be shown; or some sort of virtual impl that only attempts to lay out some upper-limit number of lines.)
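A minimal sketch of the "stricter clamping" direction just mentioned, assuming made-up limits and a simple array-backed message store (not actual webconsole prefs or code): cap both how many messages are accepted per flush and how many are retained for display, so layout never has to handle an unbounded backlog at once.

```js
// Assumed limits for illustration only, not real prefs.
const MAX_MESSAGES_PER_FLUSH = 200;
const MAX_RETAINED_MESSAGES = 1000;

function flushQueuedMessages(queue, displayed) {
  // Take only a bounded slice of the queue on each flush.
  const batch = queue.splice(0, MAX_MESSAGES_PER_FLUSH);
  displayed.push(...batch);
  // Drop the oldest messages once the retained limit is exceeded.
  const excess = displayed.length - MAX_RETAINED_MESSAGES;
  if (excess > 0) {
    displayed.splice(0, excess);
  }
  return displayed;
}
```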

So: I'm dropping the "flexbox-perf-issues" dependency since it's not clear that there's a flexbox perf problem to be addressed here, at this point. (There may be one for other webconsole bugs, but I'm not seeing one here; as noted above, the reflow times seem consistent with giant amounts of content being added at once and needing to be laid out.)

No longer blocks: flexbox-perf-issues
Severity: normal → S3

(In reply to Daniel Holbert [:dholbert] from comment #12)

> To really address this bug, I think we would need to focus in on the more expensive things that are happening in the profile; and to improve the relatively-small-part that is the layout performance, I think we would probably want a frontend change in the web console (stricter clamping on the number of lines that can be logged at once, for example; and/or stricter clamping on the upper limit of lines that we'll allow to be shown;
>
> or some sort of virtual impl that only attempts to lay out some upper-limit number of lines.)

Note that this has been implemented in bug 1753177 (virtual viewport).
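For reference, a rough sketch of the virtual-viewport idea (not the bug 1753177 implementation itself; it assumes a fixed per-message height and a hypothetical `renderMessage` helper): only the messages inside the visible scroll window get rendered, so layout cost stays bounded no matter how many messages exist.

```js
// Sketch: render only the slice of messages that intersects the viewport.
function renderVisibleMessages(outputNode, messages, renderMessage, messageHeight = 20) {
  const firstVisible = Math.floor(outputNode.scrollTop / messageHeight);
  const visibleCount = Math.ceil(outputNode.clientHeight / messageHeight) + 1;
  const lastVisible = Math.min(firstVisible + visibleCount, messages.length);
  // Replace the rendered slice; everything outside the window stays unrendered.
  outputNode.textContent = "";
  for (let i = firstVisible; i < lastVisible; i++) {
    outputNode.appendChild(renderMessage(messages[i]));
  }
}
```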

A new profiler record with today's mozilla-central:
https://share.firefox.dev/41Pjfkj
There is surely something to be done about the massive usage of Array.includes on large arrays.
We spend almost 3 seconds checking for inclusion in this 21s profile.
Otherwise, resource throttling would probably help reduce the cost of JSActor messages as well as the overall overhead of RDP.
And in the content process, the generation of object grips might be optimizable (_populateGripPreview).

(In reply to Alexandre Poirot [:ochameau] from comment #13)

> There is surely something to be done about the massive usage of Array.includes on large arrays.
> We spend almost 3 seconds checking for inclusion in this 21s profile.

More detail on this point -- in at least one jank section, here:
https://share.firefox.dev/47mrPYT
...we have 357 samples in Array.prototype.includes, which seems to be split between two portions:
(1) Calls inside of removeMessagesFromState/state.messagesUiById calling isInRemovedId (160 samples), here:
https://searchfox.org/mozilla-central/rev/762f24e00a9548d80ebba1b985c871ba6d9b829d/devtools/client/webconsole/reducers/messages.js#1022
(2) Calls inside of UNSAFE_componentWillUpdate (197 samples), here:
https://searchfox.org/mozilla-central/rev/762f24e00a9548d80ebba1b985c871ba6d9b829d/devtools/client/webconsole/components/Output/ConsoleOutput.js#190-194

In both cases it looks like we've probably got a massive array of messages, and we're spending time traversing it to satisfy this includes() check. (not sure if this is one extremely-slow traversal vs. many traversals which collectively eat up lots of time).
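A simplified sketch of one possible mitigation (an assumption on my part, not the landed fix): build a Set of the removed ids once so each membership test is O(1), instead of scanning a large array with Array.prototype.includes per message. Only the `messagesUiById` portion mentioned above is shown.

```js
// Sketch: Set-based lookup instead of repeated Array#includes scans.
function removeMessagesFromState(state, removedMessagesIds) {
  // Build the lookup once; Set#has is O(1) vs. a linear array scan per call.
  const removedIds = new Set(removedMessagesIds);
  const isInRemovedId = id => removedIds.has(id);
  return {
    ...state,
    messagesUiById: state.messagesUiById.filter(id => !isInRemovedId(id)),
  };
}
```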

Maybe we should resolve this as FIXED by bug 1753177, and spin off a new bug for the remaining less-substantial jank?

(The title of this bug, "browser permanently hangs", is much worse than the ~1s periodic janks shown in the latest profile here.)

Flags: needinfo?(poirot.alex)

Overall fixed by bug 1753177 (virtual viewport for the console).

(In reply to Daniel Holbert [:dholbert] from comment #15)

> Maybe we should resolve this as FIXED by bug 1753177, and spin off a new bug for the remaining less-substantial jank?
>
> (The title of this bug, "browser permanently hangs", is much worse than the ~1s periodic janks shown in the latest profile here.)

We are at a 1s freeze every 2-3s, so this is still a serious and useful STR to keep investigating.
But I agree; for the sake of clarity, I opened bug 1873065 for the various things highlighted in the profiler, and bug 1873066 to focus specifically on Array.includes.

Status: NEW → RESOLVED
Closed: 10 months ago
Flags: needinfo?(poirot.alex)
Resolution: --- → FIXED