Closed Bug 1770606 Opened 2 years ago Closed 5 months ago

Browser freezing jank, when viewing website with 95160 elements (chrome does not freeze)

Categories

(Core :: Disability Access APIs, defect)

Firefox 100
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
121 Branch
Performance Impact medium
Tracking Status
firefox119 --- wontfix
firefox120 --- wontfix
firefox121 --- fixed

People

(Reporter: Krischna, Assigned: Jamie)

References

(Blocks 1 open bug)

Details

(4 keywords)

Attachments

(2 files, 1 obsolete file)

Attached image total cpu usage over time (obsolete) —

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

  1. Start a new blank profile.
  2. Open this link https://coreboot.org/status/board-status.html . You may have dev tools for performance capturing open.
  3. Wait and test the responsiveness of the browser by opening a new tab and/or navigating between tabs.

The website in question contains 95160 individual HTML elements and no scripts.

Actual results:

The tab first loads for around 50 seconds and then the entire browser window becomes unresponsive.

Expected results:

Like Chromium 101.0.4951.64, Firefox should have been able to view the page (chromium takes on average 5.4 seconds till it's done) and under no circumstances cause the entire window/browser to become unresponsive, thus making it hardly impossible to stop the tab from loading.

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

Reporter, please navigate to about:support and select "Copy text to clipboard", then attach those contents to this bug.

Flags: needinfo?(Krischna)
Attached file about:support.md
Flags: needinfo?(Krischna)

Wonderful. It looks like accessibility services are being activated within Firefox. Do you use assistive software/anything that would cause Firefox to do this? There are known performance issues with this area of Firefox (bug 1462956) for example

No not to my knowledge.
I didn't knew about accessibility services before, but sure enough when disabling them with this guide https://support.mozilla.org/en-US/kb/accessibility-services#w_how-do-i-disable-firefox-accessibility-service
Firefox displays the page without hiccups.
Do i understand correctly that accessibility services allow other programs to just interfere with the browser? That is super malicious looking.
Is there a way to see which (possibly malicious) programs are activating and using accessibility services?

Status: UNCONFIRMED → NEW
Component: Widget: Gtk → Disability Access APIs
Depends on: 1737192
Ever confirmed: true
OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Small addition, there is the "Your browser is being managed by your organization." in about:preferences, and the applied policy is "DisableAppUpdate: true". Linux does that by default to update Firefox from the repo, but I don't expect this to be the reason for the accessibility services.

On Linux, Cache the World is unlikely to make this better - it's more likely to make it worse - so I'm removing bug 1737192 from dependencies.

I can maybe understand why the tab takes a long time to load, but on Linux, I'm really confused as to why the browser process would become unresponsive unless you have an accessibility client on your system which is actively querying for information. Would you be willing to provide a performance profile to help us diagnose this? See this page for instructions.

Blocks: a11yperf
Severity: -- → S2
No longer depends on: 1737192
Flags: needinfo?(Krischna)
Keywords: perf

Performance with this page in Chromium is also really bad (multiple minutes to load) on my system. However, I'm guessing Chromium accessibility might be disabled on your system; I know it's still under development.

Flags: needinfo?(Krischna)

Curious. If I download the page locally and load from there, it's still slow, but it's a lot better than when I load from the internet, where it basically never recovers. I don't understand why yet, as I can't profile it due to it never coming back.

For now, here's a profile of loading the page from a local file: https://share.firefox.dev/3864PF2

We seem to spend most of the time in nsBlockInFlowLineIterator due to our use of nsTextFrame::GetRenderedText. I'm not quite sure why that's using a line iterator yet or whether we can avoid that somehow.

Thanks. Looks like your profile is pretty similar; same issue with GetRenderedText. I'll keep digging.

Attachment #9277645 - Attachment is obsolete: true

I had a suspicion that the Mozilla VPN program may use accessibility services, but after repeated testing and installing/uninstalling there seem to be no correlation.

The performance profile i've just send (https://share.firefox.dev/387pvg1) did not lead to the whole browser lagging.
In fact, the reactions of the browser are quite unstable. I've categorized three main behaviors I encountered:

1. The page does load.

After a minute or so, the page would actually be displayed. This behavior would consistently stop when clearing data in about:preferences#privacy.

2. Browser freezing

The browser process would becomes unresponsive, probably because Firefox takes up all available CPU time.
Here's a successful capture of this scenario.
https://share.firefox.dev/3yRSZJy
Attached file: ( AW: cpu usage when capturing 3yRSZJy.png )

3. The tab is just never ending loading, browser still mostly responsive

This happens most of the time. The performance profile already given was this case.
Notably the overall cpu usage is completely sporadic. Sometimes a singe core is used extensively, sometimes all four available are used on half capacity and sometimes it's switching sporadically between both ways.

(In reply to Krischna from comment #13)

2. Browser freezing

The browser process would becomes unresponsive, probably because Firefox takes up all available CPU time.
Here's a successful capture of this scenario.
https://share.firefox.dev/3yRSZJy

Ah. Thank you. That's very useful. It looks like there is definitely something accessibility (a11y) related on your system that is listening to a11y events and responding to them with queries, even if only occasionally. Unfortunately, on Linux, we're not able to determine what client is doing that.

With our current architecture, a11y queries arrive in the main process and then get synchronously sent to the content (web page) process, meaning that the main process is "blocked" while waiting for a response. If the content process is busy (which is very much the case here), that means the browser will hang altogether until that is done. With the new architecture we're currently working on, we use a cache in the main process to avoid this. That's not going to make the content process part any faster - it may actually make it worse - but it at least means we won't hang the whole browser, so you can close the tab, switch to other tabs, etc. if you wish.

Given that the cache will partially help here, I'm reinstating the dependency on bug 1737192. However, that's not the whole solution; we still need to figure out how to mitigate the content process jank (the third scenario you describe).

Depends on: 1737192

:jfkthame, I think the issue here is that there is a very large block containing a lot of inline elements. There are block elements and hard line breaks interspersed between the inline elements. However, GetRenderedText needs to figure out whether a text node is at the start of its line, which I think means nsBlockInFlowLineIterator has to potentially crawl through all the lines in the (very large) block. Because a11y calls GetRenderedText for every text leaf, we end up with O(n^2) or worse.

I see that layout already has an optimisation for this: the line cursor. GetRenderedText even uses nsBlockFrame::AutoLineCursorSetup already. This doesn't help us, though, because AutoLineCursorSetup clears the cursor when it goes out of scope. And since we need to call this separately on each text frame, we don't benefit from that cursor as set previously.

Assuming I'm correct about any of this (I don't know this code well at all), could we create some mechanism by which a11y could tell layout it's about to do a bunch of consecutive text node retrieval? Something like AutoLineCursorSetup , but more global? AutoLineCursorSetup would see this and not reset the cursor. I also wonder whether we could use this somehow in mozilla::a11y::IsLocalAccAtLineStart, though I'm not sure if nsBlockInFlowLineIterator and nsILineIterator are related and/or can be used in the same way. This will be important for our caching project, since with the cache enabled, I see huge bottlenecks there, too.

Flags: needinfo?(jfkthame)

(In reply to James Teh [:Jamie] from comment #15)

I also wonder whether we could use this somehow in mozilla::a11y::IsLocalAccAtLineStart, though I'm not sure if nsBlockInFlowLineIterator and nsILineIterator are related and/or can be used in the same way.

I realised we can probably do this somewhat on the a11y side by keeping track of the last line number we queried for each block. It's still probably not as cheap as having a cursor, since GetLineAt still has to walk the list, but it's (maybe) still cheaper than what we're doing now.

Ah, but we could cache the nsLineIterator for each block temporarily, which would avoid a lot of walking.

I don't know this code intimately, but I notice that nsBlockFrame has SetupLineCursor / ClearLineCursor, which should allow you to create a line cursor that persists for as long as you wish -- provided you're not disturbing things, e.g. by calling/allowing Reflow to happen -- across multiple operations that query the same block. So I thought you could be using this, but...

GetRenderedText even uses nsBlockFrame::AutoLineCursorSetup

...the use of AutoLineCursorSetup in GetRenderedText would tend to actively sabotage that, because if I'm reading things correctly, if there's already a line-cursor present on the block, it will be reset to its original position when the AutoLineCursorSetup goes out of scope. So successive calls to GetRenderedText would all restart from the beginning, rather than benefiting from an updated cursor.

I wonder if you could actually avoid AutoLineCursorSetup altogether in GetRenderedText, and instead manage the setup/removal of the line cursor from somewhere further up in the stack. Which is kind of what you were saying in comment 15, I guess.

Another thing I wondered is whether the cursor used in GetRenderedText (and other "querying" methods) could/should be split from that used in BuildDisplayList. It looks to me like that would allow GetRenderedText to use a more long-lived line cursor, without the risk of creating one that isn't valid for BuildDisplayList's usage (as it has requirements about non-decreasing y-coordinates).

I'm having some difficulty getting reliably reproducible profiling results (when the testcase gets big, it seems like the parent process has a tendency to hit some other performance cliff as well), but I've pushed a patch to try at https://treeherder.mozilla.org/jobs?repo=try&revision=8f2c12f614a6a102491c427a622ba444a8f89523 that seems to help GetRenderedText in my local testing; let's see if it breaks anything.

If you also have a chance to try this and see how it behaves for you, that'd be great.

Flags: needinfo?(jfkthame)

Somewhat unsurprisingly, the above try push seems to have broken some tests.... I'll see if it looks fixable, or a dead-end.

OK, I did some tests with a locally-saved copy of the coreboot.org board-status page, and I'm seeing a definite improvement with this patch, which largely eliminates the GetRenderedText cost. This improves the initial loading time (before the page gets rendered) from around 34 seconds to 6 seconds on my Linux machine, and makes interactions with the loaded page much quicker (e.g. scroll to the end and select a word: without patch, profile shows 1400ms in GetRenderedText; with patch, this drops to under 20ms).

However, the biggest issue I'm seeing is that once the document has loaded in the content process, there's a long period (over 2 minutes) during which the browser is unresponsive because we're busy sending IPC messages between the content and parent processes. The line-cursor patch doesn't make any difference there. So attempting to load that page with a11y enabled is still a terrible experience.

(I also notice that loading the page from the network performs far worse than loading it from a local file. I'm guessing this may be because it gets reflowed multiple times as content arrives, and this results in a lot more of the expensive, parent-blocking IPC chatter.)

Given that the line-cursor patch addresses only one part of the problem here, I'll spin off a dependent bug and post the patch there.

Depends on: 1770981

(In reply to Jonathan Kew (:jfkthame) from comment #21)

OK, I did some tests with a locally-saved copy of the coreboot.org board-status page, and I'm seeing a definite improvement with this patch, which largely eliminates the GetRenderedText cost. This improves the initial loading time (before the page gets rendered) from around 34 seconds to 6 seconds on my Linux machine, and makes interactions with the loaded page much quicker (e.g. scroll to the end and select a word: without patch, profile shows 1400ms in GetRenderedText; with patch, this drops to under 20ms).

Amazing. Thank you!

However, the biggest issue I'm seeing is that once the document has loaded in the content process, there's a long period (over 2 minutes) during which the browser is unresponsive because we're busy sending IPC messages between the content and parent processes.

Yuck. I can't seem to reproduce that on Windows. Would you mind providing a profile?

The other area where we're going to hit line related problems is mozilla::a11y::IsLocalAccAtLineStart. (You won't see that unless accessibility.cache.enabled is true.) That use nsLineIterator, not BlockInFlowLineIterator, so it doesn't benefit from the line cursor. I guess it's hard to make that benefit from the cursor because it calculates line numbers and the lines don't actually have cached numbers. However, a11y doesn't really care about the line numbers; it just wants to know whether two frames are on the same line or not. Do you think we could use BlockInFlowLineIterator instead so we could benefit from the cursor there too? Or is BlockInFlowLineIterator fundamentally different somehow?

Flags: needinfo?(jfkthame)

I wrote a patch in spin-off bug 1771060 to make IsLocalAccAtLineStart use the line cursor via nsBlockInFlowLineIterator. This seems to pass our tests at least.

Curiously, 30 seconds or so after I load this page, the page (or at least a large chunk of it) seems to get re-flowed and we end up janking due to IsLocalAccAtLineStart again. The line cursor doesn't help there because mutations are queued into a hash table, which isn't ordered according to the tree. I guess we could use a combination of an array and a hash table to try to deal with that. Still, this reflow is kinda odd.

(In reply to James Teh [:Jamie] from comment #22)

However, the biggest issue I'm seeing is that once the document has loaded in the content process, there's a long period (over 2 minutes) during which the browser is unresponsive because we're busy sending IPC messages between the content and parent processes.

Yuck. I can't seem to reproduce that on Windows. Would you mind providing a profile?

Startup profiles of loading board-status.html from a local file (on ThinkStation/Linux, so this is a fast machine):

Baseline, without a11y active:
https://share.firefox.dev/3lIAOOJ - this isn't too bad, IMO. There's a jank of close to 1 second in the content process while it does frame construction, styling and reflow, but given the size of the document that's not too surprising.

With screen reader enabled, to force a11y code to be active:
https://share.firefox.dev/3NBSrM4 - the content becomes visible reasonably quickly (see screenshots), but the parent process remains unresponsive (as judged by waiting for toolbar buttons to become usable, for example) for 2+ minutes, and this seems to be dominated by a11y-related IPC traffic.

Any idea what's going on here?

Flags: needinfo?(jfkthame) → needinfo?(jteh)

(In reply to James Teh [:Jamie] from comment #23)

Curiously, 30 seconds or so after I load this page, the page (or at least a large chunk of it) seems to get re-flowed and we end up janking due to IsLocalAccAtLineStart again. The line cursor doesn't help there because mutations are queued into a hash table, which isn't ordered according to the tree. I guess we could use a combination of an array and a hash table to try to deal with that. Still, this reflow is kinda odd.

If you're profiling on Windows and this is immediately after browser startup (e.g. you're launching the browser with the board-status URL on the command line, or navigating to it immediately after launch), you're probably seeing a reflow that is triggered when we finish the asynchronous loading of font data, which is a one-time thing that happens shortly after startup. This is usually more-or-less invisible to users, but if you're on a page where reflow is particularly expensive (as here), it can result in a one-time jank.

To take this out of the equation (as it's not part of the normal browsing experience), you could try setting gfx.font_loader.delay to zero, so we'll do that work more eagerly, or set it to a really large value such that the task doesn't even attempt to run until you've finished your profiling.

Thanks for the profiles.

(In reply to Jonathan Kew (:jfkthame) from comment #21)

once the document has loaded in the content process, there's a long period (over 2 minutes) during which the browser is unresponsive because we're busy sending IPC messages between the content and parent processes.

I might be misreading the profile (not all of the profiler is accessible to me yet, but I'm also probably just not that good at using it yet :) ), but I don't see that much time spent in parent -> content IPDL. The bulk of the time seems to be spent in Accessible::IsText, which doesn't involve IPDL. It looks like the a11y client (I'm guessing you were using Orca?) is asking for a lot of children and we're walking the children many times to figure out which child we want. I can think of a way to optimise that, but I want to first clear up whether I'm missing anything regarding parent -> content IPDL before I dive into that. Is there something I'm missing that makes you think this is parent -> content IPC?

Flags: needinfo?(jteh) → needinfo?(jfkthame)

I do see a bunch of D-Bus queries where we spend even more time than we do on IsText/GetEmbeddedChildAt. However, I can't see what Gecko code (if any) is being triggered in those code paths; there's no Gecko a11y stuff in the stack there that I can see.

I don't know much about profiling the inter-process stuff, but in the profile at https://share.firefox.dev/3NBSrM4, the "IPC - file:// Content" track seems to indicate a huge number of "IPC - received from Parent Process" for the sync message PDocAccessible::Msg_State, and a parallel stream of sync PDocAccessible::Reply_State being sent back to the parent.

There are also PContent::Msg_RecordDiscardedData messages going from the content process to the parent, with extremely high "Recv Thread Latency" (like around 60 seconds), but these are async messages so they may be less critical here.

Presumably these PDocAccessible::Msg_State messages and replies are happening per-element or something like that? So even though any given message takes only a few microseconds, the huge number of them, and the fact that they're sync messages (so that parent and content process each keep waiting for their message to be handled on the other side) means that overall they end up blocking the whole browser for an extended time.

Flags: needinfo?(jfkthame)
See Also: → 1771579

(In reply to Jonathan Kew (:jfkthame) from comment #28)

I don't know much about profiling the inter-process stuff, but in the profile at https://share.firefox.dev/3NBSrM4, the "IPC - file:// Content" track

Ah. I would have expected IPC calls taking a lot of time to show up much higher in the Stack Tree. They do show up in the Stack Tree, but in the content process, I only see about 400 ms spent in RecvState, which is kinda weird given how much the parent is getting blocked. The Marker Chart is inaccessible to me as a screen reader user, unfortunately.

seems to indicate a huge number of "IPC - received from Parent Process" for the sync message PDocAccessible::Msg_State, and a parallel stream of sync PDocAccessible::Reply_State being sent back to the parent.

Interesting. So we must be firing a few events and Orca must be querying State for each of them. Those sync IPDL calls will go away with the new a11y caching architecture (accessibility.cache.enabled pref; restart Firefox after toggling). That will incur some additional time spent in the content process, though.

There are also PContent::Msg_RecordDiscardedData messages going from the content process to the parent, with extremely high "Recv Thread Latency" (like around 60 seconds), but these are async messages so they may be less critical here.

I have no idea what those are. They aren't managed by the a11y code.

See Also: → 1771429
Performance Impact: --- → ?

So... Since Firefox 100 the a11y code has been largely re-written. Jamie, do you know if this is still an issue? Also, is this Linux-specific?

Flags: needinfo?(jteh)

This is definitely still a big problem, but it's significantly better than it was. I can reproduce a nasty perf cliff on Windows, so it's not Linux specific. I will grab another profile and take a look soon.

When loaded from the network, this page often seems to reflow after the initial render. It's difficult to be certain because the behaviour differs intermittently, but I think if the page doesn't reflow (or the a11y tree's initial update occurs after the reflows), everything works relatively well. On the other hand, if there is a reflow, we hit a very nasty performance cliff and jank for potentially minutes. I'm not really sure why we're reflowing here, but it's something a11y needs to cope with better somehow.

As I noted in comment 15, there is a very large block containing a lot of inline elements. There are block elements and hard line breaks interspersed between the inline elements. However, GetRenderedText and a11y TextLeafRange need to figure out whether a text node is at the start of its line. Without the line cursor to help us, that means nsBlockInFlowLineIterator has to potentially crawl through all the lines in the (very large) block.

As for why the line cursor isn't helping us, both the text update and queued cache update maps aren't ordered, so we're effectively querying lines in arbitrary order. Making those ordered might help because reflow presumably follows the tree order to some extent, so our calls are more likely to be sequential.

Flags: needinfo?(jteh)
Assignee: nobody → jteh
Depends on: 1860724
Depends on: 1860941
Depends on: 1771579

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux
Impact on browser: Renders browser effectively unusable
Impact on site: Renders site effectively unusable
Configuration: Rare
Page load impact: Severe
Websites affected: Rare
[x] Affects animation smoothness
[x] Able to reproduce locally
[x] Bug affects multiple sites

Performance Impact: ? → medium

This should now be a lot better in latest Nightly. It's difficult for me to say how much because there are still some outstanding issues that affect screen readers (and I am a screen reader user), but these won't apply to most users, even if they have accessibility enabled for some other reason.

I asked Anna to test this with accessibility enabled but no screen reader. Accessibility does add a few seconds to the load time, but no more than that.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: