Closed Bug 1824064 Opened 2 years ago Closed 2 years ago

Very poor performance with certain pages with lots of time spent in SnowWhiteKiller

Categories

(Core :: Disability Access APIs, defect)

defect

Tracking

()

VERIFIED FIXED
113 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox111 --- unaffected
firefox112 --- verified
firefox113 --- verified

People

(Reporter: Jamie, Assigned: Jamie)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

STR:

  1. Enable accessibility; e.g. by running a screen reader.
  2. Visit this page: https://www.easypost.com/docs/api#scan-forms
    • Expected: The page loads and the browser responds quickly.
    • Actual: The browser hangs for a long time.

Profile: https://share.firefox.dev/3TyHaR2

  • We spend a huge amount of time in SnowWhiteKiller. I believe this suggests we're cleaning up a lot of objects, but probably not due to cycles, as I think snow whites are objects which don't have any references to them.
  • This doesn't happen with accessibility disabled, so it does seem to be triggered by accessibility somehow.
  • It does look like there are quite a lot of calls to LocalAccessible::Release via ~AccMutationEvent. I guess there are a lot of mutation events we're cleaning up? But I don't understand why there would be so many or why that would cause SnowWhiteKiller so much trouble.
  • I've seen this on other pages too, but I don't have the URLs now. :(

It does look like this page is doing something odd which causes thousands of mutation events to be generated. It seems to individually remove a whole lot of paragraphs and text leaf nodes, then add a lot more paragraphs and text leaf nodes. If this were a reflow we weren't handling properly, I'd expect it might remove an entire subtree and add it again, regenerating the subtree, but this isn't what's happening.

Regardless, it's still pretty concerning that we spend so much time in SnowWhiteKiller. It would seem we spend more time there than we do actually coalescing and firing the events.

That code is trying to find snowwhite objects, but doesn't necessarily find them from the purple buffer.
Something somewhere adds lots of objects to the purple buffer, which hints we're leaking a lot and/or creating lots of objects.

A11y is probably creating lots of AccMutationEvent objects because of the way the page is being updated (see comment 1). It's hard to know what's triggering that without dissecting all the JS on the page. Assuming we can't reduce those, given that AccMutationEvent objects are meant to be short lived (they get processed and disposed of in the next refresh driver tick), we need to find some way to reduce their impact on the cycle collector.

Can we improve this by just removing AccEvent's participation in cycle collection? As I understand it:

  1. An AccEvent might hold a reference to another AccEvent. However, we know about these potential cycles and we can break them ourselves before we dispose of the events.
  2. An AccEvent does hold references to LocalAccessible, which does hold a reference to nsIContent and must (I think?) thus participate in cycle collection. However, the reverse should not be possible: a LocalAccessible should never hold a reference to an AccEvent.

That is one option yes.

Another option might be to use SKIPPABLE macros and break the cycles during skippable phase. DOM elements try to utilize this when possible to optimize themselves out of CC graph.

There is no option to optimize out creation of these AccEvents? Somehow batch changes and fire eventually just one event informing about the last change or some group of changes? (A bit similar to DOM MutationObserver API)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #5)

Another option might be to use SKIPPABLE macros and break the cycles during skippable phase. DOM elements try to utilize this when possible to optimize themselves out of CC graph.

I was wondering about the skippable phase. I saw mention of that in the cc documentation, but there isn't any real documentation for it yet that I could find.

That would probably work for LocalAccessible where the object itself knows whether it's still alive. AccEvent doesn't have that information, though. I guess it could check the LocalAccessible's state, but an event might be irrelevant long before its LocalAccessible is.

There is no option to optimize out creation of these AccEvents? Somehow batch changes and fire eventually just one event informing about the last change or some group of changes?

I'm not ruling out anything at this stage. It's possibly an option. The challenge is that a11y clients (which we don't control) depend on these events and there isn't really an event that says "this whole subtree changed, just throw it away and start over". And even if there were, we'd only want to use that if the number of events for a given subtree got large enough that we were worried about perf impact, as otherwise, clients would continually be throwing away info and causing another performance problem.

Also, AccEvents themselves are used to batch changes. We have an event coalescence phase where we de-duplicate and filter AccEvents. But all of those AccEvents are currently ref-counted because LocalAccessibles are ref-counted.

How does DOM mutation observer batch changes without holding references to DOM nodes?

Mutation events are stored in a doubly linked list. We clear the previous pointers when we clean up to remove cycles, but it seems this isn't sufficient. Clearing the next pointers reduces the pressure on SnowWhiteKiller such that it is no longer significant in the profile. Performance still isn't great due to the sheer number of a11y events, but that's something to try to address separately.

I regressed this in bug 1784752. Before that, we did clear the previous pointers.

Assignee: nobody → jteh
Keywords: regression
Regressed by: 1784752

While there can be cycles between AccEvents (AccTreeMutationEvent::mNext/PrevEvent), we know how and when to reliably break these.
Thus, we can avoid some overhead by not having them participate in cycle collection.
Note that this patch is not complete.
For example, if we return early in ProcessMutationEvents or if we shut down NotificationController before processing mutation events, there will be unbroken cycles.
We would need to explicitly deal with those cases.
I also haven't been able to prove that this has any observable benefit, so this is just a patch for reference in case it's useful in future.

Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8bf5f1b0d82f When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector. r=eeejay
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

The patch landed in nightly and beta is affected.
:Jamie, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox112 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jteh)

Comment on attachment 9324819 [details]
Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector.

Beta/Release Uplift Approval Request

  • User impact if declined: Very poor performance when a11y is enabled on a page with a large number of updates.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just clears an additional pointer during cleanup code instead of waiting for the cycle collector to do it.
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(jteh)
Attachment #9324819 - Flags: approval-mozilla-beta?

Comment on attachment 9324819 [details]
Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector.

Approved for 112.0b9

Attachment #9324819 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

I was able to reproduce the issue on Firefox 112.0b2 on macOS 13.3 with VoiceOver by following the STR from Comment 0.

The performance is much improved although Firefox still hangs for some time until loads the page (I think it's expected as there is much content to be loaded) but it will recover and work as expected after. Tests were performed on macOS 13.3 (Voiceover), Windows 11 (NVDA) and Ubuntu 22.04 (ORCA) with Firefox 112.0b9 and Firefox Nightly 113.0a1 (2023-04-02).

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+

For the record, we'd ideally improve this further; it really shouldn't take this long given the size of the document. However, that's going to be a very different fix, as the page is mutating a lot for some reason and we need to somehow reduce the number of a11y mutation events it fires. That will need to be looked at in a separate bug.

(In reply to Dianna Smith [:diannaS] from comment #16)

https://hg.mozilla.org/releases/mozilla-beta/rev/1256430e3714

== Change summary for alert #37967 (as of Mon, 03 Apr 2023 00:15:00 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
8% a11yr macosx1015-64-shippable-qr e10s fission stylo webrender-sw 198.96 -> 183.71

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37967

Attachment #9324821 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: