Closed Bug 1824064 Opened 2 years ago Closed 2 years ago

Very poor performance with certain pages with lots of time spent in SnowWhiteKiller

Tracking

()

Status:

VERIFIED FIXED

Milestone:

113 Branch

Tracking Flags:

Tracking

Status

firefox-esr102

---

unaffected

firefox111

---

unaffected

firefox112

---

verified

firefox113

---

verified

People

(Reporter: Jamie, Assigned: Jamie)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector. 2 years ago James Teh [:Jamie] 48 bytes, text/x-phabricator-request	diannaS : approval-mozilla-beta+	Details \| Review
Bug 1824064 abandoned: Don't cycle collect AccEvent. 2 years ago James Teh [:Jamie] 48 bytes, text/x-phabricator-request		Details \| Review

James Teh [:Jamie]

Assignee

Description

•

2 years ago

STR:

Enable accessibility; e.g. by running a screen reader.
Visit this page: https://www.easypost.com/docs/api#scan-forms
- Expected: The page loads and the browser responds quickly.
- Actual: The browser hangs for a long time.

Profile: https://share.firefox.dev/3TyHaR2

We spend a huge amount of time in SnowWhiteKiller. I believe this suggests we're cleaning up a lot of objects, but probably not due to cycles, as I think snow whites are objects which don't have any references to them.
This doesn't happen with accessibility disabled, so it does seem to be triggered by accessibility somehow.
It does look like there are quite a lot of calls to LocalAccessible::Release via ~AccMutationEvent. I guess there are a lot of mutation events we're cleaning up? But I don't understand why there would be so many or why that would cause SnowWhiteKiller so much trouble.
I've seen this on other pages too, but I don't have the URLs now. :(

James Teh [:Jamie]

Assignee

Comment 1

•

2 years ago

It does look like this page is doing something odd which causes thousands of mutation events to be generated. It seems to individually remove a whole lot of paragraphs and text leaf nodes, then add a lot more paragraphs and text leaf nodes. If this were a reflow we weren't handling properly, I'd expect it might remove an entire subtree and add it again, regenerating the subtree, but this isn't what's happening.

Regardless, it's still pretty concerning that we spend so much time in SnowWhiteKiller. It would seem we spend more time there than we do actually coalescing and firing the events.

Olli Pettay [:smaug][bugs@pettay.fi](vacation until Nov 16)

Comment 2

•

2 years ago

That code is trying to find snowwhite objects, but doesn't necessarily find them from the purple buffer.
Something somewhere adds lots of objects to the purple buffer, which hints we're leaking a lot and/or creating lots of objects.

James Teh [:Jamie]

Assignee

Comment 3

•

2 years ago

A11y is probably creating lots of AccMutationEvent objects because of the way the page is being updated (see comment 1). It's hard to know what's triggering that without dissecting all the JS on the page. Assuming we can't reduce those, given that AccMutationEvent objects are meant to be short lived (they get processed and disposed of in the next refresh driver tick), we need to find some way to reduce their impact on the cycle collector.

James Teh [:Jamie]

Assignee

Comment 4

•

2 years ago

Can we improve this by just removing AccEvent's participation in cycle collection? As I understand it:

An AccEvent might hold a reference to another AccEvent. However, we know about these potential cycles and we can break them ourselves before we dispose of the events.
An AccEvent does hold references to LocalAccessible, which does hold a reference to nsIContent and must (I think?) thus participate in cycle collection. However, the reverse should not be possible: a LocalAccessible should never hold a reference to an AccEvent.

Olli Pettay [:smaug][bugs@pettay.fi](vacation until Nov 16)

Comment 5

•

2 years ago

That is one option yes.

Another option might be to use SKIPPABLE macros and break the cycles during skippable phase. DOM elements try to utilize this when possible to optimize themselves out of CC graph.

There is no option to optimize out creation of these AccEvents? Somehow batch changes and fire eventually just one event informing about the last change or some group of changes? (A bit similar to DOM MutationObserver API)

James Teh [:Jamie]

Assignee

Comment 6

•

2 years ago

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #5)

Another option might be to use SKIPPABLE macros and break the cycles during skippable phase. DOM elements try to utilize this when possible to optimize themselves out of CC graph.

I was wondering about the skippable phase. I saw mention of that in the cc documentation, but there isn't any real documentation for it yet that I could find.

That would probably work for LocalAccessible where the object itself knows whether it's still alive. AccEvent doesn't have that information, though. I guess it could check the LocalAccessible's state, but an event might be irrelevant long before its LocalAccessible is.

There is no option to optimize out creation of these AccEvents? Somehow batch changes and fire eventually just one event informing about the last change or some group of changes?

I'm not ruling out anything at this stage. It's possibly an option. The challenge is that a11y clients (which we don't control) depend on these events and there isn't really an event that says "this whole subtree changed, just throw it away and start over". And even if there were, we'd only want to use that if the number of events for a given subtree got large enough that we were worried about perf impact, as otherwise, clients would continually be throwing away info and causing another performance problem.

James Teh [:Jamie]

Assignee

Comment 7

•

2 years ago

Also, AccEvents themselves are used to batch changes. We have an event coalescence phase where we de-duplicate and filter AccEvents. But all of those AccEvents are currently ref-counted because LocalAccessibles are ref-counted.

How does DOM mutation observer batch changes without holding references to DOM nodes?

James Teh [:Jamie]

Assignee

Comment 8

•

2 years ago

Mutation events are stored in a doubly linked list. We clear the previous pointers when we clean up to remove cycles, but it seems this isn't sufficient. Clearing the next pointers reduces the pressure on SnowWhiteKiller such that it is no longer significant in the profile. Performance still isn't great due to the sheer number of a11y events, but that's something to try to address separately.

I regressed this in bug 1784752. Before that, we did clear the previous pointers.

Assignee: nobody → jteh

Keywords: regression

Regressed by: 1784752

James Teh [:Jamie]

Assignee

Comment 9

•

2 years ago

Attached file Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector. — Details

James Teh [:Jamie]

Assignee

Comment 10

•

2 years ago

Attached file Bug 1824064 abandoned: Don't cycle collect AccEvent. (obsolete) — Details

While there can be cycles between AccEvents (AccTreeMutationEvent::mNext/PrevEvent), we know how and when to reliably break these.
Thus, we can avoid some overhead by not having them participate in cycle collection.
Note that this patch is not complete.
For example, if we return early in ProcessMutationEvents or if we shut down NotificationController before processing mutation events, there will be unbroken cycles.
We would need to explicitly deal with those cases.
I also haven't been able to prove that this has any observable benefit, so this is just a patch for reference in case it's useful in future.

James Teh [:Jamie]

Assignee

Updated

•

2 years ago

status-firefox111: --- → unaffected

status-firefox112: --- → affected

status-firefox113: --- → affected

status-firefox-esr102: --- → unaffected

Pulsebot

Comment 11

•

2 years ago

Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8bf5f1b0d82f When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector. r=eeejay

Cosmin Sabou [:CosminS]

Comment 12

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8bf5f1b0d82f

Status: NEW → RESOLVED

Closed: 2 years ago

status-firefox113: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 113 Branch

BugBot [:suhaib / :marco/ :calixte]

Comment 13

•

2 years ago

The patch landed in nightly and beta is affected.
:Jamie, is this bug important enough to require an uplift?

If yes, please nominate the patch for beta approval.
If no, please set status-firefox112 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jteh)

James Teh [:Jamie]

Assignee

Comment 14

•

2 years ago

Comment on attachment 9324819 [details]
Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector.

Beta/Release Uplift Approval Request

User impact if declined: Very poor performance when a11y is enabled on a page with a large number of updates.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Just clears an additional pointer during cleanup code instead of waiting for the cycle collector to do it.
String changes made/needed:
Is Android affected?: Yes

Flags: needinfo?(jteh)

Attachment #9324819 - Flags: approval-mozilla-beta?

Dianna Smith [:diannaS]

Comment 15

•

2 years ago

Comment on attachment 9324819 [details]
Bug 1824064: When cleaning up a11y mutation events, clear the next pointer as well as the previous pointer to reduce pressure on the cycle collector.

Approved for 112.0b9

Attachment #9324819 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Dianna Smith [:diannaS]

Comment 16

•

2 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/1256430e3714

status-firefox112: affected → fixed

Camelia Badau [:cbadau], Desktop Test Engineering

Updated

•

2 years ago

Flags: qe-verify+

Mihai Boldan, Desktop QA [:mboldan]

Updated

•

2 years ago

QA Whiteboard: [qa-triaged]

Catalin Sasca, Desktop Test Engineering [:csasca]

Comment 17

•

2 years ago

I was able to reproduce the issue on Firefox 112.0b2 on macOS 13.3 with VoiceOver by following the STR from Comment 0.

The performance is much improved although Firefox still hangs for some time until loads the page (I think it's expected as there is much content to be loaded) but it will recover and work as expected after. Tests were performed on macOS 13.3 (Voiceover), Windows 11 (NVDA) and Ubuntu 22.04 (ORCA) with Firefox 112.0b9 and Firefox Nightly 113.0a1 (2023-04-02).

Status: RESOLVED → VERIFIED

QA Whiteboard: [qa-triaged]

status-firefox112: fixed → verified

status-firefox113: fixed → verified

Flags: qe-verify+

James Teh [:Jamie]

Assignee

Comment 18

•

2 years ago

For the record, we'd ideally improve this further; it really shouldn't take this long given the size of the document. However, that's going to be a very different fix, as the page is mutating a lot for some reason and we need to somehow reduce the number of a11y mutation events it fires. That will need to be looked at in a separate bug.

Andrej (:aglavic)

Comment 19

•

2 years ago

(In reply to Dianna Smith [:diannaS] from comment #16)

https://hg.mozilla.org/releases/mozilla-beta/rev/1256430e3714

== Change summary for alert #37967 (as of Mon, 03 Apr 2023 00:15:00 GMT) ==

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
8%	a11yr	macosx1015-64-shippable-qr	e10s fission stylo webrender-sw	198.96 -> 183.71

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37967

Phabricator Automation

Updated

•

1 year ago

Attachment #9324821 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.