Closed Bug 500240 Opened 15 years ago Closed 13 years ago

graphs-new takes 10x as long to load when accessibility is enabled

Categories

(Core :: Disability Access APIs, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED WORKSFORME

People

(Reporter: catlee, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

If I have accessibility enabled in Gnome, and visit graphs-new.mozilla.org, it takes well over a minute for the page to complete loading / displaying.  Most of the time is spent with the CPU pegged at 100% and the browser is unresponsive.

According to sysprof, most of this time is being spent in nsCoreUtils::FindDescendantPointingToIDImpl

Disabling accessibility in Gnome, and then logging out and back in again results in the graphs page loading in < 10 seconds.
Chris, can sysprof give a hierarchical profile?  Or alternately, can someone tell me how to exercise that code on Mac?
Yeah I'd like to see more profile data. This might be related to our a11y relations calculations (which are expensive).
Attached file sysprof log
I'm not sure I can get hierarchical data out of sysprof.  Here's a saved copy of the log, you might be able to open in up on another machine with sysprof installed.

It tells me that the only caller of the FindDescendant method is firefox-bin, which isn't too helpful.  It does say that firefox-bin accounted for 68% of total system time for the sampling period, and that the above method is 28% of total system time.

Another method which shows up fairly high in the profile is nsAttrAndChildArray::GetSafeChildAt, accounting for 12% of total system time.
Again, I'm happy to profile this on Mac and get hierarchical data, if someone tells me how to enable accessibility on mac.  David?
(In reply to comment #4)
> Again, I'm happy to profile this on Mac and get hierarchical data, if someone
> tells me how to enable accessibility on mac.  David?

You should add ac_add_options --enable-accessibility to your mozconfig.
I have that already.  A build with that option does not show the hang that Chris is seeing.  Is there something I need to do past that (e.g. to enable the code at runtime, not just build it)?
I think you should run application that starts accessibility like VoiceOver or AccessibilityInspector utility. Also you could try to run DOMi and inspect a11y tree.
I think you need to check "Enable access for assistive devices" in "Sys Pref" -> "Universal Access".

However, I don't think the profile data would be similar to on Linux.
A11y is quite platform dependent.
Here's the stack. Looks like dom mutations (via add/insert/append) which end up calling into libaccessibility here: nsDocAccessible::ContentAppended and indeed we are invalidating our cache. We need a better strategy for this, perhaps some kind of dom-busy dom-calm detection (timer?).
Boris I have the shark data saved locally, let me know if you want me to upload it somewhere.
So if I read that stack and code correctly, on append you InvalidateCacheSubtree on the newly appended node, right?  And the problem is that this ends up looking all over the DOM (or a least the DOM anchored at the ancestor 5 levels up) for <label> elements pointing to the new nodes?  So each append operation is automatically O(N) in the length of siblings and the whole setup is automatically O(N^2) with a nontrivial constant.

Am I missing anything, or does that describe the problem we're seeing here?

If I'm right, how would a busy/calm distinction help?  You'd still need to InvalidateCacheSubtree on all the new nodes, wouldn't you?  Or is the page removing nodes a lot too, in addition to inserting them?
So if the list of tagnames of things that you care about is limited, would it make sense to just have a few getElementsByTagName(NS) result lists hanging around and only walk those lists instead of walking the whole DOM?  Especially because in the common case there won't in fact be any such elements....

Another option might be to parse the attrs in question into atom arrays (like we do for @class) and then use some content lists similar to the way GetElementsByClassName does.  It does look like there's a case where you want to look for all nodes regardless of tagname in FindDescendantPointingToIDImpl...

In general, what is this code trying to accomplish?
(In reply to comment #11)
> So if I read that stack and code correctly, on append you
> InvalidateCacheSubtree on the newly appended node, right?  And the problem is
> that this ends up looking all over the DOM (or a least the DOM anchored at the
> ancestor 5 levels up) for <label> elements pointing to the new nodes?  So each
> append operation is automatically O(N) in the length of siblings and the whole
> setup is automatically O(N^2) with a nontrivial constant.
> 
> Am I missing anything, or does that describe the problem we're seeing here?
>

I think you are correct.
 
> If I'm right, how would a busy/calm distinction help?  You'd still need to
> InvalidateCacheSubtree on all the new nodes, wouldn't you?  Or is the page
> removing nodes a lot too, in addition to inserting them?

I have to look at our cache invalidation, but I'm trying to think of ways of invalidating and rebuilding subtrees non-redundantly (if that is in fact what is happening).
That's worth checking in this case; maybe log exactly what notifications you get here and see what they look like?  But the code as written looks to me like it'll be slow even if non-redundant (e.g. if the page just appends 10,000 nodes to a parent).
Yeah this algorithm is a total pig. I don't like the header comment for FindDescendantPointingToIDImpl... I'll have to dig (soon but not immediately).
Okay got back to this briefly. This appears to be about computing (in reverse) relations like:
aria_labelledby, aria_describedby, aria_owns, aria_controls, aria_flowto.

When we decide if a node foo is worthy of an accessible object instance, one of the things we do is see if a neighboring node has one of these relational attributes with foo as a value. Let's hear from Ginn and Alexander who most recently touched this code.
Yes we calculate reversed relations. FindNeighbour... works with two cases: 1) look for attribute and 2) look for attribute and tagname. For example

<label id="label"/>
<input aria-labelledby="label"/>

if we calculate label_for relation for label element then we search aria-labelledby attribute pointing to label in neighbourhood.

and

<label for="input"/>
<input id="input"/>

if we calculate labelled_by relation for input element then we search label element with @for attribute pointing to input in neighbourhood.

I can see two approaches:
1. Like Boris suggested: implement something more quick than whole DOM traversing
2. Cache relations and update them when DOM is changed. The problem of second approach is we need to build accessible tree when DOM and frame tree are building when page is loaded. I haven't idea how to accomplish this. Possibly we need more tight integration with content and layout modules.
I believe this bug is even worse on Linux/Solaris.
Loading this page took about 300 seconds of user CPU time on my box.

In general, we are slow because we creates a lots of accessible objects and fire a lots of accessible events.
About 11700 a11y events are sent during loading this page.
Most of them are children-changed:add/remove and text-changed.

Besides to find a way to reduce the event numbers, I found some issues we may improve.

1) On creating accessible objects, we do HasRelatedContent()
This function took about 1/3 of user CPU time. i.e. 100 seconds.
I think that's the stack you're discussing.

2) In nsAccTextChangeEvent, we do GetText(aStart, aStart + aLength, mModifiedText)
But mModifiedText is only used on MSAA platform.
It's redundant on Linux/Solaris.
I suspect we can call GetText on demand.
It takes 20 seconds.

3) In at-spi/atk-bridge/bridge.c, it also calls atk_text_get_text().
I don't know if it is necessary. I'll double check.
It takes 14 seconds.

4) Still in atk-bridge, there're some calls to atk_object_get_name().
I think not all of them are necessary.
It takes 101 seconds.

If I remove the code of 1) - 4), it takes 57s of user CPU time with a11y on, and 24s of user CPU time with a11y off.
The 23s are the cost of emitting 11700 a11y events.
(In reply to comment #17)
> I can see two approaches:
> 1. Like Boris suggested: implement something more quick than whole DOM
> traversing
> 2. Cache relations and update them when DOM is changed. The problem of second
> approach is we need to build accessible tree when DOM and frame tree are
> building when page is loaded. I haven't idea how to accomplish this. Possibly
> we need more tight integration with content and layout modules.

I like 1.

We can limit the depth of our tree walking, as well as the range of direct siblings. We can also use something called a utility function and 'prune' the tree that we want to walk; if we can think of a quick utility func...
(In reply to comment #18)
> 1) On creating accessible objects, we do HasRelatedContent()
> This function took about 1/3 of user CPU time. i.e. 100 seconds.
> I think that's the stack you're discussing.

I wonder why don't we do this more lazy-ily? Do we really need to call HasRelatedContent when the DOM mutates? Why not calculate it when callers might care about it? I suppose the answer might be we need to do this when figuring out what events we need to fire; I'll need to dive back into the code.

I need to talk to our API consumers to see what events really matter -- in case we can simplify the problem trivially.
HasRelatedContent() is called during creating a11y objects.
A11y objects are created to fire a11y event when DOM mutates.
However, I don't know if it is really necessary to call it during creating a11y objects.
Chris, graphs-new reroutes to graphs.m.o, where HasRelatedContent() takes up about 7.5% of user CPU (Mac/Shark). This is a lot less than Ginn reported. What is the current status of graphs.m.o -- did the new stuff version it over?
So with a Namoroka nightly (20091001062507), going to graphs.m.o with a11y enabled in gnome, and messing around with the Branch selector results in increasing periods of unresponsiveness in the browser.  First time was about 30 seconds, next was about 20 seconds, next was about 60 seconds, and the last time I changed selections it took 5 minutes before I gave up and killed the browser.

If there's a way to toggle a11y in gnome without having to logout / login, that would be great, and would really make it easier to test this stuff faster!
Ah, Marco noticed you said "Namoroka". Could you try with a mozilla-central?
Latest Minefield build still takes more than one minute and a half to become responsive again after selecting something from the "Branch" dropdown menu.
Chris, can you please check if this is still an issue for FX4?
Looks to be ok. Certainly isn't snappy, but that's regardless of if accessibility is enabled!
(In reply to comment #27)
> Looks to be ok. Certainly isn't snappy, but that's regardless of if
> accessibility is enabled!

great! thank you. Closing as worksforme then.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: