Closed Bug 1485097 Opened Last year Closed Last year

Firefox Hangs with JAWS/NVDA Running on webpage with Auto-Complete

Categories

(Core :: Disability Access APIs, defect, P1, major)

61 Branch
x86_64
Windows 10
defect

Tracking


VERIFIED FIXED
mozilla64
Tracking Status
firefox64 --- verified

People

(Reporter: beth.frost, Assigned: Jamie)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Windows 10, Firefox 61.0.2 (64-bit), JAWS 2018.1807.8 ILM

We have developed a web application that uses a Telerik RadAutoComplete control. The Telerik auto-complete appears to be tagged properly with ARIA attributes. When JAWS is running and we load this page in Firefox, the page starts to render, then stops rendering, and Firefox hangs and becomes unresponsive.

https://awarepreview.aedmz.com/AwareEngineeringDemo
Username: FreedomScientific 
Password: can I provide privately somewhere?

This does not happen if JAWS is not running. The issue does not happen in IE11, Chrome or Edge browsers, with or without JAWS running.

We have worked with VFO (Freedom Scientific) Support, and while they can reproduce the error, they have indicated that it is a Mozilla bug, not a JAWS bug. They have also reproduced it with NVDA, confirming that the issue affects multiple screen readers.

While Firefox was unresponsive, our developer used DebugDiag to capture and analyze a series of memory dumps.

The dumps consistently showed this stack trace:
[various child method calls]

      xul.dll!mozilla::a11y::Accessible::ApplyARIAState(unsigned __int64 * aState=0x00c5eb70) Line 1249  C++
>     xul.dll!mozilla::a11y::Accessible::State() Line 1154  C++
      xul.dll!mozilla::a11y::AccTextChangeEvent::AccTextChangeEvent(mozilla::a11y::Accessible * aAccessible=0x1cab9580, int aStart=0, const nsAString_internal & aModifiedText={...}, bool aIsInserted=true, mozilla::a11y::EIsFromUserInput aIsFromUserInput=eNoUserInput) Line 92  C++
      xul.dll!mozilla::a11y::NotificationController::QueueMutationEvent(mozilla::a11y::AccTreeMutationEvent * aEvent=0x1a35d0c0) Line 257  C++
      xul.dll!mozilla::a11y::TreeMutation::AfterInsertion(mozilla::a11y::Accessible * aChild=0x1cab9460) Line 72  C++
      xul.dll!mozilla::a11y::DocAccessible::MoveChild(mozilla::a11y::Accessible * aChild=0x1cab9460, mozilla::a11y::Accessible * aNewParent=0x1cab9580, int aIdxInParent=0) Line 2253  C++
      xul.dll!mozilla::a11y::DocAccessible::CacheChildrenInSubtree(mozilla::a11y
Rendered markup for the Quick Navigation auto-complete:
<span id="QN_QuickNav_Status" aria-live="polite" aria-hidden="false" aria-atomic="true" class="sr-only"></span><div id="QN_QuickNav" title="Quick Navigation (ALT+Q) autocomplete" class="RadAutoCompleteBox RadAutoCompleteBox_Vista AEQN" style="width:200px;">
	<div class="racTokenList">
		<input class="racInput radPreventDecorate" name="QN$QuickNav" type="text" id="QN_QuickNav_Input" accesskey="Q" />
	</div><div class="racSlide" style="z-index:7000;display:none;">
		<div class="RadAutoCompleteBoxPopup RadAutoCompleteBoxPopup_Vista">
			<ul class="racList">
				<li class="racItem"><!-- --></li>
			</ul>
		</div>
	</div><input id="QN_QuickNav_ClientState" name="QN_QuickNav_ClientState" type="hidden" />
</div>
  
<span id="QN_QuickNavDS" style="display:none;"></span>
Hi Beth, thank you for filing the bug! I just sent you a private email about the password; we definitely need it to reproduce and analyze the bug. The first order of business will be to verify whether the bug still occurs in Firefox 63 (Nightly), because there have been changes since Firefox 61, and one of them may have fixed this bug. At first glance the markup itself doesn't look too suspicious to me, so we'll definitely need to see it in context. I'm also copying my colleagues Jamie and Alex to see whether they spot anything suspicious in the stacks from comment #0.
Flags: needinfo?(surkov.alexander)
Flags: needinfo?(jteh)
Confirmed. Firefox hangs while NVDA remains responsive. I have to force-quit Firefox to close the parent window. Unfortunately, this does not generate a crash report.

With NVDA running:

1. Open the page stated in comment #0.
2. Log in using the user name from comment #0 and the password that I received from Beth and e-mailed to Alex and Jamie.
3. The next page immediately shows the behavior: the document reports that it is busy, then hangs indefinitely. NVDA doesn't even start loading a buffer.

Setting P1 since this is a reproducible hang.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P1
(In reply to Beth Frost from comment #0)

>       xul.dll!mozilla::a11y::Accessible::ApplyARIAState(unsigned __int64 * aState=0x00c5eb70) Line 1249  C++
> >     xul.dll!mozilla::a11y::Accessible::State() Line 1154  C++
>       xul.dll!mozilla::a11y::AccTextChangeEvent::AccTextChangeEvent(mozilla::a11y::Accessible * aAccessible=0x1cab9580, int aStart=0, const nsAString_internal & aModifiedText={...}, bool aIsInserted=true, mozilla::a11y::EIsFromUserInput aIsFromUserInput=eNoUserInput) Line 92  C++
>       xul.dll!mozilla::a11y::NotificationController::QueueMutationEvent(mozilla::a11y::AccTreeMutationEvent * aEvent=0x1a35d0c0) Line 257  C++
>       xul.dll!mozilla::a11y::TreeMutation::AfterInsertion(mozilla::a11y::Accessible * aChild=0x1cab9460) Line 72  C++
>       xul.dll!mozilla::a11y::DocAccessible::MoveChild(mozilla::a11y::Accessible * aChild=0x1cab9460, mozilla::a11y::Accessible * aNewParent=0x1cab9580, int aIdxInParent=0) Line 2253  C++
>       xul.dll!mozilla::a11y::DocAccessible::CacheChildrenInSubtree(mozilla::a11y

Without any profiling numbers, my educated guess is that we're dealing with a couple of problems here:
* AfterInsertion is slow since it rebuilds the indices
* State() is also slow; AccTextChangeEvent shouldn't really rely on it
* there's something odd about MoveChild; it shouldn't appear on the stack that often

I would be curious to see a perf profile. Beth, could you capture one please? https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Flags: needinfo?(surkov.alexander) → needinfo?(beth.frost)
Here's a more revealing stack:

0:043> ~0 kp 10
 # ChildEBP RetAddr  
00 004fe318 563e4dd1 xul!nsAttrAndChildArray::IndexOfAttr(class nsAtom * aLocalName = 0x572be23c, int aNamespaceID = 0n0)+0xd0 [c:\Users\jamie\src\gecko\dom\base\nsAttrAndChildArray.cpp @ 394] 
01 004fe34c 563e7aec xul!mozilla::a11y::Accessible::ApplyARIAState(unsigned int64 * aState = 0x004fe360)+0xb1 [c:\Users\jamie\src\gecko\accessible\generic\Accessible.cpp @ 1304] 
02 004fe380 563c6a9d xul!mozilla::a11y::Accessible::State(void)+0x3c [c:\Users\jamie\src\gecko\accessible\generic\Accessible.cpp @ 1208] 
03 004fe3d4 563c607f xul!mozilla::a11y::NotificationController::QueueMutationEvent(class mozilla::a11y::AccTreeMutationEvent * aEvent = 0x27511d00)+0x9ed [c:\Users\jamie\src\gecko\accessible\base\NotificationController.cpp @ 259] 
04 004fe3f0 563f1b43 xul!mozilla::a11y::TreeMutation::AfterInsertion(class mozilla::a11y::Accessible * aChild = 0x0cbf1f40)+0x10f [c:\Users\jamie\src\gecko\accessible\base\EventTree.cpp @ 74] 
05 004fe440 563f08cb xul!mozilla::a11y::DocAccessible::MoveChild(class mozilla::a11y::Accessible * aChild = <Value unavailable error>, class mozilla::a11y::Accessible * aNewParent = 0x27517040, int aIdxInParent = 0n0)+0x1c3 [c:\Users\jamie\src\gecko\accessible\generic\DocAccessible.cpp @ 2307] 
06 004fe860 563f1058 xul!mozilla::a11y::DocAccessible::CacheChildrenInSubtree(class mozilla::a11y::Accessible * aRoot = 0x27517040, class mozilla::a11y::Accessible ** aFocusedAcc = 0x004fe870)+0xcb [c:\Users\jamie\src\gecko\accessible\generic\DocAccessible.cpp @ 2337] 
07 004fe880 563f16b9 xul!mozilla::a11y::DocAccessible::CreateSubtree(class mozilla::a11y::Accessible * aChild = 0x27517040)+0x28 [c:\Users\jamie\src\gecko\accessible\generic\DocAccessible-inl.h @ 160] 
08 004fe8e4 563cb91b xul!mozilla::a11y::DocAccessible::DoARIAOwnsRelocation(class mozilla::a11y::Accessible * aOwner = 0x0cbf1f40)+0x1f9 [c:\Users\jamie\src\gecko\accessible\generic\DocAccessible.cpp @ 2095] 
09 004fea80 5590d482 xul!mozilla::a11y::NotificationController::WillRefresh(void)+0x98b [c:\Users\jamie\src\gecko\accessible\base\NotificationController.cpp @ 851] 

Some observations:
1. This is related to aria-owns; see frame 08.

2. Looking at the DOM, there is some pretty wacky stuff going on with aria-owns. Simplifying the markup, it goes something like this:

<div id="QN_QuickNav">
  <div role="list" id="QN_QuickNav_tokens">
    <input id="QN_QuickNav_Input" role="textbox" aria-owns="QN_QuickNav_listbox QN_QuickNav_tokens">
  </div>
  <div role="listbox" id="QN_QuickNav_listbox">
    ...
  </div>
</div>

In particular, note that the input is trying to own its parent, which is certainly an authoring error. Fixing that would probably fix the issue, but Firefox still shouldn't freeze because of it.

3. Simply owning a parent isn't enough to trigger this:

data:text/html,<div id="container"><div id="tokens"><input id="input" role="textbox" aria-owns="listbox tokens"></div><div id="listbox" role="listbox">&nbsp;</div></div>

4. In ApplyARIAState (frame 01), we seem to get stuck in a loop traversing ancestors:

1301          const Accessible* ancestor = this;
1302          while ((ancestor = ancestor->Parent()) && !ancestor->IsDoc()) {
1303            dom::Element* el = ancestor->Elm();
1304            if (el &&
1305                el->HasAttr(kNameSpaceID_None, nsGkAtoms::aria_activedescendant)) {
1306              *aState |= states::FOCUSABLE;
1307              break;

That would suggest we have a loop in our ancestry. But that, in turn, would suggest we haven't reparented the accessible by the time we queue the event, which I thought was the whole point of MoveChild.
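To illustrate why a cycle in the parent chain turns the ApplyARIAState loop into a hang, here is a minimal C++ sketch. `Acc` and `HasActiveDescendantAncestor` are hypothetical stand-ins, not Gecko's `Accessible` API. The visited-set guard only demonstrates cycle detection; it is not how this bug was fixed (the patch that landed prevents the cycle from forming during aria-owns relocation instead).

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for an accessible; not Gecko's Accessible class.
struct Acc {
  Acc* parent = nullptr;
  bool isDoc = false;
  bool hasActiveDescendant = false;  // models aria-activedescendant on the element
};

// Same shape as the ApplyARIAState loop: walk up the parent chain until the
// document, stopping at the first ancestor with aria-activedescendant.
// Unlike that loop, this version remembers every node it has visited, so a
// cycle in the parent chain ends the walk instead of spinning forever.
// Sets cycleDetected when the walk revisits a node.
bool HasActiveDescendantAncestor(const Acc* self, bool& cycleDetected) {
  assert(self != nullptr);
  cycleDetected = false;
  std::unordered_set<const Acc*> visited{self};
  const Acc* ancestor = self;
  while ((ancestor = ancestor->parent) && !ancestor->isDoc) {
    if (!visited.insert(ancestor).second) {
      cycleDetected = true;  // we have been here before: loop in the ancestry
      return false;
    }
    if (ancestor->hasActiveDescendant) {
      return true;
    }
  }
  return false;
}
```

With two nodes that are each other's parent and no document in between, the unguarded loop would alternate between them indefinitely, which is consistent with what frame 01 shows; the guarded walk returns false and reports the cycle.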

Still investigating...
Flags: needinfo?(jteh)
This test case reproduces the same infinite loop in ApplyARIAState:

data:text/html,<div id="tokens" role="presentation"><input id="input" role="textbox" aria-owns="tokens"></div>

However, it's not a direct parallel: it uses role="presentation", and the loop happens not during a mutation event but when a client asks for states. I thought of this case because we're dealing with an aria-owned element whose subtree must be created, even though that element is actually an ancestor of the owner.
Flags: needinfo?(beth.frost)
One other thing to note: the div with ID "QN_QuickNav_tokens" has role "list", but it has no children with role "listitem", as correct markup would require. Instead, it gets an input and a listbox as children. So aside from the input owning its parent, the role suggests there are other authoring problems here.

As Jamie said himself, his test case from comment #8 is also a little different: there, we aria-own an element that, because of role "presentation", shouldn't get an accessible in the first place. The div with role "list", on the other hand, does get one, even without listitem children.
Assignee: nobody → jteh
Blocks: ariaowns
This can easily be reproduced if the ancestor being owned has role="presentation", but there are other cases as well.
If we don't prevent this, we end up with a loop.
This patch fixes the test case in comment 8, and also prevents the infinite loop for the originally reported page in comment 0. However, there is still a further problem lurking here. The reason that this code path triggers in the first place is that the accessible for the parent didn't exist at the time that aria-owns relocation occurred. I can't quite work out why. My guess is that it has something to do with the invalidation list, but I don't quite understand what the invalidation list is for. So, even though this patch prevents the hang, we also end up with that parent list (#QN_QuickNav_tokens) not having an accessible in the tree at all; it just gets pruned. I think this is far better than a hang and this is invalid authoring after all, but it'd still be good to know what is going on here. I've not yet been able to produce a simple distilled test case for that.
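The rule this patch applies can be sketched roughly as follows, assuming a hypothetical `Element` type with a parent pointer and a flag recording whether an accessible already exists for it. The sketch models only the condition named in the patch description (an owned element that has no accessible yet and is an ancestor of its owner is skipped); it is not the actual Gecko change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical DOM element; not Gecko's dom::Element.
struct Element {
  std::string id;
  Element* parent = nullptr;
  bool hasAccessible = false;  // whether an accessible already exists for it
};

// True if `candidate` is a strict DOM ancestor of `node`.
bool IsAncestorOf(const Element* candidate, const Element* node) {
  for (const Element* p = node->parent; p; p = p->parent) {
    if (p == candidate) return true;
  }
  return false;
}

// Returns the aria-owned elements that may be relocated under `owner`.
// An owned element with no accessible yet is skipped if it is actually an
// ancestor of its owner: creating its subtree under the owner would put the
// owner inside its own subtree, i.e. a loop in the accessible tree.
std::vector<const Element*> OwnedChildrenToRelocate(
    const Element* owner, const std::vector<const Element*>& owned) {
  assert(owner != nullptr);
  std::vector<const Element*> result;
  for (const Element* child : owned) {
    if (!child->hasAccessible && IsAncestorOf(child, owner)) {
      continue;  // skip: would create a cycle
    }
    result.push_back(child);
  }
  return result;
}
```

Modeling `<span id="tokens"><input aria-owns="tokens"></span>` this way, the span is dropped from the relocation list while an unrelated owned element that already has an accessible is kept. As noted above, the trade-off is that the skipped ancestor is pruned from the tree rather than hanging the browser.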
Eek. A simpler version of the test case in comment 8 without role="presentation" that reproduces this loop:

data:text/html,<span id="tokens"><input aria-owns="tokens"></span>

We don't create an accessible for the span normally, so... boom. The attached patch fixes this one too.
Our tester installed Firefox Nightly and is still experiencing the hang on our site. Did we jump the gun and test too early? I saw in comment 12 that there was a patch, but I wasn't sure whether that meant we could test it ourselves.
(In reply to Beth Frost from comment #14)
> Our tester installed Firefox Nightly and she is still experiencing the hang
> on our site. Did we perhaps jump the gun and test too early? I saw in
> Comment 12 there was a patch, but I wasn't sure if that meant we could test
> it ourselves.

The bug is not fixed yet, so yes, it's too early to test.
Comment on attachment 9003368 [details]
Bug 1485097: When handling aria-owns relocation and an owned child doesn't yet have an accessible, skip it if the owned child is actually an ancestor of its owner.

alexander :surkov (:asurkov) has approved the revision.
Attachment #9003368 - Flags: review+
Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ed6ac1e1266c
When handling aria-owns relocation and an owned child doesn't yet have an accessible, skip it if the owned child is actually an ancestor of its owner. r=surkov
https://hg.mozilla.org/mozilla-central/rev/ed6ac1e1266c
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Just double-checking - does this mean we can test with Firefox Nightly tomorrow and see the fix?
(In reply to Beth Frost from comment #19)
> Just double-checking - does this mean we can test with Firefox Nightly
> tomorrow and see the fix?

Yes. However, please note that while Firefox shouldn't freeze regardless, there is still a major authoring error in this autocomplete implementation that really should be fixed. See point 2 in comment 7 for details; in short, a child element is trying to aria-own its parent, which is a spec violation and doesn't make any sense.
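On the authoring side, a check along these lines could flag the problem before it reaches a browser. This is a hypothetical C++ sketch over a made-up `Node` type, not a real DOM or linting API: it splits an `aria-owns` value into its whitespace-separated ID references and reports any ID that names an ancestor of the owning element, as `QN_QuickNav_tokens` does for `QN_QuickNav_Input` in the simplified markup from comment 7.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, minimal element model; not a real DOM API.
struct Node {
  std::string id;
  Node* parent = nullptr;
  std::string ariaOwns;  // raw aria-owns attribute value, if any
};

// Returns every ID in owner->ariaOwns that names an ancestor of owner.
// aria-owns holds a whitespace-separated list of ID references, so
// stream extraction (which splits on whitespace) is enough to tokenize it.
std::vector<std::string> FindOwnedAncestors(const Node* owner) {
  assert(owner != nullptr);
  std::vector<std::string> offending;
  std::istringstream refs(owner->ariaOwns);
  std::string id;
  while (refs >> id) {
    for (const Node* a = owner->parent; a != nullptr; a = a->parent) {
      if (a->id == id) {
        offending.push_back(id);
        break;
      }
    }
  }
  return offending;
}
```

For the chain `QN_QuickNav` → `QN_QuickNav_tokens` → `QN_QuickNav_Input` with `aria-owns="QN_QuickNav_listbox QN_QuickNav_tokens"`, the function returns only `QN_QuickNav_tokens`; the listbox is a sibling of the tokens div, not an ancestor of the input, so it is not reported.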
I've managed to reproduce this bug on an affected Firefox 61.0.2 on Windows 10 x64 using NVDA 2018.2.1 and JAWS 2018.1808.10 ILM.

This is verified fixed using Firefox 64.0b4 (BuildId:20181025233934) on Windows 10 x64.
Status: RESOLVED → VERIFIED
Flags: qe-verify+