Open Bug 1396039 Opened 3 years ago Updated 2 years ago

get_accChild hangs somewhat frequently

Categories

(Core :: Disability Access APIs, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: nika, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [aes+][bhr][bhr-html][fxperf:p3])

This is a (hopefully stable-ish) view into some of the hang's we're seeing in the parent process due to acc_getChild. It makes up about 5% of the parent process main thread hangs over 2 seconds long being detected by BHR.

https://squarewave.github.io/?durationSpec=2048_65536&payloadID=131360179ef146ab85e01052d16ebe89&platform=Windows_NT%3A10%3Ax86-64&range=4_4&runnable=%3F%3F%3F&search=get_acc&thread=0
Putting this on our e10s a11y ship radar.
Priority: -- → P1
Whiteboard: [bhr] → [aes?][bhr]
Blocks: 1365655
Priority: P1 → P3
Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling. Nika, what are you using the [bhr] tag for? Should we consolidate these tags? I'm using [bhr-html] to track things that were filed due to arewesmoothyet.
Whiteboard: [aes?][bhr] → [aes?][bhr][bhr-html]
(In reply to Doug Thayer [:dthayer] from comment #2)
> Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling.
> Nika, what are you using the [bhr] tag for? Should we consolidate these
> tags? I'm using [bhr-html] to track things that were filed due to
> arewesmoothyet.

I've been using [bhr] to mark performance issues which have been detected through BHR, although I haven't been super consistent about it. The idea was to have some way to track performance issues which cause hangs in a similar way to how we tracked [qf] bugs.
Whiteboard: [aes?][bhr][bhr-html] → [aes+][bhr][bhr-html]
Jamie just a needinfo for your awareness. Wondering if this is related to freezes Mick was seeing.
Flags: needinfo?(jteh)
Actually, Nika, is this still showing up? I could find it in recent report data. I do get some hits for "Accessible" in the call stack though...
Flags: needinfo?(nika)
This definitely still shows up - not sure if the volume is still comparable to the old one - we seem to get (from nightly) 50-200 reports of >2s hangs in get_accChild per release in the last week. 

http://arewesmoothyet.com/?category=all&durationSpec=2048_65536&payloadID=76f882ef7b6343269088e1acee7ef0a0&search=get_accChild&thread=1
Flags: needinfo?(nika)
I can't read these reports due to inaccessibility. It'd be good if someone could give me an example of a stack where this occurs.

The only reason I can think that this would occur is if the content process is busy handling other stuff.
Flags: needinfo?(jteh)
This stack is still very frequently hanging, yesterday it caused an accumulated 2.8 hours of hang in over 15k occurrences in Nightly alone.

This is the most frequently hanging stack:

mozilla::a11y::AccessibleHandler::get_accChild(tagVARIANT,IDispatch * *) AccessibleHandler
static struct already_AddRefed<IDispatch> GetProxiedAccessibleInSubtree(const class mozilla::a11y::DocAccessibleParent *, const struct tagVARIANT & const) xul
struct already_AddRefed<IAccessible> mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(const struct tagVARIANT & const) xul
mozilla::a11y::AccessibleWrap::GetIAccessibleFor(tagVARIANT const &,bool *) xul
mozilla::a11y::AccessibleWrap::get_accChild(tagVARIANT,IDispatch * *) xul
AccWrap_Base::get_accChild(tagVARIANT,IDispatch * *) oleacc
AccessibleObjectFromEvent oleacc
CThreadInputMgr::OnAccFocusEvent(HWND__ *,long,long) msctf
WinEventProc(HWINEVENTHOOK__ *,unsigned long,HWND__ *,long,long,unsigned long,unsigned long) msctf
_ClientCallWinEventProc user32
KiUserCallbackDispatcherContinue ntdll
NtUserNotifyWinEvent win32u
NotifyWinEvent user32
mozilla::a11y::DocAccessibleParent::RecvFocusEvent(unsigned __int64 const &,mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> const &) xul
mozilla::a11y::PDocAccessibleParent::OnMessageReceived(IPC::Message const &) xul
PDocAccessible::Msg_FocusEvent 
void mozilla::ipc::MessageChannel::DispatchMessageW(class IPC::Message *) xul
mozilla::ipc::MessageChannel::MessageTask::Run() xul
nsThread::ProcessNextEvent(bool,bool *) xul

Jamie, does this help you in any way? It would be great to see this reduced.
Flags: needinfo?(jteh)
Whiteboard: [aes+][bhr][bhr-html] → [aes+][bhr][bhr-html][fxperf]
Are these hangs still > 2 seconds as suggested in comment 0? Do we have any idea of the average hang time, etc.?

From this stack, it looks like the focus changes in content, we fire an accessibility focus event, and then Microsoft Text Services Framework (used for IMEs) wants to query info about the focus (which seems perfectly reasonable). Because the focus is in content, it has to make a cross-process query. That's where the hang seems to occur. That suggests that the content main thread is hung for some reason which may not even be related to accessibility. In order to really figure out what's oging on here, we'd need to correlate this with the stack at that same time from the content process being waited on, but I'm guessing there's no way for us to get that info?

One idea is to send up a COM proxy (along with cached handler data) for the focus along with the focus event. That way, the info TSF needs would hopefully already be there and not require a cross-proc call. I'm not sure whether this is possible, though; it would need investigation. Unfortunately, there's also no guarantee it would work: it's entirely possible TSF will query for something that isn't in the cache, at which point we're back to square 1. And the worst part is that there's no way to even know what TSF will ask for in these situations without being able to reproduce this ourselves.
Flags: needinfo?(jteh)
Yes, there are definitely still cases where this is hanging more than 2 seconds. I'm not sure if just averaging out the number I posted above is helpful here, in the > 2s hang case the mean time seems to be 4s, in the 0.5 - 2s range the mean is about 900ms.
Nika, regarding comment 9, it seems like we could correlate hangs across processes reasonably well just using timestamps, no? We could either send the timestamps with the ping or add some kind of correlatedHangId to hangs based on timestamps when we're building the ping. Thoughts?
Flags: needinfo?(nika)
(In reply to Doug Thayer [:dthayer] from comment #11)
> Nika, regarding comment 9, it seems like we could correlate hangs across
> processes reasonably well just using timestamps, no? We could either send
> the timestamps with the ping or add some kind of correlatedHangId to hangs
> based on timestamps when we're building the ping. Thoughts?

I'm honestly not sure. Perhaps? I haven't touched the code much recently. If you have an idea for how to implement it go for it :-)
Flags: needinfo?(nika)
Whiteboard: [aes+][bhr][bhr-html][fxperf] → [aes+][bhr][bhr-html][fxperf:p3]
You need to log in before you can comment on or make changes to this bug.