get_accChild hangs somewhat frequently
Categories
(Core :: Disability Access APIs, enhancement, P3)
Tracking
()
People
(Reporter: nika, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [aes+][bhr:get_accChild][bhr-html][fxperf:p3])
This is a (hopefully stable-ish) view into some of the hang's we're seeing in the parent process due to acc_getChild. It makes up about 5% of the parent process main thread hangs over 2 seconds long being detected by BHR. https://squarewave.github.io/?durationSpec=2048_65536&payloadID=131360179ef146ab85e01052d16ebe89&platform=Windows_NT%3A10%3Ax86-64&range=4_4&runnable=%3F%3F%3F&search=get_acc&thread=0
Comment 1•7 years ago
|
||
Putting this on our e10s a11y ship radar.
Updated•7 years ago
|
Comment 2•7 years ago
|
||
Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling. Nika, what are you using the [bhr] tag for? Should we consolidate these tags? I'm using [bhr-html] to track things that were filed due to arewesmoothyet.
Reporter | ||
Comment 3•7 years ago
|
||
(In reply to Doug Thayer [:dthayer] from comment #2) > Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling. > Nika, what are you using the [bhr] tag for? Should we consolidate these > tags? I'm using [bhr-html] to track things that were filed due to > arewesmoothyet. I've been using [bhr] to mark performance issues which have been detected through BHR, although I haven't been super consistent about it. The idea was to have some way to track performance issues which cause hangs in a similar way to how we tracked [qf] bugs.
Updated•7 years ago
|
Comment 4•7 years ago
|
||
Jamie just a needinfo for your awareness. Wondering if this is related to freezes Mick was seeing.
Comment 5•7 years ago
|
||
Actually, Nika, is this still showing up? I could find it in recent report data. I do get some hits for "Accessible" in the call stack though...
Reporter | ||
Comment 6•7 years ago
|
||
This definitely still shows up - not sure if the volume is still comparable to the old one - we seem to get (from nightly) 50-200 reports of >2s hangs in get_accChild per release in the last week. http://arewesmoothyet.com/?category=all&durationSpec=2048_65536&payloadID=76f882ef7b6343269088e1acee7ef0a0&search=get_accChild&thread=1
Comment 7•7 years ago
|
||
I can't read these reports due to inaccessibility. It'd be good if someone could give me an example of a stack where this occurs. The only reason I can think that this would occur is if the content process is busy handling other stuff.
Comment 8•6 years ago
|
||
This stack is still very frequently hanging, yesterday it caused an accumulated 2.8 hours of hang in over 15k occurrences in Nightly alone. This is the most frequently hanging stack: mozilla::a11y::AccessibleHandler::get_accChild(tagVARIANT,IDispatch * *) AccessibleHandler static struct already_AddRefed<IDispatch> GetProxiedAccessibleInSubtree(const class mozilla::a11y::DocAccessibleParent *, const struct tagVARIANT & const) xul struct already_AddRefed<IAccessible> mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(const struct tagVARIANT & const) xul mozilla::a11y::AccessibleWrap::GetIAccessibleFor(tagVARIANT const &,bool *) xul mozilla::a11y::AccessibleWrap::get_accChild(tagVARIANT,IDispatch * *) xul AccWrap_Base::get_accChild(tagVARIANT,IDispatch * *) oleacc AccessibleObjectFromEvent oleacc CThreadInputMgr::OnAccFocusEvent(HWND__ *,long,long) msctf WinEventProc(HWINEVENTHOOK__ *,unsigned long,HWND__ *,long,long,unsigned long,unsigned long) msctf _ClientCallWinEventProc user32 KiUserCallbackDispatcherContinue ntdll NtUserNotifyWinEvent win32u NotifyWinEvent user32 mozilla::a11y::DocAccessibleParent::RecvFocusEvent(unsigned __int64 const &,mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> const &) xul mozilla::a11y::PDocAccessibleParent::OnMessageReceived(IPC::Message const &) xul PDocAccessible::Msg_FocusEvent void mozilla::ipc::MessageChannel::DispatchMessageW(class IPC::Message *) xul mozilla::ipc::MessageChannel::MessageTask::Run() xul nsThread::ProcessNextEvent(bool,bool *) xul Jamie, does this help you in any way? It would be great to see this reduced.
Updated•6 years ago
|
Comment 9•6 years ago
|
||
Are these hangs still > 2 seconds as suggested in comment 0? Do we have any idea of the average hang time, etc.? From this stack, it looks like the focus changes in content, we fire an accessibility focus event, and then Microsoft Text Services Framework (used for IMEs) wants to query info about the focus (which seems perfectly reasonable). Because the focus is in content, it has to make a cross-process query. That's where the hang seems to occur. That suggests that the content main thread is hung for some reason which may not even be related to accessibility. In order to really figure out what's oging on here, we'd need to correlate this with the stack at that same time from the content process being waited on, but I'm guessing there's no way for us to get that info? One idea is to send up a COM proxy (along with cached handler data) for the focus along with the focus event. That way, the info TSF needs would hopefully already be there and not require a cross-proc call. I'm not sure whether this is possible, though; it would need investigation. Unfortunately, there's also no guarantee it would work: it's entirely possible TSF will query for something that isn't in the cache, at which point we're back to square 1. And the worst part is that there's no way to even know what TSF will ask for in these situations without being able to reproduce this ourselves.
Comment 10•6 years ago
|
||
Yes, there are definitely still cases where this is hanging more than 2 seconds. I'm not sure if just averaging out the number I posted above is helpful here, in the > 2s hang case the mean time seems to be 4s, in the 0.5 - 2s range the mean is about 900ms.
Comment 11•6 years ago
|
||
Nika, regarding comment 9, it seems like we could correlate hangs across processes reasonably well just using timestamps, no? We could either send the timestamps with the ping or add some kind of correlatedHangId to hangs based on timestamps when we're building the ping. Thoughts?
Reporter | ||
Comment 12•6 years ago
|
||
(In reply to Doug Thayer [:dthayer] from comment #11) > Nika, regarding comment 9, it seems like we could correlate hangs across > processes reasonably well just using timestamps, no? We could either send > the timestamps with the ping or add some kind of correlatedHangId to hangs > based on timestamps when we're building the ping. Thoughts? I'm honestly not sure. Perhaps? I haven't touched the code much recently. If you have an idea for how to implement it go for it :-)
Updated•6 years ago
|
Updated•4 years ago
|
Comment 13•3 years ago
|
||
I just ran into this and got a profile (link). It doesn't look tremendously useful - the content process is busy responding to a memory-pressure
event during the window where we're blocking on get_accChild
. The only conclusion I can draw here is "sometimes content processes are hung and get_accChild
means the parent process can hang too," which brings us back to the suggestion in the bottom paragraph of comment 9.
That being said, I don't know why accessibility code is even running on my machine? It may have something to do with the fact that I have a drawing tablet plugged in? James, should this code be running for me if that's the case, and I haven't done anything else on my machine to set up accessibility utilities? The tablet is pretty common, just a Wacom Intuos something or other.
Comment 14•3 years ago
|
||
(In reply to Doug Thayer [:dthayer] (he/him) from comment #13)
"sometimes content processes are hung and
get_accChild
means the parent process can hang too," which brings us back to the suggestion in the bottom paragraph of comment 9.
The a11y team has a (massive) project underway to completely re-architect cross-process a11y to deal with this among other things. The idea is to cache the a11y trees for all content processes in the parent process; see bug 1694563. Unfortunately, it keeps getting de-prioritised due to other work and too few people. :(
That being said, I don't know why accessibility code is even running on my machine? It may have something to do with the fact that I have a drawing tablet plugged in? James, should this code be running for me if that's the case, and I haven't done anything else on my machine to set up accessibility utilities?
The tablet is very likely the cause, yes. However, bug 1687535 dealt with a lot of cases of unwanted instantiation like this. I'm not sure why that tweak isn't working in your case, and without being able to debug on the machine itself, I can't really diagnose that. If you're familiar with Windows debugging, you could try placing a breakpoint on mozilla:a11y::LazyInstantiator::ResolveMsaaRoot. I'd be interested to know what the stack is (and I made need arguments from some functions, so a minidump would be very helpful).
Also, what does about:support show for "Accessibility instantiator"?
Comment 16•2 years ago
|
||
Ack! Sorry for dropping the thread here. Anyway, I can no longer reproduce this. However, the request sounded really familiar so I did some digging, and I found a chat thread with aklotz from a year ago where he asked for the same info, and my response was "about:support says the Accessbility Instantiator is UNKNOWN|
" and "I couldn't see anything in the stack - it all looked pretty generic?". Then aklotz said "hmmm... interesting...", and that's the end of the thread... 🤷
Updated•2 years ago
|
Comment 17•1 year ago
|
||
This is resolved by Cache the World, which is enabled by default in Firefox 113.
Description
•