Open Bug 1396039 Opened 3 years ago Updated 2 years ago
_acc Child hangs somewhat frequently
This is a (hopefully stable-ish) view into some of the hang's we're seeing in the parent process due to acc_getChild. It makes up about 5% of the parent process main thread hangs over 2 seconds long being detected by BHR. https://squarewave.github.io/?durationSpec=2048_65536&payloadID=131360179ef146ab85e01052d16ebe89&platform=Windows_NT%3A10%3Ax86-64&range=4_4&runnable=%3F%3F%3F&search=get_acc&thread=0
Putting this on our e10s a11y ship radar.
Priority: -- → P1
Whiteboard: [bhr] → [aes?][bhr]
Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling. Nika, what are you using the [bhr] tag for? Should we consolidate these tags? I'm using [bhr-html] to track things that were filed due to arewesmoothyet.
Whiteboard: [aes?][bhr] → [aes?][bhr][bhr-html]
(In reply to Doug Thayer [:dthayer] from comment #2) > Adding the [bhr-html] tag to match arewesmoothyet's File a Bug autofilling. > Nika, what are you using the [bhr] tag for? Should we consolidate these > tags? I'm using [bhr-html] to track things that were filed due to > arewesmoothyet. I've been using [bhr] to mark performance issues which have been detected through BHR, although I haven't been super consistent about it. The idea was to have some way to track performance issues which cause hangs in a similar way to how we tracked [qf] bugs.
Jamie just a needinfo for your awareness. Wondering if this is related to freezes Mick was seeing.
Actually, Nika, is this still showing up? I could find it in recent report data. I do get some hits for "Accessible" in the call stack though...
This definitely still shows up - not sure if the volume is still comparable to the old one - we seem to get (from nightly) 50-200 reports of >2s hangs in get_accChild per release in the last week. http://arewesmoothyet.com/?category=all&durationSpec=2048_65536&payloadID=76f882ef7b6343269088e1acee7ef0a0&search=get_accChild&thread=1
I can't read these reports due to inaccessibility. It'd be good if someone could give me an example of a stack where this occurs. The only reason I can think that this would occur is if the content process is busy handling other stuff.
This stack is still very frequently hanging, yesterday it caused an accumulated 2.8 hours of hang in over 15k occurrences in Nightly alone. This is the most frequently hanging stack: mozilla::a11y::AccessibleHandler::get_accChild(tagVARIANT,IDispatch * *) AccessibleHandler static struct already_AddRefed<IDispatch> GetProxiedAccessibleInSubtree(const class mozilla::a11y::DocAccessibleParent *, const struct tagVARIANT & const) xul struct already_AddRefed<IAccessible> mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor(const struct tagVARIANT & const) xul mozilla::a11y::AccessibleWrap::GetIAccessibleFor(tagVARIANT const &,bool *) xul mozilla::a11y::AccessibleWrap::get_accChild(tagVARIANT,IDispatch * *) xul AccWrap_Base::get_accChild(tagVARIANT,IDispatch * *) oleacc AccessibleObjectFromEvent oleacc CThreadInputMgr::OnAccFocusEvent(HWND__ *,long,long) msctf WinEventProc(HWINEVENTHOOK__ *,unsigned long,HWND__ *,long,long,unsigned long,unsigned long) msctf _ClientCallWinEventProc user32 KiUserCallbackDispatcherContinue ntdll NtUserNotifyWinEvent win32u NotifyWinEvent user32 mozilla::a11y::DocAccessibleParent::RecvFocusEvent(unsigned __int64 const &,mozilla::gfx::IntRectTyped<mozilla::LayoutDevicePixel> const &) xul mozilla::a11y::PDocAccessibleParent::OnMessageReceived(IPC::Message const &) xul PDocAccessible::Msg_FocusEvent void mozilla::ipc::MessageChannel::DispatchMessageW(class IPC::Message *) xul mozilla::ipc::MessageChannel::MessageTask::Run() xul nsThread::ProcessNextEvent(bool,bool *) xul Jamie, does this help you in any way? It would be great to see this reduced.
2 years ago
Whiteboard: [aes+][bhr][bhr-html] → [aes+][bhr][bhr-html][fxperf]
Are these hangs still > 2 seconds as suggested in comment 0? Do we have any idea of the average hang time, etc.? From this stack, it looks like the focus changes in content, we fire an accessibility focus event, and then Microsoft Text Services Framework (used for IMEs) wants to query info about the focus (which seems perfectly reasonable). Because the focus is in content, it has to make a cross-process query. That's where the hang seems to occur. That suggests that the content main thread is hung for some reason which may not even be related to accessibility. In order to really figure out what's oging on here, we'd need to correlate this with the stack at that same time from the content process being waited on, but I'm guessing there's no way for us to get that info? One idea is to send up a COM proxy (along with cached handler data) for the focus along with the focus event. That way, the info TSF needs would hopefully already be there and not require a cross-proc call. I'm not sure whether this is possible, though; it would need investigation. Unfortunately, there's also no guarantee it would work: it's entirely possible TSF will query for something that isn't in the cache, at which point we're back to square 1. And the worst part is that there's no way to even know what TSF will ask for in these situations without being able to reproduce this ourselves.
Yes, there are definitely still cases where this is hanging more than 2 seconds. I'm not sure if just averaging out the number I posted above is helpful here, in the > 2s hang case the mean time seems to be 4s, in the 0.5 - 2s range the mean is about 900ms.
Nika, regarding comment 9, it seems like we could correlate hangs across processes reasonably well just using timestamps, no? We could either send the timestamps with the ping or add some kind of correlatedHangId to hangs based on timestamps when we're building the ping. Thoughts?
(In reply to Doug Thayer [:dthayer] from comment #11) > Nika, regarding comment 9, it seems like we could correlate hangs across > processes reasonably well just using timestamps, no? We could either send > the timestamps with the ping or add some kind of correlatedHangId to hangs > based on timestamps when we're building the ping. Thoughts? I'm honestly not sure. Perhaps? I haven't touched the code much recently. If you have an idea for how to implement it go for it :-)
Whiteboard: [aes+][bhr][bhr-html][fxperf] → [aes+][bhr][bhr-html][fxperf:p3]
You need to log in before you can comment on or make changes to this bug.