Significant increase of WM_GETOBJECT handling failures

RESOLVED FIXED in mozilla32

Status

()

defect
--
major
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: MarcoZ, Assigned: jimm)

Tracking

({regression})

Trunk
mozilla32
x86_64
Windows 8.1
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

This is a very recent regression, introduced in the May 21 nightly build.
I can reproduce this all over the place, on Bugzilla, Facebook, MDN, anywhere. The steps are:

1. Bring up any page with NVDA running.
2. Start navigating downwards through the elements.

Result: Often, when on links, but also on some form fields, NVDA will lose focus, and will speak the name of the Firefox window, followed by the word "unknown", indicating it landed on an unknown accessible. One has to press tab or shift+tab to get Firefox back in working order.

This also happened while I was tabbing through the form while filing this bug, for example on the "Severity" combo box.

Suspected regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cb9f34f73ebe&tochange=9d8d16695f6a
It's be good to have bisection to find a regression. I don't really have ideas what can be wrong. We didn't have new intermittent focus failures so that's something NVDA (screen reader) specific.
Jamie, could you take a look and see if you can find out where the source of that focus event is that NVDA is getting? I can only do a bisect tomorrow, but perhaps if you investigated this issue from your end first, assuming you can reproduce it, will already give us a hint as to what's wrong here.
Flags: needinfo?(jamie)
As soon as this happens, developer info for the "unknown" looks like this:
INFO - globalCommands.GlobalCommands.script_navigatorObject_devInfo (19:49:09):
Developer info for navigator object:
name: None
role: ROLE_UNKNOWN
states: 
isFocusable: False
hasFocus: False
Python object: <NVDAObjects.IAccessible.IAccessible object at 0x04CE1A10>
Python class mro: (<class 'NVDAObjects.IAccessible.IAccessible'>, <class 'NVDAObjects.window.Window'>, <class 'NVDAObjects.NVDAObject'>, <class 'baseObject.ScriptableObject'>, <class 'baseObject.AutoPropertyObject'>, <type 'object'>)
description: None
location: None
value: None
appModule: <'firefox' (appName u'firefox', process ID 5276) at address 4c3c990>
appModule.productName: u'Nightly'
appModule.productVersion: u'32.0a1'
TextInfo: <class 'NVDAObjects.NVDAObjectTextInfo'>
windowHandle: 853060
windowClassName: u'MozillaWindowClass'
windowControlID: 0
windowStyle: 399441920
windowThreadID: 5848
windowText: u'1014673 \u2013 Frequent loss of focus when navigating around web pages with NVDA - Nightly'
displayText: exception: 'NoneType' object is not iterable
IAccessibleObject: <POINTER(IAccessible) ptr=0xe4071d0 at 10157530>
IAccessibleChildID: -487324912
IAccessible event parameters: windowHandle=853060, objectID=-4, childID=-487324912
IAccessible accName: exception: (-2147024809, 'Falscher Parameter.', (None, None, None, 0, None))
IAccessible accRole: exception: (-2147024809, 'Falscher Parameter.', (None, None, None, 0, None))
IAccessible accState: exception: (-2147024809, 'Falscher Parameter.', (None, None, None, 0, None))
IAccessible accDescription: exception: (-2147024809, 'Falscher Parameter.', (None, None, None, 0, None))
IAccessible accValue: exception: (-2147024809, 'Falscher Parameter.', (None, None, None, 0, None))

By comparison, a successful call of NVDA+F1 in the same Firefox session brings up this info:

INFO - globalCommands.GlobalCommands.script_navigatorObject_devInfo (19:51:29):
Developer info for navigator object:
name: u'http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cb9f34f73ebe&tochange=9d8d16695f6a'
role: ROLE_LINK
states: STATE_FOCUSABLE, STATE_LINKED, STATE_FOCUSED, STATE_VISITED
isFocusable: True
hasFocus: True
Python object: <NVDAObjects.IAccessible.mozilla.Mozilla object at 0x04C3C370>
Python class mro: (<class 'NVDAObjects.IAccessible.mozilla.Mozilla'>, <class 'NVDAObjects.IAccessible.IAccessible'>, <class 'NVDAObjects.window.Window'>, <class 'NVDAObjects.NVDAObject'>, <class 'baseObject.ScriptableObject'>, <class 'baseObject.AutoPropertyObject'>, <type 'object'>)
description: u''
location: (41, 421, 926, 19)
value: u'http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cb9f34f73ebe&tochange=9d8d16695f6a'
appModule: <'firefox' (appName u'firefox', process ID 5276) at address 4c3c990>
appModule.productName: u'Nightly'
appModule.productVersion: u'32.0a1'
TextInfo: <class 'NVDAObjects.IAccessible.IA2TextTextInfo'>
windowHandle: 853060L
windowClassName: u'MozillaWindowClass'
windowControlID: 0
windowStyle: 399441920
windowThreadID: 5848
windowText: u'1014673 \u2013 Frequent loss of focus when navigating around web pages with NVDA - Nightly'
displayText: u''
IAccessibleObject: <POINTER(IAccessible2) ptr=0xdd0a0c4 at 4c368a0>
IAccessibleChildID: 0
IAccessible event parameters: windowHandle=853060, objectID=-4, childID=-487321328
IAccessible accName: u'http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cb9f34f73ebe&tochange=9d8d16695f6a'
IAccessible accRole: ROLE_SYSTEM_LINK
IAccessible accState: STATE_SYSTEM_TRAVERSED, STATE_SYSTEM_FOCUSED, STATE_SYSTEM_FOCUSABLE, STATE_SYSTEM_LINKED, STATE_SYSTEM_VALID (13631492)
IAccessible accDescription: u''
IAccessible accValue: u'http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cb9f34f73ebe&tochange=9d8d16695f6a'
IAccessible2 windowHandle: 853060
IAccessible2 uniqueID: -487321328
IAccessible2 role: ROLE_SYSTEM_LINK
IAccessible2 states: IA2_STATE_SELECTABLE_TEXT, IA2_STATE_OPAQUE (5120)
IAccessible2 attributes: u'margin-left:0px;text-align:start;draggable:true;text-indent:0px;margin-right:0px;tag:a;margin-top:0px;margin-bottom:0px;display:inline;'
Additional info: my wife just confirmed that there is nothing particular happening visually when such a focus loss occurs. It looks to her like the keyboard focus landed on the item it is supposed to. But NVDA loses focus. The faster one switches from one focused item to the next, the more likely focus loss occurs.
Uh oh. Not again. :(

This is almost certainly because WM_GETOBJECT is not being handled by Gecko a11y, so we're getting a generic client accessible from oleacc. The result is the same as bug 677883 and bug 599814. (Btw, NVDA users still regularly complain about bug 599814, but it's difficult to pin down.)

For future reference, whenever you see the following symptoms in Firefox with NVDA, it is almost certainly due to WM_GETOBJECT failure:
1. NVDA reports "unknown" (but not defunct).
2. NVDA developer info reports that IAccessibleObject is an IAccessible instead of an IAccessible2.
Failure:
IAccessibleObject: <POINTER(IAccessible) ptr=0xe4071d0 at 10157530>
Normal:
IAccessibleObject: <POINTER(IAccessible2) ptr=0xdd0a0c4 at 4c368a0>
3. NVDA developer info reports a non-0 IAccessibleChildID.
Failure:
IAccessibleChildID: -487324912
Normal:
IAccessibleChildID: 0
Flags: needinfo?(jamie)
Thanks Jamie! Adjusting the summary accordingly. I also asked on the Release Drivers if something called OMTC that was recently enabled for Windows builds could have something to do with these failures. I know for a fact that there was nothing in our MSAA code recently that changed, which could account for this massive increase in failures that I am experiencing on two machines.
Summary: Frequent loss of focus when navigating around web pages with NVDA → Significant increase of WM_GETOBJECT handling failures
It should be some ipc code change. We need to know what bug caused a regression.
cc'ing Bas as bug 899785 assignee
Could this be related to bug 933733?
Flags: needinfo?(jmathies)
Additional info: Setting layers.offmainthreadcomposition.enabled to "false" in about:config also makes the problem go away. So this is directly related to OMTC indeed.
(In reply to Bas Schouten (:bas.schouten) from comment #10)
> Could this be related to bug 933733?

Do either of the threads involved in omtc create a chromium TYPE_MOZILLA_UI loop? If so the problem could be deferred message handling in WindowsMessageLoop.
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #12)
> (In reply to Bas Schouten (:bas.schouten) from comment #10)
> > Could this be related to bug 933733?
> 
> Do either of the threads involved in omtc create a chromium TYPE_MOZILLA_UI
> loop?

How to check that?

> If so the problem could be deferred message handling in
> WindowsMessageLoop.

can you describe please how all it works? I think either of a11y devs has certain lack of knowledge in the area.
(In reply to Jim Mathies [:jimm] from comment #12)
> (In reply to Bas Schouten (:bas.schouten) from comment #10)
> > Could this be related to bug 933733?
> 
> Do either of the threads involved in omtc create a chromium TYPE_MOZILLA_UI
> loop? If so the problem could be deferred message handling in
> WindowsMessageLoop.

Well... the -main- thread is involved with OMTC? That presumably is a MOZILLA_UI loop. The non-main-thread loops shouldn't be. I wrote a patch for IPC for that.
(In reply to alexander :surkov from comment #13)
> > Do either of the threads involved in omtc create a chromium TYPE_MOZILLA_UI
> > loop?
> 
> How to check that?

Running a debug build with an ally client connected might show some useful errors if the problem is in the deferred message handling. It'll dump error info if it's trying to handle windowing events that it can't support. (WM_GETOBJECT isn't currently supported.)

What we may have to do is disable omtc when an ally client is connected, similar to what we discussed about disabling e10s.
> What we may have to do is disable omtc when an ally client is connected,
> similar to what we discussed about disabling e10s.

don't the grafix people want to kill off support for non omtc though?
(In reply to Trevor Saunders (:tbsaunde) from comment #16)
> > What we may have to do is disable omtc when an ally client is connected,
> > similar to what we discussed about disabling e10s.
> 
> don't the grafix people want to kill off support for non omtc though?

No idea. Bas?

Before we go this route though we should figure out what's going wrong here. Might be an easy fix.
(In reply to Jim Mathies [:jimm] from comment #17)
> (In reply to Trevor Saunders (:tbsaunde) from comment #16)
> > > What we may have to do is disable omtc when an ally client is connected,
> > > similar to what we discussed about disabling e10s.
> > 
> > don't the grafix people want to kill off support for non omtc though?
> 
> No idea. Bas?
> 
> Before we go this route though we should figure out what's going wrong here.
> Might be an easy fix.

Jim, do you have cycles to look at this? (Are you the right person?)
(In reply to David Bolter [:davidb] from comment #18)
> (In reply to Jim Mathies [:jimm] from comment #17)
> > (In reply to Trevor Saunders (:tbsaunde) from comment #16)
> > > > What we may have to do is disable omtc when an ally client is connected,
> > > > similar to what we discussed about disabling e10s.
> > > 
> > > don't the grafix people want to kill off support for non omtc though?
> > 
> > No idea. Bas?
> > 
> > Before we go this route though we should figure out what's going wrong here.
> > Might be an easy fix.
> 
> Jim, do you have cycles to look at this? (Are you the right person?)

Yeah I've been lent out to Bas help on omtc where needed. It would be great if someone could walk me through getting an ally client (preferably something build into windows vs. third party) running that reproduces the issue.
(In reply to Jim Mathies [:jimm] from comment #19)
> It would be great
> if someone could walk me through getting an ally client (preferably
> something build into windows vs. third party) running that reproduces the
> issue.

Unfortunately, Narrator (built into Windows) is not an option because it doesn't use MSAA/Iaccessible2 to access stuff. Your best bet is to get NVDA and use that.

1. Go to http://www.nvda-project.org and click the Download link.
2. Skip the donation part and go straight for the current release 2014.2.
3. Install it. Make sure the "Run NVDA on the logon screen" is unchecked in the installer.
4. After installation, start NVDA or let it start itself.
5. Go to Firefox.
6. Press CTRL+L or double-click the address bar and go to any page with a bit of content. This bug is good for that.
7. If NVDA already speaks something about "unknown" at this point, which it sometimes does, you've already reproduced the problem.
If all goes well, NVDA will start reading the page top to botom.
8. Press Control to stop it.
9. Now, you are in what NVDA calls browse mode. Your up and down arrows will walk the web page as if it was a Word document. Start pressing DownArrow and listen a bit to what NVDA says. Then continue.
10. At some point, depending a bit on timing, NVDA will get to a point where up and down arrow don't react any more. It will have said "unknown" by then.
Bingo, you're there! You have to force a focus change by pressing tab to get functionality back, and can then continue to up or down arrow. Until you again run into this issue. Which usually happens frequently on a single page.

You can shut down NVDA by right-clicking its system tray icon from the overflow area.
Running nvda shows that we are spewing hundreds of unhandled WM_GETOBJECT calls. Not sure if these can get deferred or not.
Most of the stacks look like the one below where we are requesting a sync layers update.

The WM_GETOBJECT call retrieves our IAccessible pointer so we can't defer these.

We could pass these calls through but if there's any chance they can trigger painting we won't be able to. Honestly I'm not sure what might result from these calls, can the ally folks help clear that up?

xul.dll!`anonymous namespace'::ProcessOrDeferMessage(HWND__ * hwnd=0x000205b4, unsigned int uMsg=49961, unsigned int wParam=4692, long lParam=1) Line 316	C++
xul.dll!NeuteredWindowProc(HWND__ * hwnd=0x000205b4, unsigned int uMsg=49961, unsigned int wParam=4692, long lParam=1) Line 355	C++
user32.dll!_InternalCallWinProc@20()	Unknown
user32.dll!_UserCallWinProcCheckWow@32()	Unknown
user32.dll!_DispatchClientMessage@24()	Unknown
user32.dll!___fnDWORD@4()	Unknown
ntdll.dll!_KiUserCallbackDispatcher@12()	Unknown
user32.dll!_NtUserPeekMessage@20()	Unknown
user32.dll!__PeekMessage@24()	Unknown
user32.dll!_PeekMessageW@20()	Unknown
xul.dll!mozilla::ipc::MessageChannel::WaitForSyncNotify() Line 857	C++
xul.dll!mozilla::ipc::MessageChannel::SendAndWait(IPC::Message * aMsg=0x15259000, IPC::Message * aReply=0x0016cf38) Line 674	C++
xul.dll!mozilla::ipc::MessageChannel::Send(IPC::Message * aMsg=0x15259000, IPC::Message * aReply=0x0016cf38) Line 588	C++
xul.dll!mozilla::layers::PLayerTransactionChild::SendUpdate(const nsTArray<mozilla::layers::Edit> & cset={...}, const mozilla::layers::TargetConfig & targetConfig={...}, const bool & isFirstPaint=false, const bool & scheduleComposite=true, const unsigned int & paintSequenceNumber=7, nsTArray<mozilla::layers::EditReply> * reply=0x0016ea44) Line 234	C++
xul.dll!mozilla::layers::ShadowLayerForwarder::EndTransaction(nsTArray<mozilla::layers::EditReply> * aReplies=0x0016ea44, const nsIntRegion & aRegionToClear={...}, bool aScheduleComposite=true, unsigned int aPaintSequenceNumber=7, bool * aSent=0x0016ec0f) Line 567	C++
xul.dll!mozilla::layers::ClientLayerManager::ForwardTransaction(bool aScheduleComposite=true) Line 425	C++
xul.dll!mozilla::layers::ClientLayerManager::EndTransaction(void (mozilla::layers::ThebesLayer *, gfxContext *, const nsIntRegion &, mozilla::layers::DrawRegionClip, const nsIntRegion &, void *) * aCallback=0x54d7af84, void * aCallbackData=0x0016edb8, mozilla::layers::LayerManager::EndTransactionFlags aFlags=END_DEFAULT) Line 237	C++
xul.dll!nsDisplayList::PaintForFrame(nsDisplayListBuilder * aBuilder=0x0016edb8, nsRenderingContext * aCtx=0x00000000, nsIFrame * aForFrame=0x0aad7280, unsigned int aFlags=13) Line 1371	C++
xul.dll!nsDisplayList::PaintRoot(nsDisplayListBuilder * aBuilder=0x0016edb8, nsRenderingContext * aCtx=0x00000000, unsigned int aFlags=13) Line 1212	C++
xul.dll!nsLayoutUtils::PaintFrame(nsRenderingContext * aRenderingContext=0x00000000, nsIFrame * aFrame=0x0aad7280, const nsRegion & aDirtyRegion={...}, unsigned int aBackstop=0, unsigned int aFlags=772) Line 2914	C++
xul.dll!PresShell::Paint(nsView * aViewToPaint=0x0aace0a8, const nsRegion & aDirtyRegion={...}, unsigned int aFlags=1) Line 5917	C++
(In reply to Jim Mathies [:jimm] from comment #22)
> Most of the stacks look like the one below where we are requesting a sync
> layers update.
> 
> The WM_GETOBJECT call retrieves our IAccessible pointer so we can't defer
> these.

right, WM_GETOBJECT have to be processed sync

> We could pass these calls through but if there's any chance they can trigger
> painting we won't be able to.

in general a11y shouldn't trigger painting. Can we add some assertions to handle it if we do that?

> Honestly I'm not sure what might result from
> these calls, can the ally folks help clear that up?

is it complete stack? I don't see a11y involved there
(In reply to Jim Mathies [:jimm] from comment #22)
> Most of the stacks look like the one below where we are requesting a sync
> layers update.
> 
> The WM_GETOBJECT call retrieves our IAccessible pointer so we can't defer
> these.
> 
> We could pass these calls through but if there's any chance they can trigger
> painting we won't be able to. Honestly I'm not sure what might result from
> these calls, can the ally folks help clear that up?

I'd need to do a bit of auditing, and of course asserts would be nice, but I believe the code at widget/windows/nsWindow.cpp:5203 should never call into layout.  If it does we could probably add some sort of try less hard mode that doesn't touch layout if this is the best approach here.
(In reply to Trevor Saunders (:tbsaunde) from comment #24)
> (In reply to Jim Mathies [:jimm] from comment #22)
> > Most of the stacks look like the one below where we are requesting a sync
> > layers update.
> > 
> > The WM_GETOBJECT call retrieves our IAccessible pointer so we can't defer
> > these.
> > 
> > We could pass these calls through but if there's any chance they can trigger
> > painting we won't be able to. Honestly I'm not sure what might result from
> > these calls, can the ally folks help clear that up?
> 
> I'd need to do a bit of auditing, and of course asserts would be nice, but I
> believe the code at widget/windows/nsWindow.cpp:5203 should never call into
> layout.  If it does we could probably add some sort of try less hard mode
> that doesn't touch layout if this is the best approach here.

It would call into dom though right? Couldn't that trigger a layout flush? This is pretty gray here, I'm not sure what *might* go wrong if you let the gecko thread run around in platform code while there's a sync send on an ipc stack.
(In reply to Jim Mathies [:jimm] from comment #25)
> (In reply to Trevor Saunders (:tbsaunde) from comment #24)
> > (In reply to Jim Mathies [:jimm] from comment #22)
> > > Most of the stacks look like the one below where we are requesting a sync
> > > layers update.
> > > 
> > > The WM_GETOBJECT call retrieves our IAccessible pointer so we can't defer
> > > these.
> > > 
> > > We could pass these calls through but if there's any chance they can trigger
> > > painting we won't be able to. Honestly I'm not sure what might result from
> > > these calls, can the ally folks help clear that up?
> > 
> > I'd need to do a bit of auditing, and of course asserts would be nice, but I
> > believe the code at widget/windows/nsWindow.cpp:5203 should never call into
> > layout.  If it does we could probably add some sort of try less hard mode
> > that doesn't touch layout if this is the best approach here.
> 
> It would call into dom though right? Couldn't that trigger a layout flush?

I don't *think* we need to ask dom code to do anything significant either to answer WM_GETOBJECT.

> This is pretty gray here, I'm not sure what *might* go wrong if you let the
> gecko thread run around in platform code while there's a sync send on an ipc
> stack.

yeah, I imagine there'd be plenty of code that if its called in this context will cause the world to end, but I'm not sure exactly what is included in that set.
Posted patch test pass through (obsolete) — Splinter Review
Here's a patch that passes this call through. Running this doesn't have any obvious side effects, at least I didn't get any channel aborts.
(In reply to Trevor Saunders (:tbsaunde) from comment #26)

> > It would call into dom though right? Couldn't that trigger a layout flush?
> 
> I don't *think* we need to ask dom code to do anything significant either to
> answer WM_GETOBJECT.

you should keep in mind that AT may do sync calls into a11y right after WM_GETOBJECT return.

We don't usually flush layout on a11y calls but that could happen I guess but if it does then I think it can be fixed on case by case basis.
Comment on attachment 8429450 [details] [diff] [review]
test pass through

Review of attachment 8429450 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/xpwidgets/PuppetWidget.cpp
@@ +653,5 @@
>      return NS_OK;
>    }
>  
>    if (mTabChild &&
> +      !mTabChild->SendSetCursor(aCursor, mUpdateCursor)) {

ignore this last bit which should be in another patch in my queue.
Attachment #8429450 - Flags: feedback?(marco.zehe)
This might not be such a bad situation: For one thing, the get object call is just a query, so we're just returning a ptr there [1]. The pass through call I'm adding looks safe afaict.

http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp#5225

Ally should do all its dom accessing on the gecko thread, which is currently wrapped up in this sync ipc call. So ally code shouldn't execute further calls until the ipc stack unwinds from the current WaitForSyncNotify call.

But I wonder if my assumption about ally is correct? Does it do everything on the gecko thread?
Posted patch better patch v.1 (obsolete) — Splinter Review
Assignee: nobody → jmathies
Attachment #8429450 - Attachment is obsolete: true
Attachment #8429450 - Flags: feedback?(marco.zehe)
Attachment #8429629 - Flags: feedback?(bent.mozilla)
Comment on attachment 8429629 [details] [diff] [review]
better patch v.1

I can confirm that this patch fixes the problem in a local build. That local build reproduces the problem without the patch.
Is there any chance that this will improve Gecko WM_GETOBJECT failures where there are intensive Flash applets (bug 599814)?
(In reply to James Teh [:Jamie] from comment #33)
> Is there any chance that this will improve Gecko WM_GETOBJECT failures where
> there are intensive Flash applets (bug 599814)?

Yes it should.
Attachment #8429629 - Flags: feedback?(bent.mozilla) → review?(bent.mozilla)
Comment on attachment 8429629 [details] [diff] [review]
better patch v.1

Review of attachment 8429629 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/WindowsMessageLoop.cpp
@@ +161,5 @@
>    }
>  }
>  
> +static void
> +DumpNeuteredMessage(HWND hwnd, UINT uMsg)

I don't really understand why this was moved, it doesn't look like we ever call this in more than one place. Just to make the ProcessOrDeferMessage function smaller?

@@ +323,2 @@
>  #endif
> +      } else if (uMsg == WM_GETOBJECT) {

This should go into the case blocks above, not here in the default block. (The default block is just for message values we can't know at compile time).

@@ +326,5 @@
> +        // IAccessible pointer. This should be safe, and needs to be sync.
> +        DWORD objId = static_cast<DWORD>(lParam);
> +        WNDPROC oldWndProc = (WNDPROC)GetProp(hwnd, kOldWndProcProp);
> +        if (objId == OBJID_CLIENT && oldWndProc) {
> +          return CallWindowProcW(oldWndProc, hwnd, uMsg, wParam, lParam);

This is pretty scary, especially since we have no idea what it will do beyond handing back an IAccessible. As asurkov notes in comment 28 there can be other synchronous calls here...

The only way I'm really comfortable with this is to add release-mode aborts to all a11y code paths to ensure that we don't call into DOM/Layout when this is on the stack. Thankfully that can be checked using mozilla::ipc::MessageChannel::IsPumpingMessages().
Attachment #8429629 - Flags: review?(bent.mozilla) → review-
Comment on attachment 8429629 [details] [diff] [review]
better patch v.1

Review of attachment 8429629 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/WindowsMessageLoop.cpp
@@ +323,2 @@
>  #endif
> +      } else if (uMsg == WM_GETOBJECT) {

Also, can we make sure that we only do this for one of our nsWindow windows? (i.e. not a Flash window, QT window, etc.). Otherwise we *really* have no way of knowing what will happen or how we'll be re-entered.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #35)
> Comment on attachment 8429629 [details] [diff] [review]
> better patch v.1
> 
> Review of attachment 8429629 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/WindowsMessageLoop.cpp
> @@ +161,5 @@
> >    }
> >  }
> >  
> > +static void
> > +DumpNeuteredMessage(HWND hwnd, UINT uMsg)
> 
> I don't really understand why this was moved, it doesn't look like we ever
> call this in more than one place. Just to make the ProcessOrDeferMessage
> function smaller?
> 
> @@ +323,2 @@
> >  #endif
> > +      } else if (uMsg == WM_GETOBJECT) {
> 
> This should go into the case blocks above, not here in the default block.
> (The default block is just for message values we can't know at compile time).
> 
> @@ +326,5 @@
> > +        // IAccessible pointer. This should be safe, and needs to be sync.
> > +        DWORD objId = static_cast<DWORD>(lParam);
> > +        WNDPROC oldWndProc = (WNDPROC)GetProp(hwnd, kOldWndProcProp);
> > +        if (objId == OBJID_CLIENT && oldWndProc) {
> > +          return CallWindowProcW(oldWndProc, hwnd, uMsg, wParam, lParam);
> 
> This is pretty scary, especially since we have no idea what it will do
> beyond handing back an IAccessible. As asurkov notes in comment 28 there can
> be other synchronous calls here...

I don't think there's an issue here provided calls that come into a11y are using the main thread. I asked this in comment 30. Maybe surkov can confirm.

> The only way I'm really comfortable with this is to add release-mode aborts
> to all a11y code paths to ensure that we don't call into DOM/Layout when
> this is on the stack. Thankfully that can be checked using
> mozilla::ipc::MessageChannel::IsPumpingMessages().

afaict the only call that needs protection is the widget call into GetAccessibleOrDescendant.

http://mxr.mozilla.org/mozilla-central/source/accessible/src/generic/DocAccessible.cpp#1318
Flags: needinfo?(surkov.alexander)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #36)
> Also, can we make sure that we only do this for one of our nsWindow windows?
> (i.e. not a Flash window, QT window, etc.). Otherwise we *really* have no
> way of knowing what will happen or how we'll be re-entered.

Sure, sounds like a good added protection.
a11y works in main thread (presumably it's same thing as gecko thread, right?)
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #39)
> a11y works in main thread (presumably it's same thing as gecko thread,
> right?)

Yes on Windows. I was tinking this must be the case. "Sync calls" from a client should (do?) come in via a COM thread or window message that gets picked up by a com component running in our process.

Bent, with this being the case, there's little risk here. The WM_GETOBJECT call will get processed and return the pointer. After that, a11y shouldn't make any new calls until the gecko thread unwinds out of the current waitfornotify call.
(In reply to Jim Mathies [:jimm] from comment #37)
> I don't think there's an issue here provided calls that come into a11y are
> using the main thread.

I'm not worried about which thread we're running code on. The thing I'm worried about is what windows will do with the pointer we hand out. Are we guaranteed that no actual IAccessible methods will be called reentrantly (maybe inside the LresultFromObject function)? That's what I thought asurkov meant in comment 28. As long as "right after" means in another message then I guess that sounds fine.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #41)
> (In reply to Jim Mathies [:jimm] from comment #37)
> > I don't think there's an issue here provided calls that come into a11y are
> > using the main thread.
> 
> I'm not worried about which thread we're running code on. The thing I'm
> worried about is what windows will do with the pointer we hand out. Are we
> guaranteed that no actual IAccessible methods will be called reentrantly
> (maybe inside the LresultFromObject function)? That's what I thought asurkov
> meant in comment 28. As long as "right after" means in another message then
> I guess that sounds fine.

That call looks pretty harmless, looks like all it does is a query interface on the IUnknown pointer.

If on returning from the WM_GETOBJECT call a11y calls were made that triggered ipc traffic then we would get an abort and we could address it. That's no worse than where we are now with a broken browser people can't access using a11y tools.
Posted patch patch v.2Splinter Review
Attachment #8429629 - Attachment is obsolete: true
Attachment #8431726 - Flags: review?(bent.mozilla)
Comment on attachment 8431726 [details] [diff] [review]
patch v.2

Review of attachment 8431726 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great, thanks!
Attachment #8431726 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/69f12c694630
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Sorry I didn't answer your question, I missed it. Thanks a lot Jim!
Duplicate of this bug: 580644
Depends on: 1143395
Depends on: 1111541
You need to log in before you can comment on or make changes to this bug.