857348 - crash in mozilla::a11y::XULButtonAccessible::ContainsMenu

Reporter

Description

•

12 years ago

similar to one of the other bug reports regarding windows file dialog This bug was filed from the Socorro interface and is report bp-bb4584e0-0f5f-4f8f-b986-dcce12130329 . ============================================================= 0 xul.dll mozilla::a11y::XULButtonAccessible::ContainsMenu accessible/src/xul/XULFormControlAccessible.cpp:233 1 xul.dll mozilla::a11y::XULButtonAccessible::NativeState accessible/src/xul/XULFormControlAccessible.cpp:118 2 xul.dll Accessible::State accessible/src/generic/Accessible.cpp:1440 3 xul.dll AccessibleWrap::get_accState accessible/src/msaa/AccessibleWrap.cpp:468 4 rpcrt4.dll Invoke 5 rpcrt4.dll NdrStubCall2 6 ole32.dll NdrpCreateStub 7 oleaut32.dll CUnivStubWrapper::Invoke 8 ole32.dll SyncStubInvoke

Trevor Saunders (:tbsaunde)

Updated

•

12 years ago

Component: Disability Access → Disability Access APIs

Product: Thunderbird → Core

Version: 17 → unspecified

Trevor Saunders (:tbsaunde)

Comment 1

•

12 years ago

this looks like our old friend the mysterious null mContent that we can't come up with an explaination for...

alexander :surkov (:asurkov)

Comment 2

•

12 years ago

(In reply to Trevor Saunders (:tbsaunde) from comment #1) > this looks like our old friend the mysterious null mContent that we can't > come up with an explaination for... if that was cycle collected object again then cc unlink should call Shutdown (i.e. set up eIsDefunct bit). Trev?

Wayne Mery (:wsmwk)

Reporter

Updated

•

12 years ago

Whiteboard: [tbird crash]

Trevor Saunders (:tbsaunde)

Comment 3

•

12 years ago

(In reply to alexander :surkov from comment #2) > (In reply to Trevor Saunders (:tbsaunde) from comment #1) > > this looks like our old friend the mysterious null mContent that we can't > > come up with an explaination for... > > if that was cycle collected object again then cc unlink should call Shutdown > (i.e. set up eIsDefunct bit). Trev? as I may have said on irc I can't see how we'd be here if the accessible had been unlinked by cc since if the cc decided it was garbage then nobody outside could have held a ref, but clearly they did to make the call into us that crashes.

alexander :surkov (:asurkov)

Comment 4

•

12 years ago

it's reasonable but I don't see other way how we can null out mContent and stay not defunct. Wouldn't it safer/correct to call Shutdown on unlink?

BMO Automation

Updated

•

9 years ago

Crash Signature: [@ mozilla::a11y::XULButtonAccessible::ContainsMenu()] → [@ mozilla::a11y::XULButtonAccessible::ContainsMenu()] [@ mozilla::a11y::XULButtonAccessible::ContainsMenu]

alexander :surkov (:asurkov)

Comment 5

•

7 years ago

still there - 58 [1], still no explanation, how we get run out of a content. 1) It'd be nice to make State/NativeState a const methods, so we are guaranteed that we don't Shutdown() after IsDefunct() call. I don't think it's the case though. 2) We should probably also add a guard that guarantees (or asserts) us that mContent and eIsDefunct flag are always in sync. Jamie, anything else in approaching this? [1] https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Aa11y%3A%3AXULButtonAccessible%3A%3AContainsMenu&date=%3E%3D2018-02-06T13%3A20%3A48.000Z&date=%3C2018-02-13T13%3A20%3A48.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

Flags: needinfo?(jteh)

James Teh [:Jamie]

Comment 6

•

7 years ago

(In reply to alexander :surkov from comment #5) > 1) It'd be nice to make State/NativeState a const methods, so we are > guaranteed that we don't Shutdown() after IsDefunct() call. I don't think > it's the case though. Agreed that it's unlikely, but the const makes sense. > 2) We should probably also add a guard that guarantees (or asserts) us that > mContent and eIsDefunct flag are always in sync. Any thoughts on where this would be added? > Jamie, anything else in approaching this? Am I right that your suggestion in comment 4 about calling Shutdown on unlink got implemented in 0fe2b3363c9e (bug 1240893)? I took a look at the raw dump in WinDBG. I wanted to inspect mStateFlags and mContent, but unfortunately, those values were unavailable in the dump. I really don't have a clue, but here are a few points for thought: 1. I wonder whether |this| actually got deleted already. Of course, that raises the question of how - that should be impossible - but it's another possible explanation for mContent being null I guess. Or not, maybe... :) 2. I see this is a re-entrant call. A client is fetching a remote (content) accessible, and while that is happening, UIA comes along and tries to query a local XUL accessible. I don't see how that could break anything in this case, but it makes me wonder, for example, whether there's any mutation that doesn't complete fully until a platform a11y call returns. Again, I very much doubt it.

Flags: needinfo?(jteh)

alexander :surkov (:asurkov)

Comment 7

•

7 years ago

(In reply to James Teh [:Jamie] from comment #6) > (In reply to alexander :surkov from comment #5) > > 1) It'd be nice to make State/NativeState a const methods, so we are > > guaranteed that we don't Shutdown() after IsDefunct() call. I don't think > > it's the case though. > > Agreed that it's unlikely, but the const makes sense. ok, filed bug 1438193 > > 2) We should probably also add a guard that guarantees (or asserts) us that > > mContent and eIsDefunct flag are always in sync. > > Any thoughts on where this would be added? If that's the case then it either mContent was null from beginning, or it was nullified without eIsDefunct flag setting on. Both cases are unlikely, because I believe we don't pass null into constructor, and we nullify mContent in Shutdown() only. So at least it makes sense to assert in IsDefunct() if we've got out of sync. If we hit it, then we can dig it into more. > > Jamie, anything else in approaching this? > > Am I right that your suggestion in comment 4 about calling Shutdown on > unlink got implemented in 0fe2b3363c9e (bug 1240893)? yes > I took a look at the raw dump in WinDBG. I wanted to inspect mStateFlags and > mContent, but unfortunately, those values were unavailable in the dump. > > I really don't have a clue, but here are a few points for thought: > 1. I wonder whether |this| actually got deleted already. Of course, that > raises the question of how - that should be impossible - but it's another > possible explanation for mContent being null I guess. Or not, maybe... :) so we killed an object while somebody keeps it, which would point to either UIA bug, which keeps a raw not addrefed pointer, or we kill non cyclely-zero addrefed object? Btw, there's ole32 crash on Firefox 52, which probably doesn't speaks in favor of UIA bug theory. > 2. I see this is a re-entrant call. A client is fetching a remote (content) > accessible, and while that is happening, UIA comes along and tries to query > a local XUL accessible. I don't see how that could break anything in this > case, but it makes me wonder, for example, whether there's any mutation that > doesn't complete fully until a platform a11y call returns. Again, I very > much doubt it. Not sure, I caught the not yet completed mutation case, could you elaborate it a bit please? Also curious if overlapping of these crashes tells anything to you.

James Teh [:Jamie]

Comment 8

•

7 years ago

(In reply to alexander :surkov from comment #7) > So at least it makes sense to assert in > IsDefunct() if we've got out of sync. If we hit it, then we can dig it into > more. Makes perfect sense. Ship it. :) > > 1. I wonder whether |this| actually got deleted already. Of course, that > > raises the question of how - that should be impossible - but it's another > > possible explanation for mContent being null I guess. Or not, maybe... :) > so we killed an object while somebody keeps it, which would point to either > UIA bug, which keeps a raw not addrefed pointer, It's not possible for it to get a pointer without it being addRefed, so the bug would have to be that Release got called inappropriately. Same thing at the end of the day; just clarifying. > or we kill non cyclely-zero addrefed object? Yeah, though it does seem unlikely. > > 2. I see this is a re-entrant call. A client is fetching a remote (content) > > accessible, and while that is happening, UIA comes along and tries to query > > a local XUL accessible. I don't see how that could break anything in this > > case, but it makes me wonder, for example, whether there's any mutation that > > doesn't complete fully until a platform a11y call returns. > Not sure, I caught the not yet completed mutation case, could you elaborate > it a bit please? It's tricky because I don't have any ideas about what this might be, so it's hard to come up with an example. The best example I can think of is bug 1336971. The issue there was that during a call to AccessibleWrap::GetRemoteIAccessibleFor, the array of remote docs mutated while we were walking it. So, we made a COM call for one of the remote docs, and during that call, COM pumped messages. One of those messages somehow caused the remote docs array to mutate. We still don't really understand what caused it, even though a band-aid patch was landed to work around it. As Aaron noted in bug 1336971 comment 2: > I wouldn't expect that to happen given that the parent main thread is blocked on a remote COM call. ie, it shouldn't be able to dispatch any incoming IPDL while waiting for a reply from COM. > OTOH, since the main thread is in the single-threaded apartment, COM is continuously pumping messages while waiting for that reply. If we handle a message in such a way that we inadvertently trigger a nested event loop that ends up dispatching IPDL, then I guess it could happen. So, this raises two questions: 1. What message (or COM call) could possibly be (perhaps indirectly) dispatching ipdl messages during our outer COM call? 2. How could some ipdl message end up causing this non-defunct null mContent behaviour? If an ipdl message arrived telling us to shut down an accessible, well, the defunct bit should have been set. Is there some call that only partially mutates state and then requires a subsequent queued call to finish up?

James Teh [:Jamie]

Comment 9

•

7 years ago

Another disturbing possibility: could something that executes as part of the call to Accessible::State (after the call to IsDefunct) end up triggering a nested event loop which dispatches ipdl? If that were to happen, the accessible might get shut down *after* IsDefunct is called but before Accessible::State returns. That would, unfortunately, also probably mean that making State const wouldn't solve the problem.

alexander :surkov (:asurkov)

Comment 10

•

7 years ago

(In reply to James Teh [:Jamie] from comment #9) > Another disturbing possibility: could something that executes as part of the > call to Accessible::State (after the call to IsDefunct) end up triggering a > nested event loop which dispatches ipdl? If that were to happen, the > accessible might get shut down *after* IsDefunct is called but before > Accessible::State returns. That would, unfortunately, also probably mean > that making State const wouldn't solve the problem. Yep, a good and scary theory. Being unfamiliar to outer calls and inner loops, do 1 and 2 from comment 8 suggest different scenarios from this one? I guess we could set wrap each Accessible class made by MSAA, to set a flag before a call and unset it after, and crash if we hit Shutdown() in between, which is ugly, but option. Is there a single entry point to a triggering a nested event loop, and if so, then can we ban all 'unauthorized' callers threre? I don't know the subject well, so sorry if it sounds vague.

James Teh [:Jamie]

Comment 11

•

7 years ago

(In reply to alexander :surkov from comment #10) > > Another disturbing possibility: could something that executes as part of the > > call to Accessible::State (after the call to IsDefunct) end up triggering a > > nested event loop which dispatches ipdl? If that were to happen, the > > accessible might get shut down *after* IsDefunct is called but before > > Accessible::State returns. > Yep, a good and scary theory. Being unfamiliar to outer calls and inner > loops, do 1 and 2 from comment 8 suggest different scenarios from this one? I'm not that familiar with IPDL dispatch in Gecko either, unfortunately, so a lot of this is partially educated speculation. That said, I'm starting to think my idea from comment 8 isn't possible. As you've noted, I don't see how mContent could become null without a call to Shutdown, which would also set the defunct flag. > I guess we could set wrap each Accessible class made by MSAA, to set a flag > before a call and unset it after, and crash if we hit Shutdown() in between, > which is ugly, but option. I think that might be the only way we're going to track this down. Note that this would have to be state shared across Accessibles, not just the same Accessible instance. We really don't want *any* Accessible being shut down during an accessibility call, since it could be a related accessible (e.g. a child) that we're about to query. > Is there a single entry point to a triggering a nested event loop, and if > so, then can we ban all 'unauthorized' callers threre? Unfortunately, I don't know either. I only know that dispatching IPDL within a nested event loop is perhaps possible because of Aaron's bug 1336971 comment 2. However, even Aaron didn't seem to know what might actually trigger this.

alexander :surkov (:asurkov)

Comment 12

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1db3a600ea0139e7fa9d637a3c3930155b236d4d Bug 857348 - assert if defunct state is out of sync with content ref, rs=jamie

Cristina Coroiu [:ccoroiu]

Comment 13

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1db3a600ea01

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

alexander :surkov (:asurkov)

Comment 14

•

7 years ago

not yet fixed, just one assertion from comment #8

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

alexander :surkov (:asurkov)

Comment 15

•

7 years ago

here's another reincarnation of the bug I believe: bug 857339. https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Aa11y%3A%3AFocusManager%3A%3AIsFocused&date=%3E%3D2017-12-22T10%3A39%3A21.000Z&date=%3C2018-03-22T12%3A39%3A21.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

alexander :surkov (:asurkov)

Updated

•

7 years ago

Blocks: 857339

alexander :surkov (:asurkov)

Comment 16

•

7 years ago

(In reply to alexander :surkov from comment #15) > here's another reincarnation of the bug I believe: bug 857339. not bug 857339, because that bug has os x crash stacks, these bugs just have same signature. > https://crash-stats.mozilla.com/signature/ > ?signature=mozilla%3A%3Aa11y%3A%3AFocusManager%3A%3AIsFocused&date=%3E%3D2017 > -12-22T10%3A39%3A21.000Z&date=%3C2018-03-22T12%3A39%3A21. > 000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_colum > ns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=- > date&page=1 the latest crash is 58, taking into account that esr is 60 now, then it probably should also go to wontfixes

Priority: -- → P5

alexander :surkov (:asurkov)

Updated

•

7 years ago

Whiteboard: [tbird crash] → [tbird crash] a11y:crash-tree

BugBot [:suhaib / :marco/ :calixte]

Comment 17

•

6 years ago

Closing because no crashes reported for 12 weeks.

Status: REOPENED → RESOLVED

Closed: 7 years ago → 6 years ago

Resolution: --- → WORKSFORME

Bugzilla

crash in mozilla::a11y::XULButtonAccessible::ContainsMenu

Categories

(Core :: Disability Access APIs, defect, P5)

Tracking

()

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [tbird crash] a11y:crash-tree)

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Updated

Comment 16

Updated

Comment 17