Closed Bug 1263517 Opened 9 years ago Closed 8 years ago

crash in MaiAtkObject::Shutdown

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox48 --- affected
firefox-esr45 --- affected

People

(Reporter: n.nethercote, Unassigned)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-f1285b59-449a-423b-b858-661282160408.
=============================================================

This started happening in Nightly 20160408030212 and has occurred 75 times since, which is a very high number of crashes for Linux.

Stack:

> 0 libxul.so MaiAtkObject::Shutdown accessible/base/AccessibleOrProxy.h
> 1 libxul.so mozilla::a11y::ProxyDestroyed accessible/atk/AccessibleWrap.cpp
> 2 libxul.so mozilla::a11y::DocAccessibleParent::Destroy accessible/ipc/DocAccessibleParent.cpp
> 3 libxul.so mozilla::dom::PBrowserParent::DestroySubtree obj-firefox/ipc/ipdl/PBrowserParent.cpp
> 4 libxul.so mozilla::dom::PContentParent::DestroySubtree obj-firefox/ipc/ipdl/PContentParent.cpp
> 5 libxul.so mozilla::dom::PContentParent::OnChannelError obj-firefox/ipc/ipdl/PContentParent.cpp
> 6 libxul.so mozilla::dom::ContentParent::OnChannelError dom/ipc/ContentParent.cpp
> 7 libxul.so mozilla::ipc::MessageChannel::NotifyMaybeChannelError ipc/glue/MessageChannel.cpp
> 8 libxul.so mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError ipc/glue/MessageChannel.cpp

Given that the crash involves a11y, there's a good chance it's related to bug 1263188.
Surkov, is this a dup of bug 1263188?
Flags: needinfo?(surkov.alexander)
I came from https://crash-stats.mozilla.com/report/index/94466063-05cf-4562-b367-28ec92160413 and it happens to me 100% of the time (5 out of 5) on twitter.com after login.
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Surkov, is this a dup of bug 1263188?

It doesn't look like it. It's rather the issue described in bug 1255009, a case where an accessible was moved and a hide event was fired before a show event. CC'ing Trevor for his insight.

David, why do we keep running multiprocess a11y on Linux?
Flags: needinfo?(surkov.alexander)
(In reply to alexander :surkov from comment #3)
> (In reply to Nicholas Nethercote [:njn] from comment #1)
> David, why do we keep running multiprocess a11y on Linux?

Trevor confirmed recently that accessibility support is mostly ready for e10s Linux except for some patches that haven't landed.

I'm not sure what's up with this stack, seeing channel errors seems pretty bad.

We should probably turn off e10s + a11y on Linux to stop the bleeding.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to David Bolter [:davidb] from comment #4)
> We should probably turn off e10s + a11y on Linux to stop the bleeding.

Which one? I can probably survive with a11y switched off in Nightly for some time, but could we finally get e10s working, please?
Davidb, why do we keep a11y enabled on Linux if Orca isn't running? It must be regressing things for users.
(In reply to alexander :surkov from comment #3)
> > Surkov, is this a dup of bug 1263188?
>
> It doesn't look so. It's rather an issue described in bug 1255009, a case
> when an accessible was moved, and a hide event was fired before a show
> event.

Hmm, so it did happen infrequently before Nightly 20160408030212, but it became much more frequent then. So the changes that caused bug 1263188 must have also made this crash much more common.
(In reply to Nicholas Nethercote [:njn] from comment #7)
> (In reply to alexander :surkov from comment #3)
> > > Surkov, is this a dup of bug 1263188?
> >
> > It doesn't look so. It's rather an issue described in bug 1255009, a case
> > when an accessible was moved, and a hide event was fired before a show
> > event.
>
> Hmm, so it did happen infrequently before Nightly 20160408030212, but it
> became much more frequent then. So the changes that caused bug 1263188 must
> have also made this crash much more common.

It is plausible, since bug 1263188 tweaks event ordering. The easiest way to check my guess would be to apply the patch from bug 1255009 and check whether the crash goes away.
(In reply to Matěj Cepl from comment #5)
> (In reply to David Bolter [:davidb] from comment #4)
> > We should probably turn off e10s + a11y on Linux to stop the bleeding.
>
> Which one? I can probably survive a11y switched off in Nightly for some
> time, but could we finally make e10s working, please?

It is a top top top priority. If you are blocked, you can try flipping accessibility.force_disabled in about:config (but remember to switch it back at some point).
(In reply to David Bolter [:davidb] from comment #9)
> It is a top top top priority. If you are blocked you can try flipping
> about:config accessibility.force_disabled, (but remember to switch it back
> at some point.)

The pain is not so severe so far, but thanks for the hint.
(In reply to alexander :surkov from comment #8)
> (In reply to Nicholas Nethercote [:njn] from comment #7)
> >
> > So the changes that caused bug 1263188 must
> > have also made this crash much more common.
>
> it is plausible since bug 1263188 tweaks event ordering. the easiest way to
> check my guess would be applying the patch from bug 1255009, and checking if
> the crash goes away

Bug 1255009 is RESOLVED FIXED. Which patch are you referring to? That bug has lots of comments, multiple patches that didn't land, and at least one backout, so it's hard for me to tell what its status is.

surkov, can you please clarify what needs to be done here? And whatever that is, are you able to do it? Thank you.
Flags: needinfo?(surkov.alexander)
Alex, if we need to buy time until Trevor is back, I think we can turn e10s off for Linux in nsAppRunner::MultiprocessBlockPolicy -- look for disabledForA11y.
(In reply to David Bolter [:davidb] from comment #4)
> Trevor confirmed recently that accessibility support is mostly ready for
> e10s Linux except for some patches that haven't landed.
>
> I'm not sure what's up with this stack, seeing channel errors seems pretty
> bad.
>
> We should probably turn off e10s + a11y on Linux to stop the bleeding.

Sorry, I haven't been following a11y lately.
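The gating suggested in the comment above can be sketched roughly as follows. This is a minimal illustration only, not the actual nsAppRunner code: apart from the `disabledForA11y` idea named in the comment, every identifier here (`E10sBlockPolicy`, `StartupState`, `DecideBlockPolicy`, `CheckBlockPolicyExamples`) is hypothetical, and the real MultiprocessBlockPolicy logic considers more inputs than these two flags.

```cpp
#include <cassert>

// Hypothetical policy values. Only the notion of a "disabled for a11y"
// state comes from the thread; the real enum or constants may differ.
enum class E10sBlockPolicy {
  Enabled,
  DisabledForA11y,
};

// Hypothetical startup inputs, assumed for illustration.
struct StartupState {
  bool isLinux;     // running on a Linux build
  bool a11yActive;  // accessibility has been instantiated or was used recently
};

// Block e10s only when both conditions hold; otherwise leave it enabled.
constexpr E10sBlockPolicy DecideBlockPolicy(const StartupState& s) {
  return (s.isLinux && s.a11yActive) ? E10sBlockPolicy::DisabledForA11y
                                     : E10sBlockPolicy::Enabled;
}

// Small self-check showing the intended behaviour for each combination.
void CheckBlockPolicyExamples() {
  assert(DecideBlockPolicy(StartupState{true, true}) ==
         E10sBlockPolicy::DisabledForA11y);
  assert(DecideBlockPolicy(StartupState{true, false}) ==
         E10sBlockPolicy::Enabled);
  assert(DecideBlockPolicy(StartupState{false, true}) ==
         E10sBlockPolicy::Enabled);
}
```

The point of the sketch is that users without accessibility active keep e10s, which is what makes this option narrower than a full backout.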
Flags: needinfo?(wmccloskey)
(In reply to Nicholas Nethercote [:njn] from comment #11)
> > it is plausible since bug 1263188 tweaks event ordering. the easiest way to
> > check my guess would be applying the patch from bug 1255009, and checking if
> > the crash goes away
>
> Bug 1255009 is RESOLVED FIXED. Which patch are you referring to?

The 'ipc fix' patch.

> surkov, can you please clarify what needs to be done here? And whatever that
> is, are you able to do it? Thank you.

Trevor is in charge of that part of the code, and it seems he didn't like the approach in that patch, but if it were me, I would try to land it and check whether it helps.

(In reply to David Bolter [:davidb] from comment #12)
> Alex if we need to buy time until Trevor is back I think we can turn e10s
> off for Linux in nsAppRunner::MultiprocessBlockPolicy -- look for
> disabledForA11y.

If the crash numbers are high, then it's better to do this. Maybe there is a patch somewhere we can just re-land?
Flags: needinfo?(surkov.alexander)
To summarize my understanding:

1. This bug was landed: Bug 1261425 - coalesce mutation events by a tree structure
2. Regressions were fixed: Bug 1263188 - crash in PLDHashTable::Search | mozilla::a11y::DocAccessible::UnbindFromDocument
3. Those regression fixes spiked the crashes we see in this bug. (comment 7)
4. Bug 1261425 has an unlanded 'ipc fix' patch hanging around that Trevor doesn't like (?), but that might fix this bug. Trevor is away until mid next week.

Possible actions:

A. Backout 1261425 and follow ups until known issues (hopefully reproducible) are fixed.
B. Disable e10s when a11y is used on Linux. (comment 12)
C. Something else?

Checking crash-stats again... The last build id I see in crash-stats is 20160413030239 -- so this crash might already be fixed?
Flags: needinfo?(surkov.alexander)
(In reply to David Bolter [:davidb] from comment #15)
> Possible actions:
>
> A. Backout 1261425 and follow ups until known issues (hopefully
> reproducible) are fixed.
> B. Disable e10s when a11y is used on Linux. (comment 12)
> C. Something else?

Personally, I'd keep bug 1261425, since it is a good performance improvement. Otherwise we should have some common procedure for cases like this, i.e. we should have somebody in charge of backing things out when they don't meet certain criteria.
Flags: needinfo?(surkov.alexander)
Still no crashes since 20160413030239. Closing WORKSFORME.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(tbsaunde+mozbugs)
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Flags: needinfo?(dbolter)
Resolution: WORKSFORME → ---
Yeah, I think reopening is correct. Trevor, what's next here?
Flags: needinfo?(dbolter) → needinfo?(tbsaunde+mozbugs)
Still hitting this in Nightly, roughly a couple of times per day.
Crash volume for signature 'MaiAtkObject::Shutdown':
- nightly (version 51): 0 crashes from 2016-08-01.
- aurora (version 50): 0 crashes from 2016-08-01.
- beta (version 49): 0 crashes from 2016-08-02.
- release (version 48): 53 crashes from 2016-07-25.
- esr (version 45): 2 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
- nightly   0       0       0
- aurora    0       0       0
- beta      0       0       0
- release   17      10      3
- esr       0       0       2

Affected platform: Linux

Crash rank on the last 7 days:
            Browser  Content  Plugin
- nightly
- aurora
- beta
- release   #928
- esr
David you should probably do something with this.
Flags: needinfo?(tbsaunde+mozbugs) → needinfo?(dbolter)
No crashes since 48. Closing WFM
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Flags: needinfo?(dbolter)
Resolution: --- → WORKSFORME