Closed Bug 1263517 (Opened 9 years ago, Closed 8 years ago)
crash in MaiAtkObject::Shutdown
Categories (Core :: Disability Access APIs, defect)
Status: RESOLVED WORKSFORME
People (Reporter: n.nethercote, Unassigned)
Details (Keywords: crash)
This bug was filed from the Socorro interface and is report bp-f1285b59-449a-423b-b858-661282160408.
=============================================================
This started happening in Nightly 20160408030212 and has occurred 75 times since, which is a very high number of crashes for Linux.
Stack:
> 0 	libxul.so 	MaiAtkObject::Shutdown 	accessible/base/AccessibleOrProxy.h
> 1 	libxul.so 	mozilla::a11y::ProxyDestroyed 	accessible/atk/AccessibleWrap.cpp
> 2 	libxul.so 	mozilla::a11y::DocAccessibleParent::Destroy 	accessible/ipc/DocAccessibleParent.cpp
> 3 	libxul.so 	mozilla::dom::PBrowserParent::DestroySubtree 	obj-firefox/ipc/ipdl/PBrowserParent.cpp
> 4 	libxul.so 	mozilla::dom::PContentParent::DestroySubtree 	obj-firefox/ipc/ipdl/PContentParent.cpp
> 5 	libxul.so 	mozilla::dom::PContentParent::OnChannelError 	obj-firefox/ipc/ipdl/PContentParent.cpp
> 6 	libxul.so 	mozilla::dom::ContentParent::OnChannelError 	dom/ipc/ContentParent.cpp
> 7 	libxul.so 	mozilla::ipc::MessageChannel::NotifyMaybeChannelError 	ipc/glue/MessageChannel.cpp
> 8 	libxul.so 	mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError 	ipc/glue/MessageChannel.cpp
Given that the crash involves a11y, there's a good chance it's related to bug 1263188.
Comment 1•9 years ago (Reporter)
Surkov, is this a dup of bug 1263188?
Flags: needinfo?(surkov.alexander)
Comment 2•9 years ago
I came from https://crash-stats.mozilla.com/report/index/94466063-05cf-4562-b367-28ec92160413 and it happens to me 100% of the time (5 out of 5) on twitter.com after login.
Comment 3•9 years ago
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Surkov, is this a dup of bug 1263188?
It doesn't look so. It's rather an issue described in bug 1255009, a case when an accessible was moved, and a hide event was fired before a show event. CC'ing Trevor for his insight.
David, why do we keep running multiprocess a11y on Linux?
Flags: needinfo?(surkov.alexander)
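The ordering concern in comment 3 can be pictured with a toy model. This is only a sketch under stated assumptions, not Gecko code: `ToyProxyTable`, `ApplyResult`, `OnShow`, `OnHide`, and `DemoOrdering` are hypothetical names, and the model assumes a parent-side table that adds an entry on a show event and removes it on a hide event. Under that assumption, a hide that arrives before its show finds nothing to remove, and the later show leaves an entry behind that nothing will clean up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Toy model only: these names are illustrative and do not come from Gecko.
enum class ApplyResult { Ok, DuplicateShow, HideForUnknownId };

class ToyProxyTable {
 public:
  // A show event registers a parent-side entry for the given accessible id.
  ApplyResult OnShow(std::uint64_t id) {
    return live_.insert(id).second ? ApplyResult::Ok
                                   : ApplyResult::DuplicateShow;
  }

  // A hide event drops the entry. An unknown id is reported to the caller
  // instead of being dereferenced.
  ApplyResult OnHide(std::uint64_t id) {
    return live_.erase(id) == 1 ? ApplyResult::Ok
                                : ApplyResult::HideForUnknownId;
  }

  std::size_t Size() const { return live_.size(); }

 private:
  std::unordered_set<std::uint64_t> live_;
};

// Compares the two orderings for a single hypothetical accessible id.
inline void DemoOrdering() {
  ToyProxyTable expected;
  assert(expected.OnShow(7) == ApplyResult::Ok);
  assert(expected.OnHide(7) == ApplyResult::Ok);
  assert(expected.Size() == 0);  // show then hide: table ends empty

  ToyProxyTable reordered;
  assert(reordered.OnHide(7) == ApplyResult::HideForUnknownId);
  assert(reordered.OnShow(7) == ApplyResult::Ok);
  assert(reordered.Size() == 1);  // hide then show: one entry is left over
}
```

Returning a result code rather than asserting inside the table is a design choice for the sketch: it lets the caller decide whether an out-of-order event is fatal or merely logged.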
Comment 4•9 years ago
(In reply to alexander :surkov from comment #3)
> (In reply to Nicholas Nethercote [:njn] from comment #1)
> David, why do we keep running multiprocess a11y on Linux?
Trevor confirmed recently that accessibility support is mostly ready for e10s Linux except for some patches that haven't landed.
I'm not sure what's up with this stack, seeing channel errors seems pretty bad.
We should probably turn off e10s + a11y on Linux to stop the bleeding.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(tbsaunde+mozbugs)
Comment 5•9 years ago
(In reply to David Bolter [:davidb] from comment #4)
> We should probably turn off e10s + a11y on Linux to stop the bleeding.
Which one? I can probably survive a11y switched off in Nightly for some time, but could we finally make e10s working, please?
Comment 6•9 years ago
Davidb, why do we keep a11y enabled on Linux if Orca isn't running? It must be regressing users.
Comment 7•9 years ago (Reporter)
(In reply to alexander :surkov from comment #3)
> > Surkov, is this a dup of bug 1263188?
> 
> It doesn't look so. It's rather an issue described in bug 1255009, a case
> when an accessible was moved, and a hide event was fired before a show
> event.
Hmm, so it did happen infrequently before Nightly 20160408030212, but it became much more frequent then. So the changes that caused bug 1263188 must have also made this crash much more common.
Comment 8•9 years ago
(In reply to Nicholas Nethercote [:njn] from comment #7)
> (In reply to alexander :surkov from comment #3)
> > > Surkov, is this a dup of bug 1263188?
> > 
> > It doesn't look so. It's rather an issue described in bug 1255009, a case
> > when an accessible was moved, and a hide event was fired before a show
> > event.
> 
> Hmm, so it did happen infrequently before Nightly 20160408030212, but it
> became much more frequent then. So the changes that caused bug 1263188 must
> have also made this crash much more common.
it is plausible since bug 1263188 tweaks event ordering. the easiest way to check my guess would be applying the patch from bug 1255009, and checking if the crash goes away
Comment 9•9 years ago
(In reply to Matěj Cepl from comment #5)
> (In reply to David Bolter [:davidb] from comment #4)
> > We should probably turn off e10s + a11y on Linux to stop the bleeding.
> 
> Which one? I can probably survive a11y switched off in Nightly for some
> time, but could we finally make e10s working, please?
It is a top top top priority. If you are blocked you can try flipping about:config accessibility.force_disabled, (but remember to switch it back at some point.)
Comment 10•9 years ago
(In reply to David Bolter [:davidb] from comment #9)
> It is a top top top priority. If you are blocked you can try flipping
> about:config accessibility.force_disabled, (but remember to switch it back
> at some point.)
Pain is not so severe, so far, but thanks for the hint.
Comment 11•9 years ago (Reporter)
(In reply to alexander :surkov from comment #8)
> (In reply to Nicholas Nethercote [:njn] from comment #7)
> >
> > So the changes that caused bug 1263188 must
> > have also made this crash much more common.
> 
> it is plausible since bug 1263188 tweaks event ordering. the easiest way to
> check my guess would be applying the patch from bug 1255009, and checking if
> the crash goes away
Bug 1255009 is RESOLVED FIXED. Which patch are you referring to? That bug has lots of comments, multiple patches that didn't land, and at least one backout, so it's hard for me to tell what its status is.
surkov, can you please clarify what needs to be done here? And whatever that is, are you able to do it? Thank you.
Flags: needinfo?(surkov.alexander)
Comment 12•9 years ago
Alex if we need to buy time until Trevor is back I think we can turn e10s off for Linux in nsAppRunner::MultiprocessBlockPolicy -- look for disabledForA11y.
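The stopgap suggested in comment 12 amounts to a policy check along the following lines. This is a hedged, self-contained sketch and not the real `nsAppRunner` code: only the names `MultiprocessBlockPolicy` and `disabledForA11y` appear in this thread, while `ToyBlockPolicy`, `ToyPlatformState`, `ToyMultiprocessBlockPolicy`, and the condition used to set the flag are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical policy values; the real enumeration may differ.
enum class ToyBlockPolicy { Enabled, DisabledForAccessibility };

// Hypothetical inputs to the decision.
struct ToyPlatformState {
  bool isLinux;
  bool a11yInUse;
};

// Sketch of the idea: block multiprocess mode when accessibility is active
// on Linux. The condition is an assumption, not the shipped logic.
inline ToyBlockPolicy ToyMultiprocessBlockPolicy(const ToyPlatformState& s) {
  const bool disabledForA11y = s.isLinux && s.a11yInUse;
  return disabledForA11y ? ToyBlockPolicy::DisabledForAccessibility
                         : ToyBlockPolicy::Enabled;
}

// Exercises the three cases that matter for this bug.
inline void DemoPolicy() {
  assert(ToyMultiprocessBlockPolicy({true, true}) ==
         ToyBlockPolicy::DisabledForAccessibility);
  assert(ToyMultiprocessBlockPolicy({true, false}) == ToyBlockPolicy::Enabled);
  assert(ToyMultiprocessBlockPolicy({false, true}) == ToyBlockPolicy::Enabled);
}
```

Keeping the decision in one pure function makes it easy to flip back once the underlying crash is fixed.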
Comment 13•9 years ago
(In reply to David Bolter [:davidb] from comment #4)
> Trevor confirmed recently that accessibility support is mostly ready for
> e10s Linux except for some patches that haven't landed.
> 
> I'm not sure what's up with this stack, seeing channel errors seems pretty
> bad.
> 
> We should probably turn off e10s + a11y on Linux to stop the bleeding.
Sorry, I haven't been following a11y lately.
Flags: needinfo?(wmccloskey)
Comment 14•9 years ago
(In reply to Nicholas Nethercote [:njn] from comment #11)
> > it is plausible since bug 1263188 tweaks event ordering. the easiest way to
> > check my guess would be applying the patch from bug 1255009, and checking if
> > the crash goes away
> 
> Bug 1255009 is RESOLVED FIXED. Which patch are you referring to?
'ipc fix' patch
> surkov, can you please clarify what needs to be done here?
> And whatever that
> is, are you able to do it? Thank you.
Trevor is in charge of that part of the code, and it seems he didn't like the approach in that patch, but if it were me, I would try to land it and check whether it helps.
(In reply to David Bolter [:davidb] from comment #12)
> Alex if we need to buy time until Trevor is back I think we can turn e10s
> off for Linux in nsAppRunner::MultiprocessBlockPolicy -- look for
> disabledForA11y.
If the crash numbers are high, then it's better to do this. Maybe there is a patch somewhere we can just re-land?
Flags: needinfo?(surkov.alexander)
Comment 15•9 years ago
To summarize my understanding:
1. This bug landed: Bug 1261425 - coalesce mutation events by a tree structure
2. Regressions were fixed: Bug 1263188 - crash in PLDHashTable::Search | mozilla::a11y::DocAccessible::UnbindFromDocument
3. Those regression fixes spiked the crashes we see in this bug. (comment 7)
4. Bug 1255009 has an unlanded 'ipc fix' patch hanging around that Trevor doesn't like (?), but that might fix this bug. (comment 8, comment 14)
Trevor is away until mid next week.
Possible actions:
A. Backout 1261425 and follow ups until known issues (hopefully reproducible) are fixed.
B. Disable e10s when a11y is used on Linux. (comment 12)
C. Something else?
Checking crash-stats again...
The last build id I see in crash-stats is 20160413030239 -- so this crash might already be fixed?
Flags: needinfo?(surkov.alexander)
Comment 16•9 years ago
(In reply to David Bolter [:davidb] from comment #15)
> Possible actions:
> 
> A. Backout 1261425 and follow ups until known issues (hopefully
> reproducible) are fixed.
> B. Disable e10s when a11y is used on Linux. (comment 12)
> C. Something else?
Personally I'd keep bug 1261425, since it is a good performance improvement. Otherwise we should have some common procedure for cases like this, i.e. we should have somebody in charge of backing things out when they don't meet certain criteria.
Flags: needinfo?(surkov.alexander)
Comment 17•9 years ago
Still no crashes since 20160413030239.
Closing WORKSFORME.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(tbsaunde+mozbugs)
Resolution: --- → WORKSFORME
Comment 18•9 years ago (Reporter)
Three more crashes, all from one installation, in Nightly 20160429030215:
https://crash-stats.mozilla.com/report/index/0fd26cc3-4d3e-498f-8dee-2fc1f2160429
https://crash-stats.mozilla.com/report/index/94e5b13e-cb25-464b-b9fa-f9c832160429
https://crash-stats.mozilla.com/report/index/94625aad-ed00-4273-b402-65ead2160429
dbolter, worth reopening?
Status: RESOLVED → REOPENED
Flags: needinfo?(dbolter)
Resolution: WORKSFORME → ---
Comment 19•9 years ago
Yeah I think reopening is correct, Trevor what's next here?
Flags: needinfo?(dbolter) → needinfo?(tbsaunde+mozbugs)
Comment 20•9 years ago (Reporter)
Still hitting this in Nightly, roughly a couple of times per day.
Comment 21•9 years ago
Crash volume for signature 'MaiAtkObject::Shutdown':
 - nightly (version 51): 0 crashes from 2016-08-01.
 - aurora  (version 50): 0 crashes from 2016-08-01.
 - beta    (version 49): 0 crashes from 2016-08-02.
 - release (version 48): 53 crashes from 2016-07-25.
 - esr     (version 45): 2 crashes from 2016-05-02.
Crash volume over the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        0       0       0
 - beta          0       0       0
 - release      17      10       3
 - esr           0       0       2
Affected platform: Linux
Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly
 - aurora
 - beta
 - release #928
 - esr
status-firefox-esr45: --- → affected
Comment 22•8 years ago
David you should probably do something with this.
Flags: needinfo?(tbsaunde+mozbugs) → needinfo?(dbolter)
Comment 23•8 years ago
No crashes since 48. Closing WFM
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Flags: needinfo?(dbolter)
Resolution: --- → WORKSFORME