Closed Bug 1264852 Opened 8 years ago Closed 8 years ago

crash in mozilla::a11y::EventTree::Clear

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: n.nethercote, Assigned: surkov)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-aefee8ce-46ec-4b5b-994c-b47c72160414.
=============================================================

Just 2 crashes so far with this signature, both in Nightly 20160414030247. Both of them only have a single stack frame, oddly enough:

> 0 	xul.dll 	mozilla::a11y::EventTree::Clear() 	accessible/base/EventTree.cpp
surkov, this is in code you touched recently.
Flags: needinfo?(surkov.alexander)
No good ideas. Boris, do you know what it could mean?
Flags: needinfo?(surkov.alexander)
No, I do not.  You could try loading the minidump in a debugger on Windows (or finding someone willing to do that and take a look) and seeing what you can figure out.
There have now been four crashes. They are all stack overflows.

https://crash-stats.mozilla.com/report/index/f238bad9-1eb8-484c-9e24-5428d2160417 has a much better stack, which makes it clear that there's some kind of infinite loop during EventTree() destruction.
Flags: needinfo?(surkov.alexander)
Other EventTree signatures showing up:

mozilla::a11y::EventTree::~EventTree
mozilla::a11y::EventTree::`scalar deleting destructor''

NOTE: all are EXCEPTION_STACK_OVERFLOW (like this bug report)
I don't have good ideas on this, I'm not sure how nsAutoPtr may get into a loop at all. I tried to browsing a day with a11y turning on to catch a crash, but no luck. Can we summon some c++ guru to audit at the code, in case if I miss something evident.
Flags: needinfo?(surkov.alexander)
I took a look.

Here's a relevant part of the stack frame.

> 0       xul.dll         mozilla::a11y::EventTree::Clear()       accessible/bas
e/EventTree.cpp
> 1       xul.dll         mozilla::a11y::EventTree::~EventTree()  accessible/bas
e/EventTree.h
> 2       xul.dll         mozilla::a11y::EventTree::`scalar deleting destructor'
(unsigned int)    
> 3       xul.dll         nsAutoPtr<mozilla::a11y::EventTree>::assign(mozilla::a
11y::EventTree*)  obj-firefox/dist/include/nsAutoPtr.h
> 4       xul.dll         mozilla::a11y::EventTree::Clear()       accessible/bas
e/EventTree.cpp
> 5       xul.dll         mozilla::a11y::EventTree::~EventTree()  accessible/bas
e/EventTree.h

Working from frame 5 up:

- 5. ~EventTree() calls Clear()

- 4. Clear() nulls out mNext, which is an nsAutoPtr<EventTree>.

- 3. We end up in nsAutoPtr::assign...
 
- 2. ... which calls |delete| on the EventTree pointed to by mNext.

- 1. So enter ~EventTree() again, but for a different EventTree instance.
  
So it looks like it's not infinite recursion, but if the EventTree list 
structure is long enough we could end up blowing the stack anyway.

I suggest putting this into ~EventTree():

    fprintf(stderr, "~EventTree(): %p\n", this);

and see if it prints out the same address repeatedly (infinite recursion) or 
different addresses each time (non-infinite recursion). I tried this but I 
didn't get any output -- I don't know what's involved to trigger the execution 
of this code.

If it is non-infinite recursion, I see two paths forward.

- Determine how big the list can get, and if it is getting big, whether this 
  should be happening.

- Find a way to destroy the list using iteration -- changing mNext so it's not
  an nsAutoPtr<> might help.
Flags: needinfo?(surkov.alexander)
Crash Signature: [@ mozilla::a11y::EventTree::Clear] → [@ mozilla::a11y::EventTree::Clear] [@ mozilla::a11y::EventTree::~EventTree] [@ mozilla::a11y::EventTree::`scalar deleting destructor'']
I was able to reproduce it once, it's indeed a huge tree, and all nodes are different objects

(In reply to Nicholas Nethercote [:njn] from comment #7)

> If it is non-infinite recursion, I see two paths forward.
> 
> - Determine how big the list can get, and if it is getting big, whether this 
>   should be happening.

It can be as big as DOM tree in time between two layout flushes. All accessible tree changes caused by DOM/style changes are accumulated in that tree.

> - Find a way to destroy the list using iteration -- 

that should make a trick, since Process() called right before Clear() doesn't crash. Btw, not sure I see a reason why.
 
> changing mNext so it's
> not
>   an nsAutoPtr<> might help.

can you give details on this?
Flags: needinfo?(surkov.alexander)
Attached patch patchSplinter Review
clear as we go
Assignee: nobody → surkov.alexander
Attachment #8745074 - Flags: review?(n.nethercote)
Comment on attachment 8745074 [details] [diff] [review]
patch

Review of attachment 8745074 [details] [diff] [review]:
-----------------------------------------------------------------

Seems plausible. Let's cross our fingers.
Attachment #8745074 - Flags: review?(n.nethercote) → review+
Is it possible to write a test for this?
(In reply to Nicholas Nethercote [:njn] from comment #14)
> Is it possible to write a test for this?

it should be doable, let me check
(In reply to alexander :surkov from comment #15)
> (In reply to Nicholas Nethercote [:njn] from comment #14)
> > Is it possible to write a test for this?
> 
> it should be doable, let me check

failed to reproduce in artificial test, but I do on a web site snapshot I taken. I filed bug 1268562.
https://hg.mozilla.org/mozilla-central/rev/13d3afcb47e4
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8745074 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]:bug 1261425
[User impact if declined]:crashes
[Describe test coverage new/current, TreeHerder]:test coverage presented, however the crash itself doesn't have a test yet
[Risks and why]: moderate, a simple patch for not trivial code
[String/UUID change made/needed]:no
Attachment #8745074 - Flags: approval-mozilla-aurora?
Keywords: regression
Comment on attachment 8745074 [details] [diff] [review]
patch

Crash fix for regression in 48.
Attachment #8745074 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.