Closed Bug 387252 Opened 17 years ago Closed 17 years ago

Crash [@ nsQueryInterface::operator] with accessibility and setting display:none on second option

Categories

(Core :: Disability Access APIs, defect)

x86
All
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: martijn.martijn, Assigned: MatsPalmgren_bugz)

Details

(Keywords: crash, testcase, Whiteboard: [sg:moderate?] using freed a11y node)

Crash Data

Attachments

(5 files, 1 obsolete file)

Attached file testcase
See testcase, which crashes for me with current trunk build after 1 or 2 seconds or so.
The testcase uses privileged code, so you need to download it to your computer.

This seems to have regressed between 2007-05-03 and 2007-05-05:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-05-03+04&maxdate=2007-05-05+09&cvsroot=%2Fcvsroot
Regression from bug 357583, somehow?
Assignee: aaronleventhal → mats.palmgren
OS: Windows XP → All
Also crashes on 1.8 branch with a slightly different testcase...
It doesn't look like a recent regression to me, I think the problem
has always been there, at least since the code that caches <option>
accessibles was added (bug 278872?), although I suspect there
are other cases that might trigger it too...
Group: security
Keywords: regression
Whiteboard: [sg:moderate?] using freed a11y node
Version: Trunk → unspecified
Currently, this crashes on a null-pointer 'domElement':
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/accessible/src/html/nsHTMLSelectAccessible.cpp&rev=1.46.2.3&root=/cvsroot&mark=477#468
(since the node has been Shutdown), but it should be possible to
tweak the testcase so that a garbage collect of this node occurs,
then we would crash GetNextSibling instead, like we do on trunk.
I'm not sure why there is a difference really, the code that caches
the option a11y nodes are pretty much the same on trunk and branch -
it seems the garbage collector on trunk does a better job and that's
why we run into freed memory access more often there.
Martijn, Jesse, if you are doing any systematic tests traversing a11y
nodes you might try walking siblings (without invoking parent.firstChild
first) like I did in the second testcase.  It could trigger more bugs
of the same sort.  If it's possible to force GC to be more aggressive
that would help too I think.
(In reply to comment #3)
> ... try walking siblings (without invoking parent.firstChild first)

... or a random mix of the two would probably be best...
Attached patch branch triggerSplinter Review
If I add this on branch then we get the nastier crash on branch with
the second testcase.
Attached file Trace
Sequence of events:
1. script get the combobox accessible firstChild, which triggers the
   option caching code. The resulting tree is this:
   ...
     nsHTMLComboboxListAccessible 0x179c970 (blue)
       nsHTMLSelectOptionAccessible 0x179caf0 (green)
       nsHTMLSelectOptionAccessible 0x179c970 (blue)
   note that for the 2nd option there is *one* single strong ref
   on this a11y node - the cache entry.
2. we do a DOM/style change that triggers RefreshNodes on the
   2nd option, which triggers Shutdown() and removal from the cache:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/accessible/src/base/nsDocAccessible.cpp&rev=1.161&root=/cvsroot&mark=1340,1342#1302
This triggers InvalidateChildren on the parent (blue) but note that
this does *not* reset the mNextSibling pointer on the first node:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/accessible/src/base/nsAccessible.cpp&rev=1.283&root=/cvsroot&mark=584-586#582
which is now a dangling pointer to something that will be destroyed
as soon as the GC finds it. The script could still have a valid handle
on the first option accessible and invoking GetNextSibling() on it
will crash if the GC got to it first.
Sorry, that should have been:
   ...
     nsHTMLComboboxListAccessible 0x179c970 (blue)
       nsHTMLSelectOptionAccessible 0x179caf0 (green)
       nsHTMLSelectOptionAccessible 0x179cba0 (red)
Flags: wanted1.8.1.x+
Flags: wanted1.8.0.x+
Attached patch Patch rev. 1 (obsolete) — Splinter Review
I don't see any reason why this problem would be limited to the
nsHTMLSelectOptionAccessible siblings so I fixed it for all types of
accessible nodes.
Attachment #271451 - Flags: review?(aaronleventhal)
I forgot I should change the uuid on nsPIAccessible.idl as well...
Attachment #271451 - Flags: review?(surkov.alexander)
Attachment #271451 - Flags: review?(ginn.chen)
Attachment #271451 - Flags: review?(aaronleventhal)
I think technically it looks correct because if we set parent and next sibling for child accessibles and if we unset parent accessible then we should unset sibling accessibles too. But why do you introduce new method for this, doesn't SetNextSibling(nsnull) work correct here?
(In reply to comment #10)
> But why do you introduce new method for this, doesn't
> SetNextSibling(nsnull) work correct here?

SetNextSibling(nsnull) on each child would work, but the problem is that
we would need to use GetNextSibling() for the iteration.  I'm trying to avoid
GetNextSibling() because it sometimes creates new nodes which I think we
should avoid doing at this point.  The new method gives me direct access to
mNextSibling for each child.  Let me know if there is a better solution
without using GetNextSibling().
Comment on attachment 271451 [details] [diff] [review]
Patch rev. 1

ok, r=me then.

nit: I'm not sure you should mention bug 387252 in comment because at least for me it is associated with some existing problems in code.
Attachment #271451 - Flags: review?(surkov.alexander) → review+
I concerns about the recursion, it may cause more time and maybe stack overflow.

Can we static cast it to nsAccessible?
looks good?
Attachment #273399 - Flags: review?(mats.palmgren)
(In reply to comment #12)
> nit: I'm not sure you should mention bug 387252 in comment because at least for
> me it is associated with some existing problems in code.

References in the code to Bugzilla reports does not imply there's an
existing problem in the code, it's just there to hint that there is
a bug report which contains useful discussion/stacks/testcases etc for
the code at hand.  I generally find such references very useful, even
if the cvs log (or Bonsai) contains bug numbers, they are harder to
track down after a few commits.  It's common practice in the rest
of the tree.
(In reply to comment #13)
> I concerns about the recursion, it may cause more time and maybe stack
> overflow.

Good point.
Attachment #271451 - Attachment is obsolete: true
Attachment #271451 - Flags: review?(ginn.chen)
Comment on attachment 273399 [details] [diff] [review]
patch using static cast 

r=mats

I extended the comment a bit, mentioning that we explicitly avoid
using GetNextSibling() there, and checked it in.
Attachment #273399 - Flags: review?(mats.palmgren) → review+
mozilla/accessible/src/base/nsAccessible.cpp 	1.285 

-> FIXED
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007080605 Minefield/3.0a8pre
Both testcases don't crash anymore.

(In reply to comment #3)
> Martijn, Jesse, if you are doing any systematic tests traversing a11y
> nodes you might try walking siblings (without invoking parent.firstChild
> first) like I did in the second testcase.  It could trigger more bugs
> of the same sort.  If it's possible to force GC to be more aggressive
> that would help too I think.

I did some testing with a script like yours in the second testcase (not sure though, if I did it completely correct). I haven't found any other crashers with it.
There is now Components.utils.forceGC() to trigger GC, I can use that in my testing.
Status: RESOLVED → VERIFIED
Crash Signature: [@ nsQueryInterface::operator]
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.