Closed Bug 398021 Opened 14 years ago Closed 11 years ago

Crash [@ nsAccessible::GetFinalRole] with moving options and using visibility: hidden

Categories

(Core :: Disability Access APIs, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: martijn.martijn, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, testcase, Whiteboard: [sg:dos] stack exhaustion)

Crash Data

Attachments

(4 files, 9 obsolete files)

1.61 KB, text/html
Details
46.26 KB, patch
Details | Diff | Splinter Review
40.33 KB, patch
Details | Diff | Splinter Review
3.84 KB, patch
surkov
: review+
Details | Diff | Splinter Review
Attached file testcase
See testcase, which crashes current trunk build after 500ms.
You need to download the testcase to your computer to see the crash, because of the use of enhanced privileges.

It also crashes current Firefox branch when I close the browser, so I'm marking it security sensitive because of that.

http://crash-stats.mozilla.com/report/index/ea8e8c51-6ed3-11dc-a666-001a4bd43e5c
0  	nsAccessible::GetFinalRole(unsigned int*)  	 mozilla/accessible/src/base/nsAccessible.cpp:1996
1 	nsAccessible::Role(nsIAccessible*) 	mozilla/accessible/src/base/nsAccessible.h:147
2 	nsHTMLSelectOptionAccessible::GetRole(unsigned int*) 	mozilla/accessible/src/html/nsHTMLSelectAccessible.cpp:487
3 	nsAccessible::GetFinalRole(unsigned int*) 	mozilla/accessible/src/base/nsAccessible.cpp:1996
4 	nsAccessible::Role(nsIAccessible*) 	mozilla/accessible/src/base/nsAccessible.h:147
5 	nsHTMLSelectOptionAccessible::GetRole(unsigned int*) 	mozilla/accessible/src/html/nsHTMLSelectAccessible.cpp:487
6 	nsAccessible::GetFinalRole(unsigned int*) 	mozilla/accessible/src/base/nsAccessible.cpp:1996
7 	nsAccessible::Role(nsIAccessible*) 	mozilla/accessible/src/base/nsAccessible.h:147
8 	nsHTMLSelectOptionAccessible::GetRole(unsigned int*) 	mozilla/accessible/src/html/nsHTMLSelectAccessible.cpp:487
9 	nsAccessible::GetFinalRole(unsigned int*) 	mozilla/accessible/src/base/nsAccessible.cpp:1996
10 	nsAccessible::Role(nsIAccessible*) 	mozilla/accessible/src/base/nsAccessible.h:147
etc..
Blocks: fox3access
Attached patch rough patch (obsolete) — Splinter Review
the idea is we should freeze the accessible tree until we fire events. Now after document is modified we try to traverse the tree and create new accessibles. Patch chop off children of accessible if they weren't initialized. Aaron, does idea looks correct?
Attachment #284155 - Flags: review?(aaronleventhal)
btw, the patch has a ring of bug 398205
> btw, the patch has a ring of bug 398205
Surkov, does it prevent the situation that originally caused us to need the null check for bug 394198?
(In reply to comment #3)
> > btw, the patch has a ring of bug 398205
> Surkov, does it prevent the situation that originally caused us to need the
> null check for bug 394198?
> 

testcase from the bug 394198 doesn't crash with this patch.
Right, but what if you reverse the patch from bug 394198? It was a "wallpaper" null check fix. We already don't crash with the testcase from that patch, because of the temporary fix.

I want to know if this is the real fix for that one.
(In reply to comment #5)
> Right, but what if you reverse the patch from bug 394198? It was a "wallpaper"
> null check fix. We already don't crash with the testcase from that patch,
> because of the temporary fix.
> 
> I want to know if this is the real fix for that one.
> 

Sure, I removed that patch before test.
And I think it's worth to change NS_WARNING there on NS_ASSERTION since shell is not null there anymore.
Surko
(In reply to comment #8)
> Surko
> 

Heh? Eventually letters are finished on your computer? :)
Duplicate of this bug: 398205
Comment on attachment 284155 [details] [diff] [review]
rough patch

1. Why do we only fire this event if cached child count == -1
2. Why do we need SetCachedChildCount()?
3. Why do we still need InvalidateChildren() in FlushPendingEvents() for SHOW/CREATE/HIDE/DESTROY
4. Could you add answers these things as comments in the code.
Attachment #284155 - Flags: review?(aaronleventhal) → review-
Attached patch the same idea another way (obsolete) — Splinter Review
Assignee: aaronleventhal → surkov.alexander
Attachment #284155 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #285097 - Flags: review?(aaronleventhal)
(In reply to comment #12)
> Created an attachment (id=285097) [details]
> the same idea another way
> 

idea is to save tree until it is invalidated, so if node is pending to invalidate then we shoulnd't create accessible for it or if it exists then to initialize children of its accessible.
1. IsPendingToInvalidate() should probably use nsAccUtils::IsAncestorOf()

2. In the past, we would invalidate the children after we called FireAccessibleEvent() for a hide event.
Now we invalidate the children first. I'm not sure what that will do to us. We might want to fire INVALIDATE_CACHE_INTERNAL after 
we fire the hide event from InvalidateCacheSubtree(). But, that would make RefreshNodes() come first, and I'm not sure what that will change.

3. I'm not sure if we want to remove to remove event dupes with INVALIDATE_CACHE_INTERNAL, and what it will do if we do that. Related: it looks like there is a case where we will pass -1 back for GetAccChildCount. If INVALIDATE_CACHE_INTERNAL happens twice in a row.
After the first, the child count is set to mAccChildCount == eChildCountUninitialized. When CacheChildren() is called it will return that for the number of children.

4. We have more implementations of CacheChildren() that would probably need to change.

5. Nit: if we go this route I want to change the name of the event to INVALIDATE_CACHE_INTERNAL -- we have another INTERNAL method already.

6. In the following case, if accessible subtree is changing for child B
 Top
A B C
this new code is saying that A and C cannot be created. Is that what we want? 

7. What if, instead of adding a new event, we just froze the accessible subtree under a hide event. Would that fix this bug as well?
Attachment #285097 - Flags: review?(aaronleventhal)
I'm curiois -- what was actually trying to create the accessible object before the tree was invalidated?
(In reply to comment #15)
> I'm curiois -- what was actually trying to create the accessible object before
> the tree was invalidated?
> 

in tescase beginAccessible() function called after timeout. So we get something like:
1. doe1()
2. beginAccessible()
3. nsDocAccessible::flushPendingEvents()
(In reply to comment #14)
> 1. IsPendingToInvalidate() should probably use nsAccUtils::IsAncestorOf()

I thought about this but IsPendingToInvalidate() starts from this node but IsAncestorOf does from parent node.

> 2. In the past, we would invalidate the children after we called
> FireAccessibleEvent() for a hide event.
> Now we invalidate the children first. I'm not sure what that will do to us. We
> might want to fire INVALIDATE_CACHE_INTERNAL after 
> we fire the hide event from InvalidateCacheSubtree(). But, that would make
> RefreshNodes() come first, and I'm not sure what that will change.

It looks correct if we will invalidate after show/hide event proccessing. That do you mean?

> 4. We have more implementations of CacheChildren() that would probably need to
> change.

Right, I rember about this but eventually I missed this.

> 5. Nit: if we go this route I want to change the name of the event to
> INVALIDATE_CACHE_INTERNAL -- we have another INTERNAL method already.

Ok.

> 6. In the following case, if accessible subtree is changing for child B
>  Top
> A B C
> this new code is saying that A and C cannot be created. Is that what we want? 

I'm not sure about A, sounds we can create accessible for it but we shouldn't intialize its children. But do we need such accessible?

Accessible for C shouldn't be created.

> 7. What if, instead of adding a new event, we just froze the accessible subtree
> under a hide event. Would that fix this bug as well?
> 

Just under show/hide events. I guess it should. I'll try.
I think we should chat on IRC when you're available.
> It looks correct if we will invalidate after show/hide event proccessing. 
> That do you mean?
I mean we currently invalidate after we fire MSAA/ATK's hide event, but before we fire the show event. We should keep it that way.

Is the basic problem with this bug that there is more than one invalidation at about the same time? The processinging of the 2nd invalidation in InvalidateCacheSubtree() can cause the subtree to be rebuilt before the DOM is ready.

>> A B C
>Accessible for C shouldn't be created.
Why?

> Just under show/hide events. I guess it should. I'll try.
I guess we still need the new event because of the problem explained in bug 398205

But, I wonder if we need to freeze the accessible tree like this for show events.
I think we should pull the INVALIDATE_CACHE_INTERNAL event into a separate patch and put in bug 398205. That way we can check it in to see if it's a good change on it's own.
Blocks: 398205
I don't crash, but unfortunately I get an assertion:
###!!! ASSERTION: Two accessibles have the same first child accessible.: '!realP
arent || realParent == this', file c:/moz/mozilla/accessible/src/base/nsAccessib
le.cpp, line 691

Surkov, can you take a look?
Good sign, it fixes the test case in bug 394404, although I get this assertion:
###!!! ASSERTION: Creating accessible for node with no document: 'content->GetDo
cument()', file c:/moz/mozilla/accessible/src/base/nsAccessibilityService.cpp, l
ine 1293
I think there is two problems:
1) you do not freeze the tree for show events

A
 C
B

Let I remove C and append it to B. With your patch C should have two parents A and B because A subtree is freezed and not invalidated. B subtree is not freezed.

2) you do not freeze the tree for random access to the tree

If I get accessible from point or for certain node (for example C after adding) then it should lead to the same stuff.

It's safely I guess to freeze the whole modified tree until we sync it again. At least because it's hard to track what's going on even for simple tests :).
(In reply to comment #23)

> 2) you do not freeze the tree for random access to the tree
> 
> If I get accessible from point or for certain node (for example C after adding)
> then it should lead to the same stuff.

Ah, no, it shouldn't. But in any way I afraid this.
No longer blocks: 398205
Depends on: 398205
Attached patch one more try (obsolete) — Splinter Review
Attachment #285634 - Flags: review?(aaronleventhal)
(In reply to comment #22)
> Good sign, it fixes the test case in bug 394404, although I get this assertion:
> ###!!! ASSERTION: Creating accessible for node with no document:
> 'content->GetDo
> cument()', file c:/moz/mozilla/accessible/src/base/nsAccessibilityService.cpp,
> l
> ine 1293
> 

latest patch fixes bug 394404 too but the assertion is there.
nsDocAccessible::ShutDown() should clear the mNodesToInvalidate array to avoid leaks, where it does the same for mEventsToFire:
http://lxr.mozilla.org/seamonkey/source/accessible/src/base/nsDocAccessible.cpp#572

isInStaleSubtree
* @param  canBeRoot  if true then the given node can be a root of stale subtrees
This is a bit hard to understand. How about: 
Having the new parameter seems like this is less of a radical change, which I like. The root of a change can still be part of normal processing, but the descendants aren't until processed the root is no longer stale.

I think the parameter name is a bit confusing. I suggest something like:
* @param mustBeDescendant
if true then the root of a changing subtree will not be considered stale
Please also add comments to all the places explaining why you pass that parameter as you do.

Nit: I suggest renaming s/isPending/isStale now because of the new method name.

Comment on attachment 285634 [details] [diff] [review]
one more try

r+, but I don't mind reviewing another patch with those nits fixed.

Also if you could figure out the assertion (comment 22) that would be great.
Attachment #285634 - Flags: review?(aaronleventhal) → review+
One more thing, please take special care to look at all the CacheChildren() methods, 
1) that they don't have a problem if they upcall to nsAccessible:CacheChildren() and mAccChildCount becomes -1, and 
2) that they check for staleness

Also for #1 check for any other callers of CacheChildren() and GetChildCount() and make sure that if a node is not shut down but returns -1 children it's still okay.

Attached patch patch (obsolete) — Splinter Review
Attachment #285097 - Attachment is obsolete: true
Attachment #285408 - Attachment is obsolete: true
Attachment #285634 - Attachment is obsolete: true
Attached patch patch -w (obsolete) — Splinter Review
Attachment #285884 - Flags: review?(aaronleventhal)
> // Do not initialize children unless the accessible is pending to
> // invalidate
s/unless/when ?

In nsIAccessible.idl it says
 94   /**
 95    * Number of accessible children
 96    */
 97   readonly attribute long childCount;
It doesn't say what it means if a childCount of -1 is returned
Should ::GetChildCount() return a failure code if the current element is stale and mAccChildCount is uninitialized?

I'm still concerned that the callers of GetChildCount() haven't taken that into account, that they could get -1 (or fail, which is probably what GetChildCount() should do if it only knows -1)
For example:
http://lxr.mozilla.org/seamonkey/source/accessible/src/atk/nsXULTreeAccessibleWrap.cpp#73
   mFirstChild->GetChildCount(&colCount);
  *aAccChildCount += rowCount * colCount;
If colCount is -1, and we have 100 rows, then *aAccChildCount will end up as -100. I see other similar problems elsewhere in the tree code.

Or here:
http://lxr.mozilla.org/seamonkey/source/accessible/src/msaa/nsAccessibleWrap.cpp#1069
It looks like mEnumVARIANTPosition can become 65535 because of this

Or if I'm an AT vendor, and I call:
ULONG childCount;
pAcc->get_accChildCount(&childCount);
for (count = 0; count < childCount; count ++) { ... }
Comment on attachment 285884 [details] [diff] [review]
patch -w

Minus because childCount can still be returned as -1
Attachment #285884 - Flags: review?(aaronleventhal) → review-
If isn't initialized then return 0 and throw an exception/return failure. Sounds good?
I think it's good, but make sure everything that calls GetChildCount() checks the return value now.
Attached patch patch2Splinter Review
Attachment #285883 - Attachment is obsolete: true
Attachment #285884 - Attachment is obsolete: true
Attached patch patch2 -wSplinter Review
Attachment #286395 - Flags: review?(aaronleventhal)
Blocks: 401395
Comment on attachment 286395 [details] [diff] [review]
patch2 -w

What about nsAccessibleWrap::get_accChildCount() -- shouldn't it return E_FAIL if GetChildCount() fails?
(In reply to comment #38)
> (From update of attachment 286395 [details] [diff] [review])
> What about nsAccessibleWrap::get_accChildCount() -- shouldn't it return E_FAIL
> if GetChildCount() fails?
> 

Not sure it's good idea, actually there is no error if children wasn't initialized and accessible is stale. For gecko I proposed this to get more attention (probably it wasn't correct though too). But client developers can tract that E_FAIL unpredictable. I mean in gecko we can dictate our's will but we can't do it outside to require special proccessing for firefox because http://msdn2.microsoft.com/en-us/library/ms696128.aspx doesn't say a word about something similar.
Blocks: 394493
Surkov, is the only time we need this fix when there are 2 mutations that happen at the same time, where the 2nd mutation happens before the 1st one is fired?

I'm still not sure exactly what's happening here and how this bug fixes the problem. I will fire up a debugger and investigate when I can but haven't been able to yet. If you could provide a nice, thorough description of the problem and solution that would be great. 
I found a regression of the patch.

When input a first character into an entry, no text-changed event get fired.
Comment on attachment 286395 [details] [diff] [review]
patch2 -w

We need Evan's comment addressed and a better description of what this patch does. I'm concerned about regressions with it generally.
Attachment #286395 - Flags: review?(aaronleventhal)
Surkov, I think this may be a much simpler way to deal with the problem. It also doesn't regress the text insertion event for the first char in an input field.
Assignee: surkov.alexander → aaronleventhal
Attachment #287000 - Flags: review?(surkov.alexander)
(In reply to comment #43)
> Created an attachment (id=287000) [details]
> Flush event queue before operating more on tree
> 
> Surkov, I think this may be a much simpler way to deal with the problem. It
> also doesn't regress the text insertion event for the first char in an input
> field.
> 

It sounds this get rid your idea of delayed events. Iirc we use delayed events to coalesce them. Right?
Attached patch patch3 (obsolete) — Splinter Review
with fixed Evan's regression.

Evan can you look it works fine?
Attachment #287111 - Flags: review?(aaronleventhal)
Attachment #287000 - Attachment is obsolete: true
Attachment #287145 - Flags: review?(surkov.alexander)
Attachment #287000 - Flags: review?(surkov.alexander)
Attachment #287145 - Attachment is obsolete: true
Attachment #287150 - Flags: review?(surkov.alexander)
Attachment #287145 - Flags: review?(surkov.alexander)
Can you give some hints how it fixes the bug, why there is no reparenting or rechilding?

Probably it's better to put the adoption into GetAccessible() to catch all cases?

It sounds it's more correct to use CacheChildren directly not to introduce new method (I mean you might want to put it into interface).
It fixes a problem where you have 

    X
  /  \
C1   C2

  and it becomes

    X 
    | 
    Y 
   / \
 C1   C2

The new node Y was inserted and it needs to adopt C1 and C2. Otherwise mParent for C1 and C2 are incorrect, and mFirstChild of X is incorrect. Also, C2->mNextSi bling may need to be set to null because Y may have captured some of the children but X may have had other children.

It's only a problem when the tree mutates, otherwise the parents/children of the objects nearby are not in danger of being incorrect. That's why I did not put it in GetAccessible(). It cause a lot of unnecessary processing while walking the tree there, and we'd have to watch out for recursion.

Does the latest patch still fix the regression caused by delayed invalidation?

By "delayed invalidation", I mean that we now invalidate the tree after events are fired. So if we try to query accessible after the corresponding dom node changed but before the delayed invalidate happen, we would get wrong information.

The delayed invalidation caused bug 394493 and probably bug 401395 also, those two bugs are depending on this one.
Evan, I'm not sure. Can you check please?
(In reply to comment #51)
> Evan, I'm not sure. Can you check please?
> 
My check shows that the latest patch doesn't help the two depending bugs.
Evan, what about my patch3, the same?
I still want the first crash fix to go in first. Then I will look at tree freezing. But, I'm very cautious about tree freezing.
Aaron, do you want to put new patch?
Surkov, what would you like to see in the new patch?

I'd like to get this in and work on the dependencies, from this as a base. I'm not sure whether we will need tree freezing. I need to prove it to myself because it's not trivial.
(In reply to comment #56)
> Surkov, what would you like to see in the new patch?

You said it's worth to try adoption after GetParentInChain in InvalidateCHildren instead of inside. And if you like to move CacheChildren into PI interface.
> You said it's worth to try adoption after GetParentInChain in
> InvalidateCHildren instead of inside. 
Now that I'm working on another fix that will use GetAccessibleInParentChain() with aCanCreate == PR_TRUE, I'd like to keep it in that method. I think it's safer to keep it there. Do you see a big reason to move it outside that method?

> And if you like to move CacheChildren
> into PI interface.
That's bug 398021. For now I'd like to leave it as it is.

So, seeking r+ on this patch.
(In reply to comment #58)
> > You said it's worth to try adoption after GetParentInChain in
> > InvalidateCHildren instead of inside. 
> Now that I'm working on another fix that will use GetAccessibleInParentChain()
> with aCanCreate == PR_TRUE, I'd like to keep it in that method. I think it's
> safer to keep it there. Do you see a big reason to move it outside that method?

I have the one reason actually is why. I get we need this in InvalidateCachedSubtree, but GetAccessibleInParentChain() is used more widely. So do you need actually to keep an adoption inside GetAccessibleInParentChain for your future patch?
Actually, for the future patch I don't think it matters.

But, I will need to add a new out parameter to GetAccessibleInParentChain(), which tells me whether accessible was created or a cached one is used.

Because if it just returns an already-cached accessible we don't need to adopt children. Everything is already correct.

That seems unnecessary.

> I have the one reason actually is why. I get we need this in
> InvalidateCachedSubtree, but GetAccessibleInParentChain() is used more widely.
GetAccessibleInParentChain() is used widely but not very often with aCanCreate == PR_TRUE. Since we only do this extra work when aCanCreate == PR_TRUE, I don't understand the concern. Do we really need to move it?
Comment on attachment 287150 [details] [diff] [review]
A little better at adopting children

Ok, I do not see explicit problems with the patch, excepting probably redurant  adoption in GetAccessibleInParentChain. Maybe new argument in the future patch would be nicer :)
Attachment #287150 - Flags: review?(surkov.alexander) → review+
Attachment #287150 - Flags: approval1.9?
Attachment #287150 - Flags: approval1.9? → approval1.9+
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b2pre) Gecko/2007111605 Minefield/3.0b2pre
Status: RESOLVED → VERIFIED
I have backed this out because of bug 404343.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Depends on: 404343
aab

abcc
This may have also caused bug 404582
Depends on: 404582
Looks like my technique didn't work. Fortunately these crashes aren't happening in the wild, at least according to the crash-stats.

While I want to look more into both Surkov and my approaches, I'm not sure what's safe enough for Firefox 3. 

The "tree freezing" technique may well be the right approach, but I really need a document on the wiki that throroughly describes tree freezing and what it does. I'm thinking we should check that into the the trunk after Firefox 3 branches, and let it bake for a while. We may be able to take it into a point release if it proves to be the right thing.
No longer blocks: 394493
This is a null dereference, right?  So not a security hole?

In general, please decide whether to mark a bug as security-sensitive based on whether the crash looks exploitable, not based on whether a crash also occurs on branch.  http://www.squarefree.com/2006/11/02/determining-whether-a-crash-looks-exploitable/ has useful hints.
I'm just chipping in on a tangent as while I don't follow all the details of this a couple of comments make me wonder if you sometimes re-create accessibles? I have a couple of cases where that seems to be happening and is causing problems. 

Basically I keep a reference to an accessible so when an event occurs I can compare the source to see if it is that particular node. If it gets recreated than the comparison fails.

In the absence of good practice guidelines for AT-SPI I guess that might be a valid thing to do but it makes life harder. IMHO it is reasonable for an accessible to have a lifetime as long as the object it represents.
Steve, can you call getRole() to check if the AtkObject you ref-ed is still alive in Gecko?
hi Ginn, yes I'm doing that anyway to do a property compare rather than identity check. Actually it's the top level ROLE_FRAME and Peter Parent says that is managed by the window manager not gecko. I had another case but worked round that so I'll try to find it again.
ROLE_FRAME of Firefox window should be managed by Gecko not libgail.
(In reply to comment #71)
> ROLE_FRAME of Firefox window should be managed by Gecko not libgail.

I'm trying to reproduce it again now.

Comment on attachment 287150 [details] [diff] [review]
A little better at adopting children

clearing the approval flag since this got backed out
Attachment #287150 - Flags: approval1.9+
This is a stack overflow, doesn't need to remain hidden.
Group: core-security
Whiteboard: [sg:nse] stack overflow
Mass un-assigning bugs assigned to Aaron.
Assignee: aaronleventhal → nobody
Whiteboard: [sg:nse] stack overflow → [sg:dos] stack exhaustion
worksforme on 22/10/2010 (local build)
Status: REOPENED → RESOLVED
Closed: 14 years ago11 years ago
Resolution: --- → WORKSFORME
Can anybody with rights cc me on bug 306663 that blocks this one?
Comment on attachment 287111 [details] [diff] [review]
patch3

doesn't make any sense for Firefox 4.
Attachment #287111 - Attachment is obsolete: true
Attachment #287111 - Flags: review?(aaronlev)
Crash Signature: [@ nsAccessible::GetFinalRole]
You need to log in before you can comment on or make changes to this bug.