Closed Bug 468727 Opened 16 years ago Closed 15 years ago

Crash [@ nsHTMLTextFieldAccessible::GetStateInternal()]

Categories

(Core :: Disability Access APIs, defect, P2)

1.9.1 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: MarcoZ, Assigned: ginnchen+exoracle)

References

Details

(4 keywords)

Crash Data

Attachments

(1 file, 4 obsolete files)

Attached patch null check. (obsolete) — Splinter Review
Assignee: nobody → marco.zehe
Status: NEW → ASSIGNED
Attachment #352213 - Flags: review?(surkov.alexander)
mDOMNode of nsHTMLTextFieldAccessible should always implement nsIContent interface. It means there is one case when you can't query nsIContent, this case is when accessible is defunct (i.e. mDOMNode is null) but this is checked in nsAccessible::GetStateInternal which returns NS_OK_DEFUNCT_OBJECT in this case and this nsresult is checked inside nsHTMLTextFieldAccessible::GetStateInternal (by NS_ENSURE_A11Y_SUCCESS macros). So I can't see the case when you can't query nsIContent. Any thoughts? Are you able to reproduce this bug?
I ran into this bug yesterday once, I was simply going back to a previous page, and suddenly it crashed onme with this same signature. But no, I cannot reproduce the bug at will.
I have half a theory. I think it relates to this code:
http://mxr.mozilla.org/mozilla-central/source/accessible/src/xul/nsXULFormControlAccessible.cpp#819

There we create a temporary nsHTMLTextFieldAccessible, and guess what, we call GetStateInternal() on that.

Not sure which part is crashing, though.
That's the only place in the code where we create a new accessible on the stack, and don't init it. The accessible is not in the cache.

We did this to save code because the XUL textfield's state is dependent on the state of the HTML text field inside of it. Since we already had code to get the state of an HTML textfield we wanted an easy way to grab that.
Flags: blocking1.9.1?
See my new comments in 4 & 5. I believe it's a major hint. Anyone have any idea what's going wrong? Ginn, this is the kind of bug you are good at.

If we don't figure it out, I suggest we make the code more standard. Just pull all the logic to get the state from an <input> node into a static method that takes a DOM node, like:
static nsHTMLTextFieldAccessible::GetStateForTextInput(nsIDOMNode *aTextInput)
{

}

Then have both impls of GetStateInternal() call that, and remove the temporary accessible. I bet that will fix the bug.
In addition to that, we will need to borrow some code from nsAccessible::GetStateInternal() for nsXULTextFieldAccessible::GetStateInternal()

   else if (content->IsNodeOfType(nsINode::eELEMENT)) {
     nsIFrame *frame = GetFrame();
     if (frame && frame->IsFocusable()) {
       *aState |= nsIAccessibleStates::STATE_FOCUSABLE;
     }
 
     if (gLastFocusedNode == mDOMNode) {
       *aState |= nsIAccessibleStates::STATE_FOCUSED;
     }
   }
I'll take a stab at it tomorrow.
I think the null check makes sense.
It seems nsHyperTextAccessibleWrap::GetStateInternal() doesn't return an error if content is null.
Ginn, it doesn't return an error but it retunrs early with NS_OK_DEFUNCT_OBJECT. And, I think the solution of no longer creating the accessible on the stack is a good idea anyway. It can't hurt.

Here the details of how it returns early:

nsHTMLTextFieldAccessible::GetStateInternal() has this:
   nsresult rv = nsHyperTextAccessibleWrap::GetStateInternal(aState,
                                                             aExtraState);
   NS_ENSURE_A11Y_SUCCESS(rv, rv);

There is no nsHyperTextAccessibleWrap::GetStateInternal(),
so this calls up to nsHyperTextAccessible::GetStateInternal(), which has:
   nsresult rv = nsAccessibleWrap::GetStateInternal(aState, aExtraState);
   NS_ENSURE_A11Y_SUCCESS(rv, rv);

Which actually calls up to nsAccessible::GetStateInternal(aState, aExtraState), which has:
   if (IsDefunct()) {
     if (aExtraState)
       *aExtraState = nsIAccessibleStates::EXT_STATE_DEFUNCT;
 
    return NS_OK_DEFUNCT_OBJECT;
   }

The definition of NS_ENSURE_A11Y_SUCCESS ensures that NS_OK_DEFUNCT_OBJECT is returned early.
http://mxr.mozilla.org/mozilla-central/source/accessible/src/base/nsAccessNode.h#80

So, NS_OK_DEFUNCT object is returned up the chain, an  nsHTMLTextFieldAccessible::GetStateInternal() would return early.
OK, I see.

But how did msaa get the temp accessible object?
Ginn, we create it in nsXULTextFieldAccessible::GetStateInternal()
Maybe I don't understand your question.
My question is
the crash stack is called from msaa nsAccessibleWrap::get_accState,
so the call is from AT client.
how did the AT client get the pointer to temp accessible object?
I didn't see a good stack in awhile. Now that crash-stats is coming up I see that we may need to change the theory:
http://crash-stats.mozilla.com/report/index/468ea81c-3b2c-450f-bb40-80cd12090116

0  	xul.dll  	nsHTMLTextFieldAccessible::GetStateInternal  	accessible/src/html/nsHTMLFormControlAccessible.cpp:457
1 	xul.dll 	nsAccessNode::Release 	accessible/src/base/nsAccessNode.cpp:126
2 	xul.dll 	nsAccessible::GetState 	accessible/src/base/nsAccessible.cpp:2276
3 	xul.dll 	nsAccessibleWrap::get_accState 	accessible/src/msaa/nsAccessibleWrap.cpp:554 

But, I don't think it's a temp accessible issue now that I see that.
MArco, I'd love to know if the crashes started after we implemented cycle collecting.  

That stack is just weird, why would Release() call GetStateInternal? I don't get it.

Maybe some others have a different stack.

Bottom line: the problem needs more research.
Most of the stacks in reports are like this.
0	xul.dll	nsHTMLTextFieldAccessible::GetStateInternal	 accessible/src/html/nsHTMLFormControlAccessible.cpp:457
1	xul.dll	nsAccessible::GetState	 accessible/src/base/nsAccessible.cpp:2276
2	xul.dll	nsAccessibleWrap::get_accState	 accessible/src/msaa/nsAccessibleWrap.cpp:554

But some reports are strange.
e.g.
0	xul.dll	nsHTMLTextFieldAccessible::GetStateInternal	 accessible/src/html/nsHTMLFormControlAccessible.cpp:457
1	ntdll.dll	RtlFreeHeap

0	xul.dll	nsHTMLTextFieldAccessible::GetStateInternal	 accessible/src/html/nsHTMLFormControlAccessible.cpp:457
1	oleaut32.dll	LateInitRpcDll

0	xul.dll	nsHTMLTextFieldAccessible::GetStateInternal	 accessible/src/html/nsHTMLFormControlAccessible.cpp:457
1	nssutil3.dll	nssutil3.dll@0x10048
Wait, it's getting interesting. I wish crash-stats was working! But, the query above works -- perhaps it is cached.

All of these crashes are on Windows.

I just looked at the modules in each crash:
* every crash contains 3rd party apps that inject themselves into the process. This is on Windows only. That means we have no protection against bad pointers they might pass in. Normally the marshalling process will give us its own pointers before copying the data to another process, which means we only get good pointers.
* every crash except 1 contains this dll
APSHook.dll -- From Cognizance software. They make security software with things like single sign-on and web access management
* I also saw these here and there:
- GrooveSystemServices, GrooveShellExtensions, GrooveUtil -- part of Microsoft Groove, seems to be another file sharing/synching system
- DropboxExt -- from getdropbox.com, an online file sharing/synching program
- RocketDock -- a fancy mac like doc for Windows
- Yodm3d -- desktop 3D enhancer
- BtMmHook or BTKeyInd -- part of Blue Tooth, injects itself. Not clear what they do.

I'm amazed at how many applications inject themselves into Firefox's process!

If crash-stats was working we could look for more cases where it wasn't APSHook, or get a rough regression range. It could be that our new cycle collection is affecting this. It might always be related to file inputs or XUL inputs. I'm not even sure if the bad pointer is the accState pointer or the IAccessible.
The fact that it is just with HTML text field is suspicious.

Here is a hypothesis:
APSHook holds onto the IAccessible* of a given HTML textfield. It then uses it again and again, but the text field goes away at some point. They are still using the old pointer.

Either we have a refcounting error or they do. Perhaps they don't ref count it at all, assuming it will always stay the same.
Cognizance Corporation who made this APSHook.dll, was bought out by Bioscrypt in 2005.

I *think* this is related to single sign on solutions -- automatically filling in user name and password fields, via fingerprint identification, for example. I've heard of such apps wanting to use Firefox's MSAA support before.
A single sign on package is one that reads your fingerprint and fills in password fields. That means it needs to scan for textfields and see if they are STATE_PROTECTED, and thus passwords. So, it needs to call get_accState on textfields.
A web search for Firefox APSHook reveals people having all sorts of problems with sluggishness, lock ups, stack overflows, extra memory usage etc. Web searches for Verisoft Firefox reveal that many users are frustrated that Firefox 3 doesn't work with it. I called support and they told me that Firefox isn't supported, but that they would forward my email to the developers, so I've done that. I'm not holding my breath though.

Maybe the HP Pavilion folks who bundle Verisoft and a fingerprint reader will need to find out about this.

I guess one solution is to disable our accessibility support when this DLL is detected.
Summary: Crash [@ nsHTMLTextFieldAccessible::GetStateInternal(unsigned int*, unsigned int*) ] → Crash [@ nsHTMLTextFieldAccessible::GetStateInternal(unsigned int*, unsigned int*) -- probably Verisoft fingerprint reader that comes with HP laptops
Summary: Crash [@ nsHTMLTextFieldAccessible::GetStateInternal(unsigned int*, unsigned int*) -- probably Verisoft fingerprint reader that comes with HP laptops → Crash [@ nsHTMLTextFieldAccessible::GetStateInternal()], probably Verisoft fingerprint reader that comes with HP laptops
Depends on: 444749
Update: a developer from Bioscrypt, Alex Krasnov, has replied to me and will try and help resolve the issue.

He writes "The fact that the problem did not happen with Firefox 2 could suggest that the issue is caused by a combination of behavior of our product and new behaviors in Firefox 3."

We need crash-stats queries to work, in order to see what changed in Firefox that makes APSHook cause us to crash now.
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
We absolutely need to find the right fix here but what about, in the meantime, pushing Marco's null check? Would we just bury the problem deeper?

Also regarding comment 16, I'm confused too. What is up with those strange frame stacks?
The DOM node is shutdown during thensHTMLTextFieldAccessible::GetStateInternal() call.

  [3] nsCOMPtr<nsIDOMNode>::operator=(this = 0xef2fa47c, rhs = (nil)), line 640 in "nsCOMPtr.h"
  [4] nsAccessNode::Shutdown(this = 0xef2fa470), line 225 in "nsAccessNode.cpp"
  [5] nsAccessible::Shutdown(this = 0xef2fa470), line 486 in "nsAccessible.cpp"
  [6] nsAccessibleWrap::Shutdown(this = 0xef2fa470), line 331 in "nsAccessibleWrap.cpp"
  [7] nsAccessNode::GetPresShell(this = 0xef2fa470), line 357 in "nsAccessNode.cpp"
  [8] nsAccessible::IsVisible(this = 0xef2fa470, aIsOffscreen = 0x8045624), line 847 in "nsAccessible.cpp"
  [9] nsAccessible::GetStateInternal(this = 0xef2fa470, aState = 0x8045b6c, aExtraState = 0x8045b68), line 999 in "nsAccessible.cpp"
  [10] nsHyperTextAccessible::GetStateInternal(this = 0xef2fa470, aState = 0x8045b6c, aExtraState = 0x8045b68), line 178 in "nsHyperTextAccessible.cpp"
  [11] nsHTMLTextFieldAccessible::GetStateInternal(this = 0xef2fa470, aState = 0x8045b6c, aExtraState = 0x8045b68), line 450 in "nsHTMLFormControlAccessible.cpp"
  [12] nsAccessible::GetState(this = 0xef2fa470, aState = 0x8045b6c, aExtraState = 0x8045b68), line 2276 in "nsAccessible.cpp"
  [13] refStateSetCB(aAtkObj = 0xef25e970), line 1036 in "nsAccessibleWrap.cpp"

That's why nsHyperTextAccessibleWrap::GetStateInternal() passed, but mDOMNode and content is null after that.

This bug is reliably reproducible with orca testcase doc_tabs.py.
Aaron, Surkov, David,

Would it be a performance issue if we do GetPresShell().get() in IsDefunct() ?
Other concerns?
(In reply to comment #26)
> Aaron, Surkov, David,
> 
> Would it be a performance issue if we do GetPresShell().get() in IsDefunct() ?
> Other concerns?

Aaron doesn't watch bugzilla anymore, you should email him directly to get his attention.

It might be because every method calls IsDefunct(). But on another hand it might be ok if it's  right approach in general.
technically if we shutdown an accessible during external call then we must stop to execute this method and return error.
Right, so I think we should do it in IsDefunct().

It is better than return error in IsVisible().
If we do it in IsVisible(), we need to reset aState, aExtraState for error, it doesn't sound good.

And do it in IsDefunt() is safer.
Since we are going to fix it in 1.9.1, we should take the safe approach.
Attached patch patch (obsolete) — Splinter Review
Attachment #366781 - Flags: review?(surkov.alexander)
Ginn, I still thinking of two things
1) do we need to shutdown accessilbe on GetPresShell (this method and logic was introduced in bug 193802)
2) understand why we didn't invalidate accessible tree when there is no presshell.

But these might be result of following up bug and in the meantime we could take IsDefunct approach.
Comment on attachment 366781 [details] [diff] [review]
patch


>-    virtual PRBool IsDefunct() { return !mDOMNode; }
>+    virtual PRBool IsDefunct() { return !mDOMNode || !GetPresShell().get(); }

! operator isn't defined for already_AddRefed?

I would appreciate a comment here, possibly it makes sense to move the method into .cpp file.

Ginn, could you file a following up bug to investigate what's wrong going on here, why we don't invalidate accessible tree?

r=me
Attachment #366781 - Flags: review?(surkov.alexander) → review+
Attachment #352213 - Flags: review?(surkov.alexander)
Comment on attachment 352213 [details] [diff] [review]
null check.

cancelling reivew, though this should fix the bug :)
Assignee: marco.zehe → ginn.chen
The problem is we didn't shutdown doc accessible when a tab is closed.
I filed Bug 482932 to track it.
Summary: Crash [@ nsHTMLTextFieldAccessible::GetStateInternal()], probably Verisoft fingerprint reader that comes with HP laptops → Crash [@ nsHTMLTextFieldAccessible::GetStateInternal()]
http://hg.mozilla.org/mozilla-central/rev/f570b2539bc0
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
back out due to leaks
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch patch v2 (obsolete) — Splinter Review
fix the leak
Attachment #352213 - Attachment is obsolete: true
Attachment #366781 - Attachment is obsolete: true
Attachment #367199 - Flags: review?(surkov.alexander)
Status: REOPENED → ASSIGNED
Attached patch patch v2 (obsolete) — Splinter Review
duh!
Attachment #367199 - Attachment is obsolete: true
Attachment #367203 - Flags: review?(surkov.alexander)
Attachment #367199 - Flags: review?(surkov.alexander)
Attachment #367203 - Flags: review?(surkov.alexander) → review+
Comment on attachment 367203 [details] [diff] [review]
patch v2

Sorry I noticed this leak after my previous r+ and I didn't say you about it. I thought you work on another patch (invalidate accessibles when tab is destroyed) and I thought you're not going to push this patch.

If you want to keep this patch as temporary solution then it's ok with me and r=me. But I still hope to see correct fix for this problem.
I think we should also apply the change to
nsXULTreeitemAccessible::IsDefunct()

I'll continue work on Bug 482932.
But I'm not sure if the fix is suitable for 1.9.1 branch.
OS: Windows Vista → All
Hardware: x86 → All
(In reply to comment #40)
> I think we should also apply the change to
> nsXULTreeitemAccessible::IsDefunct()

Probably, but xultreeitem accessible is very specific and doesn't use many things from base accessible class.

> But I'm not sure if the fix is suitable for 1.9.1 branch.

Technically it should be suitable. But I'm not sure this approach is right. In general I think GetPresShell shouldn't shutdown the accessible. It's something wrong. I would say, it's probably good to morph your patch and query nsIPressShell from mWeakShell and assert if it can't be queried. That should be enough with your fix.
Attached patch patch v3Splinter Review
Attachment #367203 - Attachment is obsolete: true
Attachment #367564 - Flags: review?(surkov.alexander)
Attachment #367564 - Flags: review?(david.bolter)
Comment on attachment 367564 [details] [diff] [review]
patch v3

>+PRBool nsAccessNode::IsDefunct()
>+{
>+  if (!mDOMNode)
>+    return PR_TRUE;
>+
>+  // Call GetPresShell() since the accessible may be shut down in it.
>+  nsCOMPtr<nsIPresShell> presShell(GetPresShell());
>+  return !presShell;

So in this case we think we have an mDOMNode but we can't get a pres shell. Can we prevent this? If so please open a bug for that work (or fix here).

Either way I think this patch makes us more stable :) so if Alexander is happy, r=me
Attachment #367564 - Flags: review?(david.bolter) → review+
Ginn, could you address second part of comment #42? I'm not sure we really need
your patch because of bug 482932.

(In reply to comment #43)

> So in this case we think we have an mDOMNode but we can't get a pres shell. Can
> we prevent this? If so please open a bug for that work (or fix here).

David, it's bug 482932. Technically I believe GetPresShell shouldn't shutdown the accessible. I think we should assert rather.
Surkov,
I agree, ideally mDOMNode should be gone when PresShell is gone.
But currently even with the patch of bug 482932, it's still possible that GetPresShell() failed for CallQueryReferent. If we don't shutdown it, it will hold the memory until we empty our caches on quit.
I agree to add an assertion, and track it in bugzilla.
But we need to keep the shutdown in GetPresShell() until we have addressed the issue completely.

On the other hand, if the pres shell is gone, the accessible obj is defunct, the semantics is correct.
Even if GetPresShell() doesn't shutdown node, it would be a good protection. (the comment should be changed though)
I think we should keep this patch, or add an assertion later.

Also since bug 482932 has a long history, I think we don't need to address it in 1.9.1 branch.

My plan is
1) Land patch of bug 468727 on trunk and 1.9.1
2) Land patch of bug 482932 on trunk and add assertion in GetPresShell().
3) If we have fixed the assertion, we can remove Shutdown() from GetPresShell().
Comment on attachment 367564 [details] [diff] [review]
patch v3

Ginn, you're plan sounds reasonable. So r=me with nits.

>+
>+PRBool nsAccessNode::IsDefunct()

nit: PRBool on own line

nit: try to place IsDefunct definition between GetCurrentFocus and Init to be similar to IsDefunct declaration.
Attachment #367564 - Flags: review?(surkov.alexander) → review+
Pushed to trunk:
http://hg.mozilla.org/mozilla-central/rev/987405c2542a

I can't put it between GetCurrentFocus and Init, because GetCurrentFocus comes after Init. :)
I put it before GetPresShell(), it would be easier to later changes.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Crash Signature: [@ nsHTMLTextFieldAccessible::GetStateInternal()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: