Closed Bug 1181177 Opened 7 years ago Closed 7 years ago

Linux and Windows crashes in mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor

Categories

(Core :: Disability Access APIs, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
e10s m8+ ---
firefox41 --- affected
firefox42 --- fixed

People

(Reporter: jonathan, Assigned: billm)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Attached file allyfox.sh
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150706004011

Steps to reproduce:

a11y enabled e10s nightly/aurora
http://unepic.wikia.com/wiki/Unepic_Wiki
Hit F5 every second until



Actual results:

Crash
bp-0e37f16a-8e5f-40a7-81e4-a93142150707

Usually crashes only after a few refreshes.



Expected results:

No crash

bug 1170153 hasn't fix this crash.

shell script attached that produces crash
adduser allyfox (lowercase L)
setup nightly in path and install basic tools

With a longer pause to refreshing (manually hitting refresh about time plug-in icon appears) I get browser crash in mozilla::a11y::ProxyAccessible::Shutdown() bug 1100602 bp-51f45409-80b0-4f63-be6c-f727f2150707

(note my usual profile now is with a11y off so not major for me.)
Crash Signature: [@ mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChild*, mozilla::a11y::PDocAccessibleChild*, unsigned long const&) ]
Keywords: crash
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
tracking-e10s: --- → ?
Depends on: 1170153
Johnathan, is this a clean profile you're reproducing with? Any addons?
Flags: needinfo?(jonathan)
Attachment #8630510 - Attachment mime type: application/x-shellscript → application/text
Attachment #8630510 - Attachment mime type: application/text → text/plain
I'll try to reproduce this.
Assignee: nobody → wmccloskey
I see now that the script sets up a clean profile
Flags: needinfo?(jonathan)
Cool, this reproduces for me! I'll see what I can find.
The problem seems to be that tabChild is null here, even after the patch in bug 1170153. It's null because aDocument->GetDocShell()->mTreeOwner is null. I also notice that aDocument->mCurrentURI is javascript:"<html><body%20style='background:transparent'></body></html>".

I have no idea how javascript: URIs render content, but maybe they're supposed to start out with no docshell tree owner? Should we just add a different null check here?
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(bugs)
javascript: uris just create JSProtocolHandler+Channel and feed the data normally.
In that way no special handling in docshell.
http://mxr.mozilla.org/mozilla-central/source/dom/jsurl/nsJSProtocolHandler.h
http://mxr.mozilla.org/mozilla-central/source/dom/jsurl/nsJSProtocolHandler.cpp


There are certain special cases though related to navigation timing and whether to stop the previous load. Easy to look at those in nsDocShell by looking for javascript:

Hmm, why would docshell's treeowner be null? Have we closed the tab?
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #6)
> There are certain special cases though related to navigation timing and
> whether to stop the previous load. Easy to look at those in nsDocShell by
> looking for javascript:

Nothing sticks out there.

> Hmm, why would docshell's treeowner be null? Have we closed the tab?

The tab doesn't close. But we are reloading a lot. Also, I see that nsDocShell::Destroy does SetTreeOwner(nullptr). Maybe there's a frame with a javascript: URL and it's being removed from the document before accessibility gets to do something with it?

I think this code is only supposed to execute for root frames. However, a frame that was removed from the document might look like a root frame since its parent docshell will be null.
> > Hmm, why would docshell's treeowner be null? Have we closed the tab?
> 
> The tab doesn't close. But we are reloading a lot. Also, I see that
> nsDocShell::Destroy does SetTreeOwner(nullptr). Maybe there's a frame with a
> javascript: URL and it's being removed from the document before
> accessibility gets to do something with it?
> 
> I think this code is only supposed to execute for root frames. However, a
> frame that was removed from the document might look like a root frame since
> its parent docshell will be null.

that's certainly possible.

(In reply to Olli Pettay [:smaug] from comment #6)
> javascript: uris just create JSProtocolHandler+Channel and feed the data
> normally.
> In that way no special handling in docshell.

So does that mean they can have content that gets rendered?

(In reply to Bill McCloskey (:billm) from comment #5)
> The problem seems to be that tabChild is null here, even after the patch in
> bug 1170153. It's null because aDocument->GetDocShell()->mTreeOwner is null.
> I also notice that aDocument->mCurrentURI is
> javascript:"<html><body%20style='background:transparent'></body></html>".
> 
> I have no idea how javascript: URIs render content, but maybe they're
> supposed to start out with no docshell tree owner? Should we just add a
> different null check here?

If we  assume the problem is in fact the document is being shut down then I tend to think the right fix is to bail out of CreateDocOrRootAccessible earlier and not create a DocAccessible inthe first place.
Flags: needinfo?(tbsaunde+mozbugs)
Status: UNCONFIRMED → NEW
Ever confirmed: true
I debugged this a little further today and it seems like this is what is happening:
- It looks like the javascript: URL is being loaded in an iframe.
- When we reload the page (part of the STR), the main docshell calls nsDocShell::DestroyChildren from nsDocShell::SetupNewViewer. The DestroyChildren call sets mTreeOwner to null for the javascript: frame.

So it sounds like we want to ignore docshells like this. What's the best way to detect this?
Flags: needinfo?(bugs)
Btw I don't know which crash this bug is talking about. SendPDocAccessibleConstructor
or is this about the crash mentioned in the initial comment?

::DestroyChildren is a bit odd name, since it doesn't actually destroy anything.
We call it for example when moving stuff to bfcache. When a page and its iframes then come out from bfcache, treeowner is set again.

I don't recall what triggers creating PDocAccessibleConstructor. Sounds like we should check there
whether we have a treeowner. 
Also, how does non-e10s a11y works here? Does it destroy a11y tree when page moves to bfcache and recreates it when needed?
Flags: needinfo?(bugs) → needinfo?(tbsaunde+mozbugs)
Yes, this is the SendPDocAccessibleConstructor crash. When we get to this line:
http://hg.mozilla.org/mozilla-central/annotate/tip/accessible/base/DocManager.cpp#l462
tabChild is null because we use the tree owner to find the TabChild.

We could just null check tabChild here, but I'm wondering if there's a different check we should be doing at the top of the function. It seems like CreateDocOrRootAccessible is trying to exit if the document is not visible.
> ::DestroyChildren is a bit odd name, since it doesn't actually destroy
> anything.
> We call it for example when moving stuff to bfcache. When a page and its
> iframes then come out from bfcache, treeowner is set again.
> 
> I don't recall what triggers creating PDocAccessibleConstructor. Sounds like
> we should check there
> whether we have a treeowner. 
> Also, how does non-e10s a11y works here? Does it destroy a11y tree when page
> moves to bfcache and recreates it when needed?

I'm not actually sure, but we call PDocAccessibleConstructor right when the DocAccessible is created in the non e10s case.


(In reply to Bill McCloskey (:billm) from comment #11)
> Yes, this is the SendPDocAccessibleConstructor crash. When we get to this
> line:
> http://hg.mozilla.org/mozilla-central/annotate/tip/accessible/base/
> DocManager.cpp#l462
> tabChild is null because we use the tree owner to find the TabChild.
> 
> We could just null check tabChild here, but I'm wondering if there's a
> different check we should be doing at the top of the function. It seems like
> CreateDocOrRootAccessible is trying to exit if the document is not visible.

that's correct, it doesn't make much sense to make a document accessible if it will never contain anything a user can see.  I'd like to have e10s and non e10s behave the same way, but if that turns out to be difficult we can probably behave differently and see if that turns out to be a problem.
Flags: needinfo?(tbsaunde+mozbugs)
Summary: [e10s] crash in mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor → Linux crash in mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChild*, mozilla::a11y::PDocAccessibleChild*, unsigned long const&)
Blocks: 1182472
Duplicate of this bug: 1182472
This causes e10s to behave differently from non e10s, because in 10s we don't
forward the creation of the document to the parent process.  However this only
seems to happen in strange edge cases.
Attachment #8632234 - Flags: review?(dbolter)
It's too bad we don't have browser stacks for these, it would be interesting to see if there are chrome process module correlations with 3rd party software that turn on a11y.
Crash Signature: [@ mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChild*, mozilla::a11y::PDocAccessibleChild*, unsigned long const&) ] → [@ mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChild*, mozilla::a11y::PDocAccessibleChild*, unsigned long const&) ] [@ mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChi…
Summary: Linux crash in mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor(mozilla::a11y::PDocAccessibleChild*, mozilla::a11y::PDocAccessibleChild*, unsigned long const&) → Linux and Windows crashes in mozilla::dom::PBrowserChild::SendPDocAccessibleConstructor
(In reply to Jim Mathies [:jimm] from comment #15)
> It's too bad we don't have browser stacks for these, it would be interesting
> to see if there are chrome process module correlations with 3rd party
> software that turn on a11y.

I don't think this is likely to be tied to any particular piece of a11y software. It's just a bug in our code that hits randomly when a11y is enabled.
Comment on attachment 8632234 [details] [diff] [review]
null check tabChild in DocManager::CreateDocOrRootAccessible

Review of attachment 8632234 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah let's not crash (thanks for IRC chat). Note missing an e in checkin comment 'because in 10s'. Do you think we should comment something in the code change, about uncertainty what will happen for this case, i.e. will the document get sorted out later if it gets a tree owner...
Attachment #8632234 - Flags: review?(dbolter) → review+
https://hg.mozilla.org/mozilla-central/rev/289cd34133f0
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Hey Bill, is this something we can uplift to aurora? It's currently our #1 top crasher there.

https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/41.0a2/date_range_type/build/crash_type/all/os_name/None/result_count/50?days=3
Flags: needinfo?(wmccloskey)
I think so, assuming we keep a11y enabled on aurora.
Flags: needinfo?(wmccloskey)
You need to log in before you can comment on or make changes to this bug.