Closed Bug 988133 Opened 6 years ago Closed 6 years ago

[e10s] view-source crashes in nsDocShellEditorData::ReattachToWindow()

Categories

(Core :: DOM: Navigation, defect)

x86
All
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
e10s + ---
firefox31 --- affected
firefox35 --- verified

People

(Reporter: cpeterson, Assigned: mconley)

References

Details

(Keywords: crash, Whiteboard: [workaround - see bug 1025146])

Attachments

(1 file)

STR:
1. In a new e10s window, view source on some web page.

RESULT:
CRASH!

************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "[JavaScript Error: "cannot ipc non-cpow object" {file: "chrome://global/content/viewSource.js" line: 152}]'[JavaScript Error: "cannot ipc non-cpow object" {file: "chrome://global/content/viewSource.js" line: 152}]' when calling method: [nsISHEntry::setURI]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: chrome://global/content/viewSource.js :: viewSource :: line 152"  data: yes]
************************************************************
OpenGL version detected: 210
OpenGL vendor: Intel Inc.
OpenGL renderer: Intel Iris OpenGL Engine
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "[JavaScript Error: "cannot ipc non-cpow object"]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: yes]
************************************************************
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
JavaScript error: , line 0: NS_NOINTERFACE: Component returned failure code: 0x80004002 (NS_NOINTERFACE) [nsISupports.QueryInterface]
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIURI.host]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: no]
************************************************************

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000080004005
nsDocShellEditorData::ReattachToWindow (this=0x80004005, aDocShell=0x11f9e0520) at nsDocShellEditorData.cpp:226
226	  mDocShell = aDocShell;
(gdb) bt
#0  nsDocShellEditorData::ReattachToWindow (this=0x80004005, aDocShell=0x11f9e0520) at nsDocShellEditorData.cpp:226
#1  0x000000010363c386 in nsDocShell::ReattachEditorToWindow (this=<value temporarily unavailable, due to optimizations>, aSHEntry=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/docshell/base/nsDocShell.cpp:7477
#2  0x000000010363c120 in nsCOMPtr<nsISHEntry>::get () at /Users/cpeterson/Code/mozilla/inbound/docshell/base/nsDocShell.cpp:6615
#3  0x000000010363c120 in nsCOMPtr<nsISHEntry>::operator nsISHEntry* () at /Users/cpeterson/Code/mozilla/inbound/objdir-osx/dist/include/nsCOMPtr.h:864
#4  0x000000010363c120 in nsDocShell::Embed (this=0x11f9e0400, aContentViewer=<value temporarily unavailable, due to optimizations>, aCommand=0x1 <Address 0x1 out of bounds>, aExtraInfo=0x0) at /Users/cpeterson/Code/mozilla/inbound/docshell/base/nsDocShell.cpp:6618
#5  0x00000001036455d6 in nsDocShell::CreateContentViewer (this=0x11f9e0400, aContentType=<value temporarily unavailable, due to optimizations>, request=0x11f974720, aContentHandler=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/docshell/base/nsDocShell.cpp:8388
#6  0x000000010365f163 in nsDSURIContentListener::DoContent (this=0x110da2d80, aContentType=0x1237cef48 "application/xhtml+xml", aIsContentPreferred=<value temporarily unavailable, due to optimizations>, request=0x11f974720, aContentHandler=0x11f17e1e8, aAbortProcess=0x7fff5fbfd08f) at nsDSURIContentListener.cpp:122
#7  0x0000000101d4e5ae in nsDocumentOpenInfo::TryContentListener (this=<value temporarily unavailable, due to optimizations>, aListener=0x110da2d80, aChannel=0x11f974720) at nsURILoader.cpp:710
#8  0x0000000101d4d104 in nsDocumentOpenInfo::DispatchContent (this=0x11f17e1c0, request=0x11f974720, aCtxt=<value temporarily unavailable, due to optimizations>) at nsURILoader.cpp:386
#9  0x0000000101d4cbdb in nsDocumentOpenInfo::OnStartRequest (this=0x11f17e1c0, request=0x11f974720, aCtxt=0x0) at nsURILoader.cpp:262
#10 0x0000000101639384 in nsBaseChannel::OnStartRequest (this=<value temporarily unavailable, due to optimizations>, request=<value temporarily unavailable, due to optimizations>, ctxt=<value temporarily unavailable, due to optimizations>) at nsBaseChannel.cpp:715
#11 0x000000010164da13 in nsInputStreamPump::OnStateStart (this=0x11f32c690) at nsInputStreamPump.cpp:517
#12 0x000000010164d74f in nsInputStreamPump::OnInputStreamReady (this=0x11f32c690, stream=0x11f9e0520) at nsInputStreamPump.cpp:431
#13 0x0000000101591629 in nsCOMPtr<nsIInputStreamCallback>::assign_assuming_AddRef () at nsStreamUtils.cpp:85
#14 0x0000000101591629 in nsCOMPtr<nsIInputStreamCallback>::assign_with_AddRef () at /Users/cpeterson/Code/mozilla/inbound/objdir-osx/dist/include/nsCOMPtr.h:1257
#15 0x0000000101591629 in nsCOMPtr<nsIInputStreamCallback>::operator= () at /Users/cpeterson/Code/mozilla/inbound/objdir-osx/dist/include/nsCOMPtr.h:694
#16 0x0000000101591629 in nsInputStreamReadyEvent::Run (this=<value temporarily unavailable, due to optimizations>) at nsStreamUtils.cpp:86
#17 0x00000001015aa9be in ~nsCOMPtr [inlined] () at nsThread.cpp:694
#18 0x00000001015aa9be in ~nsCOMPtr [inlined] () at /Users/cpeterson/Code/mozilla/inbound/objdir-osx/dist/include/nsCOMPtr.h:700
#19 0x00000001015aa9be in nsThread::ProcessNextEvent (this=0x1003239c0, mayWait=<value temporarily unavailable, due to optimizations>, result=0x7fff5fbfd567) at nsThread.cpp:529
#20 0x000000010151b1cd in NS_ProcessPendingEvents (thread=<value temporarily unavailable, due to optimizations>, timeout=20) at nsThreadUtils.cpp:210
#21 0x00000001025b906a in nsBaseAppShell::NativeEventCallback (this=0x109af3b60) at nsBaseAppShell.cpp:98
#22 0x0000000102570aff in nsAppShell::ProcessGeckoEvents (aInfo=0x109af3b60) at nsAppShell.mm:388
#23 0x00007fff8c560731 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ ()
#24 0x00007fff8c551ea2 in __CFRunLoopDoSources0 ()
#25 0x00007fff8c55162f in __CFRunLoopRun ()
#26 0x00007fff8c5510b5 in CFRunLoopRunSpecific ()
#27 0x00007fff93b3ea0d in RunCurrentEventLoopInMode ()
#28 0x00007fff93b3e685 in ReceiveNextEventCommon ()
#29 0x00007fff93b3e5bc in _BlockUntilNextEventMatchingListInModeWithFilter ()
#30 0x00007fff950d73de in _DPSNextEvent ()
#31 0x00007fff950d6a2b in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#32 0x0000000102570136 in -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (self=0x10031ea10, _cmd=<value temporarily unavailable, due to optimizations>, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x7fff79ec9d00, flag=1 '\001') at nsAppShell.mm:165
#33 0x00007fff950cab2c in -[NSApplication run] ()
#34 0x0000000102571177 in nsAppShell::Run (this=<value temporarily unavailable, due to optimizations>) at nsAppShell.mm:742
#35 0x00000001038a02a2 in nsAppStartup::Run (this=0x109a63e20) at nsAppStartup.cpp:276
#36 0x0000000103834763 in XREMain::XRE_mainRun (this=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/toolkit/xre/nsAppRunner.cpp:4008
#37 0x0000000103834d88 in XREMain::XRE_main (this=0x7fff5fbff0e0, argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>, aAppData=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/toolkit/xre/nsAppRunner.cpp:4077
#38 0x000000010383510f in XRE_main (argc=0, argv=0x7fff5fbfe958, aAppData=0x422d63c37f00000d, aFlags=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/toolkit/xre/nsAppRunner.cpp:4289
#39 0x0000000100002323 in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at /Users/cpeterson/Code/mozilla/inbound/browser/app/nsBrowserApp.cpp:282
tracking-e10s: --- → +
This is pretty horrible. We start out around here:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2075
and we get a CPOW that points to the docshell of the content page (nsIWebPageDescriptor is just another interface that docshell supports).

Then, later on, we get here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/viewsource/content/viewSource.js#152
We're calling loadPage on the docshell of some view source browser element and passing it the CPOW. Amazingly, we get pretty far along that path. Eventually we crash when the network code tries to call SchemeIs on a CPOW for a URI object. It fails because we're using a WrappedJS (the CPOW) off the main thread.

Blake, do you know anything about this code? The design of nsIWebPageDescriptor makes this pretty hard to do in e10s. The only easy solution I can think of is to run the view source browser element in the same remote process as the content itself. Then we could use CPOWs to safely call nsIWebPageDescriptor and the like. That's rather ugly though.
Flags: needinfo?(mrbkap)
Assignee: nobody → mrbkap
Flags: needinfo?(mrbkap)
Duplicate of this bug: 1005689
Yoink. Stealing - mrbkab, feel free to grab back if you really want this yourself.
Assignee: mrbkap → mconley
Duplicate of this bug: 1023538
Hey handyman, were you still interested in working on this? Should I assign it to you, or did you find something else to hack on?
Flags: needinfo?(davidp99)
I've started looking into it but I'm not that far yet.  I'm trying to finish up bug 994554 first.  I'd hold onto it but, if you really want it, I wouldn't deny you the pleasure.
Flags: needinfo?(davidp99)
Alright, I'm going to push on it a bit and see where I get. Thanks. :)
Status: NEW → ASSIGNED
So one easy way of eliminating the crash is to _not_ pass the nsIWebPageDescriptor to view source. That means re-loading the source off the network again, which isn't great - but it means we skip a crash.

Worst case scenario, it'd be a band-aid while we figure out a better solution.
This bug works around the crash by not retrieving the document from the cache, but by
hitting the network again for the document source. That's clearly not ideal, but the
crash is worse. This is a band-aid solution until we can get the cache retrieval
working properly. This workaround should probably not ride the trains to release.
Comment on attachment 8440034 [details] [diff] [review]
Workaround crashes in nsDocShellEditorData::ReattachToWindow() when doing view source in a remote tab. r=?

Hey felipe - we discussed this workaround in the last e10s meeting, and we're willing to put this band-aid on so long as we don't ship it, and come up with a better solution before we uplift. This is just to make the browser not-crashy for Nightly testers.

Thoughts?
Attachment #8440034 - Flags: review?(felipc)
Along with the band-aid, I've started looking into an actual solution, and it might actually be possible to land a real solution instead.

There are still, however, a few things I don't understand. Let me list my findings though.

When opening View Source, we open a window and start loading a view-source URI which attempts to get the document out of the cache instead of hitting the network.

We do this successfully - this is not the problem.

The problem is that when we embed the docshell in its new window with nsDocShell::Embed, we reach this line:

http://hg.mozilla.org/mozilla-central/file/48eee276b1ee/docshell/base/nsDocShell.cpp#l6617

Which attempts to restore some stashed editor state that's held within the shared history entry for the page we're on.

When not using e10s, the editor state is nullptr, so mLSHE->HasDetachedEditor() returns false, and we never do this. So we don't crash.

When we have e10s, the editor state is 0x80004005. I don't know how the editor state gets set there - the shared history entry is a private member of the history entry itself, and it only sets the editor state in one place - and that never occurs. So I have _no idea_ how that value is being set.

What I _do_ know, is that value is totally bogus. 0x80004005 is the hex representation of NS_ERROR_FAILURE. Why is NS_ERROR_FAILURE being stashed in what should be the pointer to some editor state, and how does that stashing happen? No clue.

But that's why we crash.
Alright, so, some developments.

I was wrong about there being some memory being overwritten. Here's what's going on:

For some reason, in the nsDocShell::Embed function, the mLSHE member variable exists, and is a CPOW.

We pass this over to ReattachEditorToWindow, and then execute this line: http://hg.mozilla.org/mozilla-central/file/48eee276b1ee/docshell/base/nsDocShell.cpp#l7468

Remember, aSHEntry is a CPOW. We call ForgetEditorData on that CPOW, and that function is noscript. Either because it's noscript, or because it's ridiculous for a CPOW to return a pointer (which ForgetEditorData is supposed to do), what we get back is NS_ERROR_FAILURE, which we set to mEditorData.

We then call mEditorData->ReattachToWindow, where mEditorData points to NS_ERROR_FAILURE (0x80004005), and kaboom.
Ok, I think I'm starting to figure this out. The reason why we're getting a CPOW for the nsSHEntry is, as billm had noticed, due to the nsIWebPageDescriptor being passed to nsDocShell::LoadPage.

We QI the nsIWebPageDescriptor CPOW to an nsSHEntry, and then clone it, which (you guessed it), returns a CPOW.

http://hg.mozilla.org/mozilla-central/file/48eee276b1ee/docshell/base/nsDocShell.cpp#l5116

Ok, mystery solved. So if getting the history entry is the _only_ reason to pass in the nsIWebPageDescriptor, maybe we can pass in something else to get the nsSHEntry on the parent process's side. I'm looking into that now.
We can simply open view-source:url in a new tab as workaround. It's better as it avoids a popup, and it also fixes the crash.
Duplicate of this bug: 1025460
(In reply to Tim Nguyen [:ntim] from comment #16)
> We can simply open view-source:url in a new tab as workaround. It's better
> as it avoids a popup, and it also fixes the crash.

The view-source:url thing is essentially what my workaround patch does. The only major drawback here is that it hits the network in order to show us the source, and I'd like to avoid that if possible.

While it's true that the crash backtrace mentions a window, this really has not a whole lot to do with the view source _window_, so much as the history entry that represents the document we're trying to view the source of. Opening the source in a tab is not something that would fix this.

Perhaps opening the source viewer in a tab is something UX might propose down the line (we're certainly working hard on reducing the number of windows that we open!), but I don't think that should be the focus of this bug.
So my original idea of attempting to get the history entry on the parent side isn't really panning out. It turns out that the nsIWebNavigation's of remote browsers return a CPOW for the sessionHistory. :/

I'm looking into alternatives now - though we might end up going with billm's original idea in comment 1.
Comment on attachment 8440034 [details] [diff] [review]
Workaround crashes in nsDocShellEditorData::ReattachToWindow() when doing view source in a remote tab. r=?

Review of attachment 8440034 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser.js
@@ +2090,5 @@
> +    // the network in this case, so it's not a permanent solution, hence hiding
> +    // it behind the E10S_TESTING_ONLY ifdef. This is just a band-aid fix until
> +    // we can find something better - see bug 1025146.
> +    //
> +    // TL;DR - please don't ship this.

no need for the TL;DR.. After all, we are shipping this :) This not indefinitely
Attachment #8440034 - Flags: review?(felipc) → review+
s/This/just ..
Thanks felipe!

remote:   https://hg.mozilla.org/integration/fx-team/rev/0e64a1711d80

Assuming this sticks, the follow-up work to enable us to remove this band-aid will occur in bug 1025146.
Whiteboard: [fixed-in-fx-team][workaround - see bug 1025146]
https://hg.mozilla.org/mozilla-central/rev/0e64a1711d80
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team][workaround - see bug 1025146] → [workaround - see bug 1025146]
Target Milestone: --- → mozilla33
Not yet fixed!

718ad241-af33-42c0-9859-b4f4592c94d8
(In reply to Achwaq Khalid from comment #24)
> Not yet fixed!
> 
> 718ad241-af33-42c0-9859-b4f4592c94d8

Hey Achwaq - the patch didn't make the Nightly cut-off, so it wasn't built into last night's Nightly.

Can you try again with tomorrow's Nightly to see if it still crashes?

Or, if you're feeling a bit impatient, you can try a Tinderbox build with the patch:

Windows: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1403007456/firefox-33.0a1.en-US.win32.zip
OS X: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1403007456/firefox-33.0a1.en-US.mac.dmg
Linux: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1403007456/firefox-33.0a1.en-US.linux-i686.tar.bz2
Hi Mike! just tried the tinderbox build under windows but unfortunately still crashes,

here is the signature: bp-2ee4ff82-49fa-4c06-af5f-99e9c2140617

https://crash-stats.mozilla.com/report/index/2ee4ff82-49fa-4c06-af5f-99e9c2140617
Hrm. Ok. Unfortunately, that crash report has no symbols, so I can't see what's going on.

Let me try to reproduce it over here.
No luck reproducing with the Windows build I linked to.

Achwaq - are you able to reproduce that crash in a new profile with that build?
Flags: needinfo?(achwaqkhalid)
My bad Mike :) 

I was referring to "Page Info" not "page source" hence the fix is really working with this build
Flags: needinfo?(achwaqkhalid)
Thanks Achwaq - I've filed bug 1026696 about the Page Info crash.
Great.
QA Whiteboard: [good first verify]
I was able to reproduce this bug on Nightly 31.0a1 (2014-03-25), using Windows 7 x64.

Verified fixed on Windows 7 x64, Mac OSX 10.9.5 and Ubuntu 14.04 x86 using Nightly 35.0a1 (2014-10-09).

This fix can be marked as verified.

[bugday-20141008]
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.