Open Bug 1132505 Opened 9 years ago Updated 2 years ago

[e10s] Inspector is blank and would not close

Categories

(DevTools :: Inspector, defect, P2)

x86_64
Windows 7
defect

Tracking

(Not tracked)

People

(Reporter: alice0775, Unassigned)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

Build Identifier:
https://hg.mozilla.org/mozilla-central/rev/3094601af679
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150212030231

Steps to reproduce:
1. Open e10s window
2. Open New Tab (Ctrl+T)
3. Open Inspector (Ctrl+Shift+C)
4. Open https://developer.mozilla.org/en-US/Add-ons/Code_snippets

Actual Results:
Inspector is blank and would not close
I can't seem to duplicate this using the same build here.  I am on a Mac however.

Do you see any errors in the Browser Console?
Flags: needinfo?(alice0775)
(In reply to J. Ryan Stinnett [:jryans] from comment #1)
> I can't seem to duplicate this using the same build here.  I am on a Mac
> however.
> 
> Do you see any errors in the Browser Console?

I should mention that
Select "Blank" in gear icon of the new tab

Steps to reproduce:
1. Open e10s window
2. Open New Tab (Ctrl+T) and Select "Blank" in gear icon
3. Open Inspector (Ctrl+Shift+C)
4. Open https://developer.mozilla.org/en-US/Add-ons/Code_snippets
Flags: needinfo?(alice0775) → needinfo?(jryans)
Error: Connection closed
Stack trace:
Front<.destroy@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1115:23
exports.WalkerFront<.destroy@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/inspector.js:2713:5
frontProto/</proto[name]/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1333:11
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:867:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37
 protocol.js:20


Error: Connection closed
Stack trace:
Front<.destroy@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1115:23
exports.WalkerFront<.destroy@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/inspector.js:2713:5
frontProto/</proto[name]/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1333:11
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:867:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37
 Promise-backend.js:870


TypeError: this.docShell is null
Stack trace:
TabActor.prototype.chromeEventHandler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webbrowser.js:611:5
exports.HighlighterActor<._getPickerListenerTarget@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/highlighter.js:292:5
exports.HighlighterActor<._stopPickerListeners@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/highlighter.js:305:18
exports.HighlighterActor<.cancelPick<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/highlighter.js:324:7
actorProto/</handler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1006:19
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1450:15
LocalDebuggerTransport.prototype.send/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:545:11
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
 protocol.js:900


TypeError: a(...).offset(...) is undefined wiki-min.js:1:10624


NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get] FileUtils.jsm:63:0
Thanks, I can reproduce now.  It seems possibly related to navigating a tab from XUL chrome page to a regular content page.

A regression window would be great to have.  I believe bug 1128027 could fix this as well, but it may take some time to get the fix in place.
Depends on: 1128027
Flags: needinfo?(jryans)
I can repro on OS X. Some maybe important details:

* when you open the inspector first, then the mdn page, I noticed the inspector is working initially and 
showing the about:newtab xul document

* if I open the mdn page first and then the inspector, it works normally

* doing cmd+alt+c again gets me a second, functioning toolbox, see screenshot

I'd hazard a guess this is problem switching an open toolbox from chrome content to web content?
Pushlog:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=0aa625346f89&tochange=c994439deec4

Triggered by: Bug 1105825


Before landing Bug 1105825, The inspector closes when Step4 of Comment 2. But I can reopen inspector and the inspector works.

After landing Bug 1105825, Inspector is blank and would not close as this bug.
Blocks: 1105825
The errors found in the logs when the toolbox closes (because it tries to close when switching from the non-e10s blank page to the e10s content page, see bug 1068400) don't correspond to the root cause of the problem. Fixing those still leave the toolbox blank and un-closable.

Alice0775 White is right, something in the shutdown process of the animation panel causes this, but no errors are logged.

I'm investigated now.
Note that if the toolbox is open with cmd+alt+I instead of cmd+alt+C (just toggling the tools, not starting the picker), then the toolbox closes fine. Note that this is true even if the inspector is the default tool.

Also, the settleAll in toolbox's destroy (http://mxr.mozilla.org/mozilla-central/source/browser/devtools/framework/toolbox.js#1663) doesn't make it to the end of the list because of a failure in the inspector panel, and as a result, neither the catch nor the then handlers are called, which is why the toolbox stays visible. Shouldn't that part always succeed, even if a panel fails?
So far, I tracked it down to the destroyInspector function in toolbox.js which is part of the settleAll promise resolution in toolbox destroy.
The promise returned by this function never resolves or rejects, therefore blocking the rest of the toolbox destroying process to take place.
And at at the end of track there is this: http://mxr.mozilla.org/mozilla-central/source/browser/devtools/framework/toolbox-highlighter-utils.js#151
Basically, when trying to close the toolbox, we destroy the highlighter and cancel the element picking mode.
To do this, we execute |toolbox.highlighter.cancelPick()| which is an protocol actor method, and so that returns a promise.
For some reasons I can't understand yet, try/catching around the yield in the task or adding a rejection handler doesn't do anything, it keeps on failing with no error message and breaks the rest of the process.
(In reply to Patrick Brosset [:pbrosset] [:patrick] from comment #11)
> And at at the end of track there is this:
> http://mxr.mozilla.org/mozilla-central/source/browser/devtools/framework/
> toolbox-highlighter-utils.js#151
> Basically, when trying to close the toolbox, we destroy the highlighter and
> cancel the element picking mode.
> To do this, we execute |toolbox.highlighter.cancelPick()| which is an
> protocol actor method, and so that returns a promise.
> For some reasons I can't understand yet, try/catching around the yield in
> the task or adding a rejection handler doesn't do anything, it keeps on
> failing with no error message and breaks the rest of the process.

This is same thing I saw when investigating bug 1117067, where the WebIDE toolbox failed to close if the connection is aborted.  We end up blocking on inspector destruction attempting to send more RDP messages, even though the connection is already dead.

Now that I've seen this bug, I am guessing the WebIDE issue in bug 1117067 was likely also caused by changes from the animation work as well, but that root cause was not clear to me until this bug appeared.

I'll try to look closely at the animation work to see if we've changed the timing of toolbox shutdown or if we're now waiting on protocol activity where we weren't before.

Bug 1128027 is my attempt to reject all protocol.js requests in progress once we know the connection is dead.  I have just verified the patch in bug 1128027 does fix the problem seen here.  I think it's a good safety measure to have in place generally.  However, it's caused some strange test failures that will take me a while to work out, so it might not land very soon.

So, I think it's also valuable to find what's changed in inspector shutdown to see if we can get it not block if a connection is dead, which should be a more targeted fix that is easier to land.
Here's the timeline of events that causes toolbox destruction to hang here:

1. Toolbox starts |destroy|
2. |destroyInspector| calls |stopPicker| which sends |cancelPick| (as mentioned in comment 11)
  * Toolbox destroy yields here an waits for a reply from the server
3. DebuggerClient closes the transport
  * It does not track pending protocol.js activity at all
4. Highlighter actor's |cancelPick| handler throws
5. protocol.js sends a error packet back to the client
  * Normally, this error packet would reject the deferred that step 2 is yielding on
  * However, since transport is already closed, the client never receives this packet

So, the global fix for this type of destruction problem seems to be that protocol.js needs to clean up pending requests, and that's what I'm working towards in bug 1128027.

However, we can still make a small adjustment here:  It seems unnecessary to yield on |cancelPick|.  We can consider this a "best effort" request and proceed with toolbox shutdown whether or not it ever completes.
Assignee: nobody → jryans
Status: NEW → ASSIGNED
Attached file MozReview Request: bz://1132505/jryans (obsolete) —
/r/3865 - Bug 1132505 - Record protocol.js request headers for debugging. r=pbrosset
/r/3867 - Bug 1132505 - Don't yield on stopPicker during toolbox destroy. r=pbrosset

Pull down these commits:

hg pull review -r 5a7b5a44606bbe3eefc5c8ab7d15001063b330fb
Attachment #8564350 - Flags: review?(pbrosset)
Comment on attachment 8564350 [details]
MozReview Request: bz://1132505/jryans

Hmm, quite a few tests to fix first.
Attachment #8564350 - Flags: review?(pbrosset)
Thanks jryans, your analysis in comment 12 and comment 13 seem sound, it does indeed seem that we need bug 1128027 to clean this up. In the meantime, I'm fine for removing the yield when calling cancelPick.
After bug 1128027 and/or bug 1059308, the behavior has changed a bit.  The inspector no longer goes blank and it is possible to close.  However, it remains connected to about:newtab, instead of the new content page we changed to.

In the browser console I get:

TypeError: this._walker is null highlighter.js:278:4
TypeError: docShell is undefined webbrowser.js:52:6
Error: Got an invalid root window in DocumentWalker
Stack trace:
DocumentWalker@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/inspector.js:3323:1
WalkerActor<.getDocumentWalker@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/inspector.js:1194:12
WalkerActor<.parentNode@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/inspector.js:1411:18
PageStyleActor<.getAppliedProps@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/styles.js:650:20
PageStyleActor<.getApplied<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/styles.js:493:12
actorProto/</handler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1006:19
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1458:15
LocalDebuggerTransport.prototype.send/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:561:11
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14

Patrick, would be able to investigate what remains to be done here?

There is also bug 1068400, which seems like a related issue when moving from chrome to content pages in e10s.
Flags: needinfo?(pbrosset)
See Also: → 1068400
Assignee: jryans → nobody
Status: ASSIGNED → NEW
Here's what we learn when enabling the protocol message logging:

After the blank new tab page is open, and the inspector is open, enter the url to any website
> DBG-SERVER: Received packet 45: {
>   "from": "server1.conn0.tab1",
>   "type": "frameUpdate",
>   "frames": [
>     {
>       "id": 29,
>       "destroy": true
>     }
>   ]
> }
This signals the destroy of the non-e10s frame (because the new tab page isn't hosted in an e10s window)

> DBG-SERVER: Received packet 46: {
>   "type": "newMutations",
>   "from": "server1.conn0.domwalker30"
> }
> ...
> DBG-SERVER: Received packet 48: {
>   "mutations": [
>     {
>       "type": "attributes",
>       "target": "server1.conn0.domnode33",
>       "numChildren": 6,
>       "attributeName": "hasBrowserHandlers",
>       "newValue": null
>     }
>   ],
>   "from": "server1.conn0.domwalker30"
> }
Right before unload, the new tab page mutates an attribute on one of its elements.

> DBG-SERVER: Received packet 49: {
>   "type": "getApplied",
>   "node": "server1.conn0.domnode33",
>   "inherited": true,
>   "matchedSelectors": true,
>   "to": "server1.conn0.pagestyle32"
> }
This causes the style-inspector to try and refresh itself (when DOM attributes change, applied styles may differ).

And that call to the getApplied protocol method then accesses the DOM via the WalkerActor, and that fails with the exception reported by jryans above.
Flags: needinfo?(pbrosset)
We can prevent the error caused by the style-inspector, but that's just half of the story. The exception will be silenced, but the toolbox will still be "connected" to the old page. I believe we have some server-side work to do when switching from a non-e10s to an e10s page, and I believe that's what bug 1068400 is about.

To test, you can do the following:
- open the blank new tab page (non e10s)
- open the webconsole panel
- enter a normal webpage url in the url bar and hit enter
- try to type anything in the webconsole input:

DBG-SERVER: error occurred while processing 'autocomplete: TypeError: argument is not a global object
Stack: WCA_onAutocomplete@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webconsole.js:858:19
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1459:15
LocalDebuggerTransport.prototype.send/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:561:11
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14
Line: 858, column: 18
DBG-SERVER: Received packet 21: {
  "error": "unknownError",
  "message": "error occurred while processing 'autocomplete: TypeError: argument is not a global object\nStack: WCA_onAutocomplete@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/webconsole.js:858:19\nDSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1459:15\nLocalDebuggerTransport.prototype.send/<@resource://gre/modules/devtools/dbg-client.jsm -> resource://gre/modules/devtools/transport/transport.js:561:11\nmakeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14\nmakeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/DevToolsUtils.js:82:14\nLine: 858, column: 18"
}
Right, seems reasonable.  Let's depend on bug 1068400 then.
Depends on: 1068400
The behavior has changed in nightly again:
Now the toolbox closes when the tab navigates and it is possible to open and use the toolbox again on the next page.
The inspector-related exception remains, but is not blocking.
bug 1068400 is still the bug we should depend on if we want the toolbox to not close at all, but that's definitely not what this bug was originally about in the first place.
See Also: → 1264664
Inspector bug triage (filter on CLIMBING SHOES).
Priority: -- → P2
Product: Firefox → DevTools
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: