Closed Bug 1615412 Opened 1 year ago Closed 10 months ago

Periodically unable to connect Browser Toolbox, despite numerous process restarts, or toolbox split with error message shown on the left

Categories

(DevTools :: General, defect, P1)

defect

Tracking

(firefox-esr68 unaffected, firefox75 wontfix, firefox76 wontfix, firefox77 fixed)

RESOLVED FIXED
Firefox 77
Tracking Status
firefox-esr68 --- unaffected
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- fixed

People

(Reporter: mconley, Assigned: ochameau)

References

(Blocks 2 open bugs)

Details

(Keywords: dogfood, regression)

Attachments

(3 files)

Attached image image.png

STR:

  1. Launch the Browser Toolbox

ER:

The Browser Toolbox should attach.

AR:

The Browser Toolbox shows a "Browser Toolbox connection status" error message. See screenshot (I can't seem to cut and paste the error text, frustratingly).

Sometimes when I see this error, I can restart the Browser Toolbox, and it attaches properly. This time, however, no matter how many times I restart the Browser Toolbox, it shows me the same error message.

I'll also note that restarting the Firefox instance doesn't appear to help. The only thing I've ever found that helps is restarting the entire OS, which seems suboptimal.

Same problem, probably for 2 - 3 months now. Only OS restart fixes it. No Firefox's processes are otherwise running.

Re Morgan's screenshot: If toolbox fails to connect, in most cases it will have a full page error dump. Only very rarely I get Morgan's split error toolbox state and recently it only shows up for a moment and then recovers.

Keywords: regression
Priority: -- → P2

I'm running into a (related?) similar issue, which I'll document on this bug too, though it might be unrelated. Note restarting firefox and restarting my OS had no effect.

STR:

  1. Launch https://www.mozilla.org/media/contentcards/img/home-2019/card_2/guide-high-res.a12bff1c92ab.jpg
  2. Open browser toolbox with CMD + Option + Shift + I

Expected:

Browser toolbox launches, attached to the current active window

Actual:

See screenshot browser_toolbox_split.png [Window titled "Multiprocess Browser Toolbox". The left half of the window is white, with JSON text titled "Browser Toolbox connection status". The JSON below the heading indicates the connection is successful. The right half of the window shows the normal browser toolbox. The left "pane" is not collapsible and does not vanish with use of the toolbox.]

My current nightly is from here: https://hg.mozilla.org/mozilla-central/rev/2f6870dd1b99edaba1de4d2aa97f3910a640c5bf
I'm running OSX 10.15.3
I also initially thought this might be a fission thing, but I disabled fission.autostart and got the same results.

Alex, any idea if Multiprocess Toolbox work could have regressed this or any other leads?

(See comment above)

Flags: needinfo?(poirot.alex)
Summary: Periodically unable to connect Browser Toolbox, despite numerous process restarts. → Periodically unable to connect Browser Toolbox, despite numerous process restarts, or toolbox split with error message shown on the left

(In reply to :Harald Kirschner :digitarald from comment #5)

Alex, any idea if Multiprocess Toolbox work could have regressed this or any other leads?

I couldn't think about anything in particular, but Nicolas just hit an exception and was able to capture it.
And it is related to fission work.

console.error: "Error while calling actor 'root's method 'listRemoteFrames'" "currentContext.getChildren is not a function"
console.error: "listRemoteFrames@resource://devtools/server/actors/root.js:650:45\nasync*handler@resource://devtools/shared/protocol/Actor.js:154:37\nonPacket@resource://devtools/server/devtools-server-connection.js:380:58\n_onJSONObjectReady/<@resource://devtools/shared/transport/transport.js:464:22\nexports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:111:22\nexports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:111:22\n"
console.error: "Exception while opening the toolbox" "Error: Protocol error (unknownError): currentContext.getChildren is not a function" (new Error("Protocol error (unknownError): currentContext.getChildren is not a function", "resource://devtools/shared/protocol/Front.js", 294))
onPacket/<@resource://devtools/shared/protocol/Front.js:294:31
DevTools RDP*request@resource://devtools/shared/protocol/Front.js:232:14
generateRequestMethods/</frontProto[name]@resource://devtools/shared/protocol/Front/FrontClassWithSpec.js:49:19
listRemoteFrames@resource://devtools/shared/fronts/targets/browsing-context.js:132:33
listen@resource://devtools/shared/resources/target-list.js:90:42
startListening@resource://devtools/shared/resources/target-list.js:324:47
async*open/<@resource://devtools/client/framework/toolbox.js:754:29
open@resource://devtools/client/framework/toolbox.js:899:18
createToolbox@resource://devtools/client/framework/devtools.js:622:19
async*showToolbox@resource://devtools/client/framework/devtools.js:509:35
openToolbox@chrome://devtools/content/framework/browser-toolbox/window.js:205:30
connect@chrome://devtools/content/framework/browser-toolbox/window.js:112:9
async*@chrome://devtools/content/framework/browser-toolbox/window.js:162:13
EventListener.handleEvent*@chrome://devtools/content/framework/browser-toolbox/window.js:140:8

It looks like in this new code:
https://searchfox.org/mozilla-central/rev/49ed791eec93335abfe6c2880f84c324e73e47e6/devtools/server/actors/root.js#649
currentContext is sometimes undefined.

I'll note that I haven't been able to properly use the Browser Toolbox for several weeks now. Even after restarts, I seem to enter this state pretty regularly.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #8)

I'll note that I haven't been able to properly use the Browser Toolbox for several weeks now. Even after restarts, I seem to enter this state pretty regularly.

+1 I haven't been able to use either (since reporting here) :(

(In reply to Alexandre Poirot [:ochameau] from comment #7)

(In reply to :Harald Kirschner :digitarald from comment #5)

Alex, any idea if Multiprocess Toolbox work could have regressed this or any other leads?

I couldn't think about anything in particular, but Nicolas just hit an exception and was able to capture it.
And it is related to fission work.

console.error: "Error while calling actor 'root's method 'listRemoteFrames'" "currentContext.getChildren is not a function"
console.error: "listRemoteFrames@resource://devtools/server/actors/root.js:650:45\nasync*handler@resource://devtools/shared/protocol/Actor.js:154:37\nonPacket@resource://devtools/server/devtools-server-connection.js:380:58\n_onJSONObjectReady/<@resource://devtools/shared/transport/transport.js:464:22\nexports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:111:22\nexports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:111:22\n"
console.error: "Exception while opening the toolbox" "Error: Protocol error (unknownError): currentContext.getChildren is not a function" (new Error("Protocol error (unknownError): currentContext.getChildren is not a function", "resource://devtools/shared/protocol/Front.js", 294))
onPacket/<@resource://devtools/shared/protocol/Front.js:294:31
DevTools RDP*request@resource://devtools/shared/protocol/Front.js:232:14
generateRequestMethods/</frontProto[name]@resource://devtools/shared/protocol/Front/FrontClassWithSpec.js:49:19
listRemoteFrames@resource://devtools/shared/fronts/targets/browsing-context.js:132:33
listen@resource://devtools/shared/resources/target-list.js:90:42
startListening@resource://devtools/shared/resources/target-list.js:324:47
async*open/<@resource://devtools/client/framework/toolbox.js:754:29
open@resource://devtools/client/framework/toolbox.js:899:18
createToolbox@resource://devtools/client/framework/devtools.js:622:19
async*showToolbox@resource://devtools/client/framework/devtools.js:509:35
openToolbox@chrome://devtools/content/framework/browser-toolbox/window.js:205:30
connect@chrome://devtools/content/framework/browser-toolbox/window.js:112:9
async*@chrome://devtools/content/framework/browser-toolbox/window.js:162:13
EventListener.handleEvent*@chrome://devtools/content/framework/browser-toolbox/window.js:140:8

It looks like in this new code:
https://searchfox.org/mozilla-central/rev/49ed791eec93335abfe6c2880f84c324e73e47e6/devtools/server/actors/root.js#649
currentContext is sometimes undefined.

Maybe we could more gracefully fail in that case in the meantime? Maybe filtering out null browsingContexts before pushing them onto contextsToWalk in https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/devtools/server/actors/root.js#604 would work? Given Comments 8 and 9 that would be helpful.

Also Mike, do you have any idea why we might be getting a null browsingContext on that line above? Like, in what case a browser element in in the DOM browser.xhtml wouldn't have a browsingContext.

Flags: needinfo?(mconley)

Mike or Morgan, if you don't mind testing locally and you could add .filter(browsingContext => browsingContext) right after .map(browser => browser.browsingContext) at https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/devtools/server/actors/root.js#604 to see if that helps based on Comment 10.

(In reply to Brian Grinstead [:bgrins] from comment #11)

Mike or Morgan, if you don't mind testing locally and you could add .filter(browsingContext => browsingContext) right after .map(browser => browser.browsingContext) at https://searchfox.org/mozilla-central/rev/d69ec052bed8700af7a283e37b60b4af22734930/devtools/server/actors/root.js#604 to see if that helps based on Comment 10.

I'm afraid that didn't change the behaviour.

(In reply to Brian Grinstead [:bgrins] from comment #10)

Also Mike, do you have any idea why we might be getting a null browsingContext on that line above? Like, in what case a browser element in in the DOM browser.xhtml wouldn't have a browsingContext.

The only situation I can think of where a <browser> might not have a BrowsingContext is if it's not yet been assigned a FrameLoader / been attached to the DOM. Nika, are there other cases I haven't thought of?

Flags: needinfo?(mconley) → needinfo?(nika)

Sorry, this fell through the cracks...
I'm activity working on bug 1593937 which will get rid of this code.
Similar issue might happen in the new code, so it would be handy to better understand what is wrong here.

Now that I read the exception more carefully "currentContext.getChildren is not a function"
currentContext isn't undefined, but it is rather the getChildren attribute which is not defined.
I'm wondering if Nicolas only hit a rebase issue, when testing Firefox in between two builds when getChildren has been renamed to children.
If that is that, the issue may not be related to this code after all.

Mike, Morgan, are you seeing any error in your terminal? Or in the Browser Console?

Please correct me if I'm wrong, but I think the issue isn't about an issue around BrowsingContext.
At least we can't say that from comment 7 exception, which might just be a rebase issue.

So, I'm reseting the unecessary neeinfo on nika.

Flags: needinfo?(nika)

I tried really hard reproducing but can't find any way to make it fail.
I'm wondering if this is specific to OSX and might be about processes. i.e. how we execute the browser toolbox process.
Are you all on OSX? Any particular preference, page or window opened?

I do see it from time to time too, I often only have https://nchevobbe.github.io/demo/console-test-app.html open on my main Firefox window.
Bug 1591590 is a test intermittent about the browser toolbox, and the screenshots indicate we get into that weird state there too.

Yes, I'm on macOS. I can reproduce this even with a new profile, and no other Firefox instance running.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #8)

I'll note that I haven't been able to properly use the Browser Toolbox for several weeks now. Even after restarts, I seem to enter this state pretty regularly.

Same here.

Keywords: dogfood

(In reply to Dão Gottwald [::dao] from comment #18)

Same here.

Is this on your Linux machine?

Flags: needinfo?(dao+bmo)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #19)

(In reply to Dão Gottwald [::dao] from comment #18)

Same here.

Is this on your Linux machine?

Yeah, Ubuntu 19.10.

Flags: needinfo?(dao+bmo)

Okay, I decided to roll up my sleeves and look at this, because I really miss using this tool.

I think I've made some headway. Here's what I know:

  1. The server seems to launch correctly.
  2. The problem appears to be in the toolbox profile that launches - the connection attempt fails, despite the server running
  3. If I Telnet to the server port locally, I get the following:
mconleys-MBP:~ mikeconley$ telnet
telnet> open localhost 54166
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
304:{"from":"root","applicationType":"browser","testConnectionPrefix":"server1.conn1.","traits":{"sources":true,"networkMonitor":true,"storageInspector":true,"wasmBinarySource":true,"bulk":true,"webConsoleCommands":true,"allowChromeProcess":true,"heapSnapshots":true,"perfActorVersion":1,"watchpoints":true}}Connection closed by foreign host.

Interestingly, the first attempt that Telnet makes is refused - presumably, because it's trying to connect via an ipv6 address? And then it falls back to ipv4... I think.

So I was able to get things working locally by monkeypatching this line here:

https://searchfox.org/mozilla-central/rev/4d2a9d5dc8f0e65807ee66e2b04c64596c643b7a/devtools/client/framework/browser-toolbox/window.js#103

from:

  const host = Prefs.chromeDebuggingHost;

to

  const host = "127.0.0.1";

and now it works?

I'm not sure why this would kind of thing would only rear itself after some time after the OS has been running. Maybe someone from the Necko team could comment, because this looks more like a local networking issue that our DevTools is stumbling on.

Flags: needinfo?(valentin.gosu)

FWIW, I have the following in my /etc/hosts file:

127.0.0.1       localhost
127.0.0.1       localhost2
127.0.0.1       localhost3
255.255.255.255 broadcasthost
::1             localhost
fe80::1%lo0     localhost

(In reply to Mike Conley (:mconley) (:⚙️) from comment #22)

FWIW, I have the following in my /etc/hosts file:

127.0.0.1       localhost
127.0.0.1       localhost2
127.0.0.1       localhost3
255.255.255.255 broadcasthost
::1             localhost
fe80::1%lo0     localhost

I normally expect only ipv4 addresses for localhost, an ipv6 for ip6-localhost.
But in any case, I don't know where we create the server socket, but I suspect we only bind it to the IPv4 address?
Normally if we get an IPv6 address for localhost we should fallback to IPv4 if there's an error, but maybe that doesn't work?
There's a lot of stuff in the devtools socket implementation, so I'm not sure how that all works, but I assume we could add a

if (useIPv4Only) {
  s.connectionFlags |= nsISocketTransport.DISABLE_IPV6;
}

around here somewhere.

Flags: needinfo?(valentin.gosu)

P1, as many Firefox engineers are frequently affected.

Priority: P2 → P1

I'll note that commenting this line out from my /etc/hosts/ file seems to do the job:

::1             localhost

now the Browser Toolbox works without me needing to patch it. \o/

I have no idea when or why I added that /etc/hosts line, tbh.

Just for reference, this is my hosts file for macOS, but I didn't have the connection issue in the past 4+ weeks.

127.0.0.1      	localhost
255.255.255.255	broadcasthost
::1             localhost
fe80::1%lo0	localhost

(In reply to :Harald Kirschner :digitarald from comment #26)

Just for reference, this is my hosts file for macOS, but I didn't have the connection issue in the past 4+ weeks.

127.0.0.1      	localhost
255.255.255.255	broadcasthost
::1             localhost
fe80::1%lo0	localhost

Seems like this is a mac thing? Either way, I think adding the DISABLE_IPV6 flag should fix this.

See Also: → 1586741
Severity: normal → --
Assignee: nobody → poirot.alex
Status: NEW → ASSIGNED

It would be super nice if someone able to reproduce this issue could apply this patch and confirm that it actually address it!

Flags: needinfo?(poirot.alex)

Hi Mike, could you apply this patch whenever you reproduce the issue so that we can confirm this patch is actually effectice?

Flags: needinfo?(mconley)

I reverted my /etc/hosts fix, and was able to reproduce the issue. I then applied the patch, and the issue appears to be solved with it! Thanks!

Flags: needinfo?(mconley)
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d920b65e3cfa
Disable IPv6 when connecting to remote runtime. r=jdescottes

Thanks a lot for testing!

Dão, Morgan, if you still see any issue after this bug lands, please let us know!

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 77

The patch didn't help over here on Linux. My hosts file has ::1 for ip6-localhost but not for localhost.

The patch landed in nightly and beta is affected.
:ochameau, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(poirot.alex)

I just ran into this again on OSX, with 77.0a1 (2020-04-21) (64-bit).

Curiously I was able to open another toolbox on the failed toolbox to copy the error out:

Exception { name: "NS_ERROR_CONNECTION_REFUSED", message: "Component returned failure code: 0x804b000d (NS_ERROR_CONNECTION_REFUSED) [nsIInputStream.available]", result: 2152398861, filename: "resource://devtools/shared/security/socket.js", lineNumber: 363, columnNumber: 0, data: null, stack: "onInputStreamReady@resource://devtools/shared/security/socket.js:363:18\n", location: XPCWrappedNative_NoHelper }
window.js:173:15
QA Whiteboard: [qa-77b-p2]

Alex, there are reports indicating that the following bug isn't fully fixed (comment #35 and comment #37), so I am reopening it.

Honza

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I just hit this error on Nightly due to a version mismatch from a Nightly update but I believe that should be handled in a separate bug (maybe even to just give a better error message). Restarting Firefox fixes it in that case.

Otherwise I haven't seen the error on macOS since this was fixed, definitely not the case where I need to restart my whole computer to fix it.

I might have found another exception leading to a broken browser toolbox, bug 1644772.

I'm closing this bug as the ipv6 issue is fixed. Let's reopen new bugs from now as this bug has sailed on 77.

Status: REOPENED → RESOLVED
Closed: 1 year ago10 months ago
Flags: needinfo?(poirot.alex)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.