Closed Bug 793760 Opened 7 years ago Closed 7 years ago


(Testing :: Marionette, defect)

Not set



blocking-basecamp +


(Reporter: jgriffin, Unassigned)



(Keywords: regression)


(1 file, 2 obsolete files)

Commit has broken Marionette on Firefox and B2G.  Investigating.
More details on the symptoms:  Marionette accepts the first connection request, but later requests fail with "connection refused".  Since Marionette always makes an initial request just to determine if the server is alive, and tests are run using a separate connection, no tests can actually run.
The cause is  It looks like with this change, when a debugger connection is closed, we have to re-init the debugger server transport.
Attached patch patch v0.1 (obsolete) — Splinter Review
Panos, can you comment on this fix?  It doesn't seem like a very clean way for debugger server to restart itself after a connection was terminated, but I didn't see a better way.  It's also failing after some random number of new connections with the error:

* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "'[JavaScript Error: "this._allowConnection is not a function" {file: "chrome://global/content/devtools/dbg-server.js" line: 272}]' when calling method: [nsIServerSocketListener::onSocketAccepted]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: yes]
Attachment #664185 - Flags: review?(past)
Flagging for blocking-basecamp status as this breaks the only test harness we have to WebAPI tests.
blocking-basecamp: --- → ?
Comment on attachment 664185 [details] [diff] [review]
patch v0.1

Review of attachment 664185 [details] [diff] [review]:

I'm so sorry I broke Marionette again, I wish it was integrated with tbpl so I could catch that in my try pushes. The change essentially makes sure that the debugger server is properly shut down after all the connections have been closed, to fix some memory leaks we were seeing in browser mochitests. The expected interaction with the debugger is to initialize it (if it is not already initialized), open connections to it and exchange packets, and when the last connection is dropped, proper shut down will occur. After that the server will need to be initialized again before use.

A simple way to avoid the problem you are facing would be to keep a connection open to the debugger for as long as Marionette is running. This has actually been my assumption so far, but apparently I didn't look at the code close enough.

If, however, you absolutely need to open and close connections while Marionette is running, then you would need to simply restart the server each time the last connection is closed. The way you do that in this patch seems not to be as simple as it could be, because you tangle it up with the actor clean up sequence. The this._allowConnection errors that you get point to a race between checking if the server is already initialized and it shutting down.

In the case of mochitests I rely on a Debugger:Shutdown event that is fired by the browser actors to know when to initialize the server again. In the case of xpcshell tests, which may be more applicable to Marionette, I just poll every few milliseconds after the last connection is closed to know when the server has finished shutting down (see test_dbgsocket.js). Your restartListener method seems very similar to that, so if you just invoked it outside of the actor (or the server for that matter), I think it would make a fine solution to the problem at hand.
Attachment #664185 - Flags: review?(past)
Thanks for the comments, I'll submit a revised patch.

Just FYI, Marionette is currently available on try server, but is hidden.  You can view it by adding &noginore=1 to your TBPL url.  It's hidden because there is a mysterious failure on Win7 that we're still investigating...bug 791346.

Once that bug is cleared up, we'll unhide it and turn it on for m-c and inbound as well.
Attached patch patch v0.2 (obsolete) — Splinter Review
A new version of the Marionette patch which disentangles the listener restart from the actor.  However, it still shows the exception from comment #3 randomly.

In looking into this, it appears to be a race condition in dbg-server.js.  destroy() is being called before the listener really closes its socket (it will call onStopListening when it does this, which is currently a no-op), so there is a window of opportunity with the socket is opening and accepting connections, but the DebuggerServer thinks it is closed, and has nulled out this._allowConnections.

I'm tired and couldn't fix this without causing other problems tonight.  Panos, can you take a look at that problem?  I'm going to file a separate bug for it since it isn't related to Marionette; Marionette just happens to expose it with the recent debugger changes.
Attachment #664185 - Attachment is obsolete: true
Attachment #664333 - Flags: review?(past)
Depends on: 793947
(In reply to Panos Astithas [:past] from comment #5)
> Comment on attachment 664185 [details] [diff] [review]
> patch v0.1
> Review of attachment 664185 [details] [diff] [review]:
> -----------------------------------------------------------------
> I'm so sorry I broke Marionette again, I wish it was integrated with tbpl so
> I could catch that in my try pushes.

is that a thing we can do? Seems like keeping marionette running is a good thing for us to do.
(In reply to Rob Campbell [:rc] (:robcee) from comment #8)
> (In reply to Panos Astithas [:past] from comment #5)
> > Comment on attachment 664185 [details] [diff] [review]
> > patch v0.1
> > 
> > Review of attachment 664185 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > I'm so sorry I broke Marionette again, I wish it was integrated with tbpl so
> > I could catch that in my try pushes.
> is that a thing we can do? Seems like keeping marionette running is a good
> thing for us to do.

It is on TBPL on try, but hidden, see and look for "Mn" on debug builds, like  We hope to unhide it and add it to m-c and inbound soon.
After looking at this some more with fresh eyes in the morning, I realize that we need both the attached patch, and another one to let Marionette re-use the same socket between sessions.

The way Marionette runs tests currently is that it uses one connection per test, so during a suite, it rapidly creates and destroys connections.  With the changes to the remote debugger, there will also be a window when the socket is briefly closed between sessions, and we may try to connect to the debugger at that point.  The only way to avoid this cleanly is to enable Marionette to re-use an existing connection.

However, we also need the attached patch because we don't want Marionette to shut down after a test run is completed, as that would force someone to restart the browser or B2G in order to run another session.

We've been using Marionette with the remote debugger server in roughly the same way a web client might use a web server; in that context, it would be crazy for the server to shut down its listener when a connection is closed.  This change to the remote debugger breaks that pattern.
Attached patch patch v0.3Splinter Review
This patch allows Marionette to work again on both Firefox and B2G.  It does several things:

- it moves SpecialPowers loading into marionettecomponent.js, since we only want to do that once
- it changes the way we load several JS files in marionette-actors, from loadScript to Cu.import, so we only load them once, even if marionette-actors is loaded multiple times
- it causes the client to re-use the same socket for an entire test run, instead of making a new connection per test
- it adds code to marionettecomponent.js to restart the remote debugger listener after it's closed, so we can run another session after one terminates
- it changes marionette-actor.js not to persist any state between sessions, since the same instance of MarionetteDriverActor will be used for multiple tests, whereas before that wasn't true
- it changes the way we handle some frame script loading details, because we can't save state in marionette-actors any longer, since can have more than once instance loaded over the course of gecko's lifetime

This patch unfortunately breaks the patch for bug 775116, which will need some significant re-factoring due to the fact that we can no longer save any state within marionette-actors.js.  For that bug, I plan to improve on this patch by introducing a new .jsm that will manage frame script contexts, which will be used by marionette-actors.js and which can save state between different instances of marionette-actors.js.  However, that wasn't necessary to fix this problem in the existing code, so I left it for that patch, in the interests of fixing Marionette as quickly as possible.
Attachment #664333 - Attachment is obsolete: true
Attachment #664333 - Flags: review?(past)
Attachment #664731 - Flags: review?(mdas)
I should add that this patch passes all tests on Firefox and B2G.
I think I'll just remove the one line that makes the server shut down after the last connection is closed, since it seems to not affect the core issue with that patch. I'll post the patch on bug 793947 right away.
Comment on attachment 664731 [details] [diff] [review]
patch v0.3

Review of attachment 664731 [details] [diff] [review]:

This patch is fine if we still need to land this.

::: testing/marionette/marionette-actors.js
@@ +408,4 @@
>        //if there is a content listener, then we just wake it up
> +      this.addBrowser(win);
> +      this.curBrowser.startSession((appName == 'Firefox'),
> +                                   this.getCurrentWindow(),

this.getCurrentWindow() can be changed to the 'win' variable
Attachment #664731 - Flags: review?(mdas) → review+
I'll hold off pending the outcome of bug 793947.
The fix for bug 793947 undoes the change that broke Marionette, making this patch unnecessary.  There are some good architectural changes in this patch, but they break the patch for bug 775116, and getting that landed is high priority. Therefore, I won't land this patch, and instead will move some of this work to bug 794535.
Closed: 7 years ago
Resolution: --- → WORKSFORME
Marking as a blocker since we don't want this to regress :)
blocking-basecamp: ? → +
You need to log in before you can comment on or make changes to this bug.