Closed Bug 1254136 Opened 4 years ago Closed 2 years ago

Marionette throws NS_ERROR_SOCKET_ADDRESS_IN_USE due to double server socket initialization

Categories

(Testing :: Marionette, defect)

defect
Not set

Tracking

(firefox-esr52 unaffected, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: pi-marionette-server, regression)

Attachments

(1 file)

When I run our in-tree firefox-ui-tests with the jsdebugger flag, I get a NS_ERROR_SOCKET_ADDRESS_IN_USE failure twice in a row:

./mach firefox-ui-test --gecko-log - --jsdebugger

1457362443297	Marionette	ERROR	Error on starting server: [Exception... "Component returned failure code: 0x804b0036 (NS_ERROR_SOCKET_ADDRESS_IN_USE) [nsIServerSocket.initSpecialConnection]"  nsresult: "0x804b0036 (NS_ERROR_SOCKET_ADDRESS_IN_USE)"  location: "JS frame :: chrome://marionette/content/server.js :: MarionetteServer.prototype.start :: line 105"  data: no]

[Exception... "Component returned failure code: 0x804b0036 (NS_ERROR_SOCKET_ADDRESS_IN_USE) [nsIServerSocket.initSpecialConnection]"  nsresult: "0x804b0036 (NS_ERROR_SOCKET_ADDRESS_IN_USE)"  location: "JS frame :: chrome://marionette/content/server.js :: MarionetteServer.prototype.start :: line 105"  data: no]
MarionetteServer.prototype.start@chrome://marionette/content/server.js:105:19
MarionetteComponent.prototype.init@resource://gre/components/marionettecomponent.js:185:5
MarionetteComponent.prototype.observe@resource://gre/components/marionettecomponent.js:128:7
The same also happens for `mach marionette-test --gecko-log - --jsdebugger`. And as it looks like we are getting the "sessionstore-windows-restored" observer notification twice. As such we run `start()` for the server twice.

Dao, is it expected that "sessionstore-windows-restored" can fire multiple times during startup? Could this be related to the JS debugger window? If yes, how could I detect for which window it is getting sent? The subject and data parameters of the callback get `null`.
Assignee: nobody → hskupin
Blocks: 1378041
Status: NEW → ASSIGNED
Flags: needinfo?(dao+bmo)
(In reply to Henrik Skupin (:whimboo) from comment #1)
> The same also happens for `mach marionette-test --gecko-log - --jsdebugger`.
> And as it looks like we are getting the "sessionstore-windows-restored"
> observer notification twice. As such we run `start()` for the server twice.
> 
> Dao, is it expected that "sessionstore-windows-restored" can fire multiple
> times during startup?

I don't think so.

> Could this be related to the JS debugger window?

It shouldn't be, but I don't know.

> If
> yes, how could I detect for which window it is getting sent? The subject and
> data parameters of the callback get `null`.

The notification isn't tied to a specific window but all windows, hence the empty subject.
Flags: needinfo?(dao+bmo)
So I can reproduce this behavior even without the --jsdebugger command line flag by simply running `mach run --marionette`. Even in this case with a single browser window open I see two notifications right after each other:

1499242777933	Marionette	INFO	Received observer notification "sessionstore-windows-restored"
1499242777937	Marionette	INFO	Received observer notification "sessionstore-windows-restored"

In Marionette we register for the notification here:

https://dxr.mozilla.org/mozilla-central/rev/acbd9a64c0fa4e1980c7732735d4f4c2761ca4c9/testing/marionette/components/marionette.js#162

And directly remove it when it has been received:

https://dxr.mozilla.org/mozilla-central/rev/acbd9a64c0fa4e1980c7732735d4f4c2761ca4c9/testing/marionette/components/marionette.js#199

Dao, do you see something what we do wrong? If not I would file a bug in the sessionstore component.
Flags: needinfo?(dao+bmo)
(In reply to Henrik Skupin (:whimboo) from comment #3)
> So I can reproduce this behavior even without the --jsdebugger command line
> flag by simply running `mach run --marionette`. Even in this case with a
> single browser window open I see two notifications right after each other:
> 
> 1499242777933	Marionette	INFO	Received observer notification
> "sessionstore-windows-restored"
> 1499242777937	Marionette	INFO	Received observer notification
> "sessionstore-windows-restored"

These have basically the same timestamp, which means you're receiving the same notification twice, which means your observer was added twice, likely because you never remove the profile-after-change observer.
Flags: needinfo?(dao+bmo)
We setup "profile-after-change" as XPCOM category in the manifest:

https://dxr.mozilla.org/mozilla-central/rev/211d4dd61025c0a40caea7a54c9066e051bdde8c/testing/marionette/components/marionette.js#144

Looks like any removal of the observer doesn't work because it raises a JS error:

JavaScript error: file:///Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/Resources/components/marionette.js, line 170: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIObserverService.removeObserver]

When checking the debug output I can see that the "profile-after-change" notification is only received once. So how can this cause a double registration for the "sessionstore-windows-restored" notification?
Also in "profile-after-change" we register for "command-line-startup" which indeed gets only called once. When I move the registration for "sessionstore-windows-restored" into "command-line-startup", everything is fine.

So what is causing this double registration of "sessionstore-windows-restored" when it is done in "profile-after-change"?
Summary: Marionette started with --jsdebugger throws NS_ERROR_SOCKET_ADDRESS_IN_USE → Marionette throws NS_ERROR_SOCKET_ADDRESS_IN_USE due to double server socket initialization
(In reply to Dão Gottwald [::dao] from comment #4)
> These have basically the same timestamp, which means you're receiving the
> same notification twice, which means your observer was added twice, likely
> because you never remove the profile-after-change observer.

We register "profile-after-change" as category service in our component. As such a call to `removeObserver` fails with NS_ERROR_FAILURE: Component returned failure code: 0x80004005. So as it looks like I cannot remove this observer.

(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #6)
> So what is causing this double registration of
> "sessionstore-windows-restored" when it is done in "profile-after-change"?

I'm still clueless about this issue. The log clearly indicates that the code for "profile-after-change" is only run once. As such the observer for "sessionstore-windows-restored" is also only added once. But why is it received twice right after each other?

https://dxr.mozilla.org/mozilla-central/rev/211d4dd61025c0a40caea7a54c9066e051bdde8c/testing/marionette/components/marionette.js#144

Benjamin, do you have an idea why this is happening, or what we should change so our code behaves correctly? Thanks.
Flags: needinfo?(benjamin)
I don't know offhand, but the best way to figure it out is probably to set a breakpoint and look at the stack of each: starting out with the JS chrome debugger, and if necessary walking back into a binary debugger.
Flags: needinfo?(benjamin)
This is funny! Sometimes you don't spot the tiniest issue:

https://dxr.mozilla.org/mozilla-central/rev/4c5fbf49376351679dcc49f4cff26c3c2e055ccc/testing/marionette/components/marionette.js#197-202

There is a `break` missing so when we receive `command-line-startup` we again run all the code from `profile-after-change`.

So this is a regression from bug 1374762.
Comment on attachment 8895885 [details]
Bug 1254136 - Fix double registration for sessionstore-windows-restored.

https://reviewboard.mozilla.org/r/167164/#review172392

well spotted!!!!
Attachment #8895885 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/69f808119fa3
Fix double registration for sessionstore-windows-restored. r=automatedtester
Backed out for failing marionette-headless' test_cli_arguments.py TestCommandLineArguments.test_start_in_safe_mode on OS X and in marionette on Linux opt:

https://hg.mozilla.org/integration/autoland/rev/4a9fa2c0c14f6495957002bc6fb1b1522d61a1a2

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=69f808119fa3edafc9f878895878d992b452e1ba&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable&filter-resultStatus=usercancel
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=122479656&repo=autoland

23:19:29     INFO - Application command: /Users/cltbld/tasks/task_1502432290/build/application/Nightly.app/Contents/MacOS/firefox -no-remote -marionette -safe-mode -foreground -profile /var/folders/yc/2ch_49xx3_l5h7grtqf6twhr00000w/T/tmp1T8Xmf.mozrunner
23:19:29     INFO -  *** You are running in headless mode.
23:19:29     INFO -  1502432369826	Marionette	DEBUG	Received observer notification "profile-after-change"
23:19:29     INFO -  1502432369898	Marionette	DEBUG	Received observer notification "command-line-startup"
23:19:29     INFO -  1502432369898	Marionette	INFO	Enabled via --marionette
23:21:35     INFO - TEST-UNEXPECTED-ERROR | test_cli_arguments.py TestCommandLineArguments.test_start_in_safe_mode | IOError: Process killed because the connection to Marionette server is lost. Check gecko.log for errors (Reason: [Errno 61] Connection refused)
23:21:35     INFO - Traceback (most recent call last):
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 156, in run
23:21:35     INFO -     testMethod()
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit/test_cli_arguments.py", line 27, in test_start_in_safe_mode
23:21:35     INFO -     self.marionette.start_session()
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 28, in _
23:21:35     INFO -     m._handle_socket_failure()
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _
23:21:35     INFO -     return func(*args, **kwargs)
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1213, in start_session
23:21:35     INFO -     self.protocol, _ = self.client.connect()
23:21:35     INFO -   File "/Users/cltbld/tasks/task_1502432290/build/venv/lib/python2.7/site-packages/marionette_driver/transport.py", line 223, in connect
23:21:35     INFO -     self.sock.connect((self.addr, self.port))
23:21:35     INFO -   File "/tools/python27/lib/python2.7/socket.py", line 224, in meth
23:21:35     INFO -     return getattr(self._sock,name)(*args)
23:21:35     INFO - TEST-INFO took 126138ms
23:21:35    ERROR - test_end for test_cli_arg
Flags: needinfo?(hskupin)
So it looks like that there is no "sessionstore-windows-restored" observer notification fired when starting Firefox in Safe mode AND using headless mode.

Brendan, could you please have a look at this? Just apply my patch and run the test test_cli_arguments.py from the Marionette harness unit tests.

I think that warrants a separate bug.
Flags: needinfo?(hskupin) → needinfo?(bdahl)
Ok, I'm going to file a new bug for that and skip the test for MnH tests for now.
Flags: needinfo?(bdahl)
Hm well while working on getting the new bug filed I noticed that even non-headless jobs were failing:

https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=69f808119fa3edafc9f878895878d992b452e1ba&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable&filter-resultStatus=usercancel&selectedJob=122481303

Geoff, I wonder if that is the same underlying issue as bug 1382162 but now happens permanently with the above patch attached. Maybe it helps us to get the underlying issue fixed sooner. Could you have a look at it?
Flags: needinfo?(gbrown)
whimboo - I'm very glad you found that missing 'break' - nice!

Did you notice that the new failures are consistently in TestCommandLineArguments.test_start_in_safe_mode? I don't know why and didn't have much time today, but can hopefully have a closer look next week, unless the safe mode correlation brings something to mind for you?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2b6cc835e7919c880dd43422aed0b696553de627
I had another look and this is all about the missing break.

Former behavior:
----------------
The callback for the "profile-after-change" notification is run, and given that `this.enabled` is `false` no "domwindowopened" observer is registered. Then we hit "command-line-startup" and due to the missing `break` statement the code passed through to "profile-after-change" again. `this.enabled` is `true` now, and we registered "domwindowopened" to handle the safe mode dialog.

Wanted behavior:
----------------
Register for "domwindowopened" during startup when Marionette got enabled via the Environment variable, or by the command line. This also needs to include the special behavior for safe mode.
Flags: needinfo?(gbrown)
Comment on attachment 8895885 [details]
Bug 1254136 - Fix double registration for sessionstore-windows-restored.

David, I would need another review for the updates in this patch. The patch looks fine so far on try for Linux and Windows. I'm waiting for OS X jobs but thought it's already fine to ask for the review. Thanks.
Attachment #8895885 - Flags: review?(dburns)
Comment on attachment 8895885 [details]
Bug 1254136 - Fix double registration for sessionstore-windows-restored.

https://reviewboard.mozilla.org/r/167164/#review174470

::: testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py:198
(Diff revision 2)
>              self.assertNotEqual(self.marionette.process_id, self.pid)
>  
>          self.assertNotEqual(self.marionette.get_pref("startup.homepage_welcome_url"),
>                              "about:")
>  
> +    def test_in_app_restart_safe_mode(self):

This new test fails in headless mode for debug builds on OS X only. I will skip it for now, and file a follow-up bug.
Depends on: 1390848
Attachment #8895885 - Flags: review?(dburns)
Comment on attachment 8895885 [details]
Bug 1254136 - Fix double registration for sessionstore-windows-restored.

https://reviewboard.mozilla.org/r/167164/#review174656
Attachment #8895885 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e146d838112c
Fix double registration for sessionstore-windows-restored. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/e146d838112c
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please uplift this test-only patch to beta to get the regression fixed. Thanks.
Whiteboard: [checkin-needed-beta]
No kidding! Wonder if this is what's behind bug 1389063 on Beta.
Could be but the regression exists longer than your found range.
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1391699 to add an eslint rule about requiring break statements.
You need to log in before you can comment on or make changes to this bug.