Closed Bug 1397612 Opened 2 years ago Closed 11 months ago

Intermittent test_quit_restart.py TestQuitRestart.test_in_app_restart_safe_mode | IOError: Process has been unexpectedly closed (Exit code: -15) (Reason: [Errno 111] Connection refused)

Categories

(Testing :: Marionette, defect, P3)

Version 3
defect

Tracking

(firefox57 fixed, firefox58 disabled, firefox59 disabled, firefox62 disabled, firefox63 disabled, firefox64 disabled, firefox65 fixed)

RESOLVED FIXED
mozilla65
Tracking Status
firefox57 --- fixed
firefox58 --- disabled
firefox59 --- disabled
firefox62 --- disabled
firefox63 --- disabled
firefox64 --- disabled
firefox65 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(4 files, 1 obsolete file)

The Marionette client fails to connect and therefore times out. A fix for bug 1362293 will also solve this.
Depends on: 1362293
Attached patch: skip patch
Assignee: nobody → hskupin
Attachment #8909472 - Flags: review?(dburns)
Attachment #8909472 - Flags: review?(dburns) → review+
https://hg.mozilla.org/mozilla-central/rev/63d97c8b46b2
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Assignee: hskupin → nobody
Status: RESOLVED → REOPENED
Keywords: test-disabled
Resolution: FIXED → ---
The failure here is caused by an in_app restart and should be gone once my patch on bug 1410366 has landed.
Depends on: 1410366
Actually the upcoming patch on bug 1410366 should fix it.
Assignee: nobody → hskupin
Status: REOPENED → ASSIGNED
Comment on attachment 8925493
Bug 1397612 - Backed out changeset 63d97c8b46b2

https://reviewboard.mozilla.org/r/196632/#review201890
Attachment #8925493 - Flags: review?(jmaher) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1a83be186f45
Backed out changeset 63d97c8b46b2 r=jmaher
https://hg.mozilla.org/mozilla-central/rev/1a83be186f45
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
If still possible, please uplift this test-only patch to beta/release (57). Thanks.
Duplicate of this bug: 1415404
The failure is still present, but might have a different underlying cause now. Let's keep the remaining problem tracked on bug 1415404. I will undo the duplication.
No longer depends on: 1400819
Duplicate of this bug: 1415404
(In reply to Cristina Coroiu [:ccoroiu] from comment #19)
> https://hg.mozilla.org/mozilla-central/rev/1a83be186f45

We have to get this commit backed out on central and beta, because the re-enabled test is causing bug 1391545, which is a high-frequency intermittent. I will have to continue investigating what's wrong with safe mode.
Keywords: checkin-needed
Whiteboard: [backout on central, beta]
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/f607af87cc3c
Backed out changeset 1a83be186f45 on request from whimboo for causing bug 1391545. r=backout a=backout on a CLOSED TREE
Keywords: checkin-needed
Whiteboard: [backout on central, beta] → [backout on beta]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla57 → ---
Target Milestone: --- → mozilla57
Target Milestone: mozilla57 → ---
Keywords: checkin-needed
Whiteboard: [backout on beta]
Status: REOPENED → ASSIGNED
In the last 7 days there are 37 failures. They occur only on Linux.
A recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=151473919&lineNumber=32392

[task 2017-12-13T16:53:35.230Z] 16:53:35     INFO -  JavaScript error: jar:file:///builds/worker/workspace/build/application/firefox/omni.ja!/components/nsUrlClassifierListManager.js, line 69: NS_ERROR_XPC_GS_RETURNED_FAILURE: Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]
[task 2017-12-13T16:53:35.231Z] 16:53:35     INFO -  JavaScript error: jar:file:///builds/worker/workspace/build/application/firefox/omni.ja!/components/nsUrlClassifierListManager.js, line 69: NS_ERROR_XPC_GS_RETURNED_FAILURE: Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]
[task 2017-12-13T16:55:33.880Z] 16:55:33     INFO - TEST-UNEXPECTED-ERROR | testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py TestQuitRestart.test_in_app_restart_safe_mode | IOError: Process has been unexpectedly closed (Exit code: -15) (Reason: [Errno 111] Connection refused)
-----------
-----------
-----------
[task 2017-12-13T16:57:38.312Z] 16:57:38     INFO - FAILED TESTS
[task 2017-12-13T16:57:38.313Z] 16:57:38     INFO - -------
[task 2017-12-13T16:57:38.314Z] 16:57:38     INFO - test_quit_restart.py test_quit_restart.TestQuitRestart.test_in_app_restart_safe_mode
[task 2017-12-13T16:57:38.315Z] 16:57:38     INFO - SUITE-END | took 535s
[task 2017-12-13T16:57:38.317Z] 16:57:38     INFO -  1513184258295	Marionette	DEBUG	Closed connection 1
[task 2017-12-13T16:57:39.799Z] 16:57:39    ERROR - Return code: 10


:whimboo, can you please take a look?
Flags: needinfo?(hskupin)
Whiteboard: [stockwell needswork]
We should just skip the test on all platforms for now. Can someone please land the patch? Thanks.
Flags: needinfo?(hskupin)
Attachment #8936662 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b949944f17b0
Skip TestQuitRestart.test_in_app_restart_safe_mode across all platforms. r=whimboo
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b949944f17b0
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [stockwell unknown] → [stockwell disabled]
Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1391545, which is recommended for disabling, depends on this bug.

:whimboo do you have any updates regarding this bug?
Flags: needinfo?(hskupin)
(In reply to Arthur Iakab [arthur_iakab] from comment #42)
> Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1391545, which is
> recommended for disabling, depends on this bug.
> 
> :whimboo do you have any updates regarding this bug?

As you can see, this test is disabled and shouldn't cause any harm for 59. If wanted, we could still uplift the skip patch to 58 to stop the failures on the release branch.
Flags: needinfo?(hskupin) → needinfo?(aryx.bugmail)
Flags: needinfo?(aryx.bugmail) → needinfo?(jmaher)
RyanVM, could you uplift the patch from comment 35 to beta? Then we should have no failures by next week.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #44)
> RyanVM, could you uplift the patch from comment 35 to beta? Then we should
> have no failures by next week.

CC'ing Ryan, and putting ni? on him. Thanks.
Flags: needinfo?(ryanvm)
I had the chance to take a look at this failure, and simply tried to reproduce it on macOS by letting it run for a while in the background via --headless. Interestingly, it failed for me after just a couple of loops, and then it simply hung.

After a bit of investigation it turned out that the underlying problem is the `using_context` decorator in combination with the quit/restart callback. It tries to switch back to the former context, even though in some cases the connection has already been shut down. As a result an IOError is thrown, and the @process_check decorator kicks in because it is used for `_send_message`. That in turn runs the code in `handle_socket_failure`. Here we wait for the application to shut down, but for a restart this will not happen. As such, Marionette tries to kill the application.
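
The chain described above can be modelled with a small toy, sketched here under loud assumptions: `FakeConnection`, `ToyClient`, `trigger_restart` and the `guard` flag are hypothetical names made up for illustration, not the Marionette client code, and the guard is only one conceivable mitigation, not the fix that eventually landed.

```python
import functools
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for the client socket."""

    def __init__(self):
        self.open = True

    def send(self, message):
        if not self.open:
            raise IOError("[Errno 111] Connection refused")
        return message


def process_check(func):
    """Toy decorator: on IOError, run the socket-failure handler, then re-raise."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except IOError:
            self.handle_socket_failure()
            raise

    return wrapper


class ToyClient:
    def __init__(self):
        self.conn = FakeConnection()
        self.context = "content"
        self.kill_attempted = False

    def handle_socket_failure(self):
        # During a restart the application never exits on its own, so the
        # toy records that a kill would be attempted at this point.
        self.kill_attempted = True

    @process_check
    def _send_message(self, message):
        return self.conn.send(message)

    def set_context(self, context):
        self._send_message("setContext:" + context)
        self.context = context

    @contextmanager
    def using_context(self, context, guard=False):
        previous = self.context
        self.set_context(context)
        try:
            yield
        finally:
            # Unguarded: always switch back, even on a dead connection.
            # Guarded (assumption): skip the switch once the connection is gone.
            if not guard or self.conn.open:
                self.set_context(previous)


def trigger_restart(client, guard):
    with client.using_context("chrome", guard=guard):
        # The in-app restart drops the connection inside the block.
        client.conn.open = False
```

With `guard=False` the exit from the `with` block raises the IOError and the failure handler runs; with `guard=True` the block exits cleanly and no kill is attempted.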

Interestingly I see a hang in `self.process_handler.kill()` at this stage. Maybe this is related to bug 1421289. I will dig further before proposing a solution here.
I also have to add that the Firefox process got a new parent pid after the restart, which is 1. So maybe this is causing problems because it's not in the process group anymore.
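
Whether a process is still in the caller's process group can be checked directly on POSIX. The helper below is a minimal sketch with a hypothetical name (`in_our_process_group`); it looks only at the process group, since being reparented to pid 1 does not by itself change the group, so the two conditions can be checked separately.

```python
import os


def in_our_process_group(pid):
    """Return True if `pid` is in the calling process's group (POSIX only)."""
    try:
        return os.getpgid(pid) == os.getpgrp()
    except ProcessLookupError:
        # The process no longer exists.
        return False
```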
Comment 35 is already on Beta. We're at that point of the cycle where both m-c and m-b are tracking 59. I can skip it on m-r for Fx58 if you feel strongly about it, but I'm leaning towards no.
Flags: needinfo?(ryanvm)
Marking as P1 due to the inappropriate use of the process_check decorator on `_send_message()`. We have to make it more stable. I will still have to look at bug 1421289 first.
Status: REOPENED → ASSIGNED
Priority: P5 → P1
While I was checking this bug again for a possible fix, I noticed that this is actually badly broken behavior in quit and restart! It's not only related to a user callback for quit and shutdown, but could happen at any time for in_app restarts.

I will file a new bug, which will get a fix today; that should fix this bug and possibly all the other restart tests.
To ensure my patch on bug 1433873 works, I will try to re-enable all of those restart tests right away.
Not an actionable bug for me until bug 1433873 is fixed.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
All dependencies have been fixed. As such I pushed a try build to check if the test works now as expected:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3771ebd758948c5e8f1e7761e95a1d4f78dfced4
Henrik Skupin <mail@hskupin.info> HG: branch 'default' HG: bookmark 'marionette_enable_safe_mode' HG: changed
testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/463d82852c31
[marionette] Re-enable test TestQuitRestart.test_in_app_restart_safe_mode. r=ato
https://hg.mozilla.org/mozilla-central/rev/463d82852c31
Status: NEW → RESOLVED
Closed: 2 years ago → 11 months ago
Resolution: --- → FIXED
Assignee: nobody → hskupin
Whiteboard: [stockwell disabled]
Target Milestone: mozilla59 → mozilla64
Backout by aciure@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/bd60f5f2f402
Backed out changeset 463d82852c31 for accidentally disabling nearly all the tests in test_quit_restart.py a=backout
Status: RESOLVED → REOPENED
Flags: needinfo?(hskupin)
Resolution: FIXED → ---
Target Milestone: mozilla64 → ---
I will try to land this again correctly by next week, once the merges to beta are done and central is on 65.
Attachment #9017470 - Attachment is obsolete: true
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9234d32ec23c
[marionette] Re-enable test TestQuitRestart.test_in_app_restart_safe_mode. r=ato
https://hg.mozilla.org/mozilla-central/rev/9234d32ec23c
Status: REOPENED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Flags: needinfo?(hskupin)
Keywords: test-disabled