Closed Bug 1324059 Opened 7 years ago Closed 7 years ago

Intermittent test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for connection on localhost:2828!) - No relaunch of Firefox

Categories

(Testing :: Firefox UI Tests, defect)

Version 3
All
Windows
defect
Not set
normal

Tracking

(firefox51 wontfix, firefox54 fix-optional, firefox55 fix-optional)

RESOLVED DUPLICATE of bug 1293969
Tracking Status
firefox51 --- wontfix
firefox54 --- fix-optional
firefox55 --- fix-optional

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell unknown])

Filed by: fmezei [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=2093238&repo=mozilla-beta

https://firefox-ui-tests.s3.amazonaws.com/ccffcbf8-e8a6-40f2-9186-93ea55f450e1/log_info.log

Encountered once while running ondemand update tests for Firefox 51 Beta 8. Passed on re-build.
The signature seems to have been changed a bit, but I think it's still the same underlying issue. It increased in its occurrence a lot during the last couple of days.

Here an example from the aurora branch:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=260ad3e3f8970a637c8acef0b83fde9da19d1044&filter-searchStr=Firefox+UI&filter-tier=1&filter-tier=2&filter-tier=3

The underlying issue is that Firefox does NOT restart as requested but simply quit. The exit code is 0.

05:20:50     INFO -  1491135650061	Marionette	TRACE	conn4 -> [0,1798,"quitApplication",{"flags":["eRestart","eAttemptQuit"]}]
05:20:50     INFO -  1491135650061	Marionette	INFO	New connections will no longer be accepted
05:20:50     INFO -  1491135650061	Marionette	TRACE	conn4 <- [1,1798,null,{}]
05:20:50     INFO -  1491135650295	Marionette	DEBUG	Closed connection conn4
05:20:51     INFO -  *** UTM:SVC TimerManager:registerTimer - id: telemetry_modules_ping
05:23:57     INFO - TEST-UNEXPECTED-ERROR | test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been unexpectedly closed (Exit code: 0) (Reason: Timed out waiting for connection on localhost:2828!)
05:23:57     INFO - Traceback (most recent call last):
05:23:57     INFO -   File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 166, in run
05:23:57     INFO -     testMethod()
05:23:57     INFO -   File "c:\jenkins\workspace\mozilla-aurora_update\build\tests\firefox-ui\tests\update\fallback\test_fallback_update.py", line 20, in test_update
05:23:57     INFO -     self.download_and_apply_available_update(force_fallback=True)
05:23:57     INFO -   File "c:\jenkins\workspace\mozilla-aurora_update\build\venv\lib\site-packages\firefox_ui_harness\testcases.py", line 288, in download_and_apply_available_update
05:23:57     INFO -     self.restart()

This reminds me on bug 974971. I really wonder what's wrong here.
Depends on: 974971
Summary: Intermittent test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: [Errno 61] Connection refused) → Intermittent test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for connection on localhost:2828!)
Andreas patch for bug 1337743 landed between the two nightlies when the problems are starting to appear. So this most likely looks to be a regression due to this patch.

Maybe the call to Puppeteer's `restart()` method does not correctly use the parameters?

https://dxr.mozilla.org/mozilla-central/source/testing/marionette/puppeteer/firefox/firefox_puppeteer/mixins.py#63

This might be identical to the connection issues with update tests on OS X which started at the same time. This is tracked via bug 1326237.
Blocks: 1337743
Flags: needinfo?(ato)
Keywords: regression
See Also: → 1326237
OS: Unspecified → Windows
Hardware: Unspecified → All
I also checked central and it has the same regression range. So clearly a regression from bug 1326237.

But interestingly update tests for newer builds on both branches do not show this issue anymore. I don't remember that we pushed any follow-up, so I might want to observe it the next days.

Clearing ni? from Andreas for now.
Flags: needinfo?(ato)
There doesn’t appear to be anything suspicious with the payload the client is sending:

> 05:20:50     INFO -  1491135650061	Marionette	TRACE	conn4 -> [0,1798,"quitApplication",{"flags":["eRestart","eAttemptQuit"]}]

This is the expected input.  In case the client were to send two accidental eAttemptQuits, e.g. ["eAttemptQuit", "eAttemptQuit"], the client would remove the duplicate from the list in http://searchfox.org/mozilla-central/source/testing/marionette/client/marionette_driver/marionette.py#1159.

The server will also complain in case the client doesn’t remove duplicates as a secondary security check in http://searchfox.org/mozilla-central/source/testing/marionette/driver.js#2749.
The command from the client to the server seems fine, yes. But at this time something else caused Firefox to quit instead of restart. As long as we do not have a fix landed for bug 1350897 we don't know what actually happened. This might still be a Firefox bug, as mentioned earlier (bug 974971) but somehow got repeatedly triggered by your landed patch.

All the update tests from today for central and aurora look fine and still do not show this failure. It means that it fixed itself, and it might not be worth the time figuring out why.

I will close the bug as WFM for now. Lets reopen it if we see it again.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Whiteboard: [stockwell unknown]
This bumped up kinda heavily yesterday with update tests across all flavors of Windows:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=8f80d594c08d5c7a112e5d4b9eb44ffca717eb7b&filter-searchStr=firefox update&filter-tier=1&filter-tier=2&filter-tier=3&filter-resultStatus=success&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

Maybe this is all related to the crash we see for update tests and bug 1375407? Somehow we might not see that crash.
See Also: → 1375407
The crash in Core -> IPC, etc. is well outside app update code and it would be better to ask a developer that works on that code.
Actually this is the same as bug 1293969 which has been filed for direct updates.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → DUPLICATE
Summary: Intermittent test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for connection on localhost:2828!) → Intermittent test_fallback_update.py TestFallbackUpdate.test_update | IOError: Process has been closed (Exit code: 0) (Reason: Timed out waiting for connection on localhost:2828!) - No relaunch of Firefox
You need to log in before you can comment on or make changes to this bug.