Closed Bug 1692291 Opened 4 years ago Closed 4 years ago

Perma [ccov] testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py TestQuitRestart.test_in_app_restart_with_callback_but_process_quit | AssertionError: "Process unexpectedly quit without restarting"

Categories

(Testing :: Marionette Client and Harness, defect, P5)

defect

Tracking

(firefox-esr78 unaffected, firefox86 unaffected, firefox87 wontfix, firefox88 fixed)

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- unaffected
firefox87 --- wontfix
firefox88 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: away)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=329658276&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Af4LcQfnR9SjCu0kq1ZVvA/runs/0/artifacts/public/logs/live_backing.log


task 2021-02-11T18:40:23.896Z] 18:40:23     INFO - TEST-START | testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py TestQuitRestart.test_in_app_restart_with_callback_but_process_quit
[task 2021-02-11T18:40:23.896Z] 18:40:23     INFO -  1613068823890	Marionette	DEBUG	1 -> [0,11,"WebDriver:DeleteSession",{}]
[task 2021-02-11T18:40:23.896Z] 18:40:23     INFO -  1613068823893	Marionette	DEBUG	1 <- [1,11,null,{"value":null}]
[task 2021-02-11T18:40:23.900Z] 18:40:23     INFO -  1613068823896	Marionette	DEBUG	Closed connection 1
[task 2021-02-11T18:40:23.901Z] 18:40:23     INFO -  1613068823898	Marionette	DEBUG	Accepted connection 2 from 127.0.0.1:59894
[task 2021-02-11T18:40:23.904Z] 18:40:23     INFO -  1613068823901	Marionette	DEBUG	2 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[task 2021-02-11T18:40:23.912Z] 18:40:23     INFO -  1613068823906	Marionette	DEBUG	2 <- [1,1,null,{"sessionId":"53e298d9-8f0c-40bd-abff-055b607910a2","capabilities":{"browserName":"firefox","browserVersion":"87.0a ... mp/tmpZc69mT.mozrunner","moz:shutdownTimeout":60000,"moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true}}]
[task 2021-02-11T18:40:23.912Z] 18:40:23     INFO -  1613068823910	Marionette	DEBUG	2 -> [0,2,"WebDriver:SetTimeouts",{"script":30000}]
[task 2021-02-11T18:40:23.913Z] 18:40:23     INFO -  1613068823912	Marionette	DEBUG	2 <- [1,2,null,{"value":null}]
[task 2021-02-11T18:40:23.916Z] 18:40:23     INFO -  1613068823914	Marionette	DEBUG	2 -> [0,3,"WebDriver:SetTimeouts",{"pageLoad":300000}]
[task 2021-02-11T18:40:23.917Z] 18:40:23     INFO -  1613068823916	Marionette	DEBUG	2 <- [1,3,null,{"value":null}]
[task 2021-02-11T18:40:23.933Z] 18:40:23     INFO -  1613068823924	Marionette	DEBUG	2 -> [0,4,"WebDriver:SetTimeouts",{"implicit":0}]
[task 2021-02-11T18:40:23.933Z] 18:40:23     INFO -  1613068823925	Marionette	DEBUG	2 <- [1,4,null,{"value":null}]
[task 2021-02-11T18:40:23.934Z] 18:40:23     INFO -  1613068823928	Marionette	DEBUG	2 -> [0,5,"Marionette:GetContext",{}]
[task 2021-02-11T18:40:23.935Z] 18:40:23     INFO -  1613068823928	Marionette	DEBUG	2 <- [1,5,null,{"value":"content"}]
[task 2021-02-11T18:40:23.935Z] 18:40:23     INFO -  1613068823933	Marionette	DEBUG	2 -> [0,6,"Marionette:SetContext",{"value":"chrome"}]
[task 2021-02-11T18:40:23.936Z] 18:40:23     INFO -  1613068823933	Marionette	DEBUG	2 <- [1,6,null,{"value":null}]
[task 2021-02-11T18:40:23.943Z] 18:40:23     INFO -  1613068823940	Marionette	DEBUG	2 -> [0,7,"WebDriver:ExecuteScript",{"script":"Components.utils.import(\"resource://gre/modules/Preferences.jsm\");\n\n            ... cified"],"filename":"../../venv/lib/python2.7/site-packages/marionette_driver/marionette.py","sandbox":"default","line":776}]
[task 2021-02-11T18:40:23.946Z] 18:40:23     INFO -  1613068823943	Marionette	TRACE	[1] MarionetteCommands actor created for window id 2
[task 2021-02-11T18:40:23.954Z] 18:40:23     INFO -  1613068823949	Marionette	DEBUG	2 <- [1,7,null,{"value":"https://www.mozilla.org/projects/firefox/%VERSION%/firstrun/"}]
[task 2021-02-11T18:40:23.954Z] 18:40:23     INFO -  1613068823951	Marionette	DEBUG	2 -> [0,8,"Marionette:SetContext",{"value":"content"}]
[task 2021-02-11T18:40:23.954Z] 18:40:23     INFO -  1613068823952	Marionette	DEBUG	2 <- [1,8,null,{"value":null}]
[task 2021-02-11T18:40:23.956Z] 18:40:23     INFO -  1613068823955	Marionette	DEBUG	2 -> [0,9,"Marionette:GetContext",{}]
[task 2021-02-11T18:40:23.957Z] 18:40:23     INFO -  1613068823955	Marionette	DEBUG	2 <- [1,9,null,{"value":"content"}]
[task 2021-02-11T18:40:23.965Z] 18:40:23     INFO -  1613068823961	Marionette	DEBUG	2 -> [0,10,"Marionette:SetContext",{"value":"chrome"}]
[task 2021-02-11T18:40:23.965Z] 18:40:23     INFO -  1613068823961	Marionette	DEBUG	2 <- [1,10,null,{"value":null}]
[task 2021-02-11T18:40:23.966Z] 18:40:23     INFO -  1613068823965	Marionette	DEBUG	2 -> [0,11,"WebDriver:ExecuteScript",{"script":"Components.utils.import(\"resource://gre/modules/Preferences.jsm\");\n\n           ... ",false],"filename":"../../venv/lib/python2.7/site-packages/marionette_driver/marionette.py","sandbox":"default","line":813}]
[task 2021-02-11T18:40:23.973Z] 18:40:23     INFO -  1613068823972	Marionette	DEBUG	2 <- [1,11,null,{"value":null}]
[task 2021-02-11T18:40:23.977Z] 18:40:23     INFO -  1613068823976	Marionette	DEBUG	2 -> [0,12,"Marionette:SetContext",{"value":"content"}]
[task 2021-02-11T18:40:23.978Z] 18:40:23     INFO -  1613068823976	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]
[task 2021-02-11T18:40:23.993Z] 18:40:23     INFO -  1613068823988	Marionette	DEBUG	2 -> [0,13,"Marionette:GetContext",{}]
[task 2021-02-11T18:40:23.994Z] 18:40:23     INFO -  1613068823989	Marionette	DEBUG	2 <- [1,13,null,{"value":"content"}]
[task 2021-02-11T18:40:23.994Z] 18:40:23     INFO -  1613068823990	Marionette	DEBUG	2 -> [0,14,"Marionette:AcceptConnections",{"value":false}]
[task 2021-02-11T18:40:23.994Z] 18:40:23     INFO -  1613068823990	Marionette	INFO	Stopped listening on port 2828
[task 2021-02-11T18:40:23.994Z] 18:40:23     INFO -  1613068823991	Marionette	DEBUG	2 <- [1,14,null,{"value":null}]
[task 2021-02-11T18:40:23.995Z] 18:40:23     INFO -  1613068823992	Marionette	DEBUG	2 -> [0,15,"Marionette:SetContext",{"value":"chrome"}]
[task 2021-02-11T18:40:23.995Z] 18:40:23     INFO -  1613068823992	Marionette	DEBUG	2 <- [1,15,null,{"value":null}]
[task 2021-02-11T18:40:23.999Z] 18:40:23     INFO -  1613068823994	Marionette	DEBUG	2 -> [0,16,"WebDriver:ExecuteScript",{"script":"Components.utils.import(\"resource://gre/modules/Services.jsm\");\n            let ... ename":"tests/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py","sandbox":"default","line":122}]
[task 2021-02-11T18:40:24.006Z] 18:40:24     INFO -  DEBUG: Adding blocker SessionStore: flushing all windows for phase quit-application-granted
[task 2021-02-11T18:40:24.006Z] 18:40:24     INFO -  DEBUG: Starting phase quit-application-granted
[task 2021-02-11T18:40:24.014Z] 18:40:24     INFO -  DEBUG: Spinning the event loop

task 2021-02-11T18:40:26.701Z] 18:40:26     INFO -  ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-02-11T18:40:26.702Z] 18:40:26     INFO -  DEBUG: Finished phase profile-before-change
[task 2021-02-11T18:40:26.717Z] 18:40:26     INFO -  DEBUG: Starting phase profile-before-change-telemetry
[task 2021-02-11T18:40:26.717Z] 18:40:26     INFO -  DEBUG: Spinning the event loop
[task 2021-02-11T18:40:26.834Z] 18:40:26     INFO -  DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
[task 2021-02-11T18:40:26.849Z] 18:40:26     INFO -  DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
[task 2021-02-11T18:40:26.865Z] 18:40:26     INFO -  DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
[task 2021-02-11T18:40:26.865Z] 18:40:26     INFO -  DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  DEBUG: Completed blocker TelemetryController: shutting down for phase profile-before-change-telemetry
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  DEBUG: Finished phase profile-before-change-telemetry
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  1613068826878	Marionette	TRACE	Received observer notification xpcom-will-shutdown
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  1613068826878	Marionette	DEBUG	Marionette stopped listening
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  DEBUG: Starting phase xpcom-will-shutdown
[task 2021-02-11T18:40:26.881Z] 18:40:26     INFO -  DEBUG: Spinning the event loop
[task 2021-02-11T18:40:26.936Z] 18:40:26     INFO -  DEBUG: Completed blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase xpcom-will-shutdown
[task 2021-02-11T18:40:26.952Z] 18:40:26     INFO -  DEBUG: Finished phase xpcom-will-shutdown
[task 2021-02-11T18:40:26.952Z] 18:40:26     INFO -  DEBUG: Starting phase web-workers-shutdown
[task 2021-02-11T18:40:26.952Z] 18:40:26     INFO -  DEBUG: Spinning the event loop
[task 2021-02-11T18:40:26.952Z] 18:40:26     INFO -  DEBUG: Finished phase web-workers-shutdown
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO - TEST-UNEXPECTED-FAIL | testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py TestQuitRestart.test_in_app_restart_with_callback_but_process_quit | AssertionError: "Process unexpectedly quit without restarting" does not match "Process has been unexpectedly closed (Exit code: 0) (Reason: No data received over socket)"
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO - Traceback (most recent call last):
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 214, in run
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO -     testMethod()
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO -   File "/builds/worker/workspace/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py", line 220, in test_in_app_restart_with_callback_but_process_quit
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO -     in_app=True, callback=lambda: self.shutdown(restart=False)
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO - TEST-INFO took 9064ms
[task 2021-02-11T18:40:32.962Z] 18:40:32     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox -no-remote -marionette -profile /tmp/tmpZc69mT.mozrunner
[task 2021-02-11T18:40:33.030Z] 18:40:33     INFO -  [CodeCoverage] Setting handlers for process 9824.
[task 2021-02-11T18:40:34.378Z] 18:40:34     INFO -  DEBUG: Adding blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase xpcom-will-shutdown
[task 2021-02-11T18:40:34.381Z] 18:40:34     INFO -  DEBUG: Adding blocker OS.File: flush I/O queued before profileBeforeChange for phase profile-before-change
[task 2021-02-11T18:40:34.384Z] 18:40:34     INFO -  DEBUG: Adding blocker AddonManager: shutting down. for phase profile-before-change
[task 2021-02-11T18:40:34.453Z] 18:40:34     INFO -  DEBUG: Adding blocker XPIProvider shutdown for phase quit-application-granted
[task 2021-02-11T18:40:34.456Z] 18:40:34     INFO -  DEBUG: Adding blocker XPIProvider for phase AddonManager: Waiting for providers to shut down.
[task 2021-02-11T18:40:34.457Z] 18:40:34     INFO -  DEBUG: Adding blocker PluginProvider for phase AddonManager: Waiting for providers to shut down.
[task 2021-02-11T18:40:34.458Z] 18:40:34     INFO -  DEBUG: Adding blocker ServiceWorkerRegistrar: Flushing data for phase profile-before-change
[task 2021-02-11T18:40:34.459Z] 18:40:34     INFO -  DEBUG: Adding blocker CrashMonitor: Writing notifications to file after receiving profile-before-change for phase OS.File: Waiting for clients before profileBeforeChange
[task 2021-02-11T18:40:34.475Z] 18:40:34     INFO -  1613068834471	Marionette	TRACE	Marionette enabled
[task 2021-02-11T18:40:34.475Z] 18:40:34     INFO -  1613068834472	Marionette	TRACE	Received observer notification profile-after-change
[task 2021-02-11T18:40:34.490Z] 18:40:34     INFO -  DEBUG: Adding blocker TelemetryController: shutting down for phase profile-before-change-telemetry
[task 2021-02-11T18:40:34.506Z] 18:40:34     INFO -  DEBUG: Adding blocker EnvironmentAddonBuilder for phase AddonManager: Waiting to start provider shutdown.
[task 2021-02-11T18:40:34.521Z] 18:40:34     INFO -  DEBUG: Adding blocker RemoteSettingsClient - finish IDB access. for phase profile-before-change
[task 2021-02-11T18:40:34.525Z] 18:40:34     INFO -  1613068834522	Marionette	TRACE	Received observer notification command-line-startup
[task 2021-02-11T18:40:34.601Z] 18:40:34     INFO -  DEBUG: Adding blocker ClientManagerService: start destroying IPC actors early for phase xpcom-will-shutdown
[task 2021-02-11T18:40:34.858Z] 18:40:34     INFO -  DEBUG: Adding blocker MediaShutdownManager: shutdown for phase profile-before-change
[task 2021-02-11T18:40:34.913Z] 18:40:34     INFO -  DEBUG: Adding blocker JSON store: writing data for phase profile-before-change
[task 2021-02-11T18:40:35.138Z] 18:40:35     INFO -  DEBUG: Adding blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.
[task 2021-02-11T18:40:35.146Z] 18:40:35     INFO -  DEBUG: Adding blocker ContentParent: id=7fc5dffc6000 for phase xpcom-will-shutdown
[task 2021-02-11T18:40:35.147Z] 18:40:35     INFO -  DEBUG: Adding blocker ContentParent: id=7fc5dffc6000 for phase profile-before-change
[task 2021-02-11T18:40:35.203Z] 18:40:35     INFO -  DEBUG: Adding blocker ServiceWorkerShutdownBlocker: shutting down Service Workers for phase profile-change-teardown
[task 2021-02-11T18:40:35.364Z] 18:40:35     INFO -  DEBUG: Adding blocker Places Clients shutdown for phase profile-change-teardown
[task 2021-02-11T18:40:35.364Z] 18:40:35     INFO -  DEBUG: Adding blocker Places Connection shutdown for phase profile-before-change```

So this is a recent regression from the last merge from autoland:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=ccov+mn&fromchange=96508d0c910050edc14dd95cd9af86036dac0b90

Can we backfill to figure out which changeset caused the problem?

Flags: needinfo?(sheriffs)

Sure, will post the results after tests finish running.

Flags: needinfo?(sheriffs)

Correcting regressor to bug 1499663.

Regressed by: 1499663
No longer regressed by: 149966

Correcting regressor to bug 1499663.

Component: Marionette → Code Coverage

The failing test: https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py#209 is pretty similar to the next one (test_in_app_restart_with_callback_missing_shutdown) which succeeded.
No data received over socket indicates that we hit this exception:
https://searchfox.org/mozilla-central/source/testing/marionette/client/marionette_driver/transport.py#171.
Switching to clang for ccov may have changed overhead due to ccov counters stuff, so it's possible we need to change the timeout here.
:whimboo, how can I change the socket timeout in tests ?

Flags: needinfo?(cdenizet)

I think it would be fine to bump up the shutdown timeout for the test here:
https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py#213

The problem right now with this test is that in case of a longer shutdown the test fails (as we can see) but doesn't let kick in the HangMonitor to kill the process, and produce a helpful crash report.

Together with the New Session command we actually get the moz:shutdownTimeout capability returned:

https://searchfox.org/mozilla-release/source/testing/marionette/capabilities.js#481-482

[task 2021-02-11T18:40:23.904Z] 18:40:23     INFO -  1613068823901	Marionette	DEBUG	2 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[task 2021-02-11T18:40:23.912Z] 18:40:23     INFO -  1613068823906	Marionette	DEBUG	2 <- [1,1,null,{"sessionId":"53e298d9-8f0c-40bd-abff-055b607910a2","capabilities":{"browserName":"firefox","browserVersion":"87.0a ... mp/tmpZc69mT.mozrunner","moz:shutdownTimeout":60000,"moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true}}]

And since bug 1500242 we update the internal shutdown_timeout value:

https://searchfox.org/mozilla-central/rev/8d9564059dbc1e36fb9152e6b6227343d0b49662/testing/marionette/client/marionette_driver/marionette.py#1200-1203

I assume that for CCOV builds we have a dedicated value set for toolkit.asyncshutdown.crash_timeout?

As such I'm more inclined to completely remove the custom shutdown timeout in the test, and rely on the internal self.shutdown_timeout value if its not None.

Flags: needinfo?(cdenizet)

Needinfo has been moved to bug 1692817.

Flags: needinfo?(cdenizet)

Marco, could you give a reply to comment 7? Thanks.

Flags: needinfo?(mcastelluccio)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #7)

I think it would be fine to bump up the shutdown timeout for the test here:
https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_quit_restart.py#213

The problem right now with this test is that in case of a longer shutdown the test fails (as we can see) but doesn't let kick in the HangMonitor to kill the process, and produce a helpful crash report.

Together with the New Session command we actually get the moz:shutdownTimeout capability returned:

https://searchfox.org/mozilla-release/source/testing/marionette/capabilities.js#481-482

[task 2021-02-11T18:40:23.904Z] 18:40:23     INFO -  1613068823901	Marionette	DEBUG	2 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[task 2021-02-11T18:40:23.912Z] 18:40:23     INFO -  1613068823906	Marionette	DEBUG	2 <- [1,1,null,{"sessionId":"53e298d9-8f0c-40bd-abff-055b607910a2","capabilities":{"browserName":"firefox","browserVersion":"87.0a ... mp/tmpZc69mT.mozrunner","moz:shutdownTimeout":60000,"moz:useNonSpecCompliantPointerOrigin":false,"moz:webdriverClick":true}}]

And since bug 1500242 we update the internal shutdown_timeout value:

https://searchfox.org/mozilla-central/rev/8d9564059dbc1e36fb9152e6b6227343d0b49662/testing/marionette/client/marionette_driver/marionette.py#1200-1203

I assume that for CCOV builds we have a dedicated value set for toolkit.asyncshutdown.crash_timeout?

As such I'm more inclined to completely remove the custom shutdown timeout in the test, and rely on the internal self.shutdown_timeout value if its not None.

I think we don't have a dedicated value, maybe we should.
I see ASan and TSan do have a dedicated value: https://searchfox.org/mozilla-central/rev/002023eb262be9db3479142355e1675645d52d52/modules/libpref/init/all.js#792.

Flags: needinfo?(mcastelluccio)

I filed bug 1697047 to get a dedicated value for toolkit.asyncshutdown.crash_timeout. Once we have that our tests will be more stable.

Component: Code Coverage → Marionette

Fixed by bug 1693288. I will move the remaining work to make tests more stable to a different bug.

Assignee: nobody → dmajor
Status: NEW → RESOLVED
Closed: 4 years ago
Depends on: 1693288
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Product: Testing → Remote Protocol
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in before you can comment on or make changes to this bug.