Open Bug 1630162 Opened 5 years ago Updated 5 days ago

Intermittent Wd TEST-UNEXPECTED-TIMEOUT | expected OK [shutdown hang]

Categories

(Remote Protocol :: Marionette, defect, P5)

Version 3
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure)

Filed by: ccoroiu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=297631286&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/OIV_AGuRTDeuoSBzs4niRQ/runs/0/artifacts/public/logs/live_backing.log


[task 2020-04-15T00:29:36.827Z] 00:29:36 INFO - PID 1496 | 1586910576818 Marionette DEBUG Resetting recommended pref dom.file.createInChild
[task 2020-04-15T00:29:36.827Z] 00:29:36 INFO - PID 1496 | 1586910576818 Marionette DEBUG Resetting recommended pref dom.max_chrome_script_run_time
[task 2020-04-15T00:29:36.828Z] 00:29:36 INFO - PID 1496 | 1586910576819 Marionette DEBUG Resetting recommended pref dom.max_script_run_time
[task 2020-04-15T00:29:36.828Z] 00:29:36 INFO - PID 1496 | 1586910576820 Marionette DEBUG Resetting recommended pref dom.push.connection.enabled
[task 2020-04-15T00:29:36.829Z] 00:29:36 INFO - PID 1496 | 1586910576820 Marionette DEBUG Resetting recommended pref extensions.getAddons.cache.enabled
[task 2020-04-15T00:29:36.830Z] 00:29:36 INFO - PID 1496 | 1586910576820 Marionette DEBUG Resetting recommended pref extensions.webservice.discoverURL
[task 2020-04-15T00:29:36.830Z] 00:29:36 INFO - PID 1496 | 1586910576821 Marionette DEBUG Resetting recommended pref network.http.prompt-temp-redirect
[task 2020-04-15T00:29:36.831Z] 00:29:36 INFO - PID 1496 | 1586910576821 Marionette DEBUG Resetting recommended pref network.http.speculative-parallel-limit
[task 2020-04-15T00:29:36.831Z] 00:29:36 INFO - PID 1496 | 1586910576821 Marionette DEBUG Resetting recommended pref privacy.trackingprotection.enabled
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576821 Marionette DEBUG Resetting recommended pref security.fileuri.strict_origin_policy
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576821 Marionette DEBUG Resetting recommended pref security.notification_enable_delay
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576822 Marionette DEBUG Resetting recommended pref signon.autofillForms
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576823 Marionette DEBUG Resetting recommended pref signon.rememberSignons
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576824 Marionette DEBUG Resetting recommended pref toolkit.cosmeticAnimations.enabled
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | 1586910576826 Marionette DEBUG Marionette stopped listening
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | DEBUG: Starting phase xpcom-will-shutdown
[task 2020-04-15T00:29:36.832Z] 00:29:36 INFO - PID 1496 | DEBUG: Spinning the event loop
[task 2020-04-15T00:29:36.922Z] 00:29:36 INFO - PID 1496 | DEBUG: Finished phase xpcom-will-shutdown
[task 2020-04-15T00:29:36.923Z] 00:29:36 INFO - PID 1496 |
[task 2020-04-15T00:29:36.924Z] 00:29:36 INFO - PID 1496 | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-15T00:29:36.925Z] 00:29:36 INFO - PID 1496 |
[task 2020-04-15T00:29:36.925Z] 00:29:36 INFO - PID 1496 | DEBUG: Starting phase web-workers-shutdown
[task 2020-04-15T00:29:36.926Z] 00:29:36 INFO - PID 1496 | DEBUG: Spinning the event loop
[task 2020-04-15T00:29:36.926Z] 00:29:36 INFO - PID 1496 | DEBUG: Completed blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase web-workers-shutdown
[task 2020-04-15T00:29:36.926Z] 00:29:36 INFO - PID 1496 | DEBUG: Finished phase web-workers-shutdown
[task 2020-04-15T00:30:09.508Z] 00:30:09 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/set_window_rect/user_prompts.py | expected OK
[task 2020-04-15T00:30:09.508Z] 00:30:09 INFO - TEST-INFO took 365004ms
[task 2020-04-15T00:30:09.589Z] 00:30:09 INFO - Closing logging queue
[task 2020-04-15T00:30:09.589Z] 00:30:09 INFO - queue closed
[task 2020-04-15T00:30:09.589Z] 00:30:09 INFO - Starting runner
[task 2020-04-15T00:30:10.019Z] 00:30:10 INFO - PID 2596 | 1586910610002 geckodriver DEBUG Listening on 127.0.0.1:49240
[task 2020-04-15T00:30:10.505Z] 00:30:10 INFO - WebDriver HTTP server listening at http://127.0.0.1:49240/
[task 2020-04-15T00:30:10.505Z] 00:30:10 INFO - TEST-START | /webdriver/tests/status/status.py
[task 2020-04-15T00:30:10.796Z] 00:30:10 INFO - STDOUT: ============================= test session starts =============================
[task 2020-04-15T00:30:10.796Z] 00:30:10 INFO - STDOUT: platform win32 -- Python 2.7.15, pytest-unknown, py-1.5.2, pluggy-0.5.3.dev -- Z:\task_1586905291\build\venv\Scripts\python.exe
[task 2020-04-15T00:30:10.796Z] 00:30:10 INFO - STDOUT: rootdir: Z:\task_1586905291\build, inifile:
[task 2020-04-15T00:30:10.796Z] 00:30:10 INFO - STDOUT: collecting ...
[task 2020-04-15T00:30:10.796Z] 00:30:10 INFO - STDOUT: collected 2 items
[task 2020-04-15T00:30:10.797Z] 00:30:10 INFO - STDOUT: tests/web-platform/tests/webdriver/tests/status/status.py::test_get_status_no_session
[task 2020-04-15T00:30:10.798Z] 00:30:10 INFO - PID 2596 | 1586910610795 webdriver::server DEBUG -> GET /status
[task 2020-04-15T00:30:10.799Z] 00:30:10 INFO - PID 2596 | 1586910610795 webdriver::server DEBUG <- 200 OK {"value":{"message":"","ready":true}}
[task 2020-04-15T00:30:10.799Z] 00:30:10 INFO - STDOUT: PASSED
[task 2020-04-15T00:30:10.799Z] 00:30:10 INFO - STDOUT: tests/web-platform/tests/webdriver/tests/status/status.py::test_status_with_session_running_on_endpoint_node
[task 2020-04-15T00:30:10.806Z] 00:30:10 INFO - PID 2596 | 1586910610798 webdriver::server DEBUG -> POST /session {"capabilities": {"alwaysMatch": {"moz:firefoxOptions": {"binary": "Z:\task_1586905291\build\application\firefox\firefox.exe", "prefs": {"network.process.enabled": false, "toolkit.asyncshutdown.log": true, "network.dns.localDomains": "www1.xn--n8j6ds53lwwkrqhv28a.web-platform.test,op88.web-platform.test,op36.not-web-platform.test,op53.not-web-platform.test,op50.not-web-platform.test,xn--lve-6lad.www.web-platform.test,op98.web-platform.test,op24.not-web-platform.test,op31.not-web-platform.test,op95.not-web-platform.test,op85.web-platform.test,op83.not-web-platform.test,www2.not-web-platform.test,xn--lve-6lad.www.not-web-platform.test,op73.not-web-platform.test,op8.web-platform.test,www2.www2.not-web-platform.test,op89.web-platform.test,op66.web-platform.test,xn--lve-6lad.web-platform.test,op19.not-web-platform.test,www1.www2.web-platform.test,op72.web-platform.test,op24.web-platform.test,op21.not-web-platform.test,xn--lve-6lad.not-web-platform.test,op41.web-platform.test,op79.web-platform.test,op81.not-web-platform.test,op70.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.xn--lve-6lad.not-web-platform.test,op78.not-web-platform.test,op6.not-web-platform.test,www1.www.not-web-platform.test,op40.not-web-platform.test,op25.not-web-platform.test,op3.not-web-platform.test,op65.not-web-platform.test,op91.web-platform.test,www.www2.web-platform.test,op80.not-web-platform.test,op59.web-platform.test,op52.not-web-platform.test,xn--lve-6lad.xn--lve-6lad.web-platform.test,op68.not-web-platform.test,op45.not-web-platform.test,op71.not-web-platform.test,op72.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.www2.web-platform.test,op39.web-platform.test,op90.not-web-platform.test,op60.web-platform.test,op58.web-platform.test,op28.web-platform.test,www1.web-platform.test,xn--n8j6ds53lwwkrqhv28a.xn--lve-6lad.web-platform.test,op14.web-platform.test,op89.not-web-platform.test,op69.web-platform.test,op49.not-web-platform.test,op40.web-platform.test,op2.not-web-platform.test,op5.not-web-platform.test,www.www2.not-web-platform.test,op77.not-web-platform.test,www.xn--n8j6ds53lwwkrqhv28a.web-platform.test,op7.web-platform.test,op74.web-platform.test,op79.not-web-platform.test,op82.not-web-platform.test,www.www1.web-platform.test,op12.not-web-platform.test,op39.not-web-platform.test,op31.web-platform.test,www.not-web-platform.test,www.www.not-web-platform.test,op44.not-web-platform.test,www1.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.www1.web-platform.test,op58.not-web-platform.test,op14.not-web-platform.test,op30.not-web-platform.test,op62.not-web-platform.test,op61.not-web-platform.test,op92.not-web-platform.test,www2.xn--lve-6lad.web-platform.test,op29.not-web-platform.test,op18.web-platform.test,op73.web-platform.test,xn--n8j6ds53lwwkrqhv28a.xn--n8j6ds53lwwkrqhv28a.web-platform.test,op77.web-platform.test,op12.web-platform.test,op54.web-platform.test,op63.web-platform.test,op71.web-platform.test,www2.www1.not-web-platform.test,op95.web-platform.test,op16.web-platform.test,op36.web-platform.test,op27.web-platform.test,www.www.web-platform.test,op98.not-web-platform.test,op64.not-web-platform.test,op29.web-platform.test,op9.web-platform.test,op26.not-web-platform.test,op94.web-platform.test,xn--n8j6ds53lwwkrqhv28a.www2.not-web-platform.test,op44.web-platform.test,op94.not-web-platform.test,op33.web-platform.test,op38.not-web-platform.test,op33.not-web-platform.test,op84.web-platform.test,www1.www1.not-web-platform.test,op23.not-web-platform.test,op57.not-web-platform.test,op54.not-web-platform.test,op85.not-web-platform.test,www2.www2.web-platform.test,op46.not-web-platform.test,op97.not-web-platform.test,op32.web-platform.test,op61.web-platform.test,op70.web-platform.test,www2.web-platform.test,op32.not-web-platform.test,op60.not-web-platform.test,op4.web-platform.test,op43.web-platform.test,op7.not-web-platform.test,op78.web-platform.test,op26.web-platform.test,xn--lve-6lad.xn--n8j6ds53lwwkrqhv28a.web-platform.test,op96.not-web-platform.test,op51.not-web-platform.test,op41.not-web-platform.test,op76.web-platform.test,op52.web-platform.test,op99.web-platform.test,op35.not-web-platform.test,op99.not-web-platform.test,op86.web-platform.test,not-web-platform.test,op42.not-web-platform.test,op46.web-platform.test,op67.not-web-platform.test,op17.web-platform.test,op90.web-platform.test,op93.web-platform.test,op37.not-web-platform.test,op48.not-web-platform.test,op10.web-platform.test,op55.not-web-platform.test,op4.not-web-platform.test,www1.xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,op55.web-platform.test,xn--lve-6lad.www2.web-platform.test,op47.web-platform.test,op51.web-platform.test,op45.web-platform.test,op80.web-platform.test,op68.web-platform.test,op49.web-platform.test,op57.web-platform.test,www2.xn--n8j6ds53lwwkrqhv28a.web-platform.test,www.xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,op56.not-web-platform.test,op35.web-platform.test,op84.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,xn--lve-6lad.xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,op34.not-web-platform.test,op6.web-platform.test,web-platform.test,op67.web-platform.test,op69.not-web-platform.test,op11.not-web-platform.test,op93.not-web-platform.test,www1.www.web-platform.test,op86.not-web-platform.test,op8.not-web-platform.test,www2.xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,op92.web-platform.test,xn--lve-6lad.www1.not-web-platform.test,op15.web-platform.test,op13.not-web-platform.test,op13.web-platform.test,xn--n8j6ds53lwwkrqhv28a.web-platform.test,xn--n8j6ds53lwwkrqhv28a.www.web-platform.test,op75.web-platform.test,op20.not-web-platform.test,op76.not-web-platform.test,op64.web-platform.test,op97.web-platform.test,op37.web-platform.test,op56.web-platform.test,op62.web-platform.test,op82.web-platform.test,op25.web-platform.test,op11.web-platform.test,www.xn--lve-6lad.not-web-platform.test,www2.www1.web-platform.test,op27.not-web-platform.test,op50.web-platform.test,op17.not-web-platform.test,op38.web-platform.test,www2.www.not-web-platform.test,xn--lve-6lad.www1.web-platform.test,op75.not-web-platform.test,op83.web-platform.test,op81.web-platform.test,op15.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.www.not-web-platform.test,op20.web-platform.test,op3.web-platform.test,www1.www2.not-web-platform.test,xn--n8j6ds53lwwkrqhv28a.xn--n8j6ds53lwwkrqhv28a.not-web-platform.test,op2.web-platform.test,op21.web-platform.test,op23.web-platform.test,op42.web-platform.test,op47.not-web-platform.test,www1.www1.web-platform.test,op18.not-web-platform.test,op22.web-platform.test,xn--lve-6lad.xn--lve-6lad.not-web-platform.test,op63.not-web-platform.test,op28.not-web-platform.test,op22.not-web-platform.test,op65.web-platform.test,www.www1.not-web-platform.test,www1.xn--lve-6lad.web-platform.test,op43.not-web-platform.test,op66.not-web-platform.test,www2.www.web-platform.test,op96.web-platform.test,op91.not-web-platform.test,www.xn--lve-6lad.web-platform.test,op1.web-platform.test,op74.not-web-platform.test,op87.web-platform.test,op59.not-web-platform.test,op19.web-platform.test,xn--n8j6ds53lwwkrqhv28a.www1.not-web-platform.test,op9.not-web-platform.test,op88.not-web-platform.test,op53.web-platform.test,www2.xn--lve-6lad.not-web-platform.test,op87.not-web-platform.test,op30.web-platform.test,op10.not-web-platform.test,op48.web-platform.test,op16.not-web-platform.test,op34.web-platform.test,op1.not-web-platform.test,www.web-platform.test,op5.web-platform.test,www1.xn--lve-6lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test", "media.peerconnection.mtransport_process": false}}}}}
[task 2020-04-15T00:30:10.811Z] 00:30:10 INFO - PID 2596 | 1586910610803 mozrunner::runner INFO Running command: "Z:\task_1586905291\build\application\firefox\firefox.exe" "-marionette" "-foreground" "-no-remote" "-profile" "C:\Users\task_1586905291\AppData\Local\Temp\rust_mozprofileCej2NI"
[task 2020-04-15T00:30:10.811Z] 00:30:10 INFO - PID 2596 | 1586910610804 geckodriver::marionette DEBUG Waiting 60s to connect to browser on 127.0.0.1:2828
[task 2020-04-15T00:30:11.220Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase web-workers-shutdown
[task 2020-04-15T00:30:11.220Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker OS.File: flush I/O queued before profileBeforeChange for phase profile-before-change
[task 2020-04-15T00:30:11.225Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker AddonManager: shutting down. for phase profile-before-change
[task 2020-04-15T00:30:11.312Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker ClientManagerService: start destroying IPC actors early for phase xpcom-will-shutdown
[task 2020-04-15T00:30:11.497Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker Flush WebExtension StartupCache for phase profile-before-change
[task 2020-04-15T00:30:11.785Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker JSON store: writing data for phase profile-before-change
[task 2020-04-15T00:30:11.919Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker JSON store: writing data for phase AddonManager: Waiting for providers to shut down.
[task 2020-04-15T00:30:11.919Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker Update add-on blocklist state into add-on DB for phase AddonManager: Waiting to start provider shutdown.
[task 2020-04-15T00:30:11.919Z] 00:30:11 INFO - PID 2596 | DEBUG: Completed blocker Update add-on blocklist state into add-on DB for phase AddonManager: Waiting to start provider shutdown.
[task 2020-04-15T00:30:11.941Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker XPIProvider shutdown for phase quit-application-granted
[task 2020-04-15T00:30:11.941Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker XPIProvider for phase AddonManager: Waiting for providers to shut down.
[task 2020-04-15T00:30:11.941Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker PluginProvider for phase AddonManager: Waiting for providers to shut down.
[task 2020-04-15T00:30:11.960Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker ServiceWorkerRegistrar: Flushing data for phase profile-before-change
[task 2020-04-15T00:30:11.960Z] 00:30:11 INFO - PID 2596 | DEBUG: Adding blocker CrashMonitor: Writing notifications to file after receiving profile-before-change for phase OS.File: Waiting for clients before profileBeforeChange
[task 2020-04-15T00:30:12.087Z] 00:30:12 INFO - PID 2596 | DEBUG: Adding blocker TelemetryController: shutting down for phase profile-before-change-telemetry
[task 2020-04-15T00:30:12.117Z] 00:30:12 INFO - PID 2596 | DEBUG: Adding blocker EnvironmentAddonBuilder for phase AddonManager: Waiting to start provider shutdown.
[task 2020-04-15T00:30:12.156Z] 00:30:12 INFO - PID 2596 | DEBUG: Adding blocker RemoteSettingsClient - finish IDB access. for phase profile-before-change
[task 2020-04-15T00:30:12.156Z] 00:30:12 INFO - PID 2596 | 1586910612149 Marionette TRACE Marionette enabled
[task 2020-04-15T00:30:12.322Z] 00:30:12 INFO - PID 2596 | 1586910612319 Marionette TRACE Received observer notification toplevel-window-ready
[task 2020-04-15T00:30:12.711Z] 00:30:12 INFO - PID 2596 | DEBUG: Adding blocker MediaShutdownManager: shutdown for phase profile-before-change
[task 2020-04-15T00:30:12.957Z] 00:30:12 INFO - PID 2596 | DEBUG: Adding blocker ServiceWorkerShutdownBlocker: shutting down Service Workers for phase profile-change-teardown
[task 2020-04-15T00:30:13.059Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker JSON store: writing data for phase profile-before-change
[task 2020-04-15T00:30:13.102Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker JSON store: writing data for phase profile-before-change
[task 2020-04-15T00:30:13.102Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker Flush AddonRepository for phase profile-before-change
[task 2020-04-15T00:30:13.145Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.
[task 2020-04-15T00:30:13.145Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker PluginFinder: finding plugins to load for phase profile-change-teardown
[task 2020-04-15T00:30:13.296Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker Remote Settings profile-before-change for phase profile-before-change
[task 2020-04-15T00:30:13.785Z] 00:30:13 INFO - PID 2596 | DEBUG: Adding blocker MediaShutdownManager: shutdown for phase xpcom-will-shutdown
[task 2020-04-15T00:30:14.172Z] 00:30:14 INFO - PID 2596 | DEBUG: Adding blocker Places Clients shutdown for phase profile-change-teardown
[task 2020-04-15T00:30:14.172Z] 00:30:14 INFO - PID 2596 | DEBUG: Adding blocker Places Connection shutdown for phase profile-before-change
[task 2020-04-15T00:30:14.231Z] 00:30:14 INFO - PID 2596 | DEBUG: Adding blocker PageActions: purging unregistered actions from cache for phase profile-before-change
[task 2020-04-15T00:30:14.453Z] 00:30:14 INFO - PID 2596 | DEBUG: Adding blocker Sqlite.jsm shutdown blocker for phase profile-before-change

There is a 1 minute shutdown delay after phase web-workers-shutdown:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=297631286&repo=mozilla-central&lineNumber=78424-78425

[task 2020-04-15T00:28:28.248Z] 00:28:28 INFO - PID 1496 | DEBUG: Finished phase web-workers-shutdown
[task 2020-04-15T00:29:28.890Z] 00:29:28 INFO - PID 1496 | 1586910568880 geckodriver::marionette DEBUG Browser process stopped: exit code: 0

Andrea, any idea what this could be?

Flags: needinfo?(amarchesini)

asuth do you have time to take a look? Thanks!

Flags: needinfo?(amarchesini) → needinfo?(bugmail)

I just noticed that this failure is for a ccov build. Maybe there is some special logic, and Marco might know.

Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/set_window_rect/user_prompts.py | expected OK → Intermittent ccov TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/set_window_rect/user_prompts.py | expected OK [hang after finished phase web-workers-shutdown]

The web workers (sub-)phase happens during RuntimeService::Shutdown being invoked by the "xpcom-shutdown" notification. This suggests something else in "xpcom-shutdown" is still doing things? Ideally a crash would be induced in the browser to see what was on the stack.

Flags: needinfo?(bugmail)

Sadly we cannot detect crashes yet for these type of tests due to bug 1490906. That means we might not have a chance to continue investigation for now.

Depends on: 1490906

Ccov is pretty slow at shutdown, as that's when it writes out the coverage counters. When we'll be able to induce a crash we can check if the process was in the process of writing coverage files.

See Also: → 1627394

This is not only happening for ccov builds but also shippable opt. See the recently duped bug.

Summary: Intermittent ccov TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/set_window_rect/user_prompts.py | expected OK [hang after finished phase web-workers-shutdown] → Intermittent Wd TEST-UNEXPECTED-TIMEOUT | expected OK [hang after finished phase web-workers-shutdown]
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

The shutdown hang as port of the failure logged as bug 1698259 is even longer and takes nearly 3 minutes.

Gabriele, isn't the shutdown hang monitor to kick in and crash Firefox eventually? Why doesn't that happen here? If you are the wrong person to ask please forward appropriately.

Flags: needinfo?(gsvelto)
Summary: Intermittent Wd TEST-UNEXPECTED-TIMEOUT | expected OK [hang after finished phase web-workers-shutdown] → Intermittent Wd TEST-UNEXPECTED-TIMEOUT | expected OK [shutdown hang]

It depends on the build. The normal timeout is 60 seconds but it can be bumped to 180 or even 360 depending on the build, see here. However IIRC every time we start a new shutdown phase the timer is reset to 0 and we have quite a few phases, so if they're all slow the over shutdown may last for significantly longer than the timeout w/o causing a crash.

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #17)

However IIRC every time we start a new shutdown phase the timer is reset to 0 and we have quite a few phases, so if they're all slow the over shutdown may last for significantly longer than the timeout w/o causing a crash.

Huh, is that really the case? We are using toolkit.asyncshutdown.crash_timeout in Marionette/geckodriver to determine when to actually force-kill the application if time exceeded by extra 10s on top. Do you know who can say for sure how that timer is working?

Flags: needinfo?(gsvelto)

Looking at the code we compare gHeartbeat to the running timer. Whenever we advance to a new shutdown phase we invoke UpdateHeartbeat() that resets the heartbeat to zero.

Or at least that's how I understand this code. BTW I just noticed that this code was recently changed in bug 1689953.

Flags: needinfo?(gsvelto)

Hm ok. Thinking a bit more of it I believe we should keep that force-kill after the 60s. If there is indeed a long shutdown due to some hanging phases we would like to know about it. And killing the process would at least give us a chance to see how often it happens.

Would there be a way to kill the process and get a proper crash stack with the hanging phase? If yes, that would be perfect.

Maybe we need mozcrash.kill_and_get_minidump() under such a condition and not just call mozrunner.stop().

That would do the trick.

Thanks! I filed bug 1698937 to get this implemented.

Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
  • windows10-64-2004-ccov-qr: 18
    • opt: 18

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1630162&startday=2022-08-01&endday=2022-08-07&tree=all

These failures are actually all about bug 1783217 and should be re-classified.

Flags: needinfo?(sheriffs)
Flags: needinfo?(sheriffs)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #78)

  • windows10-64-2004-ccov-qr: 18
    • opt: 18

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1630162&startday=2022-08-01&endday=2022-08-07&tree=all

These failures are actually all about bug 1783217 and should be re-classified.

Those have been reclassified.

(In reply to Intermittent Failures Robot from comment #85)

Platform and build breakdown:

  • windows10-64-2004-ccov-qr: 13
    • opt: 13

All these failures are wrongly classified and should have classified against bug 1783217. I asked the sheriffs to correct that.

(In reply to Intermittent Failures Robot from comment #89)

Platform and build breakdown:

  • windows10-64-2004-ccov-qr: 9

These failures are all about bug 1783217. I requested to re-classify the failures.

Failures are reclassified.

Severity: normal → S3

Most of the failures from the last two weeks are CCOV related and as such have been wrongly classified again. I've asked the sheriffs to correct those entries to bug 1783217.

(In reply to Intermittent Failures Robot from comment #103)

Platform and build breakdown:

  • android-em-7-0-x86_64-qr: 9
    • debug-isolated-process: 8
    • opt: 1
  • android-em-7-0-x86_64-shippable-qr: 3
    • opt: 3

All of these were wrongly classified and mostly are bug 1804027. I've re-classified.

(In reply to Intermittent Failures Robot from comment #105)

12 failures in 3510 pushes (0.003 failures/push) were associated with this bug in the last 7 days.

Platform and build breakdown:

  • windows10-64-2004-ccov-qr: 7
    • opt: 7

All these CCOV related failures from the last week are actually related to bug 1783217. Can those be re-classified please? Thanks.

Flags: needinfo?(sheriffs)

Redirect a needinfo that is pending on an inactive user to the triage owner.
:whimboo, since the bug has recent activity, could you please find another way to get the information or close the bug as INCOMPLETE if it is not actionable?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sheriffs) → needinfo?(hskupin)

Sorry, the bot was not aware of the sheriffs@mozilla.bugs account.

Flags: needinfo?(hskupin) → needinfo?(sheriffs)

The failures are re-classified.

Flags: needinfo?(sheriffs)

(In reply to Intermittent Failures Robot from comment #115)

Platform and build breakdown:

  • android-em-7-0-x86_64-qr: 10
    • debug-isolated-process: 10
  • android-em-7-0-x86_64-shippable-qr: 3
    • opt: 3

All these Android specific failures including others from former weeks are actually related to bug 1804027. Could someone please re-classify? Thanks.

Flags: needinfo?(sheriffs)

All the above are now re-classified. Sorry for the inconvenience.

Flags: needinfo?(sheriffs)
Duplicate of this bug: 1815450
Moving bug to Remote Protocol::Marionette component per bug 1815831.
Component: geckodriver → Marionette
Product: Testing → Remote Protocol
Duplicate of this bug: 1819734
Duplicate of this bug: 1769749
Duplicate of this bug: 1822079
Duplicate of this bug: 1822081
Duplicate of this bug: 1827331
Duplicate of this bug: 1833032

(In reply to Intermittent Failures Robot from comment #146)

31 failures in 4079 pushes (0.008 failures/push) were associated with this bug in the last 7 days.

Most of these failures have been wrongly classified and should actually have been flagged for bug 1540367. I corrected all the classifications.

Duplicate of this bug: 1840183
Duplicate of this bug: 1845329
Duplicate of this bug: 1848314

There are quite a few TSAN related shutdown hangs recently. With the fix on bug 1848121 we should see them most likely go away or at least show up with a way lower rate.

Depends on: 1848121
Duplicate of this bug: 1850776
Duplicate of this bug: 1856784
You need to log in before you can comment on or make changes to this bug.