Closed Bug 1783217 Opened 2 years ago Closed 2 months ago

Intermittent [tier 2] CCOV Windows Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - WARNING: Process <PID> may be hanging at shutdown; will wait for up to <TIME>ms)

Categories

(Testing :: Code Coverage, defect, P5)

Desktop
Windows
defect

Tracking

(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox103 unaffected, firefox104 unaffected, firefox105 disabled)

RESOLVED INCOMPLETE
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox103 --- unaffected
firefox104 --- unaffected
firefox105 --- disabled

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, regression, regressionwindow-wanted)

Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=386385559&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d6BAuziSSrS-ZQkQ0R69xg/runs/0/artifacts/public/logs/live_backing.log


[task 2022-08-04T11:53:09.892Z] 11:53:09     INFO - TEST-START | /webdriver/tests/get_named_cookie/get.py
[task 2022-08-04T11:53:10.270Z] 11:53:10     INFO - STDOUT: ============================= test session starts =============================
[task 2022-08-04T11:53:10.277Z] 11:53:10     INFO - STDOUT: platform win32 -- Python 3.9.5, pytest-7.0.1, pluggy-unknown -- Z:\task_165961199010716\build\venv\Scripts\python.exe
[task 2022-08-04T11:53:10.279Z] 11:53:10     INFO - STDOUT: rootdir: Z:\task_165961199010716\build
[task 2022-08-04T11:53:10.279Z] 11:53:10     INFO - STDOUT: plugins: asyncio-0.14.0
[task 2022-08-04T11:53:10.280Z] 11:53:10     INFO - STDOUT: collecting ... 
[task 2022-08-04T11:53:10.323Z] 11:53:10     INFO - STDOUT: collected 8 items
[task 2022-08-04T11:53:10.333Z] 11:53:10     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/get_named_cookie/get.py::test_no_top_browsing_context 
[task 2022-08-04T11:53:10.342Z] 11:53:10     INFO - PID 6956 | 1659613990340	webdriver::server	DEBUG	-> POST /session {"capabilities": {"alwaysMatch": {"moz:firefoxOptions": {"args": ["--profile", "C:\\Users\\task_165961199010716\\AppData\\Local\\Temp\\tmpye5a7axu.mozrunner"], "binary": "Z:\\task_165961199010716\\build\\application\\firefox\\firefox.exe"}}}}
[task 2022-08-04T11:53:10.343Z] 11:53:10     INFO - PID 6956 | 1659613990340	geckodriver::capabilities	DEBUG	Trying to read firefox version from ini files
[task 2022-08-04T11:53:10.343Z] 11:53:10     INFO - PID 6956 | 1659613990341	geckodriver::capabilities	DEBUG	Found version 105.0a1
<...>
[task 2022-08-04T11:53:50.826Z] 11:53:50     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/get_named_cookie/get.py::test_get_cookie_with_same_site_flag[capabilities0-Strict] 
[task 2022-08-04T11:53:50.828Z] 11:53:50     INFO - PID 6956 | 1659614030826	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"width": 800, "height": 600}
[task 2022-08-04T11:53:50.828Z] 11:53:50     INFO - PID 6956 | 1659614030827	Marionette	DEBUG	0 -> [0,42,"WebDriver:SetWindowRect",{"height":600,"width":800}]
[task 2022-08-04T11:53:50.829Z] 11:53:50     INFO - PID 6956 | 1659614030827	Marionette	TRACE	Requested window geometry matches
[task 2022-08-04T11:53:50.829Z] 11:53:50     INFO - PID 6956 | 1659614030828	Marionette	DEBUG	0 <- [1,42,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.831Z] 11:53:50     INFO - PID 6956 | 1659614030828	webdriver::server	DEBUG	<- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.831Z] 11:53:50     INFO - PID 6956 | 1659614030829	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"x": 100, "y": 100}
[task 2022-08-04T11:53:50.831Z] 11:53:50     INFO - PID 6956 | 1659614030829	Marionette	DEBUG	0 -> [0,43,"WebDriver:SetWindowRect",{"x":100,"y":100}]
[task 2022-08-04T11:53:50.832Z] 11:53:50     INFO - PID 6956 | 1659614030830	Marionette	TRACE	Requested window geometry matches
[task 2022-08-04T11:53:50.833Z] 11:53:50     INFO - PID 6956 | 1659614030830	Marionette	DEBUG	0 <- [1,43,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.834Z] 11:53:50     INFO - PID 6956 | 1659614030830	webdriver::server	DEBUG	<- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.835Z] 11:53:50     INFO - PID 6956 | 1659614030832	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/url {"url": "https://web-platform.test:8443/common/blank.html"}
[task 2022-08-04T11:53:50.836Z] 11:53:50     INFO - PID 6956 | 1659614030832	Marionette	DEBUG	0 -> [0,44,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/common/blank.html"}]
[task 2022-08-04T11:53:50.836Z] 11:53:50     INFO - PID 6956 | 1659614030836	Marionette	TRACE	[11] Received event beforeunload for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.856Z] 11:53:50     INFO - PID 6956 | 1659614030858	Marionette	TRACE	[11] Received event pagehide for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.860Z] 11:53:50     INFO - PID 6956 | 1659614030874	Marionette	TRACE	[11] Received event DOMContentLoaded for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.876Z] 11:53:50     INFO - PID 6956 | 1659614030874	Marionette	TRACE	[11] Received event pageshow for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.877Z] 11:53:50     INFO - PID 6956 | 1659614030876	Marionette	DEBUG	0 <- [1,44,null,{"value":null}]
[task 2022-08-04T11:53:50.885Z] 11:53:50     INFO - PID 6956 | 1659614030884	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.887Z] 11:53:50     INFO - PID 6956 | 1659614030886	webdriver::server	DEBUG	-> DELETE /session/28a9425e-9164-4561-bea7-231487378b51/cookie
[task 2022-08-04T11:53:50.887Z] 11:53:50     INFO - PID 6956 | 1659614030887	Marionette	DEBUG	0 -> [0,45,"WebDriver:DeleteAllCookies",{}]
[task 2022-08-04T11:53:50.888Z] 11:53:50     INFO - PID 6956 | 1659614030888	Marionette	DEBUG	0 <- [1,45,null,{"value":null}]
[task 2022-08-04T11:53:50.889Z] 11:53:50     INFO - PID 6956 | 1659614030888	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.890Z] 11:53:50     INFO - PID 6956 | 1659614030889	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/execute/sync {"script": "document.cookie = 'foo=bar;Secure;SameSite=Strict'", "args": []}
[task 2022-08-04T11:53:50.891Z] 11:53:50     INFO - PID 6956 | 1659614030890	Marionette	DEBUG	0 -> [0,46,"WebDriver:ExecuteScript",{"args":[],"script":"document.cookie = 'foo=bar;Secure;SameSite=Strict'"}]
[task 2022-08-04T11:53:50.892Z] 11:53:50     INFO - PID 6956 | 1659614030891	Marionette	TRACE	[11] MarionetteCommands actor created for window id 8589934595
[task 2022-08-04T11:53:50.895Z] 11:53:50     INFO - PID 6956 | 1659614030895	Marionette	DEBUG	0 <- [1,46,null,{"value":null}]
[task 2022-08-04T11:53:50.897Z] 11:53:50     INFO - PID 6956 | 1659614030895	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.897Z] 11:53:50     INFO - PID 6956 | 1659614030897	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/cookie/foo
[task 2022-08-04T11:53:50.898Z] 11:53:50     INFO - PID 6956 | 1659614030897	Marionette	DEBUG	0 -> [0,47,"WebDriver:GetCookies",{}]
[task 2022-08-04T11:53:50.899Z] 11:53:50     INFO - PID 6956 | 1659614030898	Marionette	DEBUG	0 <- [1,47,null,[{"name":"foo","value":"bar","path":"/common","domain":"web-platform.test","secure":true,"httpOnly":false,"sameSite":"Strict"}]]
[task 2022-08-04T11:53:50.900Z] 11:53:50     INFO - PID 6956 | 1659614030898	webdriver::server	DEBUG	<- 200 OK {"value":{"name":"foo","value":"bar","path":"/common","domain":"web-platform.test","secure":true,"httpOnly":false,"sameSite":"Strict"}}
[task 2022-08-04T11:53:50.901Z] 11:53:50     INFO - STDOUT: PASSED
[task 2022-08-04T11:53:50.902Z] 11:53:50     INFO - PID 6956 | 1659614030901	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"implicit": 0}
[task 2022-08-04T11:53:50.903Z] 11:53:50     INFO - PID 6956 | 1659614030902	Marionette	DEBUG	0 -> [0,48,"WebDriver:SetTimeouts",{"implicit":0}]
[task 2022-08-04T11:53:50.903Z] 11:53:50     INFO - PID 6956 | 1659614030902	Marionette	DEBUG	0 <- [1,48,null,{"value":null}]
[task 2022-08-04T11:53:50.904Z] 11:53:50     INFO - PID 6956 | 1659614030902	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.904Z] 11:53:50     INFO - PID 6956 | 1659614030903	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"pageLoad": 300000}
[task 2022-08-04T11:53:50.905Z] 11:53:50     INFO - PID 6956 | 1659614030904	Marionette	DEBUG	0 -> [0,49,"WebDriver:SetTimeouts",{"pageLoad":300000}]
[task 2022-08-04T11:53:50.906Z] 11:53:50     INFO - PID 6956 | 1659614030905	Marionette	DEBUG	0 <- [1,49,null,{"value":null}]
[task 2022-08-04T11:53:50.906Z] 11:53:50     INFO - PID 6956 | 1659614030905	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.907Z] 11:53:50     INFO - PID 6956 | 1659614030906	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"script": 30000}
[task 2022-08-04T11:53:50.913Z] 11:53:50     INFO - PID 6956 | 1659614030913	Marionette	DEBUG	0 -> [0,50,"WebDriver:SetTimeouts",{"script":30000}]
[task 2022-08-04T11:53:50.914Z] 11:53:50     INFO - PID 6956 | 1659614030913	Marionette	DEBUG	0 <- [1,50,null,{"value":null}]
[task 2022-08-04T11:53:50.914Z] 11:53:50     INFO - PID 6956 | 1659614030914	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.915Z] 11:53:50     INFO - PID 6956 | 1659614030915	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.916Z] 11:53:50     INFO - PID 6956 | 1659614030916	Marionette	DEBUG	0 -> [0,51,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.917Z] 11:53:50     INFO - PID 6956 | 1659614030917	Marionette	DEBUG	0 <- [1,51,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.918Z] 11:53:50     INFO - PID 6956 | 1659614030917	webdriver::server	DEBUG	<- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.919Z] 11:53:50     INFO - PID 6956 | 1659614030918	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.919Z] 11:53:50     INFO - PID 6956 | 1659614030919	Marionette	DEBUG	0 -> [0,52,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.920Z] 11:53:50     INFO - PID 6956 | 1659614030919	Marionette	DEBUG	0 <- [1,52,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.920Z] 11:53:50     INFO - PID 6956 | 1659614030920	webdriver::server	DEBUG	<- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.921Z] 11:53:50     INFO - PID 6956 | 1659614030920	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/window/handles
[task 2022-08-04T11:53:50.922Z] 11:53:50     INFO - PID 6956 | 1659614030921	Marionette	DEBUG	0 -> [0,53,"WebDriver:GetWindowHandles",{}]
[task 2022-08-04T11:53:50.923Z] 11:53:50     INFO - PID 6956 | 1659614030921	Marionette	DEBUG	0 <- [1,53,null,["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]]
[task 2022-08-04T11:53:50.923Z] 11:53:50     INFO - PID 6956 | 1659614030922	webdriver::server	DEBUG	<- 200 OK {"value":["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]}
[task 2022-08-04T11:53:50.925Z] 11:53:50     INFO - PID 6956 | 1659614030922	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.925Z] 11:53:50     INFO - PID 6956 | 1659614030923	Marionette	DEBUG	0 -> [0,54,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.926Z] 11:53:50     INFO - PID 6956 | 1659614030924	Marionette	DEBUG	0 <- [1,54,null,{"value":null}]
[task 2022-08-04T11:53:50.926Z] 11:53:50     INFO - PID 6956 | 1659614030925	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.927Z] 11:53:50     INFO - PID 6956 | 1659614030925	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/alert/dismiss {}
[task 2022-08-04T11:53:50.928Z] 11:53:50     INFO - PID 6956 | 1659614030926	Marionette	DEBUG	0 -> [0,55,"WebDriver:DismissAlert",{}]
[task 2022-08-04T11:53:50.929Z] 11:53:50     INFO - PID 6956 | 1659614030927	Marionette	DEBUG	0 <- [1,55,{"error":"no such alert","message":"","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1\nWe ... ote/content/marionette/server.js:251:9\n_onJSONObjectReady/<@chrome://remote/content/marionette/transport.js:502:20\n"},null]
[task 2022-08-04T11:53:50.930Z] 11:53:50     INFO - PID 6956 | 1659614030927	webdriver::server	DEBUG	<- 404 Not Found {"value":{"error":"no such alert","message":"","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1\nWebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5\nNoSuchAlertError@chrome://remote/content/shared/webdriver/Errors.jsm:393:5\nGeckoDriver.prototype._checkIfAlertIsPresent@chrome://remote/content/marionette/driver.js:2665:11\nGeckoDriver.prototype.dismissDialog@chrome://remote/content/marionette/driver.js:2572:8\ndespatch@chrome://remote/content/marionette/server.js:304:40\nexecute@chrome://remote/content/marionette/server.js:277:16\nonPacket/<@chrome://remote/content/marionette/server.js:250:20\nonPacket@chrome://remote/content/marionette/server.js:251:9\n_onJSONObjectReady/<@chrome://remote/content/marionette/transport.js:502:20\n"}}
[task 2022-08-04T11:53:50.932Z] 11:53:50     INFO - PID 6956 | 1659614030928	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.933Z] 11:53:50     INFO - PID 6956 | 1659614030929	Marionette	DEBUG	0 -> [0,56,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.934Z] 11:53:50     INFO - PID 6956 | 1659614030929	Marionette	DEBUG	0 <- [1,56,null,{"value":null}]
[task 2022-08-04T11:53:50.934Z] 11:53:50     INFO - PID 6956 | 1659614030930	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.934Z] 11:53:50     INFO - PID 6956 | 1659614030930	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.935Z] 11:53:50     INFO - PID 6956 | 1659614030931	Marionette	DEBUG	0 -> [0,57,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.936Z] 11:53:50     INFO - PID 6956 | 1659614030931	Marionette	DEBUG	0 <- [1,57,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.937Z] 11:53:50     INFO - PID 6956 | 1659614030932	webdriver::server	DEBUG	<- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.938Z] 11:53:50     INFO - PID 6956 | 1659614030932	webdriver::server	DEBUG	-> GET /session/28a9425e-9164-4561-bea7-231487378b51/window/handles
[task 2022-08-04T11:53:50.939Z] 11:53:50     INFO - PID 6956 | 1659614030933	Marionette	DEBUG	0 -> [0,58,"WebDriver:GetWindowHandles",{}]
[task 2022-08-04T11:53:50.939Z] 11:53:50     INFO - PID 6956 | 1659614030934	Marionette	DEBUG	0 <- [1,58,null,["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]]
[task 2022-08-04T11:53:50.939Z] 11:53:50     INFO - PID 6956 | 1659614030934	webdriver::server	DEBUG	<- 200 OK {"value":["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]}
[task 2022-08-04T11:53:50.939Z] 11:53:50     INFO - PID 6956 | 1659614030935	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.943Z] 11:53:50     INFO - PID 6956 | 1659614030935	Marionette	DEBUG	0 -> [0,59,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.943Z] 11:53:50     INFO - PID 6956 | 1659614030936	Marionette	DEBUG	0 <- [1,59,null,{"value":null}]
[task 2022-08-04T11:53:50.943Z] 11:53:50     INFO - PID 6956 | 1659614030937	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.945Z] 11:53:50     INFO - PID 6956 | 1659614030938	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"width": 800, "height": 600}
[task 2022-08-04T11:53:50.946Z] 11:53:50     INFO - PID 6956 | 1659614030939	Marionette	DEBUG	0 -> [0,60,"WebDriver:SetWindowRect",{"height":600,"width":800}]
[task 2022-08-04T11:53:50.947Z] 11:53:50     INFO - PID 6956 | 1659614030939	Marionette	TRACE	Requested window geometry matches
[task 2022-08-04T11:53:50.947Z] 11:53:50     INFO - PID 6956 | 1659614030939	Marionette	DEBUG	0 <- [1,60,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.948Z] 11:53:50     INFO - PID 6956 | 1659614030940	webdriver::server	DEBUG	<- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.949Z] 11:53:50     INFO - PID 6956 | 1659614030941	webdriver::server	DEBUG	-> POST /session/28a9425e-9164-4561-bea7-231487378b51/frame {"id": null}
[task 2022-08-04T11:53:50.950Z] 11:53:50     INFO - PID 6956 | 1659614030942	Marionette	DEBUG	0 -> [0,61,"WebDriver:SwitchToFrame",{"id":null}]
[task 2022-08-04T11:53:50.950Z] 11:53:50     INFO - PID 6956 | 1659614030945	Marionette	DEBUG	0 <- [1,61,null,{"value":null}]
[task 2022-08-04T11:53:50.952Z] 11:53:50     INFO - PID 6956 | 1659614030945	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-08-04T11:53:50.952Z] 11:53:50     INFO - PID 6956 | 1659614030949	webdriver::server	DEBUG	-> DELETE /session/28a9425e-9164-4561-bea7-231487378b51
[task 2022-08-04T11:53:50.954Z] 11:53:50     INFO - PID 6956 | 1659614030950	Marionette	DEBUG	0 -> [0,62,"Marionette:Quit",{"flags":["eForceQuit"]}]
[task 2022-08-04T11:53:50.954Z] 11:53:50     INFO - PID 6956 | 1659614030951	Marionette	INFO	Stopped listening on port 58213
[task 2022-08-04T11:53:50.960Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.964Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e5f75600 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.965Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e67b9f00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.965Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6a00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.967Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.968Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.969Z] 11:53:50     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:51.059Z] 11:53:51     INFO - PID 6956 | 1659614031065	Marionette	TRACE	Received observer notification quit-application
[task 2022-08-04T11:53:51.076Z] 11:53:51     INFO - PID 6956 | 1659614031074	Marionette	TRACE	Received observer notification quit-application
[task 2022-08-04T11:53:51.076Z] 11:53:51     INFO - PID 6956 | 1659614031075	Marionette	DEBUG	Marionette stopped listening
[task 2022-08-04T11:53:51.078Z] 11:53:51     INFO - PID 6956 | 1659614031077	Marionette	DEBUG	0 <- [1,62,null,{"cause":"shutdown","forced":false}]
[task 2022-08-04T11:53:51.112Z] 11:53:51     INFO - PID 6956 | 1659614031118	webdriver::server	DEBUG	Teardown session
[task 2022-08-04T11:53:51.125Z] 11:53:51     INFO - PID 6956 | 1659614031124	Marionette	DEBUG	Closed connection 0
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.196Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51     INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.312Z] 11:53:51     INFO - PID 6956 | JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
[task 2022-08-04T11:53:53.661Z] 11:53:53     INFO - PID 6956 | Waiting in WillDestroyCurrentMessageLoop for pid 8
[task 2022-08-04T11:54:04.904Z] 11:54:04     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-08-04T11:54:04.904Z] 11:54:04     INFO - TEST-INFO took 55012ms
[task 2022-08-04T11:54:04.974Z] 11:54:04     INFO - Closing logging queue
[task 2022-08-04T11:54:04.987Z] 11:54:04     INFO - queue closed
[task 2022-08-04T11:54:05.499Z] 11:54:05     INFO - PID 7632 | 1659614045004	geckodriver	INFO	Listening on 127.0.0.1:49920
[task 2022-08-04T11:54:05.502Z] 11:54:05     INFO - Starting runner
[task 2022-08-04T11:54:06.185Z] 11:54:06     INFO - TEST-START | /webdriver/tests/get_named_cookie/user_prompts.py
Regressed by: 1783018

Can you look into these unexpected webdriver results on Windows code coverage or should an expectation update get pushed?

This is a hang during shutdown of Firefox. The quit-application observer notification is sent out right away but we seem to hang in the following code:

https://treeherder.mozilla.org/logviewer?job_id=386385559&repo=mozilla-central&lineNumber=42247-42249

[task 2022-08-04T11:53:53.661Z] 11:53:53     INFO - PID 6956 | Waiting in WillDestroyCurrentMessageLoop for pid 8
[task 2022-08-04T11:54:04.904Z] 11:54:04     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-08-04T11:54:04.904Z] 11:54:04     INFO - TEST-INFO took 55012ms

As such I assume some other patch in the set of landed commits for the following merge might have caused this issue for CCOV Firefox builds:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=8f150040b037fc9f7b6039b84a1b7806e8f6a77d

Marco, do you have an idea what could have been caused this regression?

Component: geckodriver → Code Coverage
Flags: needinfo?(hskupin) → needinfo?(mcastelluccio)

Looks like that Marco is out at the moment. Sebastian, could sheriffs maybe do a backfill on autoland or is CCOV not available there?

Flags: needinfo?(aryx.bugmail)

I think that this is actually a dupe given that this failure is present for quite some time as the pushlog shows:

https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=wd%2Cccov%2Cwin%2Cheadless&tochange=8f150040b037fc9f7b6039b84a1b7806e8f6a77d&fromchange=335652eb938ddb0101b016b8e29b60feccdd24eb

Failures have been classified to quite a lot of different bugs and given that I just came back from PTO I don't have the overview yet.

Most of these failures have actually been classified against bug 1630162. But I think that we should have a dedicated bug for just the CCOV case and I think that we should pick this one now. As it looks like it regressed within this range on August 1st:

https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=wd%2Cccov%2Cwin%2Cheadless&tochange=4cf66fe9deb65f1add1e82fdf518a5a1f7729984&fromchange=ff7fc1a977e257d19b33e9c795b780f2bc1ba875

No longer regressed by: 1783018
Summary: Intermittent [tier 2] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK → Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang of Firefox)

Only fails for the backfills - maybe the list of test manifests in the chunk changed.

Flags: needinfo?(aryx.bugmail)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #7)

Only fails for the backfills - maybe the list of test manifests in the chunk changed.

Yes, on autoland not all the tests get run but only a random set. As such we would have to use backfills to find the regression range.

Maybe this specific order of tests is triggering a situation under which a CCOV build of Firefox takes way longer to shutdown. For Marionette we take the value from the toolkit.asyncshutdown.crash_timeout preference but as I just remembered (and got noticed by Julian) this value isn't used because CCOV builds don't set a specific value - see bug 1697047.

Depends on: 1697047

The problem is actually not related to the shutdown crash timeout. I extended it but tests are still failing.

As such I had a look at the timeout multipler that wptrunner is using for ccov builds and found the following:
https://searchfox.org/mozilla-central/rev/43ba67391e71c57a14420e554e9d381543292611/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#58-78

Close to the end we define a multiplier of 2 only for CCOV opt builds. This is most likely too short especially for tests that include navigations and as such trigger new content processes. Lets try with a multiplier of 4 similar to debug builds if that helps. If yes we can fix it in wptrunner.

https://treeherder.mozilla.org/jobs?repo=try&revision=9a18b970a7f74b27d2a9ac78e5e6c57c300ea104

It would still be good to know the timing difference between CCOV debug and opt builds but I'm not able to find any debug builds in CI.

Points: --- → 1
Priority: P5 → P1
Whiteboard: [webdriver:m4] [webdriver-external]

Increasing the multiplier also didn't work or maybe it hasn't been correctly selected. There is nothing in the log which actually prints out the final selected one. Might be a good idea to get this added as well. Looks like it might be better when I download such a ccov build and test locally.

As it can be seen for various tests that fail at least one shutdown of Firefox at some time during a test within a single test file sees a hang. And in all the cases that I can see it's related to bug 1779759. I triggered the assignee over there so that we hopefully have this fixed soon.

Flags: needinfo?(mcastelluccio)

Update:

There have been 31 failures in the last 7 days:

  • 1 failure on Linux 18.04 x64 CCov WebRender opt
  • 30 failures on Windows 10 x64 2004 CCov WebRender opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=387987911&repo=mozilla-central&lineNumber=42425

Whiteboard: [webdriver:m4] [webdriver-external] → [webdriver:m4] [webdriver-external][stockwell needswork:owner]

Since the fix from bug 1779759 has been merged to mozilla-central no more failures are visible yet. I'll keep an eye and check again tomorrow if new failures might have been classified.

Failures are still present and are related to a not closing content process with the log entry Waiting in WillDestroyCurrentMessageLoop. Investigation for that is currently happening on bug 1769821.

Depends on: 1769821
Summary: Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang of Firefox) → Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - Waiting in WillDestroyCurrentMessageLoop)
Points: 1 → ---
Priority: P1 → P5
Whiteboard: [webdriver:m4] [webdriver-external][stockwell needswork:owner] → [stockwell needswork:owner]

So there is an interesting observation that I've made. Not sure where exactly these different shutdown times for CCOV builds are originating from but why is it that different?

Like it can be seen in this CI test job it's just a couple of seconds until the browser exists with 0. In other cases like when running a CCOV build locally it takes up to 75s. An example of that can be seen in this profile: https://share.firefox.dev/3KReWMP

Marco, I assume that it might be related to the performance of the harddisk vs ssd in how fast these gcda files can actually be stored to disk? Or could there be some other factors that could cause such a variation?

Flags: needinfo?(mcastelluccio)

Actually the 75s are a result from not using the GCOV_PREFIX(_STRIP) environment variables. Firefox tries to reach a non-existent drive and as such each attempt to write the data takes some extra time. As I can see in CI we make use of GCOV_PREFIX_STRIP=4 so it should be fine.

Flags: needinfo?(mcastelluccio)

There have been 48 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • windows10-64-2004-ccov-qr

There have been 32 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • windows10-64-2004-ccov-qr

There have been 39 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • windows10-64-2004-ccov-qr
Flags: needinfo?(mcastelluccio)

Bug 1792021 sounds like the best way forward.

Flags: needinfo?(mcastelluccio)
See Also: → 1801671
No longer duplicate of this bug: 1769430

There have been 43 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • windows10-64-2004-ccov-qr
[task 2022-12-04T12:40:20.298Z] 12:40:20     INFO - PID 7696 | @resource://gre/modules/osfile.jsm:12:30
[task 2022-12-04T12:40:20.298Z] 12:40:20     INFO - PID 7696 | @resource://gre/modules/TelemetryStorage.jsm:19:28
[task 2022-12-04T12:40:20.299Z] 12:40:20     INFO - PID 7696 | removeAbortedSessionPing@resource://gre/modules/TelemetryControllerParent.jsm:721:5
[task 2022-12-04T12:40:20.300Z] 12:40:20     INFO - PID 7696 | removeAbortedSessionPing@resource://gre/modules/TelemetryControllerParent.jsm:249:17
[task 2022-12-04T12:40:20.301Z] 12:40:20     INFO - PID 7696 | shutdownChromeProcess/cleanup/<@resource://gre/modules/TelemetrySession.jsm:1235:42
[task 2022-12-04T12:40:20.302Z] 12:40:20     INFO - PID 7696 | observe@resource://gre/modules/AsyncShutdown.jsm:572:16
[task 2022-12-04T12:40:20.303Z] 12:40:20     INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.303Z] 12:40:20     INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.304Z] 12:40:20     INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.305Z] 12:40:20     INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.306Z] 12:40:20     INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:22.271Z] 12:40:22     INFO - PID 7696 | Waiting in WillDestroyCurrentMessageLoop for pid 6108
[task 2022-12-04T12:40:30.418Z] 12:40:30     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-12-04T12:40:30.418Z] 12:40:30     INFO - TEST-INFO took 55013ms
Duplicate of this bug: 1802617
Duplicate of this bug: 1805768
No longer duplicate of this bug: 1805768

With the latest changes from Jed on bug 1805761 the signature for CCOV builds has been changed. It now looks like the following:
https://treeherder.mozilla.org/logviewer?job_id=400031779&repo=mozilla-central&lineNumber=95704-95707

[task 2022-12-18T21:58:24.450Z] 21:58:24     INFO - PID 3828 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-18T21:58:26.417Z] 21:58:26     INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 4224 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:39.947Z] 21:58:39     INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 1676 may be hanging at shutdown; will wait for up to 66484ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:44.476Z] 21:58:44     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/status/status.py | expected OK

Given that web-platform tests don't wait that long this is running into a timeout. Not sure how the duration to wait is calculated but maybe it should be synced up with toolkit.asyncshutdown.crash_timeout? If anything has to change on the platform side we should file a new bug.

Flags: needinfo?(jld)
Summary: Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - Waiting in WillDestroyCurrentMessageLoop) → Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - WARNING: Process <PID> may be hanging at shutdown; will wait for up to <TIME>ms)
Duplicate of this bug: 1805768

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #52)

With the latest changes from Jed on bug 1805761 the signature for CCOV builds has been changed. It now looks like the following:
https://treeherder.mozilla.org/logviewer?job_id=400031779&repo=mozilla-central&lineNumber=95704-95707

[task 2022-12-18T21:58:24.450Z] 21:58:24     INFO - PID 3828 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-18T21:58:26.417Z] 21:58:26     INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 4224 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:39.947Z] 21:58:39     INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 1676 may be hanging at shutdown; will wait for up to 66484ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:44.476Z] 21:58:44     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/status/status.py | expected OK

Given that web-platform tests don't wait that long this is running into a timeout. Not sure how the duration to wait is calculated but maybe it should be synced up with toolkit.asyncshutdown.crash_timeout? If anything has to change on the platform side we should file a new bug.

toolkit.asyncshutdown.crash_timeout is 180s on ccov builds, from your patch in bug 1697047, so I thought it was safe to wait 80s in IPC. It looks like I'm also going to need to use a value larger than the current 8s for ASan (bug 1806224) and maybe even regular builds, at least on Mac (bug 1805696). Maybe this test harness needs a larger timeout to deal with ccov instrumentation? Of course, I can see the earlier comments here about increasing the test harness timeout, and if I correctly understand the code that's linked in comment #12 then we should already be waiting 140s (70s * 2) but maybe I'm missing something — the log in comment #52 seems to show the test run being timed out after about 55s total and 20s in shutdown.

Flags: needinfo?(jld)

Actually we have a 2x multiplier for CCOV builds which means a test module times out after 50s if no extended timeout (META: timeout=long) has been requested. The problem with especially wdspec jobs is that it's using pytest and as result the timeout is applied to the whole test file. And this is problematic if multiple tests in such a module restart Firefox a lot because this isn't controllable right now.

Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?

Whiteboard: [stockwell disable-recommended]
Flags: needinfo?(jld)

Update:

There have been 40 failures within the last 7 days, all of them on Windows 10 x64 2004 CCov WebRender opt.

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=401707309&repo=mozilla-central&lineNumber=75843

No longer depends on: 1792021

Hi Jed, your feedback regarding my questions from comment 55 would be kinda appreciated. Thanks!

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #55)

Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?

Unfortunately, this isn't something I'm working on (nor do I know much about it in general). Hopefully it will be possible to increase CI timeouts enough to deal with the slow shutdown.

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #63)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #55)

Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?

Unfortunately, this isn't something I'm working on (nor do I know much about it in general). Hopefully it will be possible to increase CI timeouts enough to deal with the slow shutdown.

So let me ask one other question regarding the currently allowed shutdown time for content processes. The toolkit.asyncshutdown.crash_timeout preference is actually used for shutdown hang scenarios and has been increased for CCOV to cover slow shutdown. But the problem is that whenever under such a condition a CCOV build has more than 2 content processes that are slowly shutting down we easily reach the 180s shutdown timeout. Such a scenario is not uncommon. As such should we really wait 80s for each and every content process, or can this time be reduced? Do we know how long content processes usually take to shutdown maybe via telemetry settings?

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jld)

I don't think we currently have this data. Do we wait 180s for each content process and we fail only after we wait for the last one? Or do we wait 180s and fail as soon as we have the first one that exceeds the timeout?

Flags: needinfo?(mcastelluccio)

No, it's 80s per process as I've mentioned above. Also see the related landed patch from Jed:
https://hg.mozilla.org/mozilla-central/rev/d6814aa8f90e

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #64)

As such should we really wait 80s for each and every content process, or can this time be reduced? Do we know how long content processes usually take to shutdown maybe via telemetry settings?

(In reply to Marco Castelluccio [:marco] from comment #65)

I don't think we currently have this data. Do we wait 180s for each content process and we fail only after we wait for the last one? Or do we wait 180s and fail as soon as we have the first one that exceeds the timeout?

The timeout I implemented in ProcessWatcher is global, not per-process; I was assuming that child processes' late shutdown would proceed in parallel (and I guess I'm also implicitly assuming that there's “enough” hardware parallelism). Also, there's only so much I can do when the parent process will be killed after a fixed timeout (by nsTerminator and/or a test harness).

Is there evidence that Windows ccov shutdown is serialized between multiple child processes?

Flags: needinfo?(jld)

I think that this is an interesting example:
https://treeherder.mozilla.org/logviewer?job_id=404028052&repo=mozilla-central&lineNumber=7191

[task 2023-01-31T05:50:48.499Z] 05:50:48     INFO - PID 1456 | 1675144248498	Marionette	DEBUG	0 -> [0,34,"Marionette:Quit",{"flags":["eForceQuit"]}]
[..]
[task 2023-01-31T05:50:50.854Z] 05:50:50     INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 8864 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:50:54.408Z] 05:50:54     INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 4116 may be hanging at shutdown; will wait for up to 76453ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:51:00.319Z] 05:51:00     INFO - PID 1456 | 1675144260319	geckodriver::browser	DEBUG	Browser process stopped: exit code: 0
[task 2023-01-31T05:51:00.321Z] 05:51:00     INFO - PID 1456 | 1675144260321	webdriver::server	DEBUG	<- 200 OK {"value":null}

The shutdown of Firefox gets triggered by Marionette at 05:50:48. The first slow content process shutdown log entry can be seen at 05:50:50 which means 2s after. And here it is using a fresh 80s maximum timeout without taking care of the already passed seconds since the shutdown has been triggered. I assume if the first slow content process shutdown is detected later the drift can even be larger? That means it would definitely exceed the allowed async timeout setting of 80s, and we would finally crash anyway.

Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?

Flags: needinfo?(jld)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #69)

I think that this is an interesting example:
https://treeherder.mozilla.org/logviewer?job_id=404028052&repo=mozilla-central&lineNumber=7191

[task 2023-01-31T05:50:48.499Z] 05:50:48     INFO - PID 1456 | 1675144248498	Marionette	DEBUG	0 -> [0,34,"Marionette:Quit",{"flags":["eForceQuit"]}]
[..]
[task 2023-01-31T05:50:50.854Z] 05:50:50     INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 8864 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:50:54.408Z] 05:50:54     INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 4116 may be hanging at shutdown; will wait for up to 76453ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:51:00.319Z] 05:51:00     INFO - PID 1456 | 1675144260319	geckodriver::browser	DEBUG	Browser process stopped: exit code: 0
[task 2023-01-31T05:51:00.321Z] 05:51:00     INFO - PID 1456 | 1675144260321	webdriver::server	DEBUG	<- 200 OK {"value":null}

The shutdown of Firefox gets triggered by Marionette at 05:50:48. The first slow content process shutdown log entry can be seen at 05:50:50 which means 2s after. And here it is using a fresh 80s maximum timeout without taking care of the already passed seconds since the shutdown has been triggered. I assume if the first slow content process shutdown is detected later the drift can even be larger? That means it would definitely exceed the allowed async timeout setting of 80s, and we would finally crash anyway.

ProcessWatcher's timeout starts when the parent process IPC I/O thread is shut down (if there's still a process that EnsureTerminated was called on that's still running), so it makes sense that that would be a little later than the Marionette quit command. Currently, ProcessWatcher's timeout is 80s and nsTerminator's timeout (toolkit.asyncshutdown.crash_timeout) is 180s, so there's 100s left for other parts of shutdown.

But, the thing about that log that's interesting to me is [taskcluster:error] Task aborted - max run time exceeded at the end — if I understand correctly that's not any of the timeouts in Gecko, or in a test harness script, it's Taskcluster imposing a limit on how long one task can use the machine.

Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?

Unfortunately, no, not that I'm aware of.

In theory, if the Windows ProcessWatcher were changed to invoke the crash reporter directly instead of trying to inject a thread that crashes, it could get stacks from a process during ccov shutdown and we could see if it's spending a lot of time computing or if it's blocked on I/O, and maybe there might be something actionable in there. But that's a nontrivial amount of work, and this bug is marked S4 so it's hard to justify; if we can just increase some timeouts or split the tests into more jobs instead, that might be the easiest approach.

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #70)

ProcessWatcher's timeout starts when the parent process IPC I/O thread is shut down (if there's still a process that EnsureTerminated was called on that's still running), so it makes sense that that would be a little later than the Marionette quit command. Currently, ProcessWatcher's timeout is 80s and nsTerminator's timeout (toolkit.asyncshutdown.crash_timeout) is 180s, so there's 100s left for other parts of shutdown.

When you are saying there are 100s left for other parts, does it mean that the ProcessWatcher blocks any other component from shutdown or do those still run in parallel? I'm fairly sure it's not blocking but asking to get the confirmation.

But, the thing about that log that's interesting to me is [taskcluster:error] Task aborted - max run time exceeded at the end — if I understand correctly that's not any of the timeouts in Gecko, or in a test harness script, it's Taskcluster imposing a limit on how long one task can use the machine.

Yes that's something different and doesn't happen that often. We still have quite a lot of [CCOV jobs on Windows which pass, always show fast shutdowns of Firefox](https://treeherder.mozilla.org/logviewer?job_id=404278480&repo=mozilla-central and are done within 30 minutes. As such we didn't add a new chunk to CCOV jobs yet, but would really like to know why in some cases the shutdown is that massively slow.

Here some more examples:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=wd%2Cccov%2Cwindows&revision=b7f07512450399f35fc38a7e94241b19a4c2693c

Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?

Unfortunately, no, not that I'm aware of.

In theory, if the Windows ProcessWatcher were changed to invoke the crash reporter directly instead of trying to inject a thread that crashes, it could get stacks from a process during ccov shutdown and we could see if it's spending a lot of time computing or if it's blocked on I/O, and maybe there might be something actionable in there. But that's a nontrivial amount of work, and this bug is marked S4 so it's hard to justify; if we can just increase some timeouts or split the tests into more jobs instead, that might be the easiest approach.

This bug is marked as S4 because it's an intermittent failure bug and that is the default. It's used to identify and investigate the current issues but I'm happy to file a new bug to maybe get further work to be done after a proper triage. In which component should it be filed?

I had a quick look at some of the job logs and it's interesting that the shutdown duration steadily increases as more tests were running. While at the beginning each shutdown takes around 5s even for CCOV builds (!!!) this duration increases over time up to 20s, 30s, and more. Could this maybe a result of some leaked processes or memory? This behavior is really strange.

Flags: needinfo?(jld)

The slowdown at the end of the execution for ccov is most likely related to I/O.
A few facts about ccov:

  • at the end of the execution, the binary dumps a lot of files to the disk;
  • each process will dump the files, and there are locks so they don't write to the same files at the same time;
  • the more source files are covered by tests, the more files will be dumped at the end.

Yes, that's understandable but doesn't explain the trend that as longer a job is running shutdown times also take longer. And we do not have long / a lot of tests only at the end of the job.

Marco, are there any CCOV specific log or environment setttings that we could use to get more log output for CCOV builds?

Flags: needinfo?(mcastelluccio)

Marco is probably a better source of information on this issue.

Flags: needinfo?(jld)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] (away 02/13 - 02/19) from comment #73)

Yes, that's understandable but doesn't explain the trend that as longer a job is running shutdown times also take longer. And we do not have long / a lot of tests only at the end of the job.

No, indeed. I wonder if we are keeping some processes alive for longer and only end them at the end of a job. Perhaps we can add some logging to check when each process is started and when each process is finished.

Marco, are there any CCOV specific log or environment setttings that we could use to get more log output for CCOV builds?

Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.

Flags: needinfo?(mcastelluccio)

(In reply to Marco Castelluccio [:marco] from comment #76)

Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.

IIRC we've set things up so that a commit can add patches to build/build-clang and list them in the appropriate clang-*.json file, and then Taskcluster will automatically build a new toolchain as needed, including on Try. So, it might not be too difficult to make changes to the coverage runtime, at least for experimenting with.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #77)

(In reply to Marco Castelluccio [:marco] from comment #76)

Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.

IIRC we've set things up so that a commit can add patches to build/build-clang and list them in the appropriate clang-*.json file, and then Taskcluster will automatically build a new toolchain as needed, including on Try. So, it might not be too difficult to make changes to the coverage runtime, at least for experimenting with.

Marco, is there anything that we could do here, even only for some try builds?

Flags: needinfo?(mcastelluccio)

Yes we can add patches to Clang, but it is a bit cumbersome (I mean, it's not just a matter of modifying some files in our repo. You need to clone llvm, make sure you are at the same revision we are using in m-c, you need to make sure you apply the same patches we are applying to avoid conflicts, and then you can generate your patch and add your patch to your commit to mozilla-central).

A simple logging we could add is something at the start and at the end of a process (in that file, IIRC we'd need to touch __gcov_fork, llvm_gcov_init and llvm_writeout_and_clear, logging the PID of the current process), so we can at least rule out the possibility that there are too many processes ending at the same time towards the end of the tests, which is when we are running out of time.

Flags: needinfo?(mcastelluccio)
Whiteboard: [stockwell disable-recommended]

I think that as next step I would like to modify how we run the wdspec jobs with ccov builds. Lets trigger some try builds with the MOZ_LOG=process:5 environment variable set once bug 1787718 is fixed. That might maybe give some more details that I could use for further investigations.

Duplicate of this bug: 1820428

I pushed a try build with MOZ_LOG="Process:5":
https://treeherder.mozilla.org/jobs?repo=try&revision=633eeccda5cbe4e03d46953e81438ca4eddf8e68

Lets see if that gives us some insight.

There is actually no additional information when using Process:5 logging which could be used to determine the underlying reason.

But there is an interesting fact. Since we moved the CCOV tests from Windows 10 to Windows 11 the number of failures has been drastically reduced to roughly 1 or 2 per day. That's great to see and I don't think that I will spend more time on investigations.

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #88)

But there is an interesting fact. Since we moved the CCOV tests from Windows 10 to Windows 11 the number of failures has been drastically reduced to roughly 1 or 2 per day. That's great to see and I don't think that I will spend more time on investigations.

Joel, would you mind giving us a quick overview which kind of resources have been changed for the Windows 11 workers? It would be good to know why those failures dropped a lot since we moved away from Windows 10. Thanks.

Flags: needinfo?(jmaher)
Duplicate of this bug: 1822974

the hardware/cloud is the same (as far as I know)- the builds are the same, this is just a new OS- within windows the kernel changed significantly in later win10 and all win11 builds, so most likely that is the issue.

Flags: needinfo?(jmaher)
Duplicate of this bug: 1823490

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #92)

the hardware/cloud is the same (as far as I know)- the builds are the same, this is just a new OS- within windows the kernel changed significantly in later win10 and all win11 builds, so most likely that is the issue.

Thanks! That's good to see. I'll still observe the failure over the next weeks.

Duplicate of this bug: 1825690
Duplicate of this bug: 1825691

(In reply to Intermittent Failures Robot from comment #99)

Platform and build breakdown:

  • linux1804-64-ccov-qr: 2
    • opt: 2

These are wrongly classified. I went ahead, created bug 1827354 and re-classified those tests.

OS: Unspecified → Windows
Hardware: Unspecified → Desktop
Summary: Intermittent [tier 2] CCOV Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - WARNING: Process <PID> may be hanging at shutdown; will wait for up to <TIME>ms) → Intermittent [tier 2] CCOV Windows Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - WARNING: Process <PID> may be hanging at shutdown; will wait for up to <TIME>ms)
Duplicate of this bug: 1834310
See Also: → 1845424

On bug 1848121 I've increased the timeout multiplier for CCOV builds from 3 to 4, which might help here by the extra 25s. Lets observe.

Depends on: 1848121
No longer duplicate of this bug: 1785648
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #161)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=470150630&repo=mozilla-central

That is just bug 1912631.

Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #163)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472564836&repo=try

That is bug 1791216.

Status: REOPENED → RESOLVED
Closed: 3 months ago2 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #165)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472642241&repo=mozilla-central

This is bug 1832294.

Status: REOPENED → RESOLVED
Closed: 2 months ago2 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #167)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=474881451&repo=mozilla-beta

This is bug 1776189.

Status: REOPENED → RESOLVED
Closed: 2 months ago2 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.