Intermittent [tier 2] CCOV Windows Wd TEST-UNEXPECTED-TIMEOUT | expected OK (shutdown hang - WARNING: Process <PID> may be hanging at shutdown; will wait for up to <TIME>ms)
Categories
(Testing :: Code Coverage, defect, P5)
Tracking
(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox103 unaffected, firefox104 unaffected, firefox105 disabled)
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | unaffected |
firefox103 | --- | unaffected |
firefox104 | --- | unaffected |
firefox105 | --- | disabled |
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, regression, regressionwindow-wanted)
Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=386385559&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d6BAuziSSrS-ZQkQ0R69xg/runs/0/artifacts/public/logs/live_backing.log
[task 2022-08-04T11:53:09.892Z] 11:53:09 INFO - TEST-START | /webdriver/tests/get_named_cookie/get.py
[task 2022-08-04T11:53:10.270Z] 11:53:10 INFO - STDOUT: ============================= test session starts =============================
[task 2022-08-04T11:53:10.277Z] 11:53:10 INFO - STDOUT: platform win32 -- Python 3.9.5, pytest-7.0.1, pluggy-unknown -- Z:\task_165961199010716\build\venv\Scripts\python.exe
[task 2022-08-04T11:53:10.279Z] 11:53:10 INFO - STDOUT: rootdir: Z:\task_165961199010716\build
[task 2022-08-04T11:53:10.279Z] 11:53:10 INFO - STDOUT: plugins: asyncio-0.14.0
[task 2022-08-04T11:53:10.280Z] 11:53:10 INFO - STDOUT: collecting ...
[task 2022-08-04T11:53:10.323Z] 11:53:10 INFO - STDOUT: collected 8 items
[task 2022-08-04T11:53:10.333Z] 11:53:10 INFO - STDOUT: tests/web-platform/tests/webdriver/tests/get_named_cookie/get.py::test_no_top_browsing_context
[task 2022-08-04T11:53:10.342Z] 11:53:10 INFO - PID 6956 | 1659613990340 webdriver::server DEBUG -> POST /session {"capabilities": {"alwaysMatch": {"moz:firefoxOptions": {"args": ["--profile", "C:\\Users\\task_165961199010716\\AppData\\Local\\Temp\\tmpye5a7axu.mozrunner"], "binary": "Z:\\task_165961199010716\\build\\application\\firefox\\firefox.exe"}}}}
[task 2022-08-04T11:53:10.343Z] 11:53:10 INFO - PID 6956 | 1659613990340 geckodriver::capabilities DEBUG Trying to read firefox version from ini files
[task 2022-08-04T11:53:10.343Z] 11:53:10 INFO - PID 6956 | 1659613990341 geckodriver::capabilities DEBUG Found version 105.0a1
<...>
[task 2022-08-04T11:53:50.826Z] 11:53:50 INFO - STDOUT: tests/web-platform/tests/webdriver/tests/get_named_cookie/get.py::test_get_cookie_with_same_site_flag[capabilities0-Strict]
[task 2022-08-04T11:53:50.828Z] 11:53:50 INFO - PID 6956 | 1659614030826 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"width": 800, "height": 600}
[task 2022-08-04T11:53:50.828Z] 11:53:50 INFO - PID 6956 | 1659614030827 Marionette DEBUG 0 -> [0,42,"WebDriver:SetWindowRect",{"height":600,"width":800}]
[task 2022-08-04T11:53:50.829Z] 11:53:50 INFO - PID 6956 | 1659614030827 Marionette TRACE Requested window geometry matches
[task 2022-08-04T11:53:50.829Z] 11:53:50 INFO - PID 6956 | 1659614030828 Marionette DEBUG 0 <- [1,42,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.831Z] 11:53:50 INFO - PID 6956 | 1659614030828 webdriver::server DEBUG <- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.831Z] 11:53:50 INFO - PID 6956 | 1659614030829 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"x": 100, "y": 100}
[task 2022-08-04T11:53:50.831Z] 11:53:50 INFO - PID 6956 | 1659614030829 Marionette DEBUG 0 -> [0,43,"WebDriver:SetWindowRect",{"x":100,"y":100}]
[task 2022-08-04T11:53:50.832Z] 11:53:50 INFO - PID 6956 | 1659614030830 Marionette TRACE Requested window geometry matches
[task 2022-08-04T11:53:50.833Z] 11:53:50 INFO - PID 6956 | 1659614030830 Marionette DEBUG 0 <- [1,43,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.834Z] 11:53:50 INFO - PID 6956 | 1659614030830 webdriver::server DEBUG <- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.835Z] 11:53:50 INFO - PID 6956 | 1659614030832 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/url {"url": "https://web-platform.test:8443/common/blank.html"}
[task 2022-08-04T11:53:50.836Z] 11:53:50 INFO - PID 6956 | 1659614030832 Marionette DEBUG 0 -> [0,44,"WebDriver:Navigate",{"url":"https://web-platform.test:8443/common/blank.html"}]
[task 2022-08-04T11:53:50.836Z] 11:53:50 INFO - PID 6956 | 1659614030836 Marionette TRACE [11] Received event beforeunload for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.856Z] 11:53:50 INFO - PID 6956 | 1659614030858 Marionette TRACE [11] Received event pagehide for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.860Z] 11:53:50 INFO - PID 6956 | 1659614030874 Marionette TRACE [11] Received event DOMContentLoaded for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.876Z] 11:53:50 INFO - PID 6956 | 1659614030874 Marionette TRACE [11] Received event pageshow for https://web-platform.test:8443/common/blank.html
[task 2022-08-04T11:53:50.877Z] 11:53:50 INFO - PID 6956 | 1659614030876 Marionette DEBUG 0 <- [1,44,null,{"value":null}]
[task 2022-08-04T11:53:50.885Z] 11:53:50 INFO - PID 6956 | 1659614030884 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.887Z] 11:53:50 INFO - PID 6956 | 1659614030886 webdriver::server DEBUG -> DELETE /session/28a9425e-9164-4561-bea7-231487378b51/cookie
[task 2022-08-04T11:53:50.887Z] 11:53:50 INFO - PID 6956 | 1659614030887 Marionette DEBUG 0 -> [0,45,"WebDriver:DeleteAllCookies",{}]
[task 2022-08-04T11:53:50.888Z] 11:53:50 INFO - PID 6956 | 1659614030888 Marionette DEBUG 0 <- [1,45,null,{"value":null}]
[task 2022-08-04T11:53:50.889Z] 11:53:50 INFO - PID 6956 | 1659614030888 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.890Z] 11:53:50 INFO - PID 6956 | 1659614030889 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/execute/sync {"script": "document.cookie = 'foo=bar;Secure;SameSite=Strict'", "args": []}
[task 2022-08-04T11:53:50.891Z] 11:53:50 INFO - PID 6956 | 1659614030890 Marionette DEBUG 0 -> [0,46,"WebDriver:ExecuteScript",{"args":[],"script":"document.cookie = 'foo=bar;Secure;SameSite=Strict'"}]
[task 2022-08-04T11:53:50.892Z] 11:53:50 INFO - PID 6956 | 1659614030891 Marionette TRACE [11] MarionetteCommands actor created for window id 8589934595
[task 2022-08-04T11:53:50.895Z] 11:53:50 INFO - PID 6956 | 1659614030895 Marionette DEBUG 0 <- [1,46,null,{"value":null}]
[task 2022-08-04T11:53:50.897Z] 11:53:50 INFO - PID 6956 | 1659614030895 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.897Z] 11:53:50 INFO - PID 6956 | 1659614030897 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/cookie/foo
[task 2022-08-04T11:53:50.898Z] 11:53:50 INFO - PID 6956 | 1659614030897 Marionette DEBUG 0 -> [0,47,"WebDriver:GetCookies",{}]
[task 2022-08-04T11:53:50.899Z] 11:53:50 INFO - PID 6956 | 1659614030898 Marionette DEBUG 0 <- [1,47,null,[{"name":"foo","value":"bar","path":"/common","domain":"web-platform.test","secure":true,"httpOnly":false,"sameSite":"Strict"}]]
[task 2022-08-04T11:53:50.900Z] 11:53:50 INFO - PID 6956 | 1659614030898 webdriver::server DEBUG <- 200 OK {"value":{"name":"foo","value":"bar","path":"/common","domain":"web-platform.test","secure":true,"httpOnly":false,"sameSite":"Strict"}}
[task 2022-08-04T11:53:50.901Z] 11:53:50 INFO - STDOUT: PASSED
[task 2022-08-04T11:53:50.902Z] 11:53:50 INFO - PID 6956 | 1659614030901 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"implicit": 0}
[task 2022-08-04T11:53:50.903Z] 11:53:50 INFO - PID 6956 | 1659614030902 Marionette DEBUG 0 -> [0,48,"WebDriver:SetTimeouts",{"implicit":0}]
[task 2022-08-04T11:53:50.903Z] 11:53:50 INFO - PID 6956 | 1659614030902 Marionette DEBUG 0 <- [1,48,null,{"value":null}]
[task 2022-08-04T11:53:50.904Z] 11:53:50 INFO - PID 6956 | 1659614030902 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.904Z] 11:53:50 INFO - PID 6956 | 1659614030903 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"pageLoad": 300000}
[task 2022-08-04T11:53:50.905Z] 11:53:50 INFO - PID 6956 | 1659614030904 Marionette DEBUG 0 -> [0,49,"WebDriver:SetTimeouts",{"pageLoad":300000}]
[task 2022-08-04T11:53:50.906Z] 11:53:50 INFO - PID 6956 | 1659614030905 Marionette DEBUG 0 <- [1,49,null,{"value":null}]
[task 2022-08-04T11:53:50.906Z] 11:53:50 INFO - PID 6956 | 1659614030905 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.907Z] 11:53:50 INFO - PID 6956 | 1659614030906 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/timeouts {"script": 30000}
[task 2022-08-04T11:53:50.913Z] 11:53:50 INFO - PID 6956 | 1659614030913 Marionette DEBUG 0 -> [0,50,"WebDriver:SetTimeouts",{"script":30000}]
[task 2022-08-04T11:53:50.914Z] 11:53:50 INFO - PID 6956 | 1659614030913 Marionette DEBUG 0 <- [1,50,null,{"value":null}]
[task 2022-08-04T11:53:50.914Z] 11:53:50 INFO - PID 6956 | 1659614030914 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.915Z] 11:53:50 INFO - PID 6956 | 1659614030915 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.916Z] 11:53:50 INFO - PID 6956 | 1659614030916 Marionette DEBUG 0 -> [0,51,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.917Z] 11:53:50 INFO - PID 6956 | 1659614030917 Marionette DEBUG 0 <- [1,51,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.918Z] 11:53:50 INFO - PID 6956 | 1659614030917 webdriver::server DEBUG <- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.919Z] 11:53:50 INFO - PID 6956 | 1659614030918 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.919Z] 11:53:50 INFO - PID 6956 | 1659614030919 Marionette DEBUG 0 -> [0,52,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.920Z] 11:53:50 INFO - PID 6956 | 1659614030919 Marionette DEBUG 0 <- [1,52,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.920Z] 11:53:50 INFO - PID 6956 | 1659614030920 webdriver::server DEBUG <- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.921Z] 11:53:50 INFO - PID 6956 | 1659614030920 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/window/handles
[task 2022-08-04T11:53:50.922Z] 11:53:50 INFO - PID 6956 | 1659614030921 Marionette DEBUG 0 -> [0,53,"WebDriver:GetWindowHandles",{}]
[task 2022-08-04T11:53:50.923Z] 11:53:50 INFO - PID 6956 | 1659614030921 Marionette DEBUG 0 <- [1,53,null,["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]]
[task 2022-08-04T11:53:50.923Z] 11:53:50 INFO - PID 6956 | 1659614030922 webdriver::server DEBUG <- 200 OK {"value":["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]}
[task 2022-08-04T11:53:50.925Z] 11:53:50 INFO - PID 6956 | 1659614030922 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.925Z] 11:53:50 INFO - PID 6956 | 1659614030923 Marionette DEBUG 0 -> [0,54,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.926Z] 11:53:50 INFO - PID 6956 | 1659614030924 Marionette DEBUG 0 <- [1,54,null,{"value":null}]
[task 2022-08-04T11:53:50.926Z] 11:53:50 INFO - PID 6956 | 1659614030925 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.927Z] 11:53:50 INFO - PID 6956 | 1659614030925 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/alert/dismiss {}
[task 2022-08-04T11:53:50.928Z] 11:53:50 INFO - PID 6956 | 1659614030926 Marionette DEBUG 0 -> [0,55,"WebDriver:DismissAlert",{}]
[task 2022-08-04T11:53:50.929Z] 11:53:50 INFO - PID 6956 | 1659614030927 Marionette DEBUG 0 <- [1,55,{"error":"no such alert","message":"","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1\nWe ... ote/content/marionette/server.js:251:9\n_onJSONObjectReady/<@chrome://remote/content/marionette/transport.js:502:20\n"},null]
[task 2022-08-04T11:53:50.930Z] 11:53:50 INFO - PID 6956 | 1659614030927 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"no such alert","message":"","stacktrace":"RemoteError@chrome://remote/content/shared/RemoteError.jsm:12:1\nWebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:192:5\nNoSuchAlertError@chrome://remote/content/shared/webdriver/Errors.jsm:393:5\nGeckoDriver.prototype._checkIfAlertIsPresent@chrome://remote/content/marionette/driver.js:2665:11\nGeckoDriver.prototype.dismissDialog@chrome://remote/content/marionette/driver.js:2572:8\ndespatch@chrome://remote/content/marionette/server.js:304:40\nexecute@chrome://remote/content/marionette/server.js:277:16\nonPacket/<@chrome://remote/content/marionette/server.js:250:20\nonPacket@chrome://remote/content/marionette/server.js:251:9\n_onJSONObjectReady/<@chrome://remote/content/marionette/transport.js:502:20\n"}}
[task 2022-08-04T11:53:50.932Z] 11:53:50 INFO - PID 6956 | 1659614030928 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.933Z] 11:53:50 INFO - PID 6956 | 1659614030929 Marionette DEBUG 0 -> [0,56,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.934Z] 11:53:50 INFO - PID 6956 | 1659614030929 Marionette DEBUG 0 <- [1,56,null,{"value":null}]
[task 2022-08-04T11:53:50.934Z] 11:53:50 INFO - PID 6956 | 1659614030930 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.934Z] 11:53:50 INFO - PID 6956 | 1659614030930 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/window
[task 2022-08-04T11:53:50.935Z] 11:53:50 INFO - PID 6956 | 1659614030931 Marionette DEBUG 0 -> [0,57,"WebDriver:GetWindowHandle",{}]
[task 2022-08-04T11:53:50.936Z] 11:53:50 INFO - PID 6956 | 1659614030931 Marionette DEBUG 0 <- [1,57,null,{"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.937Z] 11:53:50 INFO - PID 6956 | 1659614030932 webdriver::server DEBUG <- 200 OK {"value":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.938Z] 11:53:50 INFO - PID 6956 | 1659614030932 webdriver::server DEBUG -> GET /session/28a9425e-9164-4561-bea7-231487378b51/window/handles
[task 2022-08-04T11:53:50.939Z] 11:53:50 INFO - PID 6956 | 1659614030933 Marionette DEBUG 0 -> [0,58,"WebDriver:GetWindowHandles",{}]
[task 2022-08-04T11:53:50.939Z] 11:53:50 INFO - PID 6956 | 1659614030934 Marionette DEBUG 0 <- [1,58,null,["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]]
[task 2022-08-04T11:53:50.939Z] 11:53:50 INFO - PID 6956 | 1659614030934 webdriver::server DEBUG <- 200 OK {"value":["77fd0ba1-feea-41b0-a19a-e751b9dc10f2"]}
[task 2022-08-04T11:53:50.939Z] 11:53:50 INFO - PID 6956 | 1659614030935 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window {"handle": "77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}
[task 2022-08-04T11:53:50.943Z] 11:53:50 INFO - PID 6956 | 1659614030935 Marionette DEBUG 0 -> [0,59,"WebDriver:SwitchToWindow",{"handle":"77fd0ba1-feea-41b0-a19a-e751b9dc10f2"}]
[task 2022-08-04T11:53:50.943Z] 11:53:50 INFO - PID 6956 | 1659614030936 Marionette DEBUG 0 <- [1,59,null,{"value":null}]
[task 2022-08-04T11:53:50.943Z] 11:53:50 INFO - PID 6956 | 1659614030937 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.945Z] 11:53:50 INFO - PID 6956 | 1659614030938 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/window/rect {"width": 800, "height": 600}
[task 2022-08-04T11:53:50.946Z] 11:53:50 INFO - PID 6956 | 1659614030939 Marionette DEBUG 0 -> [0,60,"WebDriver:SetWindowRect",{"height":600,"width":800}]
[task 2022-08-04T11:53:50.947Z] 11:53:50 INFO - PID 6956 | 1659614030939 Marionette TRACE Requested window geometry matches
[task 2022-08-04T11:53:50.947Z] 11:53:50 INFO - PID 6956 | 1659614030939 Marionette DEBUG 0 <- [1,60,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-08-04T11:53:50.948Z] 11:53:50 INFO - PID 6956 | 1659614030940 webdriver::server DEBUG <- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-08-04T11:53:50.949Z] 11:53:50 INFO - PID 6956 | 1659614030941 webdriver::server DEBUG -> POST /session/28a9425e-9164-4561-bea7-231487378b51/frame {"id": null}
[task 2022-08-04T11:53:50.950Z] 11:53:50 INFO - PID 6956 | 1659614030942 Marionette DEBUG 0 -> [0,61,"WebDriver:SwitchToFrame",{"id":null}]
[task 2022-08-04T11:53:50.950Z] 11:53:50 INFO - PID 6956 | 1659614030945 Marionette DEBUG 0 <- [1,61,null,{"value":null}]
[task 2022-08-04T11:53:50.952Z] 11:53:50 INFO - PID 6956 | 1659614030945 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2022-08-04T11:53:50.952Z] 11:53:50 INFO - PID 6956 | 1659614030949 webdriver::server DEBUG -> DELETE /session/28a9425e-9164-4561-bea7-231487378b51
[task 2022-08-04T11:53:50.954Z] 11:53:50 INFO - PID 6956 | 1659614030950 Marionette DEBUG 0 -> [0,62,"Marionette:Quit",{"flags":["eForceQuit"]}]
[task 2022-08-04T11:53:50.954Z] 11:53:50 INFO - PID 6956 | 1659614030951 Marionette INFO Stopped listening on port 58213
[task 2022-08-04T11:53:50.960Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.964Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e5f75600 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.965Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e67b9f00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.965Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6a00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.967Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.968Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:50.969Z] 11:53:50 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - BlockShutdown: NotifyImpendingShutdown.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3612
[task 2022-08-04T11:53:51.059Z] 11:53:51 INFO - PID 6956 | 1659614031065 Marionette TRACE Received observer notification quit-application
[task 2022-08-04T11:53:51.076Z] 11:53:51 INFO - PID 6956 | 1659614031074 Marionette TRACE Received observer notification quit-application
[task 2022-08-04T11:53:51.076Z] 11:53:51 INFO - PID 6956 | 1659614031075 Marionette DEBUG Marionette stopped listening
[task 2022-08-04T11:53:51.078Z] 11:53:51 INFO - PID 6956 | 1659614031077 Marionette DEBUG 0 <- [1,62,null,{"cause":"shutdown","forced":false}]
[task 2022-08-04T11:53:51.112Z] 11:53:51 INFO - PID 6956 | 1659614031118 webdriver::server DEBUG Teardown session
[task 2022-08-04T11:53:51.125Z] 11:53:51 INFO - PID 6956 | 1659614031124 Marionette DEBUG Closed connection 0
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - BlockShutdown: CanSend.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3657
[task 2022-08-04T11:53:51.169Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - ShutDownProcess: Sent shutdown message.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1801
[task 2022-08-04T11:53:51.196Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e861f000 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa6f00 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e6fa8300 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - ShutDownProcess: Closing channel.: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:1841
[task 2022-08-04T11:53:51.212Z] 11:53:51 INFO - PID 6956 | [Parent 8364, Main Thread] WARNING: ContentParent: id=214e4169200 - RemoveShutdownBlockers: file Z:/task_165960693580546/build/src/dom/ipc/ContentParent.cpp:3772
[task 2022-08-04T11:53:51.312Z] 11:53:51 INFO - PID 6956 | JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
[task 2022-08-04T11:53:53.661Z] 11:53:53 INFO - PID 6956 | Waiting in WillDestroyCurrentMessageLoop for pid 8
[task 2022-08-04T11:54:04.904Z] 11:54:04 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-08-04T11:54:04.904Z] 11:54:04 INFO - TEST-INFO took 55012ms
[task 2022-08-04T11:54:04.974Z] 11:54:04 INFO - Closing logging queue
[task 2022-08-04T11:54:04.987Z] 11:54:04 INFO - queue closed
[task 2022-08-04T11:54:05.499Z] 11:54:05 INFO - PID 7632 | 1659614045004 geckodriver INFO Listening on 127.0.0.1:49920
[task 2022-08-04T11:54:05.502Z] 11:54:05 INFO - Starting runner
[task 2022-08-04T11:54:06.185Z] 11:54:06 INFO - TEST-START | /webdriver/tests/get_named_cookie/user_prompts.py
Comment hidden (Intermittent Failures Robot) |
Comment 2•2 years ago
|
||
Can you look into these unexpected webdriver results on Windows code coverage or should an expectation update get pushed?
Comment 3•2 years ago
|
||
This is a hang during shutdown of Firefox. The quit-application
observer notification is sent out right away but we seem to hang in the following code:
[task 2022-08-04T11:53:53.661Z] 11:53:53 INFO - PID 6956 | Waiting in WillDestroyCurrentMessageLoop for pid 8
[task 2022-08-04T11:54:04.904Z] 11:54:04 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-08-04T11:54:04.904Z] 11:54:04 INFO - TEST-INFO took 55012ms
As such I assume some other patch in the set of landed commits for the following merge might have caused this issue for CCOV Firefox builds:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=8f150040b037fc9f7b6039b84a1b7806e8f6a77d
Marco, do you have an idea what could have been caused this regression?
Comment 4•2 years ago
|
||
Looks like that Marco is out at the moment. Sebastian, could sheriffs maybe do a backfill on autoland or is CCOV not available there?
Comment 5•2 years ago
|
||
I think that this is actually a dupe given that this failure is present for quite some time as the pushlog shows:
Failures have been classified to quite a lot of different bugs and given that I just came back from PTO I don't have the overview yet.
Comment 6•2 years ago
|
||
Most of these failures have actually been classified against bug 1630162. But I think that we should have a dedicated bug for just the CCOV case and I think that we should pick this one now. As it looks like it regressed within this range on August 1st:
Comment 7•2 years ago
|
||
Only fails for the backfills - maybe the list of test manifests in the chunk changed.
Comment 9•2 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #7)
Only fails for the backfills - maybe the list of test manifests in the chunk changed.
Yes, on autoland not all the tests get run but only a random set. As such we would have to use backfills to find the regression range.
Comment 10•2 years ago
|
||
Maybe this specific order of tests is triggering a situation under which a CCOV build of Firefox takes way longer to shutdown. For Marionette we take the value from the toolkit.asyncshutdown.crash_timeout
preference but as I just remembered (and got noticed by Julian) this value isn't used because CCOV builds don't set a specific value - see bug 1697047.
Comment 12•2 years ago
|
||
The problem is actually not related to the shutdown crash timeout. I extended it but tests are still failing.
As such I had a look at the timeout multipler that wptrunner is using for ccov builds and found the following:
https://searchfox.org/mozilla-central/rev/43ba67391e71c57a14420e554e9d381543292611/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#58-78
Close to the end we define a multiplier of 2 only for CCOV opt builds. This is most likely too short especially for tests that include navigations and as such trigger new content processes. Lets try with a multiplier of 4 similar to debug builds if that helps. If yes we can fix it in wptrunner.
https://treeherder.mozilla.org/jobs?repo=try&revision=9a18b970a7f74b27d2a9ac78e5e6c57c300ea104
It would still be good to know the timing difference between CCOV debug and opt builds but I'm not able to find any debug builds in CI.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
Increasing the multiplier also didn't work or maybe it hasn't been correctly selected. There is nothing in the log which actually prints out the final selected one. Might be a good idea to get this added as well. Looks like it might be better when I download such a ccov build and test locally.
Comment 15•2 years ago
|
||
As it can be seen for various tests that fail at least one shutdown of Firefox at some time during a test within a single test file sees a hang. And in all the cases that I can see it's related to bug 1779759. I triggered the assignee over there so that we hopefully have this fixed soon.
Comment 16•2 years ago
|
||
Ok, not in all the cases:
https://treeherder.mozilla.org/logviewer?job_id=387137518&repo=mozilla-central&lineNumber=123013-123034
Comment hidden (Intermittent Failures Robot) |
Comment 18•2 years ago
|
||
Update:
There have been 31 failures in the last 7 days:
- 1 failure on Linux 18.04 x64 CCov WebRender opt
- 30 failures on Windows 10 x64 2004 CCov WebRender opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=387987911&repo=mozilla-central&lineNumber=42425
Comment hidden (Intermittent Failures Robot) |
Comment 20•2 years ago
|
||
Since the fix from bug 1779759 has been merged to mozilla-central no more failures are visible yet. I'll keep an eye and check again tomorrow if new failures might have been classified.
Comment 21•2 years ago
|
||
Failures are still present and are related to a not closing content process with the log entry Waiting in WillDestroyCurrentMessageLoop
. Investigation for that is currently happening on bug 1769821.
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 26•2 years ago
|
||
So there is an interesting observation that I've made. Not sure where exactly these different shutdown times for CCOV builds are originating from but why is it that different?
Like it can be seen in this CI test job it's just a couple of seconds until the browser exists with 0
. In other cases like when running a CCOV build locally it takes up to 75s. An example of that can be seen in this profile: https://share.firefox.dev/3KReWMP
Marco, I assume that it might be related to the performance of the harddisk vs ssd in how fast these gcda files can actually be stored to disk? Or could there be some other factors that could cause such a variation?
Comment 27•2 years ago
|
||
Actually the 75s are a result from not using the GCOV_PREFIX(_STRIP)
environment variables. Firefox tries to reach a non-existent drive and as such each attempt to write the data takes some extra time. As I can see in CI we make use of GCOV_PREFIX_STRIP=4
so it should be fine.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 34•2 years ago
•
|
||
There have been 48 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64-2004-ccov-qr
Comment hidden (Intermittent Failures Robot) |
Comment 36•2 years ago
|
||
There have been 32 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64-2004-ccov-qr
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 39•2 years ago
|
||
There have been 39 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64-2004-ccov-qr
Comment hidden (Intermittent Failures Robot) |
Comment 41•2 years ago
|
||
Bug 1792021 sounds like the best way forward.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 46•2 years ago
|
||
There have been 43 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64-2004-ccov-qr
[task 2022-12-04T12:40:20.298Z] 12:40:20 INFO - PID 7696 | @resource://gre/modules/osfile.jsm:12:30
[task 2022-12-04T12:40:20.298Z] 12:40:20 INFO - PID 7696 | @resource://gre/modules/TelemetryStorage.jsm:19:28
[task 2022-12-04T12:40:20.299Z] 12:40:20 INFO - PID 7696 | removeAbortedSessionPing@resource://gre/modules/TelemetryControllerParent.jsm:721:5
[task 2022-12-04T12:40:20.300Z] 12:40:20 INFO - PID 7696 | removeAbortedSessionPing@resource://gre/modules/TelemetryControllerParent.jsm:249:17
[task 2022-12-04T12:40:20.301Z] 12:40:20 INFO - PID 7696 | shutdownChromeProcess/cleanup/<@resource://gre/modules/TelemetrySession.jsm:1235:42
[task 2022-12-04T12:40:20.302Z] 12:40:20 INFO - PID 7696 | observe@resource://gre/modules/AsyncShutdown.jsm:572:16
[task 2022-12-04T12:40:20.303Z] 12:40:20 INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.303Z] 12:40:20 INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.304Z] 12:40:20 INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.305Z] 12:40:20 INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:20.306Z] 12:40:20 INFO - PID 7696 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-04T12:40:22.271Z] 12:40:22 INFO - PID 7696 | Waiting in WillDestroyCurrentMessageLoop for pid 6108
[task 2022-12-04T12:40:30.418Z] 12:40:30 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_named_cookie/get.py | expected OK
[task 2022-12-04T12:40:30.418Z] 12:40:30 INFO - TEST-INFO took 55013ms
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 52•2 years ago
•
|
||
With the latest changes from Jed on bug 1805761 the signature for CCOV builds has been changed. It now looks like the following:
https://treeherder.mozilla.org/logviewer?job_id=400031779&repo=mozilla-central&lineNumber=95704-95707
[task 2022-12-18T21:58:24.450Z] 21:58:24 INFO - PID 3828 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-18T21:58:26.417Z] 21:58:26 INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 4224 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:39.947Z] 21:58:39 INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 1676 may be hanging at shutdown; will wait for up to 66484ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126
[task 2022-12-18T21:58:44.476Z] 21:58:44 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/status/status.py | expected OK
Given that web-platform tests don't wait that long this is running into a timeout. Not sure how the duration to wait is calculated but maybe it should be synced up with toolkit.asyncshutdown.crash_timeout
? If anything has to change on the platform side we should file a new bug.
Comment 54•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #52)
With the latest changes from Jed on bug 1805761 the signature for CCOV builds has been changed. It now looks like the following:
https://treeherder.mozilla.org/logviewer?job_id=400031779&repo=mozilla-central&lineNumber=95704-95707[task 2022-12-18T21:58:24.450Z] 21:58:24 INFO - PID 3828 | JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 718: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange" [task 2022-12-18T21:58:26.417Z] 21:58:26 INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 4224 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126 [task 2022-12-18T21:58:39.947Z] 21:58:39 INFO - PID 3828 | [Parent 6960, IPC I/O Parent] WARNING: Process 1676 may be hanging at shutdown; will wait for up to 66484ms: file Z:/task_167139274820329/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:126 [task 2022-12-18T21:58:44.476Z] 21:58:44 INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/status/status.py | expected OK
Given that web-platform tests don't wait that long this is running into a timeout. Not sure how the duration to wait is calculated but maybe it should be synced up with
toolkit.asyncshutdown.crash_timeout
? If anything has to change on the platform side we should file a new bug.
toolkit.asyncshutdown.crash_timeout
is 180s on ccov builds, from your patch in bug 1697047, so I thought it was safe to wait 80s in IPC. It looks like I'm also going to need to use a value larger than the current 8s for ASan (bug 1806224) and maybe even regular builds, at least on Mac (bug 1805696). Maybe this test harness needs a larger timeout to deal with ccov instrumentation? Of course, I can see the earlier comments here about increasing the test harness timeout, and if I correctly understand the code that's linked in comment #12 then we should already be waiting 140s (70s * 2) but maybe I'm missing something — the log in comment #52 seems to show the test run being timed out after about 55s total and 20s in shutdown.
Comment 55•2 years ago
|
||
Actually we have a 2x multiplier for CCOV builds which means a test module times out after 50s if no extended timeout (META: timeout=long
) has been requested. The problem with especially wdspec jobs is that it's using pytest and as result the timeout is applied to the whole test file. And this is problematic if multiple tests in such a module restart Firefox a lot because this isn't controllable right now.
Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment 58•2 years ago
|
||
Update:
There have been 40 failures within the last 7 days, all of them on Windows 10 x64 2004 CCov WebRender opt.
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=401707309&repo=mozilla-central&lineNumber=75843
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 61•2 years ago
|
||
Hi Jed, your feedback regarding my questions from comment 55 would be kinda appreciated. Thanks!
Comment hidden (Intermittent Failures Robot) |
Comment 63•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #55)
Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?
Unfortunately, this isn't something I'm working on (nor do I know much about it in general). Hopefully it will be possible to increase CI timeouts enough to deal with the slow shutdown.
Comment 64•2 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #63)
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #55)
Jed, can I assume that there will be more fixes to improve the shutdown time of CCOV builds especially around the content process shutdown?
Unfortunately, this isn't something I'm working on (nor do I know much about it in general). Hopefully it will be possible to increase CI timeouts enough to deal with the slow shutdown.
So let me ask one other question regarding the currently allowed shutdown time for content processes. The toolkit.asyncshutdown.crash_timeout
preference is actually used for shutdown hang scenarios and has been increased for CCOV to cover slow shutdown. But the problem is that whenever under such a condition a CCOV build has more than 2 content processes that are slowly shutting down we easily reach the 180s shutdown timeout. Such a scenario is not uncommon. As such should we really wait 80s for each and every content process, or can this time be reduced? Do we know how long content processes usually take to shutdown maybe via telemetry settings?
Comment 65•2 years ago
|
||
I don't think we currently have this data. Do we wait 180s for each content process and we fail only after we wait for the last one? Or do we wait 180s and fail as soon as we have the first one that exceeds the timeout?
Comment 66•2 years ago
|
||
No, it's 80s per process as I've mentioned above. Also see the related landed patch from Jed:
https://hg.mozilla.org/mozilla-central/rev/d6814aa8f90e
Comment hidden (Intermittent Failures Robot) |
Comment 68•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #64)
As such should we really wait 80s for each and every content process, or can this time be reduced? Do we know how long content processes usually take to shutdown maybe via telemetry settings?
(In reply to Marco Castelluccio [:marco] from comment #65)
I don't think we currently have this data. Do we wait 180s for each content process and we fail only after we wait for the last one? Or do we wait 180s and fail as soon as we have the first one that exceeds the timeout?
The timeout I implemented in ProcessWatcher
is global, not per-process; I was assuming that child processes' late shutdown would proceed in parallel (and I guess I'm also implicitly assuming that there's “enough” hardware parallelism). Also, there's only so much I can do when the parent process will be killed after a fixed timeout (by nsTerminator
and/or a test harness).
Is there evidence that Windows ccov shutdown is serialized between multiple child processes?
Comment 69•2 years ago
|
||
I think that this is an interesting example:
https://treeherder.mozilla.org/logviewer?job_id=404028052&repo=mozilla-central&lineNumber=7191
[task 2023-01-31T05:50:48.499Z] 05:50:48 INFO - PID 1456 | 1675144248498 Marionette DEBUG 0 -> [0,34,"Marionette:Quit",{"flags":["eForceQuit"]}]
[..]
[task 2023-01-31T05:50:50.854Z] 05:50:50 INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 8864 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:50:54.408Z] 05:50:54 INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 4116 may be hanging at shutdown; will wait for up to 76453ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129
[task 2023-01-31T05:51:00.319Z] 05:51:00 INFO - PID 1456 | 1675144260319 geckodriver::browser DEBUG Browser process stopped: exit code: 0
[task 2023-01-31T05:51:00.321Z] 05:51:00 INFO - PID 1456 | 1675144260321 webdriver::server DEBUG <- 200 OK {"value":null}
The shutdown of Firefox gets triggered by Marionette at 05:50:48
. The first slow content process shutdown log entry can be seen at 05:50:50
which means 2s after. And here it is using a fresh 80s maximum timeout without taking care of the already passed seconds since the shutdown has been triggered. I assume if the first slow content process shutdown is detected later the drift can even be larger? That means it would definitely exceed the allowed async timeout setting of 80s, and we would finally crash anyway.
Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?
Comment 70•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #69)
I think that this is an interesting example:
https://treeherder.mozilla.org/logviewer?job_id=404028052&repo=mozilla-central&lineNumber=7191[task 2023-01-31T05:50:48.499Z] 05:50:48 INFO - PID 1456 | 1675144248498 Marionette DEBUG 0 -> [0,34,"Marionette:Quit",{"flags":["eForceQuit"]}] [..] [task 2023-01-31T05:50:50.854Z] 05:50:50 INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 8864 may be hanging at shutdown; will wait for up to 80000ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129 [task 2023-01-31T05:50:54.408Z] 05:50:54 INFO - PID 1456 | [Parent 1916, IPC I/O Parent] WARNING: Process 4116 may be hanging at shutdown; will wait for up to 76453ms: file Z:/task_167513988063129/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:129 [task 2023-01-31T05:51:00.319Z] 05:51:00 INFO - PID 1456 | 1675144260319 geckodriver::browser DEBUG Browser process stopped: exit code: 0 [task 2023-01-31T05:51:00.321Z] 05:51:00 INFO - PID 1456 | 1675144260321 webdriver::server DEBUG <- 200 OK {"value":null}
The shutdown of Firefox gets triggered by Marionette at
05:50:48
. The first slow content process shutdown log entry can be seen at05:50:50
which means 2s after. And here it is using a fresh 80s maximum timeout without taking care of the already passed seconds since the shutdown has been triggered. I assume if the first slow content process shutdown is detected later the drift can even be larger? That means it would definitely exceed the allowed async timeout setting of 80s, and we would finally crash anyway.
ProcessWatcher
's timeout starts when the parent process IPC I/O thread is shut down (if there's still a process that EnsureTerminated
was called on that's still running), so it makes sense that that would be a little later than the Marionette quit command. Currently, ProcessWatcher
's timeout is 80s and nsTerminator
's timeout (toolkit.asyncshutdown.crash_timeout
) is 180s, so there's 100s left for other parts of shutdown.
But, the thing about that log that's interesting to me is [taskcluster:error] Task aborted - max run time exceeded
at the end — if I understand correctly that's not any of the timeouts in Gecko, or in a test harness script, it's Taskcluster imposing a limit on how long one task can use the machine.
Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?
Unfortunately, no, not that I'm aware of.
In theory, if the Windows ProcessWatcher were changed to invoke the crash reporter directly instead of trying to inject a thread that crashes, it could get stacks from a process during ccov shutdown and we could see if it's spending a lot of time computing or if it's blocked on I/O, and maybe there might be something actionable in there. But that's a nontrivial amount of work, and this bug is marked S4 so it's hard to justify; if we can just increase some timeouts or split the tests into more jobs instead, that might be the easiest approach.
Comment 71•2 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #70)
ProcessWatcher
's timeout starts when the parent process IPC I/O thread is shut down (if there's still a process thatEnsureTerminated
was called on that's still running), so it makes sense that that would be a little later than the Marionette quit command. Currently,ProcessWatcher
's timeout is 80s andnsTerminator
's timeout (toolkit.asyncshutdown.crash_timeout
) is 180s, so there's 100s left for other parts of shutdown.
When you are saying there are 100s left for other parts, does it mean that the ProcessWatcher blocks any other component from shutdown or do those still run in parallel? I'm fairly sure it's not blocking but asking to get the confirmation.
But, the thing about that log that's interesting to me is
[taskcluster:error] Task aborted - max run time exceeded
at the end — if I understand correctly that's not any of the timeouts in Gecko, or in a test harness script, it's Taskcluster imposing a limit on how long one task can use the machine.
Yes that's something different and doesn't happen that often. We still have quite a lot of [CCOV jobs on Windows which pass, always show fast shutdowns of Firefox](https://treeherder.mozilla.org/logviewer?job_id=404278480&repo=mozilla-central and are done within 30 minutes. As such we didn't add a new chunk to CCOV jobs yet, but would really like to know why in some cases the shutdown is that massively slow.
Here some more examples:
https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=wd%2Cccov%2Cwindows&revision=b7f07512450399f35fc38a7e94241b19a4c2693c
Is there any possibility to see get more details out of the process in why it takes that long or is it all because just CCOV?
Unfortunately, no, not that I'm aware of.
In theory, if the Windows ProcessWatcher were changed to invoke the crash reporter directly instead of trying to inject a thread that crashes, it could get stacks from a process during ccov shutdown and we could see if it's spending a lot of time computing or if it's blocked on I/O, and maybe there might be something actionable in there. But that's a nontrivial amount of work, and this bug is marked S4 so it's hard to justify; if we can just increase some timeouts or split the tests into more jobs instead, that might be the easiest approach.
This bug is marked as S4 because it's an intermittent failure bug and that is the default. It's used to identify and investigate the current issues but I'm happy to file a new bug to maybe get further work to be done after a proper triage. In which component should it be filed?
I had a quick look at some of the job logs and it's interesting that the shutdown duration steadily increases as more tests were running. While at the beginning each shutdown takes around 5s even for CCOV builds (!!!) this duration increases over time up to 20s, 30s, and more. Could this maybe a result of some leaked processes or memory? This behavior is really strange.
Comment 72•2 years ago
|
||
The slowdown at the end of the execution for ccov is most likely related to I/O.
A few facts about ccov:
- at the end of the execution, the binary dumps a lot of files to the disk;
- each process will dump the files, and there are locks so they don't write to the same files at the same time;
- the more source files are covered by tests, the more files will be dumped at the end.
Comment 73•2 years ago
|
||
Yes, that's understandable but doesn't explain the trend that as longer a job is running shutdown times also take longer. And we do not have long / a lot of tests only at the end of the job.
Marco, are there any CCOV specific log or environment setttings that we could use to get more log output for CCOV builds?
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 75•2 years ago
•
|
||
Marco is probably a better source of information on this issue.
Comment 76•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] (away 02/13 - 02/19) from comment #73)
Yes, that's understandable but doesn't explain the trend that as longer a job is running shutdown times also take longer. And we do not have long / a lot of tests only at the end of the job.
No, indeed. I wonder if we are keeping some processes alive for longer and only end them at the end of a job. Perhaps we can add some logging to check when each process is started and when each process is finished.
Marco, are there any CCOV specific log or environment setttings that we could use to get more log output for CCOV builds?
Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.
Comment 77•2 years ago
|
||
(In reply to Marco Castelluccio [:marco] from comment #76)
Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.
IIRC we've set things up so that a commit can add patches to build/build-clang
and list them in the appropriate clang-*.json
file, and then Taskcluster will automatically build a new toolchain as needed, including on Try. So, it might not be too difficult to make changes to the coverage runtime, at least for experimenting with.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 80•2 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ from comment #77)
(In reply to Marco Castelluccio [:marco] from comment #76)
Unfortunately there isn't, and we'd need to modify LLVM to add some (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/profile/GCDAProfiling.c), so it is a bit cumbersome.
IIRC we've set things up so that a commit can add patches to
build/build-clang
and list them in the appropriateclang-*.json
file, and then Taskcluster will automatically build a new toolchain as needed, including on Try. So, it might not be too difficult to make changes to the coverage runtime, at least for experimenting with.
Marco, is there anything that we could do here, even only for some try builds?
Comment 81•2 years ago
|
||
Yes we can add patches to Clang, but it is a bit cumbersome (I mean, it's not just a matter of modifying some files in our repo. You need to clone llvm, make sure you are at the same revision we are using in m-c, you need to make sure you apply the same patches we are applying to avoid conflicts, and then you can generate your patch and add your patch to your commit to mozilla-central).
A simple logging we could add is something at the start and at the end of a process (in that file, IIRC we'd need to touch __gcov_fork, llvm_gcov_init and llvm_writeout_and_clear, logging the PID of the current process), so we can at least rule out the possibility that there are too many processes ending at the same time towards the end of the tests, which is when we are running out of time.
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 83•2 years ago
|
||
I think that as next step I would like to modify how we run the wdspec jobs with ccov builds. Lets trigger some try builds with the MOZ_LOG=process:5
environment variable set once bug 1787718 is fixed. That might maybe give some more details that I could use for further investigations.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 87•2 years ago
•
|
||
I pushed a try build with MOZ_LOG="Process:5"
:
https://treeherder.mozilla.org/jobs?repo=try&revision=633eeccda5cbe4e03d46953e81438ca4eddf8e68
Lets see if that gives us some insight.
Comment 88•2 years ago
|
||
There is actually no additional information when using Process:5
logging which could be used to determine the underlying reason.
But there is an interesting fact. Since we moved the CCOV tests from Windows 10 to Windows 11 the number of failures has been drastically reduced to roughly 1 or 2 per day. That's great to see and I don't think that I will spend more time on investigations.
Comment 89•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #88)
But there is an interesting fact. Since we moved the CCOV tests from Windows 10 to Windows 11 the number of failures has been drastically reduced to roughly 1 or 2 per day. That's great to see and I don't think that I will spend more time on investigations.
Joel, would you mind giving us a quick overview which kind of resources have been changed for the Windows 11 workers? It would be good to know why those failures dropped a lot since we moved away from Windows 10. Thanks.
Comment hidden (Intermittent Failures Robot) |
Comment 92•2 years ago
|
||
the hardware/cloud is the same (as far as I know)- the builds are the same, this is just a new OS- within windows the kernel changed significantly in later win10 and all win11 builds, so most likely that is the issue.
Comment 94•2 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #92)
the hardware/cloud is the same (as far as I know)- the builds are the same, this is just a new OS- within windows the kernel changed significantly in later win10 and all win11 builds, so most likely that is the issue.
Thanks! That's good to see. I'll still observe the failure over the next weeks.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 100•2 years ago
|
||
(In reply to Intermittent Failures Robot from comment #99)
Platform and build breakdown:
- linux1804-64-ccov-qr: 2
- opt: 2
These are wrongly classified. I went ahead, created bug 1827354 and re-classified those tests.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 120•1 year ago
|
||
On bug 1848121 I've increased the timeout multiplier for CCOV builds from 3 to 4, which might help here by the extra 25s. Lets observe.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 160•4 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Reporter | ||
Comment 161•4 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=470150630&repo=mozilla-central
Comment 162•4 months ago
|
||
(In reply to Treeherder Bug Filer from comment #161)
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=470150630&repo=mozilla-central
That is just bug 1912631.
Reporter | ||
Comment 163•3 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472564836&repo=try
Comment 164•3 months ago
|
||
(In reply to Treeherder Bug Filer from comment #163)
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472564836&repo=try
That is bug 1791216.
Reporter | ||
Comment 165•3 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472642241&repo=mozilla-central
Comment 166•3 months ago
|
||
(In reply to Treeherder Bug Filer from comment #165)
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=472642241&repo=mozilla-central
This is bug 1832294.
Reporter | ||
Comment 167•2 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=474881451&repo=mozilla-beta
Comment 168•2 months ago
|
||
(In reply to Treeherder Bug Filer from comment #167)
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=474881451&repo=mozilla-beta
This is bug 1776189.
Description
•