Closed Bug 1779759 Opened 3 months ago Closed 2 months ago

Intermittent [tier 2] Wd CCOV TEST-UNEXPECTED-TIMEOUT | expected OK (JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks)

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- wontfix
firefox103 --- wontfix
firefox104 --- wontfix
firefox105 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: barret)

References

(Blocks 1 open bug, Regressed 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file, 1 obsolete file)

Filed by: bszekely [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=384469789&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QNlBQBh8SGOUAG2BErHRxA/runs/0/artifacts/public/logs/live_backing.log


[task 2022-07-15T11:50:05.348Z] 11:50:05     INFO - PID 7844 | 1657885805338	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-07-15T11:50:05.349Z] 11:50:05     INFO - PID 7844 | 1657885805339	webdriver::server	DEBUG	-> GET /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2/window
[task 2022-07-15T11:50:05.350Z] 11:50:05     INFO - PID 7844 | 1657885805339	Marionette	DEBUG	0 -> [0,153,"WebDriver:GetWindowHandle",{}]
[task 2022-07-15T11:50:05.350Z] 11:50:05     INFO - PID 7844 | 1657885805340	Marionette	DEBUG	0 <- [1,153,null,{"value":"a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"}]
[task 2022-07-15T11:50:05.351Z] 11:50:05     INFO - PID 7844 | 1657885805340	webdriver::server	DEBUG	<- 200 OK {"value":"a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"}
[task 2022-07-15T11:50:05.351Z] 11:50:05     INFO - PID 7844 | 1657885805341	webdriver::server	DEBUG	-> GET /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2/window/handles
[task 2022-07-15T11:50:05.352Z] 11:50:05     INFO - PID 7844 | 1657885805341	Marionette	DEBUG	0 -> [0,154,"WebDriver:GetWindowHandles",{}]
[task 2022-07-15T11:50:05.353Z] 11:50:05     INFO - PID 7844 | 1657885805341	Marionette	DEBUG	0 <- [1,154,null,["a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"]]
[task 2022-07-15T11:50:05.353Z] 11:50:05     INFO - PID 7844 | 1657885805342	webdriver::server	DEBUG	<- 200 OK {"value":["a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"]}
[task 2022-07-15T11:50:05.354Z] 11:50:05     INFO - PID 7844 | 1657885805342	webdriver::server	DEBUG	-> POST /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2/window {"handle": "a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"}
[task 2022-07-15T11:50:05.355Z] 11:50:05     INFO - PID 7844 | 1657885805343	Marionette	DEBUG	0 -> [0,155,"WebDriver:SwitchToWindow",{"handle":"a69b83d4-54fa-4d84-9fbc-a0c42064fa0e"}]
[task 2022-07-15T11:50:05.356Z] 11:50:05     INFO - PID 7844 | 1657885805343	Marionette	DEBUG	0 <- [1,155,null,{"value":null}]
[task 2022-07-15T11:50:05.356Z] 11:50:05     INFO - PID 7844 | 1657885805344	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-07-15T11:50:05.357Z] 11:50:05     INFO - PID 7844 | 1657885805344	webdriver::server	DEBUG	-> POST /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2/window/rect {"width": 800, "height": 600}
[task 2022-07-15T11:50:05.358Z] 11:50:05     INFO - PID 7844 | 1657885805345	Marionette	DEBUG	0 -> [0,156,"WebDriver:SetWindowRect",{"height":600,"width":800}]
[task 2022-07-15T11:50:05.358Z] 11:50:05     INFO - PID 7844 | 1657885805345	Marionette	TRACE	Requested window geometry matches
[task 2022-07-15T11:50:05.359Z] 11:50:05     INFO - PID 7844 | 1657885805345	Marionette	DEBUG	0 <- [1,156,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2022-07-15T11:50:05.360Z] 11:50:05     INFO - PID 7844 | 1657885805346	webdriver::server	DEBUG	<- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2022-07-15T11:50:05.360Z] 11:50:05     INFO - PID 7844 | 1657885805346	webdriver::server	DEBUG	-> POST /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2/frame {"id": null}
[task 2022-07-15T11:50:05.361Z] 11:50:05     INFO - PID 7844 | 1657885805347	Marionette	DEBUG	0 -> [0,157,"WebDriver:SwitchToFrame",{"id":null}]
[task 2022-07-15T11:50:05.362Z] 11:50:05     INFO - PID 7844 | 1657885805348	Marionette	DEBUG	0 <- [1,157,null,{"value":null}]
[task 2022-07-15T11:50:05.363Z] 11:50:05     INFO - PID 7844 | 1657885805348	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2022-07-15T11:50:05.364Z] 11:50:05     INFO - PID 7844 | 1657885805351	webdriver::server	DEBUG	-> DELETE /session/ae2c9f13-933f-4519-9ddf-c6f94463f4f2
[task 2022-07-15T11:50:05.364Z] 11:50:05     INFO - PID 7844 | 1657885805352	Marionette	DEBUG	0 -> [0,158,"Marionette:Quit",{"flags":["eForceQuit"]}]
[task 2022-07-15T11:50:05.365Z] 11:50:05     INFO - PID 7844 | 1657885805352	Marionette	INFO	Stopped listening on port 62361
[task 2022-07-15T11:50:05.418Z] 11:50:05     INFO - PID 7844 | 1657885805425	Marionette	TRACE	Received observer notification quit-application
[task 2022-07-15T11:50:05.432Z] 11:50:05     INFO - PID 7844 | 1657885805431	Marionette	TRACE	Received observer notification quit-application
[task 2022-07-15T11:50:05.433Z] 11:50:05     INFO - PID 7844 | 1657885805432	Marionette	DEBUG	Marionette stopped listening
[task 2022-07-15T11:50:05.433Z] 11:50:05     INFO - PID 7844 | 1657885805433	Marionette	DEBUG	0 <- [1,158,null,{"cause":"shutdown","forced":false}]
[task 2022-07-15T11:50:05.463Z] 11:50:05     INFO - PID 7844 | 1657885805465	webdriver::server	DEBUG	Teardown session
[task 2022-07-15T11:50:05.470Z] 11:50:05     INFO - PID 7844 | 1657885805470	Marionette	DEBUG	Closed connection 0
[task 2022-07-15T11:50:05.499Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a23fa9cc00 - BlockShutdown: CanSend.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3587
[task 2022-07-15T11:50:05.509Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a23fa9cc00 - ShutDownProcess: Sent shutdown message.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1791
[task 2022-07-15T11:50:05.510Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a24899a100 - BlockShutdown: CanSend.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3587
[task 2022-07-15T11:50:05.511Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a24899a100 - ShutDownProcess: Sent shutdown message.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1791
[task 2022-07-15T11:50:05.512Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248997900 - BlockShutdown: CanSend.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3587
[task 2022-07-15T11:50:05.513Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248997900 - ShutDownProcess: Sent shutdown message.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1791
[task 2022-07-15T11:50:05.514Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248999200 - BlockShutdown: CanSend.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3587
[task 2022-07-15T11:50:05.515Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248999200 - ShutDownProcess: Sent shutdown message.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1791
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a23fa9cc00 - ShutDownProcess: Closing channel.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1826
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a23fa9cc00 - RemoveShutdownBlockers: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3689
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248999200 - ShutDownProcess: Closing channel.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1826
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248999200 - RemoveShutdownBlockers: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3689
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a24899a100 - ShutDownProcess: Closing channel.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1826
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a24899a100 - RemoveShutdownBlockers: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3689
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248997900 - ShutDownProcess: Closing channel.: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:1826
[task 2022-07-15T11:50:05.544Z] 11:50:05     INFO - PID 7844 | [Parent 8636, Main Thread] WARNING: ContentParent: id=1a248997900 - RemoveShutdownBlockers: file Z:/task_165787977686434/build/src/dom/ipc/ContentParent.cpp:3689
[task 2022-07-15T11:50:05.843Z] 11:50:05     INFO - PID 7844 | JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
[task 2022-07-15T11:50:07.430Z] 11:50:07     INFO - PID 7844 | Waiting in WillDestroyCurrentMessageLoop for pid 5848
[task 2022-07-15T11:50:57.275Z] 11:50:57     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_element_shadow_root/get.py | expected OK
[task 2022-07-15T11:50:57.278Z] 11:50:57     INFO - TEST-INFO took 55010ms
[task 2022-07-15T11:50:57.338Z] 11:50:57     INFO - Closing logging queue
[task 2022-07-15T11:50:57.346Z] 11:50:57     INFO - queue closed
[task 2022-07-15T11:50:57.859Z] 11:50:57     INFO - PID 6396 | 1657885857358	geckodriver	INFO	Listening on 127.0.0.1:49876
[task 2022-07-15T11:50:57.861Z] 11:50:57     INFO - Starting runner
[task 2022-07-15T11:50:58.657Z] 11:50:58     INFO - TEST-START | /webdriver/tests/get_element_shadow_root/user_prompts.py

There is a hang during shutdown of Firefox:
https://treeherder.mozilla.org/logviewer?job_id=384469789&repo=mozilla-central&lineNumber=73844-73846

[task 2022-07-15T11:50:05.843Z] 11:50:05     INFO - PID 7844 | JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
[task 2022-07-15T11:50:07.430Z] 11:50:07     INFO - PID 7844 | Waiting in WillDestroyCurrentMessageLoop for pid 5848
[task 2022-07-15T11:50:57.275Z] 11:50:57     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_element_shadow_root/get.py | expected OK

Chris could this be Telemetry related or a side-effect from a CCOV build?

Flags: needinfo?(chutten)
Summary: Intermittent [tier 2] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_element_shadow_root/get.py | expected OK → Intermittent [tier 2] CCOV TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_element_shadow_root/get.py | expected OK

We have this existing in non-CCOV builds (bug 1779493) and may be a side-effect of how Telemetry needs to write to disk later than IOUtils (currently) would like us to (bug 1769152).

We may need to revert bug 1749620 to return TerminatorTelemetry back to OS.file usage to stop these aborterrors from happening.

ni?barret for IOUtils migration planning: How's bug 1769152 looking wrt timeline? Is it possible or worthwhile to go back to using osfile for TerminatorTelemetry given the intermittents we might be seeing as a result?

Flags: needinfo?(chutten) → needinfo?(brennie)

I have a patch that half works locally. I can prioritize working on it at the beginning of the Fx105 cycle, so maybe a backout for now would be goodl

Flags: needinfo?(brennie) → needinfo?(chutten)

I'm working on something else at the moment, but I might be able to get to this revert soonish.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Flags: needinfo?(chutten)
Priority: P5 → P1

This reverts commit 3e67ed2191aeb4844f64a83e94160b5860e7ebf4.

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/313771167333
Put TerminatorTelemetry back on osfile.jsm for late writes r=barret

Backed out changeset 313771167333 (bug 1779759) for causing browser-chrome failures in browser/base/content/test/performance/browser_startup.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/c7cd33940024f3502af1e33ed365262a2fc678eb

Push with failures

Failure log

TEST-UNEXPECTED-FAIL | browser/base/content/test/performance/browser_startup.js | resource://gre/modules/osfile.jsm is not allowed before becoming idle -
Flags: needinfo?(chutten)

Dagnabbit. Okay Barret, looks like the simple backout isn't gonna work out so well. I leave the fix in your hands.

Flags: needinfo?(chutten)
Depends on: 1769152
Attachment #9286489 - Attachment is obsolete: true
Assignee: chutten → nobody
Status: ASSIGNED → NEW
Regressed by: 1749620

:mathew.hodson, since you are the author of the regressor, bug 1749620, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mathew.hodson)

Looks like :barret will fix this in bug 1769152.

Flags: needinfo?(mathew.hodson)
Assignee: nobody → brennie
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44b5ce9432c9
Block shutdown for terminator telemetry r=chutten

Backed out for causing build bustages on IOUtils.cpp

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: package> Assertion failure: aPhase >= IOUtils::ShutdownPhase::ProfileBeforeChange && aPhase < IOUtils::ShutdownPhase::XpcomWillShutdown, at /builds/worker/checkouts/gecko/dom/system/IOUtils.cpp:2016
Flags: needinfo?(brennie)
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84d85b2c2c57
Block shutdown for terminator telemetry r=chutten

Backed out for causing build bustages on IOUtils.cpp

Backout link: https://hg.mozilla.org/integration/autoland/rev/b6d7397f08835225f847ac613f38c95322e7214e

Push with failures

Failure log build bustage

Failure log crashtest

WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"IOUtils Blocker (xpcom-will-shutdown)","state":"(none)","filename":"/builds/worker/checkouts/gecko/dom/system/IOUtils.cpp","lineNumber":2194,"stack":"IOUtils::EventQueue::SetShutdownHooks"}] Barrier: xpcom-will-shutdown
[task 2022-08-09T18:07:51.247Z] 18:07:51     INFO -  package> FATAL ERROR: AsyncShutdown timeout in xpcom-will-shutdown Conditions: [{"name":"IOUtils Blocker (xpcom-will-shutdown)","state":"(none)","filename":"/builds/worker/checkouts/gecko/dom/system/IOUtils.cpp","lineNumber":2194,"stack":"IOUtils::EventQueue::SetShutdownHooks"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
[task 2022-08-09T18:07:51.247Z] 18:07:51     INFO -  package> WARNING: No crash reporter available
[task 2022-08-09T18:07:51.247Z] 18:07:51     INFO -  package> [Parent 29518, Main Thread] ###!!! ABORT: file /builds/worker/checkouts/gecko/dom/system/IOUtils.cpp:2194

Updating the bug's summary for a better test failure classification experience given that this could happen basically for each and every of our wdspec tests.

Status: NEW → ASSIGNED
Component: geckodriver → Telemetry
Product: Testing → Toolkit
Summary: Intermittent [tier 2] CCOV TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/get_element_shadow_root/get.py | expected OK → Intermittent [tier 2] Wd CCOV TEST-UNEXPECTED-TIMEOUT | expected OK (telemetry shutdown)

This specific shutdown hang causes quite some test failures for wdspec tests for Windows ccov builds (bug 1783217).

Barret do you think that we can get this fixed sometime soon?

Blocks: 1783217
Summary: Intermittent [tier 2] Wd CCOV TEST-UNEXPECTED-TIMEOUT | expected OK (telemetry shutdown) → Intermittent [tier 2] Wd CCOV TEST-UNEXPECTED-TIMEOUT | expected OK (JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 60: AbortError: IOUtils: Shutting down and refusing additional I/O tasks)

Just triggered re-landing.

Flags: needinfo?(brennie)
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/adc608dc4add
Block shutdown for terminator telemetry r=chutten
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18b95d8fd376
Block shutdown for terminator telemetry r=chutten

I think this updated patch should fix the remaining failures.

Flags: needinfo?(brennie)

This patch would have worked if nsTerminatorTelemetry used modern JS, but because it uses old ES5 style callbacks, it breaks here:

nsTerminatorTelemetry.prototype = {
  // ...
  observe: function(/* ... */) {
    // ...
   (async function() {
      // ...
      this._deferred.resolve(); // this is not defined because none of this functions bind `this`.
    })();
  },
}
Flags: needinfo?(brennie)
Pushed by brennie@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3bb45c9d60de
Block shutdown for terminator telemetry r=chutten
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch

Barret do you think that this patch should be uplifted to 104. Or is it ok to keep it on central only given that it only seems to affect CCOV builds? But there might maybe other factors as well to trigger that behavior and seeing release users being affected sounds unfortunate.

Flags: needinfo?(brennie)

This really shouldn't be affecting actual users. The issue only manifests when shutdown occurs part-way through browser startup, before we reach a stable state, which really only happens in tests.

Flags: needinfo?(brennie)
Duplicate of this bug: 1758355
You need to log in before you can comment on or make changes to this bug.