Intermittent Assertion failure: false (MOZ_ASSERT_UNREACHABLE: PUtilityAudioDecoder: failure when starting actor), at /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:365
Categories
(Core :: IPC, defect, P5)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox112 | --- | unaffected |
firefox113 | --- | unaffected |
firefox114 | --- | wontfix |
firefox115 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: gerard-majax)
References
(Regression)
Details
(Keywords: assertion, intermittent-failure, regression, Whiteboard: [retriggered])
Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=412293911&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tl0bfDdsTM64awrJAHJR8g/runs/0/artifacts/public/logs/live_backing.log
[task 2023-04-13T12:07:17.519Z] 12:07:17 INFO - TEST-START | dom/canvas/test/webgl-conf/generated/test_2_conformance__textures__image_bitmap_from_video__tex-2d-alpha-alpha-unsigned_byte.html
[task 2023-04-13T12:07:17.613Z] 12:07:17 INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-13T12:07:23.156Z] 12:07:23 INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject LaunchProcess() for LaunchPromise() rejection: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:222
[task 2023-04-13T12:07:23.157Z] 12:07:23 INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject StartUtility() for LaunchProcess() rejection: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:284
[task 2023-04-13T12:07:23.157Z] 12:07:23 INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject StartUtility() when !IsShutdown(): file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:286
[task 2023-04-13T12:07:23.157Z] 12:07:23 INFO - GECKO(1462) | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: PUtilityAudioDecoder: failure when starting actor), at /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:365
[task 2023-04-13T12:07:23.226Z] 12:07:23 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2023-04-13T12:07:42.208Z] 12:07:42 INFO - GECKO(1462) | #01: mozilla::MozPromise<bool, nsresult, false>::ThenValue<mozilla::ipc::UtilityProcessManager::StartProcessForRemoteMediaDecoding(int, mozilla::ipc::SandboxingKind)::$_0, mozilla::ipc::UtilityProcessManager::StartProcessForRemoteMediaDecoding(int, mozilla::ipc::SandboxingKind)::$_1>::DoResolveOrRejectInternal(mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue&) [xpcom/threads/MozPromise.h:852]
[task 2023-04-13T12:07:42.208Z] 12:07:42 INFO - GECKO(1462) | #02: mozilla::MozPromise<bool, nsresult, false>::ThenValueBase::ResolveOrRejectRunnable::Run() [xpcom/threads/MozPromise.h:490]
[task 2023-04-13T12:07:42.208Z] 12:07:42 INFO - GECKO(1462) | #03: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:554]
[task 2023-04-13T12:07:42.209Z] 12:07:42 INFO - GECKO(1462) | #04: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:869]
[task 2023-04-13T12:07:42.209Z] 12:07:42 INFO - GECKO(1462) | #05: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:0]
[task 2023-04-13T12:07:42.209Z] 12:07:42 INFO - GECKO(1462) | #06: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:464]
[task 2023-04-13T12:07:42.210Z] 12:07:42 INFO - GECKO(1462) | #07: mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:549]
[task 2023-04-13T12:07:42.210Z] 12:07:42 INFO - GECKO(1462) | #08: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1243]
[task 2023-04-13T12:07:42.210Z] 12:07:42 INFO - GECKO(1462) | #09: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:444]
[task 2023-04-13T12:07:42.211Z] 12:07:42 INFO - GECKO(1462) | #10: nsBaseAppShell::NativeEventCallback() [widget/nsBaseAppShell.cpp:88]
[task 2023-04-13T12:07:42.211Z] 12:07:42 INFO - GECKO(1462) | #11: nsAppShell::ProcessGeckoEvents(void*) [widget/cocoa/nsAppShell.mm:512]
[task 2023-04-13T12:07:42.455Z] 12:07:42 INFO - GECKO(1462) | #12: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83d52]
[task 2023-04-13T12:07:42.455Z] 12:07:42 INFO - GECKO(1462) | #13: __CFRunLoopDoSource0 [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83cf1]
[task 2023-04-13T12:07:42.456Z] 12:07:42 INFO - GECKO(1462) | #14: __CFRunLoopDoSources0 [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83b0b]
[task 2023-04-13T12:07:42.456Z] 12:07:42 INFO - GECKO(1462) | #15: __CFRunLoopRun [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x8283a]
[task 2023-04-13T12:07:42.456Z] 12:07:42 INFO - GECKO(1462) | #16: CFRunLoopRunSpecific [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x81e3e]
[task 2023-04-13T12:07:42.679Z] 12:07:42 INFO - GECKO(1462) | #17: RunCurrentEventLoopInMode [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2fabd]
[task 2023-04-13T12:07:42.680Z] 12:07:42 INFO - GECKO(1462) | #18: ReceiveNextEventCommon [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f7d5]
[task 2023-04-13T12:07:42.680Z] 12:07:42 INFO - GECKO(1462) | #19: _BlockUntilNextEventMatchingListInModeWithFilter [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f579]
[task 2023-04-13T12:07:42.974Z] 12:07:42 INFO - GECKO(1462) | #20: _DPSNextEvent [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x41039]
[task 2023-04-13T12:07:42.975Z] 12:07:42 INFO - GECKO(1462) | #21: -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x3f880]
[task 2023-04-13T12:07:42.975Z] 12:07:42 INFO - GECKO(1462) | #22: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] [widget/cocoa/nsAppShell.mm:175]
[task 2023-04-13T12:07:42.976Z] 12:07:42 INFO - GECKO(1462) | #23: -[NSApplication run] [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x3158e]
[task 2023-04-13T12:07:42.976Z] 12:07:42 INFO - GECKO(1462) | #24: nsAppShell::Run() [widget/cocoa/nsAppShell.mm:0]
[task 2023-04-13T12:07:42.976Z] 12:07:42 INFO - GECKO(1462) | #25: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:296]
[task 2023-04-13T12:07:42.976Z] 12:07:42 INFO - GECKO(1462) | #26: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:5683]
[task 2023-04-13T12:07:42.977Z] 12:07:42 INFO - GECKO(1462) | #27: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5883]
[task 2023-04-13T12:07:42.977Z] 12:07:42 INFO - GECKO(1462) | #28: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5939]
[task 2023-04-13T12:07:42.977Z] 12:07:42 INFO - GECKO(1462) | #29: main [browser/app/nsBrowserApp.cpp:426]
[task 2023-04-13T12:07:42.978Z] 12:07:42 INFO - GECKO(1462) | [Child 1476, Unnamed thread 11882eda0] WARNING: Not registering callback thread due to refcount logging; it may show up as a leak of the TLS-backed nsThread wrapper if the thread outlives xpcom shutdown.: file /builds/worker/checkouts/gecko/dom/media/CallbackThreadRegistry.cpp:57
Comment 1•2 years ago
|
||
R+B range reveals that this particular assertion started with the land of https://bugzilla.mozilla.org/show_bug.cgi?id=1819311
https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=OS%2CX%2C10.15%2CWebRender%2Cdebug%2CMochitests%2Ctest-macosx1015-64-qr%2Fdebug-mochitest-webgl2-ext%2Cgl2e4&tochange=48c5c3273a44220dcb7dfdfb40cff11c2c1df492&fromchange=3ce125387ccbd002ae2c02ecee7c9a243717e712&selectedTaskRun=Wi29yYK7SCuKiqdPvAMSDw.0
Alexandre, could you please take a look at this?
Thanks
Comment 2•2 years ago
|
||
Set release status flags based on info from the regressing bug 1819311
Assignee | ||
Comment 3•2 years ago
|
||
This one is expected, but I was hoping we'd get some of the logging we added, and I dont see it :(
Assignee | ||
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 5•2 years ago
|
||
Ok, so bug 1828460 is landing and yielding us results already: https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10840 https://treeherder.mozilla.org/logviewer?job_id=412956062&repo=autoland&lineNumber=5388
In both case we come from LaunchTimeout
, i.e., InitAfterConnect(false)
at https://searchfox.org/mozilla-central/rev/31f5847a4494b3646edabbdd7ea39cb88509afe2/ipc/glue/UtilityProcessHost.cpp#110-120
Assignee | ||
Comment 6•2 years ago
•
|
||
Checking the behavior on dom/media/webaudio/test/test_decodeAudioDataPromise.html
from https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10874
Utility Process 2022 complains about the loosing IPC https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10924. It was started before this test: https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10635-10636
Utility Process 2037 complains about loosing IPC https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10890. It was started during the test https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10874
The test loads some ogg file that contains Vorbis: https://searchfox.org/mozilla-central/rev/a4fb94aab371d8650d194558c77879f5b8842380/dom/media/webaudio/test/test_decodeAudioDataPromise.html#42
So we should be able to run UtilityAudioDecoder
actor on the pre-existing Utility Process 2022
from above. We might have those two processes on macOS (and Windows) because those are the platforms where we need a different sandbox for audio decoders (one generic, one platform-specific).
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 8•2 years ago
|
||
FTR: When using the reproducer patch at https://bugzilla.mozilla.org/attachment.cgi?id=9329967 and disabling the MOZ_ASSERT
, then audio decoding would fallback to Content process:
ipc/glue/test/browser/browser_utility_multipleAudio.js
FAIL playback https://example.com/browser/ipc/glue/test/browser/small-shot.ogg was from process 'vorbis audio decoder', expected 'Utility Generic' - false == true - got false, expected true (operator ==)
Assignee | ||
Comment 9•2 years ago
|
||
From the following try, we should be mostly safe now in assuming:
- this is just a legit timeout,
- probably our 5s limit is too low,
- we could remove this assert (and likely others),
Speficially,
https://treeherder.mozilla.org/jobs?repo=try&revision=9c12d0296b5a5d5c7d43465a4897667a0baed51b
https://treeherder.mozilla.org/jobs?repo=try&revision=9d43a5057f691307d0ae30f5ed70c7082c27623d
We can see increasing timeout alleviates the issue
From
https://treeherder.mozilla.org/jobs?repo=try&revision=dc60f6b3e02a0e9eca4c55192a71c7acc9a76d32
https://treeherder.mozilla.org/jobs?repo=try&revision=5072b1a20663954fa7049cf2ec0a774efae5b4d9
We can see that not asserting we complete the tests correctly. Given we disable ability to perform audio decoding on Content process on those, we should hit a test timeout in case no utility process can start. Out of the few error reported (one leak?) we can see the timeout being hit a few, and it takes 13s from test start to a successfull Utility process child connecting to the parent, but ultimately it does.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 11•2 years ago
|
||
Deeper debugging shows https://treeherder.mozilla.org/jobs?repo=try&revision=169c2bbf2740c38cb55fa3d4c78e1e08241b2eda&selectedTaskRun=Dpam2MJUSdCY8awmyGBVmw.0:
- PID 2273 created via
posix_spawnp
at https://treeherder.mozilla.org/logviewer?job_id=414728579&repo=try&lineNumber=4696 - PID 2273 saying hello from its
main()
at https://treeherder.mozilla.org/logviewer?job_id=414728579&repo=try&lineNumber=4772
According to logging, there's quite some delay between both, 10s, so it would look like the system is constrained and it really fail/is slow to run a new process?
Assignee | ||
Comment 12•2 years ago
|
||
posix_spawnp
creates1633
at https://treeherder.mozilla.org/logviewer?job_id=414767230&repo=try&lineNumber=5690- 6 seconds later
main()
of1633
is up https://treeherder.mozilla.org/logviewer?job_id=414767230&repo=try&lineNumber=5773
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 14•2 years ago
|
||
Bug 1829108 took care of removing this extraenous assert, so this will not happen anymore.
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Description
•