Closed Bug 1827894 Opened 2 years ago Closed 2 years ago

Intermittent Assertion failure: false (MOZ_ASSERT_UNREACHABLE: PUtilityAudioDecoder: failure when starting actor), at /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:365

Categories

(Core :: IPC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox112 --- unaffected
firefox113 --- unaffected
firefox114 --- wontfix
firefox115 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gerard-majax)

References

(Regression)

Details

(Keywords: assertion, intermittent-failure, regression, Whiteboard: [retriggered])

Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=412293911&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tl0bfDdsTM64awrJAHJR8g/runs/0/artifacts/public/logs/live_backing.log


[task 2023-04-13T12:07:17.519Z] 12:07:17     INFO - TEST-START | dom/canvas/test/webgl-conf/generated/test_2_conformance__textures__image_bitmap_from_video__tex-2d-alpha-alpha-unsigned_byte.html
[task 2023-04-13T12:07:17.613Z] 12:07:17     INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-13T12:07:23.156Z] 12:07:23     INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject LaunchProcess() for LaunchPromise() rejection: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:222
[task 2023-04-13T12:07:23.157Z] 12:07:23     INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject StartUtility() for LaunchProcess() rejection: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:284
[task 2023-04-13T12:07:23.157Z] 12:07:23     INFO - GECKO(1462) | [Parent 1462, Main Thread] WARNING: Reject StartUtility() when !IsShutdown(): file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:286
[task 2023-04-13T12:07:23.157Z] 12:07:23     INFO - GECKO(1462) | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: PUtilityAudioDecoder: failure when starting actor), at /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessManager.cpp:365
[task 2023-04-13T12:07:23.226Z] 12:07:23     INFO -  Initializing stack-fixing for the first stack frame, this may take a while...
[task 2023-04-13T12:07:42.208Z] 12:07:42     INFO - GECKO(1462) | #01: mozilla::MozPromise<bool, nsresult, false>::ThenValue<mozilla::ipc::UtilityProcessManager::StartProcessForRemoteMediaDecoding(int, mozilla::ipc::SandboxingKind)::$_0, mozilla::ipc::UtilityProcessManager::StartProcessForRemoteMediaDecoding(int, mozilla::ipc::SandboxingKind)::$_1>::DoResolveOrRejectInternal(mozilla::MozPromise<bool, nsresult, false>::ResolveOrRejectValue&) [xpcom/threads/MozPromise.h:852]
[task 2023-04-13T12:07:42.208Z] 12:07:42     INFO - GECKO(1462) | #02: mozilla::MozPromise<bool, nsresult, false>::ThenValueBase::ResolveOrRejectRunnable::Run() [xpcom/threads/MozPromise.h:490]
[task 2023-04-13T12:07:42.208Z] 12:07:42     INFO - GECKO(1462) | #03: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:554]
[task 2023-04-13T12:07:42.209Z] 12:07:42     INFO - GECKO(1462) | #04: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:869]
[task 2023-04-13T12:07:42.209Z] 12:07:42     INFO - GECKO(1462) | #05: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) [xpcom/threads/TaskController.cpp:0]
[task 2023-04-13T12:07:42.209Z] 12:07:42     INFO - GECKO(1462) | #06: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:464]
[task 2023-04-13T12:07:42.210Z] 12:07:42     INFO - GECKO(1462) | #07: mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() [xpcom/threads/nsThreadUtils.h:549]
[task 2023-04-13T12:07:42.210Z] 12:07:42     INFO - GECKO(1462) | #08: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1243]
[task 2023-04-13T12:07:42.210Z] 12:07:42     INFO - GECKO(1462) | #09: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/threads/nsThreadUtils.cpp:444]
[task 2023-04-13T12:07:42.211Z] 12:07:42     INFO - GECKO(1462) | #10: nsBaseAppShell::NativeEventCallback() [widget/nsBaseAppShell.cpp:88]
[task 2023-04-13T12:07:42.211Z] 12:07:42     INFO - GECKO(1462) | #11: nsAppShell::ProcessGeckoEvents(void*) [widget/cocoa/nsAppShell.mm:512]
[task 2023-04-13T12:07:42.455Z] 12:07:42     INFO - GECKO(1462) | #12: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83d52]
[task 2023-04-13T12:07:42.455Z] 12:07:42     INFO - GECKO(1462) | #13: __CFRunLoopDoSource0 [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83cf1]
[task 2023-04-13T12:07:42.456Z] 12:07:42     INFO - GECKO(1462) | #14: __CFRunLoopDoSources0 [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83b0b]
[task 2023-04-13T12:07:42.456Z] 12:07:42     INFO - GECKO(1462) | #15: __CFRunLoopRun [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x8283a]
[task 2023-04-13T12:07:42.456Z] 12:07:42     INFO - GECKO(1462) | #16: CFRunLoopRunSpecific [/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x81e3e]
[task 2023-04-13T12:07:42.679Z] 12:07:42     INFO - GECKO(1462) | #17: RunCurrentEventLoopInMode [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2fabd]
[task 2023-04-13T12:07:42.680Z] 12:07:42     INFO - GECKO(1462) | #18: ReceiveNextEventCommon [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f7d5]
[task 2023-04-13T12:07:42.680Z] 12:07:42     INFO - GECKO(1462) | #19: _BlockUntilNextEventMatchingListInModeWithFilter [/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f579]
[task 2023-04-13T12:07:42.974Z] 12:07:42     INFO - GECKO(1462) | #20: _DPSNextEvent [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x41039]
[task 2023-04-13T12:07:42.975Z] 12:07:42     INFO - GECKO(1462) | #21: -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x3f880]
[task 2023-04-13T12:07:42.975Z] 12:07:42     INFO - GECKO(1462) | #22: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] [widget/cocoa/nsAppShell.mm:175]
[task 2023-04-13T12:07:42.976Z] 12:07:42     INFO - GECKO(1462) | #23: -[NSApplication run] [/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit + 0x3158e]
[task 2023-04-13T12:07:42.976Z] 12:07:42     INFO - GECKO(1462) | #24: nsAppShell::Run() [widget/cocoa/nsAppShell.mm:0]
[task 2023-04-13T12:07:42.976Z] 12:07:42     INFO - GECKO(1462) | #25: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:296]
[task 2023-04-13T12:07:42.976Z] 12:07:42     INFO - GECKO(1462) | #26: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:5683]
[task 2023-04-13T12:07:42.977Z] 12:07:42     INFO - GECKO(1462) | #27: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5883]
[task 2023-04-13T12:07:42.977Z] 12:07:42     INFO - GECKO(1462) | #28: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5939]
[task 2023-04-13T12:07:42.977Z] 12:07:42     INFO - GECKO(1462) | #29: main [browser/app/nsBrowserApp.cpp:426]
[task 2023-04-13T12:07:42.978Z] 12:07:42     INFO - GECKO(1462) | [Child 1476, Unnamed thread 11882eda0] WARNING: Not registering callback thread due to refcount logging; it may show up as a leak of the TLS-backed nsThread wrapper if the thread outlives xpcom shutdown.: file /builds/worker/checkouts/gecko/dom/media/CallbackThreadRegistry.cpp:57

Set release status flags based on info from the regressing bug 1819311

This one is expected, but I was hoping we'd get some of the logging we added, and I dont see it :(

Flags: needinfo?(lissyx+mozillians)
Assignee: nobody → lissyx+mozillians
Component: Graphics: CanvasWebGL → IPC
See Also: → 1828427
Blocks: 1828460
Depends on: 1829108

Checking the behavior on dom/media/webaudio/test/test_decodeAudioDataPromise.html from https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10874

Utility Process 2022 complains about the loosing IPC https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10924. It was started before this test: https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10635-10636

Utility Process 2037 complains about loosing IPC https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10890. It was started during the test https://treeherder.mozilla.org/logviewer?job_id=412956082&repo=autoland&lineNumber=10874

The test loads some ogg file that contains Vorbis: https://searchfox.org/mozilla-central/rev/a4fb94aab371d8650d194558c77879f5b8842380/dom/media/webaudio/test/test_decodeAudioDataPromise.html#42

So we should be able to run UtilityAudioDecoder actor on the pre-existing Utility Process 2022 from above. We might have those two processes on macOS (and Windows) because those are the platforms where we need a different sandbox for audio decoders (one generic, one platform-specific).

FTR: When using the reproducer patch at https://bugzilla.mozilla.org/attachment.cgi?id=9329967 and disabling the MOZ_ASSERT, then audio decoding would fallback to Content process:

ipc/glue/test/browser/browser_utility_multipleAudio.js
  FAIL playback https://example.com/browser/ipc/glue/test/browser/small-shot.ogg was from process 'vorbis audio decoder', expected 'Utility Generic' - false == true - got false, expected true (operator ==)

From the following try, we should be mostly safe now in assuming:

  • this is just a legit timeout,
  • probably our 5s limit is too low,
  • we could remove this assert (and likely others),

Speficially,
https://treeherder.mozilla.org/jobs?repo=try&revision=9c12d0296b5a5d5c7d43465a4897667a0baed51b
https://treeherder.mozilla.org/jobs?repo=try&revision=9d43a5057f691307d0ae30f5ed70c7082c27623d
We can see increasing timeout alleviates the issue

From
https://treeherder.mozilla.org/jobs?repo=try&revision=dc60f6b3e02a0e9eca4c55192a71c7acc9a76d32
https://treeherder.mozilla.org/jobs?repo=try&revision=5072b1a20663954fa7049cf2ec0a774efae5b4d9
We can see that not asserting we complete the tests correctly. Given we disable ability to perform audio decoding on Content process on those, we should hit a test timeout in case no utility process can start. Out of the few error reported (one leak?) we can see the timeout being hit a few, and it takes 13s from test start to a successfull Utility process child connecting to the parent, but ultimately it does.

Deeper debugging shows https://treeherder.mozilla.org/jobs?repo=try&revision=169c2bbf2740c38cb55fa3d4c78e1e08241b2eda&selectedTaskRun=Dpam2MJUSdCY8awmyGBVmw.0:

According to logging, there's quite some delay between both, 10s, so it would look like the system is constrained and it really fail/is slow to run a new process?

Bug 1829108 took care of removing this extraenous assert, so this will not happen anymore.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch
You need to log in before you can comment on or make changes to this bug.