Open Bug 1805736 Opened 1 year ago Updated 6 days ago

Intermittent wmfme Main app process exited normally | application crashed [@ DbgBreakPoint + 0x0]

Categories

(Core :: Audio/Video: Playback, defect, P3)

defect

Tracking

()

Performance Impact low

People

(Reporter: intermittent-bug-filer, Assigned: alwu)

References

Details

(Keywords: crash, intermittent-failure, leave-open)

Crash Data

Attachments

(2 files)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=399638346&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/cJsxCtzKSJCAbopN3QkL0w/runs/0/artifacts/public/logs/live_backing.log


[task 2022-12-14T16:48:37.460Z] 16:48:37     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ DbgBreakPoint + 0x0]
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Crash dump filename: C:\Users\task_167102879040983\AppData\Local\Temp\tmposjffr9l.mozrunner\minidumps\08ec712e-950e-4dab-ad3c-9a7c3423f734.dmp
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Operating system: Windows NT
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -                   10.0.19041
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - CPU: amd64
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      family 6 model 106 stepping 6
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      8 CPUs
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - 
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Crash reason:  EXCEPTION_BREAKPOINT
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Crash address: 0x7fff82d30860
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Crashing instruction: `int 0x3`
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - No memory accessed by instruction
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Process uptime: 34 seconds
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - 
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO - Thread 9  (crashed)
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -  0  ntdll.dll!DbgBreakPoint + 0x0
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      rax = 0x00007fff82d30860    rdx = 0x00007fff82d30860
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      rcx = 0x0043504900435049    rbx = 0x00007fff82d30860
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      rsi = 0x0043504900435049    rdi = 0x0000000000000000
[task 2022-12-14T16:48:37.465Z] 16:48:37     INFO -      rbp = 0x0000000000000000    rsp = 0x000000d9c41cf8a8
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -       r8 = 0x0043504900435049     r9 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r10 = 0x00000ffff05a610d    r11 = 0x002000406000a008
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r12 = 0x0000000000000000    r13 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r14 = 0x0000000000000000    r15 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      rip = 0x00007fff82d30860
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -     Found by: given as instruction pointer in context
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -  1  kernel32.dll!BaseThreadInitThunk + 0x13
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      rbx = 0x00007fff82d30860    rbp = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      rsp = 0x000000d9c41cf8b0    r12 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r15 = 0x0000000000000000    rip = 0x00007fff81427034
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -     Found by: call frame info
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -  2  mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()(int&, void*&, void*&) const [nsWindowsDllInterceptor.h:7ff758e0d08b4bdf0ce3bd2abe84a1bff3be8ff4 : 150]
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -     Found by: inlining
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -  3  mozglue.dll!patched_BaseThreadInitThunk(int, void*, void*) [WindowsDllBlocklist.cpp:7ff758e0d08b4bdf0ce3bd2abe84a1bff3be8ff4 : 587 + 0x14]
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      rbx = 0x00007fff82d30860    rbp = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      rsp = 0x000000d9c41cf8e0    r12 = 0x0000000000000000
[task 2022-12-14T16:48:37.466Z] 16:48:37     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      r15 = 0x0000000000000000    rip = 0x00007fff67a15884
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -     Found by: call frame info
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -  4  ntdll.dll!RtlUserThreadStart + 0x20
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      rbx = 0x0000000000000000    rsi = 0x0000000000000000
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      rdi = 0x0000000000000000    rbp = 0x0000000000000000
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      rsp = 0x000000d9c41cf950    r12 = 0x0000000000000000
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -      r15 = 0x0000000000000000    rip = 0x00007fff82ce2651
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO -     Found by: call frame info
[task 2022-12-14T16:48:37.467Z] 16:48:37     INFO - 

Looking at the stack, frame 2 of the crashing thread suggests this is caused by something called into via the Windows DLL interceptor code... Hey Greg, I know you've been hacking about in this domain of the codebase... any chance you might know what's going wrong here? And do you think this belongs in the "External Software Affecting Firefox" product?

Flags: needinfo?(gstoll)

Huh, interesting. From the call stack it looks like we go through patched_BaseThreadInitThunk and are calling into the real BaseThreadInitThunk, and crashing shortly thereafter. Looking at the disassembly of BaseThreadInitThunk on my system, it looks like you can hit an int 3 if RtlExitUserThread fails. (or maybe that call isn't supposed to return at all?)

Anyway, I don't see anything here that definitively points to the interceptor code or to third-party code, but there's not a lot to go on here so I could be wrong...

Flags: needinfo?(gstoll)

Gabriele, any way we could gather more info here so we can move this to the right component? It seems unlikely the frontend menu code is directly responsible for this low-level crash so I'd like to get it in front of people with more relevant expertise. :-)

Flags: needinfo?(gsvelto)

A child process was probably hung. When the process watcher code detects this scenario it attempts to kill the child by creating a thread that will call DbgBreak hence why the crash looks like that, it was induced that way.

We have to figure out why the child process is hanging on shutdown, these lines are the last messages from that process. They're warnings but all child processes appear to be spewing them out in this particular instance so I don't think they point to the problem. Looking at what the main thread was doing though suggests it was processing some pending events (see the lines here). It means we were at this stage of shutdown. Looking at the other threads it seems that the process was relatively idle and the crash occurred while the main thread was executing this line. Could it be that we were injecting delays in the pending events execution and waited for too long? I'm not super-familiar with this code so I can't really tell. CC'ing Kris Wright who should know this machinery better than I do.

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #5)

A child process was probably hung. When the process watcher code detects this scenario it attempts to kill the child by creating a thread that will call DbgBreak hence why the crash looks like that, it was induced that way.

We have to figure out why the child process is hanging on shutdown, these lines are the last messages from that process. They're warnings but all child processes appear to be spewing them out in this particular instance so I don't think they point to the problem. Looking at what the main thread was doing though suggests it was processing some pending events (see the lines here). It means we were at this stage of shutdown. Looking at the other threads it seems that the process was relatively idle and the crash occurred while the main thread was executing this line. Could it be that we were injecting delays in the pending events execution and waited for too long? I'm not super-familiar with this code so I can't really tell. CC'ing Kris Wright who should know this machinery better than I do.

Kris, any chance you can get any more clarity on what might be happening here? :-)

Flags: needinfo?(kwright)

Since this seems to be more about shutdown performance, I'm going to move this to Core :: Performance.

Component: Menus → Performance
Product: Firefox → Core

Sorry for the wait, been sick. Looking at this, my only guess is that the process was stuck waiting on some blocking task. There's a few rare scenarios where child processes hang in testing during shutdown due to a task (e.g. TaskQueues made after timers shut down, attempted blocking tasks that should be disallowed during shutdown, etc) so my best guess is that there's something like that going on there. I'm not super familiar with our Windows machinery so I'm not totally sure how to debug this or get more info.

Flags: needinfo?(kwright)
Performance Impact: --- → low

I wonder what made this spike on windows10-64-2004-qr this past week. Aryx, is that frequent enough to regression-hunt down to a commit?

Flags: needinfo?(aryx.bugmail)

This got more frequent on 2023-02-02. Log:

[task 2023-02-02T05:40:30.973Z] 05:40:30     INFO - GECKO(8276) | [Parent 6568, IPC I/O Parent] WARNING: Process 7872 hanging at shutdown; attempting crash report (fatal error): file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:164
[task 2023-02-02T05:40:31.220Z] 05:40:31     INFO - GECKO(8276) | [Parent 6568, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4663
[task 2023-02-02T05:40:31.279Z] 05:40:31     INFO - TEST-INFO | Main app process: exit 0
[task 2023-02-02T05:40:31.280Z] 05:40:31     INFO - TEST-INFO | Confirming we saw 19 DOCSHELL created and 19 destroyed log strings.
[task 2023-02-02T05:40:31.281Z] 05:40:31     INFO - TEST-INFO | Confirming we saw 54 DOMWINDOW created and 54 destroyed log strings.
[task 2023-02-02T05:40:31.281Z] 05:40:31     INFO - TEST-INFO | dom/media/test/browser/wmfme/browser_wmfme_crash.js | This test created 1 hidden window(s)
[task 2023-02-02T05:40:31.282Z] 05:40:31     INFO - TEST-INFO | dom/media/test/browser/wmfme/browser_wmfme_crash.js | This test created 1 hidden docshell(s)

This aligns with the landing of bug 1811901 which touched WMF code. Alastor, please check if these crashes are related to those changes.

Flags: needinfo?(aryx.bugmail) → needinfo?(alwu)

There have been 43 total failures in the last 7 days.
There are:

  • 5 failures on Windows 10 x86 2004 WebRender debug
  • 3 failures on Windows 10 x64 2004 CCov WebRender opt
  • 35 failures on Windows 10 x64 2004 WebRender debug

Recent failure log.

[task 2023-02-14T05:45:42.577Z] 05:45:42     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ DbgBreakPoint]
[task 2023-02-14T05:45:42.577Z] 05:45:42     INFO - Crash dump filename: C:\Users\task_167634267292266\AppData\Local\Temp\tmp3rl5a0kf.mozrunner\minidumps\80e26bf6-bd3b-44f7-b4f3-5d484c9c4bce.dmp
[task 2023-02-14T05:45:42.577Z] 05:45:42     INFO - Operating system: Windows NT
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -                   10.0.19041
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - CPU: amd64
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      family 6 model 106 stepping 6
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      8 CPUs
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - 
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - Crash reason:  EXCEPTION_BREAKPOINT
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - Crash address: 0x7ff8d82f0860
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - Crashing instruction: `int 0x3`
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - No memory accessed by instruction
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - Process uptime: 24 seconds
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - 
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO - Thread 31  (crashed)
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -  0  ntdll.dll!DbgBreakPoint + 0x0
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rax = 0x00007ff8d82f0860    rdx = 0x00007ff8d82f0860
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rcx = 0x0043504900435049    rbx = 0x00007ff8d82f0860
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rsi = 0x0043504900435049    rdi = 0x0000000000000000
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rbp = 0x0000000000000000    rsp = 0x00000070b2abf7d8
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -       r8 = 0x0043504900435049     r9 = 0x0000000000000000
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      r10 = 0x00000fff1b05e10d    r11 = 0x002000406000a008
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      r12 = 0x0000000000000000    r13 = 0x0000000000000000
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      r14 = 0x0000000000000000    r15 = 0x0000000000000000
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rip = 0x00007ff8d82f0860
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -     Found by: given as instruction pointer in context
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -  1  kernel32.dll!BaseThreadInitThunk + 0x13
[task 2023-02-14T05:45:42.587Z] 05:45:42     INFO -      rbx = 0x00007ff8d82f0860    rbp = 0x0000000000000000
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -      rsp = 0x00000070b2abf7e0    r12 = 0x0000000000000000
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -      r15 = 0x0000000000000000    rip = 0x00007ff8d7577034
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -     Found by: call frame info
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -  2  mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()(int&, void*&, void*&) const [nsWindowsDllInterceptor.h:f45ac8766b6167ecd4adc81490f8f9b927472200 : 150]
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -     Found by: inlining
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -  3  mozglue.dll!patched_BaseThreadInitThunk(int, void*, void*) [WindowsDllBlocklist.cpp:f45ac8766b6167ecd4adc81490f8f9b927472200 : 592 + 0x14]
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -      rbx = 0x00007ff8d82f0860    rbp = 0x0000000000000000
[task 2023-02-14T05:45:42.588Z] 05:45:42     INFO -      rsp = 0x00000070b2abf810    r12 = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      r15 = 0x0000000000000000    rip = 0x00007ff8ba3d5844
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -     Found by: call frame info
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -  4  ntdll.dll!RtlUserThreadStart + 0x20
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      rbx = 0x0000000000000000    rsi = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      rdi = 0x0000000000000000    rbp = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      rsp = 0x00000070b2abf880    r12 = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -      r15 = 0x0000000000000000    rip = 0x00007ff8d82a2651
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO -     Found by: call frame info
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO - 
[task 2023-02-14T05:45:42.589Z] 05:45:42     INFO - Thread 0 MainThread

Hello Frank! Is there any chance you can assign this to someone?

Thank you!

Flags: needinfo?(fdoty)
Whiteboard: [stockwell needswork:owner]

:denispal can you review and make a determination of next steps?

Flags: needinfo?(fdoty) → needinfo?(dpalmeiro)

Alternatively, if Bug 1811901 is the culprit, a response from :alwu on the initial NI would be helpful to determine a cause

Flags: needinfo?(dpalmeiro)

Reopening NI for :denispal to review from a perf perspective

Flags: needinfo?(dpalmeiro)
Severity: -- → S3
Priority: -- → P3
Component: Performance → Audio/Video: Playback
Flags: needinfo?(dpalmeiro)

Alastor, could you have a look over this frequent wmfme crash?

Recent log: https://treeherder.mozilla.org/logviewer?job_id=406807063&repo=autoland

Flags: needinfo?(alwu)
Summary: Intermittent Main app process exited normally | application crashed [@ DbgBreakPoint + 0x0] → Intermittent wmfme Main app process exited normally | application crashed [@ DbgBreakPoint + 0x0]
Flags: needinfo?(alwu)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended]

The code landed in bug 1811901 is not enabled by default, which shouldn't be the root cause of causing this spike. But maybe something related with CDM unexpectedly crashed the process. I will add a patch to see if it helps or not. Sorry for my late reply.

Assignee: nobody → alwu
Flags: needinfo?(alwu)
Keywords: leave-open
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eb39f531b78f
only create MFContentProtectionManager when the pref for play ready is enabled. r=jolin
Whiteboard: [stockwell disable-recommended]

Still failing after being merged here.

Flags: needinfo?(alwu)

The patches I've been landed recently were about a feature which won't even be run on the try server. My patch in this bug is even to prevent the relative class being created if the preference is off, but sadly it seems no help for the failure so far.

So I wonder if there any environment changed on the try server recently? Eg. changing Windows SDK, Windows version? Thanks.

Flags: needinfo?(alwu) → needinfo?(csabou)

The only recent change that comes to mind is Bug 1816568, it's true some of those failures happen on windows11-64-2009-qr debug but there are also a lot of them on windows10-64-2004-qr debug.
Maybe Joel can help with some more insight about try server.

Flags: needinfo?(csabou) → needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Update:

There have been 93 failures within the last 7 days:

  • 2 failures on Windows 10 x86 2004 WebRender debug
  • 3 failures on Windows 10 x64 2004 CCov WebRender opt
  • 39 failures on Windows 10 x64 2004 WebRender debug
  • 1 failure on Windows 11 x86 22H2 WebRender debug
  • 48 failures on Windows 11 x64 22H2 WebRender debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=407840918&repo=mozilla-central&lineNumber=2337

nothing has changed on the win10 machines as far as I know. These tests will be running only on win11 next week, so the failure rate will drop down a lot. Win11 is the October 2022 release, python versions, libraries, etc. should be static until we do work to change them specifically.

Flags: needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended]

alwu, could you have another go at this? It has 243 total failures in the last 30 days? Thank you.
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-02-18&endday=2023-03-20&tree=trunk&failurehash=all&bug=1805736

Flags: needinfo?(alwu)

OK I will check it again, keep my NI.

By looking at this failure, the hang process caused crash is GPU process. On Thread 0 MainThread, we can see GPUParent is shutting down and it seems still waiting for some message?

Sotaro, do you know who would be the best person to ask this kind of problems? Thank you.

Flags: needinfo?(sotaro.ikeda.g)

If we can't find out the reason of GPU hang, then I will disable media-bc for wmfme on win11 debug for temporarily.

Attachment #9325549 - Attachment description: Bug 1805736 - disable media-bc on win11 2009 debug. → Bug 1805736 - disable media-bc on win11 2009 debug for wmfme.
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/58e78171b61a
disable media-bc on win11 2009 debug for wmfme. r=jmaher

Leave it open and currently we don't have any idea of why the process would hang.

Flags: needinfo?(alwu)
See Also: → 1831236
See Also: → 1875095
Flags: needinfo?(sotaro.ikeda.g)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: