Closed Bug 1895174 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::mscom::ProcessRuntime::InitInsideApartment] in the Utility process

Categories

(External Software Affecting Firefox :: Other, defect)

Unspecified
Windows 11
defect

Tracking

(firefox-esr115 unaffected, firefox-esr128 unaffected, firefox126 unaffected, firefox127 disabled, firefox128 disabled, firefox129 disabled, firefox130 disabled, firefox131 wontfix, firefox132 wontfix, firefox133 fixed, firefox134 fixed)

RESOLVED FIXED
134 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox126 --- unaffected
firefox127 --- disabled
firefox128 --- disabled
firefox129 --- disabled
firefox130 --- disabled
firefox131 --- wontfix
firefox132 --- wontfix
firefox133 --- fixed
firefox134 --- fixed

People

(Reporter: gsvelto, Assigned: bobowen)

Details

(Keywords: crash)

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/2910b928-bdd5-47d7-b75f-0b3400240505

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT((((HRESULT)(mInitResult)) >= 0))

Top 10 frames:

0  xul.dll  mozilla::mscom::ProcessRuntime::InitInsideApartment()  ipc/mscom/ProcessRuntime.cpp:256
1  xul.dll  mozilla::mscom::ProcessRuntime::ProcessRuntime(const mozilla::mscom::ProcessR...  ipc/mscom/ProcessRuntime.cpp:143
1  xul.dll  mozilla::mscom::ProcessRuntime::ProcessRuntime(const GeckoProcessType)  ipc/mscom/ProcessRuntime.cpp:47
1  xul.dll  mozilla::mscom::ProcessRuntime::ProcessRuntime()  ipc/mscom/ProcessRuntime.cpp:42
2  xul.dll  mozilla::ipc::UtilityProcessImpl::ProcessChild(unsigned long, nsID const&)  ipc/glue/UtilityProcessImpl.h:23
2  xul.dll  mozilla::MakeUnique(unsigned long&, nsID&)  mfbt/UniquePtr.h:606
2  xul.dll  XRE_InitChildProcess(int, char**, XREChildData const*)  toolkit/xre/nsEmbedFunctions.cpp:594
2  xul.dll  mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*)  toolkit/xre/Bootstrap.cpp:67
3  firefox.exe  content_process_main(mozilla::Bootstrap*, int, char**)  ipc/contentproc/plugin-container.cpp:57
3  firefox.exe  NS_internal_main(int, char**, char**)  browser/app/nsBrowserApp.cpp:375

Huge spike in nightly, fortunately coming from a small number of users. Several crashes have the last error value set to ERROR_INSUFFICIENT_BUFFER.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

For more information, please visit BugBot documentation.

Keywords: topcrash

The severity field is not set for this bug.
:handyman, could you have a look please?

Flags: needinfo?(davidp99)

Anything in the recent beta that either of you can imagine causing this?

Flags: needinfo?(rkraesig)
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(davidp99)

I was not able to determine the utility proc type from these crashes -- not the metadata or the dump. It's early enough that I'm not sure it's relevant anyway.

No, but Gabriele told me it was concentrated in a few clients on older Windows 10 versions.

Flags: needinfo?(lissyx+mozillians) → needinfo?(gsvelto)

No recent changes on our part that I'm aware of.

A quick trawl through several crash reports suggests that this might be due to a third-party DLL, WRusr.dll, which is the only non-Microsoft, non-Mozilla DLL loaded in any of the crash reports I've checked so far — mostly v9.0.35.17 (apparently unsigned?), but with at least one occurrence of v9.0.35.12 (signed by Webroot). I could very easily believe that this DLL is initializing COM early. But then, I could also believe that these are all from one user.

Flags: needinfo?(rkraesig)

I concur with Ray's analysis. All the nightly crashes I've looked at have that DLL injected and they're also definitely from different users.

Flags: needinfo?(gsvelto)

The severity field is not set for this bug.
:handyman, could you have a look please?

Flags: needinfo?(davidp99)

Going by Ray and Gabriele's assessment, NI to @gstoll to see if this should be moved to External Software Affecting Firefox (and for severity).

Flags: needinfo?(davidp99) → needinfo?(gstoll)

Yeah, we've already had to block earlier versions of WRusr.dll in the past (bug 1752466). I'll do a proper query to make sure but it sure seems like that's the problem.

Severity: -- → S3
Component: IPC: MSCOM → Other
Flags: needinfo?(gstoll)
Product: Core → External Software Affecting Firefox

This spiked again, again with a low number of installations, but this time either SafeWrapper.dll or chromesafe64.dll seems to be the problem.

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

Keywords: topcrash

This is also shaping up to be a topcrash in Nightly 129, a startupcrash in what looks like just a few installs.
I wonder why it is Nightly (and maybe Beta) only?

(In reply to Liz Henry (:lizzard) (relman/hg->git project) from comment #13)

> This is also shaping up to be a topcrash in Nightly 129, a startupcrash in what looks like just a few installs.
> I wonder why it is Nightly (and maybe Beta) only?

Probably because only Nightly starts the Windows file-dialog in a utility process by default.

There's a pretty large spike in nightly, with numbers going up 10-20x compared to the previous volume. There are two different crashes now, with slightly different behavior. The first type is what we've seen before: we fail with ERROR_INSUFFICIENT_BUFFER in the utility process, and this seems to affect Windows 11 installations only. The other type, which appears to be more common, is an ERROR_ACCESS_DENIED failure when launching a content process, and this seems to affect Windows 10 installations only. Greg, Raymond, can you have a look please?

Flags: needinfo?(rkraesig)
Flags: needinfo?(gstoll)

(In reply to Gabriele Svelto [:gsvelto] from comment #15)

> The other type, which appears to be more common, is an ERROR_ACCESS_DENIED failure when launching a content process, and this seems to affect Windows 10 installations only. Greg, Raymond, can you have a look please?

This appears to be induced primarily by SbieDll.dll, signed by "Tonalio GmbH", which I think is Sandboxie, an open-source sandboxing utility. So far I've seen DLL versions 5.62.2.0, 5.67.7.0, and 5.69.6.0, so I think it's unlikely that this is either a single user with a misconfigured sandbox or a recently-released DLL with (EDIT:) new behaviors. Bug 1900175 seems to be about the right timeframe and process-context.

Pinging :bobowen for further plausibility assessment while I go test Sandboxie + Nightly.

Flags: needinfo?(rkraesig) → needinfo?(bobowencode)

I need to look back through my earlier work, but I had telemetry queries to investigate Sandboxie-related crashes.

(In reply to Ray Kraesig [:rkraesig] from comment #16)

> [...] while I go test Sandboxie + Nightly.

Yup, that's the one. https://crash-stats.mozilla.org/report/index/3541d878-d78e-4ec7-ad15-60f710240826#tab-details

(Somewhat surprisingly, the value of mInitResult in this crash is E_OUTOFMEMORY; I would have expected this to be associated with an ERROR_INSUFFICIENT_BUFFER rather than an ERROR_ACCESS_DENIED. Perhaps this is an error code arising from Sandboxie's injections?)
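The mismatch noted above can be checked numerically. What follows is only a minimal, portable sketch, not Mozilla or Windows SDK code: it reimplements the HRESULT_FROM_WIN32 mapping under hypothetical names (`hresult_from_win32` and the `k`-prefixed constants), using the standard winerror.h numeric values. Under that mapping, E_OUTOFMEMORY (0x8007000E) corresponds to Win32 error 14 (ERROR_OUTOFMEMORY), and to neither ERROR_INSUFFICIENT_BUFFER (122) nor ERROR_ACCESS_DENIED (5).

```cpp
#include <cstdint>

// Standard Win32 error codes (values from winerror.h), redeclared here so the
// sketch builds on any platform. The k-prefixed names are hypothetical.
constexpr std::uint32_t kErrorAccessDenied = 5;          // ERROR_ACCESS_DENIED
constexpr std::uint32_t kErrorOutOfMemory = 14;          // ERROR_OUTOFMEMORY
constexpr std::uint32_t kErrorInsufficientBuffer = 122;  // ERROR_INSUFFICIENT_BUFFER

constexpr std::uint32_t kFacilityWin32 = 7;           // FACILITY_WIN32
constexpr std::uint32_t kEOutOfMemory = 0x8007000Eu;  // E_OUTOFMEMORY

// Portable re-implementation of the HRESULT_FROM_WIN32 mapping: zero and
// values that already carry the failure bit pass through unchanged; any other
// Win32 code gets the failure bit and FACILITY_WIN32.
constexpr std::uint32_t hresult_from_win32(std::uint32_t e) {
  return (e == 0 || (e & 0x80000000u) != 0)
             ? e
             : ((e & 0xFFFFu) | (kFacilityWin32 << 16) | 0x80000000u);
}

// Compile-time checks of the three mappings discussed in this bug.
static_assert(hresult_from_win32(kErrorOutOfMemory) == kEOutOfMemory,
              "E_OUTOFMEMORY maps from ERROR_OUTOFMEMORY (14)");
static_assert(hresult_from_win32(kErrorInsufficientBuffer) == 0x8007007Au,
              "ERROR_INSUFFICIENT_BUFFER maps to 0x8007007A");
static_assert(hresult_from_win32(kErrorAccessDenied) == 0x80070005u,
              "ERROR_ACCESS_DENIED maps to E_ACCESSDENIED (0x80070005)");
```

For instance, `hresult_from_win32(kErrorInsufficientBuffer)` evaluates to 0x8007007A, which is not E_OUTOFMEMORY, consistent with the surprise expressed above.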

The bug is marked as tracked for firefox131 (nightly). We have limited time to fix this, the soft freeze is in 2 days. However, the bug still isn't assigned and has low severity.

:gcp, could you please find an assignee and increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

Flags: needinfo?(gpascutto)

If it's open source, at the very least we can file an issue on their GitHub?

Assignee: nobody → bobowencode
Flags: needinfo?(gpascutto)

(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #19)

> The bug is marked as tracked for firefox131 (nightly). We have limited time to fix this, the soft freeze is in 2 days. However, the bug still isn't assigned and has low severity.
>
> :gcp, could you please find an assignee and increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

It seems likely that this is down to an interaction between our sandbox and theirs.
This new feature is only enabled on Nightly, so we shouldn't see it when Fx131 goes to Beta.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #20)

> If it's open source, at the very least we can file an issue on their GitHub?

Yes, I'll file an issue.

Flags: needinfo?(bobowencode)
Flags: needinfo?(gstoll)

It looks like with the back-out of PGO with sandboxing (bug 1553850) this has stopped crashing with Sandboxie.
I think the two that happened on build 20240829040756 were tests for a different type of Sandboxie sandbox (Application Compartment) that is only fully available to project supporters, so an even smaller subset of Sandboxie users. It can only be used for 5 minutes by non-supporters.
My local PGO build didn't reproduce the crash, but it seems to with the Application Compartment sandbox, so I'll try and investigate a little further.

We are still getting quite a few crashes with Webroot in the utility process.
gstoll - do we have any contacts at Webroot from previous issues? I guess blocking in utility might be an option.

Flags: needinfo?(gstoll)

Yes, I'll share the doc with you.

Flags: needinfo?(gstoll)

(In reply to Greg Stoll :gstoll from comment #24)

> Yes, I'll share the doc with you.

Thanks, I'll try and contact them, once I've had a look at some of the crashes to see if I can add any more useful information for them.

The similar crash with only USER_RESTRICTED that happens with the Application Compartment type Sandboxie sandbox appears to be caused when the call to ProcessToken::GetProcessToken in CRpcResolver::GetConnection fails within the CoInitializeSecurity call in combase.dll.
I've looked at the process and thread tokens just after the failure and I don't see any obvious reason for this.
I've not dug any deeper because I had some trouble with breakpoints causing failures when debugging within the Sandboxie sandbox.
I've updated the Sandboxie GitHub issue.

Unfortunately, all the emails in the doc for Webroot bounced, possibly because it is now owned by OpenText.
I've tried the same emails at opentext.com and one of them hasn't bounced (yet).
In addition, I've submitted a ticket to their support and I've also found a possible contact on LinkedIn through previous colleagues.

Response from Webroot support (possibly automated) saying that the issue has been submitted to their development team.
They also asked me to run a log and system information gathering utility.
I’ve replied that I don’t have Webroot, but that I would try and reproduce if they could provide a version that includes the WRusr.dll injection.

Another response from Webroot support saying they are looking into my request, so presumably this is about providing us with a version of Webroot we can test with.

Following a suggestion from :bobowen and :gcp, I tried to reproduce the crashes with Webroot. Here is what happened.

With a fresh install of Windows 10 22H2, without Webroot installed: I was initially unable to get this PlayReady example page to play the video in either Edge or Firefox Nightly. After running Windows Update to get all drivers for the machine and an up-to-date version of Windows, Edge was able to play the video but Firefox Nightly was not.

With a fresh install of Windows 11 23H2, without Webroot installed: same story as for Windows 10 except that after running Windows Update, Firefox Nightly was able to play the video. Then I installed Webroot and found a way to reproduce the crashes.

STR for reproducing the crashes:

  • Install Firefox Nightly.
  • Make sure that this PlayReady example page plays a video.
  • Close Firefox Nightly.
  • Install and launch Webroot (e.g. trial version).
  • Launch the PlayReady example page. The video should play.
  • Restart Firefox.
  • Return to the PlayReady example page. This time the video should refuse to play, and about:crashes should now contain a crash report. Using Shift+F5 to force-reload the page makes the video load successfully again.

Expected behavior: Firefox Nightly is able to play the video even when the page is loaded after a restart, without requiring Shift+F5.

The error seems to be a case of RPC_E_TOO_LATE, as suspected by :rkraesig in comment 6. There is an undocumented registry value that can be used to debug this kind of issue: set Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Ole\FeatureDevelopmentProperties\BreakOnRpcETooLate to DWORD 1. Then, if a debugger is attached, the first call to CoInitializeSecurity records its call stack, and the second call triggers a breakpoint and indicates where to find the call stack for the first call.

After doing that we indeed break in combase.dll with the following output in WinDbg:

RPC_E_TOO_LATE indicates that CoInitializeSecurity has already been called
Command to display the stack trace for CoInitializeSecurity: dps 0X000001705F3D23C8
(76b0.68b0): Break instruction exception - code 80000003 (first chance)
combase!BreakIntoDebugger+0x4:
00007fff`8f4317b8 cc              int     3

The call stack for the second call is the same as in the crash, as expected:

8:185> k
 # Child-SP          RetAddr               Call Site
00 00000021`80ffe4b0 00007fff`8f4240ce     combase!BreakIntoDebugger+0x4 [onecore\com\combase\inc\DebuggerUtils.h @ 33] 
01 (Inline Function) --------`--------     combase!BreakIntoAnyDebuggerIfPresent+0x22 [onecore\com\combase\inc\DebuggerUtils.h @ 56] 
02 00000021`80ffe4e0 00007fff`8f30144d     combase!RpcETooLate+0xbaaf6 [onecore\com\combase\dcomrem\security.cxx @ 3079] 
03 00000021`80ffe510 00007fff`0c5d0bfc     combase!CoInitializeSecurity+0x22d [onecore\com\combase\dcomrem\security.cxx @ 3409] 
04 (Inline Function) --------`--------     xul!mozilla::detail::DynamicallyLinkedFunctionPtrBase<long (*)(void *, long, tagSOLE_AUTHENTICATION_SERVICE *, void *, unsigned long, unsigned long, void *, unsigned long, void *)>::operator()+0x35 [/builds/worker/workspace/obj-build/dist/include/mozilla/DynamicallyLinkedFunctionPtr.h @ 77] 
05 (Inline Function) --------`--------     xul!mozilla::mscom::wrapped::CoInitializeSecurity+0xcb [/builds/worker/checkouts/gecko/ipc/mscom/COMWrappers.cpp @ 75] 
06 (Inline Function) --------`--------     xul!mozilla::mscom::ProcessRuntime::InitializeSecurity+0x435 [/builds/worker/checkouts/gecko/ipc/mscom/ProcessRuntime.cpp @ 480] 
07 00000021`80ffe860 00007fff`0c5fe6ad     xul!mozilla::mscom::ProcessRuntime::InitInsideApartment+0x47c [/builds/worker/checkouts/gecko/ipc/mscom/ProcessRuntime.cpp @ 255] 
08 (Inline Function) --------`--------     xul!mozilla::mscom::ProcessRuntime::ProcessRuntime+0x1a4 [/builds/worker/checkouts/gecko/ipc/mscom/ProcessRuntime.cpp @ 143] 
09 (Inline Function) --------`--------     xul!mozilla::mscom::ProcessRuntime::ProcessRuntime+0x1b0 [/builds/worker/checkouts/gecko/ipc/mscom/ProcessRuntime.cpp @ 47] 
0a 00000021`80ffeb60 00007fff`0c5d52dc     xul!mozilla::mscom::ProcessRuntime::ProcessRuntime+0x1dd [/builds/worker/checkouts/gecko/ipc/mscom/ProcessRuntime.cpp @ 42] 
0b (Inline Function) --------`--------     xul!mozilla::ipc::UtilityProcessImpl::ProcessChild+0x29 [/builds/worker/workspace/obj-build/dist/include/mozilla/ipc/UtilityProcessImpl.h @ 23] 
0c (Inline Function) --------`--------     xul!mozilla::MakeUnique+0x3f [/builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h @ 606] 
0d (Inline Function) --------`--------     xul!XRE_InitChildProcess+0xe12 [/builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp @ 592] 
0e 00000021`80ffec20 00007ff7`e97c3609     xul!mozilla::BootstrapImpl::XRE_InitChildProcess+0xe3c [/builds/worker/checkouts/gecko/toolkit/xre/Bootstrap.cpp @ 63] 
0f (Inline Function) --------`--------     firefox!NS_internal_main+0x45c [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 403] 
10 00000021`80ffeee0 00007ff7`e97e03a8     firefox!wmain+0x2569 [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 151] 
11 (Inline Function) --------`--------     firefox!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
12 00000021`80fffb50 00007fff`8ec7257d     firefox!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
13 00000021`80fffb90 00007fff`8fd2af28     KERNEL32!BaseThreadInitThunk+0x1d
14 00000021`80fffbc0 00000000`00000000     ntdll!RtlUserThreadStart+0x28

But there is indeed a COM call that occurred before that, with the following call stack:

<elided, see next comment>

I'm not sure how this relates to Webroot at the moment but I'll keep investigating.

Edit: Oh, just realized that the call stack is truncated not because combase didn't record it fully but more likely because I used dps without a range. Will post full call stack tomorrow.

Here is the full stack which is indeed directly linked to WRusr:

0000021a`a1822448  00007ff9`8079d021 combase!CoInitializeSecurity+0xfbe01 [onecore\com\combase\dcomrem\security.cxx @ 3136]
0000021a`a1822450  00007ff9`806f69c8 combase!InitializeSecurity+0x88 [onecore\com\combase\dcomrem\security.cxx @ 6257]
0000021a`a1822458  00007ff9`8066eee1 combase!CComApartment::InitRemoting+0xc9 [onecore\com\combase\dcomrem\aprtmnt.cxx @ 885]
0000021a`a1822460  00007ff9`8066b566 combase!CComApartment::StartServer+0x2a [onecore\com\combase\dcomrem\aprtmnt.cxx @ 1215]
0000021a`a1822468  00007ff9`8066886f combase!CRpcResolver::BindToSCMProxy+0x2b [onecore\com\combase\dcomrem\resolver.cxx @ 1645]
0000021a`a1822470  00007ff9`806682eb combase!CRpcResolver::DelegateActivationToSCM+0x12f [onecore\com\combase\dcomrem\resolver.cxx @ 2155]
0000021a`a1822478  00007ff9`806f689e combase!CRpcResolver::CreateInstance+0x1a [onecore\com\combase\dcomrem\resolver.cxx @ 2414]
0000021a`a1822480  00007ff9`806af948 combase!CClientContextActivator::CreateInstance+0x138 [onecore\com\combase\objact\actvator.cxx @ 604]
0000021a`a1822488  00007ff9`80691e4c combase!ActivationPropertiesIn::DelegateCreateInstance+0x8c [onecore\com\combase\actprops\actprops.cxx @ 1920]
0000021a`a1822490  00007ff9`806609c1 combase!ICoCreateInstanceEx+0x891 [onecore\com\combase\objact\objact.cxx @ 1921]
0000021a`a1822498  00007ff9`8065ffae combase!CComActivator::DoCreateInstance+0x15e [onecore\com\combase\objact\immact.hxx @ 380]
0000021a`a18224a0  00007ff9`80630ecf combase!CoCreateInstanceAsUser+0x1df [onecore\com\combase\objact\actapi.cxx @ 441]
0000021a`a18224a8  00007ff9`8062e0fd combase!RuntimeBrokerActivation+0x18d [onecore\com\combase\winrtbase\brokeredactivation.cpp @ 780]
0000021a`a18224b0  00007ff9`80663918 combase!WinRTActivateInstanceInternal+0x518 [onecore\com\combase\winrtbase\winrtbase.cpp @ 662]
0000021a`a18224b8  00007ff9`8065e7ac combase!RoActivateInstance+0x1ac [onecore\com\combase\winrtbase\winrtbase.cpp @ 810]
0000021a`a18224c0  00007ff9`4047b4ce windows_storage_onecore!wil::ActivateInstance<IWin32Broker>+0x56
0000021a`a18224c8  00007ff9`4047e615 windows_storage_onecore!`anonymous namespace'::_GetBrokerInstance+0x49
0000021a`a18224d0  00007ff9`4047eae5 windows_storage_onecore!BrokeredCreateFile2+0x125
0000021a`a18224d8  00007ff9`7e459d78 KERNELBASE!CreateFileW+0x158
0000021a`a18224e0  00007ff9`7e50d388 KERNELBASE!CallNamedPipeW+0xf8
0000021a`a18224e8  00007ff9`698bed0e WRusr!DllRegisterServer+0xf8e
0000021a`a18224f0  00007ff9`7ef2257d KERNEL32!BaseThreadInitThunk+0x1d
0000021a`a18224f8  00007ff9`80deaf28 ntdll!RtlUserThreadStart+0x28

So this would be a race condition between our initialization code's call to CoInitializeSecurity and Webroot's call to CallNamedPipeW. Note that sometimes the crash does not occur (presumably we win the race condition) but nonetheless the video doesn't play, so there may be other problems.

In the call stack above, WRusr.dll is calling CallNamedPipeW with their pipe \\.\pipe\WRSynUM2 from a thread created by their DllMain. This ends up unintentionally doing COM, because CreateFileW falls back to a brokered code path if the normal path (using NtCreateFile) fails with ERROR_ACCESS_DENIED (which it does here, because of our sandboxing). It is this brokered fallback that does COM, and it is part of Microsoft's implementation of CreateFileW. It sounds a bit crazy that CreateFileW is doing COM, but here we are.

I believe that the easiest option other than blocking WRusr.dll would be to ask Webroot to use NtCreateFile, TransactNamedPipe, and CloseHandle instead of CallNamedPipeW, to avoid the brokered fallback code. (From the documentation: "Calling CallNamedPipe is equivalent to calling the CreateFile (or WaitNamedPipe, if CreateFile cannot open the pipe immediately), TransactNamedPipe, and CloseHandle functions.")

If I nop-out the call to CallNamedPipeW I can play the video without trouble.

Attached file repro.cpp

Attached is a minimal reproducer for the issue, written with the help of :bobowen for the AppContainer part.

The brokered create file path only occurs within AppContainer processes, upon running into an ERROR_ACCESS_DENIED. It calls RoInitialize, CoInitializeSecurity, RoUninitialize. If this happens -- even on a different thread -- between a call to CoInitializeEx and a call to CoInitializeSecurity, then CoInitializeSecurity will fail with RPC_E_TOO_LATE.
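The ordering problem described above can be modeled without any Windows API. The sketch below is a toy model and not combase's actual implementation: `ComSecurityModel`, `explicit_init` and `brokered_create_file` are hypothetical names, and the only real-world values are S_OK and RPC_E_TOO_LATE (0x80010119). It assumes, per the description above, that process-wide COM security can be set only once and that the brokered path sets it implicitly as a side effect. It leaves out CoInitializeEx and real threads for brevity.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint32_t kSOk = 0x00000000u;          // S_OK
constexpr std::uint32_t kRpcETooLate = 0x80010119u;  // RPC_E_TOO_LATE

// Toy model of the process-wide "COM security already initialized" state.
// The flag is atomic because, in the real scenario, the two callers run on
// different threads of the same process.
class ComSecurityModel {
 public:
  // Models an explicit CoInitializeSecurity call: only the first initializer
  // in the process succeeds; any later caller gets RPC_E_TOO_LATE.
  std::uint32_t explicit_init() {
    bool expected = false;
    return initialized_.compare_exchange_strong(expected, true) ? kSOk
                                                                : kRpcETooLate;
  }

  // Models the brokered CreateFileW fallback: the open itself still fails
  // (access denied), but security gets initialized as a side effect.
  // Returns false to represent the failed open.
  bool brokered_create_file() {
    bool expected = false;
    initialized_.compare_exchange_strong(expected, true);
    return false;
  }

 private:
  std::atomic<bool> initialized_{false};
};
```

In this model, calling `brokered_create_file()` first (the injected DLL wins the race) makes a subsequent `explicit_init()` return 0x80010119, while calling `explicit_init()` first returns 0, which matches the observation that the crash does not happen on every launch.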

The brokered create file path is not present on Windows 10 22H2 (verified in KernelBase.dll version 10.0.19041.4842), so the issue detailed above should be specific to Windows 11. The symbols present in Windows.Storage.OneCore.dll and/or imported by KernelBase.dll suggest that this behavior also exists for at least CopyFileW, CreateDirectoryW, DeleteFileW, FindFirstFileExW, GetFileAttributesExW, MoveFileExW, RemoveDirectoryW, ReplaceFileW, SetFileAttributesW, presumably with the same side effect. I'll write to Microsoft to let them know about this undesirable side effect.

Edit: For reference wrt the reproducer code, the output if the faulty code is present (e.g. on Windows 11 23H2) is the following:

handle: FFFFFFFFFFFFFFFF, last error: 00000005
CoInitializeSecurity result: 80010119

I consider this output a bug and personally believe that the expected result should always be:

handle: FFFFFFFFFFFFFFFF, last error: 00000005
CoInitializeSecurity result: 0
Has STR: --- → yes
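To read the two values printed by the reproducer above: last error 00000005 is ERROR_ACCESS_DENIED, and 80010119 is RPC_E_TOO_LATE. The following is a small portable sketch of how such a value splits into fields. The names `HresultFields`, `decode_hresult` and `succeeded` are hypothetical helpers written for this illustration, not part of the attached repro.cpp or of the Windows SDK.

```cpp
#include <cstdint>

// Fields of a 32-bit HRESULT: bit 31 is the failure bit, the low 16 bits are
// the code, and the facility sits above the code (masked with 0x1FFF, as the
// HRESULT_FACILITY macro does).
struct HresultFields {
  bool failed;
  std::uint32_t facility;
  std::uint32_t code;
};

constexpr HresultFields decode_hresult(std::uint32_t hr) {
  return HresultFields{(hr & 0x80000000u) != 0, (hr >> 16) & 0x1FFFu,
                       hr & 0xFFFFu};
}

// Mirrors the SUCCEEDED() test behind the crash's assertion: an HRESULT
// succeeds when, read as a signed 32-bit value, it is >= 0, that is, when
// the top bit is clear.
constexpr bool succeeded(std::uint32_t hr) { return (hr & 0x80000000u) == 0; }

constexpr std::uint32_t kFacilityRpc = 1;  // FACILITY_RPC

// The reproducer's failing result: failure bit set, FACILITY_RPC, code 0x119.
static_assert(decode_hresult(0x80010119u).failed, "RPC_E_TOO_LATE is a failure");
static_assert(decode_hresult(0x80010119u).facility == kFacilityRpc,
              "RPC_E_TOO_LATE belongs to FACILITY_RPC");
static_assert(decode_hresult(0x80010119u).code == 0x119u,
              "RPC_E_TOO_LATE has code 0x119");
```

With these helpers, `succeeded(0x80010119u)` is false, which is the condition that trips MOZ_DIAGNOSTIC_ASSERT in InitInsideApartment, while `succeeded(0)` is true for the result the reproducer is expected to print.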

I have now also contacted Webroot through Bob's ticket to ask if it would be an option for them to use NtCreateFile.

There's a pretty bad spike happening here though I don't know if it's the same issue we've experienced before or something different. Most crashes in nightly have the result of GetLastError() set to ERROR_INSUFFICIENT_BUFFER which seems like some kind of OOM condition. Similarly a new signature appeared:

https://crash-stats.mozilla.org/signature/?product=Firefox&signature=OOM%20%7C%20unknown%20%7C%20__delayLoadHelper2%20%7C%20_tailMerge_oleaut32.dll%20%7C%20mozilla%3A%3Amscom%3A%3AProcessRuntime%3A%3AInitInsideApartment

This one is indeed an OOM, but the stack is the same, so they're likely related.

(In reply to Gabriele Svelto [:gsvelto] from comment #35)

> There's a pretty bad spike happening here though I don't know if it's the same issue we've experienced before or something different. Most crashes in nightly have the result of GetLastError() set to ERROR_INSUFFICIENT_BUFFER which seems like some kind of OOM condition. Similarly a new signature appeared:
>
> https://crash-stats.mozilla.org/signature/?product=Firefox&signature=OOM%20%7C%20unknown%20%7C%20__delayLoadHelper2%20%7C%20_tailMerge_oleaut32.dll%20%7C%20mozilla%3A%3Amscom%3A%3AProcessRuntime%3A%3AInitInsideApartment
>
> This one is indeed an OOM, but the stack is the same, so they're likely related.

Most of the very recent ones (particularly in content) appear to be from one installation.

I haven't heard back from Microsoft about this issue so far, and we are still receiving some crashes. With PlayReady support being added to 132 Release (as a progressive rollout), I think our safest option here would be to block WRusr.dll injection in our utility processes. I have informed Webroot of this decision and I'll propose a patch for that.

WRusr.dll injection from WebRoot causes a crash in the utility LPAC
process used for PlayReady because on some versions of Windows,
CallNamedPipeW does COM if it fails to open the pipe in an AppContainer
process. This does not affect other processes though, so blocking
injection in utility processes should be enough.

Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a0e21b5c2476 Block WRusr.dll in utility processes. r=bobowen,win-reviewers,gstoll
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 134 Branch

I am unable to reproduce the issue in the latest Nightly thanks to the patch. about:third-party confirms that WRusr.dll is loaded in the main process and blocked in the utility process, and everything seems to work normally. If the crash volume in the next few nightlies reaches zero as expected, let's uplift this patch to beta preventively before broad support for PlayReady reaches the release channel.

Original Revision: https://phabricator.services.mozilla.com/D228185

Attachment #9436313 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: Webroot users won't be able to play videos that use the PlayReady DRM when we deploy broad support for it. The utility LPAC process will crash.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: -
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Blocks a third-party DLL only in utility processes. Manual testing in Nightly didn't reveal any particular issue caused by the blocking.
  • String changes made/needed: no
  • Is Android affected?: no
Attachment #9436313 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1930846