Open Bug 1737467 Opened 4 years ago Updated 1 month ago

Startup Crash in [@ sandbox::InterceptionManager::PatchNtdll] via mozilla::SandboxBroker::LaunchApp

Categories

(Core :: Security: Process Sandboxing, defect, P3)

Desktop
Windows
defect

Tracking

()

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, regression, Whiteboard: [tbird crash])

Crash Data

Attachments

(2 obsolete files)

This occurs in Firefox at a low crash rate, but Thunderbird 91 is much higher, as can be seen in https://crash-stats.mozilla.org/signature/?signature=logging%3A%3ALogMessage%3A%3A~LogMessage&date=%3E%3D2021-04-24T00%3A00%3A00.000Z&date=%3C2021-10-24T23%3A59%3A00.000Z#graphs

Earlist Crash report is : https://crash-stats.mozilla.org/report/index/8ad2318c-a60e-4136-acfd-452270210903 tb91.0.3

Reason: EXCEPTION_BREAKPOINT

Top 10 frames of crashing thread:

0 thunderbird.exe logging::LogMessage::~LogMessage security/sandbox/chromium-shim/base/logging.cpp:111
1 thunderbird.exe sandbox::InterceptionManager::PatchNtdll security/sandbox/chromium/sandbox/win/src/interception.cc:392
2 thunderbird.exe sandbox::InterceptionManager::InitializeInterceptions security/sandbox/chromium/sandbox/win/src/interception.cc:150
3 thunderbird.exe sandbox::PolicyBase::SetupAllInterceptions security/sandbox/chromium/sandbox/win/src/sandbox_policy_base.cc:705
4 thunderbird.exe sandbox::PolicyBase::AddTarget security/sandbox/chromium/sandbox/win/src/sandbox_policy_base.cc:524
5 thunderbird.exe sandbox::BrokerServicesBase::SpawnTarget security/sandbox/chromium/sandbox/win/src/broker_services.cc:638
6 xul.dll mozilla::SandboxBroker::LaunchApp security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp:282
7 xul.dll mozilla::ipc::WindowsProcessLauncher::DoLaunch ipc/glue/GeckoChildProcessHost.cpp:1523
8 xul.dll mozilla::ipc::BaseProcessLauncher::PerformAsyncLaunch ipc/glue/GeckoChildProcessHost.cpp:1012
9 xul.dll mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, 0>, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, 0> >  xpcom/threads/MozPromise.h:1536

More recent Thunderbird crash bp-0b29108e-c947-4a73-89b4-7d8f50211023

Flags: needinfo?(mkmelin+mozilla)

#5 crash rank for tb91.2.1

Not sure why the crash reason isn't shown, it usually does for MOZ_CRASH: "Hit fatal chromium sandbox condition."
https://searchfox.org/mozilla-central/rev/ff5309e67b20a9976ea762f9ec5f657da0801490/security/sandbox/chromium-shim/base/logging.cpp#111

Not too much to go on in the crash report.

Flags: needinfo?(mkmelin+mozilla)

#154, so No longer a topcrash

Severity: S2 → S4
OS: Windows 10 → Windows

Recent Firefox example bp-d78d7c30-8c0d-41ec-bb00-7ca7d0221231

Recent Thunderbird example bp-91308484-a7e1-4cb1-9d18-e157e0221226

Product: Thunderbird → Core
Summary: Startup Crash in [@ logging::LogMessage::~LogMessage] → Startup Crash in [@ logging::LogMessage::~LogMessage] via mozilla::SandboxBroker::LaunchApp
Whiteboard: [tbird crash]
Version: Thunderbird 91 → unspecified
QA Whiteboard: [qa-regression-triage]

These crashes are failure to reserve and/or commit memory in the child process from code running in the main process in sandbox::InterceptionManager::PatchNtdll. Most of the crashes show a memory exhaustion condition on the system, with only few megabytes left of page file and/or physical memory.

  // Reserve a full 64k memory range in the child process.
  HANDLE child = child_->Process();
  BYTE* thunk_base = reinterpret_cast<BYTE*>(::VirtualAllocEx(
      child, nullptr, kAllocGranularity, MEM_RESERVE, PAGE_NOACCESS));

  // Find an aligned, random location within the reserved range.
  size_t thunk_bytes =
      interceptions_.size() * sizeof(ThunkData) + sizeof(DllInterceptionData);
  size_t thunk_offset = internal::GetGranularAlignedRandomOffset(thunk_bytes);

  // Split the base and offset along page boundaries.
  thunk_base += thunk_offset & ~(kPageSize - 1);
  thunk_offset &= kPageSize - 1;

  // Make an aligned, padded allocation, and move the pointer to our chunk.
  size_t thunk_bytes_padded = (thunk_bytes + kPageSize - 1) & ~(kPageSize - 1);
  thunk_base = reinterpret_cast<BYTE*>(
      ::VirtualAllocEx(child, thunk_base, thunk_bytes_padded, MEM_COMMIT,
                       PAGE_EXECUTE_READWRITE));
  CHECK(thunk_base);  // If this fails we'd crash anyway on an invalid access.

We could potentially recycle the weird trick here and sleep and retry (bug 1716727) to try to save the main process.

Component: General → Security: Process Sandboxing
Priority: -- → P3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on release (startup)

For more information, please visit BugBot documentation.

This is pretty high on the parent process topcrash list. Request re-review of the severity. I think we may want to prioritize comment 5 as well if we think it can help mitigate this.

Severity: S4 → --

It's an OOM so it's only partially actionable. I don't think it should be a priority.

Severity: -- → S3
Hardware: Unspecified → Desktop
Version: unspecified → Trunk

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

I assume the assessment from comment 5 is still valid?

Flags: needinfo?(jstutte) → needinfo?(yjuglaret)

(In reply to Jens Stutte [:jstutte] from comment #11)

I assume the assessment from comment 5 is still valid?

Thanks for the ping! While the OOM volume is still present as background volume, there is new volume with startup crashes from people that don't show symptoms of memory exhaustion (examples 1, 2, 3). So a new, serious bug seems to have appeared within this old signature. I'll take a look.

Flags: needinfo?(yjuglaret)
Blocks: 1992678
No longer blocks: 1992678
See Also: → 1992678

Regarding the code shared in comment 5 -- despite the comment next to it, I don't think that a CHECK is required here. It should be safe to just return SBOX_ERROR_CANNOT_WRITE_INTERCEPTION_THUNK instead of crashing. This would cause a failure to start the child process, rather than a parent process crash.

My only concern with that would be: do we monitor process launch failures enough, e.g. would we have been able to catch bug 1992678 if this CHECK was just a return SBOX_ERROR_CANNOT_WRITE_INTERCEPTION_THUNK;?

See Also: → 1997298
Assignee: nobody → yjuglaret
Status: NEW → ASSIGNED
Attachment #9523467 - Attachment is obsolete: true
Attachment #9523466 - Attachment is obsolete: true

I got my bug IDs mixed up, sorry for the noise.

Assignee: yjuglaret → nobody
Status: ASSIGNED → NEW

Reflecting the signature change after bug 1998112. We will no longer be mixing all chromium sandbox CHECK failures in the same signature.

Crash Signature: [@ logging::LogMessage::~LogMessage] → [@ logging::LogMessage::~LogMessage] [@ sandbox::InterceptionManager::PatchNtdll ]
See Also: → 1998112
Summary: Startup Crash in [@ logging::LogMessage::~LogMessage] via mozilla::SandboxBroker::LaunchApp → Startup Crash in [@ sandbox::InterceptionManager::PatchNtdll] via mozilla::SandboxBroker::LaunchApp
See Also: → 1998867
Crash Signature: [@ logging::LogMessage::~LogMessage] [@ sandbox::InterceptionManager::PatchNtdll ] → [@ sandbox::InterceptionManager::PatchNtdll ]
See Also: → 2006568
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: