Closed Bug 1444378 Opened 7 years ago Closed 1 years ago

Crash in detail::ProxyReleaseEvent<T>::Run

Categories

(Core :: Disability Access APIs, defect, P3)

60 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- wontfix
firefox104 --- wontfix
firefox105 --- wontfix
firefox106 --- wontfix
firefox114 --- fixed
firefox115 --- fixed
firefox116 --- fixed

People

(Reporter: MarcoZ, Unassigned)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

This bug was filed from the Socorro interface and is report bp-f47127dc-df59-4f0e-8407-e95d10180309. ============================================================= Top 10 frames of crashing thread: 0 xul.dll detail::ProxyReleaseEvent<nsISupports>::Run xpcom/threads/nsProxyRelease.h:38 1 xul.dll mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:413 2 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1040 3 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:97 4 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run ipc/glue/MessagePump.cpp:301 5 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319 6 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:299 7 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:157 8 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:401 9 xul.dll XRE_RunAppShell toolkit/xre/nsEmbedFunctions.cpp:892 ============================================================= Just was hit by this out of the blue when switching labels in Gmail.
Priority: -- → P3
Saw this again the other day, but in a different context. Report: bp-c5769733-35f0-4ca0-9e27-263f00181002. I actually no longer remember what I was doing when this crash occurred.
QA Whiteboard: qa-not-actionable

This isn't an IPC bug, it's an issue with some random component using ProxyRelease, it's impossible to tell which one from the stacks unfortunately.

Status: NEW → RESOLVED
Closed: 2 years ago
Component: IPC → XPCOM
Resolution: --- → INCOMPLETE

Reopening because since this signature didn't have an associated bug anymore it popped up in nightly crash triage and I analyzed a few crashes. For starters this is an UAF so I'm securing this bug, given its nature however I don't know if it's easy to exploit. I don't think it is but better safe than sorry.

I've cracked open a nightly minidump which contains objects pointed by reference on the stack and analyzed what happened:

  • We're releasing a dead object here, mDoomed points to a dead object which has been overwritten with the poison value
  • The task fortunately contains the name of the object that should be released runnable by the runnable which is "WeakReferenceSupport", so it must have been launched from here
  • I'm adjusting the component accordingly, since this is within the mscom machinery, please adjust it if I'm wrong
Group: core-security
Status: RESOLVED → REOPENED
Component: XPCOM → IPC: MSCOM
OS: Windows 10 → Windows
Resolution: INCOMPLETE → ---
Group: core-security → dom-core-security

It looks like all these crashes have a11y enabled and are in the content process, so it's most likely related to the COM interceptor stuff used by a11y. The a11y re-architecture (AKA Cache the World) moves away from all of that, so I think this will go away once we ship that.

Depends on: 1737193
Component: IPC: MSCOM → Disability Access APIs

CTW isn't going to be an option for fixing this on ESR102, though. Could we at least make this crash more safely if we can't feasibly stop the crashes from happening without CTW?

That's a fair point. Unfortunately, I can't reproduce this, nor can I fathom how this could happen.

  1. A11y uses mscom::Interceptor, which is the only user of mscom::WeakReferenceSupport with eDestroyOnMainThread.
  2. Because this is in the content process, it can't be accessed by any client directly; it has to go through the COM marshaler. That means it can't be a client calling Release too many times. The COM marshaler should disconnect a client once the client's ref count drops to 0, so the crash would occur in the client, not us.
  3. A11y mostly doesn't touch Interceptor pointers directly except to wrap an object with an Interceptor. It only releases an Interceptor via CoDisconnectObject, in which case (excepting a bug in COM) there shouldn't be an unwanted Release either.
  4. Even when a11y does shut down objects, it always does so on the main thread, so this would just get released immediately instead of dispatching a runnable.
  5. The release runnable holds a reference to the WeakReferenceSupport, so I think the only way it could be deleted before that runs is if there was an extraneous Release on the main thread. Otherwise, we'd likely crash trying to acquire a reference for the second runnable.

Unless I'm missing something, that leaves either:

  1. An incorrect extra Release caused by COM, in which case that's a bug in Windows.
  2. A Release somewhere on the main thread (maybe in the Interceptor code) that I'm missing.
Flags: needinfo?(jteh)

We do see these crashes on Nightly at low volume if there's any diagnostic asserts we could land that you think might help identify a culprit.

Group: dom-core-security → layout-core-security
Severity: critical → S2

I did put some thought into any diagnostic assertions we might be able to land to help track this down, but came up blank.

Keywords: stalled

This can no longer occur as of Firefox 113 with Cache the World and the crash reports confirm this. The crashing code was removed altogether in bug 1821972. So, this is fixed in 113+.

Group: layout-core-security → core-security-release
Status: REOPENED → RESOLVED
Closed: 2 years ago1 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
QA Whiteboard: qa-not-actionable → [post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.