Closed Bug 1575257 Opened 5 years ago Closed 4 months ago

Crash in [@ mozilla::ipc::IProtocol::DestroySubtree]

Categories

(Core :: IPC, defect, P3)

69 Branch
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix

People

(Reporter: philipp, Assigned: jld)

References

(Regression)

Details

(4 keywords)

Crash Data

This bug is for crash report bp-6109a158-bce5-44bd-94df-3160f0190820.

Top 10 frames of crashing thread:

0  @0x771e5600 
1 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:659
2 xul.dll mozilla::dom::PContentParent::OnChannelError ipc/ipdl/PContentParent.cpp:12291
3 xul.dll void mozilla::dom::ContentParent::OnChannelError dom/ipc/ContentParent.cpp:1511
4 xul.dll nsresult mozilla::detail::RunnableMethodImpl<mozilla::dom::Performance*, void  xpcom/threads/nsThreadUtils.h:1176
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290

this crash signature is starting to show up sporadically during the 69.0b cycle, it might have regressed with bug 1540731.

Group: core-security → dom-core-security

It's hard to tell what's going on. Some of them are reading from a null pointer here, some of them are jumping to a bad address after ActorDestroy called from here has set up the frame pointer, and one of them is reading unmapped memory for IProtocol::mLinkStatus here (note unaligned address).

I wonder if there's a race condition with one of ContentParent's managees being destroyed on the wrong thread, or something like that.

It's worth noting that bug 1540731 fixed at least one and probably several use-after-free bugs — see also bug 1471154 — and this crash appears to be relatively low-volume.

(Making this a separate comment just in case it needs to be hidden after the rest of the bug is public.)

Keywords: sec-high
Assignee: nobody → jld

Nika, do you know if there are any DOM actors with sketchy lifetime stuff that we should be suspecting as the cause of this? There doesn't seem to be much actionable info in the crash reports themselves.

Flags: needinfo?(nika)

Given the time frame which it started occuring in, this is probably an existing bug which simply had it's signature changed. The IProtocol::DestroySubtree method was previously known as, based on the call stack, PContentParent::DestroySubtree. The signature may have also looked different though, so I'm not 100% confident.

Based on the causes you mention earlier, I'm guessing this could be caused by some odd lifecycle behaviour. Looking at some of the crash sites jld linked:

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #1)

It's hard to tell what's going on. Some of them are reading from a null pointer here,

Only way that this should be nullptr would be if a managee was in the mFooManagee list but has had its ActorLifecycleProxy destroyed already. Theoretically that shouldn't happen(tm).

I suppose we could null-check each of the proxies as we read them? No clue how they'd be null here though, even with a fairly poorly behaving actor.

some of them are jumping to a bad address after ActorDestroy called from here has set up the frame pointer,

Hmm, might be a UAF which leaves a weird VTable pointer laying around?

and one of them is reading unmapped memory for IProtocol::mLinkStatus here (note unaligned address).

Wild. I suppose the value inside of the proxy has been clobbered after it was freed or something.

I wonder if there's a race condition with one of ContentParent's managees being destroyed on the wrong thread, or something like that.

That would be pretty odd. We do a RELEASE_ASSERT to check the worker thread matches what we expect when sending a message like __delete__. Looking a bit though I suppose we don't currently release assert the "channel is shut down" status when ~IProtocol is run, so perhaps that's what's happening?

All of these smell strongly of a UAF, which is especially interesting because we have a kungFuDeathGrip on ContentParent throughout the call into OnChannelError here: https://searchfox.org/mozilla-central/rev/325c1a707819602feff736f129cb36055ba6d94f/dom/ipc/ContentParent.cpp#1552. Perhaps we have a double-Release happening somewhere?

Flags: needinfo?(nika)

At one point in the IPC meeting we were wondering if this might just be a few people with bad RAM, but there are some more crashes:

Extending the search to the past two weeks, there are a couple more related to PBackground, as well as bp-fe080a28-c838-45e4-999b-488230190908 and bp-1f27fbc5-4b3f-4d18-9e6d-2c09a0190908 in PBrowserChild::Send__delete__ from BrowserChild::DelayedDeleteRunnable.

I hate to mark a likely UAF as P3, but there just isn't actionable information here.

Priority: -- → P3
Keywords: stalled
Has Regression Range: --- → yes
Severity: critical → S3

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jld, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jld)

stalled.

Flags: needinfo?(jld)

We are reviewing and closing unactionable stalled bugs.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → INCOMPLETE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.