Crash in [@ mozilla::ipc::IProtocol::DestroySubtree]
Categories
(Core :: IPC, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | --- | wontfix |
firefox70 | --- | wontfix |
firefox71 | --- | wontfix |
People
(Reporter: philipp, Assigned: jld)
References
(Regression)
Details
(4 keywords)
Crash Data
This bug is for crash report bp-6109a158-bce5-44bd-94df-3160f0190820.
Top 10 frames of crashing thread:
0 @0x771e5600
1 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:659
2 xul.dll mozilla::dom::PContentParent::OnChannelError ipc/ipdl/PContentParent.cpp:12291
3 xul.dll void mozilla::dom::ContentParent::OnChannelError dom/ipc/ContentParent.cpp:1511
4 xul.dll nsresult mozilla::detail::RunnableMethodImpl<mozilla::dom::Performance*, void xpcom/threads/nsThreadUtils.h:1176
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
this crash signature is starting to show up sporadically during the 69.0b cycle, it might have regressed with bug 1540731.
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
It's hard to tell what's going on. Some of them are reading from a null pointer here, some of them are jumping to a bad address after ActorDestroy
called from here has set up the frame pointer, and one of them is reading unmapped memory for IProtocol::mLinkStatus
here (note unaligned address).
I wonder if there's a race condition with one of ContentParent
's managees being destroyed on the wrong thread, or something like that.
Assignee | ||
Comment 2•6 years ago
|
||
It's worth noting that bug 1540731 fixed at least one and probably several use-after-free bugs — see also bug 1471154 — and this crash appears to be relatively low-volume.
(Making this a separate comment just in case it needs to be hidden after the rest of the bug is public.)
Updated•6 years ago
|
Assignee | ||
Comment 3•6 years ago
|
||
Nika, do you know if there are any DOM actors with sketchy lifetime stuff that we should be suspecting as the cause of this? There doesn't seem to be much actionable info in the crash reports themselves.
Comment 4•6 years ago
|
||
Given the time frame which it started occuring in, this is probably an existing bug which simply had it's signature changed. The IProtocol::DestroySubtree
method was previously known as, based on the call stack, PContentParent::DestroySubtree
. The signature may have also looked different though, so I'm not 100% confident.
Based on the causes you mention earlier, I'm guessing this could be caused by some odd lifecycle behaviour. Looking at some of the crash sites jld linked:
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #1)
It's hard to tell what's going on. Some of them are reading from a null pointer here,
Only way that this should be nullptr would be if a managee was in the mFooManagee
list but has had its ActorLifecycleProxy
destroyed already. Theoretically that shouldn't happen(tm).
I suppose we could null-check each of the proxies as we read them? No clue how they'd be null here though, even with a fairly poorly behaving actor.
some of them are jumping to a bad address after
ActorDestroy
called from here has set up the frame pointer,
Hmm, might be a UAF which leaves a weird VTable pointer laying around?
and one of them is reading unmapped memory for
IProtocol::mLinkStatus
here (note unaligned address).
Wild. I suppose the value inside of the proxy has been clobbered after it was freed or something.
I wonder if there's a race condition with one of
ContentParent
's managees being destroyed on the wrong thread, or something like that.
That would be pretty odd. We do a RELEASE_ASSERT to check the worker thread matches what we expect when sending a message like __delete__
. Looking a bit though I suppose we don't currently release assert the "channel is shut down" status when ~IProtocol
is run, so perhaps that's what's happening?
All of these smell strongly of a UAF, which is especially interesting because we have a kungFuDeathGrip
on ContentParent
throughout the call into OnChannelError
here: https://searchfox.org/mozilla-central/rev/325c1a707819602feff736f129cb36055ba6d94f/dom/ipc/ContentParent.cpp#1552. Perhaps we have a double-Release happening somewhere?
Assignee | ||
Comment 5•5 years ago
|
||
At one point in the IPC meeting we were wondering if this might just be a few people with bad RAM, but there are some more crashes:
- bp-fbb58f48-b9ef-4e55-b7c2-bf1290190911, where stack unwinding failed despite this being one of our builds that crashed in our code.
- bp-f68160c5-16d9-4d00-9fcb-d5e170190912, in
PHttpBackgroundChannelParent::Send__delete__
- bp-1fe014be-c125-4e99-9fae-397350190913, in
PBackgroundParent::OnChannelError
- bp-1c38047a-7ec8-48db-8dc6-faf490190913, also in
PBackgroundParent::OnChannelError
- bp-cf45dedd-3585-4172-baaa-817990190914, in
PBackgroundIDBRequestChild::OnMessageReceived
Extending the search to the past two weeks, there are a couple more related to PBackground
, as well as bp-fe080a28-c838-45e4-999b-488230190908 and bp-1f27fbc5-4b3f-4d18-9e6d-2c09a0190908 in PBrowserChild::Send__delete__
from BrowserChild::DelayedDeleteRunnable
.
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
I hate to mark a likely UAF as P3, but there just isn't actionable information here.
Updated•5 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 7•3 years ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high
keyword.
:jld, could you consider increasing the severity of this security bug?
For more information, please visit auto_nag documentation.
Comment 9•1 year ago
|
||
We are reviewing and closing unactionable stalled bugs.
Comment 10•1 year ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.
Updated•1 year ago
|
Description
•