Closed
Bug 1277614
Opened 9 years ago
Closed 9 years ago
Crash in mozilla::dom::PBlobChild::DestroySubtree
Categories
(Core :: IPC, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla50
People
(Reporter: marcia, Assigned: gkrizsanits)
References
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main48-] btpp-active, e10s-only)
Crash Data
Attachments
(2 files, 3 obsolete files)
1.40 KB,
patch
|
billm
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
1.62 KB,
text/plain
|
Details |
This bug was filed from the Socorro interface and is
report bp-a9aedac8-9df7-47c0-9908-139d92160602.
=============================================================
Seen while looking at crash-stats. http://bit.ly/1TYfPTu to the crashes on nightly, which are both Mac and Windows. Hopefully I filed this in the right bucket...
Frame Module Signature Source
0 xul.dll mozilla::dom::PBlobChild::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) obj-firefox/ipc/ipdl/PBlobChild.cpp:626
1 xul.dll mozilla::dom::PContentChild::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) obj-firefox/ipc/ipdl/PContentChild.cpp:10453
2 xul.dll mozilla::dom::PContentChild::OnChannelClose() obj-firefox/ipc/ipdl/PContentChild.cpp:10370
3 xul.dll mozilla::ipc::MessageChannel::NotifyChannelClosed() ipc/glue/MessageChannel.cpp:2232
4 xul.dll mozilla::ipc::MessageChannel::NotifyMaybeChannelError() ipc/glue/MessageChannel.cpp:2071
5 xul.dll mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() ipc/glue/MessageChannel.cpp:2107
6 xul.dll nsRunnableMethodImpl<void ( mozilla::ipc::MessageChannel::*)(void), 0, 1>::Run() obj-firefox/dist/include/nsThreadUtils.h:749
7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1029
8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:290
9 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:132
10 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:317
11 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:228
12 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:208
13 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156
14 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:262
15 xul.dll XRE_RunAppShell toolkit/xre/nsEmbedFunctions.cpp:827
16 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:285
17 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:228
18 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:208
19 xul.dll XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:657
20 plugin-container.exe NS_internal_main(int, char**) ipc/app/MozillaRuntimeMain.cpp:11
21 plugin-container.exe wmain toolkit/xre/nsWindowsWMain.cpp:123
22 plugin-container.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:255
Comment 2•9 years ago
|
||
Looking at the code where it is crashing, this may be a generic IPC issue, rather than something related specifically to Blobs, so I'm going to move this to IPC.
Component: DOM: Content Processes → IPC
Comment 4•9 years ago
|
||
One of the IPC footgun patterns is to have an ActorDestroy method that indirectly (e.g., via dropping references) tries to do things to another actor that's later in the child array currently being destroyed; I wonder if there's an instance of that going on here. See also bug 1226477 comment #35.
Who should own this?
Flags: needinfo?(jmathies)
Flags: needinfo?(continuation)
Comment 6•9 years ago
|
||
I have a ni? out on baku in the blocking sec bug.
Flags: needinfo?(jmathies)
Flags: needinfo?(continuation)
![]() |
||
Comment 7•9 years ago
|
||
(In reply to David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) (vacation June 4-12) from comment #5)
> Who should own this?
I don't know if we have an primary owner of Blob code. bent wrote most of it, and Andrea Marchesini has done a bit of work on it. Gabor Kriszanits might be interested helping out here since this is dom/ipc related.
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #7)
> (In reply to David Baron :dbaron: ⌚️UTC+2 (review requests must explain
> patch) (vacation June 4-12) from comment #5)
> > Who should own this?
>
> I don't know if we have an primary owner of Blob code. bent wrote most of
> it, and Andrea Marchesini has done a bit of work on it. Gabor Kriszanits
> might be interested helping out here since this is dom/ipc related.
Sure. Can I have CC to the blocking sec bug?
Assignee: nobody → gkrizsanits
Flags: needinfo?(gkrizsanits)
Comment 9•9 years ago
|
||
Gabor, I CCed you. I wonder if this is a regression. I'm OK to review a patch. I was planning to work on this bug, but not before the work-week.
Flags: needinfo?(gkrizsanits)
Updated•9 years ago
|
Whiteboard: btpp-active
Assignee | ||
Comment 10•9 years ago
|
||
To avoid the scenario in Comment 4 I think we should set a flag when we start deallocating the children. And from that point on we ignore remove requests from children (we're about to remove all of them anyway). Instead of inventing a new flag or state for this I'm using the dying state here. I'm not sure if that's the right approach and what invariants I'm breaking with this. Bill, what do you think?
Flags: needinfo?(gkrizsanits)
Attachment #8761647 -
Flags: feedback?(wmccloskey)
![]() |
||
Updated•9 years ago
|
Priority: -- → P1
Can you include an example of how this changes the generated code (as a diff)? It's a lot easier to review that way.
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 12•9 years ago
|
||
Sorry, I thought I did but it seems like I forgot :(
Flags: needinfo?(gkrizsanits)
Attachment #8761960 -
Flags: feedback?(wmccloskey)
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8761960 [details]
generated
Bugzilla cannot recognize a regular diff's as a patch so I'm unchecking it.
Attachment #8761960 -
Attachment is patch: false
Comment 14•9 years ago
|
||
Why do you think the issue is in the IPC code and not in the PBlob? It seems that this happens only with PBlobChild::DestroySubtree.
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 15•9 years ago
|
||
(In reply to Andrea Marchesini (:baku) from comment #14)
> Why do you think the issue is in the IPC code and not in the PBlob? It seems
> that this happens only with PBlobChild::DestroySubtree.
What Jed describes in Comment 4 and in the comment he links to there seems like something we should guard against anyway. This issue can explain the crash, and I cannot see any other explanation for it. Looking into the linked bugs through Comment 4 it does not seem like a Blob only issue by the way. We might be able to find a workaround for the crash at Blob level but I would prefer to fix the footgun in IPC itself so this won't happen again in another, similar situation in the future.
What I think might be happening here is that when we start the iteration in DeallocSubTree:
for (auto iter = (mManagedPBlobStreamChild).Iter(); (!((iter).Done())); (iter).Next()) {
During that iteration it's fine to remove an element from the hash table (nsTHashtable<nsPtrHashKey<Protocol>>) via the iterator (iterator.Remove())
but if we trigger a PBlobChild::RemoveManagee call (in case of RemoteBlobSliceImpl this can happen) during the iteration, that removes an element directly by (mManagedPBlobStreamChild).RemoveEntry(actor); and that can totally confuse the iterator I think.
(the difference is between the two kind of removal is that the later also calls ShrinkIfAppropriate() on the map https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/PLDHashTable.cpp#633)
Flags: needinfo?(gkrizsanits)
Comment 16•9 years ago
|
||
Thanks for you comment. I was/am only afraid that we are hiding some existing Blob+IPC bug.
I see the point of what you do. Sounds good to me.
Assignee | ||
Comment 17•9 years ago
|
||
Bill, I'm not sure you've seen it but this is an m9 :( do you think you will have some time this week to discuss this bug/patch?
Flags: needinfo?(wmccloskey)
Sorry for the delay. This doesn't look right to me though. My understanding from Jed's comment is that we get into trouble in DestroySubtree and never reach DeallocSubtree. So:
In DestroySubtree, we accumulate a bunch of children in |kids|. Then we iterate through them and call ActorDestroy. If one of those ActorDestroy methods calls Send__delete__ on a later sibling, then we'll delete it right away and remove it from mManagedPBlobStreamChild. However, we'll still iterate over it later because it's still in |kids|. At this point we crash with a UAF.
I wonder if we could modify DestroySubtree to verify that each element of |kids| is still in mManagedPBlobStreamChild before calling DestroySubtree on it. That should be quick since mManagedPBlobStreamChild is a hashtable.
The fact that we'll still send the __delete__ message is a bit unfortunate. In the shutdown case, the actual Send call will fail, which I think is fine. If this isn't during shutdown, then I guess we might end up crashing with a route error. Not sure how big a problem that should be. At least it's not a UAF.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #18)
> Sorry for the delay. This doesn't look right to me though. My understanding
> from Jed's comment is that we get into trouble in DestroySubtree and never
> reach DeallocSubtree. So:
Not for the Blob but we do reach it for the child otherwise it would not remove itself from mManagedPBlobStreamChild no?
>
> In DestroySubtree, we accumulate a bunch of children in |kids|. Then we
> iterate through them and call ActorDestroy. If one of those ActorDestroy
> methods calls Send__delete__ on a later sibling, then we'll delete it right
> away and remove it from mManagedPBlobStreamChild.
That's exactly what my patch is supposed to prevent. With my patch that child would
not remove itself from mManagedPBlobStreamChild and we would not destroy it yet.
>
> I wonder if we could modify DestroySubtree to verify that each element of
> |kids| is still in mManagedPBlobStreamChild before calling DestroySubtree on
> it. That should be quick since mManagedPBlobStreamChild is a hashtable.
That was my initial plan, but let's say we do that, that would fix this problem,
but I would be still worried about the problem I described in Comment 15 which
we might hit in some cases. My solution would fix both problem at once.
>
> The fact that we'll still send the __delete__ message is a bit unfortunate.
> In the shutdown case, the actual Send call will fail, which I think is fine.
> If this isn't during shutdown, then I guess we might end up crashing with a
> route error. Not sure how big a problem that should be. At least it's not a
> UAF.
I think this part should be another bug but once we set that flag (again I'm not sure
the state is the right place to indicate that we started destroying the blob and its
subtree) we should be able to prevent that message to be sent too.
Flags: needinfo?(wmccloskey)
Sorry, I made a mistake. I shouldn't have talked about mManagedPBlobStreamChild since that's not related to what's happening here.
I think the problem is that there are two blobs that are siblings. When one blob's ActorDestroy function is called, it ends up invoking Send__delete__ on a second blob. The problem is in PContentChild::DestroySubtree when it iterates over mManagedPBlobChild.
I don't see how the problem described in comment 15 can happen. DeallocSubTree is basically designed to recursively call methods like DeallocPBlobStreamChild or DeallocPBlobChild. However, those methods don't seem to do much besides dealloc memory. The destructors they call don't do anything. That's pretty typical for this sort of IPDL code.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 21•9 years ago
|
||
Attaching diff for the generated code too in a bit.
Attachment #8761647 -
Attachment is obsolete: true
Attachment #8761960 -
Attachment is obsolete: true
Attachment #8761647 -
Flags: feedback?(wmccloskey)
Attachment #8761960 -
Flags: feedback?(wmccloskey)
Attachment #8763076 -
Flags: review?(wmccloskey)
Assignee | ||
Comment 22•9 years ago
|
||
Assignee | ||
Comment 23•9 years ago
|
||
I think there are some special characters in the previous diff for some reason that prevents bugzilla to open it. I'm trying to upload a fixed version I hope this helps.
Attachment #8763077 -
Attachment is obsolete: true
Attachment #8763076 -
Flags: review?(wmccloskey) → review+
Assignee | ||
Comment 24•9 years ago
|
||
Assignee | ||
Comment 25•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f96486bb6405877f4d8687325615562d4daa9290
Bug 1277614 - Guarding child iteration in DestroySubtree. r=wmccloskey
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 8763076 [details] [diff] [review]
guarding child iteration in DestroySubtree
Approval Request Comment
[Feature/regressing bug #]: hard to tell
[User impact if declined]: it's a crashing bug with high rates and it's an e10s release blocker
[Describe test coverage new/current, TreeHerder]: I could never reproduce the issue locally so there are no tests for it but we think the problem is well understood and the fix might cover other similar cases. It's on inbound right now.
[Risks and why]: The patch is adding a trivial hash map check. I see no risk at all.
[String/UUID change made/needed]: none
Attachment #8763076 -
Flags: approval-mozilla-beta?
Attachment #8763076 -
Flags: approval-mozilla-aurora?
Updated•9 years ago
|
status-firefox48:
--- → affected
status-firefox50:
--- → affected
Reporter | ||
Comment 28•9 years ago
|
||
Adding the signatures from the duped bug.
Crash Signature: [@ mozilla::dom::PBlobChild::DestroySubtree] → [@ mozilla::dom::PBlobChild::DestroySubtree]
[@ @0x0 | mozilla::dom::PContentChild::DestroySubtree]
[@ mozilla::dom::PContentChild::DestroySubtree]
Comment 29•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 30•9 years ago
|
||
Comment on attachment 8763076 [details] [diff] [review]
guarding child iteration in DestroySubtree
Anything to fix e10s crashes, taking it.
Should be in 48 beta 3
Attachment #8763076 -
Flags: approval-mozilla-beta?
Attachment #8763076 -
Flags: approval-mozilla-beta+
Attachment #8763076 -
Flags: approval-mozilla-aurora?
Attachment #8763076 -
Flags: approval-mozilla-aurora+
Comment 31•9 years ago
|
||
bugherder uplift |
Comment 32•9 years ago
|
||
bugherder uplift |
Comment 34•9 years ago
|
||
The mozilla::dom::PBlobChild::DestroySubtree signature, which is by far the most common, does seem to have gone away when these patches landed, which is great. mozilla::dom::PContentChild::DestroySubtree only seems to have happened once recently, so I guess that's okay, too. @0x0 | mozilla::dom::PContentChild::DestroySubtree is still happening, but it looks like it is happening somewhere else in the code, so I'll undupe a bug to track that separately.
Comment 35•9 years ago
|
||
No need for this on ESR, as it is e10s-specific.
status-firefox-esr45:
--- → wontfix
Updated•9 years ago
|
Keywords: csectype-uaf,
sec-high
Whiteboard: btpp-active → btpp-active, e10s-only
Comment 37•9 years ago
|
||
After uplift, there are still 5 reports in the last week for this signature on aurora 49: @0x0 | mozilla::dom::PContentChild::DestroySubtree
and 62 reports in the last week for mozilla::dom::PBlobChild::DestroySubtree .
Does this need a follow up bug? Did this fix most if not all of the issue in aurora?
Flags: needinfo?(gkrizsanits)
Comment 38•9 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #37)
> Does this need a follow up bug? Did this fix most if not all of the issue in
> aurora?
I don't see any mozilla::dom::PBlobChild::DestroySubtree crashes on builds post 6-23, when this was fixed. Bug 1281223 covers the resididual @0x0 | mozilla::dom::PContentChild::DestroySubtree crashes, which seem like a separate issue.
Flags: needinfo?(gkrizsanits)
See Also: → 1281223
Updated•9 years ago
|
status-firefox47:
--- → disabled
Whiteboard: btpp-active, e10s-only → [adv-main48-] btpp-active, e10s-only
You need to log in
before you can comment on or make changes to this bug.
Description
•