Closed Bug 910962 Opened 11 years ago Closed 10 years ago

DeallocShmem abort with multi-process

Categories

(Core :: IPC, defect)

x86
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla29

People

(Reporter: Felipe, Assigned: billm)

References

Details

Attachments

(2 files)

With our current multi-process setup, the abort introduced in bug 844996 is always hit when the content process dies.

STR:
 - Set browser.tabs.remote to true
 - Restart firefox
 - Kill the content process that was launched

Result: parent process aborts


I don't know if this is exposing a real problem, or if the code was just not expecting this situation.
It's exposing a real problem. This likely means you're doing a double free or trying have a shmem somewhere that isn't a shmem at all.
Blocks: 913155
Are you sure? When you kill the process there really isn't time to free stuff properly.
I don't understand the word "time" here. The parent process has all the time in the world.
(In reply to Tom Schuster [:evilpie] from comment #2)
> Are you sure? When you kill the process there really isn't time to free
> stuff properly.

Note that this isn't a 'leak' but most likely a double free.
nrc pointed out a version of this on desktop Firefox builds:

https://tbpl.mozilla.org/php/getParsedLog.php?id=28086871&tree=Try

It appears that in this test the child is intentionally being killed. The IPDL system will clean up any active ShMem instances when it tears down the actor tree. Client code (the layer system) should be using ActorDestroy to notice that the actor is dead and should not be touching these ShMem instances after that point.

Stack:
03:46:35     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
03:46:35     INFO -  Crash address: 0x7c90120e
03:46:35     INFO -  Thread 23 (crashed)
03:46:35     INFO -   0  ntdll.dll + 0x120e
03:46:35     INFO -      eip = 0x7c90120e   esp = 0x08aff85c   ebp = 0x08affc78   ebx = 0x00000040
03:46:35     INFO -      esi = 0x10261440   edi = 0x00000000   eax = 0x00000000   ecx = 0x08aff874
03:46:35     INFO -      edx = 0x003853dc   efl = 0x00000202
03:46:35     INFO -      Found by: given as instruction pointer in context
03:46:35     INFO -   1  xul.dll!mozilla::layers::PLayerTransactionParent::DeallocShmem(mozilla::ipc::Shmem &) [PLayerTransactionParent.cpp:226d33cca5ed : 827 + 0x16]
03:46:35     INFO -      eip = 0x0328f218   esp = 0x08affc80   ebp = 0x08affc9c
03:46:35     INFO -      Found by: previous frame's frame pointer
03:46:35     INFO -   2  xul.dll!mozilla::layers::ISurfaceAllocator::DestroySharedSurface(mozilla::layers::SurfaceDescriptor *) [ISurfaceAllocator.cpp:226d33cca5ed : 138 + 0x10]
03:46:35     INFO -      eip = 0x037a3423   esp = 0x08affca4   ebp = 0x08affce8
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   3  xul.dll!mozilla::layers::DeprecatedTextureHost::~DeprecatedTextureHost() [TextureHost.cpp:226d33cca5ed : 192 + 0x5]
03:46:35     INFO -      eip = 0x037c7a93   esp = 0x08affcf0   ebp = 0x08affd00
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   4  xul.dll!mozilla::layers::DeprecatedTextureHostShmemD3D9::`scalar deleting destructor'(unsigned int) + 0xa
03:46:35     INFO -      eip = 0x0375b4d3   esp = 0x08affcfc   ebp = 0x08affd00
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   5  xul.dll!mozilla::detail::RefCounted<mozilla::layers::TextureSource,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa]
03:46:35     INFO -      eip = 0x037568d6   esp = 0x08affd08   ebp = 0x08affd28
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   6  xul.dll!mozilla::layers::ContentHostDoubleBuffered::DestroyTextures() [ContentHost.cpp:226d33cca5ed : 439 + 0xb]
03:46:35     INFO -      eip = 0x037a0ff4   esp = 0x08affd10   ebp = 0x08affd28
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   7  xul.dll!mozilla::layers::ContentHostDoubleBuffered::~ContentHostDoubleBuffered() [ContentHost.cpp:226d33cca5ed : 377 + 0x4]
03:46:35     INFO -      eip = 0x037a1765   esp = 0x08affd1c   ebp = 0x08affd28
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   8  xul.dll!mozilla::layers::ContentHostDoubleBuffered::`vector deleting destructor'(unsigned int) + 0xa
03:46:35     INFO -      eip = 0x03789e15   esp = 0x08affd24   ebp = 0x08affd28
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -   9  xul.dll!mozilla::detail::RefCounted<mozilla::layers::CompositableHost,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa]
03:46:35     INFO -      eip = 0x03780e88   esp = 0x08affd30   ebp = 0x08affd44
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  10  xul.dll!mozilla::layers::CompositableParent::~CompositableParent() [CompositableHost.cpp:226d33cca5ed : 259 + 0x11]
03:46:35     INFO -      eip = 0x0378b9ba   esp = 0x08affd38   ebp = 0x08affd44
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  11  xul.dll!mozilla::layers::CompositableParent::`scalar deleting destructor'(unsigned int) + 0xa
03:46:35     INFO -      eip = 0x0378bdc3   esp = 0x08affd40   ebp = 0x08affd44
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  12  xul.dll!mozilla::layers::LayerTransactionParent::DeallocPCompositableParent(mozilla::layers::PCompositableParent *) [LayerTransactionParent.cpp:226d33cca5ed : 592 + 0xc]
03:46:35     INFO -      eip = 0x037b6720   esp = 0x08affd4c   ebp = 0x08affd50
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  13  xul.dll!mozilla::layers::PLayerTransactionParent::DeallocSubtree() [PLayerTransactionParent.cpp:226d33cca5ed : 898 + 0x11]
03:46:35     INFO -      eip = 0x032946f1   esp = 0x08affd58   ebp = 0x08affda0
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  14  xul.dll!mozilla::layers::PCompositorParent::DeallocSubtree() [PCompositorParent.cpp:226d33cca5ed : 893 + 0x12]
03:46:35     INFO -      eip = 0x03275150   esp = 0x08affd6c   ebp = 0x08affda0
03:46:35     INFO -      Found by: call frame info
03:46:35     INFO -  15  xul.dll!mozilla::layers::PCompositorParent::OnChannelError() [PCompositorParent.cpp:226d33cca5ed : 756 + 0x6]
03:46:35     INFO -      eip = 0x03276604   esp = 0x08affd7c   ebp = 0x08affda0

In this case it seems that CompositableParent::ActorDestroy does call Detach on its CompositableHost, but still holds a ref to ~ContentHostDoubleBuffered (via mFirstTexture? not sure about the object hierarchy here). So probably we need to make sure that ContentHostDoubleBuffered::DestroyTextures() is called from within ActorDestroy. But I'm a little fuzzy on how it all works.
Attached file stacktrace
I was trying to reproduce something else and hit this again. Just clicked on reddit links and went back and forward rather quickly for a few minutes.
At our last e10s meeting, Bill said he could take a look at this bug.
Assignee: nobody → wmccloskey
Attached patch ipc-fixSplinter Review
This is basically a bogus assertion. It happens when the IPC channel unexpected closes and we tear everything down. The code looks like this:

auto PCompositorParent::OnChannelError() -> void
{
    DestroySubtree(AbnormalShutdown);
    DeallocSubtree();
    DeallocShmems();
}

The code for DeallocSubtree() eventually calls DestroySharedMemory. Currently, that code looks like this:

auto PCompositorParent::DestroySharedMemory(Shmem& shmem) -> bool
{
    Shmem::id_t aId = (shmem).Id(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead());
    Shmem::SharedMemory* segment = LookupSharedMemory(aId);
    if ((!(segment))) {
        return false;
    }

    Message* descriptor = (shmem).UnshareFrom(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), OtherProcess(), MSG_ROUTING_CONTROL);
    (mShmemMap).Remove(aId);
    Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment);

    return (descriptor) && ((mChannel).Send(descriptor));
}

We fail here when we try to do the Send call, since the channel is already closed. However, everything else is successful.

There doesn't seem to be any reason why we can't allow DestroySharedMemory to be called at this time. Aside from the Send call, there's nothing wrong. This patch just adds a check and avoids the Send() call if the channel is already closed. The new code looks like this:

...
    Message* descriptor = ...;

    (mShmemMap).Remove(aId);
    Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment);

    if ((!((mChannel).CanSend()))) {
        delete descriptor;
        return true;
    }

    return (descriptor) && ((mChannel).Send(descriptor));
}

I had to add an extra CanSend method to the channel. I was hoping to use the protocol's mState to detect if we had already closed. However, that doesn't appear to be updated at all during this entire sequence. At the time we crash, it just contains the normal __Null state.
Attachment #8368244 - Flags: review?(benjamin)
Attachment #8368244 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/fcb20c4c1ef8
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Keywords: verifyme
I tried to reproduce this on Firefox 26, on Mac OS X 10.9. After restarting, I get one extra process: Firefox Plugin Process. When killing it, I get: "Tab crashed
Well, this is embarrassing. We tried to display this Web page, but it's not responding."

Nothing else happens. When hitting "Retry", the process is restarted and the tab is restored.

I see the exact same behavior with Firefox 29b3 on MAC OS X 10.9.

Felipe, is this what you reproduced initially? If not, could you please verify if this bug is fixed for you on Firefox 29?
Flags: needinfo?(felipc)
Keywords: verifyme
Ioana, this bug affected only debug builds of Firefox, not release builds. So that's probably why you weren't able to reproduce it on your test. But I think further verification is not necessary, I do recall the problem going away with the fix.
Flags: needinfo?(felipc)
based on comment 12
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.