Closed Bug 1219529 Opened 9 years ago Closed 9 years ago

crash in mozilla::layers::PLayerTransactionParent::DestroySharedMemory

Categories

(Core :: Graphics: Layers, defect)

44 Branch
x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox44 --- fixed
firefox45 + fixed

People

(Reporter: kjozwiak, Assigned: nical)

References

Details

(Keywords: crash, topcrash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

m-c is currently crashing when you close the browser while google drive is still opened. I've reproduced the crash 100% of the time using the STR outlined below. I'm not sure if this is exploitable as it deals with destroying memory??.. I've checked the "Exploitability" field under the crash reports but it appears as "ERROR: unable to analyze dump". Please re-open if this isn't exploitable.

STR:

- install the latest m-c [1] (reproduced this on a clean install as well)
- launch gmail.com and login
- click on the "Google Apps" icon and select "Drive"
- once "Google Drive" opens in a new tab, close FX via the hamburger menu
---> instant crash every single time

Process 4288 stopped
* thread #30: tid = 0x30cf6b, 0x0000000102be9d27 XUL`mozilla::layers::PLayerTransactionParent::DestroySharedMemory(this=<unavailable>, shmem=0x0000000120ad7220) + 7 at PLayerTransactionParent.cpp:254, name = 'Compositor', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000102be9d27 XUL`mozilla::layers::PLayerTransactionParent::DestroySharedMemory(this=<unavailable>, shmem=0x0000000120ad7220) + 7 at PLayerTransactionParent.cpp:254
   251 	
   252 	auto PLayerTransactionParent::DestroySharedMemory(Shmem& shmem) -> bool
   253 	{
-> 254 	    return (mManager)->DestroySharedMemory(shmem);
   255 	}
   256 	
   257 	auto PLayerTransactionParent::OtherPid() const -> base::ProcessId

Crashes:

* bp-56877e54-3e03-449d-a9c2-774982151029
* bp-af11bc13-51cb-45d7-8862-18a0f2151029
* bp-ca95c687-5951-4866-b9c5-54e1e2151029
* bp-14617e89-5622-4e77-9d4c-293bf2151029

System Info:

* https://archive.mozilla.org/pub/firefox/nightly/2015/10/2015-10-28-03-04-32-mozilla-central/
** https://hg.mozilla.org/mozilla-central/rev/fc706d376f0658e560a59c3dd520437b18e8c4a4
* MPB running 10.11.1 x64
Probably another place where others are holding on to graphics resources after the graphics is shutdown; Nical is looking at some of those.
Flags: needinfo?(nical.bugzilla)
This one is different because it happens on the parent side. It looks like the TextureHost's reference count keeps him alive a bit longer than it should have, and TextureHost::Finalize calls DeallocateSharedData which calls into some IPDL with a PLayerTransactionParent that was already deallocated to something along those lines.
I am not sure if it means we are calling DeallocateSharedData twice (we should have called it when the PTexture actor was destroyed), or if we missed to call it when destroying the actor, but we should definitely not rely on the refcount so I'll fix that, and make sure we don't leak anything.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
This patch fixes the following things:
* Destroying the shared data HAS to happen during the PTexture destuction handshake, and it is the whole point of having that handshake in the first place. The current situation kinda worked out because the DEALLOCATE_CLIENT flag forces the TextureHost to handle destruction in the handshake, and for async destruction, we are supposed to not have a handle to the texture on the client side, so waiting a bit before we deallocate deosn't hurt too much in theory. Except during shutdown, where we might have already deallocated the top-level protocol when the TextureHost's reference count gets to 0 (which causes this crash when we try to use a deallocated protocol to deallocate a shmem).
* We used to have code to handle forgetting the shmem buffer if the child process crashes (it is important because in this situation, IPDL deallocates the shmem underneath us, so we want to avoid double free). But we don't call that function anymore (OnShutdown()), so I restored it (renamed into OnAbnormalShutdown).
* ForgetBufferActor is dead code and fixed a problem that we don't have anymore so I removed it.

This bug is a good example of code that was designed around some properties, then lost some of these properties without us (or at least me) noticing, while other code still relies them. Unfortunately it's pretty hard to automatically test this area currently.
Attachment #8681375 - Flags: review?(sotaro.ikeda.g)
On a side-note, I don't think this is security sensitive, because it can only happen during shutdown and we consistently crash dereferencing null when it happens. No uninitialized memory involve or double-free (since we crash on that null pointer before any of that can happen).
(In reply to Nicolas Silva [:nical] from comment #3)
...
> This bug is a good example of code that was designed around some properties,
> then lost some of these properties without us (or at least me) noticing,
> while other code still relies them. Unfortunately it's pretty hard to
> automatically test this area currently.

Any way to assert/gfxCrash if assumptions we have are violated?
Group: gfx-core-security
Depends on: 1219761
Depends on: 1221056
Bringing signatures over from duped bug 1221567
Crash Signature: [@ mozilla::layers::ShmemTextureHost::DeallocateSharedData] [@ <T>]
Depends on: 1222080
Severity: normal → critical
Keywords: crash
[Tracking Requested - why for this release]: This is currently the #2 topcrash in Nightly accounting for 5.74% of the total crash volume.
Whiteboard: [gfx-noted]
I still want to land the other fix but it depends on some other patches.
This patch works around the issue by making sure we don't touch shmems after the ipdl actors are shut down (the shmems won't leak because IPDL automatically deallocates remaining shmems when their associated protocol is destroyed). It should also be safe to uplift if needed.
The patch also makes sure CompositorParent cleans up the textures "device data" which can contain gl textures among other things, before the compositor and its gl context dies (same logic as ImageBridgeParent).
Attachment #8687191 - Flags: review?(sotaro.ikeda.g)
Attachment #8687191 - Flags: review?(sotaro.ikeda.g) → review+
Attachment #8681375 - Flags: review?(sotaro.ikeda.g) → review+
Crash Signature: [@ mozilla::layers::ShmemTextureHost::DeallocateSharedData] [@ <T>] → [@ mozilla::layers::ShmemTextureHost::DeallocateSharedData] [@ <T>] [@ <name omitted> | mozilla::layers::PLayerTransactionParent::DeallocShmem]
https://hg.mozilla.org/mozilla-central/rev/7ef5c4e21693
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
I don't see any crashes of this with a higher build ID than the build from Nov 19. Can we get this uplifted to Dev Edition 44?
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8687191 [details] [diff] [review]
Don't try to do shmem-related work after the ipdl connection is shut down

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: shutdown crashes
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: low, has baked in central for a little while
[String/UUID change made/needed]:
Flags: needinfo?(nical.bugzilla)
Attachment #8687191 - Flags: approval-mozilla-aurora?
Comment on attachment 8687191 [details] [diff] [review]
Don't try to do shmem-related work after the ipdl connection is shut down

Nightly crashstats give us a decent bit of confidence that this fix has helped, let's uplift to Aurora44.
Attachment #8687191 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: