Closed Bug 933082 (RefcntAllocator) Opened 6 years ago Closed 6 years ago

Make it possible to hold on to an ISurfaceAllocator and check if its IPC connection is still open

Categories

(Core :: Graphics, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: bjacob, Assigned: bjacob)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [qa-])

Attachments

(5 files, 4 obsolete files)

4.27 KB, patch
nical
: review+
Details | Diff | Splinter Review
3.30 KB, patch
nical
: review+
Details | Diff | Splinter Review
2.31 KB, patch
nical
: review+
Details | Diff | Splinter Review
34.75 KB, patch
nical
: review+
Details | Diff | Splinter Review
939 bytes, patch
Details | Diff | Splinter Review
In bug 914823, we have a crash in code like this in SharedSurface_Gralloc (see bug 914823 comment 71) in B2G 1.2:

    if (mAllocator) {
        mAllocator->DestroySharedSurface(&desc); // crash here
    }

Here, mAllocator is currently a WeakPtr<ISurfaceAllocator>.

The crash is a segfault at address 0x18, which happens to be the offset of ISurfaceAllocator::DestroySharedSurface in ISurfaceAllocator's vtable (it being the 7th virtual method by order of declaration, we have (7 - 1) * sizeof(void*) == 6*4 == 24 == 0x18).

So it seems that mAllocator is null at the crash line, even though this is inside a if (mAllocator) block.

Maybe we have a race condition here, which would be surprising as this is not supposed to be accessed from another thread, and the crash reports' raw dumps don't have any thread in places that might race with this. Or maybe we have a bug in WeakPtr...

Anyway. The reason why this was made a WeakPtr in the first place is that we used to have crashes (earlier in the same bug 914823) with the ISurfaceAllocator dying unexpectedly, and we couldn't hold a strong pointer to it because this would have closed a strong reference cycle. Indeed, the ISurfaceAllocator here is the base class of the ClientLayerManager, which has a chain of strong references to us (basically, ClientLayerManager->ClientCanvasLayer->GLContext->GLScreenBuffer->SurfaceStream->SharedSurface_Gralloc)

We can fix that by making ClientLayerManager not inherit from ISurfaceAllocator anymore, instead hold itself a strong reference to an ISurfaceAllocator. It then becomes possible for our SharedSurface_Gralloc to also hold a strong reference to that ISurfaceAllocator, without closing a reference cycle.

Once we have that, we can be confident that the ISurfaceAllocator doesn't go away under our feet, but we have a new problem to solve: it becomes possible for the IPC channel to be closed under our feet even though the ISurfaceAllocator is alive. The IPC channel here is a LayerTransactionChild. The way that we can become able to tell if it was closed, is to hold a strong reference to it, and have a |mIPCOpen| flag on it that is set to false when the IPDL Dealloc*() method is called on it. Currently, LayerTransactionChild is not refcounted, and Dealloc just destroys it.

So our first prerequisite is to make LayerTransactionChild refcounted and add that flag (patch 1)

Once we have that, we add a IPCOpen virtual method in ISurfaceAllocator whose default impl just returns true, but which is overridden in ShadowLayerForwarder to return the right answer from LayerTransactionChild. DestroySharedSurface uses this flag to avoid actually sending messages after the channel is gone. (patch 2)

Next, we must be careful: as LayerTransactionChild is now refcounted, we must have ShadowLayerForwarder hold a strong reference to it, otherwise it could go away under its feet. (patch 3).

Finally, we can remove ClientLayerManager's inheritance of ShadowLayerForwarder and instead have ClientLayerManager hold a strong reference to a ShadowLayerForwarder. This allows to make ISurfaceAllocator (the base class) refcounted, and in turn this allows SharedSurface_Gralloc to hold a strong reference to ISurfaceAllocator (patch 4)
Is this bug required to fix the blocking bug? If so, then this should be set to koi?
If we're crashing at 0x18, then it means *mAllocator is null, not mAllocator.
And if mAllocator is just a raw pointer (https://bugzilla.mozilla.org/show_bug.cgi?id=914823#c86), then it makes sense that this could be a valid pointer to deleted memory.

It's surprising to me that the value at that address is 0x0 though, rather than garbage or the allocators poison value.
(In reply to Jason Smith [:jsmith] from comment #5)
> Is this bug required to fix the blocking bug? If so, then this should be set
> to koi?

We're actually still unsure, I just marked it as blocking preemptively... but the below means I was wrong!

(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> If we're crashing at 0x18, then it means *mAllocator is null, not mAllocator.

Dang! Sorry for being so stupid! Indeed it means that mAllocator->vptr is null, which means that it was already destroyed and overwritten by something!

Unblocking 914823.
No longer blocks: 914823
Alias: RefcntAllocator
Getting back to this as it blocks bug 941389 and bug 941390.
Attachment #825037 - Attachment is obsolete: true
Attachment #8338008 - Flags: review?(nical.bugzilla)
Attachment #8338008 - Flags: review?(nical.bugzilla) → review+
Attachment #8338009 - Flags: review?(nical.bugzilla) → review+
Attachment #8338010 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8338011 [details] [diff] [review]
Part 4: let ClientLayerManager hold a ShadowLayerForwarder instead of inheriting from ShadowLayerForwarder. Make ISurfaceAllocator refcounted. Make SharedSurface_Gralloc keep its mAllocator alive.

Review of attachment 8338011 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8338011 - Flags: review?(nical.bugzilla) → review+
Attached patch mac fixSplinter Review
Had forgotten to update a few {A,Dea}llocPLayerTransaction{Parent,Child}. I don't suppose that another cycle of reviews is needed for that.

https://tbpl.mozilla.org/?tree=Try&rev=ed22b705bb30
Had forgotten to initialize LayerTransactionParent::mIPCOpen to false >_<

https://tbpl.mozilla.org/?tree=Try&rev=fb4d542ac5d8
Had forgotten to remove the NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ImageBridgeParent)

https://tbpl.mozilla.org/?tree=Try&rev=3c48adddf0f6
Blocks: 944703
Whiteboard: [qa-]
Blocks: 1004191
You need to log in before you can comment on or make changes to this bug.