Closed Bug 1329432 Opened 3 years ago Closed 3 years ago

Intermittent dom/media/test/test_background_video_suspend.html | application crashed [@ RefPtr<mozilla::layers::ImageClient>::~RefPtr()][@ mozilla::layers::ImageLayerComposite::~ImageLayerComposite()]


(Core :: Graphics, defect)

Not set



Tracking Status
firefox51 --- unaffected
firefox52 --- unaffected
firefox53 --- fixed


(Reporter: intermittent-bug-filer, Assigned: sotaro)



(Keywords: assertion, crash, intermittent-failure, Whiteboard: gfx-noted)


(1 file, 4 obsolete files)

Component: Audio/Video: Playback → Graphics
Whiteboard: gfx-noted
From low log, an assertion failure happens by dupping release on ImageClient.

08:55:40     INFO - Assertion failure: int32_t(mRefCnt) > 0 (dup release), at /builds/slave/m-in-m64-d-0000000000000000000/build/src/obj-firefox/dist/include/mozilla/layers/CompositableClient.h:75 

ImageClient was created by ImageContainer and possibly held reference by SharedPlanarYCbCrImage or SharedRGBImage. 
Since they all allocate another RefPtr to hold ImageClient, it won't be a problem if SharedPlanarYCbCrImage or SharedRGBImage was destroyed by some reasons.
Other possible was someone operated release() to reduce reference count directly. But from code tracking, I don't see this.
The last possible would be memory corruption.

Soraro, can you please have more input to figure out the possible reason if I have any missing? Really thanks.
Flags: needinfo?(sotaro.ikeda.g)
It seems like a regression of Bug 1325784. A pointer of ImageContainer in mImageContainers might become invalid already.
Conflict between ImageContainer::~ImageContainer() and ImageBridgeChild::RecvDidComposite() seems not work well.

ImageBridgeChild::RecvDidComposite() tries to keep ref count of ImageContainer. But When ImageContainer::~ImageContainer() is called, ref count became already 0. Increment from 0 does not work and could cause dup release.
Flags: needinfo?(sotaro.ikeda.g)
See Also: → 1329415
Gecko does not support multi threaded weak pointer. And mImageContainers needs to hold raw pointers of ImageContainer to prevent leaking ImageContainer.

Simple solution seems to use lock to prevent to exit ImageContainer::~ImageContainer() during calling imageContainer->NotifyComposite().
Blocks: 1325784
Attachment #8825634 - Flags: review?(nical.bugzilla)
Duplicate of this bug: 1329415
Attachment #8825634 - Flags: review?(nical.bugzilla) → review+
attachment 8825634 [details] [diff] [review] seems not work well with GPU process restart :( If there is gpu process restart, there is a risk of dead lock.
AssertNotCurrentThreadIn() is not implemented yet. There is no way to check if monitor is not locked.
Addressed potential dead lock.
Attachment #8825634 - Attachment is obsolete: true
correct patch.
Attachment #8825789 - Attachment is obsolete: true
attachment 8825791 [details] [diff] [review] addressed the potential dead lock. But it could be broken easily. It seems better to try another way.
Duplicate of this bug: 1329176
Duplicate of this bug: 1328905
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> attachment 8825791 [details] [diff] [review] addressed the potential dead
> lock. But it could be broken easily. 

Since AssertNotCurrentThreadIn() is not implemented yet as comment 8.
ImageContainer is declared as to support weak pointer. But it could be used only when ImageContainer is used as non multi-threaded usage.

> class ImageContainer final : public SupportsWeakPtr<ImageContainer>

ImageContainer's weak pointer is used in RasterImage.

gecko does not support thread safe weak pointer. It was removed by Bug 923554. There is a bug to support thread safe weak pointer as Bug 1258263. But it is not clear when it is addressed.
We could not use the following to address the problem for now.
- Use multi-threaded weak pointer(Bug 1258263)
- Use AtomicRefCountedWithFinalize as ancestor of ImageContainer and use recycle callback.
    We could not replace the ancestor since weak pointer is used already used by RasterImage.

attachment 8825791 [details] [diff] [review] seems like simple fix, but we do not have a way to check the possible dead lock in future, since AssertNotCurrentThreadIn() is not implemented yet(Bug 1027772). It seems not be implemented even in future.
Another way to address the problem is to move ImageContainer::NotifyComposite() related stuffs to a different refcounted object.
:nical, what do you prefer to address the problem?
Flags: needinfo?(nical.bugzilla)
Duplicate of this bug: 1329083
I hate WeakPtr :(

(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> :nical, what do you prefer to address the problem?

Is this issue affecting all Image implementations? It looks like this unfortunate bi-directional dependency between Image and ImageContainer is true for most image classes since they implement GetImageContainer.

I would rather not rely on the recycle callback mechanism because it is already a fair bit complicated so I'd prefer to not make it have to support too many use cases.
That's my preference but if you feel strongly for doing it this way I'm nit going to oppose it.
I'm fine with the other solutions. Ideally we would not have this bi-directional dependency between Image and ImageContainer (In my opinion it is asking for trouble) but I have no idea how much work it would be to change that so I don't want to corner you with that approach.
Flags: needinfo?(nical.bugzilla)
Duplicate of this bug: 1330508
Duplicate of this bug: 1330510
It seems simpler if ImageContainer::NotifyComposite() is split to different ref counted object.
Assignee: nobody → sotaro.ikeda.g
Attachment #8825791 - Attachment is obsolete: true
Fix build failure.
Attachment #8826453 - Attachment is obsolete: true
Attachment #8826461 - Flags: review?(nical.bugzilla)
Attachment #8826461 - Flags: review?(nical.bugzilla) → review+
Duplicate of this bug: 1328748
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Duplicate of this bug: 1331147
Duplicate of this bug: 1329036
Duplicate of this bug: 1329548
You need to log in before you can comment on or make changes to this bug.