Closed Bug 1894929 Opened 5 months ago Closed 4 months ago

Surface recycling on Android in GLScreenBuffer is broken

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox127 --- fixed
firefox128 --- fixed

People

(Reporter: jnicol, Assigned: lsalzman)

References

Details

Attachments

(1 file, 1 obsolete file)

I noticed this while investigating bug 1892716. It exacerbates the issue but it's not the real culprit.

In GLScreenBuffer.cpp, on Android we use a pool of SharedSurface_SurfaceTextures, as allocating Android Surfaces is expensive. After we obtain a surface from the pool (or allocate a new one if necessary, we immediately push the surface to the back of the pool here.

Some time later, we push the Surface to the back of the pool again, here in StoreRecycledSurface.

Now we have the same Surface in the pool multiple times, which presumably could cause some issues with selecting the wrong Surface at the wrong time. But additionally it means that the pool size can grow larger than expected, and due to this check using == instead of >=, we choose not to use an existing surface and allocate a new one instead. We end up allocating a new Surface every time we try to acquire one.

Kelsey, Sotaro, what is the correct fix here? I think we should defensively either add an assert that the pool size is not larger than expected, or switch the condition to >=. But how should we avoid adding duplicate surfaces to the pool? Do we remove the call here, or should StoreRecycledSurface() not be being called on Android?

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(jgilbert)
Summary: Surface recycling in GLScreenBuffer is broken → Surface recycling on Android in GLScreenBuffer is broken
Severity: -- → S2

@sotaro Can you take this?

Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(jgilbert)

I dug a little deeper in to this. I think the problem is that mDestroyedCallback should be set when using accelerated canvas, but it is not.

For in-process webgl, mDestroyedCallback is not set, so we must push the surfaces to the pool in Acquire(). This currently works correctly.

In out-of-process webgl, mDestroyedCallback is set, and StoreRecycledSurface() takes care of the recycling, and this code takes care of ensuring we don't add duplicates to the pool, because poolSize == 0 when mDestroyedCallback is set. This could be clearer, but currently works correctly.

For accelerated canvas, mDestroyedCallback is not set, but StoreRecycledSurface() gets called anyway. Hence the problem.

The reason mDestroyedCallback is not set for accelerated canvas is that we usually set it here. However, for accelerated canvas by the time this gets called we have already registered the texture owner here. If we skip the RegisterTextureOwner() call in CanvasTranslator::EnsureRemoteTextureOwner(), then WebGLContext::PushRemoteTexture() will take care of registering it for us, and it will set the destroyed callback. And then everything works correctly.

CanvasTranslator::EnsureRemoteTextureOwner() was calling
RegisterTextureOwner(), meaning that the owner was already registered
when WebGLContext::PushRemoteTexture() gets called. This means that
the SwapChain::SetDestroyedCallback() never got called for acclerated
canvas swapchains.

This in turn meant that SharedSurfaces were being added the the
swapchain's pool by both SwapChain::Acquire() and
SwapChain::StoreRecycledSurface(), breaking the pooling logic and
resulting in new Surfaces being allocated every time Acquire() was
called.

To fix this, we remove CanvasTranslator's call to
RegisterTextureOwner(), relying on WebGLContext to register it for
us (and set the destroyed callback correctly). We additionally make
the swapchain pooling logic more defensive, and add an assertion that
the callback was correctly set.

Attachment #9402442 - Attachment is obsolete: true
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/34afcad20473
Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r=jnicol,gfx-reviewers
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch
Flags: needinfo?(sotaro.ikeda.g)

Comment on attachment 9402602 [details]
Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol

Beta/Release Uplift Approval Request

  • User impact if declined: Freezing and/or memory leaks on Android.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9402602 - Flags: approval-mozilla-beta?
Assignee: sotaro.ikeda.g → lsalzman

Comment on attachment 9402602 [details]
Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol

Approved for 127 beta 7, thanks.

Attachment #9402602 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1898391
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: