Closed Bug 1894929 Opened 1 year ago Closed 1 year ago

Surface recycling on Android in GLScreenBuffer is broken

Tracking

()

Status:

RESOLVED FIXED

Milestone:

128 Branch

Tracking Flags:

Tracking

Status

firefox127

---

fixed

firefox128

---

fixed

People

(Reporter: jnicol, Assigned: lsalzman)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1894929 - Fix SharedSurface recycling with accelerated canvas on Android. r?lsalzman,sotaro 1 year ago Jamie Nicol [:jnicol] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol 1 year ago Lee Salzman [:lsalzman] 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-beta+	Details \| Review

Jamie Nicol [:jnicol]

Reporter

Description

•

1 year ago

I noticed this while investigating bug 1892716. It exacerbates the issue but it's not the real culprit.

In GLScreenBuffer.cpp, on Android we use a pool of SharedSurface_SurfaceTextures, as allocating Android Surfaces is expensive. After we obtain a surface from the pool (or allocate a new one if necessary, we immediately push the surface to the back of the pool here.

Some time later, we push the Surface to the back of the pool again, here in StoreRecycledSurface.

Now we have the same Surface in the pool multiple times, which presumably could cause some issues with selecting the wrong Surface at the wrong time. But additionally it means that the pool size can grow larger than expected, and due to this check using == instead of >=, we choose not to use an existing surface and allocate a new one instead. We end up allocating a new Surface every time we try to acquire one.

Kelsey, Sotaro, what is the correct fix here? I think we should defensively either add an assert that the pool size is not larger than expected, or switch the condition to >=. But how should we avoid adding duplicate surfaces to the pool? Do we remove the call here, or should StoreRecycledSurface() not be being called on Android?

Flags: needinfo?(sotaro.ikeda.g)

Flags: needinfo?(jgilbert)

Jamie Nicol [:jnicol]

Reporter

Updated

•

1 year ago

Summary: Surface recycling in GLScreenBuffer is broken → Surface recycling on Android in GLScreenBuffer is broken

Nicolas Silva [:nical]

Updated

•

1 year ago

Severity: -- → S2

Kelsey Gilbert [:jgilbert]

Comment 1

•

1 year ago

@sotaro Can you take this?

Assignee: nobody → sotaro.ikeda.g

Flags: needinfo?(jgilbert)

Jamie Nicol [:jnicol]

Reporter

Comment 2

•

1 year ago

I dug a little deeper in to this. I think the problem is that mDestroyedCallback should be set when using accelerated canvas, but it is not.

For in-process webgl, mDestroyedCallback is not set, so we must push the surfaces to the pool in Acquire(). This currently works correctly.

In out-of-process webgl, mDestroyedCallback is set, and StoreRecycledSurface() takes care of the recycling, and this code takes care of ensuring we don't add duplicates to the pool, because poolSize == 0 when mDestroyedCallback is set. This could be clearer, but currently works correctly.

For accelerated canvas, mDestroyedCallback is not set, but StoreRecycledSurface() gets called anyway. Hence the problem.

The reason mDestroyedCallback is not set for accelerated canvas is that we usually set it here. However, for accelerated canvas by the time this gets called we have already registered the texture owner here. If we skip the RegisterTextureOwner() call in CanvasTranslator::EnsureRemoteTextureOwner(), then WebGLContext::PushRemoteTexture() will take care of registering it for us, and it will set the destroyed callback. And then everything works correctly.

Jamie Nicol [:jnicol]

Reporter

Comment 3

•

1 year ago

Attached file Bug 1894929 - Fix SharedSurface recycling with accelerated canvas on Android. r?lsalzman,sotaro (obsolete) — Details

CanvasTranslator::EnsureRemoteTextureOwner() was calling
RegisterTextureOwner(), meaning that the owner was already registered
when WebGLContext::PushRemoteTexture() gets called. This means that
the SwapChain::SetDestroyedCallback() never got called for acclerated
canvas swapchains.

This in turn meant that SharedSurfaces were being added the the
swapchain's pool by both SwapChain::Acquire() and
SwapChain::StoreRecycledSurface(), breaking the pooling logic and
resulting in new Surfaces being allocated every time Acquire() was
called.

To fix this, we remove CanvasTranslator's call to
RegisterTextureOwner(), relying on WebGLContext to register it for
us (and set the destroyed callback correctly). We additionally make
the swapchain pooling logic more defensive, and add an assertion that
the callback was correctly set.

Lee Salzman [:lsalzman]

Assignee

Comment 4

•

1 year ago

Attached file Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol — Details

Phabricator Automation

Updated

•

1 year ago

Attachment #9402442 - Attachment is obsolete: true

Pulsebot

Comment 5

•

1 year ago

Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/34afcad20473 Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r=jnicol,gfx-reviewers

Cosmin Sabou [:CosminS]

Comment 6

•

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/34afcad20473

Status: NEW → RESOLVED

Closed: 1 year ago

status-firefox128: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 128 Branch

Sotaro Ikeda [:sotaro]

Updated

•

1 year ago

Flags: needinfo?(sotaro.ikeda.g)

Lee Salzman [:lsalzman]

Assignee

Comment 7

•

1 year ago

Comment on attachment 9402602 [details]
Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol

Beta/Release Uplift Approval Request

User impact if declined: Freezing and/or memory leaks on Android.
Is this code covered by automated tests?: Unknown
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky):
String changes made/needed:
Is Android affected?: Yes

Attachment #9402602 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

Assignee: sotaro.ikeda.g → lsalzman

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

Blocks: 1896513

Pascal Chevrel:pascalc

Comment 8

•

1 year ago

Comment on attachment 9402602 [details]
Bug 1894929 - Disable gl::SwapChain's internal pooling when RemoteTextureMap is used. r?jnicol

Approved for 127 beta 7, thanks.

Attachment #9402602 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Pulsebot

Comment 9

•

1 year ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/ce086ad41a66

Pascal Chevrel:pascalc

Updated

•

1 year ago

status-firefox127: --- → fixed

Cathy Lu [:calu]

Updated

•

1 year ago

Bugzilla

Surface recycling on Android in GLScreenBuffer is broken

Categories

(Core :: Graphics: CanvasWebGL, defect)

Tracking

()

People

(Reporter: jnicol, Assigned: lsalzman)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Updated

Comment 7

Beta/Release Uplift Approval Request

Updated

Updated

Comment 8

Comment 9

Updated

Updated

Attachment

General

Description

File Name

Content Type