Closed Bug 1784049 Opened 2 years ago Closed 2 years ago

Native surfaces are retained for background tabs on Android with software webrender

Categories

(Core :: Graphics: WebRender, enhancement, P3)

Unspecified
Android
enhancement

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: jnicol, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

See profile here: https://drive.google.com/file/d/1pGQpwKpCpUJYmibiNWwytXcRbm-Yq_uj/view?usp=sharing (which can be opened in Android Studio's profiler tab.) This is from opening 10 tabs of mozilla.com in geckoview_example, with SWGL+OGL enabled.

On Android, each tab is actually its own window, meaning they each have a separate webrender RenderBackend/Renderer/etc. As a blunt instrument to prevent memory usage getting too high, we trigger a memory pressure event when a tab is backgrounded. When running hardware webrender, the picture cache tiles are owned by the texture cache, and therefore get freed for backgrounded tabs here.

When running SWGL, however, picture cache tiles are native surfaces managed by RenderCompositorOGLSWGL, and do not get freed. Therefore the profile shows a huge amount of memory (120MB) allocated from CreateDataSourceSurface() in TileOGL::TileOGL(). (Note this is with the RenderCompositorOGLSWGL's PBO upload path disabled. When it is enabled, the PBO allocations do not show up in the Android Studio profiler, but presumably are still there.)

The reason the native surfaces don't get freed when the tab becomes inactive is due to this. If we were to set preserveLayers to false then when a tab is backgrounded the display list would be cleared, which in turn would cause the native surfaces to be deallocated. However, clearing the display list causes a flash when switching tabs, as it causes us render a frame with an empty display list when switching.

To fix this, we either need to prevent that flash somehow, allowing us to set preserveLayers to false. This will probably free lots of smaller allocations too as we are cleaning up from the beginning of the webrender pipeline, which would be nice. But preventing the flash may be tricky. Or alternatively we can add a mechanism for deallocating the native surfaces just at the renderer/compositor stage, whilst keeping the display list set.

Glenn, any ideas how feasible the latter of those options is? And any pointers for how to do so?

Flags: needinfo?(gwatson)
Blocks: 1784051
Severity: -- → S4
Type: defect → enhancement
Priority: -- → P3

This is also a relevant problem on desktop.

Some users have lots of windows open, but often many of those windows are obscured or minimized. In an ideal world, whatever we do here for android could also be applied on desktop to help with this issue.

Since we already free picture cache tiles on memory pressure, it seems like we should do the same thing for native surfaces (the existing picture and native surface code handles the case where a native surface isn't allocated, and will invalidate / redraw any tiles that are found visible but have no native surface allocated). Would that solve this issue?

Flags: needinfo?(gwatson) → needinfo?(jnicol)

Yes I believe it would. As for how we implement that...

The render backend handles memory pressure here. It seems we could add something to ResourceCache.clear() to ensure the Renderer clears the surfaces. Either sending a NativeSurfaceOperationDetails::DestroySurface for each surface, or perhaps adding a new NativeSurfaceOperationDetails::DestroyAllSurfaces.

For the picture code to handle automatically allocating a new one, are you referring to here? I can see how that creates the surface the first time. But would we not need to ensure we reset the value of id to None when we free the surfaces on memory pressure, to ensure we allocate new ones later when required? If so, any thoughts on how to get the message through to the picture code from where we handle the memory pressure in the render backend?

Flags: needinfo?(jnicol) → needinfo?(gwatson)

We will definitely need to reset the id value to None.

I think we probably want to not add anything new, or directly call in to the resource cache at all.

I think we could probably do something like the following from the render backend:

foreach document:
    foreach picture cache:
         picture_cache.memory_pressure()

And that new method in TileCacheInstance would go through and issue the resource_cache destroy_native_surface/tile calls, and clear id at the same time.

Would that work?

Flags: needinfo?(gwatson) → needinfo?(jnicol)

When the webrender render backend receives a memory pressure event,
call a new function TileCacheInstance.memory_pressure() for each tile
cache owned by the render backend. This destroys each compositor tile
and surface owned by the cache. By additionally setting the
NativeTileId for each tile to None, we will automatically allocate new
tiles if required during the next frame build.

On Android with SWGL enabled, this helps prevent ballooning memory
usage when opening several tabs, as memory pressure events are
triggered when a tab is backgrounded.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED

Yep, that works a treat. Thanks!

Flags: needinfo?(jnicol)
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/247891777d45
Free native surfaces held by tile cache on memory pressure. r=gw

Backed out for causing mochitest failures with crash:core::panicking::panic_str

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: PROCESS-CRASH | devtools/client/inspector/rules/test/browser_rules_cubicbezier-commit-on-ENTER.js | application crashed [@ core::panicking::panic_str]

And also: https://treeherder.mozilla.org/logviewer?job_id=387476737&repo=autoland

Flags: needinfo?(jnicol)
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10db2858c3e6
Free native surfaces held by tile cache on memory pressure. r=gw
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Flags: needinfo?(jnicol)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: