Closed Bug 1495977 Opened 6 years ago Closed 6 years ago

WebRender allocates ~85MB of VRAM (and regular memory) per window for the texture cache

Categories

(Core :: Graphics: WebRender, enhancement, P2)

RESOLVED FIXED

People

(Reporter: bholley, Assigned: bholley)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(3 files, 2 obsolete files)

The largest stack in bug 1495936 accounts for 85MB and looks like [1].

The WR part of the stack is a bit ambiguous as to whether we're rendering to the framebuffer or to an offscreen surface for an intermediate pass, but the DXGI part of the stack suggests frame buffer.

Thoughts as to why it's so big, or what we might do to shrink it?

[1]
[Root]/ntdll.dll
RtlUserThreadStart/mozglue.dll
patched_BaseThreadInitThunk/kernel32.dll
BaseThreadInitThunk/xul.dll
`anonymous namespace'::ThreadFunc/xul.dll
base::Thread::ThreadMain/xul.dll
MessageLoop::RunHandler/xul.dll
base::MessagePumpDefault::Run/xul.dll
MessageLoop::DoWork/xul.dll
MessageLoop::DeferOrRunPendingTask/xul.dll
mozilla::detail::RunnableMethodImpl<mozilla::net::LookupHelper *,nsresult (mozilla::net::LookupHelper::*)(mozilla::net::LookupArgument *),1,mozilla::RunnableKind::Standard,RefPtr<mozilla::net::LookupArgument> >::Run/xul.dll
mozilla::wr::RenderThread::NewFrameReady/xul.dll
mozilla::wr::RenderThread::UpdateAndRender/xul.dll
mozilla::wr::RendererOGL::UpdateAndRender/xul.dll
webrender_bindings::bindings::wr_renderer_render/xul.dll
webrender::renderer::Renderer::render_impl/xul.dll
webrender::renderer::Renderer::draw_tile_frame/libGLESv2.dll
gl::Clear/libGLESv2.dll
gl::ValidateClear/libGLESv2.dll
gl::Framebuffer::checkStatusImpl/libGLESv2.dll
gl::Framebuffer::syncState/libGLESv2.dll
rx::Framebuffer11::syncState/libGLESv2.dll
rx::RenderTargetCache<rx::RenderTarget11>::updateCachedRenderTarget/libGLESv2.dll
rx::TextureD3D::getAttachmentRenderTarget/libGLESv2.dll
rx::TextureD3D_2DArray::getRenderTarget/libGLESv2.dll
rx::TextureD3D::ensureRenderTarget/libGLESv2.dll
rx::TextureD3D_2DArray::initializeStorage/libGLESv2.dll
rx::TextureD3D_2DArray::updateStorage/libGLESv2.dll
rx::TextureD3D_2DArray::updateStorageLevel/libGLESv2.dll
rx::Image11::copyToStorage/libGLESv2.dll
rx::TextureStorage11::updateSubresourceLevel/libGLESv2.dll
rx::TextureStorage11_2DArray::getResource/libGLESv2.dll
rx::Renderer11::allocateTexture/libGLESv2.dll
rx::ResourceManager11::allocate<ID3D11Texture2D>/d3d11.dll
CDevice::CreateTexture2D/d3d11.dll
CDevice::CreateTexture2D_Worker/d3d11.dll
NOutermost::CDevice::CreateLayeredChild/d3d11.dll
NDXGI::CDevice::CreateLayeredChild/d3d11.dll
NDXGI::CResource::FinalConstruct/d3d11.dll
NDXGI::CDeviceChild<IDXGIResource1,IDXGISwapChainInternal>::FinalConstruct/d3d11.dll
CDevice::CreateLayeredChild/d3d11.dll
CResource<ID3D11Texture2D1>::CLS::FinalConstruct/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/nvwgf2umx.dll
<PDB not found>/d3d11.dll
NDXGI::CDevice::AllocateCB/gdi32.dll
D3DKMTCreateAllocation/win32u.dll
ZwGdiDdDDICreateAllocation/ntoskrnl.exe
KiSystemServiceExitPico/win32kbase.sys
NtGdiDdDDICreateAllocation/dxgkrnl.sys
DxgkCreateAllocation/dxgkrnl.sys
DxgkCreateAllocationInternal/dxgkrnl.sys
DXGDEVICE::CreateAllocation/dxgkrnl.sys
DXGDEVICE::CreateVidMmAllocations<_DXGK_ALLOCATIONINFO>/dxgmms2.sys
VidMmOpenAllocation/dxgmms2.sys
VIDMM_GLOBAL::OpenAllocation/dxgmms2.sys
VIDMM_GLOBAL::OpenOneAllocation/dxgmms2.sys
VIDMM_GLOBAL::OpenLocalAllocation/dxgmms2.sys
VIDMM_GLOBAL::CommitLocalBackingStore/dxgmms2.sys
VIDMM_RECYCLE_HEAP_MGR::Allocate/dxgmms2.sys
VIDMM_RECYCLE_HEAP::Allocate/dxgmms2.sys
VIDMM_RECYCLE_MULTIRANGE::Commit/dxgmms2.sys
VIDMM_RECYCLE_RANGE::Commit/ntoskrnl.exe
KiServiceLinkage/ntoskrnl.exe
KiSystemServiceExitPico/ntoskrnl.exe
NtAllocateVirtualMemory/ntoskrnl.exe
MiAllocateVirtualMemory
We probably need a bit more information to confirm, but it doesn't look like frame buffer to me (due to the call stack including references to an array texture).

I suspect that is the main texture cache being allocated:

The fixed size texture cache is 2048 (width) x 2048 (height) x 4 (slices) x 4 (RGBA), which is 64 MB. If D3D / ANGLE is also allocating space for mipmaps, that would be scaled by ~1.33x, which is ~85 MB.
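As a rough check of those numbers, here is a minimal sketch that computes the base size and the size with a full mip chain. It assumes 4 bytes per pixel and that the driver allocates every mip level down to 1x1; the function names are made up for illustration and are not WebRender or ANGLE APIs.

```rust
/// Bytes needed for one mip level of a 2D array texture.
fn level_bytes(width: u64, height: u64, layers: u64, bytes_per_pixel: u64) -> u64 {
    width * height * layers * bytes_per_pixel
}

/// Bytes for a full mip chain: each level halves width and height (minimum 1)
/// until 1x1 is reached. The layer count of an array texture is not reduced.
fn mip_chain_bytes(width: u64, height: u64, layers: u64, bytes_per_pixel: u64) -> u64 {
    let (mut w, mut h) = (width, height);
    let mut total = 0;
    loop {
        total += level_bytes(w, h, layers, bytes_per_pixel);
        if w == 1 && h == 1 {
            break;
        }
        w = (w / 2).max(1);
        h = (h / 2).max(1);
    }
    total
}

fn main() {
    const MIB: f64 = 1024.0 * 1024.0;
    // 2048 x 2048, 4 slices, RGBA8 (4 bytes per pixel).
    let base = level_bytes(2048, 2048, 4, 4);
    let with_mips = mip_chain_bytes(2048, 2048, 4, 4);
    println!("base level:     {:.1} MiB", base as f64 / MIB);
    println!("full mip chain: {:.1} MiB", with_mips as f64 / MIB);
    println!("ratio:          {:.3}", with_mips as f64 / base as f64);
}
```

The base level comes out at 64 MiB and the full chain at roughly 85.3 MiB, which matches the size of the allocation in the stack.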

In general this is fine since it's a once off allocation and almost all textures end up in there.

We could make this grow lazily, which would mean that if there's a heap of windows we don't necessarily allocate the full size per window (but in GL3 you must completely re-create the texture if the slice count changes, which complicates things).

If we're talking about a single window though, this seems reasonable and expected. Do we think it's a problem in that case?
We don't actually use the mipmaps, so we could fix that (might be a WR bug or an ANGLE issue). That would reduce that size from 85 MB to 64 MB.
(In reply to Glenn Watson [:gw] from comment #1)
> We probably need a bit more information to confirm, but it doesn't look like
> frame buffer to me (due to the call stack including references to an array
> texture).
> 
> I suspect that is the main texture cache being allocated:
> 
> The fixed size texture cache is 2048 (width) x 2048 (height) x 4 (slices) x
> 4 (RGBA), which is 64 MB. If D3D / ANGLE is also allocating space for
> mipmaps, that would be scaled by ~1.33x, which is ~85 MB.

Ah I see. That makes sense! I was going to post that we should get rid of the mipmaps, but you beat me to it. :-)

> 
> In general this is fine since it's a once off allocation and almost all
> textures end up in there.
> 
> We could make this grow lazily, which would mean that if there's a heap of
> windows we don't necessarily allocate the full size per window (but in GL3
> you must completely re-create the texture if the slice count changes, which
> complicates things).
> 
> If we're talking about a single window though, this seems reasonable and
> expected. Do we think it's a problem in that case?

Yeah, I think it's probably a problem. We could _maybe_ sell it if it were a singleton, but having 64MB of extra overhead per window probably isn't acceptable. And the single-window case matters too, since press often measures memory usage on first start, and people on resource-constrained systems expect that they can use less memory by loading fewer pages.

Is the 2048x2048x4 size the result of measurement, or was it just a guess? Naively, I think we'd want to start with something pretty small (1024 x 1024 x 1?) and then grow dynamically. Is there a problem with that aside from the copying (which I'd expect GPUs to be good at)?
Summary: Initial WebRender frame buffer allocation appears to be ~85MB → WebRender allocates ~85MB of VRAM (and regular memory) per window for the texture cache
Assignee: nobody → bobbyholley
There's no _major_ problem with growing it dynamically - there's probably even a TODO comment somewhere saying we should grow this dynamically in the future :)

There are two ways to approach it within the limited API capabilities of GL3 that I'm aware of:

(a) Destroy and re-init the texture at a larger size, and re-upload the CPU-side contents into the new texture.
(b) Create a new GPU texture of larger size and use the GPU to blit the contents across, then destroy the old texture.

(a) of course is not ideal for performance - since there could be significant CPU/GPU performance spikes due to all the CPU uploads.

(b) has the problem that during resizing we end up allocating significantly more GPU memory. This is freed after the resize, but makes for a higher peak memory usage, and potentially fragments GPU memory (unsure if this would ever be a real problem).

There is an option (c) which is to use some of the more advanced GL functionality in modern GL versions where available, but it's not something we could rely on everywhere.

I'd lean towards (b). It might even make sense to restrict the growth to 3 steps - 1024 x 1024 x 1, then 2048 x 2048 x 1, then 2048 x 2048 x 4, for example. This would reduce how often we do the resize with extra memory / performance overhead.
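To put numbers on the three-step idea, here is a small sketch of the schedule and of the peak memory while growing with approach (b), where the old and new textures coexist until the blit finishes. It assumes RGBA8 (4 bytes per pixel) and no mipmaps; `CacheSize`, `STEPS` and `peak_bytes_during_growth` are hypothetical names, not existing WebRender code.

```rust
/// One step of a hypothetical growth schedule.
#[derive(Clone, Copy, Debug)]
struct CacheSize {
    width: u64,
    height: u64,
    layers: u64,
}

impl CacheSize {
    /// Size in bytes, assuming 4 bytes per pixel (RGBA8) and no mipmaps.
    fn bytes(self) -> u64 {
        self.width * self.height * self.layers * 4
    }
}

/// The three steps suggested in the comment above.
const STEPS: [CacheSize; 3] = [
    CacheSize { width: 1024, height: 1024, layers: 1 },
    CacheSize { width: 2048, height: 2048, layers: 1 },
    CacheSize { width: 2048, height: 2048, layers: 4 },
];

/// Peak bytes while growing from step `i` to step `i + 1` with approach (b):
/// both textures are alive until the blit completes and the old one is freed.
/// Returns `None` if there is no next step.
fn peak_bytes_during_growth(i: usize) -> Option<u64> {
    let old = STEPS.get(i)?;
    let new = STEPS.get(i + 1)?;
    Some(old.bytes() + new.bytes())
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    for (i, step) in STEPS.iter().enumerate() {
        println!("step {}: {:?} = {} MiB", i, step, step.bytes() / MIB);
        if let Some(peak) = peak_bytes_during_growth(i) {
            println!("  growing to step {} peaks at {} MiB", i + 1, peak / MIB);
        }
    }
}
```

Under these assumptions the steps are 4, 16 and 64 MiB, and the two resizes peak at 20 and 80 MiB respectively, so the transient overhead of (b) is at most the size of the texture being replaced.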
Blocks: 1491703
Depends on: 1496168
I filed bug 1496168 for the mipmap stuff.

(b) Generally sounds good to me. Though note that the current state of affairs is actually worse than I described above, because there are actually _four_ different texture caches, for different formats and filters [1]. So assuming they all get used, that puts us at 340 MB of texture cache VRAM per window. If we fix mipmapping, that drops to 256 MB. If we do (b), that drops to 16 MB.

But even that's kind of a lot, so I think we probably want to eventually share these across all windows.

[1] https://searchfox.org/mozilla-central/rev/3c85ea2f8700ab17e38b82d77cd44644b4dae703/gfx/webrender/src/texture_cache.rs#239
It shouldn't be 340 MB - it's only the RGBA linear format that allocates 4 slices - all the others should generate 1 slice only. They are also initialized lazily - so for example, R16 probably never ends up ever allocating a texture.

Sharing them between windows sounds very non-trivial. At that point you basically have a single Renderer object across windows, which isn't possible on some (all?) platforms due to context sharing issues. I could be missing something but I don't think sharing these across windows will be feasible.
(In reply to Glenn Watson [:gw] from comment #6)
> It shouldn't be 340 MB - it's only the RGBA linear format that allocates 4
> slices - all the others should generate 1 slice only.

Is that really true? I see 3/4 of them getting 4 slices:

https://searchfox.org/mozilla-central/rev/3c85ea2f8700ab17e38b82d77cd44644b4dae703/gfx/webrender/src/texture_cache.rs#280

Is that something we should fix?

> They are also
> initialized lazily - so for example, R16 probably never ends up ever
> allocating a texture.
> 
> Sharing them between windows sounds very non-trivial. At that point you
> basically have a single Renderer object across windows, which isn't possible
> on some (all?) platforms due to context sharing issues. I could be missing
> something but I don't think sharing these across windows will be feasible.

Bug 1494763 seems to be about sharing the contexts across windows.

I guess the point here is that, if we want to share this kind of stuff across windows, we should just share the Renderer, which gives us that for free. In the mean time, we should focus on making those texture allocations as small as possible.
(In reply to Bobby Holley (:bholley) from comment #7)
> (In reply to Glenn Watson [:gw] from comment #6)
> > It shouldn't be 340 MB - it's only the RGBA linear format that allocates 4
> > slices - all the others should generate 1 slice only.
> 
> Is that really true? I see 3/4 of them getting 4 slices:

Hmm, you're right - although the A8 one is 1/4 the size (A8 vs. RGBA8) and the R16 one is (probably) never getting allocated.

> 
> https://searchfox.org/mozilla-central/rev/
> 3c85ea2f8700ab17e38b82d77cd44644b4dae703/gfx/webrender/src/texture_cache.
> rs#280
> 
> Is that something we should fix?
> 
> > They are also
> > initialized lazily - so for example, R16 probably never ends up ever
> > allocating a texture.
> > 
> > Sharing them between windows sounds very non-trivial. At that point you
> > basically have a single Renderer object across windows, which isn't possible
> > on some (all?) platforms due to context sharing issues. I could be missing
> > something but I don't think sharing these across windows will be feasible.
> 
> Bug 1494763 seems to be about sharing the contexts across windows.
> 
> I guess the point here is that, if we want to share this kind of stuff
> across windows, we should just share the Renderer, which gives us that for
> free. In the mean time, we should focus on making those texture allocations
> as small as possible.

Yes, agreed. But sharing the Renderer is non-trivial on certain platforms (e.g. Mac) from what I can tell, since the context sharing is believed to either not work / not be reliable, and each context must be bound to a specific window. That's based on what I've been told, but could be wrong - I don't know much about that stuff myself.
The sharing situation should be fixed on Mac with bug 1491442. I believe it's ok on other platforms already.
Priority: -- → P2
Blocks: wr-memory
Blocks: 1498890
Blocks: 1494760
Depends on: 1504115
This simplifies code, and gives us more flexibility around sizing the
shared texture caches (since we can effectively grow/shrink it by units
of 1MB rather than 16MB).
A big part of this is handling coalescing, so that if we end up creating
several new regions in a given frame, we won't allocate/blit/free
textures unnecessarily.

Depends on D10852
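The coalescing idea can be illustrated with a minimal sketch: growth requests made while building a frame are only recorded, and a single allocate/blit/free cycle is performed when they are flushed. The types below (`SharedTexture`, `PendingGrowth`) are hypothetical stand-ins that just count layers and reallocations; they are not the types used in the actual patch.

```rust
/// Hypothetical stand-in for a GPU array texture. It only tracks its layer
/// count and how many times it has been reallocated.
struct SharedTexture {
    layers: usize,
    reallocations: usize,
}

/// Collects growth requests made during a frame and applies them once.
struct PendingGrowth {
    extra_layers: usize,
}

impl PendingGrowth {
    fn new() -> Self {
        PendingGrowth { extra_layers: 0 }
    }

    /// Record that one more region (layer) is needed. No GPU work happens here.
    fn request_layer(&mut self) {
        self.extra_layers += 1;
    }

    /// Apply all recorded requests with a single allocate/blit/free cycle.
    /// Does nothing if no growth was requested.
    fn flush(&mut self, texture: &mut SharedTexture) {
        if self.extra_layers == 0 {
            return;
        }
        texture.layers += self.extra_layers;
        texture.reallocations += 1;
        self.extra_layers = 0;
    }
}

fn main() {
    let mut texture = SharedTexture { layers: 1, reallocations: 0 };
    let mut pending = PendingGrowth::new();

    // Three new regions are requested within the same frame...
    for _ in 0..3 {
        pending.request_layer();
    }
    // ...but the texture is reallocated only once.
    pending.flush(&mut texture);

    println!(
        "layers = {}, reallocations = {}",
        texture.layers, texture.reallocations
    );
}
```

Without coalescing, the same three requests would trigger three separate reallocations, each with its own blit and temporary extra texture.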
Aside from general cleanliness, this allows us to factor the selection
logic into a helper without a widely-scoped mutable borrow on the entire
TextureCache.

Depends on D10853
This is a strict improvement over the status quo, though we'll still
eventually consume as much memory as before, since we don't try to
evict at all until we hit the max size. We'll fix that soon, but it's
worth landing this separately because:
* It's useful to separate out regressions in the growing logic from those
  in any new complex eviction logic.
* This patch alone is a very large (~100MB) win on AWSY.

Depends on D10854
Per [1], these patches reduce RSS on AWSY by ~90MB, which is awesome.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=425892569e7cb77fe15e80874023912641374eb0
Whiteboard: [MemShrink:P1]
It appears that the patch from this bug was the source of the unexpected passes that we were looking at a few days ago; I must have accidentally included them as part of my try push.

I've got expectation updates that work. Here's the try push: [1]. There's one orange in there that I've fixed in the expectations to be attached.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf7ff3e8afe20d4cd1725d077ddba9302cc26bbb
Attachment #9023141 - Flags: review?(kats)
Comment on attachment 9023141 [details] [diff] [review]
Update reftest annotations.

Review of attachment 9023141 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. Are the results still nondeterministic? If yes, it would be good to have a bug on file for tracking that down.
Attachment #9023141 - Flags: review?(kats) → review+
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> Comment on attachment 9023141 [details] [diff] [review]
> Update reftest annotations.
> 
> Review of attachment 9023141 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks. Are the results still nondeterministic? If yes, it would be good to
> have a bug on file for tracking that down.

I have bug 1504817 on file for the one I investigated. The others are non-deterministic, but it's not actually any new non-determinism (which is what I thought was the case before), it's just us perturbing the bounds of the existing fuzzy tests. I could file bugs for the others, but I don't think it would add utility beyond what we get out of bug 1504817 and the existing fuzziness annotations.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 1505639
Depends on: 1505664
Attachment #9022480 - Attachment is obsolete: true
Attachment #9022479 - Attachment is obsolete: true