Falling leaves are very slow, re-allocating textures on each paint...

Status: NEW
Product: Core :: Graphics
Reporter: romaxa (Unassigned)
Reported: 6 years ago; last modified: 6 years ago
Version: Trunk
Platform: x86 Linux
Firefox Tracking Flags: (Not tracked)
Attachments: 4 attachments, 5 obsolete attachments

(Reporter)

Description

6 years ago
I tried running the falling leaves example on Mobile Firefox with CSS 3D transforms enabled and found that it is extremely slow because we are creating and destroying textures on each paint.

Looking deeper into the Swap/Upload code and the sizes involved, I found that on each paint we try to upload a texture with a different size. Why? The leaf images' dimensions are not changing.
(Reporter)

Comment 1

6 years ago
If it worked this way:
1) Calculate the size of the non-rotated ThebesLayer (from the non-rotated frame tree)
2) Create a ThebesLayer (texture) for the non-transformed tree
3) Apply all transforms (rotation, scale, etc.) to the ThebesLayer

then, as long as the ThebesLayer content is not changing, we would end up just compositing the leaf image with a different transform (rotation) each frame.

How it actually seems to work:
1) Calculate the size of the rotated ThebesLayer frame subtree
2) Allocate a ThebesLayer of that size
3) Blit the resulting shadow ThebesLayer to the GPU
4) Calculate a new size for the ThebesLayer rotated to a different angle
etc.

It should definitely be the first way.

Looking at this on a desktop build (with gDumpPaintList = 1), all the transformed ThebesLayers are consistently 101x100. Any chance you can try to track down why the sizes are changing for you?

We will still be hitting the problem of allocating temporary surfaces for ContainerLayers.
(Reporter)

Comment 3

6 years ago
I've worked around the temporary surface problem with the patch from bug 689416, so that is not an issue in this case.
(Reporter)

Comment 4

6 years ago
Created attachment 562899 [details]
Desktop build backtrace

I see the same problem in a desktop build too:
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[62,44]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[54,53]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[96,78]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[79,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[100,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[43,80]
.....
This looks to be caused by the leaves being partially clipped by their bounding box. As they move onto the screen, we reallocate their textures every frame.

We should look at a strategy for precaching some of this content to avoid this much texture churn. This sounds very similar to the problem we had with mobile XUL UI scrolling in, so CC'ing roc and heeen.

This isn't specific to 3D transforms; you can create this performance problem with any animated transform.
I don't think we should use hints. I think we should use thresholding so we don't reallocate textures too often, e.g. when the visible region grows, allocate a texture that's a fair bit bigger than the new visible region, so if it grows some more we don't have to allocate again. And when it shrinks, don't reallocate the texture unless the visible region size is a lot smaller than the current texture.

Just rounding up to the nearest power of 2 wouldn't always work well if the size oscillates around a power of 2.
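The thresholding policy described above could look roughly like this minimal sketch (a hypothetical `ComputeBufferSize` helper, not actual Gecko code): grow with ~50% slack when the visible region outgrows the buffer, and only shrink when the buffer is far larger than needed, so oscillating sizes don't cause reallocations.

```cpp
#include <algorithm>
#include <cassert>

struct Size { int w, h; };

// Hypothetical sketch of the thresholding policy: over-allocate on
// growth so small increases don't reallocate, and keep the buffer on
// shrink unless the visible region is much smaller than the buffer.
Size ComputeBufferSize(Size current, Size visible) {
  // Grow: allocate 50% more than the new visible size in each
  // dimension so further growth is absorbed without reallocating.
  if (visible.w > current.w || visible.h > current.h) {
    return { std::max(current.w, visible.w + visible.w / 2),
             std::max(current.h, visible.h + visible.h / 2) };
  }
  // Shrink: only reallocate when the visible region is under half
  // the current buffer in both dimensions.
  if (visible.w < current.w / 2 && visible.h < current.h / 2) {
    return visible;
  }
  return current;  // current size is fine; reuse the existing texture
}
```

With this shape of policy, a size oscillating around any fixed value stays inside the slack band and never triggers a reallocation, unlike naive power-of-two rounding.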
I'm thinking we might be able to create large textures and work on a sort of 'memory manager' for textures. If we had a user-mode texture memory manager, we could drastically reduce the amount of texture creation.

The one downside would be clamping and such, but that could be overcome relatively easily, for example by complicating our geometries slightly (i.e. surrounding the quad with an outline covering the area that samples pixels outside the quad, using clamped texture coordinates).

This approach seems like a more solid long term approach, although it is a little more complicated.
(In reply to Bas Schouten (:bas) from comment #7)
> I'm thinking we might be able to create large textures and actually work on
> doing a sort of 'memory manager' for textures. If we would have a user mode
> texture memory manager we could drastically reduce the amount of texture
> creation.

Ideally we'd just have a jemalloc that could allocate from a pool to avoid unnecessary fragmentation etc, but from my understanding of jemalloc, that's currently impossible.
Attachment #562899 - Attachment is patch: false
(In reply to Joe Drew (:JOEDREW!) from comment #8)
> (In reply to Bas Schouten (:bas) from comment #7)
> > I'm thinking we might be able to create large textures and actually work on
> > doing a sort of 'memory manager' for textures. If we would have a user mode
> > texture memory manager we could drastically reduce the amount of texture
> > creation.
> 
> Ideally we'd just have a jemalloc that could allocate from a pool to avoid
> unnecessary fragmentation etc, but from my understanding of jemalloc, that's
> currently impossible.

There's the additional problem that you need to be a little smarter: we don't just need contiguous texture memory, we need the surface of the texture to 'fit' in the available surface area. Essentially, unlike a 1D system like a heap manager, you're dealing with a 2D system.
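To illustrate why 2D placement is harder than a 1D heap, here is a purely illustrative shelf allocator that sub-allocates rectangles from one large atlas texture (hypothetical code, not Gecko's; a real manager would also need freeing and defragmentation):

```cpp
#include <algorithm>
#include <optional>
#include <utility>

// Purely illustrative shelf allocator: place w x h regions inside a
// single large atlas texture. Unlike a 1D heap, a free byte range is
// not enough -- the region must fit as a rectangle.
struct ShelfAllocator {
  int atlasW, atlasH;
  int shelfY = 0, shelfH = 0, cursorX = 0;

  // Returns the (x, y) origin of the placed rectangle, if it fits.
  std::optional<std::pair<int, int>> Allocate(int w, int h) {
    if (cursorX + w > atlasW) {   // current shelf is full: open a new one
      shelfY += shelfH;
      shelfH = 0;
      cursorX = 0;
    }
    if (shelfY + h > atlasH || w > atlasW) {
      return std::nullopt;        // doesn't fit in the atlas
    }
    std::pair<int, int> origin{cursorX, shelfY};
    cursorX += w;
    shelfH = std::max(shelfH, h); // shelf height = tallest item on it
    return origin;
  }
};
```

Even this trivial scheme wastes the vertical gap between each item and its shelf height, which is the 2D fragmentation problem a real texture memory manager has to manage.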
Ideally you wouldn't ever want to create and destroy textures if all you do is rotate and translate images that don't change. I guess trying to keep textures around for at least the next frame would help with the creation/destruction churn, if not with repainting each frame.
I agree with comment 6.  That would also ameliorate other issues like window resizing being dog slow (not that anyone cares).  We used to see issues like this in fennec too, on page load, where the visible region would steadily grow as new content was added.  My only concern is wasted memory, but a quick talos run would be an easy estimate.  (And we could statically bound that through our resize policy.)
Also, what does webkit do?
(Reporter)

Comment 13

6 years ago
WebKit just allocates all textures ahead of time and doesn't bother saving memory there. If an allocation fails, it simply doesn't render the layer.
Created attachment 564457 [details] [diff] [review]
WIP overallocation for GL layers

Patch to make OpenGL allocate larger textures than required when the size is increasing.

For the sizing heuristic, I went with the smaller of "50% larger than the requested size" and "5x the current size increment". We can probably do better than this; it was just an initial test.
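That heuristic might be sketched like this (a hypothetical `NextDimension` helper, not the actual patch):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the sizing heuristic described above: when a
// dimension grows, over-allocate by the smaller of "50% larger than
// the requested size" and "5x the current size increment".
int NextDimension(int current, int requested) {
  if (requested <= current) {
    return current;  // this sketch only handles growth
  }
  int halfMore = requested + requested / 2;            // 50% larger
  int fiveXInc = current + 5 * (requested - current);  // 5x the increment
  return std::min(halfMore, fiveXInc);
}
```

The "5x increment" bound keeps small, frequent growth (a leaf sliding onto the screen a few pixels per frame) from being overwhelmed by the 50% rule, while the 50% bound caps waste on a single large jump.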

As mentioned in the comment, we could probably get more information from layout to prevent memory being wasted unnecessarily here.

Romaxa: Does this help in a noticeable way on mobile?
Attachment #564457 - Flags: feedback?(roc)
(Reporter)

Comment 15

6 years ago
Actually, this has no effect on mobile, because mobile uses a different code path. I believe it's somewhere around BasicShadowableThebesLayer and the software ThebesLayerBuffer.
(Reporter)

Comment 16

6 years ago
I also tried the same approach in http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#273

but it did not work as expected; we are still creating buffers on every paint.
Good catch.

I think the code we need to modify is:

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/ThebesLayerOGL.cpp#842

This is even worse than the desktop GL situation, because we're resizing the texture when it shrinks as well as when it grows. We should be able to fix this and gain a lot on all resize operations.
Created attachment 565158 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

I'm struggling to get useful improvements from this without hints as to the possible sizes.

I've been working with a modified version of falling leaves, with only a single leaf and the attached patch to view the texture re-allocations.

The current version gets us down to around 5 textures allocated as the leaf moves onto the screen (plus two when it passes under the text, from a new layer). This is sometimes slightly higher, depending on the path it takes.

The current code is overallocating by huge margins though, so it will waste excessive amounts of video memory.

Coming up with a good allocation algorithm that balances texture allocations and memory wastage is key here, and I'm out of ideas.

I'm going to have a look at passing down the full possible bounds of the layer so we can clamp allocations to that size.
What if the layer contents are actually growing though, due to reflow? It'll be hard to predict what the maximum bounds are going to be.

What if a) we're very conservative about how we shrink, so at least we don't shrink until a couple of seconds at the smaller size and b) we increase by at least a factor of two every time we increase in a given direction?
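A minimal sketch of that grow/shrink policy (a hypothetical `BufferDim` helper with a caller-supplied timestamp; not actual Gecko code): double a dimension whenever it grows, and only shrink after the smaller size has been stable for a couple of seconds.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the policy suggested above, tracking one
// buffer dimension: grow by at least 2x, shrink only after the smaller
// size has persisted for shrinkDelaySec seconds.
struct BufferDim {
  int size = 0;
  double shrinkCandidateSince = -1;  // timestamp; -1 = no pending shrink

  int Update(int requested, double now, double shrinkDelaySec = 2.0) {
    if (requested > size) {
      size = std::max(requested, size * 2);  // grow by at least 2x
      shrinkCandidateSince = -1;
      return size;
    }
    if (requested < size) {
      if (shrinkCandidateSince < 0) {
        shrinkCandidateSince = now;          // start the shrink timer
      } else if (now - shrinkCandidateSince >= shrinkDelaySec) {
        size = requested;                    // stable at smaller size
        shrinkCandidateSince = -1;
      }
    } else {
      shrinkCandidateSince = -1;             // exact fit: reset timer
    }
    return size;
  }
};
```

Using wall-clock time for the shrink side means the growth side can afford to be very aggressive, since a transient spike only costs memory for a couple of seconds.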
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> What if the layer contents are actually growing though, due to reflow? It'll
> be hard to predict what the maximum bounds are going to be.

Hrm, this is true. I was mainly thinking about the case of moving objects that are partially clipped by their container. Things that are actually changing size make this harder.

> 
> What if a) we're very conservative about how we shrink, so at least we don't
> shrink until a couple of seconds at the smaller size 

I hadn't actually thought about using time as a factor for shrinking texture sizes. It's a bit harder to implement, but probably worth it, as we can then afford to be much more aggressive with our growth strategy.

> and b) we increase by
> at least a factor of two every time we increase in a given direction?

Sounds good to me.
Created attachment 567379 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

New version that's much more aggressive in increasing buffer sizes, and uses a timeout to delete these when they stop being needed.

I saw some cases of corruption with this earlier, but can't reproduce it now. It probably still exists, working on it.

Romaxa: Does this fix the performance issues you are seeing? It results in far fewer allocations for me.

I've been testing with a modified version of falling leaves, that only has a single leaf falling. Will upload this if you want to try it.
Attachment #564457 - Attachment is obsolete: true
Attachment #565158 - Attachment is obsolete: true
Attachment #564457 - Flags: feedback?(roc)
Attachment #567379 - Flags: feedback?(romaxa)
(Reporter)

Comment 22

6 years ago
Created attachment 567599 [details]
Allocation log

I tested this patch. It probably makes fewer allocations, but visually it's barely noticeable, because we still get ~5-10 allocations per second, and in the end that just doesn't make any difference.
Here is a dump of the allocations with time(0) timestamps.
(Reporter)

Comment 23

6 years ago
Also, with this patch I see the remote Thebes viewport rendering black areas...
(Reporter)

Comment 24

6 years ago
I wonder why we can't just pre-allocate the full ThebesLayer size if it is smaller than some sanity value, say the max texture size or the max shmem size, and go with the slow path if it's bigger.
As roc said, we often don't know what the size will be.

We could get layout to communicate the current size of the ThebesLayer (not just the visible part), and then use a hybrid approach.
With this patch, is it just while the leaves are first coming into view that we have to reallocate textures a few times? Or is something else happening? It'd be interesting to break down the log to see the history of individual surfaces.

We can gather the union of the GetBounds() of the various display items in a ThebesLayer to come up with an estimate of the maximum size, and feed the rectangular bounds of that in as a new Layer property --- say, GetBoundsHint or something like that. I guess if that bounds is smaller than some threshold, or close to the visible bounds, we should use it to set the initial buffer size.
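The bounds-hint computation could be sketched as follows (hypothetical `Rect`/`Union`/`ShouldUseBoundsHint` helpers, not Gecko's nsRect/nsRegion API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x, y, w, h; };

// Union the bounds of a layer's display items to estimate the maximum
// size the layer's contents can reach (the GetBoundsHint idea above).
Rect Union(const std::vector<Rect>& items) {
  Rect u = items.front();
  for (const Rect& r : items) {
    int right  = std::max(u.x + u.w, r.x + r.w);
    int bottom = std::max(u.y + u.h, r.y + r.h);
    u.x = std::min(u.x, r.x);
    u.y = std::min(u.y, r.y);
    u.w = right - u.x;
    u.h = bottom - u.y;
  }
  return u;
}

// Only trust the hint for the initial buffer size when it is close to
// the visible bounds (here: at most 4x the visible area), per the
// threshold suggestion above.
bool ShouldUseBoundsHint(const Rect& hint, const Rect& visible) {
  long hintArea = long(hint.w) * hint.h;
  long visArea  = long(visible.w) * visible.h;
  return hintArea <= 4 * visArea;
}
```

The 4x area threshold is an arbitrary placeholder; the real cutoff would need tuning against memory budgets.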
For an individual leaf, we get around 4-5 (depending on frame rate and the path it takes) allocations as it moves onto the screen.

The leaf then falls without needing any drawing, and we eventually delete the software buffer. No changes are made to the texture since we aren't invalidating anything.

When the leaf gets moved back to the top (and offscreen), we invalidate and start the process again.

So 4-5ish allocations per leaf, per cycle of the animation.

If we could avoid the invalidation and keep the leaf texture untouched between cycles we'd benefit a lot here.

The bounds hint would also help the first cycle of the animation, but that's probably less important for this particular example.
By the way, why aren't our optimizations kicking in here to use an ImageLayer instead of a ThebesLayer?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> By the way, why aren't our optimizations kicking in here to use an
> ImageLayer instead of a ThebesLayer?

Is there any way to test this optimization? I can't think of anything other than a compiled code test.

I ask because this optimization could get broken fairly easily without anybody noticing.

Purely hypothetical, but the original patch author could have incorrectly rebased the patch before landing so that this optimization has never worked at all.
We might need to add a test-only API to nsIDOMWindowUtils or something, or maybe inIDOMUtils.
(In reply to Matt Woodrow (:mattwoodrow) from comment #27)
> When the leaf gets moved back to the top (and offscreen), we invalidate and
> start the process again.

To clarify, we don't invalidate as such, but we don't use the layer for multiple frames, and we end up with a new layer when we do need it again.

Do you think it's worth trying to fix this, or should we just focus on minimizing allocations while it's visible?
Depends on: 695275
(Reporter)

Comment 32

6 years ago
I checked falling leaves with the patch from bug 695275, and it looks much better. But we still get some ImageLayer re-allocations when a leaf goes offscreen (is dropped) and appears again at the top (a layer of the same size is created again). Shall we recycle image layers here?

Another problem is that sometimes a leaf appears (possibly offscreen) as an opaque image and later turns into alpha content type, which causes an additional reallocation.
(Reporter)

Comment 33

6 years ago
Created attachment 567848 [details]
Allocation log for ImageLayers
I think ImageLayerOGL (or maybe more of the GL layers) should probably cache and recycle buffers. That seems simpler and more flexible than trying to keep layers around for offscreen elements.
(In reply to Oleg Romashin (:romaxa) from comment #32)
> Another problem is that sometimes leaf appear (offscreen possibly) as opaque
> image, and later turn into alpha layer content type, and that cause
> additional reallocation

Can we get around this by always using the same texture type and only varying the shader depending on whether the image is opaque or not?
In this particular example we're creating image layers when the visible region is empty, so we could just fix that.

A more general fix for content type changes would probably be good too though.
(Reporter)

Comment 37

6 years ago
> recycle buffers. That seems simpler and more flexible than trying to keep
> layers around for offscreen elements.
Then we need to cache ShadowableImageLayer buffers too...
Currently recycling is implemented only for GL layers...
(Reporter)

Comment 38

6 years ago
> Can we get around this by always using the same texture type and only
> varying the shader depending on whether the image is opaque or not?
That would double the memory used per image on mobile for opaque images (like plugins).
(Reporter)

Comment 39

6 years ago
Also, if we don't cache ImageLayers, then we need to add recycle-cache trackers for all backends (OGL, D3D, Basic) plus ShadowableImageLayer buffers.
(In reply to Oleg Romashin (:romaxa) from comment #38)
> > Can we get around this by always using the same texture type and only
> > varying the shader depending on whether the image is opaque or not?
> that would bring us 2x more memory used per image on mobile, for opaque
> images (like plugins)

Right OK. I keep forgetting about 16-bit vs 32-bit.

I think caching at the closest level to the allocation is probably best as long as the rest of the stack is not much overhead.
(Reporter)

Comment 41

6 years ago
> long as the rest of the stack is not much overhead.
The IPC ShadowableLayers stack is a bit more expensive (Create/Transform/Paint transactions), but with the recent Init/Swap API changes it should be faster, so it's probably OK.
(Reporter)

Comment 42

6 years ago
Created attachment 567944 [details]
Opreport with ImageLayer + cached buffers

OK, I've added a simple buffer cache tracker for image layers (Shadow and OGL) and got it working much faster, but it is still very slow compared to how WebKit2 renders the same thing.
In Fennec we get 12 FPS with all CPU used; see the plain opreport.
In a WebKit2 browser with HW acceleration on the same device, I see 30 FPS with 30% CPU free.
(Reporter)

Comment 43

6 years ago
Surfaces are cached, but we are still dropping layers, which causes repainting (uploading) to the GPU. That is not a big deal if we can do it without expensive work in layout.
Webkit can hand CSS animations over to the compositing layer, which probably helps a lot in this testcase.

Bug 524925 might help us here too.

A hierarchical profile would be very useful.
(Reporter)

Comment 45

6 years ago
> Bug 524925 might help us here too.
This build already includes those changes, but I'm not sure they help in this case.
(Reporter)

Comment 46

6 years ago
In the N9 default browser, the leaves animation works even with the child process suspended: all textures are uploaded to the UI process, and the animation runs smoothly. After that there is no interaction with the content process at all, only transformation of the UI-side textures.
(Reporter)

Comment 47

6 years ago
Created attachment 567951 [details]
Text callgraph profile
Attachment #562899 - Attachment is obsolete: true
Attachment #567599 - Attachment is obsolete: true
Attachment #567848 - Attachment is obsolete: true
(Reporter)

Comment 48

6 years ago
Created attachment 567952 [details]
SVG report, edges 0.05%, nodes 0.05% limit
Looks like there is no reflow involved (good, bug 524925 is working), but lots of restyling and invalidation. To avoid that, we need to offload animations to the layer system.
What's the plan for this work?
(Reporter)

Comment 51

6 years ago
Comment on attachment 567379 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

I guess this patch is outdated, and there are other solutions to this problem now.
Also, it was not very helpful.
Attachment #567379 - Flags: feedback?(romaxa)