Falling leaves are very slow, re-allocating textures on each paint...

Status: NEW
Product: Core :: Graphics
Reporter: romaxa (Unassigned)
Reported: 6 years ago; last modified: 6 years ago
Version: Trunk
Platform: x86 Linux
Firefox Tracking Flags: (Not tracked)
Attachments: 4 attachments, 5 obsolete attachments

(Reporter)

Description

6 years ago
I tried running the falling leaves example on Mobile Firefox with CSS 3D transforms enabled and found that it is extremely slow because we are creating and destroying textures on each paint.

Looking deeper into the Swap/Upload code and the sizes involved, I found that on each paint we try to upload a texture with a different size. Why? The leaf images' dimensions are not changing.
(Reporter)

Comment 1

6 years ago
If it worked this way:
1) Calculate the size of the non-rotated ThebesLayer (from the non-rotated frame tree)
2) Create a ThebesLayer (texture) for the non-transformed tree
3) Apply all transforms (rotation, scale, etc.) to the ThebesLayer

then, as long as the ThebesLayer content is not changing, we would end up just compositing the leaf image with a different transform (rotation) each frame.

How it actually seems to work:
1) Calculate the size of the rotated ThebesLayer frame subtree
2) Allocate a ThebesLayer of that size
3) Blit the resulting shadow ThebesLayer to the GPU
4) Calculate a new size for the ThebesLayer rotated to a different angle
etc.

It should definitely be the first way.

Looking at this on a desktop build (with gDumpPaintList = 1), all the transformed ThebesLayers are consistently 101x100. Any chance you can try to track down why the sizes are changing for you?

We will still be hitting the problem of allocating temporary surfaces for ContainerLayers.
(Reporter)

Comment 3

6 years ago
I've worked around the temporary surface problem with the patch from bug 689416, so that is not an issue in this case.
(Reporter)

Comment 4

6 years ago
Created attachment 562899 [details]
Desktop build backtrace

I see the same problem in a desktop build too:
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[62,44]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[54,53]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[96,78]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[79,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[100,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[43,80]
.....
This looks to be caused by the leaves being partially clipped by their bounding box. As they move onto the screen, we reallocate their textures every frame.

We should look at a strategy for precaching some of this content to avoid this much texture churn. This sounds very similar to the problem we had with mobile XUL UI scrolling in, so CC'ing roc and heeen.

This isn't specific to 3D transforms; you can create this performance problem with any animated transform.
I don't think we should use hints. I think we should use thresholding so we don't reallocate textures too often, e.g. when the visible region grows, allocate a texture that's a fair bit bigger than the new visible region, so if it grows some more we don't have to allocate again. And when it shrinks, don't reallocate the texture unless the visible region size is a lot smaller than the current texture.

Just rounding up to the nearest power of 2 wouldn't always work well if the size oscillates around a power of 2.
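The thresholding policy described above could look roughly like this minimal sketch (a hypothetical `ComputeBufferSize` helper, not actual Gecko code): grow with ~50% slack when the visible region outgrows the buffer, and only shrink when the buffer is far larger than needed, so oscillating sizes don't cause reallocations.

```cpp
#include <algorithm>
#include <cassert>

struct Size { int w, h; };

// Hypothetical sketch of the thresholding policy: over-allocate on
// growth so small increases don't reallocate, and keep the buffer on
// shrink unless the visible region is much smaller than the buffer.
Size ComputeBufferSize(Size current, Size visible) {
  // Grow: allocate 50% more than the new visible size in each
  // dimension so further growth is absorbed without reallocating.
  if (visible.w > current.w || visible.h > current.h) {
    return { std::max(current.w, visible.w + visible.w / 2),
             std::max(current.h, visible.h + visible.h / 2) };
  }
  // Shrink: only reallocate when the visible region is under half
  // the current buffer in both dimensions.
  if (visible.w < current.w / 2 && visible.h < current.h / 2) {
    return visible;
  }
  return current;  // current size is fine; reuse the existing texture
}
```

With this shape of policy, a size oscillating around any fixed value stays inside the slack band and never triggers a reallocation, unlike naive power-of-two rounding.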
I'm thinking we might be able to create large textures and work on a sort of 'memory manager' for textures. If we had a user-mode texture memory manager, we could drastically reduce the amount of texture creation.

The one downside would be clamping and such, but that could be overcome relatively easily, for example by complicating our geometries slightly (i.e. surrounding the quad with an outline covering the area that samples pixels outside the quad, using clamped texture coordinates).

This approach seems like a more solid long term approach, although it is a little more complicated.
(In reply to Bas Schouten (:bas) from comment #7)
> I'm thinking we might be able to create large textures and actually work on
> doing a sort of 'memory manager' for textures. If we would have a user mode
> texture memory manager we could drastically reduce the amount of texture
> creation.

Ideally we'd just have a jemalloc that could allocate from a pool to avoid unnecessary fragmentation etc, but from my understanding of jemalloc, that's currently impossible.
Attachment #562899 - Attachment is patch: false
(In reply to Joe Drew (:JOEDREW!) from comment #8)
> (In reply to Bas Schouten (:bas) from comment #7)
> > I'm thinking we might be able to create large textures and actually work on
> > doing a sort of 'memory manager' for textures. If we would have a user mode
> > texture memory manager we could drastically reduce the amount of texture
> > creation.
> 
> Ideally we'd just have a jemalloc that could allocate from a pool to avoid
> unnecessary fragmentation etc, but from my understanding of jemalloc, that's
> currently impossible.

There's the additional problem that you need to be a little smarter: we don't just need contiguous texture memory, we need the surface of the texture to 'fit' in the available surface area. Essentially, unlike a 1D system like a heap manager, you're dealing with a 2D system.
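To illustrate why 2D placement is harder than a 1D heap, here is a purely illustrative shelf allocator that sub-allocates rectangles from one large atlas texture (hypothetical code, not Gecko's; a real manager would also need freeing and defragmentation):

```cpp
#include <algorithm>
#include <optional>
#include <utility>

// Purely illustrative shelf allocator: place w x h regions inside a
// single large atlas texture. Unlike a 1D heap, a free byte range is
// not enough -- the region must fit as a rectangle.
struct ShelfAllocator {
  int atlasW, atlasH;
  int shelfY = 0, shelfH = 0, cursorX = 0;

  // Returns the (x, y) origin of the placed rectangle, if it fits.
  std::optional<std::pair<int, int>> Allocate(int w, int h) {
    if (cursorX + w > atlasW) {   // current shelf is full: open a new one
      shelfY += shelfH;
      shelfH = 0;
      cursorX = 0;
    }
    if (shelfY + h > atlasH || w > atlasW) {
      return std::nullopt;        // doesn't fit in the atlas
    }
    std::pair<int, int> origin{cursorX, shelfY};
    cursorX += w;
    shelfH = std::max(shelfH, h); // shelf height = tallest item on it
    return origin;
  }
};
```

Even this trivial scheme wastes the vertical gap between each item and its shelf height, which is the 2D fragmentation problem a real texture memory manager has to manage.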
Ideally you wouldn't ever want to create and destroy textures if all you do is rotate and translate images that don't change. I guess trying to keep textures around for at least the next frame would help with the creation/destruction churn, if not with repainting each frame.
I agree with comment 6.  That would also ameliorate other issues like window resizing being dog slow (not that anyone cares).  We used to see issues like this in fennec too, on page load, where the visible region would steadily grow as new content was added.  My only concern is wasted memory, but a quick talos run would be an easy estimate.  (And we could statically bound that through our resize policy.)
Also, what does webkit do?
(Reporter)

Comment 13

6 years ago
WebKit just allocates all textures ahead of time and doesn't bother saving memory there. If an allocation fails, it simply doesn't render the layer.
Created attachment 564457 [details] [diff] [review]
WIP overallocation for GL layers

Patch to make OpenGL allocate larger textures than required when the size is increasing.

For the sizing heuristic, I went with the smaller of "50% larger than the requested size" and "5x the current size increment". We can probably do better than this; it was just an initial test.
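That heuristic might be sketched like this (a hypothetical `NextDimension` helper, not the actual patch):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the sizing heuristic described above: when a
// dimension grows, over-allocate by the smaller of "50% larger than
// the requested size" and "5x the current size increment".
int NextDimension(int current, int requested) {
  if (requested <= current) {
    return current;  // this sketch only handles growth
  }
  int halfMore = requested + requested / 2;            // 50% larger
  int fiveXInc = current + 5 * (requested - current);  // 5x the increment
  return std::min(halfMore, fiveXInc);
}
```

The "5x increment" bound keeps small, frequent growth (a leaf sliding onto the screen a few pixels per frame) from being overwhelmed by the 50% rule, while the 50% bound caps waste on a single large jump.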

As mentioned in the comment, we could probably get more information from layout to prevent memory being wasted unnecessarily here.

Romaxa: Does this help in a noticeable way on mobile?
Attachment #564457 - Flags: feedback?(roc)
(Reporter)

Comment 15

6 years ago
Actually, this has no effect on mobile, because mobile uses a different code path. I believe it's somewhere around BasicShadowableThebesLayer and the software ThebesLayerBuffer.
(Reporter)

Comment 16

6 years ago
I also tried the same approach in http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#273

but it did not work as expected; we are still creating buffers on every paint.
Good catch.

I think the code we need to modify is:

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/ThebesLayerOGL.cpp#842

This is even worse than the desktop GL situation, because we're resizing the texture when it shrinks as well as when it grows. We should be able to fix this and gain a lot on all resize operations.
Created attachment 565158 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

I'm struggling to get useful improvements from this without hints as to the possible sizes.

I've been working with a modified version of falling leaves, with only a single leaf and the attached patch to view the texture re-allocations.

The current version gets us down to around 5 textures allocated as the leaf moves onto the screen (plus two when it passes under the text, from a new layer). This is sometimes slightly higher, depending on the path it takes.

The current code is overallocating by huge margins though, so it will waste excessive amounts of video memory.

Coming up with a good allocation algorithm that balances texture allocations and memory wastage is key here, and I'm out of ideas.

I'm going to have a look at passing down the full possible bounds of the layer so we can clamp allocations to that size.
What if the layer contents are actually growing though, due to reflow? It'll be hard to predict what the maximum bounds are going to be.

What if a) we're very conservative about how we shrink, so at least we don't shrink until a couple of seconds at the smaller size and b) we increase by at least a factor of two every time we increase in a given direction?
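A minimal sketch of that grow/shrink policy (a hypothetical `BufferDim` helper with a caller-supplied timestamp; not actual Gecko code): double a dimension whenever it grows, and only shrink after the smaller size has been stable for a couple of seconds.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the policy suggested above, tracking one
// buffer dimension: grow by at least 2x, shrink only after the smaller
// size has persisted for shrinkDelaySec seconds.
struct BufferDim {
  int size = 0;
  double shrinkCandidateSince = -1;  // timestamp; -1 = no pending shrink

  int Update(int requested, double now, double shrinkDelaySec = 2.0) {
    if (requested > size) {
      size = std::max(requested, size * 2);  // grow by at least 2x
      shrinkCandidateSince = -1;
      return size;
    }
    if (requested < size) {
      if (shrinkCandidateSince < 0) {
        shrinkCandidateSince = now;          // start the shrink timer
      } else if (now - shrinkCandidateSince >= shrinkDelaySec) {
        size = requested;                    // stable at smaller size
        shrinkCandidateSince = -1;
      }
    } else {
      shrinkCandidateSince = -1;             // exact fit: reset timer
    }
    return size;
  }
};
```

Using wall-clock time for the shrink side means the growth side can afford to be very aggressive, since a transient spike only costs memory for a couple of seconds.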
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> What if the layer contents are actually growing though, due to reflow? It'll
> be hard to predict what the maximum bounds are going to be.

Hrm, this is true. I was mainly thinking about the case of moving objects that are partially clipped by their container. Things that are actually changing size make this harder.

> 
> What if a) we're very conservative about how we shrink, so at least we don't
> shrink until a couple of seconds at the smaller size 

I hadn't actually thought about using time as a factor for shrinking texture sizes. It's a bit harder to implement, but probably worth it, as we can then afford to be much more aggressive with our growth strategy.

> and b) we increase by
> at least a factor of two every time we increase in a given direction?

Sounds good to me.
Created attachment 567379 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

New version that's much more aggressive in increasing buffer sizes, and uses a timeout to delete these when they stop being needed.

I saw some cases of corruption with this earlier, but can't reproduce it now. It probably still exists, working on it.

Romaxa: Does this fix the performance issues you are seeing? It results in far fewer allocations for me.

I've been testing with a modified version of falling leaves, that only has a single leaf falling. Will upload this if you want to try it.
Attachment #564457 - Attachment is obsolete: true
Attachment #565158 - Attachment is obsolete: true
Attachment #564457 - Flags: feedback?(roc)
Attachment #567379 - Flags: feedback?(romaxa)
(Reporter)

Comment 22

6 years ago
Created attachment 567599 [details]
Allocation log

I tested this patch. It probably makes fewer allocations, but visually it's barely noticeable, because we still get ~5-10 allocations per second, and in the end that just doesn't make any difference.
Here is a dump of the allocations with time(0) timestamps.
(Reporter)

Comment 23

6 years ago
Also, with this patch I see the remote Thebes viewport rendering black areas...
(Reporter)

Comment 24

6 years ago
I wonder why we can't just pre-allocate the full ThebesLayer size if it is smaller than some sanity value, say the max texture size or the max shmem size, and go with the slow path if it's bigger.
As roc said, we often don't know what the size will be.

We could get layout to communicate the current size of the ThebesLayer (not just the visible part), and then use a hybrid approach.
With this patch, is it just while the leaves are first coming into view that we have to reallocate textures a few times? Or is something else happening? It'd be interesting to break down the log to see the history of individual surfaces.

We can gather the union of the GetBounds() of the various display items in a ThebesLayer to come up with an estimate of the maximum size, and feed the rectangular bounds of that in as a new Layer property --- say, GetBoundsHint or something like that. I guess if that bounds is smaller than some threshold, or close to the visible bounds, we should use it to set the initial buffer size.
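The bounds-hint computation could be sketched as follows (hypothetical `Rect`/`Union`/`ShouldUseBoundsHint` helpers, not Gecko's nsRect/nsRegion API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x, y, w, h; };

// Union the bounds of a layer's display items to estimate the maximum
// size the layer's contents can reach (the GetBoundsHint idea above).
Rect Union(const std::vector<Rect>& items) {
  Rect u = items.front();
  for (const Rect& r : items) {
    int right  = std::max(u.x + u.w, r.x + r.w);
    int bottom = std::max(u.y + u.h, r.y + r.h);
    u.x = std::min(u.x, r.x);
    u.y = std::min(u.y, r.y);
    u.w = right - u.x;
    u.h = bottom - u.y;
  }
  return u;
}

// Only trust the hint for the initial buffer size when it is close to
// the visible bounds (here: at most 4x the visible area), per the
// threshold suggestion above.
bool ShouldUseBoundsHint(const Rect& hint, const Rect& visible) {
  long hintArea = long(hint.w) * hint.h;
  long visArea  = long(visible.w) * visible.h;
  return hintArea <= 4 * visArea;
}
```

The 4x area threshold is an arbitrary placeholder; the real cutoff would need tuning against memory budgets.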
For an individual leaf, we get around 4-5 (depending on frame rate and the path it takes) allocations as it moves onto the screen.

The leaf then falls without needing any drawing, and we eventually delete the software buffer. No changes are made to the texture since we aren't invalidating anything.

When the leaf gets moved back to the top (and offscreen), we invalidate and start the process again.

So 4-5ish allocations per leaf, per cycle of the animation.

If we could avoid the invalidation and keep the leaf texture untouched between cycles we'd benefit a lot here.

The bounds hint would also help the first cycle of the animation, but that's probably less important for this particular example.
By the way, why aren't our optimizations kicking in here to use an ImageLayer instead of a ThebesLayer?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> By the way, why aren't our optimizations kicking in here to use an
> ImageLayer instead of a ThebesLayer?

Is there any way to test this optimization? I can't think of anything other than a compiled code test.

I ask because this optimization could get broken fairly easily without anybody noticing.

Purely hypothetical, but the original patch author could have incorrectly rebased the patch before landing so that this optimization has never worked at all.
We might need to add a test-only API to nsIDOMWindowUtils or something, or maybe inIDOMUtils.
(In reply to Matt Woodrow (:mattwoodrow) from comment #27)
> When the leaf gets moved back to the top (and offscreen), we invalidate and
> start the process again.

To clarify, we don't invalidate as such, but we don't use the layer for multiple frames, and we end up with a new layer when we do need it again.

Do you think it's worth trying to fix this, or should we just focus on minimizing allocations while it's visible?
Depends on: 695275
(Reporter)

Comment 32

6 years ago
I checked falling leaves with the patch from bug 695275, and it looks much better. But we still get some ImageLayer re-allocations when a leaf goes offscreen (is dropped) and appears again at the top (a layer of the same size is created again). Shall we recycle image layers here?

Another problem is that sometimes a leaf appears (possibly offscreen) as an opaque image and later turns into alpha content type, which causes an additional reallocation.
(Reporter)

Comment 33

6 years ago
Created attachment 567848 [details]
Allocation log for ImageLayers
I think ImageLayerOGL (or maybe more of the GL layers) should probably cache and recycle buffers. That seems simpler and more flexible than trying to keep layers around for offscreen elements.
(In reply to Oleg Romashin (:romaxa) from comment #32)
> Another problem is that sometimes leaf appear (offscreen possibly) as opaque
> image, and later turn into alpha layer content type, and that cause
> additional reallocation

Can we get around this by always using the same texture type and only varying the shader depending on whether the image is opaque or not?
In this particular example we're creating image layers when the visible region is empty, so we could just fix that.

A more general fix for content type changes would probably be good too though.
(Reporter)

Comment 37

6 years ago
> recycle buffers. That seems simpler and more flexible than trying to keep
> layers around for offscreen elements.
Then we need to cache ShadowableImageLayer buffers too...
Currently recycling is implemented only for GL layers...
(Reporter)

Comment 38

6 years ago
> Can we get around this by always using the same texture type and only
> varying the shader depending on whether the image is opaque or not?
That would double the memory used per image on mobile for opaque images (like plugins).
(Reporter)

Comment 39

6 years ago
Also, if we don't cache ImageLayers, then we need to add recycle-cache trackers for all backends (OGL, D3D, Basic) plus ShadowableImageLayer buffers.
(In reply to Oleg Romashin (:romaxa) from comment #38)
> > Can we get around this by always using the same texture type and only
> > varying the shader depending on whether the image is opaque or not?
> that would bring us 2x more memory used per image on mobile, for opaque
> images (like plugins)

Right OK. I keep forgetting about 16-bit vs 32-bit.

I think caching at the closest level to the allocation is probably best as long as the rest of the stack is not much overhead.
(Reporter)

Comment 41

6 years ago
> long as the rest of the stack is not much overhead.
The IPC ShadowableLayers stack is a bit more expensive (Create/Transform/Paint transactions), but with the recent Init/Swap API changes it should be faster, so it's probably OK.
(Reporter)

Comment 42

6 years ago
Created attachment 567944 [details]
Opreport with ImageLayer + cached buffers

OK, I've added a simple buffer cache tracker for image layers (Shadow and OGL) and got it working much faster, but it is still very slow compared to how WebKit2 renders the same thing.
In Fennec we get 12 FPS with all CPU used; see the plain opreport.
In a WebKit2 browser with HW acceleration on the same device, I see 30 FPS with 30% CPU free.
(Reporter)

Comment 43

6 years ago
Surfaces are cached, but we are still dropping layers, which causes repainting (uploading) to the GPU. That is not a big deal if we can do it without expensive work in layout.
Webkit can hand CSS animations over to the compositing layer, which probably helps a lot in this testcase.

Bug 524925 might help us here too.

A hierarchical profile would be very useful.
(Reporter)

Comment 45

6 years ago
> Bug 524925 might help us here too.
This build already includes those changes, but I'm not sure they help in this case.
(Reporter)

Comment 46

6 years ago
In the N9 default browser, the leaves animation works even with the child process suspended: all textures are uploaded to the UI process, and the animation runs smoothly. After that there is no interaction with the content process at all, only transformation of the UI-side textures.
(Reporter)

Comment 47

6 years ago
Created attachment 567951 [details]
Text callgraph profile
Attachment #562899 - Attachment is obsolete: true
Attachment #567599 - Attachment is obsolete: true
Attachment #567848 - Attachment is obsolete: true
(Reporter)

Comment 48

6 years ago
Created attachment 567952 [details]
SVG report, edges 0.05%, nodes 0.05% limit
Looks like there is no reflow involved (good, bug 524925 is working), but lots of restyling and invalidation. To avoid that, we need to offload animations to the layer system.
What's the plan for this work?
(Reporter)

Comment 51

6 years ago
Comment on attachment 567379 [details] [diff] [review]
WIP overallocation for ThebesLayerBuffer

I guess this patch is outdated, and there are other solutions to this problem now.
Also, it was not very helpful.
Attachment #567379 - Flags: feedback?(romaxa)