I tried to run the falling leaves example on mobile Firefox with CSS 3D transforms enabled and found that it is extremely slow because we are creating and destroying textures on each paint. Looking deeper into the Swap/Upload code and sizes, I found that for each paint we try to upload a texture with a different size. Why? The leaf images' dimensions are not changing.
If it worked the right way, it would be:
1) Calculate the size of the non-rotated ThebesLayer (with the non-rotated frame tree)
2) Create the ThebesLayer (texture) for the non-transformed tree
3) Apply all transforms (rotation/scale etc.) to the ThebesLayer
In that case, if the ThebesLayer content never changes, we should end up just compositing the leaf image with a different transform (rotation) each frame.
How it feels like it actually works:
1) Calculate the size of the rotated ThebesLayer frame subtree
2) Allocate a ThebesLayer of that size
3) Blit the resulting shadow ThebesLayer to the GPU
4) Calculate a new size for the ThebesLayer rotated to a different angle, etc.
It should definitely be the first way. Looking at this on a desktop build (with gDumpPaintList = 1), all the transformed ThebesLayers are consistently 101x100. Any chance you can try to track down why the sizes are changing for you? We will still be hitting the problem of allocating temporary surfaces for ContainerLayers.
I've worked around the temporary surface problem with the patch from bug 689416, so that is not an issue in this case.
Created attachment 562899 [details] Desktop build backtrace
I see the same problem on a desktop build too:
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[62,44]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[54,53]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[96,78]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[79,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[100,90]
SurfaceBufferOGL::CreateBuffer()::328 CreateClampOrRepeatTextureImage, sz[43,80]
.....
This looks to be caused by the leaves being partially clipped by their bounding box. As they move onto the screen we are reallocating the textures for every frame. We should look at a strategy for precaching some of this content to avoid this much texture churn. This sounds very similar to the problem we had with the mobile XUL UI scrolling in, so CC'ing roc and heeen. This isn't specific to 3D transforms; you can create this performance problem with any animated transform.
I don't think we should use hints. I think we should use thresholding so we don't reallocate textures too often, e.g. when the visible region grows, allocate a texture that's a fair bit bigger than the new visible region, so if it grows some more we don't have to allocate again. And when it shrinks, don't reallocate the texture unless the visible region size is a lot smaller than the current texture. Just rounding up to the nearest power of 2 wouldn't always work well if the size oscillates around a power of 2.
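A minimal sketch of that thresholding idea (all names and the exact 25%/50% factors are hypothetical, not taken from the Gecko code): grow the buffer with some slack so small increases don't reallocate, and only shrink when the visible region is much smaller than the current buffer.

```cpp
struct Size { int w, h; };

// Returns true if the current buffer of size `buf` can keep serving a
// visible region of size `vis` without reallocation.
bool CanReuse(Size buf, Size vis) {
  // The buffer must be large enough in both dimensions...
  if (vis.w > buf.w || vis.h > buf.h)
    return false;
  // ...and not wastefully large: shrink only once the visible region
  // drops below half the buffer's area.
  return vis.w * vis.h * 2 >= buf.w * buf.h;
}

// When we must reallocate, overallocate by 25% in each dimension so the
// visible region can keep growing for a while without a new texture.
Size NextBufferSize(Size vis) {
  return Size{vis.w + vis.w / 4, vis.h + vis.h / 4};
}
```

The key property is hysteresis: a size oscillating around any fixed value (including a power of two) stays inside the reuse band instead of triggering a reallocation every frame.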
I'm thinking we might be able to create large textures and actually work on doing a sort of 'memory manager' for textures. If we had a user mode texture memory manager we could drastically reduce the amount of texture creation. The one downside would be clamping and such, but that could be overcome relatively easily, for example by complicating our geometries slightly (i.e. surrounding the quad with an outline the size of the area sampling pixels outside the quad, using the clamped texture coordinate). This approach seems like a more solid long term approach, although it is a little more complicated.
(In reply to Bas Schouten (:bas) from comment #7) > I'm thinking we might be able to create large textures and actually work on > doing a sort of 'memory manager' for textures. If we would have a user mode > texture memory manager we could drastically reduce the amount of texture > creation. Ideally we'd just have a jemalloc that could allocate from a pool to avoid unnecessary fragmentation etc, but from my understanding of jemalloc, that's currently impossible.
(In reply to Joe Drew (:JOEDREW!) from comment #8) > (In reply to Bas Schouten (:bas) from comment #7) > > I'm thinking we might be able to create large textures and actually work on > > doing a sort of 'memory manager' for textures. If we would have a user mode > > texture memory manager we could drastically reduce the amount of texture > > creation. > > Ideally we'd just have a jemalloc that could allocate from a pool to avoid > unnecessary fragmentation etc, but from my understanding of jemalloc, that's > currently impossible. There's the additional problem that you need to be a little smarter: we don't just need contiguous texture memory, we need the texture's surface to 'fit' in the available surface area. Essentially, unlike a 1D system like a heap manager, you're dealing with a 2D system.
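To illustrate why the 2D case is harder than a 1D heap, here is a sketch (purely illustrative, not Gecko code) of the simplest kind of 2D sub-allocator: a shelf packer that hands out rectangles from one large atlas texture. Unlike a heap, free space is constrained by the geometry of previous allocations, not just by total bytes.

```cpp
struct Rect { int x, y, w, h; };

// Simplest possible 2D allocator: fill left-to-right along a "shelf",
// open a new shelf below when the current row is full. Wastes the space
// above short rectangles on a tall shelf, which is exactly the 2D
// fragmentation problem a 1D heap manager never has to think about.
class ShelfPacker {
public:
  ShelfPacker(int width, int height)
    : mWidth(width), mHeight(height), mShelfY(0), mShelfH(0), mCursorX(0) {}

  // Try to place a w*h rectangle; returns false when the atlas is full.
  bool Allocate(int w, int h, Rect* out) {
    if (w > mWidth || h > mHeight)
      return false;
    if (mCursorX + w > mWidth) {
      // Current shelf is full; open a new one below it.
      mShelfY += mShelfH;
      mShelfH = 0;
      mCursorX = 0;
    }
    if (mShelfY + h > mHeight)
      return false;
    *out = Rect{mCursorX, mShelfY, w, h};
    mCursorX += w;
    if (h > mShelfH)
      mShelfH = h;
    return true;
  }

private:
  int mWidth, mHeight;  // atlas dimensions
  int mShelfY, mShelfH; // current shelf position and height
  int mCursorX;         // next free x on the current shelf
};
```

A real texture manager would also need to handle freeing and the sampling/clamping issue Bas mentions, since neighbouring sub-rectangles can bleed into each other when filtering.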
Ideally you wouldn't ever want to create and destroy textures if all you do is rotate and translate images that don't change. I guess trying to keep textures around for at least the next frame would help with the creation/destruction churn, if not with repainting each frame.
I agree with comment 6. That would also ameliorate other issues like window resizing being dog slow (not that anyone cares). We used to see issues like this in fennec too, on page load, where the visible region would steadily grow as new content was added. My only concern is wasted memory, but a quick talos run would be an easy estimate. (And we could statically bound that through our resize policy.)
Also, what does webkit do?
WebKit just allocates all the textures ahead of time and doesn't bother saving memory on that. If an allocation fails, it simply doesn't render the layer.
Created attachment 564457 [details] [diff] [review] WIP overallocation for GL layers
Patch to make OpenGL allocate larger textures than required when the size is increasing. For sizing heuristics, I went with the smaller of 50% larger and 5x the current size increment. We can probably do better than this; it was just an initial test. As mentioned in the comment, we could probably get more information from layout to prevent memory being wasted unnecessarily here. Romaxa: does this help in a noticeable way on mobile?
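The sizing heuristic described above ("smaller of 50% larger and 5x the current size increment") can be sketched per axis roughly like this; the function name and the exact placement of the slack are my own, not from the patch.

```cpp
#include <algorithm>

// Hypothetical per-axis version of the WIP patch's heuristic: when the
// requested size grows past the current allocation, add slack equal to
// the smaller of 50% of the new size and 5x the size increment, so a
// steadily-growing region doesn't reallocate on every frame.
int OverallocatedSize(int oldSize, int newSize) {
  if (newSize <= oldSize)
    return oldSize;  // still fits; no reallocation on this axis
  int increment = newSize - oldSize;
  int slack = std::min(newSize / 2, increment * 5);
  return newSize + slack;
}
```

Small per-frame increments (the clipped-leaf case) get the 5x-increment bound, while a sudden jump in size gets capped at 50% so the overallocation never more than doubles memory use.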
Actually this doesn't have any effect on mobile, because there we use a different code path; I believe it's somewhere around BasicShadowableThebesLayer and the software ThebesLayerBuffer.
Also, I tried the same approach in http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#273 but it did not work as expected; we are still creating buffers on every paint.
Good catch. I think the code we need to modify is: http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/ThebesLayerOGL.cpp#842 This is even worse than the desktop GL situation, because we're resizing the texture when it shrinks as well as when it grows. We should be able to fix this and gain a lot on all resize operations.
Created attachment 565158 [details] [diff] [review] WIP overallocation for ThebesLayerBuffer
I'm struggling to get useful improvements from this without hints about the possible sizes. I've been working with a modified version of falling leaves, with only a single leaf, and the attached patch to view the texture re-allocations. The current version gets us down to around 5 textures allocated as the leaf moves onto the screen (plus two when it passes under the text, from a new layer). This is sometimes slightly higher, depending on the path the leaf takes. The current code is overallocating by huge margins though, so it will waste excessive amounts of video memory. Coming up with a good allocation algorithm that balances texture allocations against memory wastage is the key here, and I'm out of ideas. I'm going to have a look at passing down the full possible bounds of the layer so we can clamp allocations to that size.
What if the layer contents are actually growing though, due to reflow? It'll be hard to predict what the maximum bounds are going to be. What if a) we're very conservative about how we shrink, so at least we don't shrink until a couple of seconds at the smaller size and b) we increase by at least a factor of two every time we increase in a given direction?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > What if the layer contents are actually growing though, due to reflow? It'll > be hard to predict what the maximum bounds are going to be. Hrm, this is true. I was mainly thinking about the case of moving objects that are partially clipped by their container. Things that actually change size make this harder. > > What if a) we're very conservative about how we shrink, so at least we don't > shrink until a couple of seconds at the smaller size I hadn't thought about using time as a factor in shrinking the texture size. It's a bit harder to implement, but probably worth it, since we can then afford to be much more aggressive with our growth strategy. > and b) we increase by > at least a factor of two every time we increase in a given direction? Sounds good to me.
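A sketch of how those two policies could combine for one buffer dimension (the struct, field names, and the 2-second grace period are assumptions for illustration, not the eventual patch):

```cpp
#include <algorithm>

// One dimension of a buffer: grow by at least 2x whenever it must grow,
// and only shrink after the smaller size has persisted for a grace period.
struct BufferDim {
  int size = 0;
  double shrinkCandidateSince = -1; // time we first saw a much smaller request

  // Returns the size to allocate for a request at time `now` (seconds);
  // equal to the current size when no reallocation is needed.
  int Update(int requested, double now, double gracePeriod = 2.0) {
    if (requested > size) {
      size = std::max(requested, size * 2); // aggressive growth
      shrinkCandidateSince = -1;
      return size;
    }
    if (requested * 2 < size) {             // much smaller than the buffer
      if (shrinkCandidateSince < 0) {
        shrinkCandidateSince = now;         // start the shrink timer
      } else if (now - shrinkCandidateSince >= gracePeriod) {
        size = requested;                   // shrink after the grace period
        shrinkCandidateSince = -1;
      }
    } else {
      shrinkCandidateSince = -1;            // request bounced back up
    }
    return size;
  }
};
```

The time-based shrink is what makes the aggressive doubling affordable: a transiently small visible region never costs a reallocation, only a sustained one does.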
Created attachment 567379 [details] [diff] [review] WIP overallocation for ThebesLayerBuffer New version that's much more aggressive in increasing buffer sizes, and uses a timeout to delete these when they stop being needed. I saw some cases of corruption with this earlier, but can't reproduce it now. It probably still exists, working on it. Romaxa: Does this fix the performance issues you are seeing? It results in a lot less allocations for me. I've been testing with a modified version of falling leaves, that only has a single leaf falling. Will upload this if you want to try it.
Created attachment 567599 [details] Allocation log
I tested this patch; it probably makes fewer allocations, but visually it's almost not noticeable, because we still get ~5-10 allocations per second, and that just doesn't make any difference in the end. Here is a dump of the allocations with time(0) included.
Also, with this patch I see the remote Thebes viewport rendering black areas...
I wonder why we can't just pre-allocate the full ThebesLayer size if it's smaller than some sanity value, say the max texture size or the max shmem size, and go with the slow path if it's bigger.
As roc said, we often don't know what the size will be. We could get layout to communicate the current size of the ThebesLayer (not just the visible part), and then use a hybrid approach.
With this patch, is it just while the leaves are first coming into view that we have to reallocate textures a few times? Or is something else happening? It'd be interesting to break down the log to see the history of individual surfaces. We can gather the union of the GetBounds() of the various display items in a ThebesLayer to come up with an estimate of the maximum size, and feed the rectangular bounds of that in as a new Layer property --- say, GetBoundsHint or something like that. I guess if that bounds is smaller than some threshold, or close to the visible bounds, we should use it to set the initial buffer size.
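Sketching the GetBoundsHint idea (the property name is roc's suggestion; the code and the threshold are illustrative): union the bounds of the layer's display items, then only trust the result as an initial buffer size when it isn't wildly bigger than the visible bounds.

```cpp
#include <algorithm>
#include <vector>

struct Rect { int x, y, w, h; };

// Union of a non-empty list of display-item bounds: an estimate of the
// maximum size the ThebesLayer's contents could ever occupy.
Rect UnionBounds(const std::vector<Rect>& items) {
  Rect u = items.front();
  for (const Rect& r : items) {
    int x2 = std::max(u.x + u.w, r.x + r.w);
    int y2 = std::max(u.y + u.h, r.y + r.h);
    u.x = std::min(u.x, r.x);
    u.y = std::min(u.y, r.y);
    u.w = x2 - u.x;
    u.h = y2 - u.y;
  }
  return u;
}

// Use the hint for the initial buffer only when its area is within
// `maxFactor` of the visible area; otherwise it would waste too much
// video memory up front.
bool ShouldUseHint(const Rect& hint, const Rect& visible, int maxFactor = 4) {
  long hintArea = (long)hint.w * hint.h;
  long visArea = (long)visible.w * visible.h;
  return hintArea <= visArea * maxFactor;
}
```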
For an individual leaf, we get around 4-5 (depending on frame rate and the path it takes) allocations as it moves onto the screen. The leaf then falls without needing any drawing, and we eventually delete the software buffer. No changes are made to the texture since we aren't invalidating anything. When the leaf gets moved back to the top (and offscreen), we invalidate and start the process again. So 4-5ish allocations per leaf, per cycle of the animation. If we could avoid the invalidation and keep the leaf texture untouched between cycles we'd benefit a lot here. The bounds hint would also help the first cycle of the animation, but that's probably less important for this particular example.
By the way, why aren't our optimizations kicking in here to use an ImageLayer instead of a ThebesLayer?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28) > By the way, why aren't our optimizations kicking in here to use an > ImageLayer instead of a ThebesLayer? Is there any way to test this optimization? I can't think of anything other than a compiled code test. I ask because this optimization could get broken fairly easily without anybody noticing. Purely hypothetical, but the original patch author could have incorrectly rebased the patch before landing so that this optimization has never worked at all.
We might need to add a test-only API to nsIDOMWindowUtils or something. Or maybe inIDOMUtils.
(In reply to Matt Woodrow (:mattwoodrow) from comment #27) > When the leaf gets moved back to the top (and offscreen), we invalidate and > start the process again. To clarify, we don't invalidate as such, but we don't use the layer for multiple frames and we end up with a new layer when we do need it. Do you think it's worth trying to fix this? Or should we just focus on minimizing the allocations while it's visible?
I checked falling leaves with the patch from bug 695275, and it looks much better, but we still get some ImageLayer re-allocations when a leaf goes off the bottom of the screen (dropped) and appears again at the top (a layer of the same size is created again). Shall we recycle image layers here? Another problem is that a leaf sometimes appears (possibly offscreen) as an opaque image and later turns into alpha layer content, which causes an additional reallocation.
I think probably ImageLayerOGL (or maybe more of GL layers) should cache and recycle buffers. That seems simpler and more flexible than trying to keep layers around for offscreen elements.
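A minimal sketch of such a size-keyed recycle pool (illustrative only; the real ImageLayerOGL code would pool GL texture handles, and these names are invented): instead of destroying a buffer when its layer goes away, park it and hand it back to the next allocation of the same size.

```cpp
#include <map>
#include <utility>
#include <vector>

using BufferId = int; // stand-in for a real texture/surface handle

class RecyclePool {
public:
  // Fetch a cached buffer of exactly w*h, or create a new one.
  BufferId Acquire(int w, int h) {
    auto& bucket = mFree[{w, h}];
    if (!bucket.empty()) {
      BufferId id = bucket.back();
      bucket.pop_back();
      return id; // recycled: no new allocation
    }
    ++mCreated;
    return mNextId++;
  }

  // Return a buffer to the pool instead of destroying it.
  void Release(int w, int h, BufferId id) { mFree[{w, h}].push_back(id); }

  int CreatedCount() const { return mCreated; }

private:
  std::map<std::pair<int, int>, std::vector<BufferId>> mFree;
  BufferId mNextId = 1;
  int mCreated = 0;
};
```

Pooling at this level makes the leaf cycle free after the first pass: the layer for a dropped leaf dies, but its same-sized buffer is immediately reused when the leaf reappears at the top. A real pool would also need an eviction policy (e.g. the timeout idea from the WIP patch) so idle buffers don't pin video memory forever.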
(In reply to Oleg Romashin (:romaxa) from comment #32) > Another problem is that sometimes leaf appear (offscreen possibly) as opaque > image, and later turn into alpha layer content type, and that cause > additional reallocation Can we get around this by always using the same texture type and only varying the shader depending on whether the image is opaque or not?
In this particular example we're creating image layers when the visible region is empty, so we could just fix that. A more general fix for content type changes would probably be good too though.
> recycle buffers. That seems simpler and more flexible than trying to keep > layers around for offscreen elements. Then we need to cache ShadowableImageLayer buffers too... currently recycling is implemented only for GL layers...
> Can we get around this by always using the same texture type and only > varying the shader depending on whether the image is opaque or not? That would mean 2x more memory used per image on mobile for opaque images (like plugins).
Also, if we don't cache ImageLayers, then we need to add recycle cache trackers for all backends: OGL, D3D, Basic, plus ShadowableImageLayer buffers.
(In reply to Oleg Romashin (:romaxa) from comment #38) > > Can we get around this by always using the same texture type and only > > varying the shader depending on whether the image is opaque or not? > that would bring us 2x more memory used per image on mobile, for opaque > images (like plugins) Right, OK. I keep forgetting about 16-bit vs 32-bit. I think caching at the level closest to the allocation is probably best, as long as the rest of the stack doesn't add much overhead.
> long as the rest of the stack is not much overhead. The IPC ShadowableLayers stack is a bit more expensive (Create/Transform/Paint transactions), but with the recent Init/Swap API changes it should be faster, so it's probably OK.
Created attachment 567944 [details] Opreport with ImageLayer + cached buffers
OK, I've added a simple buffer cache tracker for image layers (Shadow and OGL) and got it working much faster, but it's still very slow compared to how WebKit2 renders the same thing. In Fennec we get 12 FPS with all of the CPU used; see the plain opreport. In a WebKit2 browser with HW acceleration on the same device I see 30 FPS with 30% of the CPU free.
Surfaces are cached, but we are still dropping layers, and that causes repainting (uploading) to the GPU. That's not a big deal as long as we do it without expensive work in layout.
Webkit can hand CSS animations over to the compositing layer, which probably helps a lot in this testcase. Bug 524925 might help us here too. A hierarchical profile would be very useful.
> Bug 524925 might help us here too. This build already includes those changes, but I'm not sure it helps in this case.
In the N9 default browser the leaves animation works even with the child process suspended, so all textures are uploaded to the UI process and the animation runs smoothly. After that there are no interactions with the content process at all, only texture transformations in the UI process.
Created attachment 567951 [details] Text callgraph profile
Looks like no reflow involved (good, bug 524925 is working), lots of restyling and invalidation. To avoid that we need to offload animations to the layer system.
What's the plan for this work?
Comment on attachment 567379 [details] [diff] [review] WIP overallocation for ThebesLayerBuffer
I guess this patch is outdated, and there are other solutions for this problem now. Also, it wasn't very helpful.