Open Bug 945082 Opened 6 years ago Updated Last year

Improve the performance when we use an IntermediateSurface

Categories

(Core :: Graphics, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking


tracking-b2g +

People

(Reporter: timdream, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=])

Attachments

(13 files, 4 obsolete files)

1.41 MB, text/html
Details
1.41 MB, text/html
Details
1.78 MB, text/html
Details
1.67 MB, text/html
Details
6.37 MB, text/html
Details
29.19 KB, image/png
Details
31.12 KB, image/png
Details
3.11 KB, patch
nical
: review+
Details | Diff | Splinter Review
2.91 KB, patch
Details | Diff | Splinter Review
625 bytes, text/html
gweng
: feedback+
jerry
: feedback-
Details
3.27 KB, patch
Details | Diff | Splinter Review
24.84 KB, patch
Details | Diff | Splinter Review
7.51 KB, patch
roc
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #919410 +++

Unfortunately, after bug 919410 and bug 937630, the lock screen fade-out animation still does not reach 60fps. This is very observable on a Peak. We need to start the investigation again.

Blocking: Should block the release because Andreas said the lock screen has to be perfect.
Observation: 

If you slide your finger very slowly all the way to the edge and only release your finger once the icons return from their highlighted state, the transition will be smoother, but still not 60fps.

There is already an observable built-in delay between the icons leaving their highlighted state and the fade-out transition, so I am not entirely sure this is fixable in Gaia.

Greg, BenWa, any idea?
Flags: needinfo?(gweng)
Flags: needinfo?(bgirard)
Solving bug 938737 here will help. Currently we have to prepare the background app when the user has chosen to unlock the screen. This is partly responsible for the delay.
Flags: needinfo?(bgirard)
Update: per discussion on IRC, BenWa will "profile again this week and see if the bugs we have in flight will do a sufficient dent or if we need to focus some efforts into this."

We also need to figure out the reason (or any performance gain) we had when we decided to discard the app rendering in the first place. I have little information besides what cjones said and did :-(

Another idea of mine is to offload frame.setVisible() a little more: somehow patch the mozbrowser API to allow the system app to tell the platform, hey, please repaint the app but do not resume it. That would give us app rendering before unlock, but the app would not eat away the CPU/GPU we need for the fade-out transition.

The last idea (which I consider a workaround) is to cover the app frame with a screenshot. However I suspect this will be overly complex and might not give us the desired performance gain (think about the extra DOM and in-memory image blob, etc.)
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #3)
> The last idea (which I considered as an workaround) is to cover the app
> frame with screenshot. However I suspect this will be over-complex and might
> not give us desired performance gain (think about the extra DOM and image
> blob in-memory, etc.)

The good news is Alive told me AppWindow already comes with this feature built-in. So the level of effort to do this in Gaia might not be as heavy as I thought.

Vivien, Alive told me there is some work going on on this front to make the screenshot itself faster, or to allow rendering to stay in the frame? I'm not entirely sure. Could you shed some light here?
Flags: needinfo?(21)
What's the fps now, and how are we measuring it?
I am estimating with my eyeball and it's about 5 fps. I don't care about a mathematically exact 60fps, but what we have right now is not good enough. This is a Peak/Keon. If this is only this one device, I am ok with not blocking on it, but I would like to make sure our shipping devices are ok.
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #4)
> (In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from
> comment #3)
> > The last idea (which I considered as an workaround) is to cover the app
> > frame with screenshot. However I suspect this will be over-complex and might
> > not give us desired performance gain (think about the extra DOM and image
> > blob in-memory, etc.)
> 
> The good news is Alive told me AppWindow have already come with this feature
> built-in. So the level of effort to do this in Gaia might not as heavy as I
> think.
> 
> Vivien, Alive told me there are some work going on this front for making
> screenshot itself faster or allowing rendering to stay in the frame? Not
> entirely sure. Could you shed some light here?

There is some work by the layout/gfx team on something they call the Screenshot API. As far as I understand, it will basically return a copy of the buffer that is currently displayed in the compositor for the app, and will avoid the parent->child->parent round-trip which uses canvas. (I don't have the bug number handy.)

About allowing rendering to stay in the frame: the current thinking is that it takes too much memory, but I would like to re-discuss that with Roc, given that we currently take a screenshot for the cards view and this also forces us to slow down some basic interactions, like clicking the home button, to avoid some ugliness.
Flags: needinfo?(21)
The only thing I can do is to turn off the icon animation (light on/off), and it's the only thing I haven't tested yet. I'll try to do this and report the result to Tim.
Flags: needinfo?(gweng)
(In reply to Andreas Gal :gal from comment #6)
> I am estimating with my eyeball and its about 5fps.

That's not OK. We should expect a delay in starting the OMT animation (preparing the background), but once started it should be 60 FPS.

(In reply to Vivien Nicolas (:vingtetun) (:21) (RTO - Review Time Off until 09/12/13) from comment #7)

Screenshotting for something like this is a backup solution.

I have a bit of time so I'm going to try to understand why we throw the app layer tree away and if we can avoid doing so while still pausing the app.
(In reply to Andreas Gal :gal from comment #6)
> If this is only this one device, I am ok with not blocking on it,
> but I would like to make sure our shipping devices are ok.

Triage: the v1.3 shipping device will be available next week; we will verify then. But we should work under the assumption that we need to support bigger phones.
Update: just received the shipping device in the office, however its DPI is set incorrectly. Will verify by hacking the Gecko DPI settings.
Assignee: nobody → timdream
Update: I wasn't able to set the dppx with user.js or prefs.js, so I will try to rebuild Gecko.
Can you file a bug? We can probably read the right dppx from the hardware layer, or at the very least we want a pref.
(In reply to Andreas Gal :gal from comment #13)
> Can you file a bug? We can probably read the right dppx from the hardware
> layer, or at the very least we want a pref.

We could get the right dppx from the hardware layer; it's just that the DPI supplied by Gonk doesn't come with the right value. Steven has been told about the issue. Steven, is this issue being tracked on Bugzilla?

The pref exists; it's |layout.css.devPixelsPerPx|. It's just that somehow Gecko on the vendor's phone does not respond to it; in fact, when I stop b2g and change /data/b2g/mozilla/*/default.prefs.js, it gets overwritten every time Gecko restarts.
Flags: needinfo?(styang)
Something sounds borked with the vendor build. The HWC has xdpi and ydpi fields we can query. I assume we are doing that. Are we?
(In reply to Andreas Gal :gal from comment #15)
> Something sounds borked with the vendor build. The HWC has xdpi and ydpi
> fields we can query. I assume we are doing that. Are we?

Yes, https://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsWindow.cpp#534
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #14)
> (In reply to Andreas Gal :gal from comment #13)
> > Can you file a bug? We can probably read the right dppx from the hardware
> > layer, or at the very least we want a pref.
> 
> We could get the right dppx from the hardware layer, it's just that the DPI
> supplied from Gonk does comes with the right value. Steven has been told
> about the issue. Steven, is this issue being tracked on Bugzilla?
> 
It's fixed in the partner's vendor build.

> The pref exists, it's |layout.css.devPixelsPerPx|. It's just that somehow
> Gecko in the phone from vendor does not respond to it; in fact when I stop
> b2g and change /data/b2g/mozilla/*/default.prefs.js, it got overwritten
> every time Gecko restarts.
Flags: needinfo?(styang)
(In reply to Steven Yang [:styang] from comment #17)
> it's fixed in partner's vendor build.
> 

Good, will work with you to get my build updated.

I also happened to see that the animation smoothness actually depends on the number of DOM nodes in the lock screen div; for example, if I have music controls on the lock screen, the fade-out animation is never smooth. Not sure what to do about it.

Our best shot now is bug 938737, and maybe bug 940842 (will-animate). The latter will not reach 1.3.
Depends on: 938737
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #18)
> (In reply to Steven Yang [:styang] from comment #17)
> > it's fixed in partner's vendor build.
> > 
> 
> Good, will work with you to get my build updated.
>

Update: Steven told me the partner has fixed their build, but they haven't released it yet.
Today Jerry and I tried to find the root cause of this bug. This is what we discovered today:

(Tested on Flatfish with Gecko 29, "almost" newest Gaia (1/06))

1. With both transform & opacity animations, the tasks are all handled by the app's main thread, and the performance is very poor.

2. With only transform, the animation is smooth and the tasks run off the main thread (OMTA). There are also fewer tasks than in case #1.

3. With only opacity, the animation is worse than case #2 but better than case #1. The log shows the tasks crowding the main thread, too.

4. With no animation, of course there's no lag.

So it seems the root cause is the opacity animation, and OMTA fails when there are two animations.
The trace log can be opened with Chrome (until bug 951418). And here is my branch with the test app:

https://github.com/snowmantw/gaia/tree/issue938738

(I modified the template app, so just launch it and you can see the result).
(In reply to Greg Weng [:snowmantw] from comment #20)
> Today Jerry and I tried to find the root cause of this bug. This is what we
> discovered today:
> 
> (Tested on Flatfish with Gecko 29, "almost" newest Gaia (1/06))
> 
> 1. With both transform & opacity animation, the tasks are all handled by the
> main thread of the app, and then the performance would be very poor.
> 
> 2. With only transform, the animation can go smooth, and the tasks would be
> OMTA. They're also fewer then than the case #1.
> 

I only realized you were talking about OMTA after re-reading the comment.
So we have fallen off OMTA again? That shouldn't happen, since we only transition opacity and transform.
Jerry, do we have enough information to tell why OMTA is not enabled in scenario (1)?
Flags: needinfo?(hshih)
I can't be sure whether we use OMTA here or not.
Transform and opacity animations should use OMTA.

I need to check these functions to see if we take the OMTA path:
nsDisplayList.cpp:
AddAnimationsAndTransitionsToLayer()

nsLayoutUtils.cpp
nsLayoutUtils::HasAnimationsForCompositor()


In the attachment 8356480 [details] case (opacity anim only), we can see it costs a lot of time (about 200 ms) across two ClientThebesLayer::PaintThebes() calls for the "Template" app.
In the attachment 8356481 [details] case (transform anim only), it calls ClientThebesLayer::PaintThebes() (about 90 ms) only once.

The CompositorParent::Composite() times are also different in these two cases (40 ms vs 5 ms).
I need to check why the opacity anim costs so much.
Flags: needinfo?(hshih)
BenWa, can you help here?
Flags: needinfo?(bgirard)
One thing to check is that the opacity is actually on a layer. It is possible that the opacity is being set on a part of the layer (not the whole layer) and then we will not get OMTA (or other optimisations).
No longer depends on: 938737
See Also: → 938737
blocking-b2g: 1.3? → 1.3+
It may take a couple of days for BenWa to get to this; he has to cover APZC issues until :cwiiis comes back from PTO.
De-assigning myself since this looks more like a platform issue than a Gaia one.

Jerry, are you still actively investigating this issue? Could you be the assignee here?
Assignee: timdream → nobody
Component: Gaia::System::Lockscreen → General
Flags: needinfo?(hshih)
Assignee: nobody → hshih
Flags: needinfo?(hshih)
Yes, I'm still trying to narrow down this problem.
I'm using the LayerScope and profiler tools to find the performance bottleneck.
Component: General → Graphics
Product: Firefox OS → Core
Whiteboard: [FT:System-Platform]
Attached file o_bezier_5s.html
My repo is the latest mozilla-central (hg revision 163112) and I get a different result compared with attachment 8356480 [details].

My test case is a 5-second CSS opacity animation. The compositor costs about 16 ms per frame, but it costs about 40~60 ms per frame in attachment 8356480 [details].

I will create another test mixing CSS translate and opacity anims (as the lockscreen uses).

--
If we have an opacity anim, b2g will create a new render target for composition.

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#224

We can see this block spends a lot of time in the profiler. I will check what we do in this block.
Hi Nicholas,

Should we use useIntermediateSurface for a container layer which has an opacity value?
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#900

If we use useIntermediateSurface, we create a temp surface, draw all the container layer's children into it, and finally apply just one alpha operation to the temp surface.
Even though we reduce the alpha operations, we spend a lot of time creating a new render target, and we stall the GPU pipeline by setting the new render target.

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#224
create render target:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#263
set render target:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#273
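
For reference, a minimal sketch of the two composition strategies being compared, with hypothetical stand-in types rather than the real gfx/layers classes:

#include <vector>

// Illustrative stand-ins for the real gfx/layers types.
struct Layer { float opacity; };
struct RenderTarget {};

RenderTarget* CreateRenderTarget() { return new RenderTarget(); } // expensive: allocation
void SetRenderTarget(RenderTarget*) {}                            // expensive: GPU pipeline stall
void DrawLayer(const Layer&, float /*opacity*/) {}
void DrawSurface(RenderTarget*, float /*opacity*/) {}

// Path 1: group opacity via an intermediate surface. One alpha blend for
// the whole group, but a render target switch per container.
void CompositeWithTempSurface(const std::vector<Layer>& aChildren,
                              RenderTarget* aScreen, float aGroupOpacity)
{
  RenderTarget* temp = CreateRenderTarget();
  SetRenderTarget(temp);
  for (const Layer& child : aChildren) {
    DrawLayer(child, 1.0f);          // children drawn at full opacity
  }
  SetRenderTarget(aScreen);
  DrawSurface(temp, aGroupOpacity);  // single alpha operation
  delete temp;
}

// Path 2: no intermediate surface. One alpha operation per child, but the
// render target never changes.
void CompositeDirectly(const std::vector<Layer>& aChildren,
                       RenderTarget* aScreen, float aGroupOpacity)
{
  SetRenderTarget(aScreen);
  for (const Layer& child : aChildren) {
    DrawLayer(child, aGroupOpacity);
  }
}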

I tried composing all layers individually, and I get better performance with the OpenGL compositor.
This is my result:
1. use temp surface(original)
attachment 8360368 [details]
2. no temp surface
attachment 8360365 [details]
Flags: needinfo?(ncameron)
(In reply to Jerry Shih[:jerry] from comment #36)
> Hi Nicholas,
> 
> Should we use useIntermediateSurface for a container layer which has opacity
> value?
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#900
> 
> If we use useIntermediateSurface, we will create a temp surface and draw all
> the container layer's children into the temp surface. Finally, just apply
> one alpha operation for the temp surface.
> Even though we reduce the alpha operation, we spend a lot of time for
> creating a new render target and stall the gpu pipeline by setting the new
> render target.

This is interesting data, thank you!

It sounds like we want to tweak our use of useIntermediateSurface; we should not abandon it completely. It brought some benefits on desktop, I think (otherwise we wouldn't use it at all). It also allows us to do OMTA on opacity very cheaply.

If you don't useIntermediateSurface, how are you doing the opacity adjustment? I didn't think we had a mechanism in place to do that on the compositor side.
Flags: needinfo?(ncameron)
I forgot to explain my test environment.
I have a full-screen image with CSS animations combining transform (scale from 1x to 2x) and opacity (alpha from 1.0 to 0.0) on this image.
The test (comment 36) is on a Nexus 4 (1280*800) device.
I think low-resolution devices will not get as large a boost.

--------

(In reply to Nick Cameron [:nrc] from comment #37)
> (In reply to Jerry Shih[:jerry] from comment #36)
> It sounds like we want to tweak our use of useIntermediateSurface, we should not abandon it completely. It
> brought some benefits on desktop, I think (otherwise we wouldn't use it at all). It also allows us to do OMTA

Can we use IsCompositingCheap() to check whether we have a GPU backend? I think the alpha operation is not a large cost on the GPU.

if (IsCompositingCheap()) {
  useIntermediateSurface = false;
}

> If you don't useIntermediateSurface, how are you doing the opacity
> adjustment? I didn't think we had a mechanism in place to do that on the
> compositor side.

I just modified this condition: don't use an intermediate surface even if (opacity != 1.0f && HasMultipleChildren()).

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#899


I don't know whether this change produces the correct rendering result.

case 1: use temp surface
1. create temp surface
2. draw each child in container
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#286
3. draw the temp surface back to the screen surface
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#332

case 2: don't use temp surface
1. draw each child in container into the screen surface
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/ContainerLayerComposite.cpp#286

I think these two cases should produce the same result. If I've lost some important alpha-value calculation from the OMTA mechanism, we will get the wrong screen, but we can fix that.
I would like to just do the alpha operation for each child and not break the OpenGL rendering pipeline (i.e., not set a new render target).
Flags: needinfo?(ncameron)
By the way, the homescreen's first frame still costs a lot.
I need more time to find a solution for the homescreen's rendering.
I was incorrect before. Opacity does not rely on getting an intermediate surface, although you won't get correct results (I think) by just ignoring a request for an intermediate surface because that affects how the effective opacity is calculated.

Anyway, you should not be getting an intermediate surface here in any case. So rather than ignoring it on the compositor, you should try to find out why you have one in the first place. You should only need one if you have a mask layer (caused by border-radius) or there is a 3D transform on the layer (and, I think, only one that can't be replaced with a 2D transform, i.e., involves perspective). See http://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#883 for where the need for an intermediate surface should be computed.
Flags: needinfo?(ncameron)
(In reply to Nick Cameron [:nrc] from comment #40)
> I was incorrect before. Opacity does not rely on getting an intermediate
> surface, although you won't get correct results (I think) by just ignoring a
> request for an intermediate surface because that affects how the effective
> opacity is calculated.

Yes, I can't pass the reftests by just ignoring the intermediate surface request.
It looks like there are two problems: intermediate surfaces and the layer building. Layer building will be solved by will-change (bug 940842). Jerry, you should focus on the intermediate surfaces. Can you find out exactly why we are seeing them?
Flags: needinfo?(bgirard)
I tracked down the first usage of UseIntermediateSurface() in the hg log.
The first commit came from roc.

Hi roc,
Do you know why we use the UseIntermediateSurface() mechanism?
Does it improve performance (I think it can reduce alpha operations), or is there another reason?
Flags: needinfo?(roc)
I can take this one.

We use an intermediate surface when we have an alpha value applied to a container layer (group opacity) or a non-trivial 3D transform, as described in comment 40.

It makes performance worse because we have to do more work, but it's sometimes required to render the page properly. Can you find out why UseIntermediateSurface is true? Ideally, we should be able to identify what on the page is causing it and avoid it.
Flags: needinfo?(roc)
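
For reference, a rough condensation of the conditions described in comments 40 and 44, using illustrative stand-in names rather than the real Layers.cpp API:

struct LayerInfo {
  bool hasMaskLayer;              // e.g. caused by border-radius
  bool has3DPerspectiveTransform; // 3D transform not reducible to 2D
  float opacity;
  bool hasMultipleChildren;
};

bool NeedsIntermediateSurface(const LayerInfo& aLayer)
{
  if (aLayer.hasMaskLayer) {
    return true;
  }
  if (aLayer.has3DPerspectiveTransform) {
    return true;
  }
  // Group opacity over several children also forces a temp surface; this is
  // the (opacity != 1.0f && HasMultipleChildren()) condition discussed below.
  return aLayer.opacity != 1.0f && aLayer.hasMultipleChildren;
}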
Finally, I know why we use the temp surface. Thanks to BenWa and Matt.

http://help.adobe.com/en_US/illustrator/cs/using/images/pt_11.png
The left image applies opacity to each circle separately, and the right image applies it to the group.
If I remove the temp surface condition, it's hard to get the right-hand result.

The lockscreen fade-out anim uses a group opacity animation, so we take the temp surface code path.
I will try using an opacity animation on every element instead of on the group and look at the profiler result.

I also want to add another condition to the temp surface check.
If the children in a group do not overlap, we can skip allocating a temp surface and compose each child individually. I will try the reftests first. If this idea is wrong, please let me know. Thanks!

B2G only checks (opacity != 1.0f && HasMultipleChildren())
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#899
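
For reference, a minimal sketch of the proposed overlap check, with assumed simple types standing in for the real Layer/gfxRect classes; the idea is that the temp surface can be skipped whenever no two children's transformed visible bounds intersect:

#include <cstddef>
#include <vector>

struct Rect {
  float x, y, w, h;
  bool IsEmpty() const { return w <= 0 || h <= 0; }
  bool Intersects(const Rect& aOther) const {
    return x < aOther.x + aOther.w && aOther.x < x + w &&
           y < aOther.y + aOther.h && aOther.y < y + h;
  }
};

// aChildBounds holds each child's visible bounds, already transformed by
// that child's local matrix (the shared parent matrix can be ignored since
// it is applied uniformly to every child afterwards).
bool ChildrenOverlap(const std::vector<Rect>& aChildBounds)
{
  for (size_t i = 0; i < aChildBounds.size(); ++i) {
    if (aChildBounds[i].IsEmpty()) {
      continue;
    }
    for (size_t j = i + 1; j < aChildBounds.size(); ++j) {
      if (aChildBounds[j].IsEmpty()) {
        continue;
      }
      if (aChildBounds[i].Intersects(aChildBounds[j])) {
        return true;
      }
    }
  }
  return false;
}

// The container condition would then become, roughly:
//   useIntermediateSurface = opacity != 1.0f && HasMultipleChildren()
//                            && ChildrenOverlap(childBounds);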
Note that group opacity uses a framebuffer object, which is very slow on the PowerVR-based Flatfish device.
Do we know why we have multiple layers inside the opacity group, and what they are for?

We might be able to get down to a single layer (or multiple non-overlapping layers with Jerry's suggestion in comment 45) with some gaia tweaks.

It might also be nice to add something to the layer border debugging tool that visually shows when we get an intermediate surface created, since we know this to be a badly performing path.
(In reply to Matt Woodrow (:mattwoodrow) from comment #47) 
> It might also be nice to add something to the layer border debugging tool
> that visually shows when we get an intermediate surface created, since we
> know this to be a badly performing path.

This is a very good idea. We'll add this function to the LayerScope tool enhancement plan.
I will also add one more layer border color for temp surfaces.
From the Gaia side, one way we might solve this (with Gecko's help) is to take a screenshot and animate it. This is equivalent to manually blending and compositing all the DOM elements. However, the issue is that taking a screenshot currently takes a long time, and this may reduce the benefits of this solution.
Duplicate of this bug: 956621
Show the intermediate surface region in the Settings "draw layer borders" debug option.
The border color is yellow.
This is the lockscreen's layer dump.
The ref layer is the homescreen.
The container which uses a temp surface is marked "######" below.

layer  ContainerLayerComposite (0x49766800) [shadow-visible=< (x=0, y=0, w=768, h=1280); >] [visible=< (x=0, y=0, w=768, h=1280); >] [opaqueContent] [metrics={ viewport=(x=0.000000, y=0.000000, w=384.000000, h=640.000000) viewportScroll=(x=0.000000, y=0.000000) displayport=(x=0.000000, y=0.000000, w=0.000000, h=0.000000) scrollableRect=(x=0.000000, y=0.000000, w=384.000000, h=640.000000) scrollId=0 }]
layer    ThebesLayerComposite (0x49767000) [shadow-clip=(x=0, y=0, w=0, h=0)] [clip=(x=0, y=0, w=0, h=0)] [not visible]
layer    ColorLayerComposite (0x49768000) [not visible] [opaqueContent] [color=rgba(0, 0, 0, 1)]
layer    ContainerLayerComposite (0x49767400) [shadow-clip=(x=0, y=0, w=768, h=1280)] [shadow-visible=< (x=0, y=0, w=768, h=1280); >] [clip=(x=0, y=0, w=768, h=1280)] [visible=< (x=0, y=0, w=768, h=1280); >] [opaqueContent]
layer      ThebesLayerComposite (0x49767800) [shadow-visible=< (x=0, y=40, w=768, h=1240); >] [visible=< (x=0, y=40, w=768, h=1240); >] [opaqueContent] [valid=< (x=0, y=40, w=768, h=1240); >]
layer        DeprecatedContentHostDoubleBuffered (0x45f3dfd0) [buffer-rect=(x=0, y=0, w=768, h=1280)] [buffer-rotation=(x=0, y=0)]
layer          GrallocDeprecatedTextureHostOGL (0x4547dbc0) [size=(width=768, height=1280)] [format=SurfaceFormat::B8G8R8X8] [flags=TEXTURE_USE_NEAREST_FILTER]
layer          GrallocDeprecatedTextureHostOGL (0x4547de80) [size=(width=768, height=1280)] [format=SurfaceFormat::B8G8R8X8] [flags=TEXTURE_USE_NEAREST_FILTER]
layer      RefLayerComposite (0x434f5800) [shadow-clip=(x=0, y=40, w=768, h=1140)] [shadow-transform=[ 1 0; 0 1; 0 40; ]] [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [clip=(x=0, y=40, w=768, h=1140)] [transform=[ 1 0; 0 1; 0 40; ]] [visible=< (x=0, y=0, w=768, h=1140); >] [id=2]
layer        ContainerLayerComposite (0x496e3000) [shadow-transform=[ 1 0; 0 1; 0 0; ]] [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [postScale=1, 1] [transform=[ 1 0; 0 1; 0 0; ]] [visible=< (x=0, y=0, w=768, h=1140); >] [metrics={ viewport=(x=0.000000, y=0.000000, w=384.000000, h=570.000000) viewportScroll=(x=0.000000, y=0.000000) displayport=(x=0.000000, y=0.000000, w=0.000000, h=0.000000) scrollableRect=(x=0.000000, y=0.000000, w=384.000000, h=570.000000) scrollId=2 }] [preScale=1, 1]
layer          ThebesLayerComposite (0x496e3800) [shadow-visible=< (x=0, y=0, w=768, h=1140); >] [visible=< (x=0, y=0, w=768, h=1140); >] [valid=< (x=0, y=0, w=768, h=1140); >]
layer            DeprecatedContentHostDoubleBuffered (0x478876d0) [buffer-rect=(x=0, y=0, w=768, h=1140)] [buffer-rotation=(x=0, y=0)]
layer              GrallocDeprecatedTextureHostOGL (0x49a18fc0) [size=(width=768, height=1140)] [format=SurfaceFormat::B8G8R8A8] [flags=NoFlags]
layer              GrallocDeprecatedTextureHostOGL (0x49a195c0) [size=(width=768, height=1140)] [format=SurfaceFormat::B8G8R8A8] [flags=NoFlags]
layer      ContainerLayerComposite (0x45f21800) [shadow-clip=(x=0, y=0, w=768, h=1280)] [shadow-transform=[ 1.37656 0; 0 1.37656; -144.6 -240.999; ]] [shadow-visible=< (x=0, y=0, w=1536, h=2560); >] [clip=(x=0, y=0, w=768, h=1280)] [transform=[ 1.00748 0; 0 1.00748; -2.87402 -4.79004; ]] [visible=< (x=0, y=0, w=1536, h=2560); >] [preScale=0.5, 0.5]
######### this layer has group-opacity
layer        ContainerLayerComposite (0x45f5ac00) [shadow-transform=[ 2 0; 0 2; 0 0; ]] [shadow-visible=< (x=0, y=0, w=1536, h=2560); >] [transform=[ 2 0; 0 2; 0 0; ]] [visible=< (x=0, y=0, w=1536, h=2560); >] [opacity=0.992516] [opaqueContent] [usesTmpSurf] [preScale=0.5, 0.5]
layer          ThebesLayerComposite (0x45f93c00) [shadow-visible=< (x=0, y=0, w=1536, h=2560); >] [visible=< (x=0, y=0, w=1536, h=2560); >] [opaqueContent] [valid=< (x=0, y=0, w=1536, h=2560); >]
layer            DeprecatedContentHostDoubleBuffered (0x45f3edd0) [buffer-rect=(x=0, y=0, w=1536, h=2560)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
layer              GrallocDeprecatedTextureHostOGL (0x433c0d00) [size=(width=1536, height=2560)] [format=SurfaceFormat::B8G8R8X8] [flags=NoFlags]
layer              GrallocDeprecatedTextureHostOGL (0x433c0a00) [size=(width=0, height=0)] [format=SurfaceFormat::UNKNOWN] [flags=NoFlags]
layer          CanvasLayerComposite (0x45f94000) [shadow-clip=(x=0, y=0, w=1536, h=2560)] [shadow-transform=[ 1 0; 0 1; 0 1100; ]] [shadow-visible=< (x=0, y=0, w=768, h=160); >] [clip=(x=0, y=0, w=1536, h=2560)] [postScale=2, 2] [transform=[ 1 0; 0 1; 0 1100; ]] [visible=< (x=0, y=0, w=768, h=160); >]
layer            ImageHost (0x433c0300) [picture-rect=(x=0, y=0, w=0, h=0)]
layer              StreamTextureHostOGL (0x433e3b00) [size=(width=768, height=160)] [format=SurfaceFormat::R8G8B8A8] [flags=TEXTURE_NEEDS_Y_FLIP]
layer          ThebesLayerComposite (0x45f95400) [shadow-visible=< (x=1148, y=2300, w=120, h=20); (x=280, y=2320, w=96, h=80); (x=1148, y=2320, w=120, h=80); (x=1148, y=2400, w=120, h=20); >] [visible=< (x=1148, y=2300, w=120, h=20); (x=280, y=2320, w=96, h=80); (x=1148, y=2320, w=120, h=80); (x=1148, y=2400, w=120, h=20); >] [valid=< (x=280, y=2300, w=988, h=120); >]
layer            DeprecatedContentHostDoubleBuffered (0x40663270) [buffer-rect=(x=280, y=2300, w=988, h=120)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
layer              GrallocDeprecatedTextureHostOGL (0x433c2300) [size=(width=988, height=120)] [format=SurfaceFormat::B8G8R8A8] [flags=NoFlags]
layer              GrallocDeprecatedTextureHostOGL (0x433c2240) [size=(width=0, height=0)] [format=SurfaceFormat::UNKNOWN] [flags=NoFlags]
layer      ThebesLayerComposite (0x45f95800) [shadow-visible=< (x=0, y=0, w=768, h=40); >] [visible=< (x=0, y=0, w=768, h=40); >] [opaqueContent] [valid=< (x=0, y=0, w=768, h=40); >]
layer        DeprecatedContentHostDoubleBuffered (0x45fbe970) [buffer-rect=(x=0, y=0, w=768, h=40)] [buffer-rotation=(x=0, y=0)]
layer          GrallocDeprecatedTextureHostOGL (0x433c2640) [size=(width=768, height=40)] [format=SurfaceFormat::B8G8R8X8] [flags=TEXTURE_USE_NEAREST_FILTER]
layer          GrallocDeprecatedTextureHostOGL (0x433c2380) [size=(width=0, height=0)] [format=SurfaceFormat::UNKNOWN] [flags=TEXTURE_USE_NEAREST_FILTER]
Attached patch layer.patch (obsolete) — Splinter Review
Check the container children's overlap region.
If there is no overlap, we can compose these layers without an intermediate surface.

--
This patch can't pass the reftests, but I think the idea is right.
Is there another case which breaks my rule?

What's wrong with my visible-rect calculation?
I think we can just use the children's local transform matrices. We don't need to apply the parent's matrix, because each child will apply the same parent matrix later.
Comment on attachment 8365778 [details] [diff] [review]
show_temp_surface.patch

Review of attachment 8365778 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, nical?
Attachment #8365778 - Flags: review?(nical.bugzilla)
Attachment #8365778 - Flags: review?(nical.bugzilla) → review+
(In reply to Jerry Shih[:jerry] from comment #52)
> ######### this layer has group-opacity
> layer        ContainerLayerComposite (0x45f5ac00) [shadow-transform=[ 2 0; 0
> 2; 0 0; ]] [shadow-visible=< (x=0, y=0, w=1536, h=2560); >] [transform=[ 2
> 0; 0 2; 0 0; ]] [visible=< (x=0, y=0, w=1536, h=2560); >] [opacity=0.992516]
> [opaqueContent] [usesTmpSurf] [preScale=0.5, 0.5]

This is really silly. We're using an intermediate surface to set an opacity of 0.99, which is not visibly different from 1.0. We should make sure we don't have an 'opacity: 0.99' somewhere in gaia :(.
Comment on attachment 8366644 [details] [diff] [review]
layer.patch

Review of attachment 8366644 [details] [diff] [review]:
-----------------------------------------------------------------

I recommend writing several reftests to cover the various cases of this. You have the right approach; I can't find anything wrong. Once you have reftests you can use them to debug.

::: gfx/layers/Layers.cpp
@@ +876,5 @@
> +      if (clipRect2 && clipRect2->IsEmpty())
> +        continue;
> +      if (child2->GetVisibleRegion().IsEmpty())
> +        continue;
> +      gfxRect child1_bound=child1->GetLocalTransform().TransformBounds(gfxRect(child1->GetVisibleRegion().GetBounds()));

You should hoist this out of the loop
(In reply to Jerry Shih[:jerry] from comment #52)
> layer          CanvasLayerComposite (0x45f94000) [shadow-clip=(x=0, y=0,
> w=1536, h=2560)] [shadow-transform=[ 1 0; 0 1; 0 1100; ]] [shadow-visible=<
> (x=0, y=0, w=768, h=160); >] [clip=(x=0, y=0, w=1536, h=2560)] [postScale=2,
> 2] [transform=[ 1 0; 0 1; 0 1100; ]] [visible=< (x=0, y=0, w=768, h=160); >]

This canvas layer looks to be at twice the resolution of the screen, unless I'm misunderstanding the hidpi scaling. There could be a mistake here. If we're doing this incorrectly then we're making the screenshot much slower than it should be, both to draw and to composite.
(In reply to Benoit Girard (:BenWa) from comment #56)
> (In reply to Jerry Shih[:jerry] from comment #52)
> This is really silly. We're using an intermediate surface to set an opacity
> of 0.99 which is not visible. We should make sure we don't have an 'opacity:
> 0.99' somewhere in gaia :(.

This log is a one-frame log of the lockscreen "fade out" animation (the anim contains opacity 1.0->0.0 and scale 1x to 2x). I just posted the opacity=0.99 frame. This container layer is shown on the screen.


(In reply to Benoit Girard (:BenWa) from comment #58)
> (In reply to Jerry Shih[:jerry] from comment #52)
> This canvas layer has looks to be at twice the resolution of the screen
> unless I'm not understanding the hidpi scaling. There could be a mistake
> here. If we're doing this incorrectly then we're making screenshot much
> slower then it should be. Both to draw and composite.

The 2x-size buffer is because of the scale(2) rule.
http://dxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#2946
If we have a scale animation, we generate the buffer at the final animated size and send it to the compositor to display the animation (OMTA). So I think all elements in this container become 2x size, presumably for the best visual quality.

I want to add a new CSS property to tell b2g to just use the original size and scale the original-size buffer up with the GPU. The property would look like "scale-method: resample | normal". I don't know whether this is a good idea or just a workaround.
Simplified extract from the layer tree:

ContainerLayerComposite (0x45f5ac00) [transform=[ 2 0; 0 2; 0 0; ]] [visible=< (x=0, y=0, w=1536, h=2560); >] [opacity=0.992516] [opaqueContent] [usesTmpSurf] [preScale=0.5, 0.5]
    ThebesLayerComposite (0x45f93c00) [visible=< (x=0, y=0, w=1536, h=2560); >] [opaqueContent]
    CanvasLayerComposite (0x45f94000) [clip=(x=0, y=0, w=1536, h=2560)] [postScale=2, 2] [transform=[ 1 0; 0 1; 0 1100; ]] [visible=< (x=0, y=0, w=768, h=160); >]
    ThebesLayerComposite (0x45f95400) [visible=< (x=1148, y=2300, w=120, h=20); (x=280, y=2320, w=96, h=80); (x=1148, y=2320, w=120, h=80); (x=1148, y=2400, w=120, h=20); >]


So we have a <canvas> element, with content both in front and behind it.

What is the <canvas> element being used for here? Could we swap it out with a static image instead? Or maybe gecko could do that internally.

If we're not actively changing the <canvas> then we should probably avoid it, since it always has an active layer.

We'll take a hit when switching from an active layer to not, since we'll have to repaint the ThebesLayers. But then we should have only a single layer, and the animation should run better.
The canvas element is for the new slider (bug 919410). If it is the most important factor affecting performance, I may prepare a static unlocking image to replace the element at the moment the user unlocks.

A similar solution we have discussed is to take screenshots of the large elements on the lockscreen, like the '#lockscreen-header' element, which is a container with sub-elements like the clock and the connection information that would keep the main thread occupied with many tasks. We tried hiding it, and the main thread got freed up. If we can put it in an iframe or something that lets us take a partial screenshot (rather than of the whole lockscreen), we may be able to improve the performance. But as far as I know, taking a screenshot has its own performance issues right now.
It's a little bit sad that we had to reimplement the animation in canvas; I expect it was a fixable bug.

It does appear that the <canvas> element is causing the issues for the fade-out animation though, so switching it to a static image is probably the best idea.

We could do that within Gecko, but there's no obvious way to detect that you don't want to continue modifying the canvas. We'd either have to provide an API, or just wait for N frames of the animation without any canvas changes.
But we tuned the performance well while the canvas still works normally. This tuning has been tested both in our demo app and in the real lockscreen, although the lockscreen doesn't perform as well as the demo app. What we did was hide some other elements on the lockscreen (as I said, I don't know why the header would cause this).

And I don't think canvas on the lockscreen is a *bug*. The bug is that the DOM elements can't be aligned as precisely as possible while they are animating, so we had glitches and other annoying issues that took us a lot of time to try to solve. Another question is whether an animation should or should not be performed with DOM elements at all. For example, if your animation can be done with DOM elements but only with extremely tricky methods (I knew some people who used DOM elements to "render" 3D animation before and without WebGL and CSS 3D transforms...), then using SVG, canvas or even WebGL may not be a blamable choice.
(In reply to Jerry Shih[:jerry] from comment #59)
> (In reply to Benoit Girard (:BenWa) from comment #56)
> > (In reply to Jerry Shih[:jerry] from comment #52)
> > This is really silly. We're using an intermediate surface to set an opacity
> > of 0.99 which is not visible. We should make sure we don't have an 'opacity:
> > 0.99' somewhere in gaia :(.
> 
> This log is a one frame log of the lockscreen "fade out" animation(the anim
> contain opacity 1.0->0.0 and scale 1x to 2x). I just post the opacity=0.99
> frame. This container layer is shown on the screen.
> 
> 
> (In reply to Benoit Girard (:BenWa) from comment #58)
> > (In reply to Jerry Shih[:jerry] from comment #52)
> > This canvas layer has looks to be at twice the resolution of the screen
> > unless I'm not understanding the hidpi scaling. There could be a mistake
> > here. If we're doing this incorrectly then we're making screenshot much
> > slower then it should be. Both to draw and composite.
> 
> The 2x size buffer is because of the scale(2) rule.
> http://dxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.
> cpp#2946
> If we have scale animation, we will generate the final anim size buffer and
> send to the compositor for display the animation(OMTA). So I think all
> element in this container will become 2x size. Maybe for the best visual
> quality.
> 
> I want to add a new css prop to tell the b2g just use the orig size and
> scale up the orig size buffer with gpu. The prop will like
> "scale-method:resample | normal". I don't know this is a good idea or just
> like a workaround.

Actually, there is a bug in the logic of FrameLayerBuilder. If we animate zooming out quickly (hundreds of ms), we should simply render at the lower resolution and then scale it. The lockscreen zooms in and fades out. There is no point rendering it at 2x or 5x or whatever we set in the CSS. We should render it at 1x and then just zoom it. Ditto for the homescreen zoom-in/zoom-out animation.
Attached patch region.patch — Splinter Review
change for comment 57.

--
This patch adds another condition for temp surface creation.
If the children in a container do not overlap, we can compose them individually without a temp surface.

I can't pass the test layout/reftests/bugs/602200-4.html from bug 602200.
I still don't know why we should snap the transform matrix.
But for the overlap-region testing, I think I can ignore the snapping concept:
I can just use the local matrix to transform the bounding rect and then check the transformed bounding rect.
Maybe I'm missing something.
Attachment #8366644 - Attachment is obsolete: true
Attachment #8367774 - Flags: feedback?(roc)
I think this is a bug in BaseRect::Intersects :)

Looking at the test, I think both layers will have identical visible regions, and it looks like Intersects will return false for that.
(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
> Looking at the test, I think both layers will have identical visible
> regions, and it looks like Intersects will return false for that.

  bool Intersects(const Sub& aRect) const
  {
    return x < aRect.XMost() && aRect.x < XMost() &&
           y < aRect.YMost() && aRect.y < YMost();
  }

That returns true for r.Intersects(r) as long as r is non-empty.
Indeed, I'm not sure what I was reading. Back to the drawing board then.
(In reply to Jerry Shih[:jerry] from comment #65)
> Created attachment 8367774 [details] [diff] [review]
> region.patch

Something like this could work, however I don't understand how this patch helps since in comment #52 the ContainerLayer you care about has children that do actually overlap.

One option not yet discussed here --- but which we have discussed elsewhere --- is to cache composited layer subtrees between composition frames. In particular, if we're generating an intermediate surface anyway, and there's enough VRAM available, why not keep it around and use it again in the next frame? If a layer in the subtree changes we'll know about it and can throw away the cached intermediate surface.
Attached file translateZ.html
Jerry, Greg,

Quick idea from holiday; want to hear your feedback on this.

I figured, instead of transitioning transform: scale(), would it be better to transition transform: translateZ()? On Firefox desktop, it's pretty obvious we are not re-rendering a 4x-large layer for the latter transform -- I can see the pixels being enlarged; for the lock screen, these are exactly the pixels we don't want Gecko to spend time calculating.

PS: does anyone know whether not rendering a 4x-large region for translateZ() is CSS-spec'd behavior or just Gecko behavior?
Attachment #8368397 - Flags: feedback?(hshih)
Attachment #8368397 - Flags: feedback?(gweng)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #69)
> (In reply to Jerry Shih[:jerry] from comment #65)
> > Created attachment 8367774 [details] [diff] [review]
> > region.patch
> 
> Something like this could work, however I don't understand how this patch
> helps since in comment #52 the ContainerLayer you care about has children
> that do actually overlap.
> 
> One option not yet discussed here --- but which we have discussed elsewhere
> --- is to cache composited layer subtrees between composition frames. In
> particular, if we're generating an intermediate surface anyway, and there's
> enough VRAM available, why not keep it around and use it again in the next
> frame? If a layer in the subtree changes we'll know about it and can throw
> away the cached intermediate surface.

That's a good idea, the content within the temporary surface shouldn't be changing during the animation.

I assume most of the cost here is the allocation, so we could cache the intermediate surface regardless (as long as the size or content type don't change).

Caching the contents (and skipping compositing of that subtree) would be a secondary extension to that.
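
For reference, a hedged sketch of this caching idea with hypothetical types (not the real ContainerLayerComposite code): keep the intermediate surface alive between composition frames and reuse its contents while the subtree is unchanged.

struct IntSize {
  int w, h;
  bool operator==(const IntSize& aOther) const { return w == aOther.w && h == aOther.h; }
};
struct RenderTarget { IntSize size; };

struct ContainerCompositeState {
  RenderTarget* cachedSurface = nullptr;
  bool subtreeChanged = true;  // set whenever a descendant layer is updated

  RenderTarget* GetIntermediateSurface(IntSize aSize) {
    // Reuse the allocation whenever the size still matches.
    if (!cachedSurface || !(cachedSurface->size == aSize)) {
      delete cachedSurface;
      cachedSurface = new RenderTarget{aSize};
      subtreeChanged = true;   // a fresh surface must be repainted
    }
    return cachedSurface;
  }

  // Secondary extension: if nothing in the subtree changed, composite the
  // cached contents directly and skip re-rendering the subtree.
  bool CanReuseContents() const { return !subtreeChanged; }

  // OOM mitigation discussed later in this bug: drop the cache once the
  // animation finishes, or under memory pressure.
  void ReleaseCache() {
    delete cachedSurface;
    cachedSurface = nullptr;
    subtreeChanged = true;
  }
};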
PM Reviewed, moving to backlog.
Suggest fixing in 1.4
blocking-b2g: 1.3+ → backlog
(In reply to Adam Rogers (:arog) from comment #72)
> PM Reviewed, moving to backlog.
> Suggest fixing in 1.4

Did you run this by Andreas? I think he's been the one pushing for these FPS fixes in the release.
Flags: needinfo?(arogers)
I am ok with the Z transform fix so we can unblock on the larger issue and fix that for 1.4.
(In reply to Andreas Gal :gal from comment #74)
> I am ok with the Z transform fix so we can unblock on the larger issue and
> fix that for 1.4.

Okay - sounds good.
Flags: needinfo?(arogers)
(In reply to Andreas Gal :gal from comment #74)
> I am ok with the Z transform fix so we can unblock on the larger issue and
> fix that for 1.4.

I still have not heard from Jerry or Greg whether transform: translateZ() will save the day. I don't have the necessary equipment with me to do the profiling.
Sorry... you know, during these family-oriented holidays there are only a few chances to get personal time for coding and testing...
I don't think I will have enough time to test the idea until 2/3, even though I think it may work.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #69)
> (In reply to Jerry Shih[:jerry] from comment #65)
> Something like this could work, however I don't understand how this patch
> helps since in comment #52 the ContainerLayer you care about has children
> that do actually overlap.

Yes, we still have overlapping children here, but we can still add this condition for other cases.

> One option not yet discussed here --- but which we have discussed elsewhere
> --- is to cache composited layer subtrees between composition frames. In
> particular, if we're generating an intermediate surface anyway, and there's
> enough VRAM available, why not keep it around and use it again in the next
> frame? If a layer in the subtree changes we'll know about it and can throw
> away the cached intermediate surface.

If we have enough memory, we can do that.


Layer tree:
Container Layer A
->Canvas Layer B
->Thebes Layer C

If we have a container layer A with two children B and C, and an animation on A, is it easy to check whether layers B and C are dirty?
I think we need to hook every canvas and Thebes content draw call.
I will take a look.
Attached patch only_scale_1x.patch (obsolete) — Splinter Review
If the anim is too short (< 2.0 s), we just use the original buffer size.


Here is the layer dump data.
The layer buffer and the canvas drawing region become smaller.

#######lockscreen start, the buffer size is (768,1280)
ContainerLayerComposite (0x47204400) [shadow-clip=(x=0, y=0, w=768, h=1280)] [shadow-transform=[ 1.06811 0; 0 1.06811; -26.1544 -43.5906; ]] [shadow-visible=< (x=0, y=0, w=768, h=1280); >] [clip=(x=0, y=0, w=768, h=1280)] [transform=[ 1.0096 0; 0 1.0096; -3.68805 -6.14679; ]] [visible=< (x=0, y=0, w=768, h=1280); >]
  ContainerLayerComposite (0x47204800) [shadow-visible=< (x=0, y=0, w=768, h=1280); >] [visible=< (x=0, y=0, w=768, h=1280); >] [opacity=0.990396] [opaqueContent] [usesTmpSurf]
    ThebesLayerComposite (0x47205000) [shadow-visible=< (x=0, y=0, w=768, h=1280); >] [visible=< (x=0, y=0, w=768, h=1280); >] [opaqueContent] [valid=< (x=0, y=0, w=768, h=1280); >]
      DeprecatedContentHostDoubleBuffered (0x461fe630) [buffer-rect=(x=0, y=0, w=768, h=1280)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
    CanvasLayerComposite (0x47206000) [shadow-clip=(x=0, y=0, w=768, h=1280)] [shadow-transform=[ 1 0; 0 1; 0 1100; ]] [shadow-visible=< (x=0, y=0, w=768, h=160); >] [clip=(x=0, y=0, w=768, h=1280)] [transform=[ 1 0; 0 1; 0 1100; ]] [visible=< (x=0, y=0, w=768, h=160); >]
      ImageHost (0x45597800) [picture-rect=(x=0, y=0, w=0, h=0)]
        StreamTextureHostOGL (0x4518f9a0) [size=(width=768, height=160)] [format=SurfaceFormat::R8G8B8A8] [flags=TEXTURE_NEEDS_Y_FLIP]
    ThebesLayerComposite (0x47206800) [shadow-visible=< (x=574, y=1150, w=60, h=10); (x=140, y=1160, w=48, h=40); (x=574, y=1160, w=60, h=40); (x=574, y=1200, w=60, h=10); >] [visible=< (x=574, y=1150, w=60, h=10); (x=140, y=1160, w=48, h=40); (x=574, y=1160, w=60, h=40); (x=574, y=1200, w=60, h=10); >] [valid=< (x=140, y=1150, w=494, h=60); >]
      DeprecatedContentHostDoubleBuffered (0x461feda0) [buffer-rect=(x=140, y=1150, w=494, h=60)] [buffer-rotation=(x=0, y=0)] [paint-will-resample]
#######lockscreen end
With attachment 8369096 [details] [diff] [review], we spend 15~23 ms on compositing on a Nexus 4.
That's much better than the original (40 ms and up).

I will try this patch on a Helix again.
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #70) 
> I figured instead of transit transform: scale(), would it be better to
> transit transform: translateZ() ? On Firefox desktop, it's pretty obvious we
> are not re-compositioning a 4x big layer for the later transform -- I can
> see the pixel is being enlarged; for lock screen, this is exactly the pixels
> we don't want Gecko to spend time to calculate.
> 
> PS does anyone know not compositioning a 4x big region for translateZ() is a
> CSS spec'd behavior or just a Gecko behavior?

I see that the buffer size is 1x on the Nexus 4 with translateZ.
translateZ also has better performance compared with the 4x-buffer-size case.
We will get better performance for the "lockscreen fade-out anim" with either translateZ or attachment 8369096 [details] [diff] [review].

These tests are on a Helix (800*480).

1. Original implementation
We will have a 4x buffer size (1600*920).

css:
#lockscreen {
  ....
  transition:
    transform 2s ease,
    opacity 2s ease;
}

#lockscreen.unlocked {
  transform: scale(2);
  opacity: 0;
  pointer-events: none;
}

profiler:
CompositorParent::CompositeInTransaction: avg:62.881867188 ms, max:73.048 ms, min:56.895 ms


2. Modify Gecko with attachment 8369096 [details] [diff] [review]
Buffer fixed to 1x size (800*480).

css:
#lockscreen {
  ....
  transition:
    transform 2s ease,
    opacity 2s ease;
}

#lockscreen.unlocked {
  transform: scale(2);
  opacity: 0;
  pointer-events: none;
}

profiler:
CompositorParent::CompositeInTransaction: avg:29.402666016 ms, max:35.021 ms, min:26.946 ms


3. Only modify Gaia, using translateZ()
The buffer size will also be 1x (800*480).

css:
#lockscreen {
  ....
  transition:
    transform 2s ease,
    opacity 2s ease;
}

#lockscreen.unlocked {
  transform: perspective(100px) translateZ(50px);
  opacity: 0;
  pointer-events: none;
}

profiler:
CompositorParent::CompositeInTransaction: avg:30.581933594 ms, max:38.148 ms, min:27.316 ms
Thanks Jerry :) I've just tried the patch on a Peak too. The transition has never been smoother \o/

Will file another bug and get it landed. Jerry, please amend the summary of this bug to make it more accurate.

I would still love to know whether not creating a 4x layer for translateZ() is a CSS feature or not.
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #83)
> Thanks Jerry :) I've just tried the patch on Peak too. The transition has
> never been smoother \o/
> 
> Will file a another bug and get it landed. Jerry, please amend the summary
> of this bug to make it more accurate.
> 
> I would still love to know not creating a 4x layer for translateZ() is a CSS
> feature or not.

https://bugzilla.mozilla.org/show_bug.cgi?id=966711
Comment on attachment 8368397 [details]
translateZ.html

I'm finally back from my trip, and I'm glad to see the page perfectly shows the visual and Gecko effects (according to Jerry's profiling). We should adopt this as a valid solution and use it in similar cases in Gaia, until we find a better solution (will we?).
Attachment #8368397 - Flags: feedback?(gweng) → feedback+
Comment on attachment 8369096 [details] [diff] [review]
only_scale_1x.patch

Yeah this is the right approach I think.
Attachment #8369096 - Flags: feedback?(roc)
Tim, I am pretty sure CSS doesn't specify whether you are supposed to render at a higher resolution or use the GPU to scale. We already discovered that Safari behaves differently for scale and always scales with the GPU.
Comment on attachment 8369096 [details] [diff] [review]
only_scale_1x.patch

Review of attachment 8369096 [details] [diff] [review]:
-----------------------------------------------------------------

This approach sounds reasonable. However, to make the code more clear, I suggest returning the max(transition-time, animation-time) to the caller via an out-parameter and having the caller in FrameLayerBuilder check it. That way all our heuristics are in one place.
Attachment #8369096 - Flags: feedback?(roc) → feedback+
Quick test to try caching the intermediate surfaces.

Jerry, can you please test to see if this makes any difference to the composite times.
Flags: needinfo?(hshih)
Attached patch only_scale_1x.patch v2 (obsolete) — Splinter Review
Put all the logic in FrameLayerBuilder per comment 88.
We can set the maximum scale size in ChooseScaleAndSetTransform().

--
What's a suitable maximum anim length for the 1x buffer size?
1 second? 1.5 s? Or more?
Attachment #8369096 - Attachment is obsolete: true
Attachment #8369428 - Flags: feedback?(roc)
(In reply to Matt Woodrow (:mattwoodrow) from comment #89)
> Created attachment 8369277 [details] [diff] [review]
> Cache ContainerLayer temporary surface
> Quick test to try caching the intermediate surfaces.
> Jerry, can you please test to see if this makes any difference to the
> composite times.

Thanks Matt!

This is the composite time.
Device: Helix (800*480)

1. Original implementation (4x buffer size, no cache)
CompositorParent::CompositeInTransaction: avg:62.881867188 ms, max:73.048 ms, min:56.895 ms

2. With the cache from attachment 8369277 [details] [diff] [review] (4x buffer size)
CompositorParent::CompositeInTransaction: avg:49.215 ms, max:53.613 ms, min:46.543 ms

3. 1x buffer size
CompositorParent::CompositeInTransaction: avg:29.402666016 ms, max:35.021 ms, min:26.946 ms

4. 1x buffer size and cache
CompositeInTransaction:	avg:27.440599609 ms, max:34.138 ms, min:24.870 ms

This is not a huge boost for the 1x buffer size.
Changing the render target costs a lot (pipeline stall?).

I have implemented this approach before, but I'm afraid of the OOM problem.
Maybe we can cache the intermediate surface only during an animation.
Once the animation has finished, we can release the cached buffer.
(check http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/AsyncCompositionManager.cpp#399 ?)
Furthermore, if we have too many intermediate surfaces, we can turn off the cache mechanism.
Flags: needinfo?(hshih)
Comment on attachment 8369428 [details] [diff] [review]
only_scale_1x.patch v2

I would argue we should do this for any duration (as Safari does). If it animates, we scale the layer.
(In reply to Andreas Gal :gal from comment #92)
> I would argue we should do this for any duration (as safari does). If it
> animates, we scale the layer.

It's easy to come up with examples where forcing a 1x scale during all animations is really bad for performance or quality. For the former, animate from scale(0.2) to scale(0.3). For the latter, animate from scale(3.0) to scale(4.0), or from scale(3.0) to scale(4.0) over 10 seconds.
er, "from scale(1.0) to scale(4.0) over 10 seconds".
Comment on attachment 8369428 [details] [diff] [review]
only_scale_1x.patch v2

Review of attachment 8369428 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/FrameLayerBuilder.cpp
@@ +2942,5 @@
>      // maximum scale.
>      if (aContainerFrame->GetContent() &&
>          nsLayoutUtils::HasAnimationsForCompositor(
>            aContainerFrame->GetContent(), eCSSProperty_transform)) {
> +      // Bug 945082

Don't put bug numbers in comments. People can dig it out with version control.

@@ +2953,5 @@
> +      float maxAnimLength;
> +
> +      nsLayoutUtils::GetMaximumAnimatedScaleAndLength(aContainerFrame->GetContent(),
> +                                             maxScale, maxAnimLength);
> +      scale = (maxAnimLength <= MAX_ORIGINAL_SIZE_ANIM_LENGTH) ? (gfxSize(1, 1)) : (maxScale);

We shouldn't use 1,1 here since that can be bad for performance if the animation is varying over a range of scales that are all quite small (and bad for quality if it's varying over a range of scales that are all quite large).

How about this: compute the maximum scale of all the keyframes as now, and also compute the minimum scale. If the minimum >= 1.0, use it; if the maximum <= 1.0, use it; otherwise use 1.0.

That doesn't solve the "slowly increasing scale from 1.0" problem but I think we should figure that out later.

In my previous review comment I suggested moving the heuristic out here. I think I was wrong, since that makes the interface to this helper method too complicated. We should rename GetMaximumAnimatedScale to ComputeScaleToUseForAnimation and put the heuristic in that function, and have it return nothing but the scale.

::: layout/base/nsLayoutUtils.h
@@ +1720,2 @@
>     */
> +  static void GetMaximumAnimatedScaleAndLength(nsIContent* aContent, gfxSize& aMaxScaleOut, float& aMaxAnimLengthOut);

Keep returning the scale as the function result.
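
For reference, a minimal sketch of the suggested heuristic as a free function, assuming scalar scales for brevity (the real helper works on gfxSize in nsLayoutUtils):

// Clamp the rendering scale toward 1.0: if the whole animation stays above
// 1x, render at the smallest scale used; if it stays below 1x, render at
// the largest; if it crosses 1.0, render at the natural size.
float ComputeScaleToUseForAnimation(float aMinScale, float aMaxScale)
{
  if (aMinScale >= 1.0f) {
    return aMinScale;
  }
  if (aMaxScale <= 1.0f) {
    return aMaxScale;
  }
  return 1.0f;
}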
OK, that's really interesting. It looks like it helps a lot with large allocations, but barely makes a difference when it's only the size of the screen.

This version attempts to re-use the cached content too, and so should avoid the render target change entirely. It's hacked together, so I bet there are bugs. It seems to work for this case though (tested on desktop).

And I totally agree with your concerns about memory usage. That's probably the harder part of doing this; I just want to get an idea of what the potential wins are before sinking too much time into it.
Flags: needinfo?(hshih)
We can always purge from the OOM hook.
Bonus points if we use mwu's new volatile cache stuff that the kernel disposes of automagically.
Attached patch only_scale_1x.patch v3 (obsolete) — Splinter Review
Move scale logic into nsLayoutUtils::ComputeSuitableScaleForAnimation().

---
I'd like to put the scale logic into nsLayoutUtils.cpp; it's easy to use there.

I prefer using 1x if the scale is bigger than 1. People may be confused by the different quality results.

Or we can add a new CSS property for an explicit layer size.

css example:
transform:
  scale(2);
  scale-base(1,1); /* use (1,1) size to scale up */
Attachment #8369428 - Attachment is obsolete: true
Attachment #8369428 - Flags: feedback?(roc)
(In reply to Jerry Shih[:jerry] from comment #99)
> I prefer to use 1x if the scale is bigger than 1. People may be confused by
> the different quality results.

I don't understand this point. Why would people be confused when they get better quality?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #100)
> (In reply to Jerry Shih[:jerry] from comment #99)
> > I prefer to use 1x if the scale is bigger than 1. People may be confused by
> > the different quality results.
> 
> I don't understand this point. Why would people be confused when they get
> better quality?

a: scale(1) to scale(3) -> uses only a 1x buffer
b: scale(2) to scale(3) -> uses a 4x buffer

People may be confused about why they get better quality with b.
Talked to Jerry in the office and figured out what we want to do here.

Can we put this change behind a pref and only enable it on mobile (Android and FxOS)? IMHO the current Gecko behavior provides a better experience on desktop, compared to Safari/Chrome.
(In reply to Matt Woodrow (:mattwoodrow) from comment #96) 
> Ok, that's really interesting. Looks like it helps a lot with large
> allocations, but barely makes a difference when it's only the size of the
> screen.
> 
> This version attempts to re-use the cached content too, and so should avoid
> the render target change entirely. It's hacked together, so I bet there are
> bugs. It seems to work for this case, though (tested on desktop).


We get a large boost with attachment 8369796 [details] [diff] [review].

Device: helix (800x480)

1. With a 4x buffer:
CompositeInTransaction: avg 20.0775400391 ms, max 48.448 ms, min 16.353 ms

2. With a 1x buffer:
CompositeInTransaction: avg 17.355199219 ms, max 20.603 ms, min 16.508 ms


In https://bugzilla.mozilla.org/attachment.cgi?id=8369796&action=diff#a/gfx/layers/composite/ContainerLayerComposite.cpp_sec3, if we can make sure that the container's dirty region is empty, we can just reuse the already-composed intermediate surface. Is my understanding of this patch right? (See the sketch below.)
Status: NEW → ASSIGNED
Flags: needinfo?(hshih) → needinfo?(matt.woodrow)
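
In pseudo-form, the reuse asked about above looks roughly like this sketch. The names here (mCachedTarget, RenderIntermediate) are illustrative assumptions, not identifiers from attachment 8369796:

// Sketch of the caching idea, not the real patch: keep the container's
// intermediate render target alive across composites, and only re-render
// its children when something inside the container actually changed.
RefPtr<CompositingRenderTarget>
GetOrReuseIntermediateSurface(ContainerLayerComposite* aContainer,
                              const nsIntRegion& aDirtyRegion)
{
  if (aContainer->mCachedTarget && aDirtyRegion.IsEmpty()) {
    // Nothing in the container changed since the last composite: skip
    // the render target switch and the redraw, and reuse the cache.
    return aContainer->mCachedTarget;
  }
  // Something changed (or there is no cache yet): render the children
  // into a fresh target and remember it for the next composite.
  aContainer->mCachedTarget = RenderIntermediate(aContainer);
  return aContainer->mCachedTarget;
}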
(In reply to Jerry Shih[:jerry] from comment #101)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment
> #100)
> > (In reply to Jerry Shih[:jerry] from comment #99)
> > > I prefer to use 1x if the scale is bigger than 1. People may be confused by
> > > the different quality results.
> > 
> > I don't understand this point. Why would people be confused when they get
> > better quality?
> 
> a: scale(1) to scale(3) -> uses only a 1x buffer
> b: scale(2) to scale(3) -> uses a 4x buffer
> 
> People may be confused about why they get better quality with b.

OK. But I think it's usually not a problem when people get better results than they expect; it's usually only a problem when they get worse results than they expect. So I think this is OK.

(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #102)
> Talked to Jerry in the office and figured out what we want to do here.
> 
> Can we put this change behind a pref and only enable it on mobile (Android
> and FxOS)? IMHO the current Gecko behavior provides a better experience on
> desktop, compared to Safari/Chrome.

I thought about that but I'd rather be consistent on desktop vs mobile where possible, and here I think it is possible.
Comment on attachment 8368397 [details]
translateZ.html

It is not easy to convert a scale value into "perspective(a) translateZ(b)" by hand (see the sketch below).
I think the scale computation should be done by Gecko.
Attachment #8368397 - Flags: feedback-
Attachment #8368397 - Flags: feedback?(hshih)
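
For reference, the conversion alluded to above: with perspective(p) translateZ(z), the rendered scale is s = p / (p - z), so each keyframe needs its own solved z. A small standalone illustration (example values only, not Gecko code):

#include <cstdio>
#include <initializer_list>

// With CSS "perspective(p) translateZ(z)", the rendered scale is
// s = p / (p - z), so hitting a target scale s requires
// z = p * (1 - 1/s), solved per keyframe.
int main() {
  const float p = 1000.0f; // perspective distance in px (example value)
  for (float s : {1.0f, 2.0f, 4.0f}) {
    float z = p * (1.0f - 1.0f / s);
    std::printf("scale %.1f -> translateZ(%.1fpx)\n", s, z);
  }
  return 0;
}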
Attached patch scale.patch v4 — Splinter Review
Update patch for comment 95.

The scale value will be:
If the minimum scale >= 1.0f, use it; if the maximum <= 1.0f, use it; otherwise use 1.0f.

---
Do we have any other ideas for choosing the scale value?
Attachment #8370193 - Attachment is obsolete: true
Attachment #8370193 - Flags: feedback?(roc)
Attachment #8372391 - Flags: review?(roc)
(In reply to Jerry Shih[:jerry] from comment #106)
> Created attachment 8372391 [details] [diff] [review]
> scale.patch v4
> 
> Update patch for comment 95.
> 
> The scale value will be:
> If the minimum scale >= 1.0f, use it; if the maximum <= 1.0f, use it;
> otherwise use 1.0f.
> 
> ---
> Do we have any other ideas for choosing the scale value?

Moving this to bug 972310.
Has there been any further progress or resolution for this bug?
Keywords: perf
Priority: -- → P2
Whiteboard: [c=progress p= s= u=]
See Also: → 1016559
I'm working on Project Silk now, and I'll come back to this once that work is done.
We have bug 972310 to improve the fade-out animation on the lock screen. What's your lock screen performance bottleneck?
Assignee: hshih → nobody
I'm mostly concerned with how this relates to bug 1016559; I don't have a particular performance bottleneck of my own.
Please record a profile and attach it. There's no need to guess here.
Blocks: 1016559
I've actually lost track of what's being fixed here, let alone its relation to bug 1016559.

The latest lock screen FPS regression was investigated in bug 1018592, and it was caused by bug 1028216. I'm not sure what more there is to fix...

Jerry, as you are the last person working on this bug, would you update the summary so it's more descriptive?
Flags: needinfo?(hshih)
No longer blocks: 1016559
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #112)
> I am actually lost track of what's being fixed here, let along it's relation
> to bug 1016559.
> 
> The latest lock screen FPS regression was investigated in bug 1018592, and
> it was caused by bug 1028216. Not sure what more to fix...
> 
> Jerry, as you are the last person working on this bug, would you update the
> summary so it's more descriptive?

Before bug 972310, the lock screen fade-out animation had to draw a buffer 4x the screen size on the CPU, so performance was bad. With bug 972310, if we have a scale animation from 1x to 4x, we just draw the 1x buffer and scale it up to 4x on the GPU (a toy sketch follows below).
Flags: needinfo?(hshih) → needinfo?(timdream)
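
As a toy illustration of that approach (standalone code with made-up values, not Gecko internals): the buffer is rasterized once at 1x, and only the compositor-side scale changes per frame.

#include <cstdio>

// Toy model of the bug 972310 behavior: the layer buffer is painted once
// at 1x, and the animated scale is applied as a per-frame GPU transform,
// so the CPU never repaints while the animation runs.
struct ScaleAnimation {
  float from = 1.0f;
  float to = 4.0f;
  float At(float t) const { return from + (to - from) * t; } // t in [0, 1]
};

int main() {
  const int bufferW = 480, bufferH = 800; // rasterized once, stays 1x
  ScaleAnimation anim;
  for (float t = 0.0f; t <= 1.0f; t += 0.25f) {
    // Only the sampling transform changes; the buffer is untouched.
    std::printf("frame t=%.2f: composite %dx%d buffer at scale %.2f\n",
                t, bufferW, bufferH, anim.At(t));
  }
  return 0;
}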
(By the way, I've discussed with Tim and Jerry that we need a daily FPS recorder so graphics regressions like this can be pinned down more easily. Jerry has proposed that this requires a new API to let test programs retrieve the FPS.)
Now that we have bug 972310, the remaining problem in this bug is how to improve performance when we use an IntermediateSurface.
Changing the title accordingly.
Summary: lockscreen fade-out animation is still < 60fps → Improve the performance when we use IntermediateSurface
Flags: needinfo?(timdream)
mchang worked on what comment 114 describes.
Flags: needinfo?(matt.woodrow)
Note that I have a patch to retain the intermediate surface in bug 1087530. We may want to do the work there and make this bug depend on it, since I've done my analysis and I'm seeing extra flushes there.
Depends on: 1087530
Depends on: 1092360
add tracking flag for backlog.
tracking-b2g: --- → +
blocking-b2g: backlog → ---
No assignee, updating the status.
Status: ASSIGNED → NEW