Closed Bug 920921 Opened 6 years ago Closed 5 years ago

[Flatfish]: Flatfish has bad performance on Homescreen swiping

Categories

(Core :: Graphics, defect, P3)

ARM
Gonk (Firefox OS)
defect

RESOLVED WORKSFORME

People

(Reporter: vliu, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only][flatfish][TCP=performance])

Attachments

(13 files, 1 obsolete file)

This bug was opened because swiping between Homescreen pages performs badly on Flatfish.
Blocks: flatfish
Low-level profiling with IMG's PVRTune tool shows that this case is neither CPU bound nor GPU bound. The lag happens mainly when flipping pages: neither the CPU nor the GPU is busy at that point, yet the FPS is low and the screen is essentially stuck. After a short stall, GPU load rises again together with the FPS. The composition time in the profile looks reasonable, so I suspect a synchronization issue somewhere.
Whiteboard: [TPE_GFX]
Attached file trace_nexus4.html
Can be opened by Chrome browser only.
Attached file trace_flatfish.html
Can be opened by Chrome browser only.
Comparing attachment 811879 [details] with attachment 811881, we can see that the Homescreen process takes more than 17 ms on Flatfish but less than 6 ms on the Nexus 4.
Watching the CPU frequency, we found that the Nexus 4 scales up to 1.5 GHz while Flatfish scales up to 1 GHz. Only CPUs 0 and 1 are enabled on both devices.

watch -n 0.3 "adb shell cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"
I capped the maximum scaling frequency on the Nexus 4 at 1 GHz:

root@android:/sys/devices/system/cpu/cpu0/cpufreq # echo 1000000 > scaling_max

With that cap in place, the Nexus 4 shows the same performance issue.
From the messages I observed last week, I can list two differences between the Nexus 4 and Flatfish.

1. A gralloc_module_unlock issue on the Flatfish device. It happens when swiping the Homescreen to another page. The Nexus 4 doesn't have this issue.
  E/IMGSRV  ( 1764): :0: gralloc_module_unlock: Buffer is already unlocked
  W/GraphicBufferMapper( 1764): unlock(...) failed -22 (Invalid argument)

2. After adding a log message at the entry of nsWindow::GetLayerManager(...) in nsWindow.cpp, I found that the message keeps being dumped even when I don't touch the Homescreen. The Nexus 4 doesn't have this issue.

Given the above, the next step is to narrow this down inside Gecko with the Gecko profiling tool.
Here are the statistics from systrace over a 5 s period. The three most expensive processes are listed below, and only 2 CPUs are online while sliding the Homescreen. Homescreen is clearly the bottleneck.

We also profiled the same scenario on Android OS. There, 4 CPUs are online and the user experience is much smoother than on FxOS.

Is the FxOS Homescreen multi-threaded? Do we need to think about this? We can't expect the CPUs we meet in the future to be as powerful as the Nexus 4's.

Slices:
Homescreen	2976.6 ms	1823 occurrences
Compositor	725.553 ms	1350 occurrences
b2g	        338.346 ms	558 occurrences

*Totals	4610.006 ms	9539 occurrences
 	 	 
Selection start	0 ms	
Selection extent	4999.328 ms
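To put the slice totals above in proportion, each process's share of the traced window can be computed directly. This is a minimal sketch in Python; the dictionary and function names are illustrative, and the figures are copied from the summary above.

```python
# Slice durations (ms) copied from the systrace summary above.
SLICES_MS = {
    "Homescreen": 2976.6,
    "Compositor": 725.553,
    "b2g": 338.346,
}
SELECTION_EXTENT_MS = 4999.328  # length of the traced window


def share_of_window(duration_ms, window_ms=SELECTION_EXTENT_MS):
    """Return the fraction of the traced window spent in a slice."""
    return duration_ms / window_ms


for name, duration in SLICES_MS.items():
    print(f"{name:<11} {100 * share_of_window(duration):5.1f}% of the window")
```

Homescreen comes out at roughly 59.5% of the window, against about 14.5% for the Compositor and 6.8% for b2g.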
Here is the systrace profile for Flatfish running Android OS. As mentioned, both the launcher (Homescreen) and SurfaceFlinger (Compositor) are multi-threaded, and 4 CPUs are online.
Because this performance issue is so critical, please advise, based on the information above, who would be a suitable owner for it.
Flags: needinfo?(pchang)
Peter, does b2g enable skia_gl now?
It might reduce the rendering time when the homescreen changes to a new page.
(In reply to Jerry Shih[:jerry] from comment #11)
> Peter, dose b2g enable skia_gl now?
> It might reduce the rendering time when homescreen changes to new page.

Sorry, skia-gl is only for canvas. It's not our use case.
> Here is the statistics got from systrace(during 5s period). The top 3
> expensive processes are as below and only 2 CPUs are launched as sliding
> Homescreen. It's obvious that Homescreen is the bottleneck.
> 
> We also profiled the same scenario on Android OS. 4 CPUs are launched and
> the UE is much smoother than FxOS. 
> 
Why did 4 CPUs run on Android but not on FxOS? How is the performance on Android when HWUI is disabled?

> Is FxOS's Homescreen multi-threading ? Do we have to think about this
> problem ? We can't expect CPUs we'll meet in the future are powerful as
> Nexus-4.
No, it is single-threaded and uses cairo for software rendering.

Could we get a narrower profiling result, such as the time spent inside cairo?
Is it possible that this is a memory-bound issue?

> 
> Slices:
> Homescreen	2976.6 ms	1823 occurrences
> Compositor	725.553 ms	1350 occurrences
> b2g	        338.346 ms	558 occurrences
> 
> *Totals	4610.006 ms	9539 occurrences
>  	 	 
> Selection start	0 ms	
> Selection extent	4999.328 ms

(In reply to vlin from comment #10)
> Due to this performance issue is so critical, with above information, please
> give advice who would be the suitable owner for this issue.
I would prefer a more detailed profiling result, not just one showing that Homescreen eats a lot of CPU.

(In reply to Jerry Shih[:jerry] from comment #11)
> Peter, dose b2g enable skia_gl now?
> It might reduce the rendering time when homescreen changes to new page.

b2g only enables skia_gl for 2D canvas at the moment.
(In reply to vlin from comment #8)
Flags: needinfo?(pchang)
Whiteboard: [TPE_GFX] → [TPE_GFX], flatfishRun1
Running Android OS with HWUI(GL rendering).
Running Android OS without HWUI(GL rendering).
This issue appears to be CPU bound. As an experiment, we disabled GL rendering in Android OS. The display FPS then dropped significantly, from 60 to less than 10.

According to attachment 813422 [details] and attachment 813423 [details], the launcher (Homescreen) takes more than 100 ms per performTraversals (frame) without GL rendering, but less than 16 ms with GL rendering. Furthermore, the bottleneck within performTraversals is the draw function. Android's launcher is multi-threaded, but that does not seem to help performance at all.

We also watched this with "adb shell top".
Both the Homescreen process (in FxOS) and the Launcher process (Android with GL rendering disabled) use up one of the 4 CPUs.

So the critical factor is GL rendering, not multi-threading as suggested earlier.
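The relationship between the per-frame times quoted above and the observed frame rates can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the 60 Hz refresh rate is an assumption based on the 60 FPS figure mentioned above.

```python
def frames_per_second(frame_time_ms):
    """Upper bound on the frame rate when every frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms


def fits_refresh(frame_time_ms, refresh_hz=60.0):
    """True if a frame finishes within one display refresh interval.

    refresh_hz=60 is an assumption, not a value read from the device.
    """
    return frame_time_ms <= 1000.0 / refresh_hz


for label, frame_ms in (("with GL rendering", 16.0), ("without GL rendering", 100.0)):
    print(f"{label}: at most {frames_per_second(frame_ms):.1f} fps, "
          f"fits a 60 Hz frame: {fits_refresh(frame_ms)}")
```

A 100 ms frame caps the rate at 10 fps, which matches the drop to "less than 10" once the draw step takes longer than that.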
It's a must for flatfish. Marked as koi+ according to triage result.
blocking-b2g: --- → koi+
Whiteboard: [TPE_GFX], flatfishRun1 → [TPE_GFX, flatfishRun1, flatfish only]
We were lucky to meet Bas Schouten at the Summit, and he had some opinions about this performance issue. There is still room for the graphics stack to improve.
Flags: needinfo?(bas)
(In reply to vlin from comment #18)
> It's so lucky we just met Bas Schouten in the Summit and he had some
> opinions about this performance issue. There's still room for graphic stack
> to improve.

Does Bas have any information or a possible root cause that we can look into?

I've just walked through all the systrace data you attached. It seems systrace does not provide vital information on Firefox OS beyond CPU activity. I'll try to do some time measurements using systrace and will keep you informed of any progress. Peter's suggestion of looking at Cairo rendering might be a good starting point.
Can you please attach a gecko-level profile? (profile.sh) Please link a cleopatra profile and we can take it from there. Thanks.
(In reply to Andreas Gal :gal from comment #20)
> Can you please attach a gecko-level profile? (profile.sh) Please link a
> cleopatra profile and we can take it from there. Thanks.

There is another bug to track gecko profiler launch problem on flatfish, bug 922548.
(In reply to Terry Li from comment #19)
> (In reply to vlin from comment #18)
> > It's so lucky we just met Bas Schouten in the Summit and he had some
> > opinions about this performance issue. There's still room for graphic stack
> > to improve.
> 
> Does Bas have any information or possible root cause that we can take a look?
> 
> I've just walked through all systrace data you have attached.  It seems
> systrace does not provide vital information on FirefoxOS, except CPU
> activity. I'll try to do some time measurement using systrace and will keep
> you informed if I have any progress.  Peter's suggestion of Cairo render
> might be a good start point to check.

Bas mentioned there might be an overdraw issue. If so, I think it's not about Cairo itself, but about how the graphics stack uses Cairo. For example, the graphics stack should calculate the overlapping region and avoid drawing the covered part of the lower layer.
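The idea of not drawing what is covered can be illustrated with a deliberately simplified sketch. It is written in Python rather than Gecko's C++, the Layer tuple and function names are hypothetical, and it only handles the case where a layer is completely hidden by a single opaque layer above it. Partial overlap would need real region subtraction.

```python
from collections import namedtuple

# Hypothetical layer description: an axis-aligned rectangle plus an opacity flag.
Layer = namedtuple("Layer", "name x y w h opaque")


def contains(outer, inner):
    """True if rectangle `outer` fully covers rectangle `inner`."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and outer.x + outer.w >= inner.x + inner.w
            and outer.y + outer.h >= inner.y + inner.h)


def layers_to_draw(layers_back_to_front):
    """Drop layers that are completely hidden by an opaque layer above them."""
    visible = []
    for i, layer in enumerate(layers_back_to_front):
        above = layers_back_to_front[i + 1:]
        hidden = any(top.opaque and contains(top, layer) for top in above)
        if not hidden:
            visible.append(layer)
    return visible


stack = [
    Layer("wallpaper", 0, 0, 800, 1280, True),
    Layer("page", 0, 0, 800, 1280, True),    # opaque, covers the wallpaper
    Layer("icon", 10, 10, 64, 64, False),
]
print([layer.name for layer in layers_to_draw(stack)])
```

In this toy stack the wallpaper is skipped because the opaque page covers it entirely, so one full-screen draw is saved.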

We have never instrumented Gecko with the Android Systrace APIs, so Systrace only shows CPU activity so far.
If you would like to break this problem down, you can follow the steps in the document below to enable Android Systrace on FxOS and instrument Gecko with the ATRACE_INIT and ATRACE_CALL APIs.
https://docs.google.com/a/mozilla.com/document/d/16NKpqCLwQH7pBUyyymnWl94iom2YRDZf7qh2FJpn_UM/edit?usp=sharing
I think we should spend our energy on fixing the gecko profiler first. If there is some memory corruption going on, arbitrary things can be wrong here. We have various bugs open to improve overdraw and a few other things (VBO use). So let's fix the gecko profiler and then we can resume this bug. Poking around here without the right tooling will not yield quick results.
My systrace result shows that vlin is right about this being CPU bound. Homescreen spends too much time on ThebesLayer drawing and on software DrawBufferWithRotation.

The data also shows a double gralloc unlock in the homescreen thread. I think that is why we always see the gralloc unlock error message.
(In reply to vlin from comment #22)
> (In reply to Terry Li from comment #19)
> > (In reply to vlin from comment #18)
> > > It's so lucky we just met Bas Schouten in the Summit and he had some
> > > opinions about this performance issue. There's still room for graphic stack
> > > to improve.
> > 
> > Does Bas have any information or possible root cause that we can take a look?
> > 
> > I've just walked through all systrace data you have attached.  It seems
> > systrace does not provide vital information on FirefoxOS, except CPU
> > activity. I'll try to do some time measurement using systrace and will keep
> > you informed if I have any progress.  Peter's suggestion of Cairo render
> > might be a good start point to check.
> 
> Bas mentioned there might be overdrawing issue. If so, I think it's not
> about Cairo itself, but about how graphic stack utilizes Cairo. For example,
> graphic stack should calculate the overlapping region and prevent the lower
> clip from drawing.
> 
> We never probe gecko by Android Systrace APIs so there's just CPU activity
> seen in Systrace so far.
> If you would like to break down this problem, you can follow the step to
> enable Android Systrace on FxOS and probe gecko by ATRACE_INIT and
> ATRACE_CALL API.
> https://docs.google.com/a/mozilla.com/document/d/
> 16NKpqCLwQH7pBUyyymnWl94iom2YRDZf7qh2FJpn_UM/edit?usp=sharing

The google doc needs permission.
This seems to be the same issue Sotaro fixed. Are you using the latest builds for this?
(In reply to Andreas Gal :gal from comment #26)
> This seems to be the same issue Sotaro fixed. Are you using the latest
> builds for this?
Is there a bug ID for Sotaro's fix that we can refer to?
Keywords: perf
(In reply to Andreas Gal :gal from comment #26)
> This seems to be the same issue Sotaro fixed. Are you using the latest
> builds for this?

Do you mean bug 912134, which was fixed by Sotaro?
Yeah probably one of that series of bugs. Are you on a latest build?
(In reply to Andreas Gal :gal from comment #29)
> Yeah probably one of that series of bugs. Are you on a latest build?

I think they do have the latest code but still have this performance problem.


In comment 5, I locked the CPU frequency of my Nexus 4 at 1 GHz and also saw poor performance during home scrolling (with repaint events).

However, I had a problem launching the gecko profiler on the Nexus 4. I will sync to the latest code base and try again.

peter@peter-desktop:~$ adb shell "echo 1000000 > sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"
peter@peter-desktop:~$ adb shell "echo 918000 > sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq"
peter@peter-desktop:~$ watch -n 0.3 "adb shell cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"
I said I believed this could be the overdrawing issues BenWa was looking into :).
Flags: needinfo?(bas)
Yeah, on the high resolution screen overdraw is definitely an issue. I have measured that myself on Flatfish. What we need right now is a bit of focus. We need to get the profiler to work. Until then we are tapping in the dark here. So please focus all energy on making the profiler work and finding out why it dies with memory corruption. Until that is fixed, there is nothing to be done here. Any system with active memory corruption is so unpredictable and non-deterministic that this is all mostly pointless until that's fixed.
Depends on: 922548
(In reply to Bas Schouten (:bas.schouten) from comment #31)
> I said I believed this could be the overdrawing issues BenWa was looking
> into :).
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #31)
> I said I believed this could be the overdrawing issues BenWa was looking
> into :).

Is there any Bug ID for this ?
(In reply to Andreas Gal :gal from comment #32)
> Yeah, on the high resolution screen overdraw is definitely an issue. I have
> measured that myself on Flatfish. What we need right now is a bit focus. We
> need to get the profiler to work. Until then we are tapping in the dark
> here. So please focus all energy on making the profiler work and finding out
> why it dies with memory corruption. Until that is fixed, there is nothing to
> be done here. Any system with active memory corruption is so unpredictable
> and non-deterministic that this is all mostly pointless here until thats
> fixed.

We did try the gecko profiler for a couple of days. But it ... Bug 922548 was created to follow up on the gecko profiler issue. So far it seems that only BenWa and Cervantes Yu (in the Taipei Performance team) can help us. That's why we spent our time on the "Android profiler" in parallel.

Until the "gecko profiler" is fixed, I think some things are still worth doing and discussing.

I tried to get familiar with the gecko profiler on a Nexus 4. As far as I can tell, there is no essential difference between the "gecko profiler" and the "Android profiler". Both measure time. The "gecko profiler" shows function calls as a tree, while the "Android profiler" shows them graphically. Comment 24 proved that the "Android profiler" can find the bottleneck in Gecko. The only hard work is adding tracing APIs wherever the gecko profiler already has probes.

On the other hand, comment 6 and comment 16 gave me some thoughts.
Once the CPU is not powerful enough, we have a problem.
Android's software rendering has the same problem as ours. Did they not optimize it?
(In practice, Android built HWUI and renders the Homescreen with it.)
Maybe we still have room to optimize software rendering. Should we think about supporting more GL-rendering cases for speed?
vlin, we should never do buffer rotation on the homescreen. Something is clearly wrong and we are repainting. Turn on paint flashing and try to diagnose that. All of this would be trivial to diagnose with the gecko profiler. I really think that we are wasting valuable engineering time here by trying to bang our heads against a problem without the right tool.
Can you please upload this to cleopatra and then link that? thanks
Based on comment 30, we may have a similar problem on the Nexus 4, so I tried to check this performance issue there.

I am now able to get profiling results in the Android JB 4.2 environment.

I modified the gaia homescreen to keep switching pages (page 2 <-> page 3); attachment 814830 [details] is the gecko profiler result.
In attachment 814831 [details], I saw a lot of CPU time spent on graphic buffer allocation in the content process.

I also added a timer to dump the graphic buffer allocation time on the content side,
and it confirmed that 720p buffer allocation on the content side is expensive.

We have now opened another bug, bug 924788, to avoid buffer allocation from the gaia side, for example by having the homescreen always cache three pages (left, current, and right).

Another idea is to add "graphic buffer recycling" (same size, same format) on the content side, that is, to reuse a previously allocated graphic buffer that is no longer in use. I will work on a patch to see whether it works.
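The recycling idea can be sketched as a small pool keyed by size and format. This is an illustration in Python, not the Gecko patch: BufferPool, its methods and the fake allocator are all hypothetical names, and the allocator stands in for the expensive gralloc allocation.

```python
class BufferPool:
    """Hypothetical pool that hands back idle buffers of matching size/format."""

    def __init__(self, allocate):
        self._allocate = allocate  # expensive allocator (stand-in for gralloc)
        self._idle = {}            # (width, height, fmt) -> list of idle buffers

    def acquire(self, width, height, fmt):
        idle = self._idle.get((width, height, fmt))
        if idle:
            return idle.pop()      # reuse: no allocation cost
        return self._allocate(width, height, fmt)

    def release(self, buf, width, height, fmt):
        """Return a buffer that is no longer in use so it can be recycled."""
        self._idle.setdefault((width, height, fmt), []).append(buf)


allocations = []


def fake_allocate(width, height, fmt):
    """Stand-in allocator that only records how often it is called."""
    allocations.append((width, height, fmt))
    return bytearray(4)


pool = BufferPool(fake_allocate)
first = pool.acquire(768, 1141, "RGBA8888")
pool.release(first, 768, 1141, "RGBA8888")
second = pool.acquire(768, 1141, "RGBA8888")  # served from the pool
print(len(allocations), second is first)
```

A real pool would also need an upper bound on how many idle buffers it keeps, since each 720p-class buffer is several megabytes.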

10-09 10:13:57.760  2388  2388 I Gecko   : time(406) takes 32.596ms w 768 h 1141
10-09 10:13:57.780  2388  2388 I Gecko   : buffer created 0x43e83880
10-09 10:13:57.780  2388  2388 I Gecko   : time(406) takes 17.213ms w 768 h 1141
10-09 10:13:58.260  2388  2388 I Gecko   : buffer created 0x43e83d00
10-09 10:13:58.260  2388  2388 I Gecko   : time(406) takes 33.359ms w 768 h 1141
10-09 10:13:58.280  2388  2388 I Gecko   : buffer created 0x43e83e00
10-09 10:13:58.280  2388  2388 I Gecko   : time(406) takes 16.481ms w 768 h 1141
10-09 10:13:58.771  2388  2388 I Gecko   : buffer created 0x43e83900
10-09 10:13:58.771  2388  2388 I Gecko   : time(406) takes 32.901ms w 768 h 1141
10-09 10:13:58.801  2388  2388 I Gecko   : buffer created 0x43e83980
10-09 10:13:58.801  2388  2388 I Gecko   : time(406) takes 18.892ms w 768 h 1141
10-09 10:13:59.261  2388  2388 I Gecko   : buffer created 0x43e83a00
10-09 10:13:59.261  2388  2388 I Gecko   : time(406) takes 36.563ms w 768 h 1141
10-09 10:13:59.281  2388  2388 I Gecko   : buffer created 0x43e83d00
10-09 10:13:59.281  2388  2388 I Gecko   : time(406) takes 15.626ms w 768 h 1141
10-09 10:13:59.762  2388  2388 I Gecko   : buffer created 0x43e83780
10-09 10:13:59.762  2388  2388 I Gecko   : time(406) takes 32.962ms w 768 h 1141
10-09 10:13:59.782  2388  2388 I Gecko   : buffer created 0x43e83880
10-09 10:13:59.782  2388  2388 I Gecko   : time(406) takes 12.544ms w 768 h 1141

https://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#406
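To summarize the timings in the log above, the "takes" lines can be parsed and compared against a frame budget. This is a minimal sketch: the regular expression and variable names are illustrative, and the 60 Hz budget is an assumption rather than something stated in the log.

```python
import re

# "takes" lines copied from the log above (the "buffer created" lines are omitted).
LOG = """\
10-09 10:13:57.760  2388  2388 I Gecko   : time(406) takes 32.596ms w 768 h 1141
10-09 10:13:57.780  2388  2388 I Gecko   : time(406) takes 17.213ms w 768 h 1141
10-09 10:13:58.260  2388  2388 I Gecko   : time(406) takes 33.359ms w 768 h 1141
10-09 10:13:58.280  2388  2388 I Gecko   : time(406) takes 16.481ms w 768 h 1141
10-09 10:13:58.771  2388  2388 I Gecko   : time(406) takes 32.901ms w 768 h 1141
10-09 10:13:58.801  2388  2388 I Gecko   : time(406) takes 18.892ms w 768 h 1141
10-09 10:13:59.261  2388  2388 I Gecko   : time(406) takes 36.563ms w 768 h 1141
10-09 10:13:59.281  2388  2388 I Gecko   : time(406) takes 15.626ms w 768 h 1141
10-09 10:13:59.762  2388  2388 I Gecko   : time(406) takes 32.962ms w 768 h 1141
10-09 10:13:59.782  2388  2388 I Gecko   : time(406) takes 12.544ms w 768 h 1141
"""

TAKES = re.compile(r"takes (\d+\.\d+)ms w (\d+) h (\d+)")
FRAME_BUDGET_MS = 1000.0 / 60  # assumption: 60 Hz display

durations = [float(match.group(1)) for match in TAKES.finditer(LOG)]
mean_ms = sum(durations) / len(durations)
over_budget = [d for d in durations if d > FRAME_BUDGET_MS]

print(f"{len(durations)} allocations, mean {mean_ms:.1f} ms, "
      f"{len(over_budget)} longer than one 60 Hz frame")
```

For these ten samples the mean is about 24.9 ms, and 7 of the 10 allocations alone take longer than one 60 Hz frame interval.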
Peter, it is absolutely critical that you connect with someone who understands the rendering side here. There should be ZERO painting here, and ZERO buffer allocation, if you swipe left right and no page goes out of view completely. If you see something else, the actual bug is somewhere else than gralloc buffer allocation (though, the above is a bug as well, and likely a known one). Again, please, talk to Milan's team today.
(In reply to Andreas Gal :gal from comment #39)
> Can you please upload this to cleopatra and then link that? thanks

Uploaded; here is the link.

http://people.mozilla.org/~bgirard/cleopatra/?customProfile=http://people.mozilla.org/~pchang/profile_2116_Homescreen.sym#

(In reply to Andreas Gal :gal from comment #41)
> Peter, it is absolutely critical that you connect with someone who
> understands the rendering side here. There should be ZERO painting here, and
> ZERO buffer allocation, if you swipe left right and no page goes out of view
> completely. If you see something else, the actual bug is somewhere else than
> gralloc buffer allocation (though, the above is a bug as well, and likely a
> known one). Again, please, talk to Milan's team today.

Andreas, I discussed this performance issue with vlin earlier.
We are testing home scrolling across several pages (e.g. from page 1 to page 4), which includes several full-screen repaints and new buffer allocations. You will see lots of color changes when the "flash repainted area" setting is enabled.

For home scrolling between two pages (no repaint, no buffer allocation), both Flatfish and the Nexus 4 can reach 60 fps.
Ok, so this means the compositor is fine, and we are gated in the content process.

This is an issue sotaro is actively looking at and we have known since the work week. We have lock contention issues in the content process when allocating gralloc buffers. There is a very high chance that you are running into the same problem. Please, talk to Milan and the team today. We are duplicating work here.
I see SVG effects being applied during painting. Can someone please check why we are using SVG on the home screen?

Andreas
> This is an issue sotaro is actively looking at and we have known since the
> work week. We have lock contention issues in the content process when
> allocating gralloc buffers. There is a very high chance that you are running
> into the same problem. Please, talk to Milan and the team today. We are
> duplicating work here.

Sure, I will check with Milan today for the follow up.
style/app_name_mask.svg:<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
style/app_name_mask.svg:</svg>
style/app_offline_filter.svg:<svg xmlns="http://www.w3.org/2000/svg">
style/app_offline_filter.svg:</svg>
style/app_tapped_filter.svg:<svg xmlns="http://www.w3.org/2000/svg">
style/app_tapped_filter.svg:</svg>
style/grid.css:	mask: url('app_name_mask.svg#fade_right_mask');
style/grid.css:  filter: url('app_tapped_filter.svg#blur');
style/homescreen.css:  filter: url('app_offline_filter.svg#grayscale');

Ok can someone please check when this was added? These kinds of filters are comically slow to draw for us on mobile (we aren't offloading them onto the GPU yet).
(In reply to Andreas Gal :gal from comment #47)
> style/app_name_mask.svg:<svg xmlns="http://www.w3.org/2000/svg"
> version="1.1">
> style/app_name_mask.svg:</svg>
> style/app_offline_filter.svg:<svg xmlns="http://www.w3.org/2000/svg">
> style/app_offline_filter.svg:</svg>
> style/app_tapped_filter.svg:<svg xmlns="http://www.w3.org/2000/svg">
> style/app_tapped_filter.svg:</svg>
> style/grid.css:	mask: url('app_name_mask.svg#fade_right_mask');
> style/grid.css:  filter: url('app_tapped_filter.svg#blur');
> style/homescreen.css:  filter: url('app_offline_filter.svg#grayscale');
> 
> Ok can someone please check when this was added? These kind of filters are
> comically slow to draw for us on mobile (we aren't offloading them onto the
> GPU yet).

Looping in Cristian to comment on the homescreen SVG effects.
Flags: needinfo?(crdlc)
They were added 9 and 10 months ago (bug 805977 and bug 815152). The offline filter was added 4 months ago (bug 870419).
Flags: needinfo?(crdlc)
Any chance to use opacity effects instead? At higher resolutions this seems to hurt a lot.
* I don't know of another mechanism to implement bug 805977, which is about truncating app names.

https://bug805977.bugzilla.mozilla.org/attachment.cgi?id=692890

* The same applies to implementing this visual (bug 815152).

https://bug815152.bugzilla.mozilla.org/attachment.cgi?id=699143

If the visual design changes and an opacity effect (or another effect) is enough, it could be done easily.

* I think the last one (for offline apps) could be implemented with an opacity change without problems, IMHO.
(In reply to vlin from comment #34)
> (In reply to Bas Schouten (:bas.schouten) from comment #31)
> > I said I believed this could be the overdrawing issues BenWa was looking
> > into :).
> 
> Is there any Bug ID for this ?

The bug 921212 is trying to avoid allocating a gralloc buffer in certain situations because that operation is known to be slow (e.g., ~30ms)
Flags: needinfo?(bas)
See also bug 919610.
Depends on: 919610
It should never take 30ms, but yes we can avoid it. So two bugs here. Allocation taking 30ms. And allocation happening.

crdlc, can you talk to the UX peeps and see if we can substitute some opacity effects for the time being? SVG filter effects are comically slow for us, especially at high resolutions. We are working on this, but it will take a bit and it's not something we can do for 1.2.
There is a third bug here. We should never ever hit buffer rotation. We should just pre-render whole planes. We need a separate bug for this and have to see what triggers the buffer rotation. So at least 4 bugs to fix here.

1. SVG filter effects use (gaia)
2. 30ms allocations
3. re-allocation due to unrotation (we have a bug for this)
4. buffer rotation being hit at all

Please file and link. At the airport with horrible wifi.
Depends on: 924942
Priority: -- → P1
Gregor,

Please find someone on the System FE team to work on this.

Thanks,
fxos-perf-triage
Flags: needinfo?(anygregor)
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] → [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=] SystemsFE
(In reply to Mike Lee [:mlee] from comment #56)
> Gregor,
> 
> Please find someone on the System FE team to work on this.
> 

This seems gfx related. What do you want the systemsFE team to do here?
Flags: needinfo?(anygregor) → needinfo?(mlee)
Milan,

Not sure your team has access to Flatfish devices but please add this to your backlog and have someone look into this when possible.

Thanks.
Flags: needinfo?(mlee) → needinfo?(milan)
(In reply to Andreas Gal :gal from comment #55)
> There is a third bug here. We should never ever hit buffer rotation.

Agreed, this isn't a problem caused by 921212.

> We
> should just pre-render whole planes. We need a separate bug for this and
> have to see what triggers the buffer rotation. So at least 4 bugs to fix
> here.
> 
> 1. SVG filter effects use (gaia)
> 2. 30ms allocations
> 3. re-allocation due to unrotation (we have a bug for this)
> 4. buffer rotation being hit at all
> 
> Please file and link. At the airport with horrible wifi.

I think 3 and 4 are instead caused by the layer/buffer changing size, or by something causing canReuseBuffer to fail, since, as you said, we shouldn't be hitting buffer unrotation:
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#550

Let's get someone to confirm this now. I'll have some time next week to look at this if I can reproduce something similar on hamachi.
We're working with Peter in Taipei (who has device access) to do some profiling and see if we can answer some of these questions.
Flags: needinfo?(milan)
Depends on: 925616
(In reply to Benoit Girard (:BenWa) from comment #59)
> (In reply to Andreas Gal :gal from comment #55)
> > There is a third bug here. We should never ever hit buffer rotation.
> 
> Agreed, this isn't a problem caused by 921212.
> 
> > We
> > should just pre-render whole planes. We need a separate bug for this and
> > have to see what triggers the buffer rotation. So at least 4 bugs to fix
> > here.
> > 
> > 1. SVG filter effects use (gaia)
> > 2. 30ms allocations
> > 3. re-allocation due to unrotation (we have a bug for this)
> > 4. buffer rotation being hit at all
> > 
> > Please file and link. At the airport with horrible wifi.
> 
> I think 3&4 is instead the layer/buffer changing size or hitting something
> causing canReuseBuffer to fail since like you said we shouldn't be hitting
> buffer unrotation:
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.
> cpp#550
> 
> Let's get someone to confirm this now. I'll have some time next week to look
> at this if I can reproduce something similar on hamachi.

The following is the call stack for the DrawBufferWithRotation calls when scrolling the homescreen.
I think it is invoked by ContentClient by default.
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#550

To avoid confusion with other buffer rotation cases, we can add a profiler log at the following line.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#539


#0  mozilla::layers::RotatedBuffer::DrawBufferWithRotation (this=0xbe80d2d0, aTarget=0x442c2100, aSource=mozilla::layers::RotatedBuffer::BUFFER_BLACK, aOpacity=1, aOperator=mozilla::gfx::OP_SOURCE, aMask=0x0, aMaskTransform=0x0)
    at /Volumes/ramdisk/b2g_central/gfx/layers/ThebesLayerBuffer.cpp:246
#1  0x4124fcca in mozilla::layers::ContentClientDoubleBuffered::UpdateDestinationFrom (this=0x404e34c0, aSource=..., aUpdateRegion=...) at /Volumes/ramdisk/b2g_central/gfx/layers/client/ContentClient.cpp:550
#2  0x4124fb58 in mozilla::layers::ContentClientDoubleBuffered::SyncFrontBufferToBackBuffer (this=0x404e34c0) at /Volumes/ramdisk/b2g_central/gfx/layers/client/ContentClient.cpp:520
This is my systrace result so far. Some explanation:

1. The No. 1 bottleneck appears to be SyncFrontBufferToBackBuffer.
2. SyncFrontBufferToBackBuffer eventually calls neon_composite_over_8888_8888, which uses the "over" operator and performs the alpha blending calculation per pixel.
3. From the timeline, after SyncFrontBufferToBackBuffer on an 800x1260 region, PainBuffer() does an 800x1185 pattern fill on the same buffer, which makes a single PaintThebes() call cost over 500ms.
4. Between roughly 0.98sec and 1.5sec, 4 PaintThebes() calls happen within one EndTransactionInternal(), and the areas updated by SyncFrontBufferToBackBuffer are very large. If one EndTransactionInternal() call is supposed to generate one frame, almost 4 full screens of pixels are drawn per frame.

Here are some possible optimizations I have come up with. I expect the graphics experts will propose better ideas.
a) Implement SyncFrontBufferToBackBuffer with memcpy (or a NEON-optimized memcpy).
b) In PaintThebes(), calculate the painting region before the front-to-back buffer sync to minimize the region of pixels updated.
c) Find out why 4 PaintThebes() calls are needed within one EndTransactionInternal().
d) Evaluate using a larger buffer to make page scrolling or flipping more efficient, for example one 1600x1280 buffer for 2-page scrolling instead of two 800x1280 buffers.
e) Link the buffer to an EGLImage and move some of the operations to the GPU.
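Options (a) and (b) above amount to a plain copy restricted to the region that needs syncing, instead of a per-pixel "over" composite. The sketch below illustrates that shape in Python with bytearrays. It is not Gecko code: the function name, the rectangle format and the 4-bytes-per-pixel layout are assumptions made for the example.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels such as RGBA8888


def sync_region(front, back, stride, rect):
    """Copy rect = (x, y, w, h) from front to back, one row slice at a time.

    `stride` is the number of bytes per buffer row. No blending is done:
    each row is a straight byte copy, the equivalent of one memcpy per row.
    """
    x, y, w, h = rect
    row_bytes = w * BYTES_PER_PIXEL
    for row in range(y, y + h):
        start = row * stride + x * BYTES_PER_PIXEL
        back[start:start + row_bytes] = front[start:start + row_bytes]


# Tiny 4x3 pixel buffers: 16 bytes per row, 48 bytes in total.
front = bytearray(range(48))
back = bytearray(48)
sync_region(front, back, stride=16, rect=(1, 1, 2, 1))
print(back[20:28] == front[20:28])
```

Only the two pixels inside the rectangle are touched; everything outside it in the back buffer is left as it was, which is the point of computing the update region first.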
Attachment #814045 - Attachment is obsolete: true
Since one EndTransactionInternal() call costs 520ms (1.5s-0.98s) and a PaintThebes costs 500ms, how can neon_composite_over_8888_8888() be the biggest bottleneck? Could you explain?
(In reply to Thinker Li [:sinker] from comment #63)
> Since one EndTransactionInternal() calling cost 520ms (1.5s-0.98s), and a
> PaintThebes cost 500ms, how can neon_composite_over_8888_8888() be biggest
> bootleneck?  Could you explain it?

Never mind! I misunderstood it.
(In reply to Terry Li from comment #62)
> 3. From the timeline, after SyncFrontBufferToBackBuffer on a 800x1260
> region, the PainBuffer() does a 800x1185 pattern fill on the same buffer,
> which makes a single PaintThebes() call cost over 500ms.
Thanks Thinker, I did find an error here. The case in item 3 occurs between 0.393sec and 0.648sec, which is 254ms, not 500ms.
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=] SystemsFE → [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.2] SystemsFE
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.2] SystemsFE → [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.2]
Target Milestone: --- → 1.2 C3(Oct25)
Depends on: 921212
(In reply to Andreas Gal :gal from comment #55)
> There is a third bug here. We should never ever hit buffer rotation. We
> should just pre-render whole planes. We need a separate bug for this and
> have to see what triggers the buffer rotation. So at least 4 bugs to fix
> here.
> 
> 1. SVG filter effects use (gaia)
> 2. 30ms allocations
> 3. re-allocation due to unrotation (we have a bug for this)
> 4. buffer rotation being hit at all
> 
> Please file and link. At the airport with horrible wifi.


With my local patch for bug 925616, I can avoid item 2 (the 30ms allocations) when scrolling the homescreen among pages 1, 2 and 3.

The following is the gecko profiling result with my local patch for bug 925616.
The two most CPU-intensive sections are the SVG effects and PaintThebesLayer.

http://people.mozilla.org/~bgirard/cleopatra/?customProfile=http://people.mozilla.org/~pchang/profile_homescreen1016.sym#

[profiling summary]
CPU
25.7%	nsDisplayList::PaintRoot
	->...
	->gfx::DrawThebesLayer
	->PaintInactiveLayer
12.5%	->nsDisplaySVGEffects::PaintAsLayer


10.3%	ClientThebesLayer::PaintThebes
	-> ContentClientDoubleBuffered::UpdateDestinationFrom
10.3%	-> RotatedBuffer::DrawBufferWithRotation
Assigning to Peter, as he seems most able to resolve this per the last comment.
Assignee: nobody → pchang
Note that fixing bug 921212 will have us stop allocating gralloc for unrotate.
No. The homescreen should never ever unrotate. Any buffer rotation on the home screen is a bug and we should investigate the cause.
Exactly - from comment 55:

> 1. SVG filter effects use (gaia)
> 2. 30ms allocations
> 3. re-allocation due to unrotation (we have a bug for this)
> 4. buffer rotation being hit at all

Peter's reporting having fixed #2, BenWa+Bas fixed #3 (bug 921212), and the "we should not rotate" is #4 (bug 927572 - I was sure this was already entered, but could not find the bug and put one in myself.)
Hi, Andreas & Milan, I think the buffer is not really rotated.  The reason why DrawBufferWithRotation() is always called is that SyncFrontBufferToBackBuffer() uses RotatedBuffer for pixel-copying purposes.
(In reply to Terry Li from comment #62)
> a) Use memcpy(or NEON optimized memcpy) to implement
> SyncFrontBufferToBackBuffer.
I just tried a memcpy-based implementation to replace UpdateDestinationFrom() in SyncFrontBufferToBackBuffer(). The execution time of SyncFrontBufferToBackBuffer() dropped by about 30% (e.g. 142ms -> 97ms). I exposed the raw data of the front and back buffers the same way bug 921212 did. I have some questions and hope someone can give me a hint.

1) What are white buffer and black buffer? How to handle them correctly?
2) Why not sync front buffer to back buffer in SwapBuffers()?
3) Is it possible to separate 2D Paint into tiles and paint them in different threads?
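
For reference, a minimal sketch of the memcpy approach described above. It assumes both buffers are plain CPU-mapped pixel arrays with the same dimensions but possibly different row strides. `PixelBuffer` and `CopyFrontToBack` are hypothetical names for illustration only, not Gecko APIs, and this is not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-mapped pixel buffer; not a Gecko type.
struct PixelBuffer {
  int width;           // pixels per row
  int height;          // number of rows
  int bytesPerPixel;   // e.g. 4 for 32-bit pixels
  std::size_t stride;  // bytes between the starts of consecutive rows
  std::vector<std::uint8_t> data;

  PixelBuffer(int w, int h, int bpp, std::size_t strideBytes)
      : width(w), height(h), bytesPerPixel(bpp), stride(strideBytes),
        data(strideBytes * static_cast<std::size_t>(h), 0) {
    assert(w > 0 && h > 0 && bpp > 0);
    assert(strideBytes >= static_cast<std::size_t>(w) * bpp);
  }
};

// Copies all pixels of `front` into `back` using memcpy.
void CopyFrontToBack(const PixelBuffer& front, PixelBuffer& back) {
  assert(front.width == back.width && front.height == back.height);
  assert(front.bytesPerPixel == back.bytesPerPixel);
  const std::size_t rowBytes =
      static_cast<std::size_t>(front.width) * front.bytesPerPixel;

  if (front.stride == rowBytes && back.stride == rowBytes) {
    // Both buffers are tightly packed: a single bulk copy is enough.
    std::memcpy(back.data.data(), front.data.data(),
                rowBytes * static_cast<std::size_t>(front.height));
    return;
  }
  // Strides differ or rows are padded: copy row by row, skipping padding.
  for (int y = 0; y < front.height; ++y) {
    std::memcpy(back.data.data() + y * back.stride,
                front.data.data() + y * front.stride, rowBytes);
  }
}
```

The row-by-row path matters because graphics buffers are often padded per row, so a single bulk copy is only safe when both layouts are known to be tightly packed.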
Attached image svg_trace.png
I generated the system trace with my patch for bug 925616 on a Nexus 4.
The attached systrace shows the detailed breakdown of content rendering.

Here is the time breakdown of the Tick period.
a. DoProcessRestyle 35.4 ms
b. ProcessDisplayItem 12 ms
c. DrawThebesLayer 89.913 ms (contain 6 SVG PaintAsLayer)

The "PaintAsLayer" label is traced at the following line.
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDisplayList.cpp#4731

P.S. I also noticed systrace had some overhead during my profiling.

I tried the patch from bug 924942 to remove SVG from the homescreen, but DrawThebesLayer did not improve much.

I have a question here. Are we sure we can generate this content within 16 ms on a Nexus 4 at 720p resolution with the CPU at 1GHz? If not, it would be better to cache the content on the app side for homescreen scrolling.
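
To put the numbers above in perspective, here is a small sketch that sums the three phases reported for one Tick and compares the total with a frame budget. The 60 fps target is an assumption derived from the 16 ms figure in the question, and the helper names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Frame budget in milliseconds for a given refresh rate.
double FrameBudgetMs(double framesPerSecond) {
  assert(framesPerSecond > 0.0);
  return 1000.0 / framesPerSecond;
}

// Total time of all phases measured in one tick.
double TotalMs(const std::vector<double>& phasesMs) {
  return std::accumulate(phasesMs.begin(), phasesMs.end(), 0.0);
}

// Number of frame intervals one tick occupies, rounded up.
int FramesOccupied(double tickMs, double framesPerSecond) {
  assert(tickMs >= 0.0);
  return static_cast<int>(std::ceil(tickMs / FrameBudgetMs(framesPerSecond)));
}
```

With the values from the trace (35.4, 12 and 89.913 ms) the total is about 137.3 ms, so a single tick spans roughly nine frame intervals at 60 fps.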
Peter, we should never ever repaint any content that doesn't come in anew. So the next homescreen page that comes into view we paint, but everything that's already in motion should never paint.

Great find with the sync front to back. We should definitely optimize that. I wonder whether we can do it async. Can we insert a shader into the pipeline to copy this instead of using the CPU?

We want multi-threaded painting but for that we need OMTP first. It's in the works but not near term.
(In reply to Milan Sreckovic [:milan] on PTO 17-18 Oct from comment #70)
> Exactly - from comment 55:
> 
> > 1. SVG filter effects use (gaia)
> > 2. 30ms allocations
> > 3. re-allocation due to unrotation (we have a bug for this)
> > 4. buffer rotation being hit at all
> 
> Peter's reporting having fixed #2, BenWa+Bas fixed #3 (bug 921212), and the
> "we should not rotate" is #4 (bug 927572 - I was sure this was already
> entered, but could not find the bug and put one in myself.)

We've tried those patches locally and here is the status.
Just synced with Peter.

1. SVG filter effects use (gaia)
No significant improvement. 
Bug 924942

2. 30ms allocations
   2.1 re-allocation
   Improved.(With known issue: garbage)
   Bug 925616

   2.2 long latency (malloc randomly blocked by Composition)
   No Bug.

3. re-allocation due to unrotation (we have a bug for this)
No significant improvement for Homescreen.
Bug 921212

4. buffer rotation being hit at all
Actually no buffer rotation happened.

Others

a. Modify visibility for homescreen (gaia)
No improvement. (still re-painting)
Looking into it. (Following up Comment 74)

BTW.
Here is the typical pattern of this performance issue.
https://bugzilla.mozilla.org/attachment.cgi?id=818304
We have 3 bottlenecks here.
1. re-styling.
2. memory allocation.
3. re-painting.
Attached file switch_to_systrace.zip
@Terry, would you please apply this patch for profiling? Then we will all have the same probing points.
(In reply to Vincent Lin[:vincentlin] from comment #76)
> @Terry, Would you please apply this patch for profiling ? Then we all have
> the same probing points.
Vincent, thanks.
(In reply to Andreas Gal :gal from comment #74)
> Peter, we should never ever repaint any content that doesn't come in anew.
> So the next homescreen page that comes into view we paint, but everything
> thats already in motion should never paint.
> 
I think what I hit here is the first-paint performance problem during homescreen scrolling (from page 1 to page 3).
I'm checking whether enabling the layer.force-active flag makes any difference.

> Great find with the sync front to back. We should definitely optimize that.
> I wonder whether we can do it async. Can we insert a shader into the
> pipeline to copy this instead of using the CPU?
> 
I think bug 928123 is trying to improve syncfronttoback performance.

> We want multi-threaded painting but for that we need OMTP first. Its in the
> works but not near term.
I tested the patch from bug 928123 on my Nexus 4 (CPU fixed at 1GHz).
It improves SyncFrontBufferToBackBuffer() performance.
Depends on: 928123
(In reply to peter chang[:pchang] from comment #73)
> Created attachment 818391 [details]
> svg_trace.png
> 
> I generated the system trace with my patch of bug 925616 on nexus 4.
> And attached the systrace about the detail break down of content rendering.
> 
> I just list the time break down of Tick period.
> a. DoProcessRestyle 35.4 ms
> b. ProcessDisplayItem 12 ms
> c. DrawThebesLayer 89.913 ms (contain 6 SVG PaintAsLayer)

Inside DrawThebesLayer, I saw "PaintOneShadow" called several times (depending on how many icons are on the page).
If I disable it at the following line, the DrawThebesLayer time drops from 89 ms to 25 ms according to the systrace file.

http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrame.cpp#5859

Check the gecko implementation for shadow text drawing.
You can add a 'profiler_label_printf' and append the x/y position of the call. This should break down the cost per icons on the page if you wanted to confirm that theory.
(In reply to Benoit Girard (:BenWa) from comment #81)
> You can add a 'profiler_label_printf' and append the x/y position of the
> call. This should break down the cost per icons on the page if you wanted to
> confirm that theory.

Good idea, I will update the gecko profiler data later.
(In reply to Vincent Lin[:vincentlin] from comment #75)
> Here is the typical pattern of this performance issue.
> https://bugzilla.mozilla.org/attachment.cgi?id=818304
> We have 3 bottlenecks here.
> 1. re-styling.
> 2. memory allocation.
> 3. re-painting.
#2 and #3 are dependent - the root cause is the same: the layer is destroyed, so the frame needs to recreate a layer (#2) and render onto that layer (#3).
(In reply to C.J. Ku[:CJKu] from comment #83)
> (In reply to Vincent Lin[:vincentlin] from comment #75)
> > Here is the typical pattern of this performance issue.
> > https://bugzilla.mozilla.org/attachment.cgi?id=818304
> > We have 3 bottlenecks here.
> > 1. re-styling.
> > 2. memory allocation.
> > 3. re-painting.
> #2 and #3 is dependent - the root cause is the same, layer be destroyed,
> frame need to recreate a layer(#2) and rendering onto that layer(#3)

Well, that's right if there's a solution to prevent the layer from being destroyed.
We ran an experiment on the Homescreen Gaia side (Others-a in Comment 75), but it was not effective.

So far:
Bug 925616 (Ongoing) is trying to reuse the buffer, but it still needs to clean garbage and repaint. (#2 only)
Bug 928123 (Committed) reduced the time consumption of fillRect. (#3 only)

If any information is wrong or missing, please kindly let me know.
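
As an illustration of the buffer reuse trade-off mentioned above (reallocation is avoided, but the recycled buffer still has to be cleared and repainted), here is a minimal sketch of a recycling pool. `BufferPool` and its methods are hypothetical and are not taken from the bug 925616 patch; a real implementation would recycle gralloc buffers rather than heap vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical pool that recycles pixel buffers instead of reallocating them.
class BufferPool {
 public:
  using Buffer = std::vector<std::uint8_t>;

  // Returns a zeroed buffer of `bytes` bytes, reusing a released one
  // of the same size when available.
  Buffer Acquire(std::size_t bytes) {
    assert(bytes > 0);
    for (std::size_t i = 0; i < free_.size(); ++i) {
      if (free_[i].size() == bytes) {
        Buffer buf = std::move(free_[i]);
        free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
        // The recycled buffer still holds the previous contents
        // ("garbage"), so it must be cleared before repainting.
        std::fill(buf.begin(), buf.end(), 0);
        ++reused_;
        return buf;
      }
    }
    ++allocated_;
    return Buffer(bytes, 0);
  }

  // Hands a buffer back to the pool for later reuse.
  void Release(Buffer buf) { free_.push_back(std::move(buf)); }

  int reused() const { return reused_; }
  int allocated() const { return allocated_; }

 private:
  std::vector<Buffer> free_;
  int reused_ = 0;
  int allocated_ = 0;
};
```

The sketch only removes the allocation cost (#2); the clear and the repaint (#3) still happen on every reuse, which matches the status described above.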
(In reply to peter chang[:pchang] from comment #82)
> (In reply to Benoit Girard (:BenWa) from comment #81)
> > You can add a 'profiler_label_printf' and append the x/y position of the
> > call. This should break down the cost per icons on the page if you wanted to
> > confirm that theory.
> 
> Good idea, I will update the gecko profiler data later.

After adding the x/y position to the "PaintOneShadow" label, the profile at the following link shows a heavy cost in PaintOneShadow.

http://people.mozilla.org/~bgirard/cleopatra/?customProfile=http://people.mozilla.org/~pchang/profile_920921.sym


I also dumped the x/y positions when scrolling to a page that contains 16 icons (4x4).

I/Gecko   ( 7391): PaintOneShadow x 1560.000000 y 4500.000000
I/Gecko   ( 7391): PaintOneShadow x 7447.000000 y 4500.000000
I/Gecko   ( 7391): PaintOneShadow x 12825.000000 y 4500.000000
I/Gecko   ( 7391): PaintOneShadow x 18810.000000 y 4500.000000   <--First row
I/Gecko   ( 7391): PaintOneShadow x 787.000000 y 10260.000000
I/Gecko   ( 7391): PaintOneShadow x 7125.000000 y 10260.000000
I/Gecko   ( 7391): PaintOneShadow x 13492.000000 y 10260.000000
I/Gecko   ( 7391): PaintOneShadow x 19117.000000 y 10260.000000  <--Second row
I/Gecko   ( 7391): PaintOneShadow x 1830.000000 y 16020.000000
I/Gecko   ( 7391): PaintOneShadow x 7642.000000 y 16020.000000
I/Gecko   ( 7391): PaintOneShadow x 13447.000000 y 16020.000000
I/Gecko   ( 7391): PaintOneShadow x 18487.000000 y 16020.000000  <--Third row
I/Gecko   ( 7391): PaintOneShadow x 660.000000 y 21780.000000
I/Gecko   ( 7391): PaintOneShadow x 6420.000000 y 21780.000000
I/Gecko   ( 7391): PaintOneShadow x 12202.000000 y 21780.000000
I/Gecko   ( 7391): PaintOneShadow x 17453.000000 y 21780.000000  <--Fourth row
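
A dump like the one above can be summarized per row with a small parser. This is only a sketch that assumes the log format shown here ("PaintOneShadow x <float> y <float>"); the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Extracts the x/y values from one log line. Returns false if the line
// does not contain a PaintOneShadow entry in the expected format.
bool ParseShadowLine(const std::string& line, double* x, double* y) {
  assert(x != nullptr && y != nullptr);
  const std::size_t pos = line.find("PaintOneShadow x ");
  if (pos == std::string::npos) {
    return false;
  }
  return std::sscanf(line.c_str() + pos, "PaintOneShadow x %lf y %lf", x, y) == 2;
}

// Counts how many PaintOneShadow calls were logged for each distinct y value.
std::map<long, int> CountShadowsPerRow(const std::vector<std::string>& lines) {
  std::map<long, int> perRow;
  for (const std::string& line : lines) {
    double x = 0.0;
    double y = 0.0;
    if (ParseShadowLine(line, &x, &y)) {
      ++perRow[std::lround(y)];
    }
  }
  return perRow;
}
```

For the 16 lines above this yields four distinct y values (4500, 10260, 16020, 21780) with four calls each, i.e. one PaintOneShadow call per icon on the 4x4 page.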
Another related issue for scrolling is CSS restyling.
Bug 862276: improve restyle performance while panning.
Depends on: 930587
Depends on: 862276
Duplicate of this bug: 919610
Depends on: 930980
Depends on: 931082
Status: NEW → ASSIGNED
Target Milestone: 1.2 C3(Oct25) → 1.2 C4(Nov8)
Depends on: 931206
IMO this isn't flatfish only. It's just much worse on the flatfish. The bugs that depend on here are blocking Homescreen from being perfectly smooth.
Depends on: 931262
(In reply to Benoit Girard (:BenWa) from comment #88)
> IMO this isn't flatfish only. It's just much worse on the flatfish. The bugs
> that depend on here are blocking Homescreen from being perfectly smooth.

I understand it's unacceptable on Flatfish - are you saying it's also unacceptable on phone sized screens?  I'm not sure if "perfectly smooth" is required or just a good thing.  I want to avoid people *automatically* koi+ all the bugs that are in the "depends" list, because that's a serious list...
Updating Target Milestone for FxOS Perf koi+'s.
For those concerned about touch performance: this profiling shows there is no performance issue on the software side. (Flatfish does have a touch problem on the hardware side.)

The profiling is at the beginning of touching and moving Homescreen.

The B2G/InputReader thread polls EventHub every 15~20ms, and the B2G/b2g thread sends touch events to the Homescreen at a similar interval of around 15~20ms. So there is no problem with touch event traffic.

The main problem is still restyle and painting. The Homescreen may skip touch events because it is still dealing with the first one.
Flatfish alone should not be a koi+.  Let's re-triage.
blocking-b2g: koi+ → koi?
Taipei triage result = 1.3?
blocking-b2g: koi? → 1.3?
As we confirmed that Flatfish will use v1.3 and we need to respect the current testing cycle, setting the target milestone to 12/6 since the 1.3FC tag has not been created yet.
blocking-b2g: 1.3? → 1.3+
Target Milestone: 1.2 C4(Nov8) → 1.3 Sprint 6 - 12/6
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.2] → [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.3]
Depends on: 939392
Depends on: 939466
Depends on: 939565
(In reply to Vincent Lin[:vincentlin] from comment #91)
> 
> The main problem is still on restyle and painting. Homescreen may skip touch
> events due to it's dealing with the first one.

I talked with CJ about this styling issue a few weeks ago.  I mentioned a short-term solution then, but forgot to update here until now.

Since one of the main issues is restyling, and the number of CSS rules affects its speed, we could reduce the restyling time by moving all content of the homescreen into an iframe and applying the swipe transformation to the iframe.  Swiping would then change only style rules in the parent frame instead of in the child iframe, which includes a large number of rules.  This approach changes only a small part of Gaia.  @dbaron, does it make sense?
Flags: needinfo?(dbaron)
I think it's better to wait for the results of bug 931668 before attempting anything that drastic.
Flags: needinfo?(dbaron)
Depends on: 941984
Depends on: 944564
Component: Gaia::Homescreen → Graphics
Product: Firefox OS → Core
Moving from 1.3+ to 1.4+ since Flatfish will use 1.4.
blocking-b2g: 1.3+ → 1.4+
Summary: [Flatfish]: Flatfish has bad performance on Homescreen wiping → [Flatfish]: Flatfish has bad performance on Homescreen swiping
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] [c= p= s= u=1.3] → [TPE_GFX, flatfishRun1, flatfish only] [c=handeye p= s= u=1.4]
Target Milestone: 1.3 Sprint 6 - 12/6 → ---
Is this bug actionable other than waiting for bug 931668?  It feels like a meta bug at this point.
Whiteboard: [TPE_GFX, flatfishRun1, flatfish only] [c=handeye p= s= u=1.4] → [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only]
This doesn't fall under the QC blocking feature list & DSDS feature list. Renominating.
blocking-b2g: 1.4+ → 1.4?
Moving to backlog - flatfish doesn't hit the QC feature list & DSDS feature list.
blocking-b2g: 1.4? → backlog
Depends on: 983960
Unassigned because this is a kind of meta bug.
Assignee: pchang → nobody
Priority: P1 → P3
Moving back to NEW.
Status: ASSIGNED → NEW
Whiteboard: [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only] → [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only][flatfish]
Whiteboard: [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only][flatfish] → [c=handeye p= s= u=flatfish] [TPE_GFX, flatfishRun1, flatfish only][flatfish][TCP=performance]
Is this bug still valid? I didn't notice any lagginess or bad performance while swiping the Homescreen.
BTW, I flashed a 2.1 release.
I have a DEBUG/MOZ_PROFILING build flashed into the device, so if someone needs any Cleopatra (or even PVRTune) report, I could provide one.
I'll look into this.
Flags: needinfo?(feer56)
I didn't see any issue. 

ni? reporter.
Flags: needinfo?(feer56) → needinfo?(vliu)
I also haven't seen, or heard from anyone about, any bad performance issue. Clearing the ni?.
Flags: needinfo?(vliu)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
blocking-b2g: backlog → ---