Open
Bug 708524
Opened 13 years ago
Updated 5 months ago
High CPU usage with simple CSS animation
Categories
(Core :: Layout, defect)
Tracking
()
NEW
People
(Reporter: cjones, Unassigned)
References
Details
Attachments
(14 files, 1 obsolete file)
826 bytes,
text/plain
|
Details | |
452 bytes,
text/plain
|
Details | |
450 bytes,
text/plain
|
Details | |
682.26 KB,
text/plain
|
Details | |
585.90 KB,
text/plain
|
Details | |
4.79 KB,
patch
|
Details | Diff | Splinter Review | |
1.38 MB,
text/plain
|
Details | |
121.68 KB,
text/plain
|
Details | |
1.81 MB,
text/x-log
|
Details | |
324.44 KB,
text/plain
|
Details | |
507.59 KB,
text/plain
|
Details | |
800 bytes,
text/plain
|
Details | |
288.08 KB,
text/plain
|
Details | |
1.40 MB,
text/plain
|
Details |
The B2G homescreen has a battery indicator that shows charging/discharging status. When AC power is plugged in, the indicator shows a CSS animation that indicates charging status. The animation is over a very small area of the screen, maybe 5x20 pixels. When the animation runs, the b2g process on device consumes ~30-40% CPU. That seems way too high.
We're going to profile and see what's wrong.
Comment 1•13 years ago
|
||
Comment 2•13 years ago
|
||
Comment 3•13 years ago
|
||
Comment 4•13 years ago
|
||
The oprofile for target need to patch to support ARM Cortex-A9 used by SGS2 phone.
Adding attached oprofile reports for homescreen.css with animation on and off.
There are 2 animation off reports sampled at different time.
Each was sampled for 10 seconds.
Please ignore the "CPU: invalid cpu type, speed 0 MHz (estimated)" message in the report, it should not matter.
Comment 5•13 years ago
|
||
Can you try do to a build of gecko with symbols and get a detailed profile? Thanks!
Reporter | ||
Comment 6•13 years ago
|
||
B2g builds have symbols by default. You just need to tell oprofile to show function information. Getting a callgraph might help too. See the gIthub issue about enabling profiling in the sgs2 kernel to see which patch to get for better stack walking.
Comment 7•13 years ago
|
||
Let me check how to make callgraph working on SGS2.
Below is some additional information with symbols.
Most CPU usage are coming from those fetch_scanline_* functions of gecko.
=================================================
CPU: invalid cpu type, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000
samples % app name symbol name
1615 4.5107 libxul.so fetch_scanline_x1r5g5b5
1554 4.3403 libxul.so fetch_scanline_a1b5g5r5
1509 4.2146 libxul.so fetch_scanline_x4r4g4b4
1459 4.0750 libxul.so fetch_scanline_x1b5g5r5
1388 3.8767 libxul.so fetch_scanline_a4r4g4b4
1326 3.7035 libGLESv2_mali.so _gles_fb_tex_sub_image_2d
1302 3.6365 libxul.so fetch_scanline_a4b4g4r4
1241 3.4661 oprofiled odb_update_node_with_offset
764 2.1338 vmlinux ring_buffer_consume
693 1.9355 vmlinux schedule
562 1.5697 vmlinux s5pv310_enter_idle
440 1.2289 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC
438 1.2233 vmlinux __do_softirq
279 0.7792 vmlinux tick_nohz_stop_sched_tick
271 0.7569 libm.so floor
267 0.7457 vmlinux tick_nohz_restart_sched_tick
242 0.6759 vmlinux add_preempt_count
236 0.6591 vmlinux __queue_work
211 0.5893 vmlinux sub_preempt_count
208 0.5809 libxul.so mozilla::imagelib::ConvertColormap(unsigned int*, unsigned int)
194 0.5418 libxul.so nsFrameManager::ReparentStyleContext(nsIFrame*)
177 0.4944 libc.so memcpy
...................
=================================================
Comment 8•13 years ago
|
||
Callgraph for "libxul.so".
Reporter | ||
Comment 9•13 years ago
|
||
Thanks Shian-Yow. We're not getting backtraces from the expensive functions in pIxman, so it's hard to tell what's expensive. In general we should fix pixman to give us better stacks, but for the purposes of this bug we can try to get some higher-level profiling data.
My first guess id that we're invalidating too much content. Roc, tn is there a guide to getting invalidation information somewhere? Is the easiest way to check this to look st display list dumps?
Comment 10•13 years ago
|
||
Another result measured at a different time. The libxul.so consumes 37% CPU, but the callgraph result was different from previous attachment.
======================================
CPU: invalid cpu type, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000
CPU_CYCLES:125000|
samples| %|
------------------
10462 37.5130 libxul.so
9717 34.8417 vmlinux
2025 7.2609 libGLESv2_mali.so
1925 6.9024 oprofiled
1079 3.8689 libc.so
927 3.3239 libMali.so
660 2.3665 libmozutils.so
386 1.3841 libm.so
338 1.2119 libnspr4.so
======================================
Comment 11•13 years ago
|
||
In my testing, there are 2 cases.
Case 1: libxul.so consumes 54%, a lot coming from pixman.
List of functions consume higher than %1 usage:
=====================================================
samples % app name symbol name
2841 6.0810 oprofiled odb_update_node_with_offset
1815 3.8849 libxul.so fetch_scanline_x4r4g4b4
1802 3.8571 libxul.so fetch_scanline_x1r5g5b5
1795 3.8421 libxul.so fetch_scanline_a1b5g5r5
1734 3.7116 libxul.so fetch_scanline_x1b5g5r5
1668 3.5703 libxul.so fetch_scanline_a4r4g4b4
1547 3.3113 libGLESv2_mali.so _gles_fb_tex_sub_image_2d
1515 3.2428 libxul.so fetch_scanline_a4b4g4r4
1074 2.2989 vmlinux ring_buffer_consume
938 2.0077 vmlinux s5pv310_enter_idle
769 1.6460 vmlinux schedule
618 1.3228 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC
591 1.2650 vmlinux __do_softirq
=====================================================
Case 2: libxul.so consumes 37%, no expensive functions seen.
List of functions consume higher than %1 usage:
=====================================================
samples % app name symbol name
1391 5.0186 oprofiled odb_update_node_with_offset
1310 4.7263 libGLESv2_mali.so _gles_fb_tex_sub_image_2d
779 2.8105 vmlinux s5pv310_enter_idle
757 2.7312 vmlinux ring_buffer_consume
650 2.3451 vmlinux schedule
343 1.2375 vmlinux add_preempt_count
337 1.2159 vmlinux __do_softirq
311 1.1221 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_OVER
297 1.0715 libm.so floor
294 1.0607 vmlinux sub_preempt_count
=====================================================
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> My first guess id that we're invalidating too much content. Roc, tn is
> there a guide to getting invalidation information somewhere? Is the easiest
> way to check this to look st display list dumps?
The easiest way without changing code, yes. Set MOZ_DUMP_PAINT_LIST=1 in the environment or gDumpPaintList=1 in a debugger.
Comment 13•13 years ago
|
||
I use something like this when looking into painting/invalidation things.
Comment 14•13 years ago
|
||
Comment 15•13 years ago
|
||
Comment 16•13 years ago
|
||
I attached 2 dump paint logs (Animation on and off).
In Animation on case, the log was very busy.
Can someone take a look to see if any anything abnormal? :)
Reporter | ||
Comment 17•13 years ago
|
||
Yes:
-----
Painting --- before optimization (dirty 0,0,28800,48000):
-----
If I read correctly, the entire viewport is being invalidated by the battery animation. That's bad.
No, that's just us building the display list for the entire window to ensure we rebuild layers correctly. The actual invalid region doesn't show up in this log. You'll need to look at the paint events your Gonk widget is dispatching.
Comment 19•13 years ago
|
||
Here's another log added Timothy's debug patch.
Does it help to indicate more clues?
Comment 20•13 years ago
|
||
Previous log was incorrect. Update with this.
Attachment #582622 -
Attachment is obsolete: true
Comment 21•13 years ago
|
||
This is another log with Timothy's debug patch, but turn off MOZ_DUMP_PAINT_LIST to show less information.
This log was captured for about 20 seconds.
The message "paint event bounds 0 0 480 800" was found 895 times.
And "InvalidateThebesLayerContents" was found 2893 times.
Does this mean we are invalidating too much?
Comment 22•13 years ago
|
||
paint event bounds 0 0 480 800
DrawThebesLayer aRegionToDraw 0 0 480 24
InvalidateThebesLayerContents 405 3 29 16
nsViewManager::UpdateViewNoSuppression 405 3 29 16
InvalidateThebesLayerContents 0 0 480 24
nsViewManager::UpdateViewNoSuppression 0 0 480 24
InvalidateThebesLayerContents 0 0 480 24
nsViewManager::UpdateViewNoSuppression 0 0 480 24
This repeats over and over. The coordinates look weird for
InvalidateThebesLayerContents 405 3 29 16
nsViewManager::UpdateViewNoSuppression 405 3 29 16
Comment 23•13 years ago
|
||
The coordinate "405 3 29 16" is the battery image on the top right corner, and "0 0 480 24" is the notification bar at the top.
Next thing is to find out why they repeated so frequently.
Comment 24•13 years ago
|
||
From that log it looks like we invalidate reasonable rects, but every paint event is for the entire window still. We repaint 480x24 rect of a layer and composite the layer tree and blit that to the screen for the rest of the window. What does the DOM tree look like? Is the battery indicator inside a floated element maybe?
nsPaintEvent event(true, NS_PAINT, gWindowToRedraw);
event.region = gScreenBounds;
So the paint event bounds are always the screen. Nothing to see there. I sure hope we're using GL to composite everything.
I think we may need a hierarchical profile to get a better grip on what's costly here.
(In reply to Shian-Yow Wu from comment #23)
> The coordinate "405 3 29 16" is the battery image on the top right corner,
> and "0 0 480 24" is the notification bar at the top.
We shouldn't be repainting the notification bar, right? That may be some kind of bug, although it may not explain the performance issue.
Reporter | ||
Comment 26•13 years ago
|
||
Yuck. Fwiw we saw the same symptom on android too.
We are compositing with GL.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25)
> We shouldn't be repainting the notification bar, right? That may be some
> kind of bug, although it may not explain the performance issue.
Not the whole thing.
is this the same bug as bug 698297?
Comment 28•13 years ago
|
||
This is the profiling result with perf.
After using "libxul.so" compiled with "-fno-omit-frame-pointer", it indicated CPU time was spent on these 2 functions in pixman.
14.48% b2g-debug libxul.so [.] bits_image_fetch_bilinear_affine_pad_r5g6b5
11.23% b2g-debug libxul.so [.] convert_r5g6b5
The high CPU time functions are different from Bug 698297. But can't say whether the 2 bugs are related or not, as I cannot get callgraph working for pixman functions.
Comment 29•13 years ago
|
||
Sounds like we are doing some 16-bit rgb to 24-bit rgb conversion here. Thats odd. We are driving the screen with 24bpp. Where would the 16-bit rgb come from?
Reporter | ||
Comment 30•13 years ago
|
||
This bug is mostly invalid, until we fix bug 711867.
gecko uses 16bpp surfaces by default in GFX_OPTIMIZE_MOBILE.
Reporter | ||
Comment 31•13 years ago
|
||
OK, still seeing high CPU usage after bug 711867 landed. Let's get some oprofile data.
Comment 32•13 years ago
|
||
Comment 33•13 years ago
|
||
We need a stack profile.
Comment 35•13 years ago
|
||
This is oprofile with call graph.
Hmm, how to interpret that?
How much of the time is spend under FrameLayerBuilder::BuildContainerLayerFor?
How much under nsFrame::BuildDisplayListForStackingContext?
Comment 37•13 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36)
> Hmm, how to interpret that?
>
> How much of the time is spend under
> FrameLayerBuilder::BuildContainerLayerFor?
>
> How much under nsFrame::BuildDisplayListForStackingContext?
Below was the opcontrol commands been used, and the log were generated from opreport at host side.
opcontrol --vmlinux=/data/local/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES:125000
opcontrol --callgraph=6
opcontrol --start;sleep 10;opcontrol --stop
It's sampled by CPU_CYCLES event for 10 seconds.
I'm not sure if it's accurate to estimate time spent from sample percentage.
If we calculate it this way, then the estimated time should be 9.16ms (10 sec * 0.0916%) for FrameLayerBuilder::BuildContainerLayerFor, and 4.58ms for nsIFrame::BuildDisplayListForStackingContext.
Reporter | ||
Comment 38•13 years ago
|
||
1287 4.9133 libGLESv2_mali.so libGLESv2_mali.so _gles_fb_tex_sub_image_2d
This is us possibly doing excessive texture uploading, which might mean that our uploading code is using a slow path.
669 2.5540 libmozutils.so libmozutils.so __clzsi2
I have no idea what this is. It looks suspicious. glandium, any pointers?
314 1.1987 libxul.so libxul.so pixman_composite_over_8888_8888_asm_neon
This is surprising, because we should be compositing in GL.
296 1.1300 libnspr4.so libnspr4.so /system/b2g/libnspr4.so
This is also suspicious. We should be getting symbols from nspr, so that's a separate bug too.
Comment 39•13 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #38)
> 669 2.5540 libmozutils.so libmozutils.so
> __clzsi2
>
> I have no idea what this is. It looks suspicious. glandium, any pointers?
http://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•