Open Bug 708524 Opened 13 years ago Updated 5 months ago

High CPU usage with simple CSS animation

Categories

(Core :: Layout, defect)

ARM
Gonk (Firefox OS)
defect

Tracking

()

People

(Reporter: cjones, Unassigned)

References

Details

Attachments

(14 files, 1 obsolete file)

826 bytes, text/plain
Details
452 bytes, text/plain
Details
450 bytes, text/plain
Details
682.26 KB, text/plain
Details
585.90 KB, text/plain
Details
4.79 KB, patch
Details | Diff | Splinter Review
1.38 MB, text/plain
Details
121.68 KB, text/plain
Details
1.81 MB, text/x-log
Details
324.44 KB, text/plain
Details
507.59 KB, text/plain
Details
800 bytes, text/plain
Details
288.08 KB, text/plain
Details
1.40 MB, text/plain
Details
The B2G homescreen has a battery indicator that shows charging/discharging status. When AC power is plugged in, the indicator shows a CSS animation that indicates charging status. The animation is over a very small area of the screen, maybe 5x20 pixels. When the animation runs, the b2g process on device consumes ~30-40% CPU. That seems way too high. We're going to profile and see what's wrong.
The oprofile for target need to patch to support ARM Cortex-A9 used by SGS2 phone. Adding attached oprofile reports for homescreen.css with animation on and off. There are 2 animation off reports sampled at different time. Each was sampled for 10 seconds. Please ignore the "CPU: invalid cpu type, speed 0 MHz (estimated)" message in the report, it should not matter.
Can you try do to a build of gecko with symbols and get a detailed profile? Thanks!
B2g builds have symbols by default. You just need to tell oprofile to show function information. Getting a callgraph might help too. See the gIthub issue about enabling profiling in the sgs2 kernel to see which patch to get for better stack walking.
Let me check how to make callgraph working on SGS2. Below is some additional information with symbols. Most CPU usage are coming from those fetch_scanline_* functions of gecko. ================================================= CPU: invalid cpu type, speed 0 MHz (estimated) Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000 samples % app name symbol name 1615 4.5107 libxul.so fetch_scanline_x1r5g5b5 1554 4.3403 libxul.so fetch_scanline_a1b5g5r5 1509 4.2146 libxul.so fetch_scanline_x4r4g4b4 1459 4.0750 libxul.so fetch_scanline_x1b5g5r5 1388 3.8767 libxul.so fetch_scanline_a4r4g4b4 1326 3.7035 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 1302 3.6365 libxul.so fetch_scanline_a4b4g4r4 1241 3.4661 oprofiled odb_update_node_with_offset 764 2.1338 vmlinux ring_buffer_consume 693 1.9355 vmlinux schedule 562 1.5697 vmlinux s5pv310_enter_idle 440 1.2289 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC 438 1.2233 vmlinux __do_softirq 279 0.7792 vmlinux tick_nohz_stop_sched_tick 271 0.7569 libm.so floor 267 0.7457 vmlinux tick_nohz_restart_sched_tick 242 0.6759 vmlinux add_preempt_count 236 0.6591 vmlinux __queue_work 211 0.5893 vmlinux sub_preempt_count 208 0.5809 libxul.so mozilla::imagelib::ConvertColormap(unsigned int*, unsigned int) 194 0.5418 libxul.so nsFrameManager::ReparentStyleContext(nsIFrame*) 177 0.4944 libc.so memcpy ................... =================================================
Callgraph for "libxul.so".
Thanks Shian-Yow. We're not getting backtraces from the expensive functions in pIxman, so it's hard to tell what's expensive. In general we should fix pixman to give us better stacks, but for the purposes of this bug we can try to get some higher-level profiling data. My first guess id that we're invalidating too much content. Roc, tn is there a guide to getting invalidation information somewhere? Is the easiest way to check this to look st display list dumps?
Another result measured at a different time. The libxul.so consumes 37% CPU, but the callgraph result was different from previous attachment. ====================================== CPU: invalid cpu type, speed 0 MHz (estimated) Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000 CPU_CYCLES:125000| samples| %| ------------------ 10462 37.5130 libxul.so 9717 34.8417 vmlinux 2025 7.2609 libGLESv2_mali.so 1925 6.9024 oprofiled 1079 3.8689 libc.so 927 3.3239 libMali.so 660 2.3665 libmozutils.so 386 1.3841 libm.so 338 1.2119 libnspr4.so ======================================
In my testing, there are 2 cases. Case 1: libxul.so consumes 54%, a lot coming from pixman. List of functions consume higher than %1 usage: ===================================================== samples % app name symbol name 2841 6.0810 oprofiled odb_update_node_with_offset 1815 3.8849 libxul.so fetch_scanline_x4r4g4b4 1802 3.8571 libxul.so fetch_scanline_x1r5g5b5 1795 3.8421 libxul.so fetch_scanline_a1b5g5r5 1734 3.7116 libxul.so fetch_scanline_x1b5g5r5 1668 3.5703 libxul.so fetch_scanline_a4r4g4b4 1547 3.3113 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 1515 3.2428 libxul.so fetch_scanline_a4b4g4r4 1074 2.2989 vmlinux ring_buffer_consume 938 2.0077 vmlinux s5pv310_enter_idle 769 1.6460 vmlinux schedule 618 1.3228 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC 591 1.2650 vmlinux __do_softirq ===================================================== Case 2: libxul.so consumes 37%, no expensive functions seen. List of functions consume higher than %1 usage: ===================================================== samples % app name symbol name 1391 5.0186 oprofiled odb_update_node_with_offset 1310 4.7263 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 779 2.8105 vmlinux s5pv310_enter_idle 757 2.7312 vmlinux ring_buffer_consume 650 2.3451 vmlinux schedule 343 1.2375 vmlinux add_preempt_count 337 1.2159 vmlinux __do_softirq 311 1.1221 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_OVER 297 1.0715 libm.so floor 294 1.0607 vmlinux sub_preempt_count =====================================================
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9) > My first guess id that we're invalidating too much content. Roc, tn is > there a guide to getting invalidation information somewhere? Is the easiest > way to check this to look st display list dumps? The easiest way without changing code, yes. Set MOZ_DUMP_PAINT_LIST=1 in the environment or gDumpPaintList=1 in a debugger.
Attached patch debug patchSplinter Review
I use something like this when looking into painting/invalidation things.
I attached 2 dump paint logs (Animation on and off). In Animation on case, the log was very busy. Can someone take a look to see if any anything abnormal? :)
Yes: ----- Painting --- before optimization (dirty 0,0,28800,48000): ----- If I read correctly, the entire viewport is being invalidated by the battery animation. That's bad.
No, that's just us building the display list for the entire window to ensure we rebuild layers correctly. The actual invalid region doesn't show up in this log. You'll need to look at the paint events your Gonk widget is dispatching.
Here's another log added Timothy's debug patch. Does it help to indicate more clues?
Previous log was incorrect. Update with this.
Attachment #582622 - Attachment is obsolete: true
This is another log with Timothy's debug patch, but turn off MOZ_DUMP_PAINT_LIST to show less information. This log was captured for about 20 seconds. The message "paint event bounds 0 0 480 800" was found 895 times. And "InvalidateThebesLayerContents" was found 2893 times. Does this mean we are invalidating too much?
paint event bounds 0 0 480 800 DrawThebesLayer aRegionToDraw 0 0 480 24 InvalidateThebesLayerContents 405 3 29 16 nsViewManager::UpdateViewNoSuppression 405 3 29 16 InvalidateThebesLayerContents 0 0 480 24 nsViewManager::UpdateViewNoSuppression 0 0 480 24 InvalidateThebesLayerContents 0 0 480 24 nsViewManager::UpdateViewNoSuppression 0 0 480 24 This repeats over and over. The coordinates look weird for InvalidateThebesLayerContents 405 3 29 16 nsViewManager::UpdateViewNoSuppression 405 3 29 16
The coordinate "405 3 29 16" is the battery image on the top right corner, and "0 0 480 24" is the notification bar at the top. Next thing is to find out why they repeated so frequently.
From that log it looks like we invalidate reasonable rects, but every paint event is for the entire window still. We repaint 480x24 rect of a layer and composite the layer tree and blit that to the screen for the rest of the window. What does the DOM tree look like? Is the battery indicator inside a floated element maybe?
nsPaintEvent event(true, NS_PAINT, gWindowToRedraw); event.region = gScreenBounds; So the paint event bounds are always the screen. Nothing to see there. I sure hope we're using GL to composite everything. I think we may need a hierarchical profile to get a better grip on what's costly here. (In reply to Shian-Yow Wu from comment #23) > The coordinate "405 3 29 16" is the battery image on the top right corner, > and "0 0 480 24" is the notification bar at the top. We shouldn't be repainting the notification bar, right? That may be some kind of bug, although it may not explain the performance issue.
Yuck. Fwiw we saw the same symptom on android too. We are compositing with GL. (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25) > We shouldn't be repainting the notification bar, right? That may be some > kind of bug, although it may not explain the performance issue. Not the whole thing.
Attached file perf callgraph
This is the profiling result with perf. After using "libxul.so" compiled with "-fno-omit-frame-pointer", it indicated CPU time was spent on these 2 functions in pixman. 14.48% b2g-debug libxul.so [.] bits_image_fetch_bilinear_affine_pad_r5g6b5 11.23% b2g-debug libxul.so [.] convert_r5g6b5 The high CPU time functions are different from Bug 698297. But can't say whether the 2 bugs are related or not, as I cannot get callgraph working for pixman functions.
Sounds like we are doing some 16-bit rgb to 24-bit rgb conversion here. Thats odd. We are driving the screen with 24bpp. Where would the 16-bit rgb come from?
This bug is mostly invalid, until we fix bug 711867. gecko uses 16bpp surfaces by default in GFX_OPTIMIZE_MOBILE.
OK, still seeing high CPU usage after bug 711867 landed. Let's get some oprofile data.
Attached file oprofile summary
Attached file oprofile detail
This is oprofile with call graph.
Hmm, how to interpret that? How much of the time is spend under FrameLayerBuilder::BuildContainerLayerFor? How much under nsFrame::BuildDisplayListForStackingContext?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36) > Hmm, how to interpret that? > > How much of the time is spend under > FrameLayerBuilder::BuildContainerLayerFor? > > How much under nsFrame::BuildDisplayListForStackingContext? Below was the opcontrol commands been used, and the log were generated from opreport at host side. opcontrol --vmlinux=/data/local/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES:125000 opcontrol --callgraph=6 opcontrol --start;sleep 10;opcontrol --stop It's sampled by CPU_CYCLES event for 10 seconds. I'm not sure if it's accurate to estimate time spent from sample percentage. If we calculate it this way, then the estimated time should be 9.16ms (10 sec * 0.0916%) for FrameLayerBuilder::BuildContainerLayerFor, and 4.58ms for nsIFrame::BuildDisplayListForStackingContext.
1287 4.9133 libGLESv2_mali.so libGLESv2_mali.so _gles_fb_tex_sub_image_2d This is us possibly doing excessive texture uploading, which might mean that our uploading code is using a slow path. 669 2.5540 libmozutils.so libmozutils.so __clzsi2 I have no idea what this is. It looks suspicious. glandium, any pointers? 314 1.1987 libxul.so libxul.so pixman_composite_over_8888_8888_asm_neon This is surprising, because we should be compositing in GL. 296 1.1300 libnspr4.so libnspr4.so /system/b2g/libnspr4.so This is also suspicious. We should be getting symbols from nspr, so that's a separate bug too.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #38) > 669 2.5540 libmozutils.so libmozutils.so > __clzsi2 > > I have no idea what this is. It looks suspicious. glandium, any pointers? http://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: