Open Bug 708524 Opened 13 years ago Updated 5 months ago

High CPU usage with simple CSS animation

Tracking

()

Status:

NEW

People

(Reporter: cjones, Unassigned)

References

Details

Attachments

(14 files, 1 obsolete file)

oprofile with animation on 13 years ago Shian-Yow Wu [:swu] 826 bytes, text/plain		Details
oprofile with animation off 13 years ago Shian-Yow Wu [:swu] 452 bytes, text/plain		Details
oprofile with animation off (new) 13 years ago Shian-Yow Wu [:swu] 450 bytes, text/plain		Details
callgraph for libxul.so 13 years ago Shian-Yow Wu [:swu] 682.26 KB, text/plain		Details
callgraph for libxul.so with different result 13 years ago Shian-Yow Wu [:swu] 585.90 KB, text/plain		Details
debug patch 13 years ago Timothy Nikkel (:tnikkel) 4.79 KB, patch		Details \| Diff \| Splinter Review
Dump paint log with MOZ_DUMP_PAINT_LIST=1 (Animation on) 13 years ago Shian-Yow Wu [:swu] 1.38 MB, text/plain		Details
Dump paint log with MOZ_DUMP_PAINT_LIST=1 (Animation off) 13 years ago Shian-Yow Wu [:swu] 121.68 KB, text/plain		Details
Animation on with MOZ_DUMP_PAINT_LIST=1 and Timothy's debug patch 13 years ago Shian-Yow Wu [:swu] 1.38 MB, text/plain		Details
Animation on with MOZ_DUMP_PAINT_LIST=1 and Timothy's debug patch 13 years ago Shian-Yow Wu [:swu] 1.81 MB, text/x-log		Details
Animation on log with only Timothy's debug patch 13 years ago Shian-Yow Wu [:swu] 324.44 KB, text/plain		Details
perf callgraph 13 years ago Shian-Yow Wu [:swu] 507.59 KB, text/plain		Details
oprofile summary 13 years ago Shian-Yow Wu [:swu] 800 bytes, text/plain		Details
oprofile detail 13 years ago Shian-Yow Wu [:swu] 288.08 KB, text/plain		Details
oprofile with callgraph 13 years ago Shian-Yow Wu [:swu] 1.40 MB, text/plain		Details

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Description

•

13 years ago

The B2G homescreen has a battery indicator that shows charging/discharging status. When AC power is plugged in, the indicator shows a CSS animation that indicates charging status. The animation is over a very small area of the screen, maybe 5x20 pixels. When the animation runs, the b2g process on device consumes ~30-40% CPU. That seems way too high. We're going to profile and see what's wrong.

Shian-Yow Wu [:swu]

Comment 1

•

13 years ago

Attached file oprofile with animation on — Details

Shian-Yow Wu [:swu]

Comment 2

•

13 years ago

Attached file oprofile with animation off — Details

Shian-Yow Wu [:swu]

Comment 3

•

13 years ago

Attached file oprofile with animation off (new) — Details

Shian-Yow Wu [:swu]

Comment 4

•

13 years ago

The oprofile for target need to patch to support ARM Cortex-A9 used by SGS2 phone. Adding attached oprofile reports for homescreen.css with animation on and off. There are 2 animation off reports sampled at different time. Each was sampled for 10 seconds. Please ignore the "CPU: invalid cpu type, speed 0 MHz (estimated)" message in the report, it should not matter.

Andreas Gal :gal

Comment 5

•

13 years ago

Can you try do to a build of gecko with symbols and get a detailed profile? Thanks!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 6

•

13 years ago

B2g builds have symbols by default. You just need to tell oprofile to show function information. Getting a callgraph might help too. See the gIthub issue about enabling profiling in the sgs2 kernel to see which patch to get for better stack walking.

Shian-Yow Wu [:swu]

Comment 7

•

13 years ago

Let me check how to make callgraph working on SGS2. Below is some additional information with symbols. Most CPU usage are coming from those fetch_scanline_* functions of gecko. ================================================= CPU: invalid cpu type, speed 0 MHz (estimated) Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000 samples % app name symbol name 1615 4.5107 libxul.so fetch_scanline_x1r5g5b5 1554 4.3403 libxul.so fetch_scanline_a1b5g5r5 1509 4.2146 libxul.so fetch_scanline_x4r4g4b4 1459 4.0750 libxul.so fetch_scanline_x1b5g5r5 1388 3.8767 libxul.so fetch_scanline_a4r4g4b4 1326 3.7035 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 1302 3.6365 libxul.so fetch_scanline_a4b4g4r4 1241 3.4661 oprofiled odb_update_node_with_offset 764 2.1338 vmlinux ring_buffer_consume 693 1.9355 vmlinux schedule 562 1.5697 vmlinux s5pv310_enter_idle 440 1.2289 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC 438 1.2233 vmlinux __do_softirq 279 0.7792 vmlinux tick_nohz_stop_sched_tick 271 0.7569 libm.so floor 267 0.7457 vmlinux tick_nohz_restart_sched_tick 242 0.6759 vmlinux add_preempt_count 236 0.6591 vmlinux __queue_work 211 0.5893 vmlinux sub_preempt_count 208 0.5809 libxul.so mozilla::imagelib::ConvertColormap(unsigned int*, unsigned int) 194 0.5418 libxul.so nsFrameManager::ReparentStyleContext(nsIFrame*) 177 0.4944 libc.so memcpy ................... =================================================

Shian-Yow Wu [:swu]

Comment 8

•

13 years ago

Attached file callgraph for libxul.so — Details

Callgraph for "libxul.so".

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 9

•

13 years ago

Thanks Shian-Yow. We're not getting backtraces from the expensive functions in pIxman, so it's hard to tell what's expensive. In general we should fix pixman to give us better stacks, but for the purposes of this bug we can try to get some higher-level profiling data. My first guess id that we're invalidating too much content. Roc, tn is there a guide to getting invalidation information somewhere? Is the easiest way to check this to look st display list dumps?

Shian-Yow Wu [:swu]

Comment 10

•

13 years ago

Attached file callgraph for libxul.so with different result — Details

Another result measured at a different time. The libxul.so consumes 37% CPU, but the callgraph result was different from previous attachment. ====================================== CPU: invalid cpu type, speed 0 MHz (estimated) Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 125000 CPU_CYCLES:125000| samples| %| ------------------ 10462 37.5130 libxul.so 9717 34.8417 vmlinux 2025 7.2609 libGLESv2_mali.so 1925 6.9024 oprofiled 1079 3.8689 libc.so 927 3.3239 libMali.so 660 2.3665 libmozutils.so 386 1.3841 libm.so 338 1.2119 libnspr4.so ======================================

Shian-Yow Wu [:swu]

Comment 11

•

13 years ago

In my testing, there are 2 cases. Case 1: libxul.so consumes 54%, a lot coming from pixman. List of functions consume higher than %1 usage: ===================================================== samples % app name symbol name 2841 6.0810 oprofiled odb_update_node_with_offset 1815 3.8849 libxul.so fetch_scanline_x4r4g4b4 1802 3.8571 libxul.so fetch_scanline_x1r5g5b5 1795 3.8421 libxul.so fetch_scanline_a1b5g5r5 1734 3.7116 libxul.so fetch_scanline_x1b5g5r5 1668 3.5703 libxul.so fetch_scanline_a4r4g4b4 1547 3.3113 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 1515 3.2428 libxul.so fetch_scanline_a4b4g4r4 1074 2.2989 vmlinux ring_buffer_consume 938 2.0077 vmlinux s5pv310_enter_idle 769 1.6460 vmlinux schedule 618 1.3228 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_SRC 591 1.2650 vmlinux __do_softirq ===================================================== Case 2: libxul.so consumes 37%, no expensive functions seen. List of functions consume higher than %1 usage: ===================================================== samples % app name symbol name 1391 5.0186 oprofiled odb_update_node_with_offset 1310 4.7263 libGLESv2_mali.so _gles_fb_tex_sub_image_2d 779 2.8105 vmlinux s5pv310_enter_idle 757 2.7312 vmlinux ring_buffer_consume 650 2.3451 vmlinux schedule 343 1.2375 vmlinux add_preempt_count 337 1.2159 vmlinux __do_softirq 311 1.1221 libxul.so fast_composite_scaled_bilinear_neon_8888_8888_pad_OVER 297 1.0715 libm.so floor 294 1.0607 vmlinux sub_preempt_count =====================================================

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

13 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #9) > My first guess id that we're invalidating too much content. Roc, tn is > there a guide to getting invalidation information somewhere? Is the easiest > way to check this to look st display list dumps? The easiest way without changing code, yes. Set MOZ_DUMP_PAINT_LIST=1 in the environment or gDumpPaintList=1 in a debugger.

Timothy Nikkel (:tnikkel)

Comment 13

•

13 years ago

Attached patch debug patch — Details — Splinter Review

I use something like this when looking into painting/invalidation things.

Shian-Yow Wu [:swu]

Comment 14

•

13 years ago

Attached file Dump paint log with MOZ_DUMP_PAINT_LIST=1 (Animation on) — Details

Shian-Yow Wu [:swu]

Comment 15

•

13 years ago

Attached file Dump paint log with MOZ_DUMP_PAINT_LIST=1 (Animation off) — Details

Shian-Yow Wu [:swu]

Comment 16

•

13 years ago

I attached 2 dump paint logs (Animation on and off). In Animation on case, the log was very busy. Can someone take a look to see if any anything abnormal? :)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 17

•

13 years ago

Yes: ----- Painting --- before optimization (dirty 0,0,28800,48000): ----- If I read correctly, the entire viewport is being invalidated by the battery animation. That's bad.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 18

•

13 years ago

No, that's just us building the display list for the entire window to ensure we rebuild layers correctly. The actual invalid region doesn't show up in this log. You'll need to look at the paint events your Gonk widget is dispatching.

Shian-Yow Wu [:swu]

Comment 19

•

13 years ago

Attached file Animation on with MOZ_DUMP_PAINT_LIST=1 and Timothy's debug patch (obsolete) — Details

Here's another log added Timothy's debug patch. Does it help to indicate more clues?

Shian-Yow Wu [:swu]

Comment 20

•

13 years ago

Attached file Animation on with MOZ_DUMP_PAINT_LIST=1 and Timothy's debug patch — Details

Previous log was incorrect. Update with this.

Attachment #582622 - Attachment is obsolete: true

Shian-Yow Wu [:swu]

Comment 21

•

13 years ago

Attached file Animation on log with only Timothy's debug patch — Details

This is another log with Timothy's debug patch, but turn off MOZ_DUMP_PAINT_LIST to show less information. This log was captured for about 20 seconds. The message "paint event bounds 0 0 480 800" was found 895 times. And "InvalidateThebesLayerContents" was found 2893 times. Does this mean we are invalidating too much?

Andreas Gal :gal

Comment 22

•

13 years ago

paint event bounds 0 0 480 800 DrawThebesLayer aRegionToDraw 0 0 480 24 InvalidateThebesLayerContents 405 3 29 16 nsViewManager::UpdateViewNoSuppression 405 3 29 16 InvalidateThebesLayerContents 0 0 480 24 nsViewManager::UpdateViewNoSuppression 0 0 480 24 InvalidateThebesLayerContents 0 0 480 24 nsViewManager::UpdateViewNoSuppression 0 0 480 24 This repeats over and over. The coordinates look weird for InvalidateThebesLayerContents 405 3 29 16 nsViewManager::UpdateViewNoSuppression 405 3 29 16

Shian-Yow Wu [:swu]

Comment 23

•

13 years ago

The coordinate "405 3 29 16" is the battery image on the top right corner, and "0 0 480 24" is the notification bar at the top. Next thing is to find out why they repeated so frequently.

Timothy Nikkel (:tnikkel)

Comment 24

•

13 years ago

From that log it looks like we invalidate reasonable rects, but every paint event is for the entire window still. We repaint 480x24 rect of a layer and composite the layer tree and blit that to the screen for the rest of the window. What does the DOM tree look like? Is the battery indicator inside a floated element maybe?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

13 years ago

nsPaintEvent event(true, NS_PAINT, gWindowToRedraw); event.region = gScreenBounds; So the paint event bounds are always the screen. Nothing to see there. I sure hope we're using GL to composite everything. I think we may need a hierarchical profile to get a better grip on what's costly here. (In reply to Shian-Yow Wu from comment #23) > The coordinate "405 3 29 16" is the battery image on the top right corner, > and "0 0 480 24" is the notification bar at the top. We shouldn't be repainting the notification bar, right? That may be some kind of bug, although it may not explain the performance issue.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Updated

•

13 years ago

Depends on: 711867

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 26

•

13 years ago

Yuck. Fwiw we saw the same symptom on android too. We are compositing with GL. (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25) > We shouldn't be repainting the notification bar, right? That may be some > kind of bug, although it may not explain the performance issue. Not the whole thing.

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 27

•

13 years ago

is this the same bug as bug 698297?

Shian-Yow Wu [:swu]

Comment 28

•

13 years ago

Attached file perf callgraph — Details

This is the profiling result with perf. After using "libxul.so" compiled with "-fno-omit-frame-pointer", it indicated CPU time was spent on these 2 functions in pixman. 14.48% b2g-debug libxul.so [.] bits_image_fetch_bilinear_affine_pad_r5g6b5 11.23% b2g-debug libxul.so [.] convert_r5g6b5 The high CPU time functions are different from Bug 698297. But can't say whether the 2 bugs are related or not, as I cannot get callgraph working for pixman functions.

Andreas Gal :gal

Comment 29

•

13 years ago

Sounds like we are doing some 16-bit rgb to 24-bit rgb conversion here. Thats odd. We are driving the screen with 24bpp. Where would the 16-bit rgb come from?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 30

•

13 years ago

This bug is mostly invalid, until we fix bug 711867. gecko uses 16bpp surfaces by default in GFX_OPTIMIZE_MOBILE.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 31

•

13 years ago

OK, still seeing high CPU usage after bug 711867 landed. Let's get some oprofile data.

Shian-Yow Wu [:swu]

Comment 32

•

13 years ago

Attached file oprofile summary — Details

Shian-Yow Wu [:swu]

Comment 33

•

13 years ago

Attached file oprofile detail — Details

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

13 years ago

We need a stack profile.

Shian-Yow Wu [:swu]

Comment 35

•

13 years ago

Attached file oprofile with callgraph — Details

This is oprofile with call graph.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 36

•

13 years ago

Hmm, how to interpret that? How much of the time is spend under FrameLayerBuilder::BuildContainerLayerFor? How much under nsFrame::BuildDisplayListForStackingContext?

Shian-Yow Wu [:swu]

Comment 37

•

13 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #36) > Hmm, how to interpret that? > > How much of the time is spend under > FrameLayerBuilder::BuildContainerLayerFor? > > How much under nsFrame::BuildDisplayListForStackingContext? Below was the opcontrol commands been used, and the log were generated from opreport at host side. opcontrol --vmlinux=/data/local/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES:125000 opcontrol --callgraph=6 opcontrol --start;sleep 10;opcontrol --stop It's sampled by CPU_CYCLES event for 10 seconds. I'm not sure if it's accurate to estimate time spent from sample percentage. If we calculate it this way, then the estimated time should be 9.16ms (10 sec * 0.0916%) for FrameLayerBuilder::BuildContainerLayerFor, and 4.58ms for nsIFrame::BuildDisplayListForStackingContext.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 38

•

13 years ago

1287 4.9133 libGLESv2_mali.so libGLESv2_mali.so _gles_fb_tex_sub_image_2d This is us possibly doing excessive texture uploading, which might mean that our uploading code is using a slow path. 669 2.5540 libmozutils.so libmozutils.so __clzsi2 I have no idea what this is. It looks suspicious. glandium, any pointers? 314 1.1987 libxul.so libxul.so pixman_composite_over_8888_8888_asm_neon This is surprising, because we should be compositing in GL. 296 1.1300 libnspr4.so libnspr4.so /system/b2g/libnspr4.so This is also suspicious. We should be getting symbols from nspr, so that's a separate bug too.

Mike Hommey [:glandium]

Comment 39

•

13 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #38) > 669 2.5540 libmozutils.so libmozutils.so > __clzsi2 > > I have no idea what this is. It looks suspicious. glandium, any pointers? http://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.