Closed Bug 1173140 Opened 10 years ago Closed 9 years ago

[Shinano][Aries] Battery life regression

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: gerard-majax, Unassigned)

References

Details

I cannot be more precise, but with master from a couple of days I have noticed that the battery does not last as much as my builds from ~10-15 days ago. That's with mostly the very same usage. I would say the battery life went from 4 full days to 2.5/3.
Flags: needinfo?(jhylands)
Jhylands, is there anything we can do for the aries device for battery life performance/regression range testing?
I don't know what an aries device is, but I don't have one, and don't have a harness for one, so there isn't much I can do to help. I haven't noticed any major regressions on the flame with my automated testing.
Flags: needinfo?(jhylands)
(In reply to Jon Hylands [:jhylands] from comment #2) ... > I haven't noticed any major regressions on the flame with my automated > testing. I experience a drop in battery lifetime since I moved from v18D_nightly to c18D_nightly_v2 and Build 20150615010203 (commit 1bf2da10)
bug 1178869 might be indicating that it's not just a Aries issue.
Here are some more reports :bug 1137066, bug 1135487, bug 1135466, bug 1133510, bug 1130725, bug 1121029, Some people have been experiencing battery loss in 2.0 : bug 1097083, bug 1095846
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #4) > bug 1178869 might be indicating that it's not just a Aries issue. Interesting, since the build there would more or less match the timeframe I had in mind when doing my first report.
With build efbea75178d2 Zibi Braniecki — Bug 1163245 - [Battery][Gonk] Implementation of battery discharging remaining time. r=dhylands NAME PID WAKEUPS RUNTIME NEW b2g 208 1000 2210 ms no (Nuwa) 334 1 0 ms no Homescreen 1166 28 33 ms no Find My Device 2966 42 37 ms no Camera 3126 181 74 ms no Calendar 3886 289 821 ms no ~ 3 hrs of time span. Looks like email got killed. Gecko inbound : dfeb48f0d492 NAME PID WAKEUPS RUNTIME NEW b2g 209 1099 2497 ms no (Nuwa) 445 4 1 ms no Homescreen 1136 29 6 ms no Find My Device 3315 36 6 ms no Calendar 5532 83 206 ms no ~ 3 hrs as well. Not sure if it makes that much of an impact, there seems to be a regression with Calendar app? http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=dfeb48f0d492&tochange=efbea75178d2 still collecting data.
w/ yesterday's build : dfeb48f0d492 NAME PID WAKEUPS RUNTIME NEW b2g 207 821 2712 ms no (Nuwa) 451 2 168 ms no Homescreen 3326 22 4 ms no Calendar 4148 106 186 ms no (Preallocated a 17526 116 859 ms yes I made a small script to make sure I get 3 hrs.
I'm not convinced that this method would help us get the information we want to figure out the battery drain issue...
gecko rev : d017bfb96f62af16f6a9af6eadf105c959ac81e7 NAME PID WAKEUPS RUNTIME NEW b2g 209 474 1377 ms no (Nuwa) 446 0 0 ms no Find My Device 3675 57 13 ms no Homescreen 4239 58 14 ms no Calendar 4760 380 834 ms no (Preallocated a 4768 4 1 ms no
gecko rev: 9ea351d4ded09b85a026413d8f0613430dc6e335 b2g 208 902 4617 ms no (Nuwa) 380 4 7 ms no Find My Device 1735 49 15 ms no Homescreen 3252 43 10 ms no Calendar 4166 226 407 ms no (Preallocated a 12180 84 403 ms yes Note: the difference here is I also had launched the usage app. I'm not sure if that affected the b2g app to run more cycles...
Gabriele, I'm not sure if the power comsumption by app is visible with the trace.sh tool. Perhaps I'm looking at it wrong. I posted some numbers here as comparisons. How do I tell if something is too far out of bounds? Since these are all in ms, it doesn't seem completely off to the point where it would highly drain the battery... Could I get some help here? Does this also mean that we need some API in gecko around battery like Android SDK has? I think we need a little more tooling around battery/power consumption...
Flags: needinfo?(gsvelto)
This looks like the same issue I'm bisecting in bug 1130725. The most relevant number when running my scripts is the number of wakeups. In comment 7 you're seeing 1000+ wakeups while the phone is idle (I'm assuming you did all the testing with the screen off and waiting for the phone to settle down after flashing it), that's a lot for a phone that's in standby. I'd expect an order of magnitude less wakeups in a 3h period for a phone that's in standby.
Flags: needinfo?(gsvelto)
Thanks for the information. I wasn't sure if 5 minutes was enough for a good sample size; I was afraid of getting a lot of noise and variation. Yes. After launching different apps that I was thinking might be suspect, I then set the phone off and then let it settle a little bit before I ran my small script. I suspect that usage app running the background and the email app running in the background may inflate the b2g wakeup cycles. On my dogfood device, I noticed that I ran out of battery power faster; I had pluotsorbet open in the background. ( ~ 2 days and just sleeping ) Build ID 20150619225606 Gaia Revision 4c06ed88ddccaba8dc941e5006bd2a9e57306f07 Gaia Date 2015-06-19 22:17:18 Gecko Revision 7c1a6b1151a1539186b950a144387e2d7f378d1b I suspect that there's two battery life regressions, that there are some apps that we might want to investigate and some best practices that we might want to mention to make awareness of having people save battery life.
I've dug further into the kernel trace and with a completely idle phone (freshly flashed, no network connection and all applications in their original, unconfigured state) I see this pattern repeating every 100ms: <idle>-0 [000] d.h4 13887.660015: sched_wakeup: comm=Timer pid=2128 prio=120 success=1 target_cpu=000 Timer-2128 [000] d..5 13887.660145: sched_stat_runtime: comm=Timer pid=2128 runtime=136198 [ns] vruntime=1100766428723 [ns] Timer-2128 [000] d..5 13887.660158: sched_wakeup: comm=b2g pid=2085 prio=120 success=1 target_cpu=000 Timer-2128 [000] d..3 13887.660223: sched_stat_runtime: comm=Timer pid=2128 runtime=77292 [ns] vruntime=1100766506015 [ns] b2g-2085 [000] d..4 13887.660339: sched_stat_runtime: comm=b2g pid=2085 runtime=116614 [ns] vruntime=1100766756273 [ns] b2g-2085 [000] d..4 13887.660351: sched_wakeup: comm=BgHangManager pid=2122 prio=121 success=1 target_cpu=000 b2g-2085 [000] d..4 13887.660482: sched_stat_runtime: comm=b2g pid=2085 runtime=142188 [ns] vruntime=1100766898461 [ns] b2g-2085 [000] d..4 13887.660495: sched_wakeup: comm=Timer pid=2128 prio=120 success=1 target_cpu=000 b2g-2085 [000] d..3 13887.660826: sched_stat_runtime: comm=b2g pid=2085 runtime=343489 [ns] vruntime=1100767241950 [ns] BgHangManager-2122 [000] d..3 13887.660900: sched_stat_runtime: comm=BgHangManager pid=2122 runtime=74636 [ns] vruntime=1100766534306 [ns] Timer-2128 [000] d..3 13887.660962: sched_stat_runtime: comm=Timer pid=2128 runtime=63073 [ns] vruntime=1100766569088 [ns] I read it as follows: after an idle interval the timer thread in the main process is waking up, waking up the main process' main thread (probably by posting a task on the main thread's task queue) which in turn wakes up the BgHangManager thread. I'm not familiar with this code yet but I'll try digging further to figure out what's going on.
Gabriele, look at https://bugzilla.mozilla.org/show_bug.cgi?id=970358#c0 . Could it be that we have a misconfigured build?
(In reply to Fabrice Desré [:fabrice] from comment #16) > Gabriele, look at https://bugzilla.mozilla.org/show_bug.cgi?id=970358#c0 . > Could it be that we have a misconfigured build? Thanks for the tip, I have an engineering build on my device though so I might be chasing the wrong issue. I'll try getting a trace with a plain user build.
(In reply to Gabriele Svelto [:gsvelto] from comment #17) > (In reply to Fabrice Desré [:fabrice] from comment #16) > > Gabriele, look at https://bugzilla.mozilla.org/show_bug.cgi?id=970358#c0 . > > Could it be that we have a misconfigured build? > > Thanks for the tip, I have an engineering build on my device though so I > might be chasing the wrong issue. I'll try getting a trace with a plain user > build. I'm not sure a VARIANT=userdebug or user build will have this flagged toggled.
It's keyed on the GRE_MILESTONE, which is 42.0a1 currently on trunk: http://mxr.mozilla.org/mozilla-central/source/configure.in#3516
I've just tried a local build for my Flame with VARIANT=user and the hang monitor is enabled, objdir-gecko/xpcom/threads/backend.mk contains this line: DEFINES += -DMOZ_ENABLE_BACKGROUND_HANG_MONITOR=1 -DOS_POSIX=1 -DOS_LINUX=1 objdir-gecko/config/autoconf.mk has GRE_MILESTONE set to 41.0a1 so that might be the cause. Can anybody check on a Z3C using the regular foxfooding builds too? Run the following commands: adb shell 'echo 2048 > /sys/kernel/debug/tracing/buffer_size_kb' adb shell 'echo "" > /sys/kernel/debug/tracing/trace' adb shell 'echo 1 > /sys/kernel/debug/tracing/events/sched/sched_stat_runtime/enable' adb shell 'echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable' adb shell 'echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/enable' adb shell 'echo 1 > /sys/kernel/debug/tracing/tracing_enabled' Wait a few seconds and then adb shell 'cat /sys/kernel/debug/tracing/trace' | grep BgHangManager If you get any output then the background hang manager is definitely running and draining the battery. Don't forget to run this to stop tracing when you're done: adb shell 'echo 0 > /sys/kernel/debug/tracing/tracing_enabled'
That's definitively happening on mine. Let's disable that on all b2g device builds.
Gabriele, while this is bad, I don't see any recent change around the BackgroundHandMonitor, so I'm moved that to a new bug : bug 1180533
(In reply to Fabrice Desré [:fabrice] from comment #22) > Gabriele, while this is bad, I don't see any recent change around the > BackgroundHandMonitor, so I'm moved that to a new bug : bug 1180533 I don't think this is an issue with the BackgroundHangMonitor per se but rather with how it is used. Looking at the code the way it works is that once a background thread that uses it starts working on a task it's supposed to call BackgroundHangMonitor::NotifyActivity() and keep calling it to prove it's not stuck. Once it is done the thread should call BackgroundHangMonitor::NotifyWait() which stops the hang monitor and waits for the next activity notification. It seems to me that there's a thread somewhere that's never calling BackgroundHangMonitor::NotifyWait() but rather keeps calling BackgroundHangMonitor::NotifyActivity() thus prompting the hang monitor to wake up continuously. I'll try running a bisection because I feel that just disabling the hang monitor is masking the real issue.
Quick update, I prepared a build with a ~1 month old version of gecko and I can still see the BgHangManager wakeups but they're not as frequent as with recent versions. I've got the feeling these may have become worse over time so I'll take this chance to open a bug related to that. At this stage I suspect that other similar issues might have been caused by this (e.g. bug 1130725) with varying degrees of badness depending on the hang manager's clients behavior.
See Also: → 1180677
Gabriele, as we spoke on IRC, here is a feedback around this issue. I did a build just after Fabrice's patch landed, and I have dogfooded this build since. What I could notice so far is a slight improvement. I think somehow placing calls does trigger something that starts to eat my battery again. I remember that in the past we dot Audio channels leaking, especially in the callscreen app, and that this was already triggering battery life regression.
Flags: needinfo?(gsvelto)
There hasn't been many changes to the callscreen lately so I don't think we should have problems with leaking Audio contexts. That being said since bug 1094764 landed we might just use that in the callscreen and dialer to ensure that we always stop playing audio when we don't need it anymore. I'll file a bug and attach it here.
Filed bug 1183622 for the audio context part.
Flags: needinfo?(gsvelto)
See Also: → 1183622
Situation seems to be better these days. Tracking slowdown in bug 1189004.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.