Closed Bug 1473997 Opened 3 years ago Closed 2 years ago

find a way to measure the battery usage of geckoview

Categories

(GeckoView :: General, enhancement, P2)

enhancement

Tracking

(geckoview64 wontfix, geckoview65 wontfix, firefox64 wontfix, firefox65 wontfix, firefox66 fixed)

RESOLVED FIXED
mozilla66
Tracking Status
geckoview64 --- wontfix
geckoview65 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: jmaher, Assigned: bc)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [geckoview:fenix:p1])

Attachments

(10 files, 4 obsolete files)

1.08 KB, text/plain
Details
298 bytes, text/plain
Details
297 bytes, text/plain
Details
101.25 KB, text/plain
Details
51.47 KB, text/plain
Details
1.34 MB, application/zip
Details
139.87 KB, image/png
Details
1.11 KB, application/json
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
one metric we want to track for our big mobile push this year is battery consumption.  While this is hard to test and maybe near impossible to get accurate without custom hardware, I think we can do a good job of measuring something and tracking against it.

This could be simplified if we could:
* measure battery
* do a fixed transaction (say run speedometer)
* measure battery

In fact, if this is what we did, then the integration/tooling would not be so painful.

The question is, what is the transaction for "measure battery" ?

I assume we can measure current battery levels, maybe as simple as:
adb shell dumpsys battery

if that is done and harvested from the log files, this shouldn't be too hard.

Our real devices at Bitbar are connected to usb the whole time, maybe we could measure the battery before/after as well as the power draw on the usb port if we want to get fancier?  Just measuring the battery level before/after a hard test would give us something to compare against over time.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
> Just measuring the battery level
> before/after a hard test would give us something to compare against over
> time.

This sounds like a great start to me. Could we hook this up soon to see how consistent/accurate it is across runs with the same build?
Flags: needinfo?(vchin)
Flags: needinfo?(jmaher)
we currently don't have speedometer running; we could hook this up to an existing test suite like mochitest-media; my concern there is that is more dynamic, so we could get battery measurements and see a change only to find out a test was changed, or we experienced a timeout that was acceptable inside the test.

once we have speedometer up, I am happy to put some engineering efforts into looking at battery measurements.
Flags: needinfo?(jmaher)
We should baseline the battery drop for the duration of what the fixed transaction to account for battery drain when nothing is happening.

For the fixed transaction I'd prefer something more in line with what people would do with their phones, e.g. order something online, watch a video, etc. would be ideal. Speedometer would be a good data point but preferably not the only transaction we'll measure.

I'd also like to have Focus open a page and then put in the background and measure the battery drain as well.
Flags: needinfo?(vchin)
We can measure the current draw on android only when the device is disconnected from usb.

We can get the information from

https://developer.android.com/studio/command-line/dumpsys

dumpsys batterystats [<app>]

If we use dumpsys batterystats --checkin we get machine readable form of the output and filter for pws (power use summary) or pwi (power use item). We'll need to get the properly formatted uid for the application we care about but can also get the draw for the other users/apps on the device from the same output. If we wanted, there are other data available in the output.

For this to work at bitbar I think we can

bitbar framework: connect adb to the device over usb
bitbar framework: enable adb over tcpip
bitbar framework: connect adb to the device over tcpip
bitbar framework: use uhubctl to turn off the port
bitbar framework: start the test container
mozilla test    : run the test using the adb connection over ip instead of usb.
bitbar framework: use uhubctl to turn on the port

The container can not sit idle waiting for a test since the device will be discharging while the container runs.
Assignee: nobody → bob
Status: NEW → ASSIGNED
See Also: → 1190280
Depends on: 1499102
Depends on: 1499352
The battery statistics in Android come from a collection of sub-metrics like CPU usage and network usage. The overall statistic is not available while the device is plugged in, but I think we may still be able to collect the sub-metrics even if the device is plugged in. That would be an interesting project to potentially work on to get some more detailed data.
Attached file run-battery-test.sh
Demo bash script which leverages raptor to run a battery test. 

I chose --page-cycles 10 for this example as it is a good compromise over shorter or longer values. I did a run with 50 at one point which took quite some time and led to the device screen timing out which stopped geckview from running. It also ran the battery down to less than 50% but I don't see that it gave much more data that this shorter run which took a little less than 30 minutes.

I did find that I needed adb shell settings put system screen_off_timeout 7200000 which sets the screen timeout to 2 hours which will be needed especially for the page cycle 50 or similar case. It appears this is persistent over reboots and power off/on cycles. This isn't necessary for devices/emulators with direct power connections however is only required in the case where the device is running off of battery.

I also collected dumpsys battery before and after as well as two forms of dumpsys batterystats... one with -c (a machine readable csv format) and the other human readable. I'll attach for your benefit.

The batterystats -c version is preferable for automation since it disambiguates the user id of the app. 

$ adb shell dumpsys package org.mozilla.geckoview_example | grep userId
    userId=10200

which also matches the user id used in the -c version:
9,0,i,uid,10200,org.mozilla.geckoview_example

In the human readable form, the user id is reported as u0a200 which I interpret as some sort of mixture between a hex 0x10 and trailing 200. Of course since this is Android, the user id from a running process is slightly different

$ adb shell ps | grep geckoview
u0_a200      11401   743 2067500 200644 0                   0 S org.mozilla.geckoview_example
u0_a200      11423   743 1996300 149860 0                   0 S org.mozilla.geckoview_example:tab

The human readable format of batterystats has:
  Estimated power use (mAh):
    Uid u0a200: 122 ( cpu=119 wifi=2.69 ) Including smearing: 280 ( screen=135 proportional=22.7 )

The machine readble version of batterystats shows this same information as
9,10200,l,pwi,uid,122,0,135,22.7

A key for the values in the machine readable version is available at <https://developer.android.com/studio/command-line/dumpsys#batterystats-section-ids>.

The reported power usage from Android's settings for geckview show the sum of cpu + wifi + smeared screen but do not include the smeared proportional. "Smear" appears to mean a heuristic assignment of power usage to the app for things which it shares with other apps on the device. I don't know what the proportional value means. I think the total and the subvalues would be good values to report. I'm not sure of the value of the battery charge percentage or the temperature which is pretty much pegged to the max if you run the test sufficiently long.

I believe I can hack the perfherder json to include the power data as a suite along side of the benchmark results so we can have both. This will be distinguished from the normal raptor test through the use of a different test name/symbol for the power measurement version.

The inclusion of the bugreport allowed me to run battery historian <https://github.com/google/battery-historian> locally to produce a nice interactive web app that showed a great deal of information much of which is beyond my understanding but which may be useful to developers. Docker isn't supported on Fedora 29 but I was able to build battery historian locally and it works well. It does not appear to actually depend on the batterystats file but only the bugreport file. I'll attach a screen shot but if you run batterystats either via the docker image or a locally built instance you will be able to run it using the attached bugreport zip file.

I changed the order of when the device is reconnected from the example instructions on the battery historian page so that the batterystats and bugreport are collected before the device is reconnected. This matches better how things will work on bitbar and also prevents the usb connection from partially recharging the device before the bugreport is completed. I think the instructions on the battery historian page reflect the fact that they do not use adb over tcpip to connect to the device and thus require the usb connection to collect the report.

This is just a demo. At bitbar everything in the script up to the prompt to disconnect the device from usb will be handled by the test framework.

I am considering including the following as artifacts to be uploaded.

battery-before.txt
battery-after.txt
batterystats-c.txt (machine readable version)
batterystats.txt (human readble version. We don't need this unless you think it is useful)
bugreport-....zip

Thoughts?
Flags: needinfo?(snorp)
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
Flags: needinfo?(cpeterson)
Attached file battery-before.txt
Attached file battery-after.txt
Attached file batterystats.txt
Attached file batterystats-c.txt
I think this sets us up for a great spot for starting to build a V1 of a battery test.

One thing that seems important, always upload the bugreport.zip so that an engineer can use battery historian (we could build it into a 1 click loaner even !) and investigate battery usage.

As for hacking perfherder_data, I agree- we could have:
data:
 - metadata (build, type, rev, branch, etc.)
 - testname: battery_idle, value 157, modifiers (type, high/low)
   - subtests: cpu: 85, wifi: 4, screen: 6, proportional 5
 - testname: geckoview_idle, value 271, modifiers (...)
   - subtests: cpu: 114, wifi: 9, screen: 99, proportional 49
 - testname: geckoview_speedometer, value 314, modifiers (...)
   - subtests: cpu: 157, wifi: 9, screen: 99, proportional 49

This would give us a picture of 3 data points so we could track them over time.  I assume battery_idle will remain stable over time and have very little noise.  But geckoview_idle and geckoview_speedometer I would expect to change.

On the subject of runtime- 30 minutes seems adequate, ideally less would be good- could we do a 15 minute test and see something measurable?  If we were doing just geckoview_idle and geckoview_speedometer, then 30 minutes each would be sufficient and maybe once/day we could track battery_idle instead of per-commit on m-c.  Even with just running two scenarios, I would still advocate for a shorter time (15 minutes).

Is it important to let the device recharge to 100% before each measurement?  I assume that will be overhead of 5-10 minutes per test.

we are using speedometer as it is a fixed work set, I think ideally a pageload would be more interesting, but we do not have automation for pageload running yet, so using speedometer gets us running a test and getting familiar with the test and results while polishing tools and edge cases.
Flags: needinfo?(jmaher)
I'm not sure how useful a measurement for a perfectly idle device would be but we can check it out at least locally and see.

I did page cycles 10 just as a demo and it turned out to be about 1/2 hour. We could easily reduce it so long as the power usage was sufficient to measure.

I did speedometer as a first example, but considered that unity-webgl might be another good test that would perhaps give a sufficiently orthogonal measurement. If we supported the mitmproxy type tests on android, they would be perfect for additional tests.

I'm not sure how important the battery percentage is so long as it doesn't hit the "low battery" switch. I'll check with bitbar as to the current limit for putting a device up for testing and the procedures for customizing it for the battery framework.

What were you thinking of with regard to the modifiers?
I agree the idle case might be overkill- but good to measure locally and make sure we understand the relationship.

My concern with allowing for a recharge, I understand that battery my deplete faster depending on the charging state (likewise the age) of the battery.  I guess if we run the test is the same way each time, we could at least be comparing the same workflow across revisions- so probably the recharging isn't as important; it might be a good local experiment.

for the modifiers, that is just perfherder stuff like lower_is_better, time/score, pgo/opt/nightly/e10s, etc.
This is a great start! I was able to get battery historian going and it's very cool.

One thing we're concerned about is battery usage when Gecko is running in the background. For that, I think the test would load a page, then switch to the home screen and measure for 10 minutes or so. We would then try to eliminate as much cpu/network usage as possible during this time.
Flags: needinfo?(snorp)
Wow that's really cool (so much data!). Maybe in the future there will be a 'perf-html.io' kind of site but for battery profiles :) Is there a way I can try this out with my GP2? I'm not sure where to get the host IP from (I notice there are two different host IPs in the run-battery-test.sh i.e. x.x.x.9, and x.x.x.7.
Flags: needinfo?(rwood)
.9 is the device. .7 is the host. Change to match your device and system. To do the battery historian stuff you'll need to either run the docker image or build it as described on the battery historian page.
(In reply to Bob Clary [:bc:] from comment #18)

Thanks, works great on my GP2 via my MacBook, very cool!!

cat battery-before.txt

Current Battery Service state:
  AC powered: false
  USB powered: false
  Wireless powered: false
  Max charging current: 0
  Max charging voltage: 0
  Charge counter: 2730636
  status: 3
  health: 2
  present: true
  level: 100
  scale: 100
  voltage: 4387
  temperature: 310
  technology: Li-ion

cat battery-after.txt

Current Battery Service state:
  AC powered: false
  USB powered: false
  Wireless powered: false
  Max charging current: 0
  Max charging voltage: 0
  Charge counter: 2538609
  status: 3
  health: 2
  present: true
  level: 95
  scale: 100
  voltage: 4176
  temperature: 380
  technology: Li-ion
Attached patch bug-1473997-battery.patch (wip) (obsolete) — Splinter Review
Work in process patch. I ended up using the human readable version of batterystats since it detailed the cpi and wifi energy usage separately while the machine readable version didn't.

rwood: I was looking around how you submit the test results and was wondering if you had any suggestions on a good way to incorporate the values from the power measurements? Any other feedback welcome as well.
Attachment #9027165 - Flags: feedback?(rwood)
Comment on attachment 9027165 [details] [diff] [review]
bug-1473997-battery.patch (wip)

Review of attachment 9027165 [details] [diff] [review]:
-----------------------------------------------------------------

Tried this out locally on OSX with my Google Pixel 2 and it worked great.

./mach raptor-test --test raptor-speedometer --app geckoview --power-test --binary org.mozilla.geckoview_example --host 192.168.0.14 --page-cycles 1

And I notice in the terminal after the speedometer test finished:

11:14:30     INFO -  raptor-main uid: u0a139, cpu: 9.53, wifi: 0.267, screen: 0, proportional: 1.33

Suggestion: I'd move the _init_geckoview_power_test and _finish_geckoview_power_test out of the main raptor.py and into their own file i.e. /raptor/raptor/power.py.

Ok, so to get results dumped into perfherder, they need to be in the PERFHERDER_DATA json output. To get that, you'll need to get the results to the Raptor control server - and then the control server takes it from there. For regular Raptor tests, the webext runner.js collects the results and posts to the control server here [1].

So I'd say create a send_results methond in your power.py or something, that directly talks to the raptor control server or something like that (i.e. no need to do an http post since raptor.py has access to the control server class).

I'll add another comment in a minute regarding formatting the PERDHERDER_DATA json, as we talked about on IRC.

[1] https://searchfox.org/mozilla-central/rev/d35199ef15ffa57d190886e20d5df6471faf0870/testing/raptor/webext/raptor/runner.js#442

::: testing/raptor/mach_commands.py
@@ +194,5 @@
> +                device = ADBAndroid(verbose=True)
> +                adbhost = ADBHost(verbose=True)
> +                device_serial = "%s:5555" % device.get_ip_address()
> +                device.command_output(["tcpip", "5555"])
> +                raw_input("Please disconnect your device from USB then press any key...")

just a nit, I had to press <ENTER> (space key wouldn't work)
Attachment #9027165 - Flags: feedback?(rwood) → feedback+
Running Raptor with the --power-test flag will have Raptor run the 'speedometer' benchmark, as well as collect the power usage measurements (very cool :bc!). My understanding is that the power data reported is:

u0a cpu (i.e. 119)
u0a wifi (i.e. 2.69)
smearing screen (i.e. 135)
smearing proportional (i.e. 22.7)

All of those added up to get the total mAh value (i.e. 280).

The total mAh value will be reported as the overall test result 'value' which will be alerted on in perfherder. Each of the four values will be reported as subtests. So the PERDHERDER_DATA json would look something like this:

PERFHERDER_DATA: {"framework": {"name": "raptor"}, "suites": [{"extraOptions": [], "name": "raptor-battery-speedometer", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 280, "subtests": [{"name": "u0a-cpu", "lowerIsBetter": true, "alertThreshold": 2.0, "replicates": [119], "value": 119, "unit": "mAh"}, {"name": "u0a-wifi", "lowerIsBetter": true, "alertThreshold": 2.0, "replicates": [2.69], "value": 2.69, "unit": "mAh"}, {"name": "smear-screen", "lowerIsBetter": true, "alertThreshold": 2.0, "replicates": [135], "value": 135, "unit": "mAh"}, {"name": "smear-proportional", "lowerIsBetter": true, "alertThreshold": 2.0, "replicates": [22.7], "value": 22.7, "unit": "mAh"}], "type": "benchmark", "unit": "mAh"}]}

Raptor dumps out a single PERDHERDER_DATA json structure after the test is finished. We want to report the battery data that was produced during / after running the Raptor test i.e. speedometer. Question is what do we do with the regular Raptor speedometer test results?

It is my understanding that we can only have the one top-level result in PERFHERDER_DATA json which will be used for the automated regression detection/reporting. So we can't combine the Raptor power measurements with the Raptor speedometer test measurements in a single results json because we want to have alerting on the power data.

Question for :wlach/:igoldan: If we have Raptor dump out two PERFHERDER_DATA json structures (one for the battery power measurements, and a second separate one for the actual regular Raptor speedometer test results) will they both be ingested in perfherder, even when both are in the same production job?

If Perfherder can handle two results structures, then I'd say let's dump out both. If Perfherder supports only one PERFHERDER_DATA json dumped per job, then I'd say the fastest solution (as :bc and I discussed on IRC) is when Raptor is run with the --power-test flag, we should dump out a PERFHERDER_DATA json with the battery power measurements instead and just ignore the actual speedometer results. Thanks.
Flags: needinfo?(wlachance)
Flags: needinfo?(igoldan)
Perfherder has no problem parsing multiple PERFHERDER_DATA structures, we already do this for e.g. builds.
Flags: needinfo?(wlachance)
Flags: needinfo?(igoldan)
(In reply to William Lachance (:wlach) (use needinfo!) from comment #23)
> Perfherder has no problem parsing multiple PERFHERDER_DATA structures, we
> already do this for e.g. builds.

Thanks :wlach, that's good news! IMO, ideally we should do that then - dump two separate PERFHERDER_DATA structures then, one for power and one for the actual Raptor test that was run. That will make it much easier not having to build one json, and especially for in the future if we want to run --power-test with different Raptor tests other than speedometer.
Attached patch bug-1473997-battery.patch (wip2) (obsolete) — Splinter Review
Attachment #9027165 - Attachment is obsolete: true
(In reply to Robert Wood [:rwood] from comment #26)
> Created attachment 9028385 [details]
> Bug 1473997 - find a way to measure the battery usage of geckoview (:bc's
> initial patch); r?jmaher,davehunt"

Took your latest patch :bc (attachment 9027687 [details] [diff] [review]) and converted to phabricator, so that I can add another commit on top of that to format/dump out the battery results for Perfherder.
Attached file raptor-power.json
I added a patch on top of :bc's original, which adds support to Raptor for 'supporting data' i.e. other measurements that are taken outside of the actual Raptor test itself.

From a local run, this is the perfherder data json that is dumped for the power data (in addition to / separate from the actual speedometer test result):

18:10:13     INFO -  raptor-output PERFHERDER_DATA: {"framework": {"name": "raptor"}, "suites": [{"extraOptions": "power", "name": "raptor-speedometer-geckoview-power", "lowerIsBetter": true, "alertThreshold": 2.0, "value": 10.777, "subtests": [{"lowerIsBetter": true, "unit": "mAh", "name": "raptor-speedometer-geckoview-power-proportional", "value": 1.19, "alertThreshold": 2.0}, {"lowerIsBetter": true, "unit": "mAh", "name": "raptor-speedometer-geckoview-power-screen", "value": 0.0, "alertThreshold": 2.0}, {"lowerIsBetter": true, "unit": "mAh", "name": "raptor-speedometer-geckoview-power-wifi", "value": 0.317, "alertThreshold": 2.0}, {"lowerIsBetter": true, "unit": "mAh", "name": "raptor-speedometer-geckoview-power-cpu", "value": 9.27, "alertThreshold": 2.0}], "type": "power", "unit": "mAh"}]}

The data is also written to a 'raptor-power.json' artifact (attached).

For the test name I just added 'power' to the Raptor test that was run; and each subtest is the test name + the type of measurement (consistent with other tests like pageloads).
Blocks: 1473078
Blocks: 1512038
From https://bugzilla.mozilla.org/show_bug.cgi?id=1512038#c16

I need to update the Raptor patch: the 'raptor-power.json' artifact is being created when run locally, but it is not being uploaded as a treeherder artifact when running in production. NI myself so it's in my queue.
Flags: needinfo?(rwood)
If it is in the blobber_upload_dir the framework should see it and copy it with the other artifacts. We should probably do the same for both raptor.json and raptor-power.json.
(In reply to Bob Clary [:bc:] from comment #33)
> If it is in the blobber_upload_dir the framework should see it and copy it
> with the other artifacts. We should probably do the same for both
> raptor.json and raptor-power.json.

Thanks :bc, raptor.json is already copied over and uploaded, it's just renamed to perfherder-data.json
Attachment #9028385 - Attachment is obsolete: true
Flags: needinfo?(rwood)
Attachment #9028809 - Attachment is obsolete: true
Attachment #9027687 - Attachment is obsolete: true
Blocks: 1515406
I've updated the patches to:

- Only output the power data PERFHERDER_DATA json, not the actual Raptor test (i.e. speedometer) result. The actual
Raptor test result will still be available as the 'raptor.json' artifact; and

- Upload the 'raptor-power.json' as a treeherder artifact

Had to rebase etc, the old patches are marked obsolete, the new ones are Comment 35 and Comment 36.
Product: Firefox for Android → GeckoView
This looks good. I'm going to finalize my patches in Bug 1512038 and ask for review.
rwood had added the normal raptor speedometer jobs to the existing power push. I've submitted a try push with Fuzzy query=linux64 | android-hw raptor-speedometer

<https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&group_state=expanded&revision=0756bfba963da2ae65a54b46efde6ad7cb700a6c>
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1169b5865935
find a way to measure the battery usage of geckoview (:bc's initial patch); r=jmaher
https://hg.mozilla.org/integration/autoland/rev/b1773aef5b6b
Add support to Raptor to receive/format/output power data; r=jmaher
https://hg.mozilla.org/mozilla-central/rev/1169b5865935
https://hg.mozilla.org/mozilla-central/rev/b1773aef5b6b
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 66
Note: https://treeherder.mozilla.org/perf.html#/graphs?series=try,1832499,1,10&series=try,1832494,1,10&series=try,1832489,1,10

This is showing higher energy usage for the failures to report results which will result in a bi-modal distribution of values. It may also be a higher rate of failures overall but I'm not sure. I think I need to focus on the raptor failures a bit before we can move this to m-c.
Target Milestone: Firefox 66 → mozilla66
Whiteboard: [geckoview:fenix:p1]
Attachment #9028385 - Attachment is obsolete: false
Attachment #9028385 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.