Closed Bug 1512038 Opened 1 year ago Closed 1 year ago

Tc configs for a new raptor speedometer geckoview power measurements test

Categories

(Testing :: Raptor, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox66 fixed)

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: rwood, Assigned: bc)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

Create a new Raptor test job for retrieving power measurements while running the Speedometer benchmark.

Suggestions for what we should call this job? We can't use 'sp-p' because we already use '-p' for added gecko profiling jobs.

Maybe just append to the front i.e. 'p-sp' for power jobs? Or do we want an entire section i.e. Rap-P-e10s?
Ideas welcome for what to name / what treeherder job symbol to use for this and future Raptor power tests...
Flags: needinfo?(jmaher)
Flags: needinfo?(dave.hunt)
Flags: needinfo?(bob)
Assignee: nobody → rwood
Status: NEW → ASSIGNED
R-P(sp)

R-P == Raptor Power
Flags: needinfo?(jmaher)
Attached patch raptor-power-taskgraph.patch (obsolete) — Splinter Review
I was working on this already. It has a different symbol than jmaher suggested and also has the --host argument I mentioned on irc. It does generate the proper task payload but I was going to talk to you and someone from mozharness and/or taskcluster land about whether passing the host ip address in this fashion is appropriate. Another possibility is we support the reading of the IP_ADDRESS from the environment if it is not specified on the command line.
Flags: needinfo?(bob)
And actually, that is the wrong environment variable if you want to reuse the patch.... I believe the appropriate one would have been HOST_IP.
As we discussed, I'll take this and get the frameworks created and try runs done in parallel.
Assignee: rwood → bob
I have the Docker images created and have been running some tests:
<https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&group_state=expanded&revision=920daa69936408a02da2bc6a805f45192da6a7d7>

The kludge I am using to attempt to stuff --host $HOST_IP into command line works locally but not in production mozharness. I'll look into it and may possibly try the extra-options route in the test yml if that doesn't work out. Shouldn't be too much longer.
very good news :bc, it sounds like we are really close and next week we can turn this on- thanks for your persistence here and we can all pitch in to debug/review/hack on what remains.
Attached patch raptor-power-taskgraph.patch (obsolete) — Splinter Review
I haven't tested this due to a failing internet which can not seem to be able to submit a try job. I tried to add a separate command line argument to tell the framework to use env HOST_IP but the vagaries of the way raptor handles the command line options between the mach and mozharness versions has me stumped. This approach uses a magical value for --host which then tells everyone to use the environment variable works locally but I don't know about in production in a pure mozharness environment. Maybe we'll get lucky.

If you apply the patches from bug 1473997 then this patch you should be able to test using the pixel2 and

./mach try fuzzy --full --query "android-hw-p2 raptor-speedometer-geckoview-power" --artifact

I need to manually start the containers once the try job has been submitted, so ping me if you do so.

I need to get Bitbar to connect the g5 to the Cambrionix hub before we can test it but that should be possible by Noon ET/9 AM PT.
Attachment #9030222 - Flags: review?(rwood)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #2)
> R-P(sp)
> 
> R-P == Raptor Power

I think I agree with Joel's suggestion to have a new group for Raptor tests that track the power usage. That said, for consistency with the existing jobs, I would lean towards 'Rap-P' as per Rob's suggestion in comment 0.

Also, do we still need the 'e10s' component? I thought this was enabled everywhere now, or am I wrong?
Flags: needinfo?(dave.hunt)
technically we still run some non-e10s tests, but they are limited and will probably go away in a few weeks or before end of Q1.

I am fine moving raptor to remove the e10s name on treeherder.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10)
> technically we still run some non-e10s tests, but they are limited and will
> probably go away in a few weeks or before end of Q1.
> 
> I am fine moving raptor to remove the e10s name on treeherder.

I * think * the 'e10s' is still required deep in taskcluster for some transforms, getting try syntax, etc... I don't think we can just remove the 'e10s' - but not certain.
correct :rwood, we can remove the rap-e10s and make it rap as that is just a symbol on treeherder, but for posting to graphs, etc. we should keep the e10s
Updated Docker container to support power testing. Includes support for both usb and tcp connections and includes updated minidump_stackwalk from bug 1512706.
Attachment #9029474 - Attachment is obsolete: true
Attachment #9030222 - Attachment is obsolete: true
Attachment #9030222 - Flags: review?(rwood)
Depends on: 1514159
<https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1656e01572f248b213f628be73a0052bce61308&searchStr=raptor>

g5 and p2 runs with new Docker container for all defined tests.

Note that p2 works well but the g5 does not report the full set of data that we can obtain on the p2 and therefore reports 0 for raptor-main power data with our current regexp which parses the data. Before I submit the patch for review, I would like some feedback on if testing the g5 and reporting only the cpu energy usage is sufficient or should we attempt to include other data? Is it worth while to test the g5 running Android 7?

Bitbar is going to investigate if we can easily update the g5 to Android 8 while keeping root and if this will result in comparable measurements to what is available on the p2 on Android 8. Is this something we would consider?

Android 7 (g5): Uid u0a118: 18.7 ( cpu=18.7 )
<https://taskcluster-artifacts.net/QClqWfdvSR-JnN-bAmSvAg/0/public/test_info/batterystats.txt>
  Estimated power use (mAh):
    Capacity: 2876, Computed drain: 39.3, actual drain: 0-28.8
    Uid u0a118: 18.7 ( cpu=18.7 )
    Screen: 12.9
    Cell standby: 2.23 ( radio=2.23 )

 
Android 8 (p2):
<https://taskcluster-artifacts.net/Qcahkq4eQ9aG_YS2MZ0sAw/0/public/test_info/batterystats.txt>
  Estimated power use (mAh):
    Capacity: 2700, Computed drain: 23.2, actual drain: 0
    Uid u0a125: 9.32 ( cpu=9.26 wifi=0.0628 ) Including smearing: 11.4 ( proportional=2.07 )
    Screen: 8.87 Excluded from smearing
    Uid 0: 1.95 ( cpu=1.95 wifi=0.000889 ) Excluded from smearing

rwood: If you wouldn't mind giving the revision a once over before I submit it for review that would be great.

cpeterson: Do you have any thoughts about the g5 on Android 7 or Android 8?
Flags: needinfo?(rwood)
Flags: needinfo?(cpeterson)
Attached patch raptor-power-taskgraph.patch (obsolete) — Splinter Review
Attaching wip patch for your viewing pleasure. Still needs work to deal with g5 differences.
(In reply to Bob Clary [:bc:] from comment #15)
> Created attachment 9031440 [details] [diff] [review]
> raptor-power-taskgraph.patch
> 
> Attaching wip patch for your viewing pleasure. Still needs work to deal with
> g5 differences.

Thanks :bc, the wip patch LGTM and good workaround using env var for the host ip.

A couple of things I noticed on the try run:

The raptor-power.json is not being uploaded as an artifact - the power data is dumped to perfherder via PERFHERDER_DATA but the artifact isn't showing up; I'll make an update to my patch in Bug 1473997 to fix that.

We are dumping both the regular Raptor test results (i.e. speedometer) AND the power data to perfherder, in two separate PERFHERDER_DATA json blobs. When the Raptor power job is selected in treeherder, the result shown is the Raptor test result (i.e. speedometer) but not the power measurements. The power data is submitted so it will be alerted on - but to me it seems a bit confusing to select the power job in treeherder and see speedometer results not the power measurements.

Question: should we even be submitting the data for the Raptor test (i.e. speedometer) or should we just submit power data? Then when power jobs are selected we would see power numbers. What does everyone think? Either way again any changes there will be in my patch in Bug 1473997 not the patches here.
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
Flags: needinfo?(dave.hunt)
Flags: needinfo?(bob)
Great question :rwood.  I am happy to see :bc's patch being in such a complete state :)

I recall there was a request to run specific tests as the workload and consider reporting the test results in addition to the battery metrics.  If we determine that is useful, we should report all.  I would question then why we are running the workload "in addition to" the battery test.  

What really matters is what do we do on a regression?  Do we care about speedometer changes that are unique to battery vs running standalone?  Would we file a bug and get people to backout or fix ASAP?  ideally any changes in speedometer run standalone should be the same changes as seen while run inside the battery metric.  I would lean towards not reporting the workload metrics.

As for the interface on treeherder, it would be nice to click on a power metrics job and see the battery info in the performance data 'tab'.  If it is mixed with other data that is fine.
Flags: needinfo?(jmaher)
I'm ok with not submitting the speedometer performance data. That seems simplest especially in light of jmaher's comment about what would we do if we found a performance regression during the power test. rwood: Can you handle that in Bug 1473997 ?
Flags: needinfo?(bob)
Attached patch raptor-power-android-7.patch (obsolete) — Splinter Review
This patch applies after the raptor-power-taskgraph patch and fudges the collection of the power data a bit to handle the different format and data available for Android 7.

Two try runs are available at:
<https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&author=bclary%40mozilla.com&group_state=expanded&fromchange=966555dda9043dbbba42372cd8c65bffdc8aa258>

One of the g5s has a networking problem and is orange. The other appears ok.

Pulling from the logs, I get:

g5 9 minutes
uid: u0a118, cpu: 19.1, wifi: 0.0463, screen: 13.2, proportional: 0
uid: u0a118, cpu: 18.6, wifi: 0.0519, screen: 12.9, proportional: 0

p2 aarch64 6 minutes
uid: u0a125, cpu: 11.3, wifi: 0.0467, screen: 9.48, proportional: 1.96
uid: u0a125, cpu: 11.3, wifi: 0.0566, screen: 9.49, proportional: 2.07
uid: u0a125, cpu: 11.3, wifi: 0.0602, screen: 9.62, proportional: 1.87
uid: u0a125, cpu: 11.3, wifi: 0.0465, screen: 9.48, proportional: 2.10
uid: u0a125, cpu: 11.4, wifi: 0.0501, screen: 9.52, proportional: 2.09

p2 arm7 5 minutes
uid: u0a125, cpu: 9.21, wifi: 0.0457, screen: 7.07, proportional: 1.73
uid: u0a125, cpu: 9.20, wifi: 0.0537, screen: 7.15, proportional: 1.74
uid: u0a125, cpu: 9.22, wifi: 0.0411, screen: 7.11, proportional: 1.76
uid: u0a125, cpu: 9.34, wifi: 0.0437, screen: 7.09, proportional: 1.72
uid: u0a125, cpu: 9.31, wifi: 0.0527, screen: 7.15, proportional: 1.80

This run suffers from the raptor-power.json not being copied to the upload directory and from the perf PERFHERDER_DATA hiding the power PERFHERDER_DATA which rwood is handling in the other patches.

If this handling of the Android 7 power values is acceptible, then once rwood finishes updating his patches, I'll rebase these, fold them together and submit for review.
this is great :bc, my only comment related to what I said before about posting the speedometer results vs the battery, I think that will be handled in Bug 1473997.
How often are we planning to run the power usage jobs? If we're going to run them as often as the speedometer jobs then my main concern would be the demand on the devices and if we'll hit capacity. In this case, it may make sense for us to report the power usage in addition to the test results. If we're planning on running these less frequently, then I agree that the power usage jobs should only report the power metrics and discard the other measurements.
Flags: needinfo?(dave.hunt)
we have limited devices, I would recommend mozilla-central and try only- not autoland/inbound.
Attachment #9031440 - Attachment is obsolete: true
Attachment #9032768 - Flags: review?(rwood)
Attachment #9032768 - Flags: review?(rwood) → review+
Attachment #9032769 - Flags: review?(rwood) → review+
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b043f7f99f97
Enable raptor-speedometer-geckoview-power for android-hw-{p2,g5} on try, r=rwood.
https://hg.mozilla.org/integration/mozilla-inbound/rev/bac5e091e27d
Handle different batterystats output for Raptor power tests on Android 7, r=rwood.
https://hg.mozilla.org/mozilla-central/rev/b043f7f99f97
https://hg.mozilla.org/mozilla-central/rev/bac5e091e27d
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Flags: needinfo?(cpeterson)
You need to log in before you can comment on or make changes to this bug.