Closed Bug 1533848 Opened 5 years ago Closed 5 years ago

extend android battery metrics to gather memory information from dumpsys

Categories

(Testing :: Performance, enhancement)

Version: 3
Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla68
Tracking Status: firefox68 --- fixed

People

(Reporter: jmaher, Assigned: bc)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

The current battery metrics measure battery usage; we could use similar techniques to gather memory information for the overall system.

In discussing what Amazon does, we have:

  • Memory - they use dumpsys to capture foreground and background memory
    of the Firefox app with YouTube content loaded

so I think whatever payload we test for battery, we can harvest the basic system memory from dumpsys as well.

You'll just need to adapt https://searchfox.org/mozilla-central/source/testing/raptor/raptor/power.py to parse the data you want from the batterystats output.

Actually, what you will need is probably adb shell dumpsys meminfo which you'll call similarly to how dumpsys batterystats was called... raptor.device.shell_output("dumpsys meminfo...
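
A minimal sketch of what that call might look like, assuming a mozdevice ADBDevice handle as in power.py (illustrative only, not the eventual patch):

    def dump_android_memory(device, app_name):
        # Ask the system server for a memory summary of the app under test.
        # The output format differs between Android 7 (Moto G5) and
        # Android 8 (Pixel 2), so callers should not assume a fixed layout.
        meminfo_output = device.shell_output("dumpsys meminfo -s %s" % app_name)
        return meminfo_output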

$ adb shell dumpsys meminfo -h
meminfo dump options: [-a] [-d] [-c] [-s] [--oom] [process]
  -a: include all available information for each process.
  -d: include dalvik details.
  -c: dump in a compact machine-parseable representation.
  -s: dump only summary of application memory usage.
  -S: dump also SwapPss.
  --oom: only show processes organized by oom adj.
  --local: only collect details locally, don't call process.
  --package: interpret process arg as package, dumping all
             processes that have loaded that package.
  --checkin: dump data for a checkin
If [process] is specified it can be the name or 
pid of a specific process to dump.

There are tons of other things you can dumpsys... You can get a list via adb shell dumpsys -l.

Remember the output will probably be different for Android 7/Moto G5 and Android 8/Pixel 2 ...

This is good information :bc, I think getting a single point as an example and hooking it into the existing battery test would be good. That would ease the learning/hacking curve to extend this for all use cases.

We probably want to be able to add it to any test. Just adding it to power means we are limited to the devices connected to the Cambrionix hub. I'll stack some bits together and see what I can get accomplished.

Assignee: nobody → bob

There's a lot of prior art here from the B2G days. get_about_memory.py used a fifo queue to request memory dumps; if we want to get more detailed reports, that's probably the way to go. We still have the code to support this in tree.

We also had a util to run on device called b2g-info that scraped the memory info we cared about.

I'm not sure I need to go that route but if you think it is necessary please let me know.

The approach I am planning to take is to use dumpsys meminfo --checkin to get a csv file of the data. I need to talk with rwood about the appropriate hooks into raptor today.

should give the details needed for the format of the output file.
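
A rough sketch of pulling the --checkin dump and splitting it into rows; the column meanings are left to whatever the device's Android version emits, and nothing here reflects the final patch:

    def get_meminfo_checkin(device, app_name):
        # --checkin emits a compact, comma-separated dump intended for parsing.
        output = device.shell_output("dumpsys meminfo --checkin %s" % app_name)
        # Split into rows of fields; interpreting the columns is left to the
        # caller since the schema varies by Android version.
        return [line.split(",") for line in output.splitlines() if line.strip()]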

Flags: needinfo?(bob)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b08146fc82dd3b214541b2af21c1ab3e21097ad

with the number of pending tasks this might complete before I die of old age.

Comment on attachment 9053998 [details] [diff] [review]
bug-1533848-raptor-memory.patch wip

The try job finished. I'd appreciate your feedback.
Attachment #9053998 - Flags: feedback?(rwood)
Attachment #9053998 - Flags: feedback?(jmaher)
Comment on attachment 9053998 [details] [diff] [review]
bug-1533848-raptor-memory.patch wip

Review of attachment 9053998 [details] [diff] [review]:
-----------------------------------------------------------------

Overall this looks pretty sound; I could come up with a series of nits, but we are really prototyping this. One thought: this looks to be a new test vs. extending the existing battery test. I am not sure what the best route forward is, but it's something worth discussing.
Attachment #9053998 - Flags: feedback?(jmaher) → feedback+

Just extending the battery metrics/power test seemed to be a very limited approach, since it would restrict this to the devices connected to the Cambrionix hub and require distinct tests.

This approach allows you to run a limited memory check on any test (including power) in addition to the original test. I figured it would be more useful generally.

In fact, I'm not sure how I could have done it without putting the supporting data in the control server.

My initial approach was modeled more on how the power test worked, but in that approach I had to de-focus/focus the browser multiple times since I wasn't able to determine when the browser was available or had perhaps already been shut down by the control server. Fortunately or unfortunately, that approach wouldn't work in general. It appeared to work for speedometer, but when I tried it for the tp6m tests the web extension would send a shutdown request to the control server when the browser lost focus, which terminated the test immediately.

With this approach any test can be supplemented with the memory data measured after the test has run. I suppose we could hook the memory measurement into other places in the control server if we wished to obtain additional measurements during the test execution.

I see how this can be done in addition to a power test, I am sure there are other metrics, but this code shows how to extend raptor and get metrics from a test. Curious to see what :rwood has to say.

(In reply to Bob Clary [:bc:] from comment #10)

> My initial approach was modeled more on how the power test worked, but in that approach I had to de-focus/focus the browser multiple times since I wasn't able to determine when the browser was available or had perhaps already been shut down by the control server. Fortunately or unfortunately, that approach wouldn't work in general. It appeared to work for speedometer, but when I tried it for the tp6m tests the web extension would send a shutdown request to the control server when the browser lost focus, which terminated the test immediately.

Interesting about your initial approach - from some Android reading I have a theory: when the GeckoView example app loses focus, it goes to onPause [0], and the docs say that if the new (covering) app's process needs more memory, the system can kill the covered app's process. Just guessing, but maybe that is happening and the Raptor content 'measure.js' process is being killed, while the Raptor runner.js (background) is still running and times out waiting for results since 'measure.js' never reports them. Just a theory.

[0] https://developer.android.com/reference/android/app/Activity.html#onPause()

Just to better describe my first approach: I put the de-focus, focus, and memory measurement in raptor's wait_for_test_finish while loop, and it continuously hid and showed the app while the test was running. It was even more disruptive than I thought it would be and increased the run time for the test significantly. I'm glad tp6m killed the app when I tried it, so I could settle on the control server approach, which I like much better.

When de-focusing the app, I just displayed the home screen, so I'm not sure it would count as needing more memory. If I recall correctly, the messages I saw, at least in speedometer, were that the app wasn't restarted after the de-focus but was just brought to the front. tp6m pretty much killed it after one cycle, and I don't remember offhand what it said.
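
For illustration, the abandoned de-focus/focus cycle amounted to something like this; the activity name is a placeholder and the commands are generic adb shell calls, not the actual prototype code:

    def cycle_app_focus(device, app_name, activity="GeckoViewActivity"):
        # Send the app to the background by showing the home screen; this is
        # the focus loss that tp6m's webext treated as a shutdown trigger.
        device.shell_output("input keyevent KEYCODE_HOME")
        # Measure while the app is backgrounded.
        meminfo = device.shell_output("dumpsys meminfo --checkin %s" % app_name)
        # Bring the app back to the front.
        device.shell_output("am start -n %s/.%s" % (app_name, activity))
        return meminfo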

Where should we go with this now? Proceed to Phabricator and attempt to land, or ...?

Comment on attachment 9053998 [details] [diff] [review]
bug-1533848-raptor-memory.patch wip

Review of attachment 9053998 [details] [diff] [review]:
-----------------------------------------------------------------

Hey :bc,

This is really cool (getting the memory info). However, I'm not too keen on this approach of using the control server itself to generate supporting data, for a couple of reasons.

First, it is not really the purpose of the control server to generate data itself. I'm a bit worried that if we start adding code here to generate data, that will open the door to lots of additional data-gathering methods being put here that people want to run on browser shutdown, and the control server will not really be a control server anymore, in a way.

Also, putting it here on browser shutdown doesn't give any option to NOT have this happen other than omitting the --memory-test flag. This might cause issues with cold page load, where we shut down and restart the app between each page load (the current default is 10 times per test page). This would slow down the cold page load suites and also open the door to the possibility of more intermittent issues. Whereas if it's not here, but in run_test_cold, then you could decide to run it just before all browser cycles start and then after all cycles are finished (or every x browser cycles), etc.

I do understand why you used this method, though, if putting the app in the background caused an automatic Raptor timeout that terminated the browser/test immediately. I think, though, that if possible we should try to find out why that is happening and how to prevent it. I'd really prefer to use the same data flow that we use for the --power-test data generation, as I was kind of hoping that would become the standard method for any supporting data generation in Raptor.

When the browser loses focus, is it a standard raptor page timeout message that the control server is receiving that tells it to shutdown? Maybe before having the browser lose focus you could tell the control server to temporarily ignore any shutdown messages, then turn that flag off after the memory data was received?
Attachment #9053998 - Flags: feedback?(rwood) → feedback+

(In reply to Robert Wood [:rwood] from comment #14)

> When the browser loses focus, is it a standard raptor page timeout message that the control server is receiving that tells it to shutdown? Maybe before having the browser lose focus you could tell the control server to temporarily ignore any shutdown messages, then turn that flag off after the memory data was received?

Actually that probably wouldn't work, as I'm guessing the state of the webext/test is already destroyed at that point. There must be another way to solve that problem...

One of my main issues was not knowing in raptor.py whether the browser was running or not, since the web extension communicates directly with the control server and can shut itself down without anything in raptor.py being aware of it. Perhaps if we could register callbacks from the control server, we could leverage that to do measurements in raptor.py before shutdown or in response to the webext sending something to the control server.

Hmmm. I've been looking at the code for a couple of hours trying to come up with ideas. I was trying to think of a way that the control server (just before shutdown) could call raptor.py and verify it's ready to shutdown. At that point raptor.py could get the memory info via raptor/memory.py, then reply to the control server saying it's now safe to continue with the shutdown. Kind of how you suggested having callbacks.

I don't really see a good/clean way to do that though, so I'm not sure that's any better TBH. I'm starting to think since this is a special type of supplementary data (i.e. we need to get the memory info while the browser/app is still running; NOT after it's shutdown) that your implementation is really the only way we can do this despite my reservations. The data is reported the same way as other supplementary data (submit_supporting_data) so it does follow that flow anyway, just the initial gathering is triggered by the control server.

So we should probably go with your implementation - maybe though move all the code from report_android_memory_usage(self): (except the self.submit_supporting_data(meminfo_data)) into raptor/memory.py? Btw what if it was called something like 'generate_android_memory_profile' instead of 'report_android_memory_usage' (just an idea).
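
An illustrative sketch of what such a raptor/memory.py helper could look like, keeping submit_supporting_data() with the caller as suggested above; the function name matches the suggestion, but the argument names, config keys, and supporting-data fields are assumptions, not the landed code:

    def generate_android_memory_profile(raptor, test_name):
        # Gather a dumpsys meminfo snapshot for the app under test while the
        # browser is still running.
        app_name = raptor.config["binary"]
        output = raptor.device.shell_output(
            "dumpsys meminfo --checkin %s" % app_name)
        # Build the supporting-data blob; the caller submits it via
        # submit_supporting_data(), the same path the power test data uses.
        return {
            "type": "memory",
            "test": test_name,
            "unit": "KB",
            "values": {"raw-checkin-dump": output},
        }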

Are you planning on running this for cold page-load tests too (i.e. every browser cycle)?

I can move that stuff into memory.py no problem. I can also rename it however you like.

I have no idea about how it might be used or even if jmaher is ok with this approach. I'll eagerly await his verdict.

The goal here was to show how we could extend tests and collect memory data - I imagine there is a need for CPU as well. As for the integration into raptor and the specifics, I am fine with whatever works and makes sense long term. There isn't a clear direction here, and I know others will actually be doing memory measurements, getting the specific values, and reporting what is ideal.

Showing an example of how we can run a test and collect a metric is what this bug is about. I think there is a concern given the use of the control server, although it might be hard to solve. It might make sense to have this landed in a "work in progress" state that isn't too far from ideal and leave this up to folks who will be doing memory measurements to do the final tweaks.

:rwood, you bring up some good questions about cold vs. warm; what would we do differently if we were measuring cold? I assume we would collect a series of data points and report the geomean of the series. :bc is doing the core integration with Android so we can move forward. If there is a clear direction you would prefer for raptor, it is logical that bc does this; otherwise let's make this a good-faith effort that shows how it can be done.

These are the changes I made in response to rwood. If we decide to go this route, we can incorporate them.

I had a different idea that might allow us to keep these types of tests - ones that want to hook into the control server to perform measurements at various points in the test suite - out of control_server.py.

We could adapt the control server so that we could make get/post requests to it in order to control how it behaves.

We could ask it:

  • whether the browser is still running or is in the process of shutting down,
  • to wait for us to perform a measurement after it receives results or a shutdown request,
  • to proceed with normal operations,
  • to remove the wait.

I believe the nature of the HTTP server that control_server is based upon would allow us to handle multiple requests simultaneously, so we wouldn't need to worry about it being blocked from responding to the raptor.py process or the webext in the browser.

rwood: What do you think?

There is the possibility that we would have to handle or create a timeout if the requestor didn't remove the wait in a timely fashion, but we could clearly report that in the error message as a specific bug in the requestor.
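
A hypothetical sketch of how raptor.py could drive such control requests over the control server's existing HTTP endpoint; the URL, JSON shape, and command names here are assumptions for illustration (the prototype's actual wait-* commands are discussed further down):

    import json
    import urllib2  # Python 2, matching the harness at the time

    def post_control_command(port, command, data=None):
        # Send a command to the control server as a small JSON POST.
        request = urllib2.Request(
            "http://127.0.0.1:%d/" % port,
            data=json.dumps({"type": command, "data": data}),
            headers={"Content-Type": "application/json"},
        )
        return urllib2.urlopen(request).read()

    # e.g. ask the control server to hold shutdown while we measure, then resume:
    #   post_control_command(port, "wait-set", "shutdown-browser")
    #   ... run dumpsys meminfo here ...
    #   post_control_command(port, "wait-continue", "shutdown-browser")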

Flags: needinfo?(rwood)

(In reply to Bob Clary [:bc:] from comment #20)

Thanks :bc I'll have a look at your latest update.

> rwood: What do you think?

Interesting idea - TBH though, I'm not sure that extra work (and potential new issues) would be worth the effort just to move this out of the control server. At the moment we have a lot of Raptor intermittent failures (which we are trying to fix), so I'm a bit wary of making changes like this. What else would this gain us besides moving the memory measurement code out of the control server? I can live with it being there, at least for now, especially since most of the code will be in raptor/memory.py anyway. Perhaps down the road we could revisit.

Flags: needinfo?(rwood)

It would give us some flexibility with regard to performing measurements. jmaher may have more details, but it seems there might be additional scenarios, such as taking snapshots of memory at intervals, recording memory at idle, or measuring additional things such as temperature, etc.

I can understand the reluctance to add more since we already have enough intermittent issues.

Flags: needinfo?(bob) → needinfo?(jmaher)
Flags: needinfo?(jmaher)

I attempted the approach of separating the measurements from the control server by sending messages to it in order to control its operation.

To do that I needed to make it multithreaded. See

https://hg.mozilla.org/try/rev/262091093f4a8ebf3493b88a2eae340dd370c3d8

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=6c65562154dec84c15a928a972e219385cd1261f
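
For reference, the standard-library way to get a multithreaded BaseHTTPServer-based server in Python 2 looks like this (a general sketch of the technique, not the patch itself):

    from BaseHTTPServer import HTTPServer
    from SocketServer import ThreadingMixIn

    class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
        # Handle each request in its own thread so a long-running request
        # (e.g. a wait) does not block responses to the webext or raptor.py.
        daemon_threads = True  # request threads won't keep the process alive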

I then adapted the previous memory measurement work so that the memory handling is separated from the control server. I included a suppression for log_request in the HTTP server when the code is 200, to reduce log spam. I built the memory changes on top of the previous patches so we can see the changes and preserve the old work. I submitted a try job for all of the raptor Android tests with --memory-test added, for illustration and comparison. See

https://hg.mozilla.org/try/rev/753cc9d6d3683dde2fea16935ed3ab9cf04b6412

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=83209fcc6dd137b69f4ee2e4fe2e17a3ac7aff6d

These were from the same pull from inbound yesterday.
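
The log_request suppression mentioned above amounts to overriding the handler's log_request(); a minimal sketch, not the actual patch:

    from BaseHTTPServer import BaseHTTPRequestHandler

    class QuietRequestHandler(BaseHTTPRequestHandler):
        def log_request(self, code="-", size="-"):
            # Only log requests that did not return HTTP 200; successful
            # requests are routine chatter that just spams the log.
            if code != 200:
                BaseHTTPRequestHandler.log_request(self, code, size)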

rwood: I would appreciate your feedback on whether you think this is a good way to go. I'd like to talk to you more about the cold tests and how to include the memory results with normal results so we could have both if desired.

Flags: needinfo?(rwood)

Very cool stuff here - glad to see it!

Have we thought about the intersection of this functionality with awsy-like measurements (using nsIMemoryReporterManager, nsIMemoryInfoDumper, etc from within Firefox)? I suppose dumpsys is easy to apply uniformly to any app, while nsIMemory* can give us fine-grained browser-centric data. I wonder if these could be pulled together, so that, perhaps, when dumpsys identifies a regression or anomaly, nsIMemory* memory reports could provide detailed data for Firefox developers.

See Also: → 1501559

(In reply to Geoff Brown [:gbrown] from comment #24)

> Very cool stuff here - glad to see it!
>
> Have we thought about the intersection of this functionality with awsy-like measurements (using nsIMemoryReporterManager, nsIMemoryInfoDumper, etc from within Firefox)? I suppose dumpsys is easy to apply uniformly to any app, while nsIMemory* can give us fine-grained browser-centric data. I wonder if these could be pulled together, so that, perhaps, when dumpsys identifies a regression or anomaly, nsIMemory* memory reports could provide detailed data for Firefox developers.

get_about_memory.py that I mentioned in comment 5 allows you to retrieve a memory report without the need to be in process. I agree it would be nice to get finer grained details and might be worthy of a follow up bug.

(In reply to Bob Clary [:bc:] from comment #23)

This looks great. I really like your idea of using the 'wait-*' control server commands (and thanks for the awesome descriptions in your code comments).

Looks like it will handle things fine if the task running during a 'wait-set' (i.e. gathering the power profile) crashes or exceeds a time limit: the 'wait-set' is deleted and an error is logged (so I'm assuming that at this point the browser would be shut down fine).

Maybe the 'wait-set' cmd should also include the maximum time limit to wait (as I see right now in your prototype it's just hard-coded 60 seconds)?

Btw, when making these changes if you could update the control server unit test [0] for this new functionality that would be excellent. Thanks!

About including memory results with normal results - as you know, right now if supplementary data exists then Raptor will only output a PERFHERDER_DATA json blob with that data, and not the actual Raptor test results. There is already a bug on file (Bug 1515406) to add a test INI setting to indicate that you would like to have both the supplementary data and Raptor test results dumped to perfherder (wouldn't be a big patch).

[0] https://searchfox.org/mozilla-central/source/testing/raptor/test/test_control_server.py

Flags: needinfo?(rwood)

(In reply to Robert Wood [:rwood] from comment #26)

> Maybe the 'wait-set' cmd should also include the maximum time limit to wait (as I see right now in your prototype it's just hard-coded 60 seconds)?

Making the timeout specific to each wait would require tracking the timeout per state. Since this is really just intended as a fail safe to prevent the controller from being stuck in a loop, can we get by with a wait-set-timeout that sets a global timeout for all waits?

(In reply to Bob Clary [:bc:] from comment #27)

> Making the timeout specific to each wait would require tracking the timeout per state. Since this is really just intended as a fail safe to prevent the controller from being stuck in a loop, can we get by with a wait-set-timeout that sets a global timeout for all waits?

Ah right good point, yes a wait-set-timeout sounds great, thx.
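
A sketch of the fail-safe being discussed: one global, configurable timeout covering all waits instead of per-wait timeouts. The command names follow the wait-set / wait-set-timeout naming above, but the surrounding server plumbing and exact state layout are assumptions:

    import time

    class WaitState(object):
        def __init__(self, default_timeout=60):
            self.timeout = default_timeout  # adjustable via wait-set-timeout
            self.waits = {}                 # wait name -> time the wait was set

        def set_timeout(self, seconds):
            self.timeout = seconds

        def set_wait(self, name):
            self.waits[name] = time.time()

        def clear_wait(self, name):
            self.waits.pop(name, None)

        def should_continue(self, name):
            # True once the wait has been cleared or the global timeout has
            # expired; an expired wait indicates a bug in the requestor and
            # should be logged as such before proceeding.
            started = self.waits.get(name)
            if started is None:
                return True
            if time.time() - started > self.timeout:
                self.waits.pop(name, None)
                return True
            return False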

I did try runs with rebuild 2

tip inbound:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=2bad29704669c54ea4654d8f5da503a67bdac6c1

tip inbound with the patches for the multithreaded control server, suppression of HTTP 200 request log output, and the new dumpsys memory tests using the control server waits, along with the change to allow a configurable wait timeout and the pytest:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=32009ab74c11a2187590b0069cb4a43a16b84a06

Note I put the test in test_raptor.py as that made more sense to me.

The async wait_continue test fails 2/2 on Linux and 1/2 on Windows. It passes locally for me on Fedora 29, though. The Windows machine and my local Fedora 29 both have Python 2.7.15, pytest-3.6.2, py-1.5.4, and pluggy-0.6.0, but the Linux machine has Python 2.7.12, pytest-3.6.2, py-1.5.4, pluggy-0.6.0. Perhaps other Raptor issues on Linux are related to this.

rwood has okayed just removing the async wait_continue test.

Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/219832d78e12
Make the control server multithreaded, r=rwood.
https://hg.mozilla.org/integration/autoland/rev/dc74bceea27b
Suppress HTTP 200 request log output, r=rwood.
https://hg.mozilla.org/integration/autoland/rev/9ca0cca3edb0
Implement Android memory test using control server waits, r=rwood.
Blocks: 1545466
