Closed Bug 1665012 Opened 4 years ago Closed 4 years ago

Deploy Samsung Galaxy S7 in CI

Categories

(Infrastructure & Operations :: RelOps: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Assigned: aerickson)

References

Details

(Keywords: leave-open)

Attachments

(2 files)

From a discussion with :jmaher, we believe that we have capacity to replace a couple of each existing device type (Moto G5 and Pixel 2). Perhaps we can start with a pool of four Samsung Galaxy S7s, turn on tier 2 perf tests against them, and see how it affects our capacity.

Assignee: relops → aerickson

Joel,

Can you confirm?

Take 2 g5's and 2 p2's out and replace them with Samsung Galaxy S7s? Which pool would you like the p2's to come from?

Thanks,
Andy

//

current pools
gecko-t-bitbar-gw-unit-p2 37
gecko-t-bitbar-gw-perf-g5 26
gecko-t-bitbar-gw-perf-p2 10
(2 test devices: 1 g5, 1 p2)

Flags: needinfo?(jmaher)

2 p2's from gecko-t-bitbar-gw-unit-p2.

We would run a small set of perf tests on here, so maybe a poolname of:
gecko-t-bitbar-gw-perf-s7

:davehunt, I could imagine this being the primary device pool for perf. If we have reliable test runs and perf data, what would the next step be for you?

Flags: needinfo?(jmaher) → needinfo?(dave.hunt)

I've asked Bitbar to replace pixel2-12 and pixel2-13 (from the p2 unit pool) with Samsung Galaxy S7 devices.

What Android version do we want on the S7 devices? I think we can have 6, 7, or 8.

BC, what were you testing with?

(In reply to Joel Maher ( :jmaher ) (UTC -0400) from comment #2)

2 p2's from gecko-t-bitbar-gw-unit-p2.

We would run a small set of perf tests on here, so maybe a poolname of:
gecko-t-bitbar-gw-perf-s7

:davehunt, I could imagine this being the primary device pool for perf. If we have reliable test runs and perf data, what would the next step be for you?

If, as you say, we have reliable test results, we would likely want to replace more of the g5/p2 devices with S7s, but I think it's important to get some tests running and analyse the results before making that decision.

Flags: needinfo?(dave.hunt)

(In reply to Andrew Erickson [:aerickson] from comment #4)

What Android version do we want on the S7 devices? I think we can have 6, 7, or 8.

BC, what were you testing with?

I was testing my Pixel2 with Android 10 and 11 before I bricked it. I've been testing since then with emulators for Android 7-11 with and without Google Play. Google Play in the emulator means it isn't rooted. Android 6 is pretty old and may cause issues with geckodriver. I would say a minimum of 7. I don't have an opinion between 7 and 8.

Would we not want the most modern version possible?

Bitbar has noted that the S7 can only be upgraded (no downgrades). Not sure if this is true for all Samsung devices.

We've got a new plan as discussed in the mobile infra meeting.

New plan:
4 s7g: OS 8, rooted if possible, pull from the g5 pool

Stanley at Bitbar will attempt to root and put the one S7G they have in stock online next week.

They'll need to order at least 2 more (there might be one more in stock).

Status: NEW → ASSIGNED

Update from Bitbar:

The rest of the Galaxy S7s should arrive early next week, so we should be able to get those connected by end of the next week.

4 S7g's (Exynos processor) are now online in our Bitbar account. They are rooted.

I've added the new hosts to the devicepool config in a test pool and have started some jobs. Bitbar still needs to make some changes to the Bitbar frameworks (mozilla-tcp, mozilla-usb) in the UI to add the new device serial numbers and IPs, but then the tasks should start working.

Thanks :aerickson, is there an ETA on these being fully available (i.e., Bitbar making the changes)?

:davehunt, is there a bug we can link to for scheduling tests on here?

Flags: needinfo?(dave.hunt)
Flags: needinfo?(aerickson)

Bitbar finished the framework changes shortly after my update. 3 of the 4 devices are working (generic worker is running jobs). The 4th should be online tomorrow.

I had to fix an issue on the docker image regarding device detection and enabling charging. That's fixed, but now I'm hitting something related to root. I'll double check everything is configured regarding root with Bitbar. Output is below. https://treeherder.mozilla.org/jobs?repo=try&revision=250ee9c8712324c060c5a6ece9687f7f5c8ea746.

The devices are in the 'test-3' pool temporarily. Here's a phab showing how to target them. https://phabricator.services.mozilla.com/D95652

I'll get the final pools for them set up shortly.


[task 2020-11-03T01:14:57.255Z] Using adb 1.0.41
[task 2020-11-03T01:14:57.255Z] /system/bin/ls -1A supported
[task 2020-11-03T01:14:57.255Z] Native cp support: True
[task 2020-11-03T01:14:57.255Z] Native chmod -R support: True
[task 2020-11-03T01:14:57.255Z] Native chown -R support: True
[task 2020-11-03T01:14:57.255Z] Native normal pidof support: True
[task 2020-11-03T01:14:57.255Z] adbd not restarted as root
[task 2020-11-03T01:14:57.255Z] su -c setenforce 0 exitcode 0, stdout: None
[task 2020-11-03T01:14:57.255Z] su -c supported
[task 2020-11-03T01:14:57.255Z] Setting SELinux Permissive
[task 2020-11-03T01:14:57.255Z] Setting test_root to /data/local/tmp/test_root
[task 2020-11-03T01:14:57.255Z] 01:14:51     INFO - Running post-action listener: _resource_record_post_action
[task 2020-11-03T01:14:57.255Z] 01:14:51     INFO - [mozharness: 2020-11-03 01:14:51.242156Z] Finished install step (failed)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - Uncaught exception: Traceback (most recent call last):
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/mozharness/mozharness/base/script.py", line 2358, in run
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     self.run_action(action)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/mozharness/mozharness/base/script.py", line 2292, in run_action
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/mozharness/mozharness/base/script.py", line 2244, in _possibly_run_method
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     return getattr(self, method_name)()
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/mozharness/mozharness/mozilla/testing/raptor.py", line 998, in install
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     self.device.uninstall_app(self.binary_path)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 3897, in uninstall_app
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     if self.is_app_installed(app_name, timeout=timeout):
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 3631, in is_app_installed
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     "pm list package %s" % app_name, timeout=timeout, enable_run_as=False
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -   File "/builds/task_160436600911563/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1912, in shell_output
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL -     raise ADBProcessError(adb_process)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - ADBProcessError: args: adb wait-for-device shell su -c 'pm list package org.mozilla.geckoview_example', exitcode: 255, stdout: android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at android.os.BinderProxy.transactNative(Native Method)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at android.os.BinderProxy.transact(Binder.java:761)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at android.os.BinderProxy.shellCommand(Binder.java:815)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.commands.pm.Pm.runShellCommand(Pm.java:334)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.commands.pm.Pm.runList(Pm.java:722)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.commands.pm.Pm.run(Pm.java:138)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.commands.pm.Pm.main(Pm.java:107)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - 	at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:287)
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - Running post_fatal callback...
[task 2020-11-03T01:14:57.255Z] 01:14:51    FATAL - Exiting -1
Flags: needinfo?(aerickson)
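
When triaging a task log like the one above, it can help to pull the failing adb command and its exit code out of the traceback. The following is a minimal sketch, assuming the failure line keeps the "ADBProcessError: args: ..., exitcode: ..., stdout: ..." shape seen in this log; the helper name find_adb_error is hypothetical and is not part of mozdevice or mozharness.

```python
import re
from typing import Dict, Optional

# Pattern for the failure line as it appears in the log above.
# Assumption: the "args / exitcode / stdout" layout is stable across runs.
ADB_ERROR = re.compile(
    r"ADBProcessError: args: (?P<args>.*?), "
    r"exitcode: (?P<exitcode>-?\d+), "
    r"stdout: (?P<stdout>.*)"
)


def find_adb_error(log_text: str) -> Optional[Dict[str, str]]:
    """Return the first ADBProcessError found in a task log, or None.

    Hypothetical helper for illustration only. All values are returned
    as strings, exactly as they appear in the log.
    """
    for line in log_text.splitlines():
        match = ADB_ERROR.search(line)
        if match:
            return match.groupdict()
    return None
```

Applied to the log above, the "args" field would show the `su -c 'pm list package ...'` call and the "stdout" field would start with the DeadObjectException message, which points at the rooted `pm` invocation rather than at the test harness.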

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #13)

:davehunt, is there a bug we can link to for scheduling tests on here?

Not that I'm aware of. I've just reviewed the notes from the meeting last month and don't see a decision on what we're planning to run on these? In bug 1670284 sparky has listed a subset of high value tests that we're going to continue running with webrender disabled once we make webrender the default. Perhaps this would be a good set to enable here?

Flags: needinfo?(dave.hunt) → needinfo?(gmierz2)

Here's the list of tests that we chose as high-value for android:

  • facebook
  • espn
  • amazon-search
  • allrecipes
  • youtube-watch
  • microsoft-support
  • google-search
Flags: needinfo?(gmierz2)

My main goal is that if we set these up, we should be using them. Can we get a bug on file to follow up on the status of getting tests scheduled?

Flags: needinfo?(dave.hunt)

The rooting being used on the S7s is broken at least for adb shell 'su -c "pm list packages"'.

I do not support bending over backwards to support the broken rootedness on S7. I recommend that if we can't get a reliable rooting method for the S7s that we run them unrooted and deal with the lack of perf tuning. Four devices isn't enough to really support a full set of tests on the S7s.
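
One way to decide between the rooted and unrooted paths is to probe whether a real command succeeds under `su -c` before relying on root. The sketch below is illustrative only and is not how mozdevice or the Bitbar Docker image does it; the names run_adb and su_is_usable are hypothetical, and it assumes `adb` is on the PATH and that `pm list packages` prints lines beginning with "package:".

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns (exit code, combined output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_adb(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command, returning its exit code and combined stdout/stderr."""
    proc = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=60,
    )
    return proc.returncode, proc.stdout


def su_is_usable(serial: str, run: Runner = run_adb) -> bool:
    """Return True only if `pm list packages` works under `su -c`.

    Hypothetical check: a device can report that `su -c` is supported
    and still fail on a real command, as in the log on this bug.
    """
    code, out = run(
        ["adb", "-s", serial, "shell", "su -c 'pm list packages'"]
    )
    # Require a clean exit, at least one package line, and no binder crash.
    return (
        code == 0
        and "package:" in out
        and "DeadObjectException" not in out
    )
```

The runner is injectable so the check can be exercised without a device attached. A caller could fall back to the unrooted code path whenever su_is_usable returns False.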

Blocks: 1675753

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #17)

My main goal is that if we set these up, we should be using them. Can we get a bug on file to follow up on the status of getting tests scheduled?

I've opened bug 1675753.

Flags: needinfo?(dave.hunt)

The rooting method Bitbar used on the S7s seems flawed. They've disabled the root (via the "Superuser" app) and tests appear to be working after making changes to the Bitbar Devicepool Docker image to handle unrooted devices (https://github.com/bclary/mozilla-bitbar-docker/pull/50).

We've hit a small bug in mozlog that happened with the mozdevice upgrade (https://bugzilla.mozilla.org/show_bug.cgi?id=1676486). Once that's resolved and the Docker image has been rebuilt, tested, and deployed, we should be ready to schedule jobs.

I've worked around Bug 1676486 by manually installing mozfile in the docker image (change added to https://github.com/bclary/mozilla-bitbar-docker/pull/50).

Some tests are passing. Running more tests and analyzing with BC.

Depends on: 1676726

current issues:

test run: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&selectedTaskRun=R8-ZEg-gRcGtCWipeJ6WHA.0&tier=1%2C2%2C3&revision=a3143e873c3221a33e4a1ac5fb59095f31cb538c

The logcat contains a large amount of spew from Magisk and the disabled su command. I think before this goes live, we need to get Bitbar to reflash these devices to the original Android version and run without Magisk installed at all.

I have several mozdevice bugs I am going to get to today which should also help.

Bug 1678163 is required to use rootless Android for raptor/browsertime.

Depends on: 1678163
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/e1313d0ec8ad add autophone clients for s7s at bitbar r=aki
Keywords: leave-open
Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/05ff9cad3f5e configure taskgraph to send jobs to s7g devices at bitbar r=jmaher

The 4 devices are in two pools (2 in a perf pool and 2 in a unit pool). Please let me know if we'd like them allocated differently. Pool names are mentioned in https://hg.mozilla.org/mozilla-central/rev/05ff9cad3f5e.

Testing of both pools to ensure they're configured:
s7-perf jobs: https://treeherder.mozilla.org/jobs?repo=try&revision=610e171a5156659e3541d49168fa35fed18916f3
s7-unit jobs: https://treeherder.mozilla.org/jobs?revision=0a2aba4967db23017ca88f43bb28e2651caef308&repo=try&group_state=expanded

There are still issues with timeouts on some tests.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Unit test jobs need --no-artifact when pushing to try. Sadly, these are showing in Treeherder as Android 8.0 Pixel2. This should be fixed so it isn't confusing.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

OK, I'll rerun the unit jobs.

I'm hacking up the in-tree generation (https://hg.mozilla.org/try/rev/0fffe78fad9ea4eb96ac0010fc8956afa87eb62c) because no jobs are actually scheduled on these yet. Maybe they'll be fixed when tasks are actually pointed at them?

https://bugzilla.mozilla.org/show_bug.cgi?id=1675753 is tracking scheduling of jobs.

that worked great; the jsreftests just ran slower, but the logs in general look good.

Closing as the phones are racked and test jobs work.

Tracking scheduling of tests on these phones in Bug 1675753.

Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

:jmaher, should we create a new bug for further greening or should I NI sparky/dhunt on 1675753?

Flags: needinfo?(jmaher)

Good question. We have two factors:

  1. perf (sparky/dhunt)
  2. unit (for now me)

The unittests (jittest, jsreftest, crashtest, reftest, webgl, media) are running on Pixel2, and I don't think we want to run them on both Pixel2 and S7. I would pick one flavor to run on S7, if we want to at all; maybe media tests. Are we planning on creating a larger pool than 4? If so, I could see us splitting some up to get unittest coverage on S7 and p2.

I will be revisiting the jittest/jsreftest because part of the reason for running them on Android is arm64, and we will be getting (cheaper) coverage on osx/aarch64, ideally this month, so possibly we don't need this load on Pixel2. While that is unrelated to the S7, it means I am revisiting our unittests.

Flags: needinfo?(jmaher)