Android PGO run tasks failing when build is optimized for size, "-Os"
Categories
(Firefox Build System :: Android Studio and Gradle Integration, defect)
Tracking
(firefox73 fixed)
Tracking | Status | |
---|---|---|
firefox73 | --- | fixed |
People
(Reporter: acreskey, Assigned: aerickson)
Attachments
(1 file)
In Bug 1591725 we're looking at different build optimization flags for Android.
One of these options, -Os, is leading to PGO run failures such as this one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85b33251696b529470c49b85788ffa3105c29f78&selectedJob=273808395
From the logs:
[task 2019-10-31T00:14:15.434Z] 00:14:15 INFO - Running main action method: install
[task 2019-10-31T00:15:55.071Z] 00:15:55 INFO - Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None: ADBError install failed for /builds/worker/fetches/geckoview-androidTest.apk. Got: Performing Push Install
[task 2019-10-31T00:15:55.071Z] 00:15:55 INFO - /builds/worker/fetches/geckoview-androidTest.apk: 1 file pushed. 2.3 MB/s (155123944 bytes in 65.077s)
[task 2019-10-31T00:15:55.071Z] 00:15:55 INFO - pkg: /data/local/tmp/geckoview-androidTest.apk
[task 2019-10-31T00:15:55.071Z] 00:15:55 INFO - Failure [INSTALL_FAILED_CONTAINER_ERROR]
Locally I've been able to install the artifact: geckoview-androidTest.apk
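For triaging logs like the one above, a small helper can pull the failure code out of the install output. This is only a minimal sketch: the function name `extract_install_failure` is hypothetical, and it assumes the `Failure [CODE]` line format shown in the log excerpt.

```python
import re

# Matches lines such as "Failure [INSTALL_FAILED_CONTAINER_ERROR]".
_FAILURE_RE = re.compile(r"Failure \[([A-Z0-9_]+)\]")


def extract_install_failure(output):
    """Return the failure code from install output, or None if absent."""
    match = _FAILURE_RE.search(output)
    return match.group(1) if match else None


# Sample text copied from the log lines in this report.
sample = (
    "pkg: /data/local/tmp/geckoview-androidTest.apk\n"
    "Failure [INSTALL_FAILED_CONTAINER_ERROR]\n"
)
print(extract_install_failure(sample))
```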
Reporter
Comment 1•6 years ago
According to Stack Overflow, this may be mysteriously solvable by adding android:installLocation="internalOnly" to the manifest, or else by increasing the device's virtual memory.
Comment 2•6 years ago
(In reply to Andrew Creskey from comment #1)
... or else increasing the device's virtual memory.
The arm emulator's avd has
hw.ramSize=1024
vm.heapSize=128
I seem to recall that the ramSize is at its maximum for that emulator version + android 4.3 image...but I'm not sure.
Comment 3•6 years ago
I imagine this would not be a problem on the x86_64 emulator -- bug 1548962 -- which, in our standard setup, has much more memory and storage space available.
Reporter
Comment 4•6 years ago
I did try a push with android:installLocation="internalOnly" in the manifest, but it gives the same result: INSTALL_FAILED_CONTAINER_ERROR.
Reporter
Comment 5•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #2)
The arm emulator's avd has
hw.ramSize=1024
vm.heapSize=128
I seem to recall that the ramSize is at its maximum for that emulator version + android 4.3 image...but I'm not sure.
This is another post where the avd device memory sizes were increased to avoid INSTALL_FAILED_CONTAINER_ERROR
https://github.com/flutter/flutter/issues/8824
Comment 6•6 years ago
I tried to increase the memory size in https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a4e44694f269155b7354d880ff8fdb4075ccd9d, but it did not work.
hw.ramSize = 2048
...
emulator: 3: KEY='hw.ramSize' VALUE='2048'
so the request was recognized, but...
Truncating RAM at 00000000-7fffffff to -33ffffff (vmalloc region overlap).
...
Memory: 832MB = 832MB total
Memory: 840192KB available (2900K code, 707K data, 124K init)
it was truncated.
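The truncation message is consistent with the reported total. As a quick check, assuming the address ranges in the kernel message are inclusive and that `MB` in the log means MiB:

```python
MIB = 1024 * 1024

requested_end = 0x7FFFFFFF  # end of the requested RAM range (inclusive)
truncated_end = 0x33FFFFFF  # end of the range after truncation (inclusive)

requested_mib = (requested_end + 1) // MIB
usable_mib = (truncated_end + 1) // MIB

print(requested_mib)  # 2048
print(usable_mib)     # 832
```

So the 2048 request was accepted by the emulator, but the kernel kept only 832 of it, matching the `Memory: 832MB` line.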
Reporter
Comment 7•6 years ago
Interesting -- so truncation is happening even with the default requested size, hw.ramSize=1024.
I did try to run the PGO steps locally but it looks like it requires a library that's only in the linux toolchain (I'm on OSX).
42:42.65 /Users/acreskey/ndk/android-ndk-r20/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: cannot open /Users/acreskey/.mozbuild/clang/lib/clang/9.0.0/lib/linux/libclang_rt.profile-arm-android.a: No such file or directory
Michael -- am I right that I need linux to build the android PGO-instrumented app?
Comment 8•6 years ago
You don't need Linux, but you do need a clang toolchain that has the necessary runtime libraries for Android, which our bootstrapped toolchains don't.
You should be able to mach artifact toolchain --from-build linux64-clang9-android-cross to get a clang/ directory with the appropriate runtime libraries, and then copy them into the appropriate place under $HOME/.mozbuild/clang.
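A rough sketch of the copy step, in Python. The helper name `copy_missing_files` and the example paths in the trailing comment are assumptions; the only layout taken from this bug is the `lib/clang/9.0.0/lib/linux/` directory named in the linker error in comment 7. The helper copies files that exist in the downloaded tree but are missing from the bootstrapped one, and never overwrites existing files.

```python
import os
import shutil


def copy_missing_files(src_root, dst_root):
    """Copy files present under src_root but absent under dst_root.

    Existing files in dst_root are left untouched. Returns the sorted
    list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            dst_path = os.path.join(dst_root, rel_path)
            if os.path.exists(dst_path):
                continue  # never clobber the bootstrapped toolchain
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), dst_path)
            copied.append(rel_path)
    return sorted(copied)


# Illustrative call (both paths are assumptions, adjust to your checkout):
# copy_missing_files(
#     "clang/lib/clang/9.0.0/lib/linux",
#     os.path.expanduser("~/.mozbuild/clang/lib/clang/9.0.0/lib/linux"),
# )
```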
Reporter
Comment 9•6 years ago
(In reply to Nathan Froyd [:froydnj] from comment #8)
You don't need Linux, but you do need a clang toolchain that has the necessary runtime libraries for Android, which our bootstrapped toolchains don't.
You should be able to mach artifact toolchain --from-build linux64-clang9-android-cross to get a clang/ directory with the appropriate runtime libraries, and then copy them into the appropriate place under $HOME/.mozbuild/clang.
That would be great.
What I'm seeing is this error:
./mach artifact toolchain --from-build linux64-clang9-android-cross
... Could not find a toolchain build named `linux64-clang9-android-cross`
Comment 10•6 years ago
Sigh, try --from-build linux64-clang-android-cross.
Reporter
Comment 11•6 years ago
Progress... I'm getting this (at the tip of mozilla-central), so I can try the fresh checkout.
./mach artifact toolchain --from-build linux64-clang-android-cross
0:05.21 Could not find artifacts for a toolchain build named `linux64-clang-android-cross`. Local commits and other changes in your checkout may cause this error. Try updating to a fresh checkout of mozilla-central to use artifact builds.
Reporter
Comment 12•6 years ago
Thank you Nathan, ./mach artifact toolchain --from-build linux64-clang-android-cross worked great in a fresh repo.
Next issue is that the android MOZ_PGO build appears to be looking for fennec instead of the geckoview-androidTest.apk (surprising given Bug 1582221), but perhaps that's because my local builds are unsigned and have different binary names.
If I modify build/pgo/profileserver.py to use the binary that I just built, geckoview-withGeckoBinaries-debug-androidTest.apk, it proceeds further, until I hit Permission denied starting the Firefox Runner.
3:24.63 ['/Users/acreskey/dev/firefox/src/build/obj-release-android/gradle/build/mobile/android/geckoview/outputs/apk/androidTest/withGeckoBinaries/debug/geckoview-withGeckoBinaries-debug-androidTest.apk', 'data:text/html,<script>Quitter.quit()</script>', '-foreground', '-profile', '/tmp/tmpoDXDZH']
3:24.63 Traceback (most recent call last):
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/build/pgo/profileserver.py", line 108, in <module>
3:24.63 runner.start()
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/testing/mozbase/mozrunner/mozrunner/base/browser.py", line 85, in start
3:24.63 BaseRunner.start(self, *args, **kwargs)
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/testing/mozbase/mozrunner/mozrunner/base/runner.py", line 136, in start
3:24.63 reraise(RunnerNotStartedError, "Failed to start the process: %s" % value, tb)
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/testing/mozbase/mozrunner/mozrunner/base/runner.py", line 131, in start
3:24.63 process.run(self.timeout, self.output_timeout)
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 811, in run
3:24.63 self.proc = self.Process([self.cmd] + self.args, **args)
3:24.63 File "/Users/acreskey/dev/firefox/src/mozilla-central/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 123, in __init__
3:24.63 universal_newlines, startupinfo, creationflags)
3:24.63 File "/Users/acreskey/.pyenv/versions/2.7.11/lib/python2.7/subprocess.py", line 710, in __init__
3:24.63 errread, errwrite)
3:24.63 File "/Users/acreskey/.pyenv/versions/2.7.11/lib/python2.7/subprocess.py", line 1335, in _execute_child
3:24.63 raise child_exception
3:24.63 mozrunner.errors.RunnerNotStartedError: Failed to start the process: [Errno 13] Permission denied
3:25.11 make[1]: *** [profiledbuild] Error 1
But I might be going down a rabbit hole in trying to build my own local PGO -Os build.
The problem is that even if I get this running, I won't be able to run high-job-count raptor performance tests with it, as I can on try.
Reporter
Comment 13•6 years ago
I wonder if it would be possible to do a test using this android-7.0 emulator?
https://searchfox.org/mozilla-central/source/testing/config/tooltool-manifests/androidarm_7_0/mach-emulator.manifest
If I change the definition of the 4.3 device here ...
Comment 14•6 years ago
I wouldn't recommend that, based on the experience in bug 1519489 (but I'm not entirely sure).
I've had no end of frustration with Android arm emulators in general: They tend to be very slow and sometimes unreliable. Why not make the switch to x86_64, bug 1548962?
Reporter
Comment 15•6 years ago
Ah, that's good to know.
I did do a hacked test, attempting to use the android 7 device but it didn't start. Not an area that I know very much about, so this could be incorrectly setup.
So maybe using the x86_64 build to generate the PGO data is the best next step.
Personally I have enough experience with the PGO setup and automation to make this change. But maybe we want to prioritize this.
Comment 16•6 years ago
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #15)
Ah, that's good to know.
I did do a hacked test, attempting to use the android 7 device but it didn't start. Not an area that I know very much about, so this could be incorrectly setup.
So maybe using the x86_64 build to generate the PGO data is the best next step.
Personally I have enough experience with the PGO setup and automation to make this change. But maybe we want to prioritize this.
Switching to the x86_64 is not particularly hard (I have some patches ready to go), but is blocked on bug 1545497. :pmoore, has that bug stalled? Any idea what's left to get that finalized so we can stop using the android 4.3 emulator for PGO?
Comment 17•6 years ago
We do now have generic-worker multiuser engine on linux, so we can run tasks securely on a linux host machine, outside of docker.
These tasks would most likely run as non-privileged users on the host - is that also ok?
I'm assuming no containers need to be created in the tasks themselves, but if they do, we should probably consider using something like podman that supports running containers as non-privileged users on the host.
If this meets your requirements, the next steps would be setting up a dedicated generic-worker linux worker pool for these tasks, either in GCP or AWS.
Comment 18•6 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #17)
We do now have generic-worker multiuser engine on linux, so we can run tasks securely on a linux host machine, outside of docker.
These tasks would most likely run as non-privileged users on the host - is that also ok?
I think a non-privileged user is fine, as long as they have access to /dev/kvm. Any idea if that's possible? In docker, /dev/kvm access is only provided if run with --privileged. Outside of docker, I think it should work as long as the user that the task is running under is in whatever group has file permissions for /dev/kvm. (Eg: On Ubuntu, /dev/kvm is crw-rw---- 1 root kvm, so adding the user to the 'kvm' group lets them use kvm).
I'm assuming no containers need to be created in the tasks themselves, but if they do, we should probably consider using something like podman that supports running containers as non-privileged users on the host.
I don't believe we need to use a container specifically for Android PGO.
If this meets your requirements, the next steps would be setting up a dedicated generic-worker linux worker pool for these tasks, either in GCP or AWS.
Should I file a separate bug blocking bug 1545497? Or does that bug cover it?
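The /dev/kvm requirement discussed above could be checked at the start of a task with something like the following sketch. The function name `can_use_kvm` is hypothetical, and the check only reports whether the current user has read/write access to the device node; it says nothing about which group grants that access on a given host.

```python
import os


def can_use_kvm(device="/dev/kvm"):
    """Return True if the current user can read and write the KVM device."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


print(can_use_kvm())
```

A task that needs the accelerated emulator could fail early with a clear message when this returns False, rather than timing out later.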
Reporter
Comment 19•6 years ago
I was looking at the attempted -Os PGO builds and I noticed something strange:
The geckoview-androidTest.apk binary is exceptionally large -- 152MB.
For the current -Oz builds, I'm seeing a geckoview-androidTest.apk size of 117MB.
For the -O2 build, the geckoview-androidTest.apk is 92MB.
So I think that could explain the INSTALL_FAILED_CONTAINER_ERROR install error.
The difference in apk size comes from libxul.so -- I'm scratching my head as to why the instrumented -Os library is so much larger than the others.
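One way to see which file dominates an apk is to list its largest entries, since an apk is a zip archive. This is a minimal sketch using Python's standard `zipfile` module; the function name `largest_entries` is hypothetical.

```python
import zipfile


def largest_entries(apk_path, count=5):
    """Return (name, uncompressed_size, compressed_size) for the largest entries."""
    with zipfile.ZipFile(apk_path) as apk:
        infos = sorted(apk.infolist(), key=lambda info: info.file_size, reverse=True)
    return [(info.filename, info.file_size, info.compress_size) for info in infos[:count]]
```

Running it on the -Os, -Oz and -O2 variants of geckoview-androidTest.apk and comparing the top entry would show how much of the size difference each library accounts for.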
Comment 20•6 years ago
(In reply to Michael Shal [:mshal] from comment #18)
If this meets your requirements, the next steps would be setting up a dedicated generic-worker linux worker pool for these tasks, either in GCP or AWS.
Should I file a separate bug blocking bug 1545497? Or does that bug cover it?
A separate bug is probably best. Regarding the setup of the linux host, most of this can be reasonably self-serve. I'd recommend you cargo cult the gwci-linux machine image definition, remove lines 40-49 (since docker is not required) and add any steps there you need for toolchains on the host etc, and place it in a new directory (e.g. /worker_types/android-pgo) together with the other files copied from the /worker_types/gwci-linux directory, adapted as needed, and submit a generic-worker PR once you believe you have everything on the host you need. Unfortunately gnome3 desktop is currently required, so you'll need to leave that in for now. At some point, we'll drop the requirement for gnome3 to support headless tasks, but currently all generic-worker linux multiuser tasks run under a gnome 3 graphical desktop environment.
Anyone with permissions to create AMIs in the production EC2 or GCP Compute Engine account(s) can run the worker_type.sh script to generate the machine images, and then, assuming this will run in the firefox-ci cluster, a ci-configuration patch will be required to update worker-images.yml and worker-pools.yml with the machine image ids etc.
Reporter
Comment 21•6 years ago
In terms of using the existing emulator, I wonder if it would be possible to make the profile-generate build smaller so that it could fit on the device? Something that can be stripped out of the library just when doing the profiling run?
(assuming binary size is the problem).
Reporter
Comment 22•6 years ago
Michael was able to reproduce the INSTALL_FAILED_CONTAINER_ERROR locally with the default AVD.
With the disk.dataPartition.size increased from 600M to 1200M in avd/test-1.avd/config.ini he was able to install and run the profiling build on the emulator.
Geoff, how feasible is it to increase the dataPartition on this android 4.3 emulator?
(Even if just for a one-off test.)
Outside of changing the avd definition, I could only find an emulator option to increase the system partition.
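The config.ini change described above can be sketched as a small text rewrite. The helper name `set_avd_config_value` is hypothetical, and the sketch assumes the plain `key=value` line format shown for the avd settings in comment 2; the sample values are the ones quoted in this bug.

```python
def set_avd_config_value(text, key, value):
    """Return config text with `key` set to `value`, appending it if missing."""
    lines = text.splitlines()
    updated = False
    for index, line in enumerate(lines):
        name, sep, _old = line.partition("=")
        if sep and name.strip() == key:
            lines[index] = "{}={}".format(key, value)
            updated = True
    if not updated:
        lines.append("{}={}".format(key, value))
    return "\n".join(lines) + "\n"


# Sample content assembled from values quoted in this bug (not a full config.ini).
sample = "hw.ramSize=1024\ndisk.dataPartition.size=600M\nvm.heapSize=128\n"
print(set_avd_config_value(sample, "disk.dataPartition.size", "1200M"))
```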
Comment 23•6 years ago
There is an emulator command line option, -partition-size:
https://developer.android.com/studio/run/emulator-commandline
I think you could simply add '-partition-size', '1200' to the emulator arguments at
If that's troublesome, we could update config.ini in an updated avd in tooltool.
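As a sketch of the first suggestion, the option could be appended to an argument list like this. The helper name `with_partition_size` is hypothetical, and the base argument list is illustrative only: the AVD name `test-1` is inferred from the avd/test-1.avd path mentioned in comment 22, and the real argument list used in automation is not shown in this bug.

```python
def with_partition_size(args, size_mb):
    """Return a copy of args with -partition-size set to size_mb (in MB)."""
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this was the old value, drop it
            continue
        if arg == "-partition-size":
            skip_next = True  # drop the existing flag and its value
            continue
        result.append(arg)
    return result + ["-partition-size", str(size_mb)]


base_args = ["emulator", "-avd", "test-1"]  # illustrative, not the automation's list
print(with_partition_size(base_args, 1200))
```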
Reporter
Comment 24•6 years ago
I did try a larger -partition-size yesterday (2 gigs).
It looks like it's sticking, judging from the emulator logs:
https://firefoxci.taskcluster-artifacts.net/SdzQ1HmzS2iPY3Kb5Gkfmw/5/public/build/blobber_upload_dir/emulator-FWzJXJ.log
disk.systemPartition.size = 2g
(Otherwise I see disk.systemPartition.size = 221m)
However, I believe that this is just for the system partition, not the user data partition where the apk would be installed.
That one I still see at 600m:
disk.dataPartition.size = 600m
Having an updated avd config.ini would be great from my perspective; it looks like that would unblock us.
I wouldn't know how to do that though.
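To confirm which partition sizes actually took effect, the `key = value` lines can be pulled out of an emulator log. A minimal sketch, assuming the log prints settings in the form quoted above; the function name `read_emulator_settings` is hypothetical, and the sample text is assembled from lines quoted in this bug rather than copied from a single log.

```python
import re

# Matches lines such as "disk.dataPartition.size = 600m".
_SETTING_RE = re.compile(r"^\s*([A-Za-z][\w.]*)\s*=\s*(\S+)\s*$")


def read_emulator_settings(log_text, prefix="disk."):
    """Collect 'key = value' settings from log text whose key starts with prefix."""
    settings = {}
    for line in log_text.splitlines():
        match = _SETTING_RE.match(line)
        if match and match.group(1).startswith(prefix):
            settings[match.group(1)] = match.group(2)
    return settings


sample_log = (
    "disk.systemPartition.size = 2g\n"
    "disk.dataPartition.size = 600m\n"
    "hw.ramSize = 2048\n"
)
print(read_emulator_settings(sample_log))
```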
Comment 25•6 years ago
aerickson - Can you help? The relevant avd is
We need to update only the config.ini in that avd with disk.dataPartition.size = 1200.
Assignee
Comment 26•6 years ago
Yeah, absolutely.
Assignee
Comment 27•6 years ago
I've packaged the new AVD but I'm not able to upload until I get a new taskcluster token for tooltool (tooltool tokens are now disabled).
Assignee
Comment 28•6 years ago
Reporter
Comment 29•6 years ago
I'm confirming that the attached patch allows us to build android at -Os in automation.
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=280129731&revision=eb5d285e277a8575a7900ac92865631163c4c358
Thank you Andrew and everyone who's been helping to move this along.
Comment 30•6 years ago
Comment 31•6 years ago
bugherder