Testing Android (and B2G) in a VM

RESOLVED FIXED

5 years ago
4 years ago

People

(Reporter: gal, Unassigned)

Attachments

(15 attachments, 10 obsolete attachments)

6.64 KB, patch
armenzg
: feedback+
Details | Diff | Splinter Review
3.61 KB, patch
armenzg
: feedback+
Details | Diff | Splinter Review
12.26 KB, patch
armenzg
: review+
Details | Diff | Splinter Review
2.27 KB, patch
Callek
: review+
Details | Diff | Splinter Review
191 bytes, patch
Callek
: review+
Details | Diff | Splinter Review
35.24 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
1.89 KB, patch
aki
: review+
Details | Diff | Splinter Review
9.07 KB, patch
Details | Diff | Splinter Review
11.26 KB, patch
aki
: review+
Details | Diff | Splinter Review
529 bytes, patch
Details | Diff | Splinter Review
6.24 KB, patch
aki
: review+
Details | Diff | Splinter Review
17.26 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
4.94 KB, patch
aki
: review+
Details | Diff | Splinter Review
1.35 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
2.22 KB, patch
gbrown
: review+
Details | Diff | Splinter Review
(Reporter)

Description

5 years ago
We are currently testing Android on real hardware, which is expensive to operate, scales poorly, and has reliability issues. Testing in a VM will never be as precise as testing on real hardware, but it can be a big part of the test mix. We should investigate setting up testing in a VM on EC2 (if EC2 ends up not working, maybe in a physical cloud).

Plan: (variations to the order here might make sense)

1. Get Fennec to work in the Android emulator JB/x86 on a physical x86 host. Run some mochitests etc.
2. Get Fennec to work in the JB/ARM7 emulator (ARM7 emulation might be incomplete and might need work).
3. Get Fennec to work in JB/x86 on EC2 (Intel VT issues due to double virtualization).
4. Get Fennec to work in JB/ARM7 on EC2 (probably not that hard after 2).
5. Figure out whether we can use OpenGL ES 2.0 in the emulator via mesa for layers and WebGL.
6. Get older versions of Android to work, ideally as far back as 2.3.
7. Start getting the existing tests to work, see whether we can replicate the existing testing on a physical machine, and ultimately in EC2.
8. Figure out what to do with performance tests. Check whether running Talos with mesa on a physical machine gives results that somehow compare with tegra, and then see if running that in EC2 gives any meaningful/reliable results at all.

The ideal outcome is that we can virtualize most if not all Android testing. We should revisit this plan frequently. Depending on individual results we will probably have to update the plan, and maybe also update some of the goals, since some things might be easier than we thought, harder than we thought, or even impossible.
(Reporter)

Comment 1

5 years ago
mwu, does the above approach make sense to you? Any feedback?

Comment 2

5 years ago
Judging by what we're currently doing for FFOS testing, we should be able to do JB/ARM7 with GLES2 on EC2. AIUI, on FFOS emulators, the machine is configured with a xserver with a dummy video output and glx/dri enabled. The xserver/mesa falls back on software rendering and provides enough opengl for the emulator to do what it needs. I think the one drawback of EC2 is that we can't get kvm working on that, but it seems like ARM emu is fast enough for B2G so we wouldn't need to add x86.

There's one drawback to qemu's ARM emulation though - it doesn't catch unaligned accesses. That might be a reason to keep some spidermonkey testing on real hardware.

I have no idea if we can get things working well on Gingerbread. GLES2 support is probably difficult. Getting ICS to work might be reasonable, though we'd probably have to adapt FFOS's manifest to pick up the necessary fixes.

So, my suggestions would be:

1. Drop x86 testing unless it's shown that ARM testing isn't fast enough.
2. Do #5 alongside #2 / #4 - we already know how to configure an xserver on EC2 to provide opengl to the emulator.
3. Build our own emulator - we are catching emulator bugs with FFOS testing and these fixes can be used for good on the Android side too. One of our fixes allows SkiaGL to work - bug 905141.
(Reporter)

Comment 3

5 years ago
Thanks mwu. I agree with all of the above. I also agree that we can probably get by without x86 if we can't get it to work, but I still would like to at least do a thorough investigation of it and measure the speedup. Even if ARM is fast enough, if x86 is significantly faster, once we operate this at scale (hundreds of machine hours per checkin), it might make a big financial difference if we save 50% machine cycles, for example.
First impressions: the stock android emulators "work" and a fresh fennec x86 build "works" on some of the atom configs.

But it doesn't work very well: the emulator itself is flaky (depending on configuration, many variants refuse to boot) and the one fennec I've got working on an atom config doesn't seem to want to repaint the content area (though site previews in the tab-switch menu show it is loading content).

I'll start fiddling with newer qemus and/or poking at the test automation bits (sut-agent and marionette).
Further notes:

I dug into the automation far enough to find mozharness and SUT, read the logs and buildbot configs, installed SUT agent on the target and started poking around trying to make mozharness run; while reading, came across this config:

https://hg.mozilla.org/build/mozharness/file/tip/configs/android/androidx86.py
(adjacent to the panda config: https://hg.mozilla.org/build/mozharness/file/tip/configs/android/android_panda_releng.py)

Looking at the commits on that file it seems Armen is already covering this work in bug 895186 and its tracking bug 891959. I'm not sure where I should go with that. Try to help out with those bugs? Close this as a dupe? Work in parallel more on the matters of automating this on AWS? I had my AWS credentials from Rust revoked so AWS-work is not going to be terribly easy from here; and I'm not in releng so I have no access to things like the host utils, AVDs they're using and such.

Until advised otherwise I suppose I'll continue trying to convince mozharness to run that config on my workstation with local AVDs and tools, and if I get that going, continue looking into the matter of the GL pipe (I found the code for that in AOSP yesterday, looks like it ought to work).

Alternatively maybe I should coordinate with Armen to see about cloning this config into an arm emulator config, and trying to work side-by-side on that?
(Reporter)

Comment 6

5 years ago
Let's catch up with Armen on the x86 effort. That might take care of (1) completely. Sounds like we are unclear on AWS still. I was told the double virtualization is a problem. How can I get credentials to you? Or is it easier if I give you my corp card #?

As for 2 and 4 and beyond, that's still all open, right?
I'll chat with Armen then, see what he thinks about splitting efforts there and/or directing my attention to ARM7 configs. For AWS in the short term / experiment sake, I can switch over to using my personal AWS account; if we wind up setting anything more serious up I should figure out how releng normally charges these things.
(Reporter)

Comment 8

5 years ago
Bob, how can we get an AWS account switched over to your corp card bill?

Comment 9

5 years ago
The x86 work that I'm wrapping up runs 4 test jobs (one on each emulator) on 4 emulators on talos-linux64-ix-* machines (in-house).
https://tbpl.mozilla.org/?tree=Cedar&jobname=Android x86&showall=1

As I mentioned to graydon, my androidx86 scripts and configs are not that x86 specific.

The android_panda.py script is to run tests on pandas.

There were technical reasons why gbrown and the a-team determined that EC2 instances were no good. I believe they were graphics related, but I can't recall the details.

The AVDs were provided by gbrown. Documentation is provided in bug 894507. IIUC nothing that the scripts reach needs to be kept private in releng internal systems.

We can loan AWS Ubuntu 64-bit machines if needed.
(Reporter)

Comment 10

5 years ago
If you could loan out an AWS instance that would be great. That would be a quick way to unblock here.

For gfx we want to try mesa (software OpenGL) and see whether our backends (WebGL, skiaGL and layers/OGL) can render against that. One of the things on the agenda here.
Followed up with armen and gbrown, figured out plan of what I'll be doing (following armen's ash-mozharness work with parallel arm7 variant); muddled around and got local version of ash-mozharness starting up x86 emulators, uploading artifacts into them. Stopped at trying to figure out where to get host tools (not able to download from here) and how to build my own AVDs for arm7. Will continue to follow up with gbrown and dminor.

Comment 12

5 years ago
It sounds like you're following up with all the right people on this, Graydon. Let us know how else we can help.

I'd love to find a way to get android 2.3 emulators to work in the cloud. That could save us quite a bit of trouble because when we discontinue the tegras next year we have to get android 2.3 testing running somewhere to replace them. Right now, if we have to continue doing hardware, then we'd be using pandaboards if we can't get these emulators working. 

I'd second Mwu that I'd rather use ARM emulators everywhere since it's closer to what we're shipping. But, I am also curious to know how much faster x86 actually is versus ARM.  If there is a significant difference there, we could do some kind of interleaved run (x86 per push, ARM every 5 pushes or some other such scheme).
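To make the interleaving idea concrete, here is a minimal sketch of such a scheme. The function name and the `arm_every` parameter are hypothetical and not part of any existing scheduler; the interval of 5 is just the figure floated above.

```python
def platforms_for_push(push_index, arm_every=5):
    """Return the emulator platforms to test for a given push (0-based index).

    Hypothetical scheme: x86 runs on every push, ARM only on every
    `arm_every`-th push.
    """
    platforms = ["x86"]
    if push_index % arm_every == 0:
        platforms.append("arm")
    return platforms


if __name__ == "__main__":
    for i in range(6):
        print(i, platforms_for_push(i))
```

A regression that only shows up on ARM would then be narrowed to a window of at most `arm_every` pushes, which is the trade-off against the cheaper per-push x86 runs.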

I'm pretty skeptical about performance testing on emulators running in VMs. That said, we have the ability with the new datazilla talos visualizations to compare the relative noise from these measurements to the hardware talos measurements. So, if you get the emulators running, I can help with the noise analysis and we can see whether the emulator option is viable for perf testing.

Updated

5 years ago
Depends on: 915177

Comment 13

5 years ago
(In reply to Andreas Gal :gal from comment #10)
> If you could loan out an AWS instance that would be great. That would be a
> quick way to unblock here.
> 

Filed bug 915177 for this purpose.
(In reply to Clint Talbert ( :ctalbert ) from comment #12)

The one item I'll point out here is that using rack mount machines with emulators or AWS may not actually be less expensive than using the pandaboards.  We already have sunk cost into roughly 900 panda boards of which we're only currently using about half (for Android 4.0). It also depends on how many emulators we could run on one machine at a time.

Would the intention be to still keep the pandas and port 2.3 to them as planned, and (if we also get emulation working) supplement that capacity with rackmount/aws emulation? Or would the plan be to replace the pandas entirely and dispose of that hardware?  We'd be able to reuse the servers that currently act as the foopies/imaging servers for the pandas, but the chassis and pandaboards would be unrecoverable costs (we could distribute pandas to developers for personal use, but that may not be useful if we're not testing on them).

Is there an expectation that we'll have a reasonably good idea about future plans (whatever they may be, AWS, hardware purchase, or using pandas) by December when the budgeting forecast for FY2014 is due?
(Reporter)

Comment 15

5 years ago
This work aims at providing the right answers so we can make a decision on the future of our testing approach for mobile. I genuinely don't know many of the answers here, so until Graydon gets them for us, I think it's premature to plan much. Keep doing what you are doing until we have a new plan, based on new data.

That said, my gut feeling is that cloud based testing has massive advantages, and that the sunk cost for the 900 tegras is a fallacy. Those 900 tegras are located in a data center in Santa Clara, which is probably the most expensive real estate on the planet, far ahead of Shanghai and Manhattan, and we pay for it per month. On top of that we need several humans to keep those 900 boards running since the boards weren't designed for this kind of abuse, and those humans we pay per month as well. In most businesses real estate, power, and human capital far outpace equipment investment.

Add to that the fact that those boards don't scale. We have 900. When we have a work week, hundreds of developers congregate and we get cycle delays on the order of hours to days. On the other hand, on Christmas Eve we could probably get by with far fewer machines sucking power and killing trees.

I would like to see us move everything onto cloud infrastructure that we possibly can move there. I am confident we can move all our mobile unit/integration testing onto VMs, to the point where we need hardware testing only at small scale, in our offices, run by QA with some mild automation or by hand. I am also somewhat hopeful that we can do most of our performance testing on VMs and still catch major regressions per check-in. Like Clint, I am doubtful that we can completely eliminate performance testing on real hardware, so again, there will be a manual/physical component here, but hopefully that can be small scale and low frequency (once a day?). I might be wrong. Give Graydon a few weeks and we will know.
:gal: That sounds great, and I agree that part of the cost is definitely human time and that having the ability to handle burst capacity (and not have to worry about providing physical boards) would be fantastic. :graydon: jwatkins is our go to guy for the android 4.0 stuff on the pandas, so if you'd like any help from us/want to brainstorm, please drop him a line.

Comment 17

5 years ago
graydon: Latest Androidx86 code has landed in the official mozharness repo. Please ignore ash-mozharness from here on.
https://hg.mozilla.org/build/mozharness/rev/a5bb3243d9f0

Comment 18

5 years ago
Clarification:
- # of Tegras: 400+
- # of Pandas: 850+

Jobs running on Tegras: Android 2.2 armv7, noion & armv6
Jobs running on Pandas: Android 4.0
Understood, will track mozharness repo directly.
I have access to an AWS machine for testing now. Need sudo permission on it to install requisite tools.

Comment 21

5 years ago
(In reply to Graydon Hoare :graydon from comment #20)
> I have access to an AWS machine for testing now. Need sudo permission on it
> to install requisite tools.

Working this out through email.
Update: today was mostly fighting timeouts and trying to convince the AWS machine to run. The AVDs have to have the hardware-GPU setting disabled in order to start up, and I still can't get mozharness to successfully get a mochitest running without timing out somewhere in the process. Everything is too slow. Will keep investigating places where timeouts and retry-cycles are hardcoded into the scripts.
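For illustration only, this is roughly what a wait loop looks like once the timeout is a parameter instead of a hardcoded constant. It is a generic sketch, not mozharness code; `wait_until` and its arguments are hypothetical names.

```python
import time


def wait_until(probe, timeout_s, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True or `timeout_s` seconds elapse.

    `clock` and `sleep` are injectable so that a slow emulator host can be
    given a longer deadline (and so the loop can be tested without waiting).
    Returns True on success, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_boot_check():
        # Stand-in for a real readiness check; succeeds on the third poll.
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(fake_boot_check, timeout_s=5, interval_s=0.01))
```

A slow virtualized host could then pass a larger `timeout_s` from its config rather than needing the script edited.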

Meanwhile, took a moment to do a little side-by-side perf testing to get ballpark numbers about how bad all this is going to be and what emulation penalties we're suffering (using fhourstone, totally random looking integer benchmark). The results aren't great, though it looks like possibly we're not actually managing to pick up kvm in 32bit mode this way (this is qemu-user, I'll poke at the in-emulator version tomorrow):

my 2-year-old i7 workstation:
(Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz):

    x64 code:
          native: 11680.4 Kpos/sec
      qemu/kvm64:  3430.4 Kpos/sec

    i386 code:
          native: 11891.9 Kpos/sec
      qemu/kvm32:   775.2 Kpos/sec

    arm7 code:
        qemu/arm:   687.8 Kpos/sec


AWS machine:
(Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz):

    x64 code:
          native:  5917.7 Kpos/sec
      qemu/kvm64:  1387.8 Kpos/sec

    i386 code:
          native:  3588.6 Kpos/sec
      qemu/kvm32:   360.3 Kpos/sec

    arm7 code:
        qemu/arm:   326.9 Kpos/sec
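The emulation penalty is easier to see as a ratio. A small sketch that derives native-to-qemu slowdown factors from the Kpos/sec figures above (the numbers are copied from this comment; the dictionary layout and function names are just for illustration):

```python
# Kpos/sec figures copied from the fhourstones runs in this comment.
RESULTS = {
    "workstation": {
        "x64": {"native": 11680.4, "qemu": 3430.4},
        "i386": {"native": 11891.9, "qemu": 775.2},
    },
    "aws": {
        "x64": {"native": 5917.7, "qemu": 1387.8},
        "i386": {"native": 3588.6, "qemu": 360.3},
    },
}


def slowdown(native, emulated):
    """How many times slower the emulated run is than the native one."""
    return native / emulated


def slowdown_table(results=RESULTS):
    """Map host -> arch -> slowdown factor under qemu."""
    return {
        host: {arch: slowdown(r["native"], r["qemu"]) for arch, r in archs.items()}
        for host, archs in results.items()
    }


if __name__ == "__main__":
    for host, archs in slowdown_table().items():
        for arch, ratio in archs.items():
            print(f"{host:12s} {arch:5s} {ratio:5.1f}x slower under qemu")
```

The arm7 rows have no native counterpart on these hosts, so they are left out of the ratio table.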
The instance you've got is an m1.medium instance. (see http://aws.amazon.com/ec2/instance-types/ and http://aws.amazon.com/ec2/pricing/).

If you'd like, we can change to a faster instance type. It requires the VM to be shut down, but otherwise is pretty painless.
Possibly-good news? Further fiddling with the provided full-system qemus in the SDK, rather than qemu-user, shows it does in fact pick up kvm for 32bit mode, and we get 4978.1 Kpos/sec on my workstation doing 32bit x86 emulation.

Sadly, arm appears to run worse (presumably the qemu I'm using is newer; of course no kvm), 305.6 Kpos/sec inside the system emulator on my desktop.

Full-system emulator speeds within the AWS machine are ... quite bad. Emulator takes ~5 minutes to boot, gets 130 Kpos/sec on x86 and arm alike. This part confuses me a bit. It looks a little like qemu-x86_64 is getting access to kvm, where i386 is not? But this makes no sense: the AWS machine is paravirtualized, no? Or is it hvm? It has no 'vmx' bit in /proc/cpuinfo, only 'hypervisor'.

In any case, yes, it'd probably be best if we could bring up another AWS machine at some point, but I'm not sure exactly which kind yet; I have a private amazon account and I'll run a few experiments to see which if any configs give us access to vmx and/or a hardware gpu. I'll post back here when I have any insight on that.
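The 'vmx' / 'hypervisor' check against /proc/cpuinfo can be scripted when comparing instance types. A minimal sketch, assuming the usual Linux x86 `flags : ...` line format; the function names are hypothetical, and 'svm' is included as the AMD counterpart of Intel's 'vmx':

```python
def cpu_flags(cpuinfo_text):
    """Collect the union of CPU flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def virtualization_summary(cpuinfo_text):
    """Report whether hardware virtualization is exposed and whether we are a guest."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "hw_virt": bool(flags & {"vmx", "svm"}),  # needed for kvm acceleration
        "is_guest": "hypervisor" in flags,        # set when running under a hypervisor
    }


if __name__ == "__main__":
    # Made-up sample resembling what is described above: 'hypervisor' but no 'vmx'.
    sample = "processor\t: 0\nflags\t\t: fpu vme sse2 hypervisor\n"
    print(virtualization_summary(sample))
```

On a real machine the text would come from reading `/proc/cpuinfo`; a result with `is_guest` true and `hw_virt` false matches the situation described for the AWS instance.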
Fiddled configs long enough to get a successful mochitest run on the AWS machine; it took about 27 minutes for the mochitest-1 group, on emulated x86. While that was going, checked on 'hvm' and faster AWS machines: they sadly do not provide nesting / vmx bits; however, the faster AWS machines are an improvement. On an m3.xlarge:

  x64 native: 9717.6 Kpos/sec
  x64 qemu-user: 3035.1 Kpos/sec
  x86 native: 5732.9 Kpos/sec
  x86 qemu-user: 905.2 Kpos/sec

I'll shut that m3.xl machine down for now and focus on building suitable test AVDs for the arm emulator.
Test AVDs for arm are here: https://people.mozilla.org/~graydon/test-arm.tar.gz

Preliminary mozharness run (more or less "Armen's config with 'x86' switched to 'arm'") seems to at least start up emulators, download and install fennec and gets as far as timing out after the SUT redirect attempts. Will experiment more next week. It's slow, but might work.

One other thing I notice in passing: while AWS runs on xen and therefore apparently doesn't want to support nested vmx bits (so the x86 emulator will always run at about a 10x speed penalty due to the hypervisor) the GCE service runs on KVM, which _does_ support nested vmx (at least on paper). I may poke around at that to see how real it is, if that's ok. The 10x penalty seems a little unfortunate.
That URL is no longer valid, moved to https://people.mozilla.org/~graydon/test-arm-2013-09-13.tar.gz but it's obsolete anyways. Fixed a couple minor steps-done-wrong on Friday (and compensated to some limited extent for an ambiguity in the instructions, see bug 894507) to produce https://people.mozilla.org/~graydon/test-arm-2013-09-16.tar.gz which successfully completes a single mochitest-1 run. Though not terribly reliably (system GUI crashes from time to time) and the wall clock time is a full 30 minutes just for that batch.

A number of loose ends to tie up, and much more performance investigating, but it does (in a very preliminary sense) seem to work, doing the ARM + AWS combination.

Note: the GL pipe is turned off entirely, though, and I've yet to see any configuration in which firefox actually displays stuff in the content area during testing. Somehow that's just not flushing to the screen (it's loading -- thumbnail previews are visible in the tab-switcher).
Took a little digression to visit GCE. Bad news is that GCE, while running on KVM, doesn't seem to be using KVM in nested mode. No vmx bit in the guest, installing qemu-kvm gives:

[FAIL] Your system does not have the CPU extensions required to use KVM. Not doing anything. ... failed!

and so forth. Giving a standard instance a fhourstones run gets:

Intel(R) Xeon(R) CPU E5-2689 0 @ 2.60GHz
     x64 native: 9674.5 Kpos/sec
  x64 qemu-user: 3290.7 Kpos/sec

So more or less equivalent to the measurements I took on the AWS m3.xlarge (Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz); the m3.xl is a 4 core config at $0.50/hr and a GCE 'standard' 4 core config is $0.53/hr. So no real advantage here.
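The "no real advantage" conclusion can be checked with a rough throughput-per-dollar figure. This sketch uses only the native x64 fhourstones numbers and hourly prices quoted in this thread; it ignores core count and everything else, so treat it as a ballpark ratio only.

```python
# Native x64 Kpos/sec and hourly prices as quoted in this thread.
INSTANCES = {
    "aws m3.xlarge": {"kpos_per_sec": 9717.6, "usd_per_hour": 0.50},
    "gce standard 4-core": {"kpos_per_sec": 9674.5, "usd_per_hour": 0.53},
}


def kpos_per_sec_per_dollar_hour(entry):
    """Benchmark throughput divided by hourly price (higher is better)."""
    return entry["kpos_per_sec"] / entry["usd_per_hour"]


if __name__ == "__main__":
    for name, entry in INSTANCES.items():
        print(f"{name:22s} {kpos_per_sec_per_dollar_hour(entry):10.1f}")
```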
Have been spending today trying to see which configs of the emulator work with older android API levels. The x86 emulator works for API level 10 (android 2.3, gingerbread), but the arm emulator does not. Perhaps I'm just doing something obviously-wrong but I figured I'd post in here because so far no combinations I try work, they all fail like so:

E/AndroidRuntime(  343): FATAL EXCEPTION: main
E/AndroidRuntime(  343): java.lang.UnsatisfiedLinkError: Couldn't load mozglue: findLibrary returned null
E/AndroidRuntime(  343):        at java.lang.Runtime.loadLibrary(Runtime.java:429)
E/AndroidRuntime(  343):        at java.lang.System.loadLibrary(System.java:554)
E/AndroidRuntime(  343):        at org.mozilla.gecko.mozglue.GeckoLoader.loadMozGlue(GeckoLoader.java:246)
E/AndroidRuntime(  343):        at org.mozilla.gecko.GeckoApplication.onCreate(GeckoApplication.java:101)
E/AndroidRuntime(  343):        at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:969)
E/AndroidRuntime(  343):        at android.app.ActivityThread.handleBindApplication(ActivityThread.java:3272)
E/AndroidRuntime(  343):        at android.app.ActivityThread.access$2200(ActivityThread.java:117)
E/AndroidRuntime(  343):        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:969)
E/AndroidRuntime(  343):        at android.os.Handler.dispatchMessage(Handler.java:99)
E/AndroidRuntime(  343):        at android.os.Looper.loop(Looper.java:123)
E/AndroidRuntime(  343):        at android.app.ActivityThread.main(ActivityThread.java:3683)
E/AndroidRuntime(  343):        at java.lang.reflect.Method.invokeNative(Native Method)
E/AndroidRuntime(  343):        at java.lang.reflect.Method.invoke(Method.java:507)
E/AndroidRuntime(  343):        at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
E/AndroidRuntime(  343):        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
E/AndroidRuntime(  343):        at dalvik.system.NativeStart.main(Native Method)
W/ActivityManager(   61):   Force finishing activity org.mozilla.fennec_graydon/.App
W/ActivityManager(   61): Activity pause timeout for HistoryRecord{4077dad8 org.mozilla.fennec_graydon/.App}

despite there clearly being a lib/armeabi-v7a/libmozglue.so in the .apk. The apk runs fine on a hardware 2.3 device (I'm sufficiently behind the curve to _have_ one of those) so I'm a little perplexed. Thinking perhaps the difference has to do with ABI: the emulators for API-level 10 refer to themselves as ABI armeabi, not armeabi-v7a (as the API-level 16 ones do). Perhaps the difference is confusing the linker? Will look further.
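One way to check the ABI theory is to list which `lib/<abi>/` directories an apk actually carries and compare them with the ABI the emulator reports. A minimal sketch using only the standard library; the function names and the demo apk are hypothetical, and the real installer's selection logic is more involved than this simple membership test.

```python
import io
import zipfile


def native_lib_abis(apk_file):
    """Map each ABI directory under lib/ in an apk to its .so file names."""
    abis = {}
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            parts = name.split("/")
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                abis.setdefault(parts[1], []).append(parts[2])
    return abis


def has_libs_for(apk_file, device_abi):
    """True if the apk ships native libraries under lib/<device_abi>/."""
    return device_abi in native_lib_abis(apk_file)


def make_demo_apk():
    """Build a tiny in-memory zip that mimics the layout described above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("lib/armeabi-v7a/libmozglue.so", b"")
        z.writestr("classes.dex", b"")
    return buf.getvalue()


if __name__ == "__main__":
    data = make_demo_apk()
    print(native_lib_abis(io.BytesIO(data)))
    print(has_libs_for(io.BytesIO(data), "armeabi"))
```

With an apk that only has `lib/armeabi-v7a/`, a device reporting plain `armeabi` finds nothing under its own ABI directory, which would be consistent with the "findLibrary returned null" failure.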

Comment 30

5 years ago
It sounds like it's not extracting libmozglue.so. Can you try the armv6 builds? Maybe the emulator isn't configured for armv7.
Definitely doesn't like armv6. Worse even! Gives a SIGILL: 

D/dalvikvm(  480): Trying to load lib /data/data/org.mozilla.fennec_graydon/lib/libmozglue.so 0x40515c78
I/DEBUG   (   31): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (   31): Build fingerprint: 'generic/sdk/generic:2.3.3/GRI34/101070:eng/test-keys'
I/DEBUG   (   31): pid: 480, tid: 480  >>> org.mozilla.fennec_graydon <<<
I/DEBUG   (   31): signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 80435c8c
I/DEBUG   (   31):  r0 00000000  r1 80464918  r2 80464428  r3 80464428
I/DEBUG   (   31):  r4 80464918  r5 be99f358  r6 804648f8  r7 be99f354
I/DEBUG   (   31):  r8 804648f8  r9 0000ce48  10 43b01040  fp 41d48bb8
I/DEBUG   (   31):  ip 80463d6c  sp be99f328  lr 80435c88  pc 80435c8c  cpsr 60000010
I/DEBUG   (   31):          #00  pc 00035c8c  /data/data/org.mozilla.fennec_graydon/lib/libmozglue.so
I/DEBUG   (   31):          #01  lr 80435c88  /data/data/org.mozilla.fennec_graydon/lib/libmozglue.so

I'm going to try repacking the armv7 binary with the library dir renamed to armeabi, and maybe fiddle with arch settings in the build environment if that doesn't work. Will post back if I get anything working.
I get a SIGILL when blindly repacking armeabi-v7a as armeabi, so it seems like we're distinguishing ABI intentionally here. Looks like perhaps it's because we're using thumb2 instructions in the armeabi-v7a builds (see bug 616020) so I'll try a non-thumb2 build.
(Note that the armv6 builds don't use thumb at all, FWIW.)
Yeah. I've now done a reasonably thorough set of different builds, every config knob I can see to turn on the arm builds, they all SIGILL on that emulator. I'm going to look more closely at the faulting code now.
Faulting instruction is the uxtb at 0x35c8c:

00035c80 <std::locale::operator!=(std::locale const&) const>:
   35c80:       e92d4008        push    {r3, lr}
   35c84:       ebffffcd        bl      35bc0 <std::locale::operator==(std::locale const&) const>
   35c88:       e2200001        eor     r0, r0, #1
   35c8c:       e6ef0070        uxtb    r0, r0
   35c90:       e8bd8008        pop     {r3, pc}

Emulated machine is:

<4>CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00093177
<4>CPU: VIVT data cache, VIVT instruction cache
<4>Machine: Goldfish

ARM manuals say that 'uxtb' was introduced with ARMv6, so yeah, it won't work on this machine. I'll try producing a compatible build, see if we can work on ARMv5TEJ.
Yeah, that does it. Building with:

  ac_add_options --with-arch=armv5te

works on the gingerbread ARM images (both on workstation and on the AWS host). Looks like we can go as far back as we like. This is just testing manually with a hand-built apk; will we be publishing builds for armv6 / armv5te in our build infrastructure? Which android revs should I be making test AVD image bundles for? JB, ICS and gingerbread? Froyo?

Still not getting any visual output in the content area. Will start looking into that now.
We build the armv6 builds, but I don't think there are any plans to target anything lower. Do they not have an emulator that supports armv6 or better?
The gingerbread images AOSP ships in their SDK are armv5te. I can try putting together gingerbread better-than-v6 emulators (possibly v7-a? I know it did target such devices; I have one) but that'll involve a full android build from AOSP upstream. Somewhat involved exercise. I'd like to be sure we want it before I set out to try; I'll need to repartition my workstation to make room for such adventures and/or buy a new HDD. Don't have the 30gb-per-build spare kicking around :)
(Reporter)

Comment 39

5 years ago
Buy a new HDD and expense it. Let me know if you need my CC number :)
We're only shipping armv6 and armv7 Android builds, with the latter being what most of our users use, so I think being able to test those builds is vital to making this a useful effort.

Interesting side note, we don't currently have real armv6 test coverage, since we don't have armv6 hardware to test on. We test the armv6 builds by running them on our armv7 hardware, which isn't a great test, so running them on armv6 emulators would actually be better in some ways than our current test setup. (The flip side of that is that if you only set up armv7 emulators to test both types of builds, it's no worse than what we're doing now.)
(Also, thanks for all your hard work here!)
Cool. I'll aim for armv6-on-froyo (or possibly gingerbread). Turns out the toolchain:source compatibility has bitrotted in the meantime. Finding a froyo-era JDK that runs on today's hosts is .. "challenging", at the moment I've backed off that and am aiming for gingerbread to start. That too is a bit of a muddle: 2013's C++ compilers refuse to compile 2011's C++ (at least parts of it). Woo! I'll try a variety of downgrades and time machines, post back here any info about which if any can produce Actual AVDs.
Slight update: I can get AOSP gingerbread to build using a modern ubuntu if I install {gcc,g++}-4.4{,-multilib} and build with CC=gcc-4.4 CXX=g++-4.4 make. But: that spits out an armv5te build. When I try to fiddle the subarch build settings to start on an armv6 build, it gets a little ways in before halting on a missing makefile for that subarch (they only provide armv5te and armv7-a). Adding one of those, next up is a missing subarch JIT core in dalvik, same deal. I'll see how much fiddling is required to make one of those for armv6; it looks like it's partly machine generated / templatized, might not be too involved. Not sure how many days to put into this, but I'll keep down the path for a while.

(I guess there just aren't enough armv5te devices for us to bother supporting a build?)
Spent a while playing stop-modify-go with the AOSP tree to get an armv6 subarch built, was getting frustrated at the number of pieces inside dalvik and realizing that I _should_ only need to pass different cpu flags into qemu at the outermost layer: an armv5te userspace _should_ work on armv6 hardware. Sadly all the arm11xx (=armv6) subtargets of qemu fail to boot the provided armv5te-subarch goldfish kernel. So now moving on to trying to build an armv6-subarch goldfish kernel alone, to boot into armv5te userspace. Fingers crossed.
Afternoon of struggling with armv6, not much to show for it. Neither the AOSP goldfish kernels, nor a manual rebuild from upstream, nor various other plausible arm code like debian-netinst, raspberry pi or freebsd images want to get very far on any armv6 qemu cpu I can throw them at. It seems like the _only_ qemu cpu willing to run at all is the armv5te variant the AOSP emulator uses, arm926 (despite trying all 26 cpu variants). Which is nonsense since the emulator manages (somehow) to run an armv7-a cpu. I am clearly doing _something_ wrong here but can't figure out what. Will continue digging tomorrow.
Couple more days of investigation, status update:

  - Successfully booted a raspi kernel on modern qemu with -cpu arm1176. So this proves qemu _can_ emulate
    armv6 hardware in at least some configurations. It's not experimental / unsupported, just needs very specific
    flags and emulated hw config (versatilepb, 256mb, sda2, arm1176 or similar)

  - Can't quite get stock qemu to boot goldfish config. Not clear how much the goldfish board config is
    even representable with qemu command line flags. Might need to port goldfish forward?

  - AOSP emulator lacks arm1176, it forked a while back. It has arm1136 which is plausibly good enough
    (the arm1136 in stock qemu will also boot the same raspi image).

  - AOSP emulator hangs trying to start the armv5te kernel with -cpu arm1136. This suggests either the emulator
    is so old that it had broken armv6 support, or else there are arch-specific things in the early boot sequence
    that require a kernel _specifically_ built for armv6.

  - Now pursuing latter possibility, fiddling with kernel configs to try to produce an armv6-specific kernel
    that can boot on the old AOSP emulator. If that fails for long enough, will try either more attempts at
    using stock qemu to boot goldfish kernels, or porting goldfish board definition forward, or else modifying
    AOSP to have a newer qemu (hopefully not!)
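As a record of the first bullet, this is roughly how the raspi boot configuration maps onto a stock qemu command line. The mapping of "versatilepb, 256mb, sda2, arm1176" onto these particular flags is my assumption, and the kernel and disk image paths are placeholders; the function only builds the argument list and does not launch anything.

```python
import shlex


def qemu_armv6_argv(kernel_path, disk_image, cpu="arm1176", memory_mb=256):
    """Build a qemu-system-arm argument list for an ARMv6 versatilepb guest.

    kernel_path and disk_image are placeholders supplied by the caller.
    """
    return [
        "qemu-system-arm",
        "-M", "versatilepb",          # emulated board
        "-cpu", cpu,                  # arm1176 or arm1136 (both ARMv6)
        "-m", str(memory_mb),         # guest RAM in MB
        "-kernel", kernel_path,
        "-hda", disk_image,
        "-append", "root=/dev/sda2",  # root filesystem on second partition
    ]


if __name__ == "__main__":
    argv = qemu_armv6_argv("kernel-qemu", "raspi.img")
    print(" ".join(shlex.quote(a) for a in argv))
```

Swapping `cpu="arm1136"` gives the variant that the AOSP emulator's older qemu also knows about.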
Update: taught myself enough Kconfig and defconfig twiddling and looked far enough into the failure of the first fault I was seeing on a 1136 cpu (failed write to an "unsupported" CP15 register) that I got a booting armv6 goldfish kernel now, in a gingerbread AVD, under the stock AOSP emulator's qemu, with its 1136-r2 cpu (which fixes the CP15 bug):

$ emulator -avd test-gingerbread -show-kernel -kernel build/arch/arm/boot/zImage -qemu -cpu arm1136-r2
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0
Initializing cgroup subsys cpu
Linux version 3.4.0-g6dff53c-dirty (graydon@tantrum) (gcc version 4.6.x-google 20120106 (prerelease) (GCC) ) #5 PREEMPT Tue Oct 1 18:27:33 EDT 2013
CPU: ARMv6-compatible processor [4107b362] revision 2 (ARMv6TEJ), cr=00c5387d
CPU: VIPT aliasing data cache, unknown instruction cache
Machine: Goldfish
...

The userspace comes part-way up and the GUI initializes but it then gets somewhat crashy. I'll look into why tomorrow and later. Attaching some wip kconfigs and such here for posterity.
Created attachment 812873 [details] [diff] [review]
patch against the android goldfish kernel tree to support armv6

This is just for safekeeping of the work / observation of what I'm up to. Will take several more rounds to be review-worthy.
Fussing with the userspace more suggests that I am in fact going to need a full armv6 build of the whole thing. I will keep trying to disprove this but in the meantime I'm trying to also build an armv6 userspace.

Since this was so daunting last time (including a fair bit of fiddling inside dalvik) I looked around a little more and discovered that while AOSP itself doesn't ship one, there's a cyanogenmod fork (https://github.com/androidarmv6/android) that claims to. Trying to build the gingerbread branch of this presently. Unfortunately full android builds are enormous and even initializing a workspace for them takes most of a workday of file-transfer time.

Comment 50

5 years ago
What is the target testing arch(s) that we want to move to the cloud?

I ask because we mention armv6 quite a lot in the bug, however, if I understand correctly armv6 is not where we would gain the most value since we officially target armv7 and we have more users there (IIUC we only have a small set of armv6 devices supported for Beta - not sure if officially or just experimentally).

Regardless if armv6 becomes or does not become officially a target for this bug, I just wanted to encourage targeting armv7 in hopes that it is easier than armv6 support.

That said, I'm not a decision maker and might not even have all the facts right.

Thanks and best of luck!
(Reporter)

Comment 51

5 years ago
The tentative target is to move *everything* into the cloud, to the degree it's possible.

Comment 52

5 years ago
ARMv7 has already been largely proven to work by the B2G testing infrastructure. ARMv6 is the tricky part that would let us kill the tegras without losing anything.

BTW, if you get around to trying to make hardware GPU work, you can probably refer to the VMs used for B2G testing. I set them up for that last year. I believe the important parts were in the x config (dummy video driver) and updating mesa & xorg to the very bleeding edge. Then we used something like x11vnc to actually see what's going on. I probably mentioned this already, but might as well get it written down in a bugzilla bug.
(In reply to Michael Wu [:mwu] from comment #52)

> BTW, if you get around to trying to make hardware GPU work, you can probably
> refer to the VMs used for B2G testing. I set them up for that last year. I
> believe the important parts were in the x config (dummy video driver) and
> updating mesa & xorg to the very bleeding edge. Then we used something like
> x11vnc to actually see what's going on. I probably mentioned this already,
> but might as well get it written down in a bugzilla bug.

Indeed, I haven't got to that yet, but while we're on the topic: I remain a little confused about it. Are you suggesting _hardware_ hardware? Or virtual hardware (eg. mesa in software mode) that just happens to be running on the outside of the qemu emulator, but still inside the xen VM on AWS? I could picture the latter being faster than emulated rendering, but I don't _think_ we get real GPU hardware to work with on an AWS node, do we?

Comment 54

5 years ago
I mean virtual hardware, which looks like hardware from the emulated device's POV, but is actually just Mesa doing everything in software. xorg has to be used, rather than xvfb or any other x servers to provide what the emulator expects.

FWIW, Amazon does seem to have some options for real GPU hardware - http://aws.amazon.com/gpu/ . Probably not useful for our case, but it apparently exists.
(In reply to Michael Wu [:mwu] from comment #54)
> I mean virtual hardware, which looks like hardware from the emulated
> device's POV, but is actually just Mesa doing everything in software. xorg
> has to be used, rather than xvfb or any other x servers to provide what the
> emulator expects.

Right, ok, that's what I thought you'd described before. I'll have a look at it later on when I finish the adventure of armv6. Thanks.

> FWIW, Amazon does seem to have some options for real GPU hardware -
> http://aws.amazon.com/gpu/ . Probably not useful for our case, but it
> apparently exists.

Yeah, I saw those but I wasn't sure that was part of the plan.
(In reply to Michael Wu [:mwu] from comment #52)

FYI both the tegras AND the pandas are ARMv7, neither is ARMv6.

Comment 57

5 years ago
I was hoping we could move the Android 2.2 Tegra Armv7 testing to EC2 first. That way we could leave armv6 and noion testing on the tegras (they would have better wait times) while we still investigate armv6 on ec2.

Anyhow, whatever you choose is fine with me.

Comment 58

5 years ago
My understanding is that we are planning to move Android 2.2 testing off of Tegras in the near future. I've already spent some time looking at Android 2.2 testing on the pandaboards and I believe this will come up again next quarter. It would be nice to know if EC2 is a viable option for Android 2.2 armv7 testing as it would save the effort of trying to create stable Android 2.x images for the pandaboards.
As Dan says, we will stop supporting Android 2.2 by the end of 2014, but need to stand up testing support for Android 2.3 before that happens. If we can get 2.3 working on AWS, that obviates the need to move (all or at least most) testing to the pandas.
Update: continuing to struggle with the android build system. It is exceptionally complex, and I am having a very hard time making any headway with it. I am unclear how much time I ought to put in to trying to convince it to build (either patching stock aosp or using 3rd party variants -- neither seems particularly easy).

Will continue for a while but I suspect it's worth asking: if I can't manage to build a gingerbread-armv6 emulator, would "running armv6 builds on the jellybean armv7-a emulator" be of any value, as far as testing? It's a long way from what end users will actually be running (different arch, different OS release) but it's a combination that might serve as a smoketest, at least, and might work with the stock aosp parts. The whole problem I'm pursuing is trying to get a "more realistic" armv6 environment (right arch, OS release users are likely to be on). If neither of those are crucial, and if simply running a smoketest of the armv6 firefox is adequate, I can pause this matter and shift to more productive stuff in the short term.

Comment 61

5 years ago
(In reply to Graydon Hoare :graydon from comment #60)
> Update: continuing to struggle with the android build system. It is
> exceptionally complex, and I am having a very hard time making any headway
> with it. I am unclear how much time I ought to put in to trying to convince
> it to build (either patching stock aosp or using 3rd party variants --
> neither seems particularly easy).
> 
> Will continue for a while but I suspect it's worth asking: if I can't manage
> to build a gingerbread-armv6 emulator, would "running armv6 builds on the
> jellybean armv7-a emulator" be of any value, as far as testing? It's a long
> way from what end users will actually be running (different arch, different
> OS release) but it's a combination that might serve as a smoketest, at
> least, and might work with the stock aosp parts. The whole problem I'm
> pursuing is trying to get a "more realistic" armv6 environment (right arch,
> OS release users are likely to be on). If neither of those are crucial, and
> if simply running a smoketest of the armv6 firefox is adequate, I can pause
> this matter and shift to more productive stuff in the short term.

We currently run armv6 builds on armv7 devices. Would "running armv6 builds on the
 jellybean armv7-a emulator" be equivalent to it for the most part?
I assume that running armv6 on armv7 tegras is as well a long way from what our end users will actually be running, no?

I hope this helps!
(In reply to Graydon Hoare :graydon from comment #60)
> Will continue for a while but I suspect it's worth asking: if I can't manage
> to build a gingerbread-armv6 emulator, would "running armv6 builds on the
> jellybean armv7-a emulator" be of any value, as far as testing? It's a long

Our current state of Android testing consists of the following:
Tegras running 2.2, Pandaboards running 4.0 (both are armv7).

We run our armv7 Firefox builds on both sets of devices. We run our armv6 builds on the Tegras. We do not currently have any testing of the armv6 builds on actual armv6 hardware. Getting that would be a nice improvement, but not worth losing your sanity over.

I guess the sticking point here is getting 2.3 working with anything > armv5? AIUI our current biggest issue is that we want to end-of-life the Tegras, but then we lose test coverage for everything < 4.0. If we can't get a >armv5 emulator running 2.x without an enormous expenditure of time then I don't think this solves that problem for us. Getting Android 4.0 tests running in an emulator, while neat, doesn't really solve a pressing problem for us since we have a giant pile of Pandaboards.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #62)

> I guess the sticking point here is getting 2.3 working with anything >
> armv5?

Yeah. The aosp emulator works in 2 broad flavours:

   2.2 & 2.3 on armv5te
   3.x & 4.x on armv7-a

If we want 2.x on newer-than-armv5te, I need to try to get _some_ sort of custom build working. It might be easier if I try to get 2.x on armv7-a, but I figured if I was going for custom builds I'd try "proper" 2.x on armv6 for testing the armv6 binaries on an honestly-limited target.

I don't think we're quite at "enormous" expenditure of time yet, but I will broaden the scope of builds I'm trying to get going to "anything 2.x with >armv5", given the pressing tegra-EOL problem; that gives a couple other options :)
(Reporter)

Comment 64

5 years ago
I think armv7 should be our top priority to make work. The majority of our users will be on that going forward. Everything else is bonus.
Update: taking a different tack, I've tried today (and will keep trying) to get a gingerbread build running on an armv7-a emulator as suggested. This won't be an _ideal_ gingerbread armv6 test environment but it should at least be easier to get going. So far some luck, but not perfect.

For those interested in reproducing, the steps (sadly not all that well documented) are:

 $ mkdir aosp
 $ cd aosp
 $ repo init -u https://android.googlesource.com/platform/manifest -b android-2.3.7_r1
 $ repo sync
 $ lunch 1
 $ make -j TARGET_ARCH_VARIANT=armv7-a
 $ cd ../
 $ git clone https://android.googlesource.com/kernel/goldfish.git
 $ cd goldfish
 $ git checkout origin/android-goldfish-2.6.29
 $ make goldfish_armv7_defconfig
 $ make -j
 $ cd ../aosp
 $ out/host/linux-x86/bin/emulator -show-kernel -kernel ../goldfish/arch/arm/boot/zImage -qemu -cpu cortex-a8

That will get you most, but not all, of the way into a gingerbread armv7 session. It gets me something like:

Linux version 2.6.29-ge3d684d-dirty (graydon@tantrum) (gcc version 4.6.x-google 20120106 (prerelease) (GCC) ) #1 Tue Oct 15 13:58:00 PDT 2013
CPU: ARMv7 Processor [410fc080] revision 0 (ARMv7), cr=10c0387f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: Goldfish
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 24384
Kernel command line: qemu=1 console=ttyS0 android.checkjni=1 android.qemud=ttyS1 android.ndns=2
...
yaffs_read_super: isCheckpointed 0
init: cannot find '/system/etc/install-recovery.sh', disabling 'flash_recovery'
sh: can't access tty; job control turned off
$ eth0: link up
warning: `rild' uses 32-bit capabilities (legacy support in use)

At which point it iloops, presumably in a panic. The emulator GUI is up and the kernel seems to be _mostly_ booted, but doesn't want to go much further. I'll keep poking at it to see if I can figure out why it's panicking there.
Update: Success! At least under the revised gingerbread-armv7-a plan. Under AOSP docs we have this gem: http://source.android.com/source/known-issues.html#black-gingerbread-emulator

Which suggests (for reference sake) shifting the qemu version forward to R12 to get gingerbread to work at all (why they pin the gingerbread branch to a non-working version of the emulator, I can't tell):

  $ repo forall platform/external/qemu -c git checkout aosp/tools_r12

When I do this and rebuild (with CC=gcc-4.4 CXX=g++-4.4 TARGET_ARCH_VARIANT=armv7-a) I get an emulator that gets through booting into gingerbread on armv7-a, which I can install a nightly fennec armv6 APK on and it seems to work.

I'll write up a complete build script for this next, so others can reproduce (the steps above were close but leave a couple things unwritten re: getting the toolchain and setting env vars). Then see about making AVD tarballs for integration with Armen's automation scripts.
Oh, also worth asking here: you say the tegras are doing 2.2. Shall I go a step further back to 2.2 (froyo) or continue focusing on 2.3 (gingerbread)? Or both?
We're phasing out support for 2.2 by the end of 2014, and we definitely need to stand up 2.3 quite a bit before that happens.  IMHO, getting 2.3 working is more critical than 2.2 unless there's a possibility of switching over the entire production infrastructure for 2.2 before January (which would imply that we didn't have to move the majority of the tegras out of mountain view).  I'm going to guess that's not too likely, though, since there's a lot of verification that has to be done even after we get the base OS up and running.

Comment 69

5 years ago
(In reply to Graydon Hoare :graydon from comment #67)
> Oh, also worth asking here: you say the tegras are doing 2.2. Shall I go a
> step further back to 2.2 (froyo) or continue focusing on 2.3 (gingerbread)?
> Or both?

I'd prefer to keep focusing on 2.3. We're going to phase out 2.2 testing anyway and we have a plan to keep the tegras working until phase out. We need a means to run 2.3, and if we can do that in emulation, that saves us a bunch of headaches with trying to do 2.3 on the panda boards.
A little fussing with details, this appears to mostly-reliably build the correct kernel and emulator for gingerbread-armv7a, assuming you have a lot of bandwidth and disk space.

https://github.com/graydon/dockerfiles-aosp-build/blob/master/gingerbread-armv7-a/build.sh

I'll do AVDs next. Was trying to get a (docker) packaged build env for others to hack on but it seems docker's a little unstable / crashy when applied to large images like this.
Update: I've spent the week working on mozharness-based automation for construction of AVDs, incorporating the contents of the build script above and the steps outlined in bug 894507, (re)writing in python, and simplifying parts such that as much as possible happens from AOSP defaults. The resulting script is parameterized by android release version and target (sub)architecture and adjusts the numerous slightly-different-values (ini files, path names, URLs, env vars) that have to change in concert when switching from one such setting to another.

It's not 100% working but close (builds the AVDs but doesn't customize them with the test-automation apks yet). Should have something to show by mid next week.
Created attachment 823707 [details]
android_emulator_build.py

This is a preliminary (but from my testing here, working) build script that should start from a mostly-stock ubuntu 12.04LTS and build an AOSP userspace, kernel, emulator and a set of testing AVDs customized with the necessary bits of mozilla testing infrastructure described in bug 894507. There are no manual steps.

It's abstracted over target arch (armv5te, armv7 and x86) and target android release (2.2, 2.3, 4.0, etc.), though I've so far only been testing AVDs it builds with armv7-android-2.3.7_r1 (which I'm presently uploading a test set of to http://people.mozilla.org/~graydon/AVDs-armv7a-android-2.3.7_r1-build-2013-10-28-graydon.tar.gz but the outbound connection here says it'll take an hour to push the necessary 200MB, sigh).

The script could probably do with a lot more abstraction (it always uses the latest nightly target binaries, for example, and hard-codes in the device screen size and whatnot) and I suspect the armv5te won't work yet since our testing APKs are armv6 at newest. But it's something to build on. It's mozharness-based and I'd like to submit this to the mozharness/scripts directory. Not sure how review for that sort of thing works.
Seems like copying the emulator executables and the AVDs to a fresh ubuntu 32bit instance in AWS (and doing a very small edit to the .ini files) produces a working system. I'll adjust the script to make that edit. Meanwhile http://people.mozilla.org/~graydon/emulators-armv7a-android-2.3.7_r1-build-2013-10-29-graydon.tar.gz are emulator and adb binaries that run the AVDs above.
Minor update: re-running AVD builds on AWS to check "clean environment" behavior, while testing different fennec APKs on desktop.

The fennec-armv6 APK runs in gingerbread-armv7a but not the fennec-armv7a APK. As before, this is an ABI setting difference, but now in android userspace, not kernel & emulator (those I have corrected). Build props sadly confirms:

$ adb -e shell cat /system/build.prop | grep abi
ro.product.cpu.abi=armeabi

I will work on this presently. Meanwhile uploading the current version of AVD-build script for review. Modified script to set .ini homedir to /home/cltbld as in other scripts, and to clean out redundant .ini files / anything not autogenerated by the emulator.
Created attachment 824879 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils
Attachment #823707 - Attachment is obsolete: true
Attachment #824879 - Flags: review?(armenzg)
Created attachment 824926 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils

Revised to include setting TARGET_CPU_ABI=(x86|armeabi|armeabi-v7a) and TARGET_CPU_ABI2 during AOSP build, which (when set together) are sufficient to run both armv6 and armv7-a fennec APKs:

$ adb -e shell cat /system/build.prop | grep abi
ro.product.cpu.abi=armeabi-v7a
ro.product.cpu.abi2=armeabi
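
The check above can be sketched in Python for use from automation. This is a minimal sketch, not part of the attached script: it parses build.prop text (as returned by the adb command above) and reports which fennec APKs the image should accept. The APK-to-ABI mapping is an assumption inferred from the behaviour described in this bug, and the APK names are just labels.

```python
def parse_abis(build_prop_text):
    """Return the ABIs listed in build.prop, primary ABI first."""
    props = {}
    for line in build_prop_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    abis = []
    for key in ("ro.product.cpu.abi", "ro.product.cpu.abi2"):
        if props.get(key):
            abis.append(props[key])
    return abis


# Assumed mapping from APK flavour to the ABI it needs (labels are illustrative).
APK_ABI = {"fennec-armv6": "armeabi", "fennec-armv7a": "armeabi-v7a"}


def runnable_apks(build_prop_text):
    """List the APK flavours whose ABI the image advertises."""
    abis = parse_abis(build_prop_text)
    return sorted(name for name, abi in APK_ABI.items() if abi in abis)


print(runnable_apks("ro.product.cpu.abi=armeabi-v7a\nro.product.cpu.abi2=armeabi\n"))
```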
Attachment #824879 - Attachment is obsolete: true
Attachment #824879 - Flags: review?(armenzg)
Attachment #824926 - Flags: review?(armenzg)
Created attachment 825017 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils

tiny update to run emulator headless when customizing initial AVD
Attachment #824926 - Attachment is obsolete: true
Attachment #824926 - Flags: review?(armenzg)
Attachment #825017 - Flags: review?(armenzg)

Comment 79

5 years ago
I've briefly looked at the patch, however, I'm on buildduty and I would like to leave it until next week to review it.
Is that  OK?

On another note, what is the long term purpose of the script?
Is it so we have a reference of what was used to build the emulator?
Is it so we can easily create a newer emulator build in the future if we needed to?
Are we considering running this regularly on the releng infra?

If we're not going to run it regularly on the releng infra, I would be inclined to do a light review, test it on an Ubuntu 32-bit machine and land it as-is on mozharness.

BTW, what AWS instance did you have loaned for this work?
This one? dev-tst-linux64-ec2-graydon? In your comments I read "32-bit" rather than "64-bit" machine.
If we're shipping these AVDs to systems using Linux packaging tools (apt, in this case, I assume), then that script would probably turn into a debian/control file et al, suitable for creating such a package.

That would also neatly sidestep issues of external reproducibility: if someone outside Mozilla wants to create the same AVD, they just acquire and supply the same raw materials to the build script, and they have themselves a .deb.  We've been talking about how to better ship AVDs in bug 913011.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #79)
> I've briefly looked at the patch, however, I'm on buildduty and I would like
> to leave it until next week to review it.
> Is that  OK?

Sure. I'll update with a few more fixes in the meantime. Don't mean to be a pest, just figured it's worth landing.

> On another note, what is the long term purpose of the script?

Producing new AVDs as we vary the input parameters. We are interested in testing (if I understand correctly):

  - android 2.2.x (froyo) for armv7a (testing the armv6 and armv7 apks)
  - android 2.3.x (gingerbread) for armv7a (same)
  - android 4.0 (ICS) for armv7a and x86 (testing the armv6, armv7 and x86 apks)
  - android 4.3 (jellybean) for armv7a and x86 (same)

Possibly others (there are actually 11 android API-levels in between major codenames there), but even with just those we're talking about 6 sets of 4 AVDs. We will also need to regenerate all of them any time we discover a mistake in our work or want to upgrade to a new SUT agent or emulator or such.
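
To make the count concrete, here is a minimal sketch that enumerates the matrix from the list above. The release/arch pairs come straight from that list; the layout of the table and the helper name are only illustrative.

```python
# Android releases and the emulator architectures listed above.
TARGETS = [
    ("2.2", ["armv7a"]),
    ("2.3", ["armv7a"]),
    ("4.0", ["armv7a", "x86"]),
    ("4.3", ["armv7a", "x86"]),
]
AVDS_PER_SET = 4  # each set holds four AVDs


def avd_sets(targets=TARGETS):
    """Expand the table into one (release, arch) pair per AVD set."""
    return [(release, arch) for release, archs in targets for arch in archs]


sets = avd_sets()
print(len(sets), "sets,", len(sets) * AVDS_PER_SET, "AVDs")  # 6 sets, 24 AVDs
```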

> Is it so we have a reference of what was used to build the emulator?

Sure, that's also helpful. I would prefer "stuff I did on my workstation" to be something automatic that anyone can do.

> Is it so we can easily create a newer emulator build in the future if we
> needed to?
> Are we considering running this regularly on the releng infra?

Not regularly, no. Periodically as we need to refresh AVDs. I expect once we have the wrinkles ironed out, only every six months or so. Presumably sometimes when new versions of android hit the wire.

> If we're not going to run it regularly on the releng infra, I would be
> inclined to do a light review, test it on Ubuntu 32-bit machine and land it
> as-is on mozharness.

Yeah, I don't think it needs exceptionally detailed review. Just a look over for the basics of "is this code even appropriate for storage in the mozharness repo".

> BTW, what AWS instance did you have loaned for this work?
> This one? dev-tst-linux64-ec2-graydon? In your comments I read "32-bit"
> rather than "64-bit" machine.

I think that's the one, yes. The 32bit-ism above is ... curious, I haven't quite worked it out yet, but yes, I did mean to say "32bit". It seems like AOSP prefers to build a 32bit host emulator even when it's on a 64bit host; so I was running the most recent emulator test on a 32bit machine (I am dynamically activating AWS machines on my personal account as I need them; dev-tst-linux64-ec2-graydon is powered off presently though it's probably time to bring it back online soon for more finishing touches).

I haven't quite found the switch to change the 32bit-ism yet. Hopefully will dig it up shortly.
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #80)
> If we're shipping these AVDs to systems using Linux packaging tools (apt, in
> this case, I assume), then that script would probably turn into a
> debian/control file et al, suitable for creating such a package.
> 
> That would also neatly sidestep issues of external reproducibility: if
> someone outside Mozilla wants to create the same AVD, they just acquire and
> supply the same raw materials to the build script, and they have themselves
> a .deb.  We've been talking about how to better ship AVDs in bug 913011.

Currently the script above just makes two tarballs: one containing AVDs and one containing emulators. But it could easily be modified to produce .debs (though I have no idea how to appease lintian / obey debian packaging guidelines, I'm sure I could figure it out).

I should point out, concerning reproducibility: it's quite arduous and likely a little failure-prone. It involves pulling in about 13G of source and toolchain repos from various things google is currently keeping online, but may not always, and building an additional 4G of temporaries. Takes the better part of a day and times out / fails with some frequency.
Well, that sounds like an interesting challenge that we can tackle after this is ready to go.  We can certainly start with hand-built install-these-binaries .debs while we figure it out.  Ultimately, I think we'd want a static, local copy of that 13G, rather than relying on external hosting (and is that pulling tags from those repos, or are different pulls potentially getting different source?).  Then we can rebuild exactly the same AVD if desired, and make changes from there only when we want to.  Anyway, that's getting a little ahead of things, so back to your regularly scheduled getting-this-stuff-working :)
Created attachment 826175 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils

minor fixes for building newer emulators / android versions

Updated

5 years ago
Attachment #826175 - Flags: review?(armenzg)

Updated

5 years ago
Attachment #825017 - Attachment is obsolete: true
Attachment #825017 - Flags: review?(armenzg)
Created attachment 827149 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils

Minor update that includes only refactoring. I initially tried today to arrange the per-AVD sub directories to only contain userdata.img and userdata-qemu.img, and put system.img, ramdisk.img and kernel-qemu in the shared ~/.android/avd directory, but the old emulator used for gingerbread seems to ignore command line flags -kernel, -system and -ramdisk when provided with an -avd flag. So I think we have to put all the .img files in per-AVD directories.

(This is probably for the best long-term anyways, since it permits single developers to have multiple different-arch / different-android-release AVDs on the same machine.)
Attachment #826175 - Attachment is obsolete: true
Attachment #826175 - Flags: review?(armenzg)
Attachment #827149 - Flags: review?(armenzg)
Created attachment 827155 [details] [diff] [review]
Steps to making androidx86_emulator_unittest.py arch-neutral

This should probably also include renaming androidx86_emulator_unittest.py to android_emulator_unittest.py. Uploading here to show/check intent; haven't tested against the modified AVDs yet. Will try tomorrow.
Attachment #827155 - Flags: feedback?(armenzg)
Attachment #827149 - Attachment mime type: text/x-python → text/plain

Comment 87

5 years ago
Comment on attachment 827155 [details] [diff] [review]
Steps to making androidx86_emulator_unittest.py arch-neutral

Review of attachment 827155 [details] [diff] [review]:
-----------------------------------------------------------------

FTR, we want to create a different config file for Android 2.3 testing. Probably obvious but wanted to make sure it was said explicitly.

Right direction so far.

::: configs/android/androidx86.py
@@ +7,4 @@
>      "device_ip": "127.0.0.1",
>      "default_sut_port1": "20701",
>      "default_sut_port2": "20700", # does not prompt for commands
> +    "avds_path": "/home/cltbld/avds/test-avds.tar.gz",

We have to leave this as is for now.
This is deployed by puppet and would require an associated puppet change with it.

This will be fixed in bug 919812.

::: scripts/androidx86_emulator_unittest.py
@@ +183,4 @@
>          command = [
>              "emulator", "-avd", emulator["name"],
>              "-debug", "all",
> +            "-port", str(emulator["emulator_port"])

Where are we specifying these paths?

@@ +187,3 @@
>          ]
> +        if "emulator_cpu" in self.config:
> +            command += ["-qemu", "-cpu", self.config["emulator_cpu"] ]

Where do you specify emulator_cpu? I can't see it.

@@ +317,4 @@
>          We have deployed through Puppet tar ball with the pristine templates.
>          Let's unpack them every time.
>          '''
> +        if os.path.exists(os.path.join(self.config[".avds_dir"], "test-1.avd")):

It seems we would need to either make the avd bundle contain appropriately named avd files or put the name of the avd file inside of the x86 config file.

However, this will be different in bug 919812.
Attachment #827155 - Flags: feedback?(armenzg) → feedback+

Comment 88

5 years ago
Comment on attachment 827149 [details]
script to build various AVDs and emulators from AOSP + mozilla test utils

I'm focusing this week on the bugs I have assigned for Android x86 as I might be asked to drop it and switch to another project soon.

Sorry for delaying you in here. If you need a review in here you probably want to run it through gbrown or jlund.

I'm happy to do so next week if it still has value by then.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #87)

> FTR, we want to create a different config file for Android 2.3 testing.
> Probably obvious but wanted to make sure it was said explicitly.

Yeah. I think we need somewhere between 3 and 9 different configs, depending on requirements I'm not really clear on. I mentioned a guess at 9 different target configs above in comment 81. Anyone on this bug aware of the total set of targets and APKs we're interested in (including difference between armv6 and armv7a APKs)?


> ::: configs/android/androidx86.py
> @@ +7,4 @@
> >      "device_ip": "127.0.0.1",
> >      "default_sut_port1": "20701",
> >      "default_sut_port2": "20700", # does not prompt for commands
> > +    "avds_path": "/home/cltbld/avds/test-avds.tar.gz",
> 
> We have to leave this as is for now.
> This is deploy by puppet and would require an associated puppet change with
> it.
> 
> This will be fixed in bug 919812.

I don't understand what that means. I gather puppet somehow places the AVD tarball on the test machine, yes? We will need it to place one of several different AVDs on the test machine, depending on the emulated target we're testing. How will we convey this parameter to puppet in advance of running this script?

> ::: scripts/androidx86_emulator_unittest.py
> @@ +183,4 @@
> >          command = [
> >              "emulator", "-avd", emulator["name"],
> >              "-debug", "all",
> > +            "-port", str(emulator["emulator_port"])
> 
> Where are we specifying these paths?

I assume you're asking here about the removal of the paths for kernel-qemu, system.img and ramdisk.img. As I said in comment 85, the "emulator" command seems not to pay any attention to the flags '-kernel', '-ramdisk' or '-system' when it is also passed '-avd', at least older versions of the emulator (such as the R12 one used in the gingerbread build). It does, however, automatically work if the associated files kernel-qemu, system.img and ramdisk.img are stored in the per-AVD directories (avd/test-1.avd etc.) -- it sees those files and uses them. So in the AVD-creation script I posted I am storing (redundant) copies of those images in each AVD directory.

This has an additional advantage, beyond working on old emulators: permitting different kernels and system.img files to be present in a user's ~/.android/avd directory. For example a user could have gingerbread-armv7a, jellybean-armv7a and jellybean-x86 AVDs all at the same time, switching between them by passing different -avd flags.
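
The per-AVD layout described above can be sketched as follows. This is a minimal illustration, not the code from the attached script: the helper name is made up, and the demo runs against throwaway directories with empty stand-in files rather than real images.

```python
import os
import shutil
import tempfile

# Images the old emulator only picks up from the per-AVD directory.
SHARED_IMAGES = ("kernel-qemu", "system.img", "ramdisk.img")


def populate_avd_dirs(build_dir, avd_home, avd_names):
    """Copy the shared images into each <avd_home>/<name>.avd directory."""
    created = []
    for name in avd_names:
        avd_dir = os.path.join(avd_home, name + ".avd")
        os.makedirs(avd_dir, exist_ok=True)
        for image in SHARED_IMAGES:
            shutil.copy2(os.path.join(build_dir, image), avd_dir)
        created.append(avd_dir)
    return created


# Demo against temporary directories with empty stand-in image files.
build_dir = tempfile.mkdtemp()
avd_home = tempfile.mkdtemp()
for image in SHARED_IMAGES:
    open(os.path.join(build_dir, image), "w").close()
dirs = populate_avd_dirs(build_dir, avd_home, ["test-1", "test-2"])
print(sorted(os.listdir(dirs[0])))
```

The copies are redundant on disk, but each AVD directory then carries everything the emulator needs when started with only -avd.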

> @@ +187,3 @@
> >          ]
> > +        if "emulator_cpu" in self.config:
> > +            command += ["-qemu", "-cpu", self.config["emulator_cpu"] ]
> 
> Where do you specify emulator_cpu? I can't see it.

Nowhere for x86. In the arm-specific config file (not included for review here) we need to pass 'cortex-a8' or 'cortex-a9' to get the emulator to run armv7 instructions.
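
To show the intent, here is a standalone sketch of the command construction from the patch, with two example configs. The config dicts and the port number are hypothetical; only the "emulator_cpu" key and the 'cortex-a8' value come from the discussion above.

```python
def emulator_command(emulator, config):
    """Build the emulator argv, adding a qemu -cpu override only if configured."""
    command = [
        "emulator", "-avd", emulator["name"],
        "-debug", "all",
        "-port", str(emulator["emulator_port"]),
    ]
    # -qemu goes last: the arguments after it are passed through to qemu.
    if "emulator_cpu" in config:
        command += ["-qemu", "-cpu", config["emulator_cpu"]]
    return command


emulator = {"name": "test-1", "emulator_port": 5554}  # hypothetical values
x86_config = {}                                       # no CPU override on x86
arm_config = {"emulator_cpu": "cortex-a8"}            # needed for armv7 instructions

print(emulator_command(emulator, x86_config))
print(emulator_command(emulator, arm_config))
```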

> 
> @@ +317,4 @@
> >          We have deployed through Puppet tar ball with the pristine templates.
> >          Let's unpack them every time.
> >          '''
> > +        if os.path.exists(os.path.join(self.config[".avds_dir"], "test-1.avd")):
> 
> It seems we would need to make the avd bundle to contain avd files named
> appropriately or to put the name of the avd file inside of the x86 config
> file.
> 
> However, this will be different in bug 919812.

The AVD-creation script I posted above makes AVDs named "test-1", "test-2", "test-3" and "test-4" regardless of the architecture and android version specified. It encodes the architecture and android version in the resulting tarball name. I assumed that we would download one of several tarballs and rename it to "test-avds.tar.gz", and then all subsequent code in the android_emulator_unittest.py script could ignore android version and architecture variation (aside from passing a different -cpu flag when starting the emulator).
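
For illustration, a small sketch of recovering the architecture and android version from a tarball name. The pattern is inferred from the tarball names posted in this bug (for example AVDs-armv7a-android-2.3.7_r1-build-2013-11-05-ubuntu.tar.gz) and is an assumption, not a format the script guarantees; the function name is hypothetical.

```python
import re

# Pattern inferred from tarball names posted in this bug; treat it as an
# assumption rather than a documented naming contract.
TARBALL_RE = re.compile(
    r"^AVDs-(?P<arch>[^-]+)-(?P<tag>android-[^-]+)-build-"
    r"(?P<date>\d{4}-\d{2}-\d{2})-(?P<host>[^.]+)\.tar\.gz$"
)


def parse_avd_tarball_name(filename):
    """Split an AVD tarball name into arch, tag, build date and host OS."""
    match = TARBALL_RE.match(filename)
    if match is None:
        raise ValueError("unrecognized AVD tarball name: %s" % filename)
    return match.groupdict()


info = parse_avd_tarball_name(
    "AVDs-armv7a-android-2.3.7_r1-build-2013-11-05-ubuntu.tar.gz")
print(info["arch"], info["tag"], info["date"])
```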
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #88)

> Sorry for delaying you in here. If you need a review in here you probably
> want to run it through gbrown or jlund.

No worries, I'm continuing to tweak and am not strictly blocked (don't really know how to deploy this into pre-production / testing yet anyways). Thanks for all your help so far!

Comment 91

5 years ago
WRT the 9 configurations, I assume that in production (tbpl) we would be using two of them (Android x86 and Android 2.3 on armv7) initially.

(In reply to Graydon Hoare :graydon from comment #89)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #87)
...
> > ::: configs/android/androidx86.py
> > @@ +7,4 @@
> > >      "device_ip": "127.0.0.1",
> > >      "default_sut_port1": "20701",
> > >      "default_sut_port2": "20700", # does not prompt for commands
> > > +    "avds_path": "/home/cltbld/avds/test-avds.tar.gz",
> > 
> > We have to leave this as is for now.
> > This is deployed by puppet and would require an associated puppet change with
> > it.
> > 
> > This will be fixed in bug 919812.
> 
> I don't understand what that means. I gather puppet somehow places the AVD
> tarball on the test machine, yes? We will need it to place one of several
> different AVDs on the test machine, depending on the emulated target we're
> testing. How will we convey this parameter to puppet in advance of running
> this script?
> 
You have the correct understanding about puppet.
However, we're going to avoid using puppet for the deployment of avd files.
We're going to download them to disk from the tooltool webhost and place them under /builds or /tools.

> > ::: scripts/androidx86_emulator_unittest.py
> > @@ +183,4 @@
> > >          command = [
> > >              "emulator", "-avd", emulator["name"],
> > >              "-debug", "all",
> > > +            "-port", str(emulator["emulator_port"])
> > 
> > Where are we specifying these paths?
> 
> I assume you're asking here about the removal of the paths for kernel-qemu,
> system.img and ramdisk.img. As I said in comment 85, the "emulator" command
> seems not to pay any attention to the flags '-kernel', '-ramdisk' or
> '-system' when it is also passed '-avd', at least in older versions of the
> emulator (such as the R12 one used in the gingerbread build). It does,
> however, automatically work if the associated files kernel-qemu, system.img
> and ramdisk.img are stored in the per-AVD directories (avd/test-1.avd etc.)
> -- it sees those files and uses them. So in the AVD-creation script I posted
> I am storing (redundant) copies of those images in each AVD directory.
> 
> This has an additional advantage, beyond working on old emulators:
> permitting different kernels and system.img files to be present in a user's
> ~/.android/avd directory. For example a user could have gingerbread-armv7a,
> jellybean-armv7a and jellybean-x86 AVDs all at the same time, switching
> between them by passing different -avd flags.
> 
This will require re-packaging the current Android x86 avd tar balls.

gbrown, does this work for you?
If so, could you please create a new tar ball for bug 919812?

> > @@ +187,3 @@
> > >          ]
> > > +        if "emulator_cpu" in self.config:
> > > +            command += ["-qemu", "-cpu", self.config["emulator_cpu"] ]
> > 
> > Where do you specify emulator_cpu? I can't see it.
> 
> Nowhere for x86. In the arm-specific config file (not included for review
> here) we need to pass 'cortex-a8' or 'cortex-a9' to get the emulator to run
> armv7 instructions.
> 
Ah OK. Could you please add this note to the script? I think it will help in the future.

> > 
> > @@ +317,4 @@
> > >          We have deployed through Puppet tar ball with the pristine templates.
> > >          Let's unpack them every time.
> > >          '''
> > > +        if os.path.exists(os.path.join(self.config[".avds_dir"], "test-1.avd")):
> > 
> > It seems we would need to make the avd bundle to contain avd files named
> > appropriately or to put the name of the avd file inside of the x86 config
> > file.
> > 
> > However, this will be different in bug 919812.
> 
> In the AVD-creation script I made above, it makes AVDs named "test-1",
> "test-2", "test-3" and "test-4" regardless of the architecture and android
> version specified. It encodes the architecture and android version in the
> resulting tarball name.
This approach would work. I will be fixing bug 919812 to accommodate this.

> I assumed that we would download one of several
> tarballs and rename it to "test-avds.tar.gz", and then all subsequent code
> in the android_emulator_unittest.py script could ignore android version and
> architecture variation (aside from passing a different -cpu flag when
> starting the emulator).

I was hoping to cache the tar balls under /builds or /tools since they are quite large and they take a while to download. This is an optimization for setup times as well as reducing network hiccups.
We would only be keeping the latest version for each Android platform on disk.
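
A minimal sketch of the caching idea, assuming a download helper is supplied by the caller. The function name, cache_dir and the download callable are hypothetical, and pruning of older versions is deliberately left out.

```python
import os


def fetch_with_cache(cache_dir, filename, download):
    """Return the path of a cached tarball, downloading it only when missing.

    download is a hypothetical callable taking a destination path and
    writing the file there (for example a wrapper around a tooltool fetch).
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final name.
        partial = path + ".part"
        download(partial)
        os.replace(partial, path)
    return path
```

Repeated jobs on the same machine then pay the download cost once per tarball, which is the setup-time and network saving described above.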
Flags: needinfo?(gbrown)
I talked to gbrown today and he agreed to allow me to attempt scripted AVD builds until I can make x86 AVDs that are satisfactory for his uses, rather than have him manually do it as in 919812.

I brought up a couple AWS machines and built the following AVDs and emulators from AOSP, via script (for arm-jb and x86-jb, as well as arm-gingerbread from 2 days back):

http://people.mozilla.org/~graydon/AVDs-armv7a-android-2.3.7_r1-build-2013-11-05-ubuntu.tar.gz
http://people.mozilla.org/~graydon/AVDs-armv7a-android-4.3.1_r1-build-2013-11-07-ubuntu.tar.gz
http://people.mozilla.org/~graydon/AVDs-x86-android-4.3.1_r1-build-2013-11-07-ubuntu.tar.gz

http://people.mozilla.org/~graydon/emulators-armv7a-android-2.3.7_r1-build-2013-11-04-ubuntu.tar.gz
http://people.mozilla.org/~graydon/emulators-armv7a-android-4.3.1_r1-build-2013-11-07-ubuntu.tar.gz
http://people.mozilla.org/~graydon/emulators-x86-android-4.3.1_r1-build-2013-11-07-ubuntu.tar.gz

Fwiw I'm also now archiving my mozharness changes-in-progress here:

https://github.com/graydon/build-mozharness/tree/bug910092-android-aws-testing

and will squash/rebase/rewrite them as necessary. I'll coordinate with gbrown and others to try to reach a satisfactory scripted AVD construction that's paired with necessary changes to AVD use in the unit-test script.
Flags: needinfo?(gbrown)
I tried using AVDs-x86-android-4.3.1_r1-build-2013-11-07-ubuntu.tar.gz on the loaner that I have been using for Android x86 testing. I started the emulators semi-manually with:

emulator -avd test-1 -port 5554 -qemu -m 1024 -enable-kvm
emulator -avd test-2 -port 5556 -qemu -m 1024 -enable-kvm
...

The emulators started fine, but neither the watcher nor the sutagent started (I verified that they were installed).

I am surprised by the use of Android 4.3.1 here. The current Android x86 images are 4.2 (JOP40C). I believe there are security changes in 4.3 that interfere with full operation of sutagent (nothing that would prevent sutagent from starting, but will be a problem for test automation).
(In reply to Geoff Brown [:gbrown] from comment #93)

> I am surprised by the use of Android 4.3.1 here. The current Android x86
> images are 4.2 (JOP40C). I believe there are security changes in 4.3 that
> interfere with full operation of sutagent (nothing that would prevent
> sutagent from starting, but will be a problem for test automation).

Weird, yeah, I can reproduce the SUTAgent non-start here. It looks like just adding sdcard.img fixes the gingerbread one but not the jellybean one. It went with 4.3.1 because I asked for "jellybean", nothing too sneaky; I'm rerunning the build on JOP40C/android-4.2_r1 presently. Will double check SUTAgent startup before bothering you again, sorry.
Rebuilt with 4.2 and added https://github.com/graydon/build-mozharness/commit/aa9edb88d3fa04465ee10f6e0f17103d91369f06 to the script; I now see this AVD auto-starting SUTAgent (slowly):

http://people.mozilla.org/~graydon/AVDs-x86-android-4.2_r1-build-2013-11-07-ubuntu.tar.gz

LMK if you need other customizations; if not I'll re-run the armv7 AVD builds (for 4.2) as well.

Updated

5 years ago
Blocks: 936601
(In reply to Graydon Hoare :graydon from comment #95)

I re-tested with the AVDs in Comment 95. SUTAgent started fine. However, reftests fail with:

 INFO -  REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.layerManagerType]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://reftest/content/reftest.jsm :: BuildConditionSandbox :: line 566"  data: no]

which seems to be:

sandbox.layersGPUAccelerated =
       gWindowUtils.layerManagerType != "Basic";
I note:

hardware-qemu.ini:hw.gpu.enabled = no

That seems bad.
(In reply to Geoff Brown [:gbrown] from comment #98)
> I note:
> 
> hardware-qemu.ini:hw.gpu.enabled = no
> 
> That seems bad.

Yes, this is because I mistakenly left "hw.gpu.enabled=yes" out of the config.ini file, and it auto-generates hardware-qemu.ini from config.ini and its defaults. If you add "hw.gpu.enabled=yes" to the config.ini file it ought to get going (though on my laptop here it crashes out due to not liking my X server's openGL implementation, by the look of it).

I'll repack the AVDs with this change.
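
For anyone patching an already-unpacked AVD by hand, here is a small sketch of setting the key in config.ini text. It assumes config.ini is a flat list of key=value lines with no section headers; the function name is hypothetical.

```python
def set_avd_config_value(text, key, value):
    """Return config.ini text with key set to value, appending it if absent.

    Assumes a flat 'key=value' (or 'key = value') file without sections.
    """
    lines = []
    found = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines.append("%s=%s" % (key, value))
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


print(set_avd_config_value("hw.gpu.enabled = no\n", "hw.gpu.enabled", "yes"))
```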
Setting hw.gpu.enabled=yes in config.ini does the trick -- reftests pass and I don't see any problems with the other tests either.
Updated to include GPU-enabling setting. Exciting, this is the first time I've got content-area host GL acceleration working on the ARM images. Strange mix of fast (panning) and slow (rendering):

http://people.mozilla.org/~graydon/AVDs-armv7a-android-2.3.7_r1-build-2013-11-13-ubuntu.tar.gz
http://people.mozilla.org/~graydon/AVDs-armv7a-android-4.2_r1-build-2013-11-13-ubuntu.tar.gz
http://people.mozilla.org/~graydon/AVDs-x86-android-4.2_r1-build-2013-11-13-ubuntu.tar.gz

Updated

5 years ago
Attachment #812873 - Attachment is obsolete: true

Updated

5 years ago
Attachment #827149 - Attachment is obsolete: true
Attachment #827149 - Flags: review?(armenzg)

Updated

5 years ago
Attachment #827155 - Attachment is obsolete: true
Created attachment 831844 [details] [diff] [review]
Steps to make androidx86_emulator_unittest.py arch-neutral
Attachment #831844 - Flags: review?(armenzg)
Created attachment 831846 [details] [diff] [review]
Add new scripts/android_emulator_build.py to build AVDs and emulators from AOSP.
Attachment #831846 - Flags: review?(gbrown)
Created attachment 831848 [details] [diff] [review]
Modify androidx86_emulator_unittest.py and config to adapt to script-built AVDs
Attachment #831848 - Flags: review?(armenzg)
Created attachment 831850 [details] [diff] [review]
Add new configs/android/androidarm.py for unit testing on arm emulator
Attachment #831850 - Flags: review?(armenzg)
Armen: The attachments above marked for review-by-you make two thematic changes to your mozharness script androidx86_emulator_unittest.py:

  - Adapt it to be arch-neutral (attachment 831844, and later add an arm-specific config in attachment 831850)
  - Adapt it to minor changes in tarball and AVD-dir structure brought about by scripted AVD construction (attachment 831848)

You should probably also at some point rename it to just android_emulator_unittest.py but I imagine that involves changes elsewhere that I don't know how to make.

Geoff: The attachment above marked for review-by-you (attachment 831846) is just the AVD-construction script, refreshed to incorporate all the changes we've discussed up until today. Previously I had asked Armen to review it also but he seems to be quite busy and the steps involved are ones you're more familiar with anyways, AIUI. It produced the most recent AVDs above.

Comment 107

5 years ago
Comment on attachment 831844 [details] [diff] [review]
Steps to make androidx86_emulator_unittest.py arch-neutral

Review of attachment 831844 [details] [diff] [review]:
-----------------------------------------------------------------

I will not give r+ since it is merging with some of my work in bug 919812; however, I'm taking most of it with my patch.
I will be testing it.

Renaming of the script should go in before we start enabling x86 across the board.
Attachment #831844 - Flags: review?(armenzg) → feedback+

Comment 108

5 years ago
Comment on attachment 831850 [details] [diff] [review]
Add new configs/android/androidarm.py for unit testing on arm emulator

Review of attachment 831850 [details] [diff] [review]:
-----------------------------------------------------------------

Feel free to land.
We will have to make adjustments once we start trying to run this on production.
Attachment #831850 - Flags: review?(armenzg) → review+

Comment 109

5 years ago
Created attachment 832397 [details] [diff] [review]
[checked-in] rename androidx86_emulator_unittest.py to be generic of the architecture
Attachment #832397 - Flags: review?(bugspam.Callek)

Comment 110

5 years ago
Created attachment 832400 [details] [diff] [review]
[checked-in][mozharness] rename androidx86_emulator_unittest.py to be generic of the architecture
Attachment #832400 - Flags: review?(bugspam.Callek)
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #108)

> Feel free to land.
> We will have to make adjustments once we start trying to run this on
> production.

Sure. Which branch / repo?
Comment on attachment 831846 [details] [diff] [review]
Add new scripts/android_emulator_build.py to build AVDs and emulators from AOSP.

Review of attachment 831846 [details] [diff] [review]:
-----------------------------------------------------------------

This looks quite solid -- good stuff.

My biggest concern is the emulator build and packaging. Let's discuss before moving forward.

::: scripts/android_emulator_build.py
@@ +247,5 @@
> +
> +        if not (platform.machine() in ['i386', 'i486', 'i586', 'i686', 'x86_64']):
> +            self.exception("this script only works on x86 and x86_64")
> +
> +        self.tag = self.select_android_tag(self.config['android_version'])

I have mixed feelings about the derivation you do of tag, api level, etc from a requested version. On the one hand, it makes this easier to use, but on the other, there is a loss of flexibility. For example, I found that the tag "tools_r22" seemed to correspond to the Android 18 sdk tools. Or one might not always want the most recent revision of a branch. Of course, it wouldn't be hard to customize this script for such cases either...but if we end up customizing it often, maybe it would be better to request the tag explicitly. Still, just a thought.

@@ +311,5 @@
> +                         cwd=self.aospdir,
> +                         halt_on_failure=True)
> +
> +        if self.tag.startswith("android-2.3"):
> +            self.info("updating QEMU sub-repository to R12")

This seems like a special case that could use a comment to remind us of the issue.

@@ +397,5 @@
> +            abi2 = " TARGET_CPU_ABI2=" + abi2
> +
> +        self.run_command(["/bin/bash", "-c",
> +                          ". build/envsetup.sh "
> +                          "&& lunch 1 "

I'm familiar with "lunch full-eng" or "lunch sdk-eng" -- what does "lunch 1" mean?

@@ +573,5 @@
> +                time.sleep(10)
> +                break
> +
> +        self.info("modifying 'su' on emulator")
> +        self.adb_e(["shell", "mount", "-o", "remount,rw", "/dev/block/mtdblock0", "/system"])

Does the device name deviate at all across different versions of the system image, or does it always seem to be mtdblock0?

@@ +660,5 @@
> +        emuarch = "x86"
> +        if self.is_arm_target():
> +            emuarch = "arm"
> +        self.run_command(["tar", "-czf", filename, "-C", self.aosphostdir,
> +                          "bin/emulator", "bin/emulator-" + emuarch, "bin/adb"],

How would the resulting tar be used? Would you merge it with an Android SDK tools directory, or ...? I thought there were libs in <sdk>/tools/lib, and maybe other files that went along with the emulator. For my experience with building emulators, see bug 933918 -- I opted to re-distribute the whole tools directory.

Bug 933918 brings up another issue: How do we patch the emulator and use your script? (It looks like we need to apply our own patch for x86.)
(In reply to Geoff Brown [:gbrown] from comment #112)

> My biggest concern is the emulator build and packaging. Let's discuss before
> moving forward.

Sure.

> I have mixed feelings about the derivation you do of tag, api level, etc
> from a requested version. On the one hand, it makes this easier to use, but
> on the other, there is a loss of flexibility. For example, I found that the
> tag "tools_r22" seemed to correspond to the Android 18 sdk tools. Or one
> might not always want the most recent revision of a branch. Of course, it
> wouldn't be hard to customize this script for such cases either...but if we
> end up customizing it often, maybe it would be better to request the tag
> explicitly. Still, just a thought.

Yeah, I'd be happy to split these out into separate configs that get set to automatic defaults but can be overridden. Can do that now or in later versions when/if the need ever arises.
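
A sketch of what "automatic defaults that can be overridden" could look like. The mapping below is illustrative only: the two tags appear in tarball names in this bug, but the table, the function name and its parameters are assumptions, not the script's actual logic.

```python
# Illustrative defaults only; the real script derives its tag differently.
DEFAULT_TAGS = {
    "gingerbread": "android-2.3.7_r1",
    "jellybean": "android-4.3.1_r1",
}


def resolve_android_tag(android_version, explicit_tag=None):
    """Prefer an explicitly requested tag, else fall back to a default."""
    if explicit_tag:
        return explicit_tag
    try:
        return DEFAULT_TAGS[android_version]
    except KeyError:
        raise ValueError("no default tag for %r; pass one explicitly"
                         % android_version)


print(resolve_android_tag("jellybean"))
print(resolve_android_tag("jellybean", explicit_tag="android-4.2_r1"))
```

The override covers cases like the one raised in the review, where the newest revision of a branch is not the one wanted.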

> @@ +311,5 @@
> > +                         cwd=self.aospdir,
> > +                         halt_on_failure=True)
> > +
> > +        if self.tag.startswith("android-2.3"):
> > +            self.info("updating QEMU sub-repository to R12")
> 
> This seems like a special case that could use a comment to remind us of the
> issue.

Very much so. I'll make a comment to explain it.

> 
> @@ +397,5 @@
> > +            abi2 = " TARGET_CPU_ABI2=" + abi2
> > +
> > +        self.run_command(["/bin/bash", "-c",
> > +                          ". build/envsetup.sh "
> > +                          "&& lunch 1 "
> 
> I'm familiar with "lunch full-eng" or "lunch sdk-eng" -- what does "lunch 1"
> mean?

First menu entry: full-eng. Happy to change it to say that instead.

> @@ +573,5 @@
> > +                time.sleep(10)
> > +                break
> > +
> > +        self.info("modifying 'su' on emulator")
> > +        self.adb_e(["shell", "mount", "-o", "remount,rw", "/dev/block/mtdblock0", "/system"])
> 
> Does the device name deviate at all across different versions of the system
> image, or does it always seem to be mtdblock0?

I've not seen any deviation, but of course it could happen. Unfortunately I tried to omit the device and do the step in your instructions in bug 894507 (just write `mount -o remount,rw /system`) and apparently the version of mount that ships in older androids is not clever enough to take that simpler-argument-list form: it needs a device name. Compensating for possible deviation would require parsing the output of `adb shell mount` I guess. Can do that now if you like, or leave a note about it for later if it ever occurs.
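
If it ever becomes necessary, the parsing could look roughly like this. The two output layouts handled here ("device mountpoint fstype ..." and "device on mountpoint type ...") are assumptions about what `adb shell mount` prints on different images, and the function name is hypothetical.

```python
def find_mount_device(mount_output, mount_point="/system"):
    """Return the device mounted at mount_point, or None if not found.

    Handles two assumed layouts of mount output:
      <device> <mount point> <fstype> <options> ...
      <device> on <mount point> type <fstype> (<options>)
    """
    for line in mount_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
        if len(fields) >= 3 and fields[1] == "on" and fields[2] == mount_point:
            return fields[0]
    return None


sample = (
    "rootfs / rootfs ro 0 0\n"
    "/dev/block/mtdblock0 /system yaffs2 ro 0 0\n"
)
print(find_mount_device(sample))
```

The returned device name could then be substituted for the hard-coded /dev/block/mtdblock0 in the remount command.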

> @@ +660,5 @@
> > +        emuarch = "x86"
> > +        if self.is_arm_target():
> > +            emuarch = "arm"
> > +        self.run_command(["tar", "-czf", filename, "-C", self.aosphostdir,
> > +                          "bin/emulator", "bin/emulator-" + emuarch, "bin/adb"],
> 
> How would the resulting tar be used? Would you merge it with an Android SDK
> tools directory, or ...? I thought there were libs in <sdk>/tools/lib, and
> maybe other files that went along with the emulator. For my experience with
> building emulators, see bug 933918 -- I opted to re-distribute the whole
> tools directory.

I've tried running it "raw" on a machine with no SDK, just untarring and running `bin/emulator`. It works, but I haven't checked to see if it picks up any host GL libraries (at the time I tested that I was ignoring GL errors and running headless). Overwriting the one in the SDK would probably also work. I'd be happy to tar up something different, or omit this step entirely (see below).

> Bug 933918 brings up another issue: How do we patch the emulator and use
> your script? (It looks like we need to apply our own patch for x86.)

Could add a step that applies a patch after checking out AOSP. Or perhaps modifies the repo manifest to pull the emulator from elsewhere? I've no preference, happy to try any strategy (or none).

TBH it doesn't really seem like we _need_ the emulators coming out of this build so much as the AVDs. Emulators are both easier to build on their own and seemingly more broadly backwards-compatible: if we just always build emulators from the newest tools branch and/or use the newest SDK emulators, I think we get the best known qemu in terms of bug fixes and emulated CPUs. They haven't retired any CPUs and I don't think we stand to gain anything by running old AVDs on older / lower-fidelity emulators. Newer emulators will still simulate old phones better than old emulators.
Created attachment 832649 [details] [diff] [review]
Add new scripts/android_emulator_build.py to build AVDs and emulators from AOSP.

Updated to include:

  - New options --android-tag and --android-apilevel
  - Use 'lunch full-eng' not 'lunch 1'
  - Comment to explain update to qemu R12 on gingerbread

Let me know what else you'd like to see addressed, or can pass on for now.
Attachment #831846 - Attachment is obsolete: true
Attachment #831846 - Flags: review?(gbrown)
Attachment #832649 - Flags: review?(gbrown)

Comment 115

5 years ago
Comment on attachment 831848 [details] [diff] [review]
Modify androidx86_emulator_unittest.py and config to adapt to script-built AVDs

Same thing, this patch is good, I'm taking it with me but we can't land as-is.
Attachment #831848 - Flags: review?(armenzg) → feedback+
Comment on attachment 832649 [details] [diff] [review]
Add new scripts/android_emulator_build.py to build AVDs and emulators from AOSP.

Review of attachment 832649 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for making those changes. This looks good to me now.

While it is tempting to use the built emulator since it's a by-product of our build, I think we are struggling to find a compelling reason to use it rather than rely on an official release...unless we need to patch it, in which case we need to customize the script. So I would not package the built emulator -- just make the avds the output of this script and leave it at that. Armen or :Callek may have a different perspective; check in with them to see what they think.
Attachment #832649 - Flags: review?(gbrown) → review+
(In reply to Geoff Brown [:gbrown] from comment #116)

> Review of attachment 832649 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for making those changes. This looks good to me now.

Great, pushed: https://hg.mozilla.org/build/mozharness/rev/9dfa6c67a71e

Comment 118

5 years ago
I have no preference even though I lean towards official builds.
Created attachment 8334933 [details] [diff] [review]
Remove use of psutil, reimplement loop using os alone.

This came up in #ateam today: I mistakenly added a dependency on an uncommon python lib, and I should either use virtualenv or prune the dependency. It wasn't terribly important so I opted to prune the dependency and rewrite the loop without it.
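
The patch itself is not reproduced here, so the following is only a generic sketch of one psutil-free building block: checking whether a process still exists using os alone. It assumes a POSIX host, and the function name is hypothetical.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only).

    Signal 0 performs the existence and permission checks without
    actually delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


print(pid_is_alive(os.getpid()))
```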
Attachment #8334933 - Flags: review?(aki)

Updated

5 years ago
Attachment #8334933 - Attachment description: 0005-Remove-use-of-psutil-reimplement-loop-using-os-alone.patch → Remote use of psutil, reimplement loop using os alone.

Updated

5 years ago
Attachment #8334933 - Attachment description: Remote use of psutil, reimplement loop using os alone. → Remove use of psutil, reimplement loop using os alone.
Created attachment 8334950 [details] [diff] [review]
Add SUTAgent probe to _trigger_test to wait for slow emulator startup.

This came up yesterday and today while trying to confirm mochitest execution of armv7-gingerbread on the AWS loaner box (a relatively underpowered m1.medium machine): SUTAgent starts, but it takes 17 minutes to start. This is much longer than mochitest/runtestsremote.py is willing to wait, and I figured waiting in this special circumstance (emulator targets only) would be safer than perturbing mochitest code.
Attachment #8334950 - Flags: review?(armenzg)
Comment on attachment 8334933 [details] [diff] [review]
Remove use of psutil, reimplement loop using os alone.

Thanks!
Attachment #8334933 - Flags: review?(aki) → review+

Comment 122

5 years ago
Comment on attachment 8334950 [details] [diff] [review]
Add SUTAgent probe to _trigger_test to wait for slow emulator startup.

Review of attachment 8334950 [details] [diff] [review]:
-----------------------------------------------------------------

This patch wfm. However, I would like to know what jmaher and gbrown think about this since they might have other ways of determining if the SUT agent is up and running.

At worst this would add 30*100 seconds == 50 minutes.
That is too long. I hope we never have to go that far, and I would rather know the root cause of why the SUT agent takes so long.

On another note, do we have Watcher on these emulators? How does it interact these days with SUT?
Attachment #8334950 - Flags: review?(jmaher)
Attachment #8334950 - Flags: review?(gbrown)
Attachment #8334950 - Flags: review?(armenzg)
Attachment #8334950 - Flags: review+
Comment on attachment 8334950 [details] [diff] [review]
Add SUTAgent probe to _trigger_test to wait for slow emulator startup.

Review of attachment 8334950 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know all the quirks with the emulators/vms, but if we can't get it spun up and sutagent active in 10 minutes, we should be solving this problem differently.

::: scripts/androidx86_emulator_unittest.py
@@ +297,5 @@
>          env = self.query_env()
>          self.query_minidump_stackwalk()
>  
> +        attempts = 0
> +        while attempts < 100:

I would put this at 20 (10 minutes).
Attachment #8334950 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #123)
> Comment on attachment 8334950 [details] [diff] [review]
> Add SUTAgent probe to _trigger_test to wait for slow emulator startup.
> 
> Review of attachment 8334950 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't know all the quirks with the emulators/vms, but if we can't get it
> spun up and sutagent active in 10 minutes, we should be solving this problem
> differently.
> 
> ::: scripts/androidx86_emulator_unittest.py
> @@ +297,5 @@
> >          env = self.query_env()
> >          self.query_minidump_stackwalk()
> >  
> > +        attempts = 0
> > +        while attempts < 100:
> 
> I would put this at 20 (10 minutes).

As I mentioned above, it takes 17 minutes to start on the emulator I'm testing (on the AWS machine I'm testing it on). I'm happy to investigate the slow startup as a bug in and of itself, but at present a 10 minute timeout means no tests will run.
Comment on attachment 8334950 [details] [diff] [review]
Add SUTAgent probe to _trigger_test to wait for slow emulator startup.

Review of attachment 8334950 [details] [diff] [review]:
-----------------------------------------------------------------

I like the idea of waiting for confirmation that sutagent is active, but I do not like this implementation.

Waiting for up to 50 minutes does not seem reasonable. I would expect that we would not need to wait for more than 2 minutes on average, maybe 5 minutes max. That's based on my experience with devices and the x86 4.2 emulator. The report - Comment 120 - that sutagent takes 17 minutes to start is troubling and needs to be investigated more. The watcher should be starting sutagent, and that should be visible in logcat -- http://mxr.mozilla.org/mozilla-central/source/build/mobile/sutagent/android/watcher/WatcherService.java#1000 -- along with other messages tracking the watcher lifetime. 

Also, rather than checking for the sutagent process in this patch, I think I would prefer a check that we can telnet (or otherwise connect) to the sutagent port. That way we verify that the network is set up, sutagent is accepting connections, and we can actually use sut.
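
A minimal sketch of such a connection probe, using a plain TCP connect rather than telnet. The function name and parameters are hypothetical; the defaults of 20 attempts at 30-second intervals correspond to the 10-minute ceiling suggested earlier in this review.

```python
import socket
import time


def wait_for_port(host, port, attempts=20, interval=30.0, connect_timeout=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Tries up to `attempts` times, sleeping `interval` seconds between
    failed tries, and returns False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if attempt + 1 < attempts:
                time.sleep(interval)
    return False
```

With the values from configs/android/androidx86.py this would be called as wait_for_port("127.0.0.1", 20701). A successful connect only shows that something is listening on the port; a stricter probe would also read the agent's response.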
Attachment #8334950 - Flags: review?(gbrown) → review-
(In reply to Geoff Brown [:gbrown] from comment #125)

> Waiting for up to 50 minutes does not seem reasonable. I would expect that
> we would not need to wait for more than 2 minutes on average, maybe 5
> minutes max. That's based on my experience with devices and the x86 4.2
> emulator. The report - Comment 120 - that sutagent takes 17 minutes to start
> is troubling and needs to be investigated more.

Agreed, it starts in mere moments on my desktop; the delay is troubling. I'll continue to fiddle with it to see if I can isolate the relevant difference between the AWS host and my own. Thanks for the review.
(In reply to Graydon Hoare :graydon from comment #126)
> (In reply to Geoff Brown [:gbrown] from comment #125)
> 
> > Waiting for up to 50 minutes does not seem reasonable. I would expect that
> > we would not need to wait for more than 2 minutes on average, maybe 5
> > minutes max. That's based on my experience with devices and the x86 4.2
> > emulator. The report - Comment 120 - that sutagent takes 17 minutes to start
> > is troubling and needs to be investigated more.
> 
> Agreed, it starts in mere moments on my desktop; the delay is troubling.
> I'll continue to fiddle with it to see if I can isolate the relevant
> difference between the AWS host and my own. Thanks for the review.

Ugh, apologies, thinko on my part. The AVDs on the AWS machine were out of date. Fresher AVDs, everything's fine. Removing the patch. Thanks for getting me to double check.

Updated

5 years ago
Attachment #8334950 - Attachment is obsolete: true
Attachment #8334950 - Flags: review+

Updated

5 years ago
Attachment #832397 - Flags: review?(bugspam.Callek) → review+

Updated

5 years ago
Attachment #832400 - Flags: review?(bugspam.Callek) → review+
something[s] here made it to production

Comment 129

5 years ago
Comment on attachment 832400 [details] [diff] [review]
[checked-in][mozharness] rename androidx86_emulator_unittest.py to be generic of the architecture

It was checked-in on bug 919812.
Attachment #832400 - Attachment description: [mozharness] rename androidx86_emulator_unittest.py to be generic of the architecture → [checked-in][mozharness] rename androidx86_emulator_unittest.py to be generic of the architecture

Comment 130

5 years ago
Comment on attachment 832397 [details] [diff] [review]
[checked-in] rename androidx86_emulator_unittest.py to be generic of the architecture

checked-in in bug 919812.
Attachment #832397 - Attachment description: rename androidx86_emulator_unittest.py to be generic of the architecture → [checked-in] rename androidx86_emulator_unittest.py to be generic of the architecture
Added missing script to ash-mozharness: http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness/rev/500865621135
Re-triggered ash by merging m-c: https://hg.mozilla.org/projects/ash/rev/e24447a4a2f5

Awaiting results at https://tbpl.mozilla.org/?tree=Ash. LMK if I did something wrong here, first time following instructions in https://bugzilla.mozilla.org/show_bug.cgi?id=936601#c18
Build results more or less as expected :(

    Can't download from http://tooltool.pvt.build.mozilla.org/build/sha512/ac8a99fe0d120520de7e165899140cad074479fbaf3c4fcadb0d4ccb873bda0d249f88132df0828b86844d6df02aabce911316ace32bba7e94cf8dffdec77669 to /builds/slave/talos-slave/cached/public_html/AVDs-armv7a-android-2.3.7_r1-build-2013-11-13-ubuntu.tar.gz!

Tooltool needs the armv7a AVDs uploaded, as mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=919812#c47

Comment 133

5 years ago
(In reply to Graydon Hoare :graydon from comment #132)
> Build results more or less as expected :(
> 
>     Can't download from
> http://tooltool.pvt.build.mozilla.org/build/sha512/
> ac8a99fe0d120520de7e165899140cad074479fbaf3c4fcadb0d4ccb873bda0d249f88132df08
> 28b86844d6df02aabce911316ace32bba7e94cf8dffdec77669 to
> /builds/slave/talos-slave/cached/public_html/AVDs-armv7a-android-2.3.7_r1-
> build-2013-11-13-ubuntu.tar.gz!
> 
> Tooltool needs the armv7a AVDs uploaded, as mentioned in
> https://bugzilla.mozilla.org/show_bug.cgi?id=919812#c47

I've uploaded them.
I've re-triggered the jobs:
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%202.3
After a few rounds, starting to see emulator executing:

https://tbpl.mozilla.org/php/getParsedLog.php?id=31461342&tree=Ash&full=1

Still not quite working fully. I'm attempting to disable the GPU, see if that's the problem.

What kind of host is this running on, incidentally?

Comment 135

5 years ago
(In reply to Graydon Hoare :graydon from comment #134)
> After a few rounds, starting to see emulator executing:
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31461342&tree=Ash&full=1
> 
> Still not quite working fully. I'm attempting to disable the GPU, see if
> that's the problem.
> 
> What kind of host is this running on, incidentally?

We're using EC2's m1.medium
I _think_ what's going wrong now is that the system needs libgl1-mesa-dev installed on it (the .deb package) and it's not coming online with that present. Any idea how I can modify the AMI that we boot into?
Yeah, I injected a command into the script to check the test environment and it doesn't have libgl1-mesa-dev, just libgl1-mesa-glx https://tbpl.mozilla.org/php/getParsedLog.php?id=31601507&tree=Ash&full=1

This means it doesn't have the symlink /usr/lib/x86_64-linux-gnu/mesa/libGL.so set up, and the emulator relies on that in order to load GL. We can either install that package or run `sudo ln -s`. Any preference? I don't imagine cltbld is in sudoers.
I want to double-check that you are confirming the system really doesn't have it, as opposed to the emulator just not seeing it. We run Mesa LLVM Pipe in order to get our unit tests running on ec2 for linux, but all attempts for emulators seem to ignore that library; we have suspected it to be a bug in the emulator in the past.

Comment 139

5 years ago
(In reply to Graydon Hoare :graydon from comment #137)
> Yeah, I injected a command into the script to check the test environment and
> it doesn't have libgl1-mesa-dev, just libgl1-mesa-glx
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31601507&tree=Ash&full=1
> 
> This means it doesn't have the symlink
> /usr/lib/x86_64-linux-gnu/mesa/libGL.so set up, and the emulator relies on
> that in order to load GL. We can either install that package or run `sudo ln
> -s`. Any preference? I don't imagine cltbld is in sudoers.

Can you have a look at the EC2 VM you got?
We might need to include it in the PATH before we trigger the emulator.
I can look into it later today if you need me to.
(In reply to Joel Maher (:jmaher) from comment #138)

> I want to double checking that you are making sure the system doesn't have
> it vs what the emulator sees.  We run Mesa LLVM Pipe in order to get our
> unit tests running on ec2 for linux, but all attempts for emulators seem to
> ignore that library- we have suspected it to be a bug in the emulator in the
> past.

I do not know if this is the sole issue causing things to hang, but it's the next-most-obvious thing for me to look at. The emulator complains (at least when it gets this far) about being unable to load libGL.so. Specifically, take a look at this log: https://tbpl.mozilla.org/php/getParsedLog.php?id=31601507&tree=Ash&full=1

Note the lines:

19:24:32     INFO - emulator: Initializing hardware OpenGLES emulation support
19:24:32     INFO - Failed to load libGL.so
19:24:32     INFO - error libGL.so: cannot open shared object file: No such file or directory

These are looking for a file called libGL.so, which (I am guessing) the emulator is manually dlopen()'ing. It turns out that on Ubuntu, libGL.so (which should just be a symlink to libGL.so.1 and from there to libGL.so.1.2) is contained in libgl1-mesa-dev. As my script just checked (by enumerating 'dpkg -l'), that package is not installed; only libgl1-mesa-glx is, which contains libGL.so.1 and libGL.so.1.2 but not the one final symlink libGL.so.
(To clarify further, I suspect the failure here has only to do with the difference in filename; firefox runs fine because it links to libGL.so.1 or libGL.so.1.2, the emulator is being strange in trying to dlopen() libGL.so alone. The mesa libraries are installed, they just lack a single symlink that the emulator is looking for.)
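
For anyone wanting to repeat the check on another host, here is a minimal sketch of it, not the script I actually ran. It assumes the Mesa libraries sit in one directory (the default path is the one mentioned above), and the helper names are made up for illustration.

```python
import os

def missing_unversioned_link(filenames, stem="libGL.so"):
    """True when versioned copies of `stem` exist but the bare name does not."""
    names = set(filenames)
    has_versioned = any(n.startswith(stem + ".") for n in names)
    return has_versioned and stem not in names

def check_directory(path="/usr/lib/x86_64-linux-gnu/mesa"):
    # Assumed default path, taken from the comment above; adjust per host.
    try:
        return missing_unversioned_link(os.listdir(path))
    except OSError:
        return None  # directory not present, so nothing can be concluded
```

A host with only libgl1-mesa-glx installed would be expected to report True here, since it has libGL.so.1 and libGL.so.1.2 but no bare libGL.so.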

Comment 142

5 years ago
Thanks for digging further into it!

Have you figured out what differs from what you were running locally on your EC2 test machine? Did you add the symlink manually? I'm trying to figure out what is different.

Should I create a fresh VM and see if we hit the same issue that we're hitting in production?

Updated

5 years ago
Depends on: 843100
Yes, I seem to have added the symlink locally on the loaner instance, and/or installed the missing package. Must have done so back when I was first poking around at this bug. If you could reset the loaner EC2 machine to the exact state it will come up in, in production, I can try again to close any remaining gaps in script functionality without tampering with the host. Sorry for the slow pace, seems there's an almost endless list of little snags :(

Note that in bug 843100 and on IRC I discussed with :rail the idea of installing the libgl1-mesa-dev .deb to the EC2 machines, generally, I think via puppet. He sounded willing to do so there, so perhaps wait for his move before resetting the loaner?
Ugh, that one package brings in dozens of other dependencies, I don't want to perturb all of the releng machines for just a single symlink. I'm going to try hacking around it via a different strategy. Armen: if you can reset the loaner, I'll try a few other approaches there. Thanks.

Comment 146

5 years ago
(In reply to Graydon Hoare :graydon from comment #144)
> Ugh, that one package brings in dozens of other dependencies, I don't want
> to perturb all of the releng machines for just a single symlink. I'm going
> to try hacking around it via a different strategy. Armen: if you can reset
> the loaner, I'll try a few other approaches there. Thanks.

Killing the host. I will have to re-create it.

Updated

5 years ago
No longer depends on: 843100

Comment 147

5 years ago
I'm re-creating it re-using the DNS records and the IP assigned to the old machine (since it takes less time).

(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ invtool search -q "$HOST (type=:A OR type=:PTR)"
ldap username: armenzg@mozilla.com
ldap password: 
20765 dev-tst-linux64-ec2-graydon.test.releng.use1.mozilla.com. None IN  A    10.134.56.91
21972 91.56.134.10.in-addr.arpa.               3600 IN  PTR  dev-tst-linux64-ec2-graydon.test.releng.use1.mozilla.com.

(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ bug=910092
(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ user=graydon
(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ slavetype=dev-tst-linux64-ec2
(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ host=$slavetype-$user
(aws-ve)[buildduty@cruncher.srv.releng.scl3 aws]$ ip=10.134.56.91
FWIW: re-ran on Ash in the meantime, with only a single emulator (not 4) and only running a single testsuite (the first reftest batch). Still freezes up / fails to respond:

https://tbpl.mozilla.org/php/getParsedLog.php?id=31776751&tree=Ash&full=1

So I think that rules out my simpler hunch about it just being overloaded / resource-exhausted.
Today:

  - I re-checked the tarball to confirm that it has hardware GL enabled in its AVD .ini files (it does).
  - I disabled all tests but xpcshell tests which apparently run without working GL. They do not work.
  - Armen reset the loaner machine to a clean state so I could interactively diagnose.
  - On activation of the test script, I see SUTAgent, watcher and fennec all running in the emulator.
  - However, no log output occurs and the emulator screen is blank.
  - adb logcat shows this promising result:

W/dalvikvm( 463): threadid=1: thread exiting with uncaught exception (group=0x40015560)
E/GeckoAppShell( 463): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 1 ("main")
E/GeckoAppShell( 463): org.mozilla.gecko.gfx.GLController$GLControllerException: No available EGL configurations Error 12288
E/GeckoAppShell( 463): at org.mozilla.gecko.gfx.GLController.AttemptPreallocateEGLSurfaceForCompositor(GLController.java:301)
E/GeckoAppShell( 463): at org.mozilla.gecko.gfx.GLController.updateCompositor(GLController.java:186)
E/GeckoAppShell( 463): at org.mozilla.gecko.gfx.GLController$1.run(GLController.java:168)
E/GeckoAppShell( 463): at android.os.Handler.handleCallback(Handler.java:587)
E/GeckoAppShell( 463): at android.os.Handler.dispatchMessage(Handler.java:92)
E/GeckoAppShell( 463): at android.os.Looper.loop(Looper.java:130)
E/GeckoAppShell( 463): at android.app.ActivityThread.main(ActivityThread.java:3683)
E/GeckoAppShell( 463): at java.lang.reflect.Method.invokeNative(Native Method)
E/GeckoAppShell( 463): at java.lang.reflect.Method.invoke(Method.java:507)
E/GeckoAppShell( 463): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
E/GeckoAppShell( 463): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
E/GeckoAppShell( 463): at dalvik.system.NativeStart.main(Native Method)

This suggests the loaner has no EGL support (EGL being a related, but different, API from GL). I will now attempt to (very carefully!) add packages to the loaner in order to enable EGL (libegl1-mesa is the missing package, but it brings with it libegl1-mesa-drivers, libgbm1, libopenvg1-mesa, libwayland0 and libxcb-xfixes0).
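
A side note on the number in that exception: 12288 is 0x3000, which in the standard EGL headers is EGL_SUCCESS. So it may be that the config query completed without an EGL-level error and simply matched zero configs. A small sketch for decoding these codes follows; the table is the standard EGL error enumeration, and the function name is hypothetical.

```python
# Error codes as defined in the standard EGL header (egl.h).
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}

def egl_error_name(code):
    """Map a numeric EGL error (as printed in a log) to its symbolic name."""
    return EGL_ERRORS.get(code, "unknown (0x%X)" % code)

print(egl_error_name(12288))
```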
Attempts at installing EGL packages and GL-ES packages (also missing) didn't help. Uninstalled all dependent packages to get the loaner back to (hopefully) baseline. Still not working.

Two unfortunate discoveries make me newly concerned:

  1. The page https://developer.android.com/tools/devices/emulator.html says:

         To enable graphics acceleration at runtime for an AVD:
         If you are running the emulator from the command line, just include the -gpu on
         option:
             emulator -avd <avd_name> -gpu on
         Note: You must specify an AVD configuration that uses Android 4.0.3 (API Level 15,
         revision 3) or higher system image target. Graphics acceleration is not available 
         for earlier system images.

  2. I'm not able to reproduce "running firefox on arm-gingerbread (2.3) with GL turned on"
     on my local system. I can run it with GL turned off, but then the content area is blank;
     or I can run it on arm-jb (4.2) with GL turned on.

#2 is particularly worrying. I _thought_ (really, would have bet money) that I tested that back when I repacked these 3 most recent AVDs with GL turned on, back on 2013-11-13 (comment 101). But it's possible I only confirmed full GL functionality on the 4.2 images. If so, we might be out of luck here.

I will continue to try reproducing that config, meanwhile CC'ing :snorp as suggested on IRC for possible guidance. In particular, I cannot tell if the statement from the android developer page means that GL APIs are never present inside a 2.3 AVD, or whether those GL APIs simply won't implement _accelerated_ GL by shipping a command stream to the host mesa libraries outside the emulator.
After some discussion with snorp on IRC:

  - 2.3 does support some OpenGL APIs in-the-emulator, but possibly not all or the correct ones.

    - It has a /system/lib/libGLESv2.so so supports some of GLES, maybe.

    - Release notes from google suggest that gingerbread itself supports the entire GLES2.0
      API https://developer.android.com/about/versions/android-2.3.html

    - Possibly portions of the goldfish kernel or other layers of the emulator lack support;
      at least some stackoverflow commenters hold this opinion:
      http://stackoverflow.com/questions/4455783/does-the-android-emulator-support-opengl-es-2-0

  - Firefox is still giving the "No available EGL configurations Error 12288" error, whether or not
    I run the emulator with -gpu on or off. The difference is that with -gpu off, I can see the
    browser shell and interact with it (but not see the content area). With -gpu on, I see a blank
    emulator screen from startup (but can interact with it programmatically, via adb and SUTAgent).

  - The goldfish emulator is a 16bpp framebuffer, possibly Firefox is requesting the wrong depth (24bpp);
    snorp recommends trying to force 16bpp in chooseConfig, in GLController.java
Sorry, sidetracked with some machine failures here. Quick check on the 2.3 machine shows the following GL implementation:

Android PixelFlinger 1.4 (15 extensions)
Version: OpenGL ES-CM 1.0
Shading Language: null
Renderer: Android PixelFlinger 1.4
Vendor: Android
EGL Vendor: Android
EGL Version: 1.4 Android META-EGL
Max texture size: 4096
Max texture cube map: 0
Max vertex attribs: 0
Max texture image units: 0
Max vertex texture image units: 0
Max shader binary formats: 0
Read color format: 6407
Read color type: 33635
Max render buffer size: 0
Max combined texture image units: 0
Max varying vectors: 0
Max vertex uniform vectors: 0
Max fragment uniform vectors: 0
Num compressed texture formats: 11
Max texture anisotropy: 0
EGL_KHR_image
EGL_KHR_image_base
EGL_KHR_image_pixmap
EGL_ANDROID_image_native_buffer
EGL_ANDROID_swap_rectangle
GL_OES_byte_coordinates
GL_OES_fixed_point
GL_OES_single_precision
GL_OES_read_format
GL_OES_compressed_paletted_texture
GL_OES_draw_texture
GL_OES_matrix_get
GL_OES_query_matrix
GL_OES_EGL_image
GL_OES_compressed_ETC1_RGB8_texture
GL_ARB_texture_compression
GL_ARB_texture_non_power_of_two
GL_ANDROID_user_clip_plane
GL_ANDROID_vertex_buffer_object
GL_ANDROID_generate_mipmap

TexFormat GL_PALETTE4_RGB8_OES
TexFormat GL_PALETTE4_RGBA8_OES
TexFormat GL_PALETTE4_R5_G6_B5_OES
TexFormat GL_PALETTE4_RGBA4_OES
TexFormat GL_PALETTE4_RGB5_A1_OES
TexFormat GL_PALETTE8_RGB8_OES
TexFormat GL_PALETTE8_RGBA8_OES
TexFormat GL_PALETTE8_R5_G6_B5_OES
TexFormat GL_PALETTE8_RGBA4_OES
TexFormat GL_PALETTE8_RGB5_A1_OES
TexFormat GL_ETC1_RGB8_OES

I'm going to have a go at trying to force a 16bpp config next. If anything in this list here looks unusual, I'm open to other suggestions!
Snorp points out that "OpenGL ES-CM 1.0" means the userspace probably isn't supporting GLES2, or isn't willing to support it on the goldfish emulator. I've attempted to force 16bpp in GLController.java and not met with success. Will investigate further options tomorrow.
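
For reference, a rough sketch of how one might read the version string mechanically. It assumes the usual GL_VERSION layout for ES drivers ("OpenGL ES-CM 1.x" for GLES1 common profile, "OpenGL ES 2.0 ..." for GLES2); the function names are hypothetical.

```python
import re

# Optional "-<profile>" suffix (e.g. "-CM"), then major.minor.
_GLES_VERSION = re.compile(r"OpenGL ES(?:-(\w+))?\s+(\d+)\.(\d+)")

def parse_gles_version(version_string):
    """Return (profile, major, minor), or None if the string is not an ES version."""
    m = _GLES_VERSION.search(version_string)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))

def has_gles2(version_string):
    """True only when the reported major version is 2 or higher."""
    parsed = parse_gles_version(version_string)
    return parsed is not None and parsed[1] >= 2
```

Fed the "Version:" line from the dump above, `has_gles2` returns False, which matches snorp's reading.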
Further news, none good:

  - While there are 8 EGL configs available for use if I enumerate them all, none have LOCAL_EGL_OPENGL_ES2_BIT, meaning GLES2 is not active in the gingerbread+goldfish userspace.

  - Discussing with snorp further, I have come to believe the problem lies beneath the java layer (the requisite java libraries are present) but above the kernel layer (since the same kernel and emulator are happy to be driven by the jellybean userspace). This leaves something in lower parts of the GL driver being ignorant of how to drive goldfish devices and/or lacking software fallback rendering for GLES2.

  - I attempted to just blindly _copy_ the GL drivers (the armv7a libGL*.so files) from newer emulators to the older ones, on the off chance this works. It doesn't; a bunch of unresolved symbols result.

  - Remaining possibilities I can think of involve attempts at backporting GL-driver code from 4.2 and rebuilding the 2.3 AVD. I'm not sure if this would take the form of a GL-pipe driver (serializing GL command streams to host mesa) or an in-emulator software fallback mode. I'm not actually sure android _ever_ shipped a full software fallback for GLES2, it looks like they introduced the GL-pipe mechanism around the same time as they started talking about the emulator supporting GLES2. In either case it seems like it'll be quite involved.
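
The ES2-bit check in the first bullet amounts to a bitmask test on each config's EGL_RENDERABLE_TYPE. A sketch of that test, assuming LOCAL_EGL_OPENGL_ES2_BIT mirrors the standard EGL_OPENGL_ES2_BIT value (the bit constants are the standard EGL ones; the function name and the sample values in the usage note are illustrative only, not captured from the emulator):

```python
# Standard EGL_RENDERABLE_TYPE bits from egl.h.
EGL_OPENGL_ES_BIT = 0x0001
EGL_OPENVG_BIT = 0x0002
EGL_OPENGL_ES2_BIT = 0x0004
EGL_OPENGL_BIT = 0x0008

def gles2_capable(renderable_types):
    """Return the indices of configs whose renderable type includes GLES2."""
    return [i for i, bits in enumerate(renderable_types)
            if bits & EGL_OPENGL_ES2_BIT]
```

With eight configs that only advertise GLES1, e.g. `gles2_capable([EGL_OPENGL_ES_BIT] * 8)`, the result is an empty list, which is the situation described above.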

In the nearer term, I think I would like to try arranging for our test automation to test 4.2 on arm emulation. That at least (I hope!) should work as well as the 4.2 on x86 does at this point. Would this be acceptable?

Comment 155

5 years ago
(In reply to Graydon Hoare :graydon from comment #154)
>
> In the nearer term, I think I would like to try arranging for our test
> automation to test 4.2 on arm emulation. That at least (I hope!) should work
> as well as the 4.2 on x86 does at this point. Would this be acceptable?

graydon thanks for the info.

I'm thinking that perhaps we want Android 4.2 on EC2 for scaling purposes (moving away from 4.0 on pandas where possible) and set Android 2.3 up on the pandas.

Android 2.2 on Tegras --> Android 2.3 on Pandas
Android 4.0 on Pandas --> Android 4.2 on EC2

gbrown, jgriffin: what do you think?
Flags: needinfo?(jgriffin)
Flags: needinfo?(gbrown)
> Android 2.2 on Tegras --> Android 2.3 on Pandas
> Android 4.0 on Pandas --> Android 4.2 on EC2

That sounds right to me.

Android 4.2 is likely our easiest option for running on emulator -- a good place to start. If that doesn't go very smoothly (takes as long as, say, Android x86 to get running cleanly), is there near-term capacity on the Pandas for running both Android 2.3 and Android 4.0?
Flags: needinfo?(gbrown)

Comment 157

5 years ago
(In reply to Geoff Brown [:gbrown] from comment #156)
> > Android 2.2 on Tegras --> Android 2.3 on Pandas
> > Android 4.0 on Pandas --> Android 4.2 on EC2
> 
> That sounds right to me.
> 
> Android 4.2 is likely our easiest option for running on emulator -- a good
> place to start. If that doesn't go very-smoothly (takes as long as, say
> Android x86, to get running cleanly), is there near-term capacity on the
> Pandas for running both Android 2.3 and Android 4.0?

We have 900 Pandas. The pool can be shared as we can re-image the Pandas programmatically before the test job starts.

Comment 158

5 years ago
In other words, yes, *I* believe we can.
However, if we want to be sure we might want to ask Callek as he knows the pool's capacity better than me.
Flags: needinfo?(bugspam.Callek)
Have we ever run 2.x on pandas?  How well do we understand the scope of work in getting that up and running smoothly?
Flags: needinfo?(jgriffin)
(Reporter)

Comment 160

5 years ago
It is a declared and locked in goal to eliminate any use of physical hardware in future testing, if at all possible. So please plan for as little hardware as possible and as much virtualization as possible. The pandas should die in 2014.

Comment 161

5 years ago
(In reply to Andreas Gal :gal from comment #160)
> It is a declared and locked in goal to eliminate any use of physical
> hardware in future testing, if at all possible. So please plan for as little
> hardware as possible and as much virtualization as possible. The pandas
> should die in 2014.

This is great news for me!
Aside from this bug.
Can I please get some help from you?
I need developers to help us run b2g reftests on EC2; however, I've had a hard time finding anyone to own it. I'm still poking people through email. I will add you to the thread.
You can see a status update here:
https://bugzilla.mozilla.org/show_bug.cgi?id=850105#c8
The specific bug (which is dependent) is bug 818968.
(Reporter)

Comment 162

5 years ago
We can absolutely help with that. Mike Lee is probably a good starting point since he is heavily focused on testing.
Oh, look what I found: https://android.googlesource.com/platform/development/+/4e6af749d5996edd6558821a7e400427f0457306

"emulator: opengl: Back-port GLES emulation from the master tree"

BRB rebuilding everything.

Comment 164

5 years ago
Yay!

No need to bother Callek with a needinfo request, as comment 162 takes the proposal in comment 155 off the table.
Flags: needinfo?(bugspam.Callek)
(In reply to Andreas Gal :gal from comment #160)
> It is a declared and locked in goal to eliminate any use of physical
> hardware in future testing, if at all possible. So please plan for as little
> hardware as possible and as much virtualization as possible. The pandas
> should die in 2014.

Can we have some further discussion about this? Given the diversity of Android devices (and one day, B2G as well), it doesn't seem prudent to perform all of our testing on hardware that doesn't exist in the real world. Our Tegra and Panda boards have caught real, hardware-specific bugs that would otherwise have gone undetected.

Comment 166

5 years ago
(In reply to Graydon Hoare :graydon from comment #163)
> Oh, look what I found:
> https://android.googlesource.com/platform/development/+/
> 4e6af749d5996edd6558821a7e400427f0457306
> 
> "emulator: opengl: Back-port GLES emulation from the master tree"
> 
> BRB rebuilding everything.

Graydon, so that I can be clear, this is rebuilding GLES support for 2.3?

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #165)
> (In reply to Andreas Gal :gal from comment #160)
> > It is a declared and locked in goal to eliminate any use of physical
> > hardware in future testing, if at all possible. So please plan for as little
> > hardware as possible and as much virtualization as possible. The pandas
> > should die in 2014.
> 
> Can we have some further discussion about this? Given the diversity of
> Android devices (and one day, B2G as well), it doesn't seem prudent to
> perform all of our testing on hardware that doesn't exist in the real world.
> Our Tegra and Panda boards have caught real, hardware-specific bugs that
> would otherwise gone undetected.

Andreas' goal is noble, but I don't think we are ever going to do performance testing in VMs (for one example). And so, I don't think that we are ever going to eliminate all our hardware. I'd like to see most of our testing in the cloud because that gives us maximum flexibility. But, I believe there will always be a set of tests that are required to run on real hardware.

To Jgriffin's point higher up, we've never successfully run 2.3 on pandaboards, and so I applaud Graydon's efforts to backport GLES if possible.
(In reply to Clint Talbert ( :ctalbert ) from comment #166)

> Graydon, so that I can be clear, this is rebuilding GLES support for 2.3?

It's rebuilding the 2.3 arm AVD with a backport of what google calls "GLES emulation". This means something quite specific. GLES1 has been supported in android for a long time. GLES2 is also officially supported ever since 2.3, at the API level and on most 2.3-running phones. But it assumes a driver for the board / chipset / phone it's running on. A backend of some sort.

However: in the _emulator_, which isn't real hardware at all, they initially did not ship a GLES2 backend. No fake GPU, no software fallback, nothing. So the APIs were there but they couldn't render: it boots but just shows a black screen unless you configure with `-gpu off` or `hw.gpu.enabled = no` in the .ini file.
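
To see which way a given AVD is configured, something along these lines could read the setting out of its .ini text. This is only a sketch: the parser is a simplified key = value reader, the function names are hypothetical, and treating a missing key as "off" is my assumption rather than documented emulator behaviour.

```python
def read_avd_settings(ini_text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def gpu_enabled(ini_text):
    # Assumption: an absent hw.gpu.enabled key is treated as disabled.
    value = read_avd_settings(ini_text).get("hw.gpu.enabled", "no")
    return value.lower() in ("yes", "true", "1")
```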

Sometime around 4.x they implemented this clever thing that serializes GLES2 API calls and ships them out of the emulator via a pipe, for rendering on the host's mesa drivers, at full host speed. This is very fast, no emulation at all. But they initially only implemented it in the userspace of the 4.x series SDKs. It was introduced somewhat vaguely with this blog post http://android-developers.blogspot.ca/2012/04/faster-emulator-with-better-hardware.html which has taken me a while to figure out the exact meaning of.

I found today that they later backported that driver to the gingerbread branch, but the AVDs I built before didn't contain that backport. So I'm rebuilding my gingerbread AVDs to include this driver. Hopefully it'll make the 2.3 emulator run with -gpu on. We'll see. Still rebuilding (android builds take a while and there were a few moving parts involved).

> To Jgriffin's point higher up, we've never successfully run 2.3 on
> pandaboards, and so I applaud Graydon's efforts to backport GLES if possible.

Not my efforts! If it works it'll entirely be due to David Turner at google :)
(Reporter)

Comment 168

5 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #165)
> (In reply to Andreas Gal :gal from comment #160)
> > It is a declared and locked in goal to eliminate any use of physical
> > hardware in future testing, if at all possible. So please plan for as little
> > hardware as possible and as much virtualization as possible. The pandas
> > should die in 2014.
> 
> Can we have some further discussion about this? Given the diversity of
> Android devices (and one day, B2G as well), it doesn't seem prudent to
> perform all of our testing on hardware that doesn't exist in the real world.
> Our Tegra and Panda boards have caught real, hardware-specific bugs that
> would otherwise gone undetected.


You just gave the best reason why testing on specific hardware doesn't make sense: the thousands and thousands of other hardware configurations we don't test. There is real opportunity cost to testing on real hardware. We are paying hundreds of thousands of dollars every year in hardware, infrastructure and people cost. Money we could use to hire more engineers.

There will always be a few things we need hardware for, but those should be isolated corner cases and not part of our automation, especially not at scale.
At this time, we have tegras moving from MTV to SCL3 and planned to retire by end of 2014.
We have 893 pandas moving from scl1 to scl3 by early summer per the needs/requests of Hal/Clint/Catlee/Bob.
We are not holding any plans to expand the mobile boards in 2014.
If that is not the assumption, we need to know this asap for budgeting and DC space planning.

Note that there are about 122 r3 mac minis that are still used for some b2g testing on Fedora in scl1 planned as of now to move to scl3.
(Reporter)

Comment 170

5 years ago
We should not use any mac minis for b2g testing. That's nuts. Let's meet with the team and move that onto something else (ec2 for example).

Comment 171

5 years ago
(In reply to Andreas Gal :gal from comment #170)
> We should not use any mac minis for b2g testing. Thats nuts. Lets meet with
> the team and move that onto something else (ec2 for example).

Bug 818968 is where Mike Lee will be giving a hand.
Update: 

Re-initialized repo to the 'gingerbread' branch head (confirmed as containing the backport), then updated external/qemu directory to 'aosp/tools_r17' (the most recent tools_* release after GLES2 emulation was added), then rebuilt with 'export BUILD_EMULATOR_OPENGL=true' and 'export BUILD_EMULATOR_OPENGL_DRIVER=true'.

This produces a ... mixed result. Not quite working yet, but progress. The emulator binary accepts -gpu on/off (which it did not previously) and the GB userspace contains the encoder-based GLES2 libraries that previously only existed in the JB userspace:

/system/lib/libEGL.so
/system/lib/libGLESv1_CM.so
/system/lib/libGLESv1_enc.so
/system/lib/libGLESv2.so
/system/lib/libGLESv2_enc.so

Moreover the emulator does boot, and the userspace doesn't crash. However, logcat tells a sad story:

D/gralloc_goldfish(   65): Emulator without GPU emulation detected.
D/libEGL  (   65): Emulator without GPU support detected. Fallback to software renderer.
D/libEGL  (   65): loaded /system/lib/egl/libGLES_android.so
I/SurfaceFlinger(   65): EGL informations:
I/SurfaceFlinger(   65): # of configs : 8
I/SurfaceFlinger(   65): vendor    : Android
I/SurfaceFlinger(   65): version   : 1.4 Android META-EGL
I/SurfaceFlinger(   65): extensions: EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_ANDROID_image_native_buffer EGL_ANDROID_swap_rectangle 
I/SurfaceFlinger(   65): Client API: OpenGL ES
I/SurfaceFlinger(   65): EGLSurface: 5-6-5-0, config=0x0
I/SurfaceFlinger(   65): OpenGL informations:
I/SurfaceFlinger(   65): vendor    : Android
I/SurfaceFlinger(   65): renderer  : Android PixelFlinger 1.4
I/SurfaceFlinger(   65): version   : OpenGL ES-CM 1.0
...
E/GeckoAppShell(  320): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 1 ("main")
E/GeckoAppShell(  320): org.mozilla.gecko.gfx.GLController$GLControllerException: No available EGL configurations Error 12288
E/GeckoAppShell(  320):         at org.mozilla.gecko.gfx.GLController.AttemptPreallocateEGLSurfaceForCompositor(GLController.java:312)

Same story. It's a gralloc that knows it _should_ be finding GLES2 emulation in its containing qemu (good!) but it's not finding it (bad!). Something's still not enabled, or working, or something. I'll continue investigation.
Slightly more progress. Fiddled with broken build flags and subsequent testing environment on my workstation to the point where I can get:

I/SurfaceFlinger(   65): SurfaceFlinger is starting
I/SurfaceFlinger(   65): SurfaceFlinger's main thread ready to run. Initializing graphics H/W...
D/        (   65): HostConnection::get() New Host Connection established 0xeabd8, tid 73
D/libEGL  (   65): loaded /system/lib/egl/libGLES_android.so
D/libEGL  (   65): loaded /system/lib/egl/libEGL_emulation.so
D/libEGL  (   65): loaded /system/lib/egl/libGLESv1_CM_emulation.so
D/libEGL  (   65): loaded /system/lib/egl/libGLESv2_emulation.so
E/EGL_emulation(   65): rcMakeCurrent returned EGL_FALSE
E/EGL_emulation(   65): tid 73: eglMakeCurrent(984): error 0x3006 (EGL_BAD_CONTEXT)
E/libEGL  (   65): call to OpenGL ES API with no current context (logged once per thread)
E/libEGL  (   65): call to OpenGL ES API with no current context (logged once per thread)
E/libEGL  (   65): call to OpenGL ES API with no current context (logged once per thread)
E/libEGL  (   65): call to OpenGL ES API with no current context (logged once per thread)

Followed by surfaceflinger crashing and restarting perpetually. So it is now at the point where it thinks it has GLES2 emulation on the host and has suitable machinery to contact it, but that machinery is failing. I will try a few additional builds at different emulator versions. So far no go with tools_r22 and tools_r18. Retrying tools_r17 in fixed environment next.
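
Since I keep eyeballing logcat for the same two markers, here is a rough sketch of a classifier for them. The marker strings are taken from the logs pasted in this bug; the function name and the three labels are made up for illustration. Note that "emulation" only means the emulation library was loaded, not that it works, as the EGL_BAD_CONTEXT failure above shows.

```python
def classify_gl_backend(logcat_lines):
    """Return 'software', 'emulation', or 'unknown' from libEGL log lines."""
    saw_emulation_lib = False
    for line in logcat_lines:
        # libEGL prints this when it gives up on host GPU emulation.
        if "Fallback to software renderer" in line:
            return "software"
        # libEGL prints 'loaded <path>' for each driver library it picks up.
        if "loaded" in line and "libGLESv2_emulation.so" in line:
            saw_emulation_lib = True
    return "emulation" if saw_emulation_lib else "unknown"
```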
Further progress, by copying the aosp/out/host/linux-x86/lib/*.so files from the aosp host build directory and adding them to LD_LIBRARY_PATH as well, I'm able to get it to boot with enough GLES2 to run fennec:

I/SurfaceFlinger(   65): SurfaceFlinger is starting
I/SurfaceFlinger(   65): SurfaceFlinger's main thread ready to run. Initializing graphics H/W...
D/        (   65): HostConnection::get() New Host Connection established 0xeabd8, tid 73
D/libEGL  (   65): loaded /system/lib/egl/libGLES_android.so
D/libEGL  (   65): loaded /system/lib/egl/libEGL_emulation.so
D/libEGL  (   65): loaded /system/lib/egl/libGLESv1_CM_emulation.so
D/libEGL  (   65): loaded /system/lib/egl/libGLESv2_emulation.so
I/SurfaceFlinger(   65): EGL informations:
I/SurfaceFlinger(   65): # of configs : 120
I/SurfaceFlinger(   65): vendor    : Android
I/SurfaceFlinger(   65): version   : 1.4 Android META-EGL
I/SurfaceFlinger(   65): extensions: EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_ANDROID_image_native_buffer EGL_ANDROID_swap_rectangle 
I/SurfaceFlinger(   65): Client API: OpenGL ES
I/SurfaceFlinger(   65): EGLSurface: 8-8-8-8, config=0xd
I/SurfaceFlinger(   65): OpenGL informations:
I/SurfaceFlinger(   65): vendor    : Google
I/SurfaceFlinger(   65): renderer  : OpenGL ES-CM 1.1
I/SurfaceFlinger(   65): version   : OpenGL ES-CM 1.1
I/SurfaceFlinger(   65): extensions: GL_OES_blend_func_separate GL_OES_blend_equation_separate GL_OES_blend_subtract GL_OES_byte_coordinates GL_OES_compressed_paletted_texture GL_OES_point_size_array GL_OES_point_sprite GL_OES_single_precision GL_OES_stencil_wrap GL_OES_texture_env_crossbar GL_OES_texture_mirored_repeat GL_OES_EGL_image GL_OES_element_index_uint GL_OES_draw_texture GL_OES_texture_cube_map GL_OES_draw_texture GL_OES_read_format GL_OES_framebuffer_object GL_OES_depth24 GL_OES_depth32 GL_OES_fbo_render_mipmap GL_OES_rgb8_rgba8 GL_OES_stencil1 GL_OES_stencil4 GL_OES_stencil8 GL_OES_packed_depth_stencil GL_EXT_texture_format_BGRA8888 GL_APPLE_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture 
I/SurfaceFlinger(   65): GL_MAX_TEXTURE_SIZE = 8192
I/SurfaceFlinger(   65): GL_MAX_VIEWPORT_DIMS = 8192
I/SurfaceFlinger(   65): flags = 00010000
...
W/GeckoGLController(  330): GLController::updateCompositor with mCompositorCreated=false
I/Gecko   (  330): Attempting load of libEGL.so
D/        (  330): HostConnection::get() New Host Connection established 0x27fd98, tid 390
W/GeckoGLController(  330): GLController::compositorCreated
E/GeckoConsole(  330): OpenGL compositor Initialized Succesfully.
E/GeckoConsole(  330): Version: OpenGL ES 2.0
E/GeckoConsole(  330): Vendor: Google
E/GeckoConsole(  330): Renderer: OpenGL ES 2.0
E/GeckoConsole(  330): FBO Texture Target: TEXTURE_2D
W/GeckoGLController(  330): done GLController::updateCompositor
E/GeckoConsole(  330): Adding HealthReport:RequestSnapshot observer.
W/GeckoGLController(  330): GLController::serverSurfaceChanged(1024, 791)
D/GeckoLayerClient(  330): Window-size changed to (1024,791)
W/GeckoGLController(  330): GLController::resumeCompositor(1024, 791) and mCompositorCreated=true

It is quite crashy in this config, but that may easily be either the immaturity of the tools release I'm using or the behavior of the nvidia proprietary driver on my workstation. I'll keep looking for a more stable config. Either way, it looks like the "can't run on gingerbread" hurdle may be clearable after all.
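
For anyone trying to reproduce this, the LD_LIBRARY_PATH step could be scripted roughly as follows. This is only a sketch: the helper name `emulator_env` is made up for illustration, and prepending the directory (rather than appending it) is an assumption, not something verified against the emulator.

```python
import os


def emulator_env(host_lib_dir, base_env=None):
    """Return a copy of the environment with host_lib_dir first on LD_LIBRARY_PATH.

    emulator_env is a hypothetical helper, not part of any existing script.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Assumption: prepend, so the AOSP-built host libraries are found first.
    env["LD_LIBRARY_PATH"] = host_lib_dir + (os.pathsep + existing if existing else "")
    return env


if __name__ == "__main__":
    # Directory named in the comment above; adjust to the local checkout.
    print(emulator_env("aosp/out/host/linux-x86/lib")["LD_LIBRARY_PATH"])
```

The returned dict can be passed as `env=` to `subprocess.Popen` when launching the emulator binary.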
Wow, nice. Is the emulator crashing or fennec?
Mostly fennec, once it took down the emulator too. Also iffy about actually displaying anything in the content area. But it's this strange behavior I've seen before wherein it loads and executes content, and thumbnails of the content show up in the recent-pages tab, it just ... doesn't quite seem able to flush the visual out to the host display. So if it were stable I suspect it'd be adequate for running some tests. I'm going to poke around the tools_rNN tags to see if I can find a release that works any better. Time being I've archived these kinda-working tools_r17-based binaries as:

  http://people.mozilla.org/~graydon/AVDs-armv7a-android-2.3.7_r1-build-2013-12-19-ubuntu-tools_r17-working.tar.gz
  http://people.mozilla.org/~graydon/emulators-armv7a-android-2.3.7_r1-build-2013-12-19-ubuntu-tools_r17-working.tar.gz
  http://people.mozilla.org/~graydon/libs-android-2.3.7_r1-build-2013-12-19-ubuntu-tools_r17-working.tar.gz
Update: tools_r22 doesn't want to build in combination with gingerbread head at all; too much stuff has moved around since. tools_r18 does and appears to work about as well as tools_r17, or a bit worse. Boots and runs fennec, but reliably crashes the emulator mid-pageload, in:

#0  0xf7ded290 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1  0xf58b8d36 in GLESbuffer::setBuffer (this=0x809c1e20, size=96, usage=35040, data=0x0)
    at development/tools/emulator/opengl/host/libs/Translator/GLcommon/GLESbuffer.cpp:28
#2  0xf58b7de9 in GLEScontext::setBufferData (this=0x8c703638, target=34962, size=96, data=0x0, usage=35040)
    at development/tools/emulator/opengl/host/libs/Translator/GLcommon/GLEScontext.cpp:447
#3  0xf58a345c in glBufferData (target=34962, size=96, data=0x0, usage=35040)
    at development/tools/emulator/opengl/host/libs/Translator/GLES_V2/GLESv2Imp.cpp:306
#4  0xf7905c4c in gl2_decoder_context_t::decode (this=0x8c700934, buf=0x8437a008, len=44, stream=0x98800670)
    at out/host/linux-x86/obj/STATIC_LIBRARIES/libGLESv2_dec_intermediates/gl2_dec.cpp:190
#5  0xf79029d8 in RenderThread::Main (this=0x98801580)
    at development/tools/emulator/opengl/host/libs/libOpenglRender/RenderThread.cpp:122
#6  0xf79138f5 in osUtils::Thread::thread_main (p_arg=0x98801580)
    at development/tools/emulator/opengl/shared/OpenglOsUtils/osThreadUnix.cpp:83
#7  0xf7fa1d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#8  0xf7da2bae in clone () from /lib/i386-linux-gnu/libc.so.6

Will do more comparative submodule-version archaeology tomorrow, see if I can get this under control.
Update: I've spent a couple days trying builds with different combinations of tags from different points-in-time around the backport. All variants so far show either failure-to-build or else the same null buffer error as in comment #177. I will continue to search this space for a little while longer.

Meanwhile I have emailed the google engineer who did the backport to ask about hints / blessed tag combinations / possible configuration settings I'm missing. Haven't heard back yet.

I may also try manually debugging the failure in terms of gdb and stepping through GL calls, though I imagine that will eat a lot of time and carry even less certain rewards. Other suggestions welcome.
Update: After a frustrating and unsuccessful week trying various combinations of sub-project tags and revisions within the AOSP build, I heard back from the google engineer responsible for GL emulation, and have received a few hints and encouragement that there ought to be a configuration that works, as well as a general offer of some Q&A / assistance getting it working. Trying a couple new things this week.

It might be worth figuring out when/where to draw a line on this quest; I don't know when our schedule demands we have something working / what plan B is. Non-tegra hardware?
Plan B is to stand up Android 2.3 testing on pandas, but that will have its own set of problems and will result in us not being able to deprecate pandas for correctness tests, so we'd like to avoid plan B if we can!
Update: In further consultation with the google engineer, he suggested that if the backport is failing to talk to a newer emulator, it is probably due to a mismatch in the gralloc layer's rendercontrol protocol. Today I manually backported the relevant changes to rendercontrol and have just built a gingerbread AVD that can speak to a modern emulator, which behaves well. Am presently running armv7a-fennec with host-accelerated GL on it.

IOW, it works now, bug "fixed". I'll post a patch momentarily.
Created attachment 8361456 [details] [diff] [review]
aosp-gingerbread-GL-backport-rendercontrol-fixes.patch

patch against aosp 'gingerbread' development branch (that carries the GL emulation backport) permitting communication with newer emulators with more functional GL host rendering.
Working AVDs from my workstation:

http://people.mozilla.com/~graydon/AVDs-armv7a-gingerbread-build-2014-01-16-graydon.tar.gz

Will do a cleaner build on AWS host tomorrow.
Graydon: awesome work!
(In reply to Graydon Hoare :graydon from comment #181)
> Update: In further consultation with the google engineer, he suggested that
> if the backport is failing to talk to a newer emulator, it is probably due
> to a mismatch in the gralloc layer's rendercontrol protocol. Today I
> manually backported the relevant changes to rendercontrol and have just
> built a gingerbread AVD that can speak to a modern emulator, which behaves
> well. Am presently running armv7a-fennec with host-accelerated GL on it.
> 
> IOW, it works now, bug "fixed". I'll post a patch momentarily.

This is freaking amazing.
Graydon: That's fabulous! Congrats!!

To "manage expectations", is it fair to say, "IOW, we can now run armv7a-fennec in an emulator running Android 2.3 hosted on AWS, and we are ready to try running the test suites in that environment"?
(In reply to Geoff Brown [:gbrown] from comment #186)
> Graydon: That's fabulous! Congrats!!
> 
> To "manage expectations", is it fair to say, "IOW, we can now run
> armv7a-fennec in an emulator running Android 2.3 hosted on AWS, and we are
> ready to try running the test suites in that environment"?

I believe so. I'll do another rebuild now on AWS, and should probably modify the AVD-building script to apply the patch automatically when building gingerbread (I don't really want to think about hosting a fork of AOSP just for this). Alternatively I might be able to convince upstream to take the patch.

Anyway, yes, broadly speaking I _think_ this puts android-2.3-armv7a in a similar position to where we were with android-4.2-armv7a back in comment #150, a month ago. Assuming I didn't imagine it. I'll post back here when I have confirmed a fresh build.
Confirmed, rebuilt by script, seems to work on my second desktop here (rendering to nvidia GL hardware; laptop at home was intel I think).

I added the requisite changes to the build script to automatically select appropriate revisions and apply the patch (downloaded for the time being from bugzilla itself; suggestions welcome for other locations). Diff to the build script is temporarily here: https://github.com/graydon/build-mozharness/commit/24e1430d5cc497e41f0ff8f6fcf19751480bc79b but I'll rebase and post for review here, land on mozharness trunk in a bit.

Meanwhile fresh AVDs built from script are here, should be able to take them out for a spin in testing:
http://people.mozilla.com/~graydon/AVDs-armv7a-gingerbread-build-2014-01-16-graydon.tar.gz
Oops! That should be:

http://people.mozilla.com/~graydon/AVDs-armv7a-gingerbread-build-2014-01-17-ubuntu.tar.gz

(Note date and user difference, sorry.)
Bug 961284 filed to deploy gingerbread AVDs to testing.
Created attachment 8361989 [details] [diff] [review]
various gingerbread-related updates to the AVD-building mozharness script

There are a few changes here. I can split them apart if you want but I don't think they're terribly tricky:

  - Change a bunch of exception levels to FATAL (they need to be, were wrong)

  - Add support for patching an AOSP tree before building it.

  - Modify tag selection to default to 'gingerbread' (the development branch)
    when the user asks for a 2.3.x version, unless they specify a tag as well.

  - Update to a qemu with partial GL-pipe support when building 'gingerbread'
    (tools_r17), continue to use tools_r12 for other 2.3.x tags.

  - Build with BUILD_EMULATOR_OPENGL=true and BUILD_EMULATOR_OPENGL_DRIVER=true
    in the environment.

  - Apply the remainder of the GL-backport patch (the bit I wrote yesterday
    to fix the rendercontrol implementation) when building 'gingerbread'.

I suspect the only contentious part of this is the bit where we download a patch from bugzilla and apply it. I'm happy to put the patch in some "more safe" location for longer-term storage, perhaps in the mozharness repo as well or such, but for now this seemed like as good a place as any (it's already hosted here).
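
The selection rules in the list above can be summarized in a few lines. This is a rough sketch of the logic as described, not the code in the patch; `select_build_settings` and the dictionary keys are hypothetical names chosen for illustration.

```python
def select_build_settings(android_version, tag=None):
    """Sketch of the tag / emulator-tools selection described above.

    Only 2.3.x versions are covered; the function name and the returned
    keys are illustrative and do not come from the mozharness script.
    """
    is_2_3 = android_version == "2.3" or android_version.startswith("2.3.")
    if not is_2_3:
        raise ValueError("this sketch only covers 2.3.x versions")
    if tag is None:
        # Default to the development branch that carries the GL backport.
        tag = "gingerbread"
    gingerbread = tag == "gingerbread"
    return {
        "tag": tag,
        # tools_r17 has partial GL-pipe support; other 2.3.x tags keep tools_r12.
        "emulator_tools": "tools_r17" if gingerbread else "tools_r12",
        # The rendercontrol fix is only applied to the 'gingerbread' branch.
        "apply_gl_patch": gingerbread,
        "env": {
            "BUILD_EMULATOR_OPENGL": "true",
            "BUILD_EMULATOR_OPENGL_DRIVER": "true",
        },
    }
```

For example, `select_build_settings("2.3.7")` picks the `gingerbread` branch with `tools_r17`, while passing an explicit tag such as `android-2.3.7_r1` falls back to `tools_r12` with no patch.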
Attachment #8361989 - Flags: review?(aki)
Comment on attachment 8361989 [details] [diff] [review]
various gingerbread-related updates to the AVD-building mozharness script

First off, thanks for all your work!


pyflakes says:

scripts/android_emulator_build.py:17: 'ERROR' imported but unused
scripts/android_emulator_build.py:257: undefined name 'FATAL'
scripts/android_emulator_build.py:260: undefined name 'FATAL'
...

That's fixable by changing the import on this line to s,ERROR,FATAL,
http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/android_emulator_build.py#l17

>   - Add support for patching an AOSP tree before building it.

We've tried this elsewhere.
This will probably cause problems if we ever want to do depend builds (i.e. non-clobber), but it appears we always clobber so this may work.

> I suspect the only contentious part of this is the bit where we download a
> patch from bugzilla and apply it. I'm happy to put the patch in some "more
> safe" location for longer-term storage, perhaps in the mozharness repo as
> well or such, but for now this seemed like as good a place as any (it's
> already hosted here).

I'm not entirely sure if this will cause issues or not.
I think we already have bugzilla as part of some production processes, so it's not introducing a new dependency.

>         if platform.system() != "Linux":
>-            self.exception("this script only works on (ubuntu) linux")
>+            self.exception("this script only works on (ubuntu) linux", level=FATAL)

I wonder if you mean self.fatal(); exception() is meant to dump the contents of a caught exception, here, and below.
fatal() is meant to output a message and exit.

Also, is there any cleanup that needs to be done before we exit?
android_emulator_unittest.py has a _post_fatal() defined that kills any emulator processes, not sure if there's anything like that.
http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/android_emulator_unittest.py#l327
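
To make the distinction concrete, here is a toy illustration of the two behaviours and of a cleanup hook that runs before exit. `MiniScript` is a stand-in written for this comment; it is not mozharness's implementation, and the signatures are simplified assumptions.

```python
import sys
import traceback


class MiniScript:
    """Stand-in for a script base class; NOT mozharness's actual code."""

    def __init__(self):
        self.cleaned_up = False

    def exception(self, message):
        # Intended for use inside an `except` block: dump the caught traceback.
        sys.stderr.write(message + "\n" + traceback.format_exc())

    def _post_fatal(self):
        # Cleanup hook, e.g. the place to kill stray emulator processes.
        self.cleaned_up = True

    def fatal(self, message, exit_code=1):
        # Output the message, run cleanup, then exit.
        sys.stderr.write("FATAL: " + message + "\n")
        self._post_fatal()
        raise SystemExit(exit_code)
```

With this shape, a platform check would call `fatal("this script only works on (ubuntu) linux")` and never reach the build steps, while `exception()` is reserved for reporting something that was actually caught.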

I think these changes are ok, given the above fixes.


I also *think* I remember this script is only run locally?
I noticed that we're referencing internet URLs:
http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/android_emulator_build.py#l244
http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/android_emulator_build.py#l290
http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/android_emulator_build.py#l401

If we want to run this in production, this will be problematic:
a) sheriffs will hate this because the build can start burning when unknown external dependencies become unavailable.
b) (a) will make this build less reliable or predictable than otherwise
c) there are plans to close off the build network from accessing the outside world, as much as possible, to avoid the above two issues, as well as opsec reasons.  This script will break if/when that happens.

If not, it shouldn't be an issue.  It might be advisable for us to make a mirror of the needed deb repos, though, so we're not up a creek if the needed packages go away while we still need them.
Attachment #8361989 - Flags: review?(aki) → review+
(In reply to Aki Sasaki [:aki] from comment #192)

> pyflakes says:
> 
> scripts/android_emulator_build.py:17: 'ERROR' imported but unused
> scripts/android_emulator_build.py:257: undefined name 'FATAL'
> scripts/android_emulator_build.py:260: undefined name 'FATAL'
> ...
> 
> That's fixable by changing the import on this line to s,ERROR,FATAL,
> http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/
> android_emulator_build.py#l17
> 
> >   - Add support for patching an AOSP tree before building it.
> 
> We've tried this elsewhere.
> This will probably cause problems if we ever want to do depend builds (i.e.
> non-clobber), but it appears we always clobber so this may work.

It's pretty much always a clobber. Though I'll send the patch upstream as well, with luck they'll incorporate it.

> I'm not entirely sure if this will cause issues or not.
> I think we already have bugzilla as part of some production processes, so
> it's not introducing a new dependency.

Ok.

> I wonder if you mean self.fatal(); exception() is meant to dump the contents
> of a caught exception, here, and below.
> fatal() is meant to output a message and exit.

Oh, ok. I've replaced all the exception() calls with fatal() then, and removed both ERROR and FATAL imports.

> Also, is there any cleanup that needs to be done before we exit?
> android_emulator_unittest.py has a _post_fatal() defined that kills any
> emulator processes, not sure if there's anything like that.
> http://hg.mozilla.org/build/mozharness/file/d416937ec90a/scripts/
> android_emulator_unittest.py#l327

Hm, yes, I suppose it'd be a bit helpful to kill any stray emulators along the way. Usually this happens because of the explicit supervision of the emulator process by the script. But I'll add in an extra kill_processes() to be sure.

> I also *think* I remember this script is only run locally?

Yes, definitely. It's a rare task -- should only be run ever again if we want to get updated SUTagent / watcher binaries into the AVDs, or find some other flaw in the AVD images -- and likely only done on a dedicated workstation or VM. Takes hours and chews up 20+ GB of workspace for a full build.

> If not, it shouldn't be an issue.  It might be advisable for us to make a
> mirror of the needed deb repos, though, so we're not up a creek if the
> needed packages go away while we still need them.

Mhm, I wondered if it might be wise for me to archive either the debs or (since the oracle JDK is actually one of the dependencies, and it doesn't work with openJDK) maybe even a full VM or container image of the ubuntu environment I am using to build these. Earlier in this bug I was attempting to do the build in docker images for this reason (also its image-caching is helpful in avoiding costly repeat-steps). I'd be willing to go back to try that again now, or just make an AMI of the AWS machine I've been using (currently it's a dormant instance sitting on EBS).

Thanks for the review, landed version with changes:

https://hg.mozilla.org/build/mozharness/rev/f78c69c7bc26
Bug for landing the rendercontrol backport upstream on the gingerbread dev branch:

https://code.google.com/p/android/issues/detail?id=65044
mozharness merged to production
status update: pushed registration of new AVDs: 

http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness/rev/448925cc7fa8

Ran attempted build, got timeouts:

https://tbpl.mozilla.org/php/getParsedLog.php?id=33350189&tree=Ash&full=1

Reproduced locally: SUTAgent is having trouble starting up (again). Appears that maybe the sdcard is not being mounted properly. Investigating locally.
Update: the Vold (volume daemon, an automounter of sorts) on the system is failing to parse /etc/vold.fstab, because on the system image in the built AVD, that file doesn't exist. An older format of file, /etc/vold.conf, exists instead, but this was supposed to be phased out with froyo. This in turn is causing the sdcard to not mount.

Now attempting to figure out why this wound up in the AVD. Possibly has to do with using built system.img file rather than one of the ones from the platform-api-versioned image directories.
Further update: it appears to have something to do with the different build targets for AOSP and the different image files they produce. There are multiple recipes for building system.img and some of them pull in obsolete files for /etc from the development/data dir, others from system/core/rootdir. I thought I might have found the correct combination, but not quite yet. Likely soon.
Update: applying this changeset:

https://android.googlesource.com/platform/external/sqlite.git/+/bd4d2cb1791515d77d7663b8c670b64ab9b2fd0b

and rebuilding with `lunch sdk-eng` and `make sdk` rather than `lunch full-eng` and `make` produces AVDs of substantially different content (including the proper vold.fstab, not the obsolete vold.conf), and when I run them locally the sdcard mounts and SUTAgent comes up relatively soon thereafter.

Turns out the default target uses some relatively stale build rules and the sdk-related targets are the only ones that see much testing. None of this is documented anywhere and the sdk-eng lunch combo isn't even in the menu of options :(
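
Since the combo is not in the lunch menu, it may help to spell out the invocation. The sketch below only builds the command line; `aosp_build_command` is a hypothetical helper, and it assumes the usual AOSP layout where `build/envsetup.sh` defines `lunch` as a shell function (which is why everything has to run inside one bash process).

```python
def aosp_build_command(lunch_combo="sdk-eng", make_target="sdk"):
    """Return an argv list that sets up the AOSP env, picks a combo and builds.

    Hypothetical helper for illustration; run it from the top of the AOSP tree.
    """
    make_step = ("make " + make_target).strip()
    # `lunch` only exists after sourcing envsetup.sh, so chain the steps with &&.
    script = "source build/envsetup.sh && lunch %s && %s" % (lunch_combo, make_step)
    return ["bash", "-c", script]
```

`aosp_build_command()` gives the working sdk build, and `aosp_build_command("full-eng", "")` gives the default-target build that produced the stale vold.conf.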

New AVD is up at 
http://people.mozilla.com/~graydon/AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz

sha512 is 7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a9196042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b
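
A quick way to check a downloaded copy against that digest, as a minimal sketch using only the standard library (the function names are just for illustration):

```python
import hashlib

# Digest quoted above for the 2014-01-23 gingerbread AVD tarball.
EXPECTED_SHA512 = (
    "7140e026b7b747236545dc30e377a959b0bdf91bb4d70efd7f97f92fce12a919"
    "6042503124b8df8d30c2d97b7eb5f9df9556afdffa0b5d9625008aead305c32b"
)


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large tarball need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's sha512 matches expected_hex (case-insensitive)."""
    return sha512_of(path) == expected_hex.strip().lower()
```

Usage would be `verify("AVDs-armv7a-gingerbread-build-2014-01-23-ubuntu.tar.gz", EXPECTED_SHA512)` after downloading the tarball.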

Reopening bug 961284 to deploy to tooltool. I'll modify the build script accordingly (if this works).
Created attachment 8364687 [details] [diff] [review]
aosp-gingerbread-sqlite-readline-disable.patch

patch to AOSP external/sqlite to be able to build sdk-eng product
Created attachment 8364753 [details] [diff] [review]
bug-910092-teach-build-script-about-sqlite-patch-and-sdk-make-and-lunch-targets.patch

Couple additional modifications to the emulator build script to handle most recent roadblock (build sdk target, apply patch required to make that work).
Attachment #8364753 - Flags: review?(aki)
Comment on attachment 8364753 [details] [diff] [review]
bug-910092-teach-build-script-about-sqlite-patch-and-sdk-make-and-lunch-targets.patch

>         [["--patch"], {
>             "dest": "patch",
>-            "help": "'dir=url' of patch to apply to AOSP before building (eg. development=http://foo.com/bar.patch; default inferred)",
>+            "help": "'dir=url' comma-separated list of patches to apply to AOSP before building (eg. development=http://foo.com/bar.patch; default inferred)",
>         }],

If you wanted to make self.config['patch'] auto-split on commas, you could use the type "extend":
http://hg.mozilla.org/build/mozharness/file/108a5bcff97c/mozharness/base/config.py#l51
I think the weakness of 'extend' is there's no way to remove the default from the list; it currently only allows appending, so if you went that route you might have to use the value of 'None' as the current 'inferred'.  The current logic is fine; just pointing this out for future reference.
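
For illustration, manual parsing of the comma-separated 'dir=url' form might look like the sketch below. This is not the code from the patch; `parse_patch_option` is a hypothetical name, and it assumes URLs contain no commas.

```python
def parse_patch_option(value):
    """Split 'dir=url,dir=url' into a list of (dir, url) tuples.

    Hypothetical helper; assumes URLs themselves contain no commas.
    """
    patches = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so query strings in the URL survive.
        directory, sep, url = item.partition("=")
        if not sep or not directory or not url:
            raise ValueError("expected 'dir=url', got %r" % item)
        patches.append((directory, url))
    return patches
```

Splitting on the first `=` matters for Bugzilla attachment URLs, which typically carry an `id=` query parameter.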
Attachment #8364753 - Flags: review?(aki) → review+
Update: after deploying the new AVDs, emulators start and SUTAgent is found, but tests still appear to be timing out / not executing:

https://tbpl.mozilla.org/php/getParsedLog.php?id=33478753&tree=Ash&full=1

I'm not sure if this is a real error or simply due to overloading slow slaves. Attempts to reproduce locally run into differences between the test env and my workstation; I should probably switch back to debugging on a loaner AWS host as in bug 915177.
There's lots of good news in that log:
 - emulators running
 - sutagent and watcher running
 - fennec installed and started
 - normal "activity" pings being received by sutagent

But there's no sign of tests running: The logcats show 60 minutes of test inactivity. I don't know what to make of that.


Do you think this is something to worry about:

13:03:52     INFO - failed to create drawable
13:03:52     INFO - Renderer error: failed to create/resize pbuffer!!

?
(In reply to Geoff Brown [:gbrown] from comment #204)
> There's lots of good news in that log:
>  - emulators running
>  - sutagent and watcher running
>  - fennec installed and started
>  - normal "activity" pings being received by sutagent

Yeah! Steps toward victory :)

> Do you think this is something to worry about:
> 
> 13:03:52     INFO - failed to create drawable
> 13:03:52     INFO - Renderer error: failed to create/resize pbuffer!!

It might be. I don't know what component is generating that error, but I don't see it in any of my logs when I run locally. I'll try running under loopback Xvfb and see if I can reproduce it.
So, um, bad news is that inside "normal" xvncserver and Xvfb+x11vnc, I get:

  Failed to create Context 0x3005
  emulator: WARNING: Could not initialize OpenglES emulation, using software renderer.

Good news is this appears to not have anything to do with the custom AVD, the SDK emulator does the same thing on self-made AVDs (including x86) and others have reported it online. So I am encouraged!

I've tried manually starting Xvfb with '+extension GLX' and this works somewhat better, but still fails with 'eglMakeCurrent failed'. I imagine I'm not quite setting up the Xvfb correctly. Any hints?

Comment 208

5 years ago
I don't recommend using Xvfb (also mentioned in comment 54). Xorg with a dummy driver is the way to go. x11vnc can be used to take a peek into what's going on.
Status update: I got a loaner releng AWS machine set up again (bug 964473) and am VNC'ed into it. When I run the full 4-emulator mozharness unit test it seems to time out or lock up. Like I get some ANRs on the emulators from the system process, simply because it's running too slowly.

When I run a 1-emulator mozharness unit test, running (say) just the first mochitest-1 set, it runs successfully. As in: I am sitting watching it run each mochitest page through VNC, and it's working (browser up, pages cycling, green test:pass bar across top of page, etc.)

I will try 2- and 3-emulator tests tomorrow, but at this point I get the feeling the technical roadblocks are all gone and this is just a matter of the AWS machine being underpowered for a 4-emulator test.
Run completed, took a little over an hour to do just the mochitest-1 batch:

https://people.mozilla.org/~graydon/2014-01-28-armv7-gingerbread-log_raw.log

It might run faster if I disconnect from VNC. I'll also try that tomorrow.
(In reply to Graydon Hoare :graydon from comment #209)
> I will try 2- and 3-emulator tests tomorrow, but at this point I get the
> feeling the technical roadblocks are all gone and this is just a matter of
> the AWS machine being underpowered for a 4-emulator test.

That makes sense. We set up 4 emulators at a time for Android x86, which runs on in-house physical machines with multiple cores.
(In reply to Graydon Hoare :graydon from comment #210)
> Run completed, took a little over an hour to do just the mochitest-1 batch:
> 
> https://people.mozilla.org/~graydon/2014-01-28-armv7-gingerbread-log_raw.log
> 
> It might run faster if I disconnect from VNC. I'll also try that tomorrow.

That looks great!

There is a 1 hour time limit for jobs, which you ran into here:

 56975 INFO TEST-PASS | /tests/content/html/content/test/forms/test_validation.html | Invalid event should not have been handled
 TEST-UNEXPECTED-FAIL | /tests/content/html/content/test/forms/test_validation.html | application ran for longer than allowed maximum time
 INFO | automation.py | Application ran for: 1:02:06.583435

If they don't run faster post-VNC, we should just increase the number of mochitest chunks.
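
As a back-of-the-envelope check on how much more chunking that would take, here is a small sketch. The 0.8 headroom factor is an arbitrary assumption, and since the run above was cut off at the limit, the observed time is only a lower bound on the real duration.

```python
import math


def parse_hms(text):
    """Convert 'H:MM:SS.ffffff' (as printed by the harness) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def split_factor(observed_seconds, limit_seconds=3600, headroom=0.8):
    """How many pieces to split one job into so each fits under the limit.

    headroom leaves slack below the hard limit; 0.8 is an assumed value.
    """
    budget = limit_seconds * headroom
    return max(1, math.ceil(observed_seconds / budget))
```

With the 1:02:06.583435 run above this gives a factor of 2, i.e. at least twice as many mochitest chunks, assuming the tests split evenly by time.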

Comment 213

5 years ago
(In reply to Graydon Hoare :graydon from comment #207)
> So, um, bad news is that inside "normal" xvncserver and Xvfb+x11vnc, I get:
> 
>   Failed to create Context 0x3005
>   emulator: WARNING: Could not initialize OpenglES emulation, using software
> renderer.
> 
> Good news is this appears to not have anything to do with the custom AVD,
> the SDK emulator does the same thing on self-made AVDs (including x86) and
> others have reported it online. So I am encouraged!
> 
> I've tried manually starting Xvfb with '+extension GLX' and this works
> somewhat better, but still fails with 'eglMakeCurrent failed'. I imagine I'm
> not quite setting up the Xvfb correctly. Any hints?

What would this mean to our current AWS test machines?
What changes would be needed?

graydon, on another note, could you please land your mozharness changes to http://hg.mozilla.org/build/mozharness and http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness?
I believe most of your patches are already reviewed positively so they can land at anytime.

Aki's user repo is shared by multiple people and it seems to not have your latest changes on the current tip (as it branched).
If we land to the official mozharness repo, it will be likely that people will keep the changes when pushing to aki's ash-mozharness user repo.

I'm interested in seeing how things are shaping up on untouched production AWS machines.
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%202.3

Comment 214

5 years ago
FYI, your loaner is an m1.medium instance.
Yesterday, we started running jobs on m3.medium instances instead of m1.medium.
Future loaners will be m3.medium instances.
We can get you an m3.medium instance if that helps.

From my reading, we are not running *tests* on EC2 on multi-core instances. [1]
My apologies, I should have noticed this earlier (I came from the Android x86 project where we have 4-cores on in-house machines).

Do we want to run Android 2.3 on 1-core machines by running 1 emulator and have more jobs?
Or work on having a second class of EC2 test instances that would have more cores?
The former is easy to go towards to.
The latter would require a bunch of work and hopefully having rail with spare cycles (which might not be at this moment) or go a bit slower through me.

Please let me know which way to proceed.

We probably also want to have a quick financial analysis at which approach is more cost-efficient (since this year we're big about cost-efficiency).
https://aws.amazon.com/ec2/pricing/
m3.medium (1 core)    $0.113 per Hour (current type)
m3.xlarge (4 cores)   $0.450 per Hour (3.982300885 times the cost of m3.medium)
m1.large  (4 cores)   $0.240 per Hour (2.123893805 times the cost of m3.medium - older generation)

FYI this pricing is for normal instances, however, we use a mix with spot instances which is dynamic (bid prices).
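
A rough way to compare the first two options is cost per emulator-hour. The sketch below uses the on-demand prices listed above and assumes (unverified) that a 4-core instance can run 4 emulators at the same speed that a 1-core instance runs 1.

```python
def cost_per_emulator_hour(price_per_hour, emulators):
    """On-demand price divided by the number of emulators run in parallel."""
    return price_per_hour / emulators


# Prices from the list above; emulator counts are the assumption being tested.
OPTIONS = {
    "m3.medium, 1 emulator": cost_per_emulator_hour(0.113, 1),
    "m3.xlarge, 4 emulators": cost_per_emulator_hour(0.450, 4),
}

if __name__ == "__main__":
    for name, cost in sorted(OPTIONS.items()):
        print("%-24s $%.4f per emulator-hour" % (name, cost))
```

Under that assumption the two come out nearly identical ($0.113 versus $0.1125), so the choice would mostly come down to setup effort rather than on-demand price. Spot pricing would change the absolute numbers.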

[1]
https://aws.amazon.com/ec2/instance-types/

Instance Type   vCPU   ECU   Mem. (GiB)   Storage (GB)
m1.medium       1      2     3.75         1 x 410
m3.medium       1      3     3.75         1 x 4 SSD
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #213)

> What would this mean to our current AWS test machines?
> What changes would be needed?

Apparently nothing. Things work fine on the loaner, however it's configured.

> graydon, on another note, could you please land your mozharness changes to
> http://hg.mozilla.org/build/mozharness and
> http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness?
> I believe most of your patches are already reviewed positively so they can
> land at anytime.

They're all in the ash-mozharness tree, they're just one of several heads, not tip. There are multiple heads and I didn't (and still don't) quite know how to go about merging them within ash-mozharness and/or propagating them to mozharness, without stepping on anyone's toes or messing up history.

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #214)

> Future loaners will be m3.medium instances.
> We can get you an m3.medium instance if that helps.

Possibly, I'm not really concerned with the instance we use. "Fast enough, but cheap" seems to be a good goal, no?

> Do we want to run Android 2.3 on 1-core machines by running 1 emulator and
> have more jobs?
> Or work on having a second class of EC2 test instances that would have more
> cores?
> The former is easy to go towards to.

Maybe do what's easiest, then? I don't know, I don't really feel like it's my position to make the cost/benefit tradeoffs associated with this. I have no idea what cost envelope we're trying to fit within, relative to the costs of running physical devices.

> We probably also want to have a quick financial analysis at which approach
> is more cost-efficient (since this year we're big about cost-efficiency).
> https://aws.amazon.com/ec2/pricing/
> m3.medium (1 core)    $0.113 per Hour (current type)
> m3.xlarge (4 cores)   $0.450 per Hour (3.982300885 times the cost of
> m3.medium)
> m1.large  (4 cores)   $0.240 per Hour (2.123893805 times the cost of
> m3.medium - older generation)
> 
> FYI this pricing is for normal instances, however, we use a mix with spot
> instances which is dynamic (bid prices).

I don't know what the plan is as far as transitioning to using spot instances, but I gather Taras has been running experiments with them. Fwiw, m1.mediums are currently hovering around $0.02/hr and even m3.xlarges are only $0.09/hr on the spot market, so generally about 1/5 the price.

Again, this feels like it's a bit outside my area of expertise, a question for a different person / different bug.

Comment 216

5 years ago
(In reply to Graydon Hoare :graydon from comment #215)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #213)
> 
> > What would this mean to our current AWS test machines?
> > What changes would be needed?
> 
> Apparently nothing. Things work fine on the loaner, however it's configured.
> 
Great!

> > graydon, on another note, could you please land your mozharness changes to
> > http://hg.mozilla.org/build/mozharness and
> > http://hg.mozilla.org/users/asasaki_mozilla.com/ash-mozharness?
> > I believe most of your patches are already reviewed positively so they can
> > land at anytime.
> 
> They're all in the ash-mozharness tree, they're just one of several heads,
> not tip. There are multiple heads and I didn't (and still don't) quite know
> how to go about merging them within ash-mozharness and/or propagating them
> to mozharness, without stepping on anyone's toes or messing up history.
> 
I tend to re-apply my patch on top of tip when my changes are unrelated to those of the other people who use the repository.
I sometimes ping them on IRC if I have conflicting changes.

> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #214)
> 
> > Future loaners will be m3.medium instances.
> > We can get you an m3.medium instance if that helps.
> 
> Possibly, I'm not really concerned with the instance we use. "Fast enough,
> but cheap" seems to be a good goal, no?
> 
Our current setup is cheap as-is. I believe we will have to run 1 emulator instead of 4 if we stick to our current 1-core instance.

> > Do we want to run Android 2.3 on 1-core machines by running 1 emulator and
> > have more jobs?
> > Or work on having a second class of EC2 test instances that would have more
> > cores?
> > The former is easy to go towards to.
> 
> Maybe do what's easiest, then? I don't know, I don't really feel like it's
> my position to make the cost/benefit tradeoffs associated with this. I have
> no idea what cost envelope we're trying to fit within, relative to the costs
> of running physical devices.
> 
I am only considering VM vs VM.
Whether Android x86 moves away from physical hardware is a different topic.

> > We probably also want to have a quick financial analysis at which approach
> > is more cost-efficient (since this year we're big about cost-efficiency).
> > https://aws.amazon.com/ec2/pricing/
> > m3.medium (1 core)    $0.113 per Hour (current type)
> > m3.xlarge (4 cores)   $0.450 per Hour (3.982300885 times the cost of
> > m3.medium)
> > m1.large  (4 cores)   $0.240 per Hour (2.123893805 times the cost of
> > m3.medium - older generation)
> > 
> > FYI this pricing is for normal instances, however, we use a mix with spot
> > instances which is dynamic (bid prices).
> 
> I don't know what the plan is as far as transitioning to using spot
> instances, but I gather Taras has been running experiments with them. Fwiw,
> m1.mediums are currently hovering around $0.02/hr and even m3.xlarges are
> only $0.09/hr on the spot market, so generally about 1/5 the price.
> 
> Again, this feels like it's a bit outside my area of expertise, a question
> for a different person / different bug.

Yeah. Correct. I was just collecting some of the info I could get my hands on.

Let's stay with the current setup; we can look at optimizing later and aim for four emulators per machine.
I will attach the patches.
Perhaps interesting: http://www.genymotion.com/

Comment 218

5 years ago
(In reply to Dan Mosedale (:dmose) from comment #217)
> Perhaps interesting: http://www.genymotion.com/

Looks like the performance boost there comes from using the x86 emulator,
Created attachment 8368248 [details] [diff] [review]
miscellaneous hunks not yet propagated to mozharness trunk
Attachment #8368248 - Flags: review?(gbrown)
Attachment #8368248 - Flags: review?(gbrown) → review+
All residual changes merged to mozharness trunk:

 https://hg.mozilla.org/build/mozharness/rev/e7fa438cddd9

Comment 221

5 years ago
Created attachment 8368546 [details] [diff] [review]
remove "sets" approach to android 2.3
Attachment #8368546 - Flags: review?(aki)
merged "something" to production
Attachment #8368546 - Flags: review?(aki) → review+

Comment 223

5 years ago
Comment on attachment 8368546 [details] [diff] [review]
remove "sets" approach to android 2.3

checked-in:
https://hg.mozilla.org/build/buildbot-configs/rev/669d5411a23b

This change will be live on Ash once we have a reconfiguration of the masters happening.
Probably on Monday/Tuesday.
A curious aside for future x86-emulation-in-cloud experiments:

Taras pointed out that the VMs on DigitalOcean come up with vmx bits set. That is, they seem to support _nested_ virtualization, for running KVM within KVM. I was looking at AWS and GCE back in comment 26 and comment 28 and neither of them supported nested virt; it's relatively new, and probably not many clients notice or care about its absence. Anyway, I instantiated a DigitalOcean "droplet" myself and brought up a nested KVM to see the perf cost, using the same highly dubious integer benchmark I used before.

You may recall the AWS numbers: 

          native:  5917.7 Kpos/sec  <-- outer VM, xen-on-metal
      qemu/kvm64:  1387.8 Kpos/sec  <-- inner VM, kvm-on-xen-on-metal

The numbers improve markedly on a DigitalOcean host with nested virt:

   droplet outer:   5015.7 Kpos/sec  <-- outer VM, kvm-on-metal
  droplet nested:   5434.5 Kpos/sec  <-- inner VM, kvm-on-kvm-on-metal

I'm assuming the fact that the nested VM is _faster_ is just measurement noise. But at the very least it's not factor-of-3 slower like in the AWS and GCE cases.
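
To put the two pairs of numbers side by side, the slowdown can be computed as outer-VM throughput divided by inner-VM throughput. A minimal sketch, using only the Kpos/sec figures listed above; the `slowdown` helper is a name made up for this example.

```python
# Throughput in Kpos/sec as (outer VM, inner VM), copied from the tables above.
AWS = (5917.7, 1387.8)       # xen-on-metal vs. kvm-on-xen-on-metal
DROPLET = (5015.7, 5434.5)   # kvm-on-metal vs. kvm-on-kvm-on-metal


def slowdown(outer_kpos, inner_kpos):
    """Ratio of outer to inner throughput; values above 1 mean the inner VM is slower."""
    return outer_kpos / inner_kpos


if __name__ == "__main__":
    print("AWS slowdown:     %.2fx" % slowdown(*AWS))
    print("droplet slowdown: %.2fx" % slowdown(*DROPLET))
```

For the AWS figures listed here the ratio works out to a bit over 4, while the droplet ratio is slightly below 1, which is consistent with the inner VM being within measurement noise of the outer one.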
Created attachment 8369543 [details] [diff] [review]
bug-910092-redundant-jdk-typo.patch

Just noticed this during code review: there is still a redundant entry for OpenJDK in the apt_get_dependencies step. It ought to be removed.
Attachment #8369543 - Flags: review?(gbrown)
Attachment #8369543 - Flags: review?(gbrown) → review+
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #223)
> This change will be live on Ash once we have a reconfiguration of the
> masters happening.

First results are available now. I will review those and try to turn them green!
Created attachment 8371014 [details] [diff] [review]
a couple minor residual fixes for the AVD builder script

While attempting to reproduce an AVD build from a clean Ubuntu 12.04 AWS machine, I ran into a minor error: the second make invocation (for sdcard), run after 'sdk', invalidates the 'lunch' config and triggers a clean target. The solution is just to move the sdcard call into the same shell command.
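
The idea of keeping both make invocations in one shell can be sketched like this. It is an illustration only, not the actual patch: the helper names, the lunch target, and the exact make target names are assumptions made for the example.

```python
import shlex
import subprocess


def single_shell_build_command(lunch_target, make_targets):
    """Join environment setup, lunch, and every make target into one command line.

    Everything runs in the same shell, so the environment that 'lunch' sets up
    is still in place for the later make invocations.
    """
    steps = ["source build/envsetup.sh", "lunch " + shlex.quote(lunch_target)]
    steps += ["make " + shlex.quote(target) for target in make_targets]
    return " && ".join(steps)


def run_build(aosp_dir, lunch_target, make_targets):
    """Run the combined command with bash from the AOSP checkout (hypothetical path)."""
    command = single_shell_build_command(lunch_target, make_targets)
    subprocess.run(["bash", "-c", command], cwd=aosp_dir, check=True)


if __name__ == "__main__":
    # "sdk-eng" is a placeholder lunch target chosen for the example.
    print(single_shell_build_command("sdk-eng", ["sdk", "sdcard"]))
```

Running two separate `subprocess.run` calls, one per make target, would start a fresh shell each time, which is the situation described above.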

Along the way I also moved the patch-aosp call into a separate mozharness step so that users can retrigger builds without attempting to re-patch (and failing).
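
The effect of splitting patching into its own step can be illustrated with a small stand-alone sketch. This is plain Python, not the mozharness API, and the step names other than patch-aosp are hypothetical.

```python
def select_actions(all_actions, skip=()):
    """Return the steps to run, in their original order, minus the skipped ones."""
    unknown = set(skip) - set(all_actions)
    if unknown:
        raise ValueError("unknown steps: %s" % ", ".join(sorted(unknown)))
    return [action for action in all_actions if action not in skip]


if __name__ == "__main__":
    # Hypothetical step list; only "patch-aosp" is named in the comment above.
    steps = ["checkout", "patch-aosp", "build"]
    # On a retrigger the tree is already patched, so that step is skipped.
    print(select_actions(steps, skip=["patch-aosp"]))
```

With patching as a separate step, a retriggered build can leave it out instead of trying to apply the same patch twice and failing.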
Attachment #8371014 - Flags: review?(gbrown)
Attachment #8371014 - Flags: review?(gbrown) → review+
in production

Updated

5 years ago
Blocks: 971176

Comment 231

5 years ago
Should we close this bug and green things up on bug 967704?

Comment 232

5 years ago
We're greening the tests up on Ash (bug 967704):
https://tbpl.mozilla.org/?tree=Ash&jobname=Android%202.3
Status: NEW → RESOLVED
Last Resolved: 5 years ago
No longer depends on: 967704
Resolution: --- → FIXED