Closed Bug 1405396 Opened 7 years ago Closed 7 years ago

Make Fennec B build use --with-gradle and turn Fennec Bg into Bng using --without-gradle

Categories

(Firefox Build System :: Android Studio and Gradle Integration, enhancement)

enhancement
Not set
normal

Tracking

(firefox58 fixed)

RESOLVED FIXED
mozilla58

People

(Reporter: nalexander, Assigned: nalexander)

References

(Blocks 2 open bugs)

Details

Attachments

(5 files, 2 obsolete files)

This ticket tracks "flipping the switch" to make --with-gradle the default in automation.

We want to keep the existing moz.build builds around, at least for a while.  I propose:

- to turn the existing Fennec B builds into Gradle builds that use --with-gradle
- to rename the Fennec Bg builds to Fennec Bng and turn them into non-Gradle builds that use --without-gradle

Re-purposing the existing B build task rather than switching to the existing Bg task should prevent unrelated bits of automation and unknown dependencies from requiring updates.  The existing Bg task is Tier-2, so there shouldn't be any issue repurposing it.

I'm not sure _exactly_ what work is involved here.  It probably just looks like:

- re-arranging mobile/android/config/**
- re-arranging testing/mozharness/configs/builds/releng_sub_android_configs/64_api_16_gradle.py
- updating taskcluster/ci/build/android.yml
We can do a lot of testing of this using Nightly builds triggered by try in TC.  These builds are signed with a CI signing key -- *not* the Nightly signing key -- but we can learn a lot about upgrading using them.  Callek tells me how to trigger them:

nalexander> Callek_cloud9: can you point me at the way to build Fennec Nightly (N), Nightly multilocale (if that's different), and Nightly single-locale repack builds in try now?  Thanks!
11:36 AM (Maybe this has gotten better with `mach try fuzzy`?  I haven't used that yet.)

<Callek_cloud9> nalexander: the best way is `./mach try fuzzy --full` and you can use that to select a Fennec Nightly and/or a Fennec single locale repack
<Callek_cloud9> e.g. type 'es-ES android'
<nalexander> Callek_cloud9: are those builds signed?  I have lost track of the signing details in TC these days.  With a Nightly key, I guess.  Or any non-Debug key.
<Callek_cloud9> nalexander: they are not, *but* they should be signable if you select the signed tasks :-)
if you are using the --full stuff
it would be using the "CI" key, so not the Nightly key for try
<nalexander> Callek_cloud9: oh, interesting.  Hints on the "signed tasks"?
<Callek_cloud9> nalexander: for fuzzy search params 'android signing nightly' -- for the single locale ones just type '<space>l10n' into the search and pick one
ignore checksums since that happens after the "push nightly to archive.m.o" stuff which doesn't work on try
A status update on this.  I've been pushing to oak and working through issues; the latest issue is _exactly_ the l10n issue that I expected to come up: see the logs at https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=138074989.

maliu: what's happening is that the l10n packager has put new stuff into the object directory and we're trying to run Gradle, but the Makefile knows that running Gradle requires resources to be updated, and the build system isn't in a position to do that.  (See the Comment 1/3 bits for more details.)  I have a push that's trying the same strategy -- .gradle.nodeps in the Makefile -- to keep it all working.

We'll be able to get this working, it just takes 2-3 hours for each cycle, since building these Oak Nightlies is slow :(  It's very difficult to work on this locally, 'cuz the automation steps are... opaque.
Assignee: nobody → nalexander
Status: NEW → ASSIGNED
Further status update: I have green oak Nightly builds... but we have resource corruption, of exactly the type one would expect if the l10n/nodeps mechanism wasn't working.  We either:

- understand and fix the mechanism
- figure out how to put the updated Gradle classes.dex into the APK

Fixing this will be tomorrow's problem.  It will start by figuring out:

- if _local_ builds are busted
- then if _automation_ en-US builds are busted
- then if _automation_ multi builds are busted

It's possible that Gradle's aapt just isn't deterministic...

Anyway, next step is to start comparing the resources tables of all the APKs in flight and try to get to the bottom of things.
Bogdan: Sorina: this is what we have to work through in order to have testable APKs for you.  Sorry that we're lagging!
Flags: needinfo?(sorina.florean)
Flags: needinfo?(bogdan.surd)
(In reply to Nick Alexander :nalexander from comment #6)
> Bogdan: Sorina: this is what we have to work through in order to have
> testable APKs for you.  Sorry that we're lagging!

Oh -- that didn't really require NI -- just wanted you looped in.
Flags: needinfo?(sorina.florean)
Flags: needinfo?(bogdan.surd)
Thanks for letting us know.
snorp: gbrown: I'm trying to schedule gv against x86 builds (that are created with Gradle in this ticket).  See jobs like https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=138552288.  The geckoview_example.apk looks well-formed and x86.  Can you interpret the logs and suggest why libmozglue can't be found on x86?  Thanks!
Flags: needinfo?(snorp)
Flags: needinfo?(gbrown)
(In reply to Nick Alexander :nalexander from comment #5)
> Further status update: I have green oak Nightly builds... but we have
> resource corruption, of exactly the type one would expect if the l10n/nodeps
> mechanism wasn't working.  We either:
> 
> - understand and fix the mechanism
> - figure out how to put the updated Gradle classes.dex into the APK
> 
> Fixing this will be tomorrow's problem.  It will start by figuring out:
> 
> - if _local_ builds are busted
> - then if _automation_ en-US builds are busted
> - then if _automation_ multi builds are busted
> 
> It's possible that Gradle's aapt just isn't deterministic...
> 
> Anyway, next step is to start comparing the resources tables of all the APKs
> in flight and try to get to the bottom of things.

I added debug logging of the R.txt files produced during the build, package, and multi-l10n package.  The results in https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=138545193 are pretty clear: the multi-l10n package simply generates a different R.txt, suggesting that (newer? Gradle?) aapt just isn't deterministic.

Now, we're producing a fresh classes.dex at the same time as we generate the R.txt, and we could include this fresh classes.dex into the multi-l10n packaged APK.  I wrote a comment to Max about doing this, but:

- now I can't find the comment!
- I recall concluding there was an issue with this approach, but I can't remember what it was!

It might have been a _future_ issue with multi-dexing, where we will need to include more than just classes.dex into the APK.  I'll keep digging and thinking about this.
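
The R.txt comparison described above could be automated along these lines. This is a minimal sketch, assuming the usual aapt layout of one `int <type> <name> <value>` entry per line; the function names `parse_r_txt` and `diff_r_txt` and the sample IDs are made up for illustration, and `int[] styleable` array entries are skipped to keep it short.

```python
def parse_r_txt(text):
    """Map (resource type, name) -> value for each 'int <type> <name> <value>' line."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip blank lines and 'int[] styleable ...' array entries for simplicity.
        if len(parts) == 4 and parts[0] == "int":
            _, res_type, name, value = parts
            table[(res_type, name)] = value.strip()
    return table


def diff_r_txt(before, after):
    """Return {(type, name): (old, new)} for IDs that differ or exist on one side only."""
    old, new = parse_r_txt(before), parse_r_txt(after)
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


if __name__ == "__main__":
    # Hypothetical sample data: the drawable ID shifts between the two runs.
    build = "int drawable icon 0x7f020001\nint string app_name 0x7f0a0001\n"
    multi = "int drawable icon 0x7f020002\nint string app_name 0x7f0a0001\n"
    for (res_type, name), (a, b) in diff_r_txt(build, multi).items():
        print(f"{res_type}/{name}: {a} -> {b}")
```

An empty result for the build, package, and multi-l10n package R.txt files would mean the IDs line up; any entry in the result is a resource whose ID moved between invocations.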
(In reply to Nick Alexander :nalexander from comment #10)
> (In reply to Nick Alexander :nalexander from comment #5)
> > Further status update: I have green oak Nightly builds... but we have
> > resource corruption, of exactly the type one would expect if the l10n/nodeps
> > mechanism wasn't working.  We either:
> > 
> > - understand and fix the mechanism
> > - figure out how to put the updated Gradle classes.dex into the APK
> > 
> > Fixing this will be tomorrow's problem.  It will start by figuring out:
> > 
> > - if _local_ builds are busted
> > - then if _automation_ en-US builds are busted
> > - then if _automation_ multi builds are busted
> > 
> > It's possible that Gradle's aapt just isn't deterministic...
> > 
> > Anyway, next step is to start comparing the resources tables of all the APKs
> > in flight and try to get to the bottom of things.
> 
> I added debug logging of the R.txt files produced during the build, package,
> and multi-l10n package.  The results in
> https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=138545193 are
> pretty clear: the multi-l10n package simply generates a different R.txt,
> suggesting that (newer? Gradle?) aapt just isn't deterministic.
> 
> Now, we're producing a fresh classes.dex at the same time as we generate the
> R.txt, and we could include this fresh classes.dex into the multi-l10n
> packaged APK.  I wrote a comment to Max about doing this, but:
> 
> - now I can't find the comment!

Aha, found it: https://hg.mozilla.org/projects/oak/rev/3a6e8fedc00a5c2930e0f2bc3077cd9beed5c5d2

I said:

"""
As a side note, this runs the entire Gradle process again, including
producing classes.dex.  (The moz.build process doesn't compile the
Java source code again.)  So in the future -- and possibly right away,
if there are issues -- we could copy the regenerated classes.dex into
the repacked apk and avoid the need for the R.java IDs to line up
across the |mach build| and |mach package| invocations.
"""

I think this approach will break single-locale l10n repacks and possibly multi-locale l10n packages because the l10n mozconfig doesn't have lots of keyfiles, like at:

http://searchfox.org/mozilla-central/search?q=l10n-nightly&path=

But those are only used for single-locale repacks.  If the only way to get working Gradle builds is to break single-locale repacks for "a while", I think we'd take that trade, but I'm not the only stake-holder.
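
For reference, "copy the regenerated classes.dex into the repacked apk" amounts to rewriting the zip with one entry swapped. The following is only a sketch of that idea using Python's standard `zipfile` module, not what the build system actually does; `replace_classes_dex` is a hypothetical helper. The stale signature files under META-INF/ are copied unchanged, so the output would still have to be re-signed (and realigned) before it could be installed.

```python
import io
import zipfile


def replace_classes_dex(apk_in, apk_out, new_dex_bytes):
    """Copy every entry of apk_in to apk_out, swapping in new_dex_bytes for classes.dex.

    apk_in and apk_out may be paths or file-like objects.  Each entry keeps its
    original compression type.  The result is NOT validly signed.
    """
    with zipfile.ZipFile(apk_in) as src, \
            zipfile.ZipFile(apk_out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "classes.dex":
                data = new_dex_bytes
            else:
                data = src.read(info.filename)
            dst.writestr(info, data)


if __name__ == "__main__":
    # Build a tiny stand-in "APK" in memory and swap its classes.dex.
    original = io.BytesIO()
    with zipfile.ZipFile(original, "w") as z:
        z.writestr("classes.dex", b"old dex")
        z.writestr("res/raw/example.txt", b"unchanged")
    original.seek(0)
    repacked = io.BytesIO()
    replace_classes_dex(original, repacked, b"new dex")
    with zipfile.ZipFile(repacked) as z:
        print(z.read("classes.dex"))
```

With multi-dexing, the same loop would have to handle every `classes*.dex` entry rather than the single file handled here.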
(In reply to Nick Alexander :nalexander from comment #9)
> geckoview_example.apk looks well-formed and x86.  Can you interpret the logs
> and suggest why libmozglue can't be found on x86?  Thanks!

This part of the logcat seems relevant:

10-20 13:11:48.990   722   735 F GeckoLoader: Couldn't load mozglue. Trying native library dir.
10-20 13:11:49.030   722   735 F GeckoLoader: Library doesn't exist when it should.
10-20 13:11:49.040   722   735 D dalvikvm: Trying to load lib /data/data/org.mozilla.geckoview_example/lib/libmozglue.so 0x41744e30
10-20 13:11:49.040   722   735 E dalvikvm: dlopen("/data/data/org.mozilla.geckoview_example/lib/libmozglue.so") failed: dlopen failed: library "/data/data/org.mozilla.geckoview_example/lib/libmozglue.so" not found
10-20 13:11:49.040   722   735 F GeckoLoader: Couldn't load /data/data/org.mozilla.geckoview_example/lib/libmozglue.so: java.lang.UnsatisfiedLinkError: dlopen failed: library "/data/data/org.mozilla.geckoview_example/lib/libmozglue.so" not found
10-20 13:11:49.060   722   735 D dalvikvm: Trying to load lib /data/app-lib/org.mozilla.geckoview_example/libmozglue.so 0x41744e30
10-20 13:11:49.060   722   735 E dalvikvm: dlopen("/data/app-lib/org.mozilla.geckoview_example/libmozglue.so") failed: dlopen failed: library "/data/app-lib/org.mozilla.geckoview_example/libmozglue.so" not found
10-20 13:11:49.060   722   735 F GeckoLoader: Couldn't load /data/app-lib/org.mozilla.geckoview_example/libmozglue.so: java.lang.UnsatisfiedLinkError: dlopen failed: library "/data/app-lib/org.mozilla.geckoview_example/libmozglue.so" not found
10-20 13:11:49.091   722   735 D dalvikvm: Trying to load lib /data/data/org.mozilla.geckoview_example/lib/libmozglue.so 0x41744e30
10-20 13:11:49.091   722   735 E dalvikvm: dlopen("/data/data/org.mozilla.geckoview_example/lib/libmozglue.so") failed: dlopen failed: library "/data/data/org.mozilla.geckoview_example/lib/libmozglue.so" not found
10-20 13:11:49.091   722   735 F GeckoLoader: Couldn't load /data/data/org.mozilla.geckoview_example/lib/libmozglue.so: java.lang.UnsatisfiedLinkError: dlopen failed: library "/data/data/org.mozilla.geckoview_example/lib/libmozglue.so" not found
10-20 13:11:49.110   722   735 W GeckoLoader: lib/armeabi-v7a/libmozglue.so not found in APK /data/app/org.mozilla.geckoview_example-1.apk
10-20 13:11:49.110   722   735 E GeckoLoader: Load diagnostics: LOAD mozglue: ABI: null, armeabi-v7a: Data: /data/data/org.mozilla.geckoview_example, ax=false, ddx=false, -1x=false, -2x=false, nativeLib: /data/app-lib/org.mozilla.geckoview_example-1, dirx=true, libx=false
10-20 13:11:49.110   722   735 W dalvikvm: threadid=11: thread exiting with uncaught exception (group=0x414c8700)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 51 ("Gecko")
10-20 13:11:49.120   722   735 E GeckoCrashHandler: java.lang.UnsatisfiedLinkError: Couldn't load mozglue from loader dalvik.system.PathClassLoader[DexPathList[[zip file "/system/framework/android.test.runner.jar", zip file "/data/app/org.mozilla.geckoview_example-1.apk"],nativeLibraryDirectories=[/data/app-lib/org.mozilla.geckoview_example-1, /system/lib]]]: findLibrary returned null
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at java.lang.Runtime.loadLibrary(Runtime.java:355)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at java.lang.System.loadLibrary(System.java:525)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at org.mozilla.gecko.mozglue.GeckoLoader.doLoadLibraryExpected(GeckoLoader.java:371)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at org.mozilla.gecko.mozglue.GeckoLoader.doLoadLibrary(GeckoLoader.java:397)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at org.mozilla.gecko.mozglue.GeckoLoader.loadMozGlue(GeckoLoader.java:458)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at org.mozilla.gecko.GeckoThread.initGeckoEnvironment(GeckoThread.java:256)
10-20 13:11:49.120   722   735 E GeckoCrashHandler: 	at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:369)

but I don't immediately see what's going wrong.
catlee: aki: I would normally start with Callek, but he's away until early November.

I recall that the l10n single-locale repacking tasks don't have access to the same secrets as the builders, hence things like http://searchfox.org/mozilla-central/source/mobile/android/config/mozconfigs/android-api-16/l10n-nightly#35-43.  It's very likely that I'll want the l10n single-locale repacking business to be able to actually build the Android bits again (for complicated reasons that are partially revealed in this ticket).  Can you suggest how I get the same secrets available to the l10n repacking tasks?

It looks like this is mostly in mozharness configs, perhaps in http://searchfox.org/mozilla-central/source/taskcluster/ci/l10n/kind.yml#83-84.

It looks like aki has already given me an oak-l10n configuration, at https://hg.mozilla.org/projects/oak/file/711b3ebc1f6a0492e7c321dd59bf253954d7040b/testing/mozharness/configs/single_locale/oak_android-api-16.py.  Thanks!

Since these secrets shouldn't depend on the branch, can I just add the secret configurations from

http://searchfox.org/mozilla-central/source/testing/mozharness/configs/builds/releng_base_android_64_builds.py#38-70

to

http://searchfox.org/mozilla-central/source/testing/mozharness/configs/single_locale/tc_android-api-16.py
Flags: needinfo?(catlee)
Flags: needinfo?(aki)
gbrown: yeah, I just can't make any sense out of the

10-20 13:11:49.110   722   735 E GeckoLoader: Load diagnostics: LOAD mozglue: ABI: null, armeabi-v7a: Data: /data/data/org.mozilla.geckoview_example, ax=false, ddx=false, -1x=false, -2x=false, nativeLib: /data/app-lib/org.mozilla.geckoview_example-1, dirx=true, libx=false

which sure looks like it thinks it's an ARM device.  Maybe the gv task is launching an ARM emulator and not an x86 emulator?
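
One way to rule out the APK itself is to list which ABI directories it packages and compare that with the `lib/armeabi-v7a/libmozglue.so not found in APK` line in the log. A minimal sketch, assuming only the standard APK layout of `lib/<abi>/<library>.so`; `native_abis` is a hypothetical helper and the in-memory archive stands in for the real geckoview_example.apk.

```python
import io
import zipfile


def native_abis(apk):
    """Return the set of ABI directory names found under lib/ in an APK (path or file object)."""
    abis = set()
    with zipfile.ZipFile(apk) as z:
        for name in z.namelist():
            parts = name.split("/")
            # Native libraries live at lib/<abi>/<library>.so inside the APK.
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                abis.add(parts[1])
    return abis


if __name__ == "__main__":
    # Stand-in for an x86-only APK.
    fake_apk = io.BytesIO()
    with zipfile.ZipFile(fake_apk, "w") as z:
        z.writestr("lib/x86/libmozglue.so", b"")
        z.writestr("classes.dex", b"")
    print(sorted(native_abis(fake_apk)))
```

If the APK only contains `x86` while the loader is looking for `armeabi-v7a`, the mismatch is on the device/emulator side rather than in the packaging.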
Aha! Just update:

https://hg.mozilla.org/mozilla-central/annotate/24d3adf28162/taskcluster/ci/test/misc.yml#l12

to use the x86 config for the x86 platform.
Flags: needinfo?(gbrown)
(In reply to Nick Alexander :nalexander from comment #13)
> catlee: aki: I would normally start with Callek, but he's away until early
> November.
> 
> I recall that the l10n single-locale repacking tasks don't have access to
> the same secrets as the builders, hence things like
> http://searchfox.org/mozilla-central/source/mobile/android/config/mozconfigs/
> android-api-16/l10n-nightly#35-43.  It's very likely that I'll want the l10n
> single-locale repacking business to be able to actually build the Android
> bits again (for complicated reasons that are partially revealed in this
> ticket).  Can you suggest how I get the same secrets available to the l10n
> repacking tasks?

That's not ideal, in that we're going to have separate resulting binaries. However, we stopped shipping single locale on beta or release, so that's possibly not as bad as it could be.

I believe the l10n repack tasks are on the same workerTypes, so they should have access to the same secrets. We would essentially switch out the mobile_l10n.py script to handle the new workflow; we may even need to switch this to be a new build type, or a multilocale build with a single locale, or something like that. Whatever script we use would have to use SecretsMixin, and get the secrets config somehow.

> It looks like this is mostly in mozharness configs, perhaps in
> http://searchfox.org/mozilla-central/source/taskcluster/ci/l10n/kind.yml#83-
> 84.
> 
> It looks like aki has already given me an oak-l10n configuration, at
> https://hg.mozilla.org/projects/oak/file/
> 711b3ebc1f6a0492e7c321dd59bf253954d7040b/testing/mozharness/configs/
> single_locale/oak_android-api-16.py.  Thanks!

np

> Since these secrets shouldn't depend on the branch, can I just add the
> secret configurations from
> 
> http://searchfox.org/mozilla-central/source/testing/mozharness/configs/
> builds/releng_base_android_64_builds.py#38-70
> 
> to
> 
> http://searchfox.org/mozilla-central/source/testing/mozharness/configs/
> single_locale/tc_android-api-16.py

It's possible. If you're switching to a new single locale build, you may want to go the other way and add optional single locale logic to the build. The main concern I'd have with copying secrets config would be if that prevented any local usage of the config files, but we may have de facto dropped that requirement years ago.
Flags: needinfo?(aki)
(In reply to Geoff Brown [:gbrown] from comment #16)
> Aha! Just update:
> 
> https://hg.mozilla.org/mozilla-central/annotate/24d3adf28162/taskcluster/ci/
> test/misc.yml#l12
> 
> to use the x86 config for the x86 platform.

Hi Geoff, I must have done this wrong.  Patch at

https://hg.mozilla.org/projects/oak/rev/11b7ab2c3c206915179080be13777e0d5a8d4c98

led to busted decision task

https://treeherder.mozilla.org/#/jobs?repo=oak&selectedJob=138639447

Can you suggest how to address this?
Flags: needinfo?(gbrown)
I think that *almost* worked, and you only need to cover off another case or two to make taskcluster happy.

I suggest:

         config:
             by-test-platform:
                 android-4.2-x86/opt:
                     - android/androidx86.py
                 default:
                     - android/androidarm_4_3.py

This might also work:

         config:
             by-test-platform:
                 android-4.2-x86/opt:
                     - android/androidx86.py
                 android.*:
                     - android/androidarm_4_3.py

Compare to https://dxr.mozilla.org/mozilla-central/rev/69e24c678a28dc46a4c1bda3ff04b2f6106ff71a/taskcluster/ci/test/xpcshell.yml#8, if that helps?
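
To illustrate why both variants cover the non-x86 platforms, here is a simplified model of how a `by-test-platform` block might be resolved: exact key first, then a single regular-expression match, then `default`. This is a sketch under those assumptions, not taskcluster's actual code; `resolve_by_test_platform` is a hypothetical name and the ARM platform string is only an example.

```python
import re


def resolve_by_test_platform(alternatives, test_platform):
    """Pick the value for test_platform: exact key, then one regex match, then 'default'."""
    if test_platform in alternatives:
        return alternatives[test_platform]
    matches = [key for key in alternatives
               if key != "default" and re.fullmatch(key, test_platform)]
    if len(matches) == 1:
        return alternatives[matches[0]]
    if len(matches) > 1:
        raise ValueError(f"multiple keys match {test_platform!r}: {matches}")
    if "default" in alternatives:
        return alternatives["default"]
    # With neither a matching key nor a default, resolution fails outright.
    raise KeyError(f"no key matches {test_platform!r} and there is no default")


if __name__ == "__main__":
    config = {
        "android-4.2-x86/opt": ["android/androidx86.py"],
        "default": ["android/androidarm_4_3.py"],
    }
    print(resolve_by_test_platform(config, "android-4.2-x86/opt"))
    print(resolve_by_test_platform(config, "android-4.3-arm7-api-16/opt"))
```

In this model, a block that lists only the x86 key raises for every other platform, which would be consistent with a patch that "almost worked" but still needed another case covered.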
Flags: needinfo?(gbrown)
bogdan: sorina: ok, we have finally produced some oak Nightly builds that are worth testing!  Can we do the first pass on the builds from 

https://treeherder.mozilla.org/#/jobs?repo=oak&revision=b9e7448fca87a1e6d9844f93862b318004ec80d4&filter-searchStr=Ns

It's a little unclear which file to use, since each of those Ns (Nightly, signed) jobs produces _two_ APKs, but you want the multi one (not the en-US one).  So, for ARM devices, you want

https://queue.taskcluster.net/v1/task/QU_2RJhzR0-we-jZdjavEA/runs/0/artifacts/public/build/target.apk

That should install over-top of a Google Play install -- I just did that myself -- and it seems to basically work.  You can see that it's an oak build by looking in about:buildconfig.

You'll need to install with |adb install -r -d ...| -- the "-r" replaces your existing install, and the "-d" allows a downgrade (since that build was produced on Friday last week).
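
If the install step needs scripting across several test devices, the command can be assembled like this. A small sketch only: `adb_install_command` is a hypothetical helper, `target.apk` stands for the downloaded artifact, and `-s <serial>` is the standard adb option for choosing a device.

```python
import shlex


def adb_install_command(apk_path, serial=None):
    """Build the adb argv for reinstalling apk_path, allowing a version downgrade."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # target a specific device when several are attached
    # -r replaces the existing install; -d allows a version-code downgrade.
    cmd += ["install", "-r", "-d", apk_path]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so no device is needed here.
    print(shlex.join(adb_install_command("target.apk")))
```

The returned list can be handed to `subprocess.run(..., check=True)` on a machine with adb on the PATH and a device attached.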

Thanks!
Flags: needinfo?(sorina.florean)
Flags: needinfo?(bogdan.surd)
Flags: needinfo?(catlee)
Hi Nick,

Thanks for letting us know, we have Beta 11 scheduled today, but will do a test run by the end of the day and let you know if we find any major issues.
Tomorrow we plan to spend all day on Gradle testing.
Flags: needinfo?(sorina.florean)
Flags: needinfo?(bogdan.surd)
Depends on: 1411686
Depends on: 1411688
No longer depends on: 1411688
See Also: → 1411688
Comment on attachment 8922030 [details]
Bug 1405396 - Pre: Configure geckoview test for x86.

https://reviewboard.mozilla.org/r/193022/#review198372
Attachment #8922030 - Flags: review?(gbrown) → review+
I've tested the APK from Treeherder. It has the correct MmaLeanplumImp (it reaches this line [1], so I can find the shared_pref), but I'm not sure whether it got the correct AccessKey/ClientId in MmaConstants [2] from Taskcluster.


[1] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/mobile/android/base/java/org/mozilla/gecko/mma/MmaDelegate.java#169
[2] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/mobile/android/base/MmaConstants.java.in#17
Comment on attachment 8922541 [details]
Bug 1405396 - Post: Run geckoview tests on all but non-Gradle Android builds.

https://reviewboard.mozilla.org/r/193620/#review199108

::: commit-message-f1083:6
(Diff revision 1)
> +Bug 1405396 - Post: Run geckoview tests on all but non-Gradle Android builds. r=gbrown
> +
> +As we transition to Gradle (and only Gradle) Fennec builds, we're
> +turning the Gradle builds (Bg) into non-Gradle builds (Bng).  It's not
> +worth the effort to rename "gradle" to "non-gradle", so this is a
> +confusing looking patch is turning on GeckoView tests for everything

Review for typo/grammar -- there's an extra word or something here.
Attachment #8922541 - Flags: review?(gbrown) → review+
Comment on attachment 8922031 [details]
Bug 1405396 - Pre: Set file.encoding=utf-8 for GeckoView packaging.

https://reviewboard.mozilla.org/r/193024/#review199340
Attachment #8922031 - Flags: review?(max) → review+
Comment on attachment 8922032 [details]
Bug 1405396 - Build with Gradle by default; turn Bg (Gradle) into Bng (non-Gradle).

https://reviewboard.mozilla.org/r/193026/#review199342
Attachment #8922032 - Flags: review?(max) → review+
Comment on attachment 8922033 [details]
Bug 1405396 - Post: Make it easier to debug l10n builds.

https://reviewboard.mozilla.org/r/193028/#review199344
Attachment #8922033 - Flags: review?(max) → review+
Depends on: 1412984
Depends on: 1413006
Depends on: 1413332
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4e98c4295025
Pre: Configure geckoview test for x86. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/06f348255c40
Pre: Set file.encoding=utf-8 for GeckoView packaging. r=maliu
https://hg.mozilla.org/integration/autoland/rev/b4ea55eece0f
Build with Gradle by default; turn Bg (Gradle) into Bng (non-Gradle). r=maliu
https://hg.mozilla.org/integration/autoland/rev/ded315eea526
Post: Run geckoview tests on all but non-Gradle Android builds. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/36599de1793a
Post: Make it easier to debug l10n builds. r=maliu
Blocks: 1414021
Blocks: 1414048
Blocks: 1414054
Blocks: 1414058
Attachment #8918979 - Attachment is obsolete: true
Attachment #8918979 - Flags: review?(nalexander)
Attachment #8918980 - Attachment is obsolete: true
Attachment #8918980 - Flags: review?(nalexander)
Product: Firefox for Android → Firefox Build System
Target Milestone: Firefox 58 → mozilla58