Closed Bug 1396098 Opened 7 years ago Closed 7 years ago

move android builds to debian-based docker image

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(firefox57 fixed)

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

Attachments

(4 files, 2 obsolete files)

Newer versions of tools from Android's development kits require a newer glibc than are currently installed on our CentOS builders.  nalexander, in bug 1370119, attempted to move all our Android build jobs to Debian-based docker images, and was partially successful.  I've picked up his work and made it work with recent taskcluster and mercurial changes.
nalexander did all of the hard work to make this image use Debian instead of
CentOS; I have upgraded it to:

1) Work with our current taskcluster conventions;
2) Download mercurial via pip (Debian packages are not new enough to
   support sparse checkouts);
3) Install everything we need for mercurial; and
4) Work with the Android SDK (e.g. 32-bit packages).

All the mistakes are mine: I'm sure there are a million things wrong here and
that's why I'm asking for feedback rather than review.  There may be several
rounds of getting things right here.

The good news is that this entire patch series seems to work, e.g.:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5ec6dfc1924d89732fcdd058ceb5f0a5b4c62973

Dustin for our taskcluster bits, glandium for Debian bits, nalexander to
comment about how horribly I mangled his work.
Attachment #8903842 - Flags: feedback?(nalexander)
Attachment #8903842 - Flags: feedback?(mh+mozilla)
Attachment #8903842 - Flags: feedback?(dustin)
This seems like the obvious thing to want to happen.
Attachment #8903843 - Flags: review?(mh+mozilla)
Now that we have a Docker image with newer library versions on it, we
can move our builds over.  The new images differ from the old
CentOS-based images in two important ways, though:

1) The system compilers in the new image are new enough to be used as
   host compilers; additionally, our CentOS-built GCC compilers will not
   work.  We need to change the Android mozconfigs to reflect that.  We
   also need to change the Android tasks to not depend on the GCC
   toolchain builds.

2) In a similar fashion, we can use the system JDK; we no longer need to
   use the JDK from the Android NDK, which we had packaged up via the
   Android dependencies task.  Accordingly, we remove the JDK package
   from the tooltool manifests.

I think I caught everything here, but you may have comments.  Feedback to
nalexander as well, as there are probably some Android things I don't know
about.

I'm also not sure if the android-old-id builds are correct, since they don't
run on try.  I can't imagine they'd be *that* much different than the standard
Android builds, though.
Attachment #8903844 - Flags: review?(mh+mozilla)
Attachment #8903844 - Flags: feedback?(nalexander)
Comment on attachment 8903842 [details] [diff] [review]
part 1 -  base `android-gradle-build` on Debian instead of CentOS

Review of attachment 8903842 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/docker/android-gradle-build/Dockerfile
@@ +15,4 @@
>  
>  VOLUME /builds/worker/checkouts
>  VOLUME /builds/worker/workspace
>  VOLUME /builds/worker/tooltool-cache

You removed the "TODO remove VOLUME below", yet they're still here.  I'm not sure exactly what that comment means /cc gps

@@ +15,5 @@
>  
>  VOLUME /builds/worker/checkouts
>  VOLUME /builds/worker/workspace
>  VOLUME /builds/worker/tooltool-cache
> +VOLUME /builds/worker/.tc-vcs

If using robustcheckout (yay!) this isn't needed anymore..

@@ +84,5 @@
> +
> +COPY requirements.txt /tmp/
> +# python-pip installs a lot of dependencies increasing the size of an image
> +# drastically.
> +RUN easy_install pip

This should have a hash check of some sort.  Also, you can save a few docker layers here with `&&`.

::: taskcluster/docker/recipes/debian-build-system-setup.sh
@@ +8,5 @@
> +. /setup/common.sh
> +. /setup/install-mercurial.sh
> +# . /setup/install-make.sh
> +# . /setup/install-cmake.sh
> +# . /setup/install-debug-symbols.sh

If these aren't necessary, let's remove them..
Attachment #8903842 - Flags: feedback?(gps)
Attachment #8903842 - Flags: feedback?(dustin)
Attachment #8903842 - Flags: feedback+
Comment on attachment 8903842 [details] [diff] [review]
part 1 -  base `android-gradle-build` on Debian instead of CentOS

Review of attachment 8903842 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable. Depending on the timetable for this, I have a new "framework" in bug 1396154 for producing Docker images without Dockerfile madness. It will likely be a bit of work to get this working in that framework. So it is best to hold off. But you may want to take a look so you know what's coming down the pipe.

I'm also a fan of standardizing on Debian for all our images. And thank you for using the Debian snapshots repo so image builds are more deterministic.

::: taskcluster/docker/android-gradle-build/hgrc
@@ +1,2 @@
> +[extensions]
> +share =

Huh? If something is running `hg share`, that's a bug. All VCS checkout operations should be handled by `hg robustcheckout`.

But this isn't a blocking issue for this bug.

::: taskcluster/docker/android-gradle-build/requirements.txt
@@ +1,1 @@
> +virtualenv==15.1.0 --hash=sha256:39d88b533b422825d644087a21e78c45cf5af0ef7a99a1fc9fbb7b481e5c85b0

Not sure where this is used. But we have virtualenv vendored in-repo under third_party/python. If we could install virtualenv from our in-repo copy to ensure everyone is using a consistent version, that would be swell.
Attachment #8903842 - Flags: feedback?(gps) → feedback+
Comment on attachment 8903842 [details] [diff] [review]
part 1 -  base `android-gradle-build` on Debian instead of CentOS

Review of attachment 8903842 [details] [diff] [review]:
-----------------------------------------------------------------

Here's a tooltool manifest to fetch all Python packaging packages.

The proper way to install them is:

 (cd setuptools-36.2.7 && python2.7 setup.py install)
 (cd pip-9.0.1 && python2.7 setup.py install)
 pip2.7 install virtualenv-15.1.0-py2.py3-none-any.whl
 pip2.7 install wheel-0.29.0-py2.py3-none-any.whl

[
  {
    "algorithm": "sha512",
    "digest": "9563879c1ed3be1153b6e06acef2af8c3598b3a5708b738bd3d90376f6e4c68c5a2dabbdda0ba3fb9dfe22d47466ec3f01471925
    "filename": "setuptools-36.2.7.zip",
    "size": 716382,
    "unpack": true
  },
  {
    "algorithm": "sha512",
    "digest": "ee59efb4b009ff6543b7afdea99b9cbbee1981ecc03af586acda76674024d3b66dab23049e68f3da9448734984619fc1eaba6e96
    "size": 1197370,
    "filename": "pip-9.0.1.tar.gz",
    "unpack": true
  },
  {
    "algorithm": "sha512",
    "digest": "37a43008cfbb15692122f530cea2065f4ed963a1323808fee5a2d53d62038ea510c1d71ac2e037fdab0de02d6b64ba30f401738c
    "size": 66878,
    "filename": "wheel-0.29.0-py2.py3-none-any.whl"
  },
  {
    "algorithm": "sha512",
    "digest": "9988af801d9ad15c3f9831489ee9b49b54388e8349be201e7f7db3f2f1e59d033d3117f12e2f1909d65f052c5f1eacd87a894c6f
    "size": 1820727,
    "filename": "virtualenv-15.1.0-py2.py3-none-any.whl"
  }
]
Comment on attachment 8903842 [details] [diff] [review]
part 1 -  base `android-gradle-build` on Debian instead of CentOS

Review of attachment 8903842 [details] [diff] [review]:
-----------------------------------------------------------------

Your changes look sensible to me, but I'll say more after I rebase my work on top of yours.

Thanks for running with this!

Would it be possible for gps's Mercurial-installing snippet to be a toolchain task?

::: taskcluster/docker/android-gradle-build/Dockerfile
@@ +3,4 @@
>  
> +RUN mkdir -p /builds
> +
> +### add worker user and setup its workspace

nit: full sentence comments, here and below ("# Declare default working folder" is missing the trailing period).

@@ +3,5 @@
>  
> +RUN mkdir -p /builds
> +
> +### add worker user and setup its workspace
> +RUN useradd -d /builds/worker -s /bin/bash -m worker

I thought I had this in the series, but perhaps it came later: we should explicitly set `--uid 1000`, rather than accepting OS-dependent defaults.  Sadly, I can't guarantee this _also_ sets the `--gid 1000`, even though it does in practice.

@@ +15,5 @@
>  
>  VOLUME /builds/worker/checkouts
>  VOLUME /builds/worker/workspace
>  VOLUME /builds/worker/tooltool-cache
> +VOLUME /builds/worker/.tc-vcs

We are using robustcheckout (via `run-task`) for all Android tasks, so yes, this can go.

@@ +44,5 @@
> +
> +# Set apt sources list to a snapshot
> +COPY sources.list /etc/apt/
> +
> +# We need i386 packages for the Android SDK.

Perhaps link to https://bugzilla.mozilla.org/show_bug.cgi?id=1370119, which will get rid of this nonsense.

@@ +84,5 @@
> +
> +COPY requirements.txt /tmp/
> +# python-pip installs a lot of dependencies increasing the size of an image
> +# drastically.
> +RUN easy_install pip

We do this in a bunch of places -- see http://searchfox.org/mozilla-central/source/taskcluster/docker/funsize-balrog-submitter/Dockerfile#17, for example.

Does this need to be addressed more broadly?  I think gps's snippet will address things for this ticket.

@@ +115,5 @@
>  RUN bash /setup/system-setup.sh
>  
> +# # Add wrapper scripts for xvfb allowing tasks to easily retry starting up xvfb
> +# # %include taskcluster/docker/recipes/xvfb.sh
> +#COPY topsrcdir/taskcluster/docker/recipes/xvfb.sh /builds/worker/scripts/xvfb.sh

We should always copy this, even though these Android builds won't use it.  I don't know how xvfb is used for interactive loaners (if at all).

::: taskcluster/docker/android-gradle-build/hgrc
@@ +1,2 @@
> +[extensions]
> +share =

This was taken from https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/centos6-build/hgrc although it looks superceded by http://searchfox.org/mozilla-central/source/taskcluster/docker/recipes/install-mercurial.sh#100.

gps, can it be removed?  We're using robustcheckout for all the Android tasks now.

::: taskcluster/docker/android-gradle-build/requirements.txt
@@ +1,1 @@
> +virtualenv==15.1.0 --hash=sha256:39d88b533b422825d644087a21e78c45cf5af0ef7a99a1fc9fbb7b481e5c85b0

At one point I found I needed this in order to get |mach build| to successfully install `virtualenv`.  It looks like this approach is superceded by your snippet in the bug.

::: taskcluster/docker/android-gradle-build/sources.list
@@ +1,1 @@
> +deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20170830T000511Z/ stretch main

glandium, others: it surprises me that snapshot is available only over http, but that seems to be the case.  Can you confirm that the packages are signed/checksummed/otherwise secure enough for use in automation?

::: taskcluster/docker/recipes/debian-build-system-setup.sh
@@ +8,5 @@
> +. /setup/common.sh
> +. /setup/install-mercurial.sh
> +# . /setup/install-make.sh
> +# . /setup/install-cmake.sh
> +# . /setup/install-debug-symbols.sh

I doubt they are necessary, since Debian has fresh packages.

::: taskcluster/scripts/builder/build-android-dependencies/after.sh
@@ +18,5 @@
>  # We can't redistribute the Android SDK publicly.
>  mkdir -p /builds/worker/private/android-sdk
>  mv android-sdk-linux.tar.xz /builds/worker/private/android-sdk
>  
> +cp -R ${NEXUS_WORK}/storage/jcenter jcenter

This `NEXUS_WORK` stuff might be a good Pre: patch.  I'll be rebasing on top of your series tomorrow; perhaps I can do this then.
Attachment #8903842 - Flags: feedback?(nalexander) → feedback+
Comment on attachment 8903843 [details] [diff] [review]
part 2 - make android-* fire off of mozconfig changes

Review of attachment 8903843 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah.
Attachment #8903843 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8903844 [details] [diff] [review]
part 3 - move android builds to new docker image

Review of attachment 8903844 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm.

::: taskcluster/ci/build/android-stuff.yml
@@ +57,5 @@
>          tier: 2
>          symbol: tc(test)
>      worker-type: aws-provisioner-v1/gecko-{level}-b-android
>      worker:
> +        docker-image: {in-tree: android-gradle-build}

I think we should call this `android-build`; it's a relic of history that it refers to Gradle.  (And Gradle will soon be the default/required toolchain for Android builds.)
Attachment #8903844 - Flags: feedback?(nalexander) → feedback+
Comment on attachment 8903842 [details] [diff] [review]
part 1 -  base `android-gradle-build` on Debian instead of CentOS

Review of attachment 8903842 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/docker/android-gradle-build/Dockerfile
@@ +3,2 @@
>  
> +RUN mkdir -p /builds

Note that each RUN creates a docker image layer, and for something as small as that, or worse, for things that only change permissions, that's rather wasteful.

You'd be better off grouping things. At least grouping all mkdirs together and all chowns together.

@@ +60,5 @@
> +      git \
> +      gnupg \
> +      make \
> +      mercurial \
> +      mercurial-common \

maybe don't install mercurial?

@@ +90,3 @@
>  
>  # %include python/mozbuild/mozbuild/action/tooltool.py
>  COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /build/tooltool.py

That's probably not necessary anymore.

@@ +142,5 @@
>  # Install Sonatype Nexus.  Cribbed directly from
>  # https://github.com/sonatype/docker-nexus/blob/fffd2c61b2368292040910c055cf690c8e76a272/oss/Dockerfile.
>  
> +# Observe missing --no-install-recommends, in order to install glib2.0/gconf/etc.
> +RUN apt-get update -q && \

you shouldn't need an update here. You've done one already.

::: taskcluster/docker/android-gradle-build/sources.list
@@ +1,2 @@
> +deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20170830T000511Z/ stretch main
> +deb-src [check-valid-until=no] http://snapshot.debian.org/archive/debian/20170830T000511Z/ stretch main

You don't need deb-src entries.

@@ +1,4 @@
> +deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20170830T000511Z/ stretch main
> +deb-src [check-valid-until=no] http://snapshot.debian.org/archive/debian/20170830T000511Z/ stretch main
> +
> +deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/20170830T000511Z/ stretch/updates main

You also need stretch/updates from non debian-security:
deb http://snapshot.debian.org/archive/debian/$date stretch-updates main

::: taskcluster/docker/recipes/install-mercurial.sh
@@ +26,5 @@
>      fi
>  
>      CERT_PATH=/etc/ssl/certs/ca-certificates.crt
>  
> +elif [ -f /etc/os-release ]; then

instead of checking /etc/lsb-release, /etc/centos-release and /etc/os-release, this script should just call lsb_release. (this may or may not require installing the lsb-release package, I don't remember if it's there by default)

::: taskcluster/scripts/builder/build-linux.sh
@@ +60,3 @@
>  # run XVfb in the background, if necessary
>  if $NEED_XVFB; then
> +    . /home/worker/scripts/xvfb.sh

This could go in a separate patch.
Attachment #8903842 - Flags: feedback?(mh+mozilla) → feedback+
(In reply to Nick Alexander :nalexander [parental leave until September 15th] from comment #7)
> glandium, others: it surprises me that snapshot is available only over http,
> but that seems to be the case.  Can you confirm that the packages are
> signed/checksummed/otherwise secure enough for use in automation?

The first thing that apt downloads is a gpg signed Release file[1][2], which contains md5 and sha256 digests for e.g. Packages files, which contain digests for deb files.

1. http://snapshot.debian.org/archive/debian/20170830T000511Z/dists/stretch/Release
2. http://snapshot.debian.org/archive/debian/20170830T000511Z/dists/stretch/Release.gpg
Comment on attachment 8903844 [details] [diff] [review]
part 3 - move android builds to new docker image

Review of attachment 8903844 [details] [diff] [review]:
-----------------------------------------------------------------

> The system compilers in the new image are new enough to be used as host compilers;

On the long term, this is not going to pan out, because that's going to be a fixed version of the host compiler.

> additionally, our CentOS-built GCC compilers will not work. 

Because of multiarch? I thought I had used our gcc compiler on my system... maybe I misremember.
Attachment #8903844 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #12)
> > The system compilers in the new image are new enough to be used as host compilers;
> 
> On the long term, this is not going to pan out, because that's going to be a
> fixed version of the host compiler.

This is true, but I can't imagine that our host compiler requirements are *particularly* onerous.  I guess some of our host tools (elfhack) depend on mfbt/, and so are beholden to whatever clever things we use there...but I would argue that we can cross that bridge when we come to it.

> > additionally, our CentOS-built GCC compilers will not work. 
> 
> Because of multiarch? I thought I had used our gcc compiler on my system...
> maybe I misremember.

I think this is multiarch-related; the errors are variations on:

INFO -  /bin/ccache /builds/worker/workspace/build/src/gcc/bin/gcc -std=gnu99 -o host_pathsub.o -c  -DXP_UNIX -MD -MP -MF .deps/host_pathsub.o.pp -O3 -DNDEBUG=1 -DTRIMMED=1 -D_UNICODE -DUNICODE  -I/builds/worker/workspace/build/src/config -I/builds/worker/workspace/build/src/obj-firefox/config  -I/builds/worker/workspace/build/src/obj-firefox/dist/include  -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr /builds/worker/workspace/build/src/config/pathsub.c
INFO -  In file included from /usr/include/assert.h:35:0,
INFO -                   from /builds/worker/workspace/build/src/config/pathsub.c:10:
INFO -  /usr/include/features.h:364:25: fatal error: sys/cdefs.h: No such file or directory
INFO -   #  include <sys/cdefs.h>
INFO -                           ^
INFO -  compilation terminated.
(In reply to Mike Hommey [:glandium] from comment #10)
> ::: taskcluster/docker/recipes/install-mercurial.sh
> @@ +26,5 @@
> >      fi
> >  
> >      CERT_PATH=/etc/ssl/certs/ca-certificates.crt
> >  
> > +elif [ -f /etc/os-release ]; then
> 
> instead of checking /etc/lsb-release, /etc/centos-release and
> /etc/os-release, this script should just call lsb_release. (this may or may
> not require installing the lsb-release package, I don't remember if it's
> there by default)

I would rather fix this as a followup; the desktop-build images don't seem to have lsb_release on them.

> ::: taskcluster/scripts/builder/build-linux.sh
> @@ +60,3 @@
> >  # run XVfb in the background, if necessary
> >  if $NEED_XVFB; then
> > +    . /home/worker/scripts/xvfb.sh
> 
> This could go in a separate patch.

OK.  I'm not 100% sure what this is for.
> > ::: taskcluster/scripts/builder/build-linux.sh
> > @@ +60,3 @@
> > >  # run XVfb in the background, if necessary
> > >  if $NEED_XVFB; then
> > > +    . /home/worker/scripts/xvfb.sh
> > 
> > This could go in a separate patch.
> 
> OK.  I'm not 100% sure what this is for.

I think I wanted to be safe.  My initial image wasn't going to support xvfb -- indeed, I don't know why desktop-build images support xvfb -- so I wanted to avoid having xvfb.sh and friends in the image at all.

I think this is safe; we should just pull it into a Pre:, perhaps more generally.
(In reply to Nick Alexander :nalexander [parental leave until September 15th] from comment #15)
> I think I wanted to be safe.  My initial image wasn't going to support xvfb
> -- indeed, I don't know why desktop-build images support xvfb -- so I wanted
> to avoid having xvfb.sh and friends in the image at all.

We use xvfb when we run Firefox during the profiling phase of a PGO build. It's possible there's some other random stuff that depends on it, but that's the primary use case. That shouldn't be necessary for Android builds.
No sense doing unnecessary work.
Attachment #8907794 - Flags: review?(mh+mozilla)
I think this addresses all the feedback given in previous reviews.

The only thing that might be controversial is using the system
Python/pip/etc. rather than...whatever we were using otherwise (e.g. gps's
suggestion in comment 6).  Open to opinions on how that's going to work out
long term.
Attachment #8907795 - Flags: review?(mh+mozilla)
Attachment #8907795 - Flags: review?(dustin)
Attachment #8903842 - Attachment is obsolete: true
The only substantial change here is using `android-build` as the system image.
Comment 13 explains why we're using the host compiler here.
Attachment #8907796 - Flags: review?(mh+mozilla)
Attachment #8903844 - Attachment is obsolete: true
Attachment #8907794 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8907795 [details] [diff] [review]
part 1 - base `android-build` on Debian instead of CentOS

Review of attachment 8907795 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/docker/android-build/Dockerfile
@@ +83,5 @@
> +      libncurses5:i386 \
> +    && \
> +    apt-get clean
> +
> +# %include python/mozbuild/mozbuild/action/tooltool.py

My comments from previous review about mercurial and tooltool still apply. Also, the patch should indicate the file move.
Attachment #8907795 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8907796 [details] [diff] [review]
part 3 - move android builds to new docker image

Review of attachment 8907796 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/config/mozconfigs/common
@@ +84,5 @@
>  export MOZ_PACKAGE_JSSHELL=1
>  
>  # Use ccache
>  
> +HOST_CC="/usr/bin/gcc"

Actually, I think the build system can figure this out on its own if you remove it.

::: taskcluster/ci/build/android-stuff.yml
@@ -9,5 @@
>          tier: 2
>          symbol: tc(Deps)
>      worker-type: aws-provisioner-v1/gecko-{level}-b-android
>      worker:
> -        docker-image: {in-tree: android-gradle-build}

Technically, this means this is busted between part 1 and part 3.
Attachment #8907796 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8907795 [details] [diff] [review]
part 1 - base `android-build` on Debian instead of CentOS

Review of attachment 8907795 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/docker/android-build/Dockerfile
@@ +87,5 @@
> +# %include python/mozbuild/mozbuild/action/tooltool.py
> +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /build/tooltool.py
> +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /builds/tooltool.py
> +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /setup/tooltool.py
> +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /tmp/tooltool.py

lol, we really stink at deciding where to put files
Attachment #8907795 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> ::: taskcluster/docker/android-build/Dockerfile
> @@ +87,5 @@
> > +# %include python/mozbuild/mozbuild/action/tooltool.py
> > +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /build/tooltool.py
> > +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /builds/tooltool.py
> > +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /setup/tooltool.py
> > +COPY topsrcdir/python/mozbuild/mozbuild/action/tooltool.py /tmp/tooltool.py
> 
> lol, we really stink at deciding where to put files

:)  Experimentation suggests that eliminating all but the /setup/tooltool.py location builds things just fine (which I think might have been what glandium is getting at), so I'm going to make that change.

After updating my tree for try pushes, I discovered a few more places to do s/android-gradle-build/android-build/ and more builds that need to change, the l10n repacks (which are kind of cool that they're running from taskcluster now):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa7fe6577c409d7addf847a998f0e0e4a076d36c

Looking at the l10n taskcluster descriptions, it looks like there's no convenient place to drop worker definitions (docker-image, etc.) in the description, but the transforms do seem to set all that up:

https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/l10n.py#388-403

Dustin, is the right thing to do here to shove Android builds in there as well, or should a by-platform: keying strategy be adopted for defining the worker definitions in the kind.yml file?
Flags: needinfo?(dustin)
Ah, I'm not sure if that actually matters.  Callek says that l10n jobs on release are still using BB, so we have to make sure that the Android configurations accommodate being run on *both* kinds of machines for the time being.  I think we can do this by continuing to set JAVA_HOME in the mozconfig (and downloading Java from tooltool) so the l10n jobs find everything they expect.

This bug could try moving the l10n-android jobs over, but I'm not sure that really makes much difference until l10n jobs everywhere are in TC.  The l10n jobs feel more like a followup.
Thanks!  I find having a file in more than one place inevitably means it gets found and used, so one tooltool is good :)

I'll defer to Callek regarding the l10n question.
This bit (core modifications, plus some followup for thinkos):

https://hg.mozilla.org/try/rev/a4b8bf6b0a5e95452f8451563f527d712fb9f8e3
https://hg.mozilla.org/try/rev/9ed680043df1d9d9fa82635423329196a663918a

along with making sure that we still download the Java binaries from tooltool:

https://hg.mozilla.org/try/rev/baf78bb9da69d5f5269c750881e9b13257e9182d

is enough to make the l10-android-api-16 jobs green on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ed680043df1d9d9fa82635423329196a663918a

We also appear to be building the android nightlies + l10 repacks, as well. \o/

These seem to me relatively uncontroversial; they are basically re-introducing things that the already-reviewed patches take out, plus a small adjustment for our new Debian-based images, which the already-reviewed patches effectively did.  (They also have the bonus of addressing glandium's claim that HOST_{CC,CXX} should not be needed on our new Debian images.)  So I claim that these changes do not need another round of review.

I will incorporate them into the appropriate patchsets, and I'll be splitting out the android-gradle-build => android-build renaming into a separate patch to make that clearer.
Flags: needinfo?(dustin)
froydnj: I see a /home/worker that should be a /builds at https://hg.mozilla.org/try/rev/fdbe2ea925993c809d9ecf34e5676415ade8fcc5#l1.38.

As noted on IRC, we should remove the first bootstrap at https://hg.mozilla.org/try/file/8ed4a783dfa1a1c6d7d4efe3847349de863210ea/taskcluster/scripts/builder/build-android-dependencies/before.sh#l14.

Also, it looked like you didn't remove taskcluster/scripts/builder/build-android-dependencies/repackage-jdk-centos.sh.  Was that intentional -- keeping it until the last buildbot jobs die?
Callek: just capturing a question from IRC.  I see in https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ed680043df1d9d9fa82635423329196a663918a N1 through N6, each building only two locales.  That's only 12 locales, which seems low.  If this is a real issue, can you file as appropriate?  I have no idea where this report goes.  Thanks!
Flags: needinfo?(bugspam.Callek)
(In reply to Nick Alexander :nalexander [parental leave until September 15th] from comment #27)
> froydnj: I see a /home/worker that should be a /builds at
> https://hg.mozilla.org/try/rev/fdbe2ea925993c809d9ecf34e5676415ade8fcc5#l1.
> 38.

This is a good point, will fix.

> As noted on IRC, we should remove the first bootstrap at
> https://hg.mozilla.org/try/file/8ed4a783dfa1a1c6d7d4efe3847349de863210ea/
> taskcluster/scripts/builder/build-android-dependencies/before.sh#l14.

Yup, done.

> Also, it looked like you didn't remove
> taskcluster/scripts/builder/build-android-dependencies/repackage-jdk-centos.
> sh.  Was that intentional -- keeping it until the last buildbot jobs die?

It was not intentional.  But after looking at comment 26 and the changes required there, I think it's helpful to have it there so the mozconfig has something to refer to about how these packages are created.  (The script is not creating the tarball, but ideally that will be obvious enough how a tarball would be created.)  Sound OK?  If it turns out that people would like to remove all references to old things, even though they're sort of still being used, we can do that in a followup.
> > Also, it looked like you didn't remove
> > taskcluster/scripts/builder/build-android-dependencies/repackage-jdk-centos.
> > sh.  Was that intentional -- keeping it until the last buildbot jobs die?
> 
> It was not intentional.  But after looking at comment 26 and the changes
> required there, I think it's helpful to have it there so the mozconfig has
> something to refer to about how these packages are created.  (The script is
> not creating the tarball, but ideally that will be obvious enough how a
> tarball would be created.)  Sound OK?  If it turns out that people would
> like to remove all references to old things, even though they're sort of
> still being used, we can do that in a followup.

Yup, I'm fine with this being follow-up.
I had to back out the top 3 commits because the uid/gid of 1000:1000 was causing problems.

Docker caches are persisted between tasks. Recent cache validation work that I performed has run-task dropping a file into a cache on first use and verifying that "requirements" of the cache are sufficient on subsequent uses. This is documented in the following code:

https://hg.mozilla.org/mozilla-central/file/34529d61b6a8/taskcluster/taskgraph/transforms/task.py#l722
https://hg.mozilla.org/mozilla-central/file/34529d61b6a8/taskcluster/docker/recipes/run-task#l283

The uid and gid are written into this requirements file. This helps prevent mass (read: expensive) chowning on task run and inadvertent file permissions problems. As long as all images using the same cache use the same uid/gid, we're good.

Unfortunately, centos creates the worker:worker as 500:500 and debian as 1000:1000. So when we introduced this debian-based image with its 1000:1000 uid/gid into the cache pool, we started seeing a mix of uid/gid and this inadvertently launched a cache poisoning attack. run-task correctly failed fast upon detecting the mismatch.

You didn't catch this on Try because run-task will mass chown caches on Try... because reasons. Caching is hard.

Anyway, the easy fix is to change worker:worker to 500:500 on this Debian image. Hopefully that doesn't muck with anything in Debian's internal by using a <1000 uid/gid. If it does, then the fix will be to change worker:worker to 1000:1000 on the centos images. That's likely a bit of scope bloat, as we currently use Docker image inheritance and we haven't modified the base image in... over a year? I've attempted to squash the images before and it is a bit of a rabbit hole. It is probably easiest to change debian to 500:500. Alternatively, we can seed a different value into the cache hash for this new image so it doesn't share caches with centos images. That will yield lower cache hit rate, but is a relatively quick solution. I don't think I coded support for that into Taskgraph, so it would need to be implemented. Ping me if you need help.
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05993bfa51b0
part 0a - only source xvfb.sh if we need xvfb; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8620614d5bf
part 0b - rename android-gradle-build to android-build; r=glandium,dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/c798f482b2c5
part 1 - base `android-build` on Debian instead of CentOS; r=glandium,dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce228388aaf6
part 2 - make android-* fire off of mozconfig changes; r=nalexander
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef813898b5d3
part 3 - move android builds to new docker image; r=glandium
When the full five patch stack was landed, crash symbols (at least for android) were broken. With parts 1-3 backed out, the crash symbols were fixed.
Backout by kwierso@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/ab28a93856a4
Backed out changesets ef813898b5d3, ce228388aaf6, and c798f482b2c5 for violating TC cache constraints
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
I have no idea why crash stacks would be broken based on the host system...Ted, do you have flashes of insight there?
Flags: needinfo?(ted)
I think your Ubuntu image is missing file(1).

This seems really dumb, but looking at the build log from your inbound push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=ef813898b5d34ef03428b1cee37a329f425db6e7&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=coalesced&filter-searchStr=android&selectedJob=131804528

https://public-artifacts.taskcluster.net/brR5qpd8SW261BYDdi2gDA/0/public/logs/live_backing.log

I see:
[task 2017-09-18T20:25:03.539Z] 20:25:03     INFO -  /builds/worker/workspace/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.dumpsymbols /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so_syms.track
[task 2017-09-18T20:25:03.539Z] 20:25:03     INFO -  sh: 1: file: not found

We do run `file` as part of symbolstore.py:
https://dxr.mozilla.org/mozilla-central/rev/ffe6cc09ccf38cca6f0e727837bbc6cb722d1e71/toolkit/crashreporter/tools/symbolstore.py#704
https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/tools/symbolstore.py#486

...but for some reason that code uses `os.popen` so it doesn't raise on failure, which is pretty dumb.
Flags: needinfo?(ted)
Depends on: 1401226
OK, installed the `file` package so symbol dumping can proceed, created the `worker` group with gid 500, and made sure the new `worker` user belongs to the `worker` group:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee9e779c2bb32be1d09674608f826ed886be758a
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9435244890be
part 1 - base `android-build` on Debian instead of CentOS; r=glandium,dustin
https://hg.mozilla.org/integration/mozilla-inbound/rev/960c59ace165
part 2 - make android-* fire off of mozconfig changes; r=nalexander
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e7baf957e1d
part 3 - move android builds to new docker image; r=glandium
Appears to be sticking.  Thank you for the help, everyone!
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Keywords: leave-open
Resolution: --- → FIXED
(In reply to Nick Alexander :nalexander from comment #28)
> Callek: just capturing a question from IRC.  I see in
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=9ed680043df1d9d9fa82635423329196a663918a N1 through
> N6, each building only two locales.  That's only 12 locales, which seems
> low.  If this is a real issue, can you file as appropriate?  I have no idea
> where this report goes.  Thanks!

This looks like just an artifact for how I have it scheduled on try, real nightlies look fine.
Flags: needinfo?(bugspam.Callek)
Product: Core → Firefox Build System
Assignee: nobody → nfroyd
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: