Closed Bug 1651662 Opened 4 years ago Closed 4 years ago

Provide a way to build products using Glean "offline"

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: mdroettboom)

Details

(Whiteboard: [tor])

Attachments

(1 file)

Currently, Android products using Glean need to download Miniconda, then run the gradle plugin which will, in turn, check for updated versions of the glean_parser using pip.

We should make a fully offline build possible when using the Glean SDK, and document how to achieve it.

Thanks for filing this bug, :dexter, really appreciated. So, from my experience there are actually two problems to solve:

a) miniconda is downloaded during the build
b) once miniconda is executed it runs pip to fetch some python modules from the internet

I heard the newer Gradle plugin for Glean looks for cached miniconda in ~/.gradle/glean, which is great as it would solve a) if we provided it separately.

However, I am not sure about b). I guess that would boil down to running the whole miniconda setup and pip-fetching elsewhere in a pre-build step and then shove the result into ~/.gradle/glean?

(In reply to Georg Koppen from comment #1)

> I heard the newer Gradle plugin for Glean looks for cached miniconda in ~/.gradle/glean, which is great as it would solve a) if we provided it separately.

Yes, correct, (a) should be solved by the caching.

> However, I am not sure about b). I guess that would boil down to running the whole miniconda setup and pip-fetching elsewhere in a pre-build step and then shove the result into ~/.gradle/glean?

The gradle plugin basically calls pip install --upgrade glean_parser==GLEAN_PARSER_VERSION where GLEAN_PARSER_VERSION is defined within the plugin itself. We do that because glean_parser is actually generating a type-safe API for the telemetry metrics, and each version of the Glean SDK is only compatible with a specific version of the glean_parser.

The way this could be fixed, assuming (a) works, would be to manually run "pip install glean_parser==GLEAN_PARSER_VERSION" in that environment, so that when the gradle plugin runs, the parser is already installed there.

So yes, your intuition is correct :)
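A minimal sketch of that pre-build step, for illustration only. It just builds the pip command line for a cached Miniconda; the helper name `preinstall_command`, the example directory, and the example version are assumptions of this sketch, not part of the Glean Gradle plugin. The version passed in must be whatever GLEAN_PARSER_VERSION the plugin in use defines.

```python
from pathlib import Path
from typing import List


def preinstall_command(miniconda_root: Path, parser_version: str) -> List[str]:
    """Build the pip command that installs a pinned glean_parser.

    miniconda_root is the cached Miniconda directory (somewhere under
    ~/.gradle/glean); its exact layout is an assumption of this sketch.
    """
    pip = miniconda_root / "bin" / "pip"
    return [str(pip), "install", f"glean_parser=={parser_version}"]


if __name__ == "__main__":
    # Hypothetical location and version, used only to show the shape of the command.
    root = Path.home() / ".gradle" / "glean" / "Miniconda3"
    print(" ".join(preinstall_command(root, "1.24.0")))
```

The command would be run in a step that still has network access, before the cache directory is copied into the offline build environment.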

:gkoppen -- how are the regular Maven packages that Gradle downloads handled? IIRC, under normal circumstances, they are also downloaded early in the build and cached in ~/.gradle. Maybe the only issue here is that we need to ensure that the downloading of miniconda and the pip install into it happen at that same stage so you could just capture the cache of everything...?

There is another wrinkle here: we check the glean_parser version on every build to make sure it matches what we expect and, if it doesn't, run a pip upgrade at that point as well. This works around a shortcoming of the JetBrains Python Gradle plugin: it doesn't handle package upgrades the way standard Maven packages are handled. We can probably find a way to turn off that check. Is there an environment variable or other piece of information we could use to detect that we're running inside an offline build?
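To make the per-build check concrete, here is a rough sketch of the decision it implies, written in Python rather than in the plugin's own Gradle code. The function names and the `offline` flag are hypothetical; how an offline build would actually be detected is exactly the open question above.

```python
from importlib.metadata import PackageNotFoundError, version


def needs_upgrade(package: str, expected: str) -> bool:
    """Return True when the package is missing or its version differs."""
    try:
        return version(package) != expected
    except PackageNotFoundError:
        return True


def should_run_pip(package: str, expected: str, offline: bool) -> bool:
    """Decide whether pip should be invoked for this build.

    In an offline build pip must never reach for the network, so the
    check is skipped entirely (hypothetical behaviour for this sketch).
    """
    if offline:
        return False
    return needs_upgrade(package, expected)
```

A stricter variant could fail the build when `offline` is set and `needs_upgrade` returns True, instead of silently continuing with a mismatched parser.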

(In reply to Michael Droettboom [:mdroettboom] from comment #3)

> :gkoppen -- how are the regular Maven packages that Gradle downloads handled? IIRC, under normal circumstances, they are also downloaded early in the build and cached in ~/.gradle. Maybe the only issue here is that we need to ensure that the downloading of miniconda and the pip install into it happen at that same stage so you could just capture the cache of everything...?

What we do right now for Maven packages is the following:

  1. We run a build without --offline but with --debug, so we see in the build log which dependencies Gradle is actually requesting.
  2. Later on we extract those dependencies from that log and save them together with their SHA-256 sums in a .txt file
  3. That .txt file is consumed in a pre-build step with network access so that every builder is fetching the same dependencies before the build starts
  4. Then the fetched dependencies are put in the build container which lacks network access and we patch the build files to check mavenLocal() first.
  5. Gradle is run with --offline and -Dmaven.repo.local=/path/to/local/maven/repo
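Steps 2 and 3 (recording dependencies with their SHA-256 sums, then checking what was fetched) can be sketched as below. The manifest layout used here, one `<sha256>  <filename>` pair per line, is an assumption of this sketch; the actual .txt format is not shown in this bug.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: Iterable[Path], manifest: Path) -> None:
    """Write one '<sha256>  <filename>' line per file."""
    lines = [f"{sha256_of(p)}  {p.name}" for p in sorted(files)]
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(manifest: Path, directory: Path) -> Dict[str, bool]:
    """Map each listed filename to whether it exists with the recorded hash."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        candidate = directory / name
        results[name] = candidate.is_file() and sha256_of(candidate) == expected
    return results
```

The same manifest approach would work for Python wheels as well as Maven artifacts, since it only depends on file names and contents.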

I am not sure yet where the miniconda part could/should fit into that model. Maybe we need an extra step. Or maybe we need a better model for dealing with offline Maven dependencies, one that would also accommodate the miniconda requirement. I am not sure how folks "usually" work with offline builds and whether our current workflow makes sense. If there is room for improvement, in particular if that would help make progress with the miniconda part, please let us know. :)

Super helpful context, :gkoppen! I'll poke around and see what options are available in miniconda / the Python side of the house.

Assignee: nobody → mdroettboom
Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m?][tor]

I can't speak to any of the Maven-specific approaches for offline builds (I just don't have enough experience there), but the feature that uses a miniconda environment to run a Python script at build time is very much "tacked on", and we never once thought about offline builds, so that's where we are ;)

To follow the same pattern that you are using for the Maven packages, you could do the following:

  1. (as above), but extract the names of the Python dependencies from the pip log that looks like:
Installing packages via pip: [glean_parser==1.24.0]                                                                                                                                                                                           
Executing '/home/mdboom/.gradle/glean/bootstrap-py38_4.8.3/Miniconda3/bin/pip install --trusted-host pypi.python.org --no-cache-dir glean_parser==1.24.0'
Collecting glean_parser==1.24.0                                                                                                                                                                                                               
  Downloading glean_parser-1.24.0-py3-none-any.whl (51 kB)                                                                                                                                                                                    
Collecting Click>=7                                                                                                                                                                                                                           
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)                                                                                                                                                                                        
Collecting diskcache>=4                                                                                                                                                                                                                       
  Downloading diskcache-4.1.0-py2.py3-none-any.whl (44 kB)                                                                                                                                                                                    
Collecting Jinja2>=2.10.1                                                                                                                                                                                                                     
  Downloading Jinja2-2.11.2-py2.py3-none-any.whl (125 kB)                                                                                                                                                                                     
Collecting jsonschema>=3.0.2                                                                                                                                                                                                                  
  Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)                                                                                                                                                                                   
Collecting appdirs>=1.4                                                                                                                                                                                                                       
  Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)                                                                                                                                                                                     
Collecting PyYAML>=3.13                                                                                                                                                                                                                       
  Downloading PyYAML-5.3.1.tar.gz (269 kB)                                                                                                                                                                                                    
Collecting yamllint>=1.18.0                                                                                                                                                                                                                   
  Downloading yamllint-1.23.0-py2.py3-none-any.whl (58 kB)                                                                                                                                                                                    
Collecting MarkupSafe>=0.23                                                                                                                                                                                                                   
  Downloading MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl (32 kB)
Collecting attrs>=17.4.0
  Downloading attrs-19.3.0-py2.py3-none-any.whl (39 kB)                                                                                                                                                                                       
Collecting pyrsistent>=0.14.0                                                                                                                                                                                                                 
  Downloading pyrsistent-0.16.0.tar.gz (108 kB)                                                                                                                                                                                               
Requirement already satisfied: six>=1.11.0 in /home/mdboom/.gradle/glean/bootstrap-py38_4.8.3/Miniconda3/lib/python3.8/site-packages (from jsonschema>=3.0.2->glean_parser==1.24.0) (1.14.0)
Requirement already satisfied: setuptools in /home/mdboom/.gradle/glean/bootstrap-py38_4.8.3/Miniconda3/lib/python3.8/site-packages (from jsonschema>=3.0.2->glean_parser==1.24.0) (46.4.0.post20200518)
Collecting pathspec>=0.5.3
  Downloading pathspec-0.8.0-py2.py3-none-any.whl (28 kB)
Building wheels for collected packages: PyYAML, pyrsistent
  Building wheel for PyYAML (setup.py): started
  Building wheel for PyYAML (setup.py): finished with status 'done'
  Created wheel for PyYAML: filename=PyYAML-5.3.1-cp38-cp38-linux_x86_64.whl size=44617 sha256=708f44d48e560a2cc39a86ff0f0508ec58b2dc31ca636302e2c4c75deae37af8
  Stored in directory: /tmp/pip-ephem-wheel-cache-j46ba_he/wheels/13/90/db/290ab3a34f2ef0b5a0f89235dc2d40fea83e77de84ed2dc05c
  Building wheel for pyrsistent (setup.py): started
  Building wheel for pyrsistent (setup.py): finished with status 'done'
  Created wheel for pyrsistent: filename=pyrsistent-0.16.0-cp38-cp38-linux_x86_64.whl size=119936 sha256=ce7bf00cde3c737c6ab5b86b895d845b391ae37040a63e06793fdd76304085f5
  Stored in directory: /tmp/pip-ephem-wheel-cache-j46ba_he/wheels/17/be/0f/727fb20889ada6aaaaba861f5f0eb21663533915429ad43f28
Successfully built PyYAML pyrsistent
Installing collected packages: Click, diskcache, MarkupSafe, Jinja2, attrs, pyrsistent, jsonschema, appdirs, PyYAML, pathspec, yamllint, glean-parser
Successfully installed Click-7.1.2 Jinja2-2.11.2 MarkupSafe-1.1.1 PyYAML-5.3.1 appdirs-1.4.4 attrs-19.3.0 diskcache-4.1.0 glean-parser-1.24.0 jsonschema-3.2.0 pathspec-0.8.0 pyrsistent-0.16.0 yamllint-1.23.0

Unfortunately, I can't find a way to get pip to display the full URLs to those files, but maybe they all exist in the same namespace at some URL?

  2. (as above)
  3. (as above)
  4. put the downloaded wheels somewhere and pip install them directly.
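The extraction from the pip log described in this comment could be sketched as follows. It relies only on the "Downloading <file> (<size>)" lines visible in the log above; pip's output format differs between versions, so the pattern is an assumption tied to that log, and the function name is illustrative.

```python
import re
from typing import List

# Matches lines such as "  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)".
_DOWNLOAD = re.compile(r"^\s*Downloading (\S+) \(")


def downloaded_files(pip_log: str) -> List[str]:
    """Return the file names pip reported downloading, in log order."""
    names = []
    for line in pip_log.splitlines():
        match = _DOWNLOAD.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Packages reported as "Requirement already satisfied" (six and setuptools in the log above) are not listed by this function, because they ship with the Python installation rather than being downloaded.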

The missing link is installing miniconda itself. Technically speaking, you don't need it if you have a trusted system Python 3 around, but we would need to patch our build system to add a flag for using the system Python instead.

So I think from this we could provide an "offline" mode for the Python stuff that (1) wouldn't download miniconda, but would assume a Python 3 is available on the PATH and create a virtualenv for it and (2) would install pip packages from a local location. All of this would still be dependent on the right set of Python packages being pre-fetched using the same rough pattern used for Maven packages.

I'll experiment with this to make sure my hunch is correct.
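As a rough illustration of that proposed offline mode (not the eventual implementation), the two pieces could look like this: create a virtualenv from whatever Python 3 is running the script, then build a pip command that only looks at a local directory of pre-fetched wheels. `--no-index` and `--find-links` are standard pip options; the function names, the POSIX `bin/python` layout, and the directory arguments are assumptions of this sketch.

```python
import venv
from pathlib import Path
from typing import List


def create_env(env_dir: Path) -> Path:
    """Create a virtualenv with pip and return its interpreter (POSIX layout).

    with_pip=True bootstraps pip via ensurepip, which uses wheels bundled
    with Python and therefore needs no network access.
    """
    venv.EnvBuilder(with_pip=True).create(env_dir)
    return env_dir / "bin" / "python"


def offline_install_command(python: Path, wheel_dir: Path, requirement: str) -> List[str]:
    """Build a pip command that installs only from wheel_dir, never from PyPI."""
    return [
        str(python), "-m", "pip", "install",
        "--no-index",                    # do not contact any package index
        "--find-links", str(wheel_dir),  # resolve everything from local files
        requirement,
    ]
```

With `--no-index`, the install fails if any dependency is missing from the wheel directory, which makes an incomplete pre-fetch visible immediately instead of triggering a silent download.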

:gkoppen -- do offline builds need to work on Windows, or is Linux good enough?

Flags: needinfo?(gk)

(In reply to Michael Droettboom [:mdroettboom] from comment #7)

> :gkoppen -- do offline builds need to work on Windows, or is Linux good enough?

Linux is good enough, thanks!

Flags: needinfo?(gk)
Priority: P3 → P1
Whiteboard: [telemetry:glean-rs:m?][tor] → [tor]

Georg: The attached PR is ready for your comment.

Flags: needinfo?(gk)

(In reply to Michael Droettboom [:mdroettboom] from comment #10)

> Georg: The attached PR is ready for your comment.

Thanks! I don't have a GitHub account so I'll comment here instead. This looks really nice, thanks! It's a bit hard for me to test the PR as-is, but I think this should work for us. My current plan is to get this merged and then play with it in our environment. I don't expect any issues, as I said, but if I run into anything I'll open follow-up bugs. Thanks again for your help, really appreciated.

Flags: needinfo?(gk)

Thanks for the note, :gkoppen. We'll merge the PR, which will have to make its way into android-components and then into Fenix before it will be available to you. Check the a-c CHANGELOG for updates...

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

(In reply to Michael Droettboom [:mdroettboom] from comment #12)

> Thanks for the note, :gkoppen. We'll merge the PR, which will have to make its way into android-components and then into Fenix before it will be available to you. Check the a-c CHANGELOG for updates...

Thanks. So, if I am seeing this right, then for this change to be available in a stable application-services release, the following things need to happen:

  1. A new Glean release needs to get tagged (likely v33.0.0)
  2. android-components needs to pick the new Glean release up
  3. A new android-components version containing the new Glean release needs to come out
  4. Someone needs to update ext.android_components_version in application-services to pick up that new android-components release
  5. Someone needs to tag a new application-services release with the fix for 4)

Do I see that right? Are there any shortcuts I can take, assuming I want to build application-services with Glean in offline mode now (as in today)?

Flags: needinfo?(mdroettboom)

It's steps 1-3 above and then:

  4. Fenix needs to upgrade android-components (if they aren't using SNAPSHOT releases right now)

application-services doesn't enter it -- it doesn't talk to Glean directly.

Flags: needinfo?(mdroettboom)

ni? myself to get a release out and plan for the upgrade

Flags: needinfo?(jrediger)

The release happened, it's landing in a-c

Flags: needinfo?(jrediger)