Closed Bug 1442306 Opened 2 years ago Closed 2 years ago

create json file during build process that can be consumed by buildhub

Categories

(Firefox Build System :: General, enhancement)


Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61

People

(Reporter: kmoir, Assigned: ted)

References

(Blocks 3 open bugs)

Details

Attachments

(5 files)

I had a meeting with bwong, catlee, leplatrem, peterbe today regarding providing a json file as part of the build process that can be consumed by buildhub.  The current schema is here 

https://github.com/mozilla-services/buildhub/blob/master/jobs/buildhub/initialization.yml#L71-L198

The request is that a json file be created as part of the build process for nightly, beta, release, rc, devedition builds.  

Other notes
* locale vs. locales in that bug.
* Generally we want something rather than a perfect all-things-possible schema.
* Releng will populate the json file with data from later tasks in the release graph
Summary: json file created during build process that can be consumed by buildhub → create json file during build process that can be consumed by buildhub
For reference, we already create several informational files during packaging, most relevant to this bug is probably the buildinfo file:
https://dxr.mozilla.org/mozilla-central/rev/b996cabc7ef54bbe050d647494bf00d668ec52e6/toolkit/mozapps/installer/packager.mk#122

which is a JSON file generated by informulate.py:
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/informulate.py

For an example, the Firefox 52 Win32 en-US build file looks like:
https://ftp.mozilla.org/pub/firefox/candidates/52.0-candidates/build2/win32/en-US/firefox-52.0.json

...which is already pretty close to that schema.
If I were bikeshedding that schema, I would probably want to include the compiler versions, and the version of rustc as well. about:buildconfig has the former but not the latter.
Thank you Kim. That's a good start.
And thank you Ted for getting more concrete details into the discussion.

Our goal is to write a new .json file and do it slightly differently:

* We want to validate the writing with a JSON Schema. (i.e. bad build data should break things early)
* We want a different name of the .json file. Leaning towards calling it buildhub.json

It's not clear, correct me if I'm wrong Kim, but we assume my team will actually write the code. I.e. me. Is that right? Or is there someone more suited who can do this? For what it's worth, I can write a piece of Python code that either writes the .json file or raises an error, I guess similar to informulate.py but slightly different. But TaskCluster stuff in mozilla-central is still something where experienced hands would be much appreciated.

Please confirm your understanding about who should write the code.
Does it make sense to write a script that if it fails on writing the JSON it will "break the build"?


(Ted, unless you've already figured it out, the objective with all of this is to ultimately stop scraping like Socorro scrapes archive.mozilla.org. We are also interested in making this new .json file end up in our database (Buildhub) but that process is best thought about once this new .json file exists.)
I think myself or one of the other build peers could easily write the generation for this. Note that I wasn't suggesting that we should reuse the existing build info JSON, just that that'd be a good place to start because we're already doing 90% of what we need there. We also already have voluptuous in the tree (it's used in Taskcluster task graph generation), so we could define the schema with that and then it should be pretty simple to ensure we're generating JSON that fits the schema.

And yeah, this is great and this is something I've wanted for *years*!
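
A minimal sketch of the "validate while writing, fail early" idea discussed above. It uses only the Python standard library rather than voluptuous or a JSON Schema library, and it checks nothing beyond the presence of top-level sections. The function name `render_build_json` and the `REQUIRED_SECTIONS` tuple are hypothetical and chosen for illustration; a real implementation would validate against the actual Buildhub schema.

```python
import json

# Hypothetical list of required top-level sections; the real schema
# is far more detailed than this.
REQUIRED_SECTIONS = ("build", "source", "target")


def render_build_json(data):
    """Return `data` serialized as JSON, or raise ValueError if it is incomplete."""
    missing = [key for key in REQUIRED_SECTIONS if key not in data]
    if missing:
        # Raising here lets the calling build step exit non-zero, so bad
        # build data fails early instead of being uploaded.
        raise ValueError("missing sections: " + ", ".join(missing))
    return json.dumps(data, indent=2, sort_keys=True)
```

A packaging step could call this and let the exception propagate, which is one way a generation script could "break the build" on bad data.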
Product: Core → Firefox Build System
Thank you Ted for all the information on this bug, it's very helpful!

:peterbe 

So I'm a bit confused because looking at the meeting minutes they state

What data do we have and how do we output it? 
* Kim’s team should own this part for building out meta data for nightly/beta builds
* Release metadata = releng

So my understanding from the meeting is that my team are going to write code to output the new json file once we have agreed on the schema.  The plan was to output this json as part of the build process for nightly builds.  Is this your understanding or am I misunderstanding?  Happy to discuss on irc/vidyo if this is easier
Flags: needinfo?(peterbe)
Kim, 
The root of the confusion was as follows; after our vidyo meeting I wrote my summary and the bugzilla comment above. Then I had my 1:1 with Benson who had just previously had a 1:1 with catlee, and through Chinese whispers I was told that the work probably will land on me. 
In the midst of that, Ted chimed in. In particular comment #4. 

Ideally I don't want to write the code because the domain is not my best. Python scripting, sure, yes, I can do that. 
My ultimate goal and current top priority is that there is a new .json file Buildhub's tooling can consume (instead of hotch-potch digging through uploaded files trying to shoehorn that into Buildhub's schema).

Ted, Kim,
Would you mind if the 3 of us form a little team to iron out who's going to do the work? And actually do it :) I'd be happy to "project manage" this.
Flags: needinfo?(peterbe)
Thanks for the explanation Peter, it all makes sense now.  

Sure, it would be good to figure out who is going to do the work as a team.

Ted, are you able to mentor Peter on this work or would you like to defer to another build peer?  If not, I can ask the rest of the team in tomorrow's meeting and see who is able to help.
Flags: needinfo?(ted)
I'd be happy to mentor Peter on this, but it might just be easier to get a build peer to do the work since most of the hassle will probably be dealing with build system nonsense.
Flags: needinfo?(ted)
Some meeting notes here: https://docs.google.com/document/d/1HiHmVazeGjnif2vQfsmx0O38hn_4wBn5XdNPKPudvz8/edit#

Ted, 
The JSON Schema is as per comment 0 here in this bug. 
Permalink https://github.com/mozilla-services/buildhub/blob/53518a873b45daf497347ac71770859ec6ce2b8a/jobs/buildhub/initialization.yml#L71-L198
The scraping is currently done here: https://github.com/mozilla-services/buildhub/blob/master/jobs/buildhub/inventory_to_records.py 
It's a more (async) modern version of https://github.com/mozilla-services/socorro/blob/master/socorro/cron/jobs/ftpscraper.py but unlike Socorro makes one entry per locale.

Mathieu, CC'ed on this bug, knows a lot more details about how the "scraping" currently works if you have more detailed questions about how Buildhub does/did things.

Again, thank you Ted!
Peter and Kim and I chatted about this, a few salient points:
* I'm going to write the build system bits to generate the JSON file. It should not be much work for someone who knows the build system.
* We're going to call the file "buildhub.json" to make its purpose explicit.
* The JSON schema includes three keys that the build system can provide (build, target, source), and one that it cannot (download). The latter will need to be added by the release pipeline as Kim mentioned in comment 0, so we should file a bug for that work. I'd like to have a schema in-tree that we validate the data against, but if we're not going to fill in the `download` bit we'll need to modify the schema in some way so we can validate just the subset we provide.
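
The split described in the last bullet could be sketched as follows: the build validates only the subset of top-level keys it provides, and a later stage validates the complete record once `download` has been added. Only the four key names come from the comment above; `BUILD_SYSTEM_KEYS`, `RELEASE_PIPELINE_KEYS`, and `check_sections` are hypothetical names, and this is not the in-tree schema.

```python
# Top-level keys named in the comment above.
BUILD_SYSTEM_KEYS = frozenset({"build", "target", "source"})
RELEASE_PIPELINE_KEYS = frozenset({"download"})


def check_sections(record, complete=False):
    """Return a list of problems with the record's top-level keys.

    With complete=False only the build-system subset is required, which is
    what the build can check before the release pipeline adds `download`.
    """
    allowed = BUILD_SYSTEM_KEYS | RELEASE_PIPELINE_KEYS
    required = allowed if complete else BUILD_SYSTEM_KEYS
    problems = ["missing: " + key for key in sorted(required - set(record))]
    problems += ["unexpected: " + key for key in sorted(set(record) - allowed)]
    return problems
```

An empty list means the record passes; the same function serves both stages by flipping `complete`.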
Assignee: nobody → ted
Blocks: 1443873
See Also: → 1453402
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #1)
> For an example, the Firefox 52 Win32 en-US build file looks like:
> https://ftp.mozilla.org/pub/firefox/candidates/52.0-candidates/build2/win32/en-US/firefox-52.0.json
> 
> ...which is already pretty close to that schema.

(In reply to Peter Bengtsson [:peterbe] from comment #9)
> The scraping is currently done here:
> https://github.com/mozilla-services/buildhub/blob/master/jobs/buildhub/inventory_to_records.py

Looking at this code, it attempts to scrape info from this very JSON file, falling back to the older .txt files if it's not available, which makes sense:
https://github.com/mozilla-services/buildhub/blob/72b919a0bb0ebed1258d3446401dbdbbebc4d5cf/jobs/buildhub/inventory_to_records.py#L152
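
The fallback pattern described here (prefer the JSON metadata file, use the older .txt file only if the JSON is absent) can be sketched roughly as below. This is an illustration of the pattern, not Buildhub's actual code; `pick_metadata_file` and its arguments are hypothetical.

```python
def pick_metadata_file(basename, available):
    """Prefer `<basename>.json`, fall back to `<basename>.txt`, else None.

    `available` is any collection of file names found next to the build.
    """
    for extension in (".json", ".txt"):
        candidate = basename + extension
        if candidate in available:
            return candidate
    return None
```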
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=246887a74bdbad7473c0ba486bf33cd292867a38

I realized immediately after pushing that I forgot to add buildhub.json to UPLOAD_FILES so it wasn't going to wind up in the build artifacts... The second try push corrects this.

These patches produce a sensible looking buildhub.json that conforms to the schema when run on my machine (with a few extra things set in the environment, since the patches as written assume a CI environment):
https://gist.github.com/luser/f1f2bd684f083da46ef6cc8ad4fd38b7
I tried to avoid too much yak shaving with these patches (there's so many tempting things in packager.mk!). I did consolidate a few things so that one Python script is writing out several files with similar content, and I removed some unnecessary makefile targets.

One thing I didn't do in this patch series is import the JSON schema from buildhub and validate the data as we produce it. That should not be hard, I will just need to vendor a JSON schema library for Python.
Comment on attachment 8972376 [details]
bug 1442306 - simplify make-buildinfo-file by letting it assume Mozilla CI.

https://reviewboard.mozilla.org/r/240990/#review247564
Attachment #8972376 - Flags: review+
Attachment #8972376 - Flags: review?(core-build-config-reviews)
Attachment #8972377 - Flags: review?(core-build-config-reviews) → review?(gps)
Attachment #8972378 - Flags: review?(core-build-config-reviews) → review?(gps)
Attachment #8972379 - Flags: review?(core-build-config-reviews) → review?(gps)
Attachment #8972380 - Flags: review?(core-build-config-reviews) → review?(gps)
Comment on attachment 8972377 [details]
bug 1442306 - move build info.txt generation into informulate.py.

https://reviewboard.mozilla.org/r/240992/#review247568
Attachment #8972377 - Flags: review?(gps) → review+
Comment on attachment 8972378 [details]
bug 1442306 - move make-buildinfo-file and make-mozinfo-file targets inline into make-package target.

https://reviewboard.mozilla.org/r/240994/#review247570
Attachment #8972378 - Flags: review?(gps) → review+
Comment on attachment 8972379 [details]
bug 1442306 - set MOZ_SOURCE_CHANGESET and MH_BRANCH consistently in the environment for mozharness tasks.

https://reviewboard.mozilla.org/r/240996/#review247572
Attachment #8972379 - Flags: review?(gps) → review+
Comment on attachment 8972380 [details]
bug 1442306 - generate a buildhub.json file during packaging.

https://reviewboard.mozilla.org/r/240998/#review247576

::: toolkit/mozapps/installer/upload-files.mk:393
(Diff revision 1)
>    $(call QUOTED_WILDCARD,$(DIST)/$(PKG_PATH)$(MOZHARNESS_PACKAGE)) \
>    $(call QUOTED_WILDCARD,$(DIST)/$(PKG_PATH)$(SYMBOL_ARCHIVE_BASENAME).zip) \
>    $(call QUOTED_WILDCARD,$(DIST)/$(PKG_PATH)$(GENERATED_SOURCE_FILE_PACKAGE)) \
>    $(call QUOTED_WILDCARD,$(MOZ_SOURCESTAMP_FILE)) \
>    $(call QUOTED_WILDCARD,$(MOZ_BUILDINFO_FILE)) \
> +  $(MOZ_BUILDHUB_JSON) \

It's a bit weird that this isn't using QUOTED_WILDCARD like everything else. I can't recall why we use QUOTED_WILDCARD. I wouldn't be surprised if it were a cargo cult. If it works, it works and I'm fine being inconsistent.
Attachment #8972380 - Flags: review?(gps) → review+
Comment on attachment 8972380 [details]
bug 1442306 - generate a buildhub.json file during packaging.

https://reviewboard.mozilla.org/r/240998/#review247576

> It's a bit weird that this isn't using QUOTED_WILDCARD like everything else. I can't recall why we use QUOTED_WILDCARD. I wouldn't be surprised if it were a cargo cult. If it works, it works and I'm fine being inconsistent.

I'm going to put this back only because I triggered l10n repack jobs on my try push and they're broken without it here, because they're not generating a buildhub.json file so it doesn't exist to upload. I will fix them, but that's probably going to be enough work that I want to get these patches landed and do it in another bug.

IIRC this is effectively just working as `$(wildcard)`, in that it lets us list everything we could possibly upload but only upload files that are actually generated. We couldn't use `$(wildcard)` at the time because we still supported pretty package names which had spaces, which GNU make can't handle properly. (We've since dropped support for pretty package names, so maybe we could just make these `$(wildcard)` at this point, but I'd like to just remove this whole list at some point in the future anyway.)
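
For readers unfamiliar with the idiom, the behaviour described above (list every file that could possibly be uploaded, but upload only those that were actually generated) can be sketched in Python. This illustrates the idea only and is not the makefile logic itself; `existing_files` is a hypothetical helper.

```python
import os


def existing_files(candidates):
    """Return only the candidate paths that exist on disk, in order.

    Each path is a separate string here, so names containing spaces need
    no special quoting, unlike a whitespace-separated makefile list.
    """
    return [path for path in candidates if os.path.exists(path)]
```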
https://hg.mozilla.org/integration/mozilla-inbound/rev/db2d6e2667d268ff2be84818fbec1fe67a1a7db3
bug 1442306 - simplify make-buildinfo-file by letting it assume Mozilla CI. r=gps

https://hg.mozilla.org/integration/mozilla-inbound/rev/3a2c60a752cc58035f7982225e3efdf374c76ff1
bug 1442306 - move build info.txt generation into informulate.py. r=gps

https://hg.mozilla.org/integration/mozilla-inbound/rev/604aa7d37098eb9ee89d7a6c670dd72c31d35116
bug 1442306 - move make-buildinfo-file and make-mozinfo-file targets inline into make-package target. r=gps

https://hg.mozilla.org/integration/mozilla-inbound/rev/63abddfab15f0effa13c6019b0889dc58cdb33b6
bug 1442306 - set MOZ_SOURCE_CHANGESET and MH_BRANCH consistently in the environment for mozharness tasks. r=gps

https://hg.mozilla.org/integration/mozilla-inbound/rev/7507516552fe76d0d00d32917d88a0056e731fb5
bug 1442306 - generate a buildhub.json file during packaging. r=gps
Blocks: 1459302
Duplicate of this bug: 1453271