Open Bug 1372697 Opened 7 years ago Updated 2 years ago

Build a release build configuration against trunk

Categories

(Firefox Build System :: General, enhancement)

Tracking

(firefox57 wontfix)

People

(Reporter: gps, Unassigned)

Attachments

(2 files, 10 obsolete files)

Currently, the build configuration for trunk/Nightly varies significantly from the build configuration for beta/release. For example, release builds are stripped and trunk builds aren't. This can lead to unexpected performance and behavior changes between build variations. See bug 1338651 comment 113 for one such variation.

In addition, there are actions we perform against beta/release builds that are never performed on trunk. Since those actions have no CI coverage on trunk, we find out about bustage on uplift day. See the multiple bugs filed regarding the Firefox 55 uplift this week in bug 1372556.

For years, the build peers have opined about this because we don't like finding out about build system regressions weeks after code is changed and we've already paged relevant context out of our heads that would enable an efficient bug resolution. Enough is enough: let's perform a beta/release build variant on trunk as part of regular CI so we can catch these failures sooner.
Plus, there are all the problems that sheriffs find ahead of the merge with try pushes of "m-c as beta". IIRC, there are two main variations: the mozconfig used (release vs. nightly), and the milestone in config/milestone.txt, and browser/config/version*.txt.

Ryan, am I forgetting something?
Flags: needinfo?(ryanvm)
Yeah, config/milestone.txt is the primary change of interest since that's what controls the NIGHTLY_BUILD/RELEASE_OR_BETA ifdefs. The EARLY_BETA_OR_EARLIER ifdef has also turned up problems on occasion. Support for a MOZ_DEV_EDITION ifdef also recently landed and I expect that'll eventually also turn into a source of "fun" from a simulation standpoint.
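
To make the milestone dependency concrete, here is a minimal sketch of how the two main ifdefs could be derived from the contents of config/milestone.txt. It assumes the conventional suffix rules (an "a1" marker means Nightly, no "a" marker at all means beta or release); the function name is hypothetical and the real logic lives in the build system, so treat this as an illustration only. EARLY_BETA_OR_EARLIER and MOZ_DEV_EDITION are deliberately left out.

```python
def flags_from_milestone(milestone: str) -> dict:
    """Derive channel ifdef values from a milestone string (simplified)."""
    milestone = milestone.strip()
    # Assumption: an "a1" marker identifies a Nightly milestone.
    is_nightly = "a1" in milestone
    # Assumption: a milestone with no alpha marker is beta or release.
    is_release_or_beta = "a" not in milestone
    return {"NIGHTLY_BUILD": is_nightly, "RELEASE_OR_BETA": is_release_or_beta}


for value in ("57.0a1", "57.0"):
    print(value, flags_from_milestone(value))
```

Under these assumptions, simulating beta on trunk means changing the milestone string, not only the mozconfig, which is why the two variations have to be handled together.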
Flags: needinfo?(ryanvm)
Comment on attachment 8877302 [details]
Bug 1372697 - Add build variant with beta configuration

https://reviewboard.mozilla.org/r/148630/#review153094

mozharness can load a bunch of configs, which it, in fact, does with the variants. It could load more, avoiding the execfile business.

Another option, that I personally would advocate instead of this, is to do the same as for TOOLTOOL_MANIFEST: allow taskcluster task definitions to override the mozconfig location, and add jobs with an overridden value. Then eventually, move all the mozconfig definitions to taskcluster task definitions.

But as mentioned in comment 4, the milestone is also an important part of the difference between nightly and beta (in fact --enable-profiler comes from that difference, not from the difference in mozconfigs). BTW, I think we should strive for having less things at the mozconfig level, and consolidate the variations at the configure level as much as possible.
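
A rough sketch of the override idea, assuming a task definition can carry an optional mozconfig path that wins over the platform default. The table, the function name and the paths are hypothetical and only illustrate the precedence; they are not taken from mozharness or the task graph.

```python
from typing import Optional

# Hypothetical defaults; in reality these live in mozharness configs.
DEFAULT_MOZCONFIGS = {
    "linux64": "browser/config/mozconfigs/linux64/nightly",
}


def resolve_mozconfig(platform: str, task_override: Optional[str] = None) -> str:
    """Return the mozconfig path, preferring a task-level override."""
    if task_override:
        # The task definition asked for a specific mozconfig.
        return task_override
    return DEFAULT_MOZCONFIGS[platform]


print(resolve_mozconfig("linux64"))
print(resolve_mozconfig("linux64", "browser/config/mozconfigs/linux64/beta"))
```

A beta job on trunk would then be an ordinary job whose task definition sets the override, with no repo-specific branching in mozharness.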
Attachment #8877302 - Flags: review?(mh+mozilla)
Comment on attachment 8877310 [details]
Bug 1372697 - Add __file__ to locals in evaluated config file;

https://reviewboard.mozilla.org/r/148644/#review153096

cf. review of the other patch, this may not be required.
Attachment #8877310 - Flags: review?(mh+mozilla)
Comment 5 also highlights the point that DevEdition adds another wrinkle into all of this in that it's treated as a separate product and needs to be simulated independently of Beta.
Does this relate to bug 1361153 at all?
(In reply to Chris Cooper [:coop] from comment #9)
> Does this relate to bug 1361153 at all?

Yes. I think this is a part of it. Although I think bug 1361153 could be reworded a bit to reflect how this should probably be done in a TaskCluster world. Instead of having a separate repo, we probably want to introduce new platforms for the beta, release, etc configurations and have those run periodically on trunk and always on beta, release, etc. So instead of a "Linux 64 PGO" platform that behaves differently depending on which repo it is on, the tasks are repo agnostic and we have task variations for different build configuration.

The model I'd like us to get to is that we stop treating repos ("branches") specially. Instead, we're capable of doing any automation we want on any changeset in any repo. The only difference for builds we ship is that somehow there is a mechanism that schedules special tasks against specific changesets that have the side-effect of doing shipping-related things.

I haven't been pushing too strongly on this model because it is a drastic difference from how we've done things for years. The important goal is for us to complete the TaskCluster transition. Then we can massively refactor things once that is out of the way.
Blocks: 1361153
Comment on attachment 8877302 [details]
Bug 1372697 - Add build variant with beta configuration

https://reviewboard.mozilla.org/r/148630/#review153094

There's some good feedback here. I'll do the mozconfig refactor.

In the interest of "perfect is the enemy of good," would you be OK with deferring the milestone bit to follow-up work? I think the important thing for us to accomplish is get a "framework" for a non-Nightly build configuration in place, even if it is just a smoke test in reality. We can always improve the accuracy later.
The main reason I pushed for a new repo for the simulation jobs is because sheriffing them is a different can of worms than "regular" jobs. Unless the intent is for these to eventually become Tier 1 jobs that run on every push.

(In reply to Gregory Szorc [:gps] from comment #11)
> In the interest of "perfect is the enemy of good," would you be OK with
> deferring the milestone bit to follow-up work? I think the important thing
> for us to accomplish is get a "framework" for a non-Nightly build
> configuration in place, even if it is just a smoke test in reality. We can
> always improve the accuracy later.

In this case, that means creating builds that will be of very little value for finding the majority of issues we find by doing simulations. So hopefully that follow-up work will happen sooner rather than later.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
> The main reason I pushed for a new repo for the simulation jobs is because
> sheriffing them is a different can of worms than "regular" jobs. Unless the
> intent is for these to eventually become Tier 1 jobs that run on every push.

I would like for some jobs to be tier 1 that run everywhere. Others would almost certainly be periodic or based on files that change. For those, we could relegate them to a special repo.

> (In reply to Gregory Szorc [:gps] from comment #11)
> > In the interest of "perfect is the enemy of good," would you be OK with
> > deferring the milestone bit to follow-up work? I think the important thing
> > for us to accomplish is get a "framework" for a non-Nightly build
> > configuration in place, even if it is just a smoke test in reality. We can
> > always improve the accuracy later.
> 
> In this case, that means creating builds that will be of very little value
> for finding the majority of issues we find by doing simulations. So
> hopefully that follow-up work will happen sooner rather than later.

That is my intent. I just really like to make small, incremental progress.
Attachment #8877310 - Attachment is obsolete: true
Comment on attachment 8877302 [details]
Bug 1372697 - Add build variant with beta configuration

Not ready for review.
Attachment #8877302 - Flags: review?(ted)
Attachment #8877302 - Flags: review?(mh+mozilla)
Nick: tl;dr we want to introduce "beta" variations of platforms so we have CI coverage of beta builds so we find regressions in that build configuration before uplift day. Right now, we're only talking about doing builds as a smoke test. Which Android platform(s) do you think we need to include for reasonable coverage?
Flags: needinfo?(nalexander)
(In reply to Gregory Szorc [:gps] from comment #16)
> Nick: tl;dr we want to introduce "beta" variations of platforms so we have
> CI coverage of beta builds so we find regressions in that build
> configuration before uplift day. Right now, we're only talking about doing
> builds as a smoke test. Which Android platform(s) do you think we need to
> include for reasonable coverage?

android-api-15 is the only one that really matters.

For completeness, android-api-15-gradle -- although that's really just for developers, future-proofing, etc.  Thanks for checking in!
Flags: needinfo?(nalexander)
And TIL we rewrite the "nightly" mozconfigs as part of uplift day activities: https://hg.mozilla.org/releases/mozilla-beta/rev/2191d7f87e2e. That's done in central_to_beta.py AFAICT. In the case of Android, we don't even have "beta" mozconfigs. Furthermore, it appears we rewrite the "nightly" mozconfigs for e.g. linux64 despite using the "beta" mozconfigs for the primary build tasks (i.e. we may be rewriting them for no reason). Although I haven't done a thorough audit.

I think there's a bit of cleanup work to do here...
(In reply to Gregory Szorc [:gps] from comment #18)
> And TIL we rewrite the "nightly" mozconfigs as part of uplift day
> activities: https://hg.mozilla.org/releases/mozilla-beta/rev/2191d7f87e2e

Yeah, we really need to stop doing this.
Rewriting them in https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/merge_day/central_to_beta.py is just an artifact of making it produce the same output as was previously produced by the combination of central_to_aurora.py and aurora_to_beta.py, which removed profiling, changed branding to aurora, then changed branding back to nightly on the merge to beta.

No reason not to stop doing it, but you might have to alter the mozconfig comparison tests, dunno if they'd be surprised to find profiling still enabled in the nightly mozconfig on beta, or if they have no knowledge of where they run so every difference along the way is already whitelisted.
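
For illustration, a whitelist-based mozconfig comparison could look roughly like the sketch below. The real comparison tests are not shown in this bug, so the function name and the option lines in the example are assumptions chosen only to show the mechanism.

```python
def unexpected_differences(nightly_text, beta_text, whitelist):
    """Return differing option lines that the whitelist does not cover."""

    def option_lines(text):
        stripped = (line.strip() for line in text.splitlines())
        # Ignore blank lines and shell comments.
        return {line for line in stripped if line and not line.startswith("#")}

    # Lines that appear in exactly one of the two mozconfigs.
    differing = option_lines(nightly_text) ^ option_lines(beta_text)
    return sorted(differing - set(whitelist))


nightly = """
ac_add_options --enable-profiling
ac_add_options --with-branding=browser/branding/nightly
"""
beta = """
ac_add_options --enable-official-branding
"""
# Profiling is whitelisted here, so only the branding lines are reported.
print(unexpected_differences(nightly, beta, {"ac_add_options --enable-profiling"}))
```

If the nightly mozconfig on beta keeps profiling enabled, a check of this shape only stays quiet when that line is already whitelisted, which is the open question above.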
Attachment #8877302 - Flags: review?(ted)
Callek: I figure you may be interested in taking a look at this code.

glandium: you may want to treat the series as f? instead of r?. And I'm happy to split the refactor bits into separate bugs. I'm not super thrilled with the end state of the series w.r.t. all the mozconfigs. But I think it is a net improvement over where we were because we're no longer rewriting the "nightly" mozconfigs as part of uplift day and are therefore closer to reproducing the "beta" and "release" configuration on mozilla-central. I'd really like to move a lot of this logic to task graph or configure options. But I feel like we have to take baby steps and the first step is not rewriting the "nightly" mozconfigs.
Flags: needinfo?(bugspam.Callek)
Some thoughts:

- I think this is a good long term direction! I also think we're going to break some things on the way to improving them.
- We have mozconfig diffs as part of release sanity; we'll need to address those, or they'll break beta1
- We use the linux and android nightly build graphs as the release-promotable graphs on beta. We'll need to address that, especially the shared mozconfigs.
- If we run the whole release graph on central, we need to be absolutely sure we're not also pushing to beetmover/balrog production. Essentially, don't push beta-on-central to beta users or the candidates s3 bucket. We have safeguards around non-level3 trees, and chain of trust helps us somewhat here, but central is more privileged than, say, try. This is the shippable concept: we can build a release build and its repacks separate from publishing. If we want to test publishing, we can test to a staging bucket / staging balrog with dep signing.
- If we run the whole release graph on central, we will likely hit l10n repack issues. IIRC, nightly repacks are very permissive regarding missing strings, and beta/release repacks are strict. We may have to fix bug 1345619 (l10n bumper for desktop) to get locale information in-tree, and we may need to add information there about which build configurations (dep/nightly/beta) they're ready for.

I'm sure I'm missing something; probably many things. Windows and Mac still use buildbot, so they may behave differently than expected regardless of in-tree changes. We still have out-of-tree releasetasks, funsize, build/tools, build/buildbot-configs, etc. If the goal is to avoid all merge day bustage, this bug will probably move us in the wrong direction in the short term. Once we have everything in-tree and taskcluster-based, efforts like this one will have a higher chance of working as intended.  I know it's tempting to ignore buildbot since a large chunk of automation is in-tree and in taskcluster, but I also think that mindset is directly related to merge day bustage.

If and when this lands, we'll have less than 6 weeks to make sure all of our release automation (both in-tree and out) works before beta1. I think we should time this landing accordingly... ideally when we believe we're ready or almost ready.

So: I think this is a good direction. I also think we need to be careful with large changes like this, especially while we're mid-migration.
:aki summed up my thoughts and concerns perfectly (and in much better wordsmithing than I could have hoped for myself)
Flags: needinfo?(bugspam.Callek)
I agree we'll likely break things as part of improving them.

My immediate goal is to get something resembling beta (and possibly release) configurations running as part of trunk CI. Initially, the task graph will have separate beta/release platforms or build configurations. On beta/release, the main/nightly platforms will still be the canonical platform and all release activity will be driven from them, just like today. Over time, I'd like to drive a wedge between things and have separate platforms for nightly/beta/release where releases actually occur from the appropriate configuration. e.g. beta releases would be driven by the beta platform, not the nightly platform. In other words, the full task graph basically contains copies of things for each release channel and non-relevant builds are pruned depending on which repo triggered the scheduling. This is a massive change and not something we should undertake lightly. That's why I have no intent of tackling it in this bug. Realistically, we can't tackle several aspects of this until the TC transition is complete.
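
As a sketch of the pruning model, assume every task is tagged with the channel it builds for and that scheduling keeps the triggering repo's own channel plus anything explicitly requested. The mapping, task labels and function name below are hypothetical.

```python
# Hypothetical mapping from repository name to the channel it ships.
REPO_CHANNEL = {
    "mozilla-central": "nightly",
    "mozilla-beta": "beta",
    "mozilla-release": "release",
}


def prune_tasks(tasks, repo, extra_channels=()):
    """Keep tasks for the repo's own channel plus explicitly requested ones."""
    wanted = set(extra_channels)
    if repo in REPO_CHANNEL:
        wanted.add(REPO_CHANNEL[repo])
    return [task["label"] for task in tasks if task["channel"] in wanted]


FULL_GRAPH = [
    {"label": "linux64/opt", "channel": "nightly"},
    {"label": "linux64-beta/opt", "channel": "beta"},
    {"label": "linux64-release/opt", "channel": "release"},
]

# Trunk keeps its nightly build and opts in to the beta smoke-test build.
print(prune_tasks(FULL_GRAPH, "mozilla-central", extra_channels=["beta"]))
```

The build tasks themselves never look at the repo name; only this scheduling step does, which is what makes a beta configuration on try or central possible.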

Regarding safeguards around accidentally publishing e.g. a beta build on central to the beta channel, yeah, that's a problem. If we don't already, I think it would be a good idea to establish permissions such that a task running without a mozilla-beta/mozilla-release scope can't write to the beta/release resources. We just can't take the risk that misconfigured automation will accidentally write to the production beta/release release machinery instead of a testing instance used by central/try.
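
A minimal sketch of such a guard, assuming Taskcluster-style scope matching where a held scope ending in `*` satisfies any required scope with that prefix. The scope strings and the `can_publish` helper are hypothetical names invented for this example, not real release scopes.

```python
def satisfies(held_scopes, required):
    """Return True if any held scope grants the required scope."""
    for scope in held_scopes:
        if scope == required:
            return True
        # A trailing "*" grants every scope sharing that prefix.
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def can_publish(held_scopes, channel):
    """Check a hypothetical per-channel publishing scope."""
    return satisfies(held_scopes, f"project:releng:publish:{channel}")


central_task = ["project:releng:publish:nightly"]
beta_task = ["project:releng:publish:beta"]
print(can_publish(central_task, "beta"), can_publish(beta_task, "beta"))
```

With a check of this shape on the publishing side, a beta-configured build scheduled from central can be built and repacked but cannot write to the beta resources.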
Comment on attachment 8879388 [details]
Bug 1372697 - Establish per repo mozconfigs for Fennec;

https://reviewboard.mozilla.org/r/150704/#review155868

Specific to the Android gunk: there used to be beta (and release, IIRC) Fennec mozconfigs.  The last was removed _very_ recently -- just weeks ago -- perhaps even by you.

I'd like to understand what happened in the evolution from "beta mozconfigs" to "only nightly mozconfigs" (where nightly means "each night" and not the Nightly product) and now back to "beta mozconfigs".  We seem to have looped around and I don't understand what motivated the move _away_ from "beta mozconfigs" in the first place.

gps, can you answer this?  Or is this a question for somebody else?
Comment on attachment 8877302 [details]
Bug 1372697 - Add build variant with beta configuration

https://reviewboard.mozilla.org/r/148630/#review155872

::: commit-message-91134:30
(Diff revision 4)
> +variable. `mach build` knows how to look for this variable so it "just
> +works."
> +
> +TODO:
> +
> +* Add Android. (Not sure what base platform(s) to run.)

I think you asked about this and have checked off this TODO item.  I skimmed the new `android-api-15-beta/opt` configuration and I think it looks sensible, too.
Comment on attachment 8879388 [details]
Bug 1372697 - Establish per repo mozconfigs for Fennec;

https://reviewboard.mozilla.org/r/150704/#review155868

Indeed and TIL. We nuked a bunch of "release" mozconfigs in 9c745f0c2216 / bug 1369551. The reason for removing them was legit: they were unused. I invented a need for them by removing the rewriting of mozconfigs at uplift time :)
Comment on attachment 8879389 [details]
Bug 1372697 - Support resolving mozconfig from environment variable;

https://reviewboard.mozilla.org/r/150706/#review156030
Attachment #8879389 - Flags: review?(mh+mozilla) → review+
I'm not a huge fan of the multiplication of mozconfig parts. It's hard to review, error prone, etc.

I'd rather have fewer mozconfigs than more.

It seems to me it would be better to apply the same changes as listed in testing/mozharness/configs/merge_day/central_to_beta.py, but instead of totally replacing nightly-specific things with beta-specific things, replace them with conditional code that acts differently depending on the branch being built. It should even be possible to make the merge day script itself do it.

The only missing piece is for the branch to be available to the mozconfig evaluation (which, in fact, is /somehow/ available, build/mozconfig.cache figures it out from buildprops.json or GECKO_HEAD_REPOSITORY).

Then the central-as-release jobs could override that.
Comment on attachment 8879382 [details]
Bug 1372697 - Stop modifying nightly mozconfigs as part of uplift;

https://reviewboard.mozilla.org/r/150692/#review163170
Attachment #8879382 - Flags: review?(mh+mozilla)
Comment on attachment 8879383 [details]
Bug 1372697 - Rename "mozconfig" config element to "l10n_mozconfig";

https://reviewboard.mozilla.org/r/150694/#review163172
Attachment #8879383 - Flags: review?(mh+mozilla)
Comment on attachment 8879384 [details]
Bug 1372697 - Rename l10n mozconfig files;

https://reviewboard.mozilla.org/r/150696/#review163174
Attachment #8879384 - Flags: review?(mh+mozilla)
Comment on attachment 8879385 [details]
Bug 1372697 - Create separate mozconfig files for l10n configurations;

https://reviewboard.mozilla.org/r/150698/#review163176
Attachment #8879385 - Flags: review?(mh+mozilla)
Comment on attachment 8879386 [details]
Bug 1372697 - Rename l10n mozconfig files (mobile/android);

https://reviewboard.mozilla.org/r/150700/#review163178
Attachment #8879386 - Flags: review?(mh+mozilla)
Comment on attachment 8879387 [details]
Bug 1372697 - Stop rewriting mobile/android's debug mozconfigs;

https://reviewboard.mozilla.org/r/150702/#review163180
Attachment #8879387 - Flags: review?(mh+mozilla)
Comment on attachment 8879388 [details]
Bug 1372697 - Establish per repo mozconfigs for Fennec;

https://reviewboard.mozilla.org/r/150704/#review163182
Attachment #8879388 - Flags: review?(mh+mozilla)
Attachment #8877302 - Flags: review?(mh+mozilla)
Yes, I agree moving the logic into mozconfig evaluation is a better approach. I figured duplicating the mozconfigs was the simpler solution and didn't want to spend time writing code. But after seeing the final state of the series, I agree the number of files is a bit unwieldy. I'll try to find time to rework the series.
(In reply to Mike Hommey [:glandium] from comment #38)
> It seems to me it would be better to apply the same changes as listed in
> testing/mozharness/configs/merge_day/central_to_beta.py, but instead of
> totally replacing nightly-specific things with beta-specific things, replace
> them with conditional code that acts differently depending on the branch
> being built. It should even be possible to make the merge day script itself
> do it.

Varying configs by the branch/repository being built is the exact thing I'm trying to move away from. When you do that, it gets much harder to do things like trigger a beta configuration against a try build. What I'm trying to do is enable named build "profiles" and scheduling can activate the appropriate profiles depending on the repo/head being built.

I still think putting the logic in mozconfigs (or even moz.configure) makes sense. I just don't want to have a "switch statement" based on the repo name: that logic should live in scheduling (read: decision task) land.

> The only missing piece is for the branch to be available to the mozconfig
> evaluation (which, in fact, is /somehow/ available, build/mozconfig.cache
> figures it out from buildprops.json or GECKO_HEAD_REPOSITORY).

This comes from .taskcluster.yml BTW. It is magic baked into the TC platform.
Attachment #8879382 - Attachment is obsolete: true
Attachment #8879383 - Attachment is obsolete: true
Attachment #8879384 - Attachment is obsolete: true
Attachment #8879385 - Attachment is obsolete: true
Attachment #8879386 - Attachment is obsolete: true
Attachment #8879387 - Attachment is obsolete: true
Attachment #8879388 - Attachment is obsolete: true
Attachment #8879389 - Attachment is obsolete: true
Attachment #8877302 - Attachment is obsolete: true
The patches I just uploaded are very much a proof of concept. I wanted to code up a new mozconfig approach for one platform (Linux64) to see if others approve of the direction. If so, I can repeat for other platforms.
Attachment #8887741 - Flags: review?(ted)
Attachment #8887741 - Flags: review?(mshal)
Comment on attachment 8887741 [details]
Bug 1372697 - Consolidate main mozconfig for Linux64;

https://reviewboard.mozilla.org/r/158652/#review164308

I added ted and mshal to the reviewers so they can look things over. You both have more experience with in-tree mozconfigs than me. I'd very much appreciate your feedback on the approach.

I'm nowhere near ready to land these patches. If you grant r+, I won't land nor carry that forward to final review until more of the series is written.
Comment on attachment 8887740 [details]
Bug 1372697 - Export MOZ_BUILD_CHANNEL to build tasks;

https://reviewboard.mozilla.org/r/158650/#review165364

I'm not sure "channel" is right for this, though. As in, it seems too close to MOZ_UPDATE_CHANNEL, yet, unrelated. MOZ_BUILD_TYPE maybe? OTOH, this can largely be deduced from the version number, like we do for NIGHTLY_BUILD (and we can even distinguish between beta and release through browser/config/version_display.txt now)

So how about we set that variable from a mozconfig, with something like:

case "$(cat $topsrcdir/browser/config/version_display.txt)" in
*a1)
   ...=nightly
   ;;
*b*)
   ...=beta
   ;;
*)
   ...=release
   ;;
esac

Then branches like jamun that want a specific "channel" can just change their version.txt and version_display.txt files (which they should probably do anyways because that changes other things in the build)
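
The same classification expressed as a small Python function, for anyone who wants to try it against sample version strings. It mirrors the three patterns of the shell `case` above (ends with "a1", contains "b", anything else); the function name is hypothetical and the variable being set is left out because the comment elides it.

```python
def build_type_from_version(version_display: str) -> str:
    """Classify a version_display.txt value as nightly, beta or release."""
    version = version_display.strip()
    if version.endswith("a1"):  # shell pattern: *a1
        return "nightly"
    if "b" in version:  # shell pattern: *b*
        return "beta"
    return "release"  # shell pattern: *


for sample in ("57.0a1", "57.0b3", "57.0"):
    print(sample, build_type_from_version(sample))
```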
Attachment #8887740 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #52)
> Then branches like jamun that want a specific "channel" can just change
> their version.txt and version_display.txt files (which they should probably
> do anyways because that changes other things in the build)

Is the idea then that we would be able to override this logic if you want to do a try push as a beta build via something in the task configuration? So if I wanted to push to try and build as a nightly and build as a beta/release, I could do a single push and select those tasks (instead of two separate pushes where one needs an updated version.txt)?
Comment on attachment 8887741 [details]
Bug 1372697 - Consolidate main mozconfig for Linux64;

https://reviewboard.mozilla.org/r/158652/#review166320

::: commit-message-a763b:9
(Diff revision 1)
> +Linux64. This creates an explosion of mozconfig variants. In addition,
> +the mechanism by which these mozconfigs are activated is by differing
> +the main mozconfig according to the repo that scheduled the job. This
> +is currently done in mozharness in the branch_specifics.py file.
> +
> +This commit moves us a step closer to doing away with repo-specific

I agree that not having repo-specific mozconfigs is a worthwhile goal, however I'm not sure that consolidating things into a single mozconfig is really accomplishing that.

With these changes, we still have the differences between nightly & release mozconfigs (so the number of mozconfig variants is still the same). And we can't really get rid of that, since we really do want to do different things in nightly than we do for a release, right?

It seems this is really changing the interface between Taskcluster+mozharness and the build system. Before Taskcluster+mozharness could select options like:

mozconfigs/linux64/nightly
mozconfigs/linux64/beta
mozconfigs/linux64/release

But now these options are:

mozconfigs/linux64/main + MOZ_BUILD_CHANNEL=nightly
mozconfigs/linux64/main + MOZ_BUILD_CHANNEL=beta
mozconfigs/linux64/main + MOZ_BUILD_CHANNEL=release

What do we benefit from by having a consistent filename but then requiring a separate environment variable to be set?

Can we still get the mozconfig-is-not-tied-to-repo functionality if we specify mozconfigs in the task definition rather than mozharness? So if I trigger a beta task, it uses mozconfigs/linux64/beta, for example.
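
To compare the two interfaces side by side, the sketch below models each as a function returning the mozconfig path plus any extra environment the build needs. The function names are hypothetical; the paths and the MOZ_BUILD_CHANNEL variable are the ones listed above.

```python
def select_by_path(platform, channel):
    """Old interface: the channel is encoded in the mozconfig filename."""
    return f"mozconfigs/{platform}/{channel}", {}


def select_by_env(platform, channel):
    """Proposed interface: one filename, channel passed via the environment."""
    return f"mozconfigs/{platform}/main", {"MOZ_BUILD_CHANNEL": channel}


for channel in ("nightly", "beta", "release"):
    print(select_by_path("linux64", channel), select_by_env("linux64", channel))
```

Both functions take the same two inputs and yield three distinct configurations, so the number of variants is unchanged; the difference is only where the channel is recorded.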
Attachment #8887741 - Flags: review?(mshal)
Comment on attachment 8887741 [details]
Bug 1372697 - Consolidate main mozconfig for Linux64;

https://reviewboard.mozilla.org/r/158652/#review169418

The problem with an environment variable, or a different mozconfig file, is that it is one more way things can vary. At the moment, if you want to do a try build for beta you need to:
- change the version number for RELEASE_BUILD to have the right value
- copy the beta mozconfig over the nightly mozconfig

That's two changes to do. Switching to an environment variable doesn't change that. And if we're going to consolidate mozconfig, I think we should reduce that to one.
Attachment #8887741 - Flags: review?(mh+mozilla)
Comment on attachment 8887741 [details]
Bug 1372697 - Consolidate main mozconfig for Linux64;

I don't think I have anything to add that hasn't already been said by mshal and glandium.
Attachment #8887741 - Flags: review?(ted)
Product: Core → Firefox Build System
FWIW catlee is organizing a session at the SF all hands where we will discuss the topic in the bug summary. I'd suggest getting in touch with catlee if you haven't seen an invite in your calendar yet.
In bug 1497575, I've added a tool for doing try pushes that configure the tree as various release type builds, which will (when https://phabricator.services.mozilla.com/D8452 lands) correctly pick the mozconfig variant that is used for that type of release build. It is currently optimized for testing the release process, so only builds nightly (i.e. shippable) builds, and doesn't run any tests, but I think it could easily be extended to have configurations for things useful for other teams.
See Also: → 1497575
Severity: normal → S3