Open Bug 1447458 Opened 6 years ago Updated 2 years ago

Create (and periodically build) a configuration with unification (unified build) disabled in a whitelist of directories

Categories

(Firefox Build System :: General, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: dholbert, Unassigned)

References

Details

Right now, it is extremely easy to accidentally introduce build bustage that is hidden by the fact that we've got unified builds (by doing trivial things like not realizing you need to add an #include or a "using namespace" declaration for new code).

Pretty much every directory in our source tree has this sort of latent build bustage.  (though I sometimes purge a directory or two, e.g. bug 1437723)  This bustage is effectively a bunch of tiny hidden landmines, which may go off when we do anything that changes unification (e.g. adding a new C++ file, or starting to build in a release configuration).  These sort of routine things can inadvertantly reshuffle the unification which may make these seemingly-unrelated build issues show up (when really they've been there all along and have just been suppressed by virtue of getting their #includes/using-namespace/etc. from some other C++ file that they're unified with).

Some very recent examples of problems that unified builds were covering up until they surfaced and caused trouble:
 - Build bustage when merging from mozilla-central to beta (which we would've caught if unified builds were disabled) - bug 1444633 (just a missing #include that we found out about at the last minute)
 - Official SUSE Firefox 59 package failing to build, due to latent bustage that was suppressed by unified builds (bug 1447070, bug 1447409)  (again, missing #includes that we're now finding out about, *after* shipping Firefox 59. :-/)

To avoid these landmines causing periodic problems, I think we should add a periodic (~daily) TreeHerder job configuration, that builds in non-unified mode, for a whitelist of directories that are known to be currently safe to build in that mode (probably a short whitelist now, but hopefully growing over time -- kind of like how FAIL_ON_WARNINGS started small & grew until it could become the default).  This doesn't necessarily have to be all platforms -- e.g. it could start out linux-only perhaps -- but we ideally should do this for both debug and opt builds.
Maybe this could even be a configuration that would happen unconditionally on every mozilla-central push, since those are pretty infrequent?  That would make it relatively easy to track down the responsible change and tag the responsible developer, when this configuration hits a build failure from a recent landing that neglected to include some #include or whatever.
Strawman proposal:
 1) For directories that are known to currently build successfully in non-unified mode, we add a line like the following to their moz.build:
  SUPPORTS_NON_UNIFIED_BUILD = True

 2) We add a new mozconfig option, e.g. "ac_add_options --prefer-non-unified-build", which makes us check for that ^^ moz.build variable and treat UNIFIED_SOURCES as if it were SOURCES in that moz.build file.

 3) We add a new Linux Opt & Linux Debug build-configuration on mozilla-central TreeHerder, which is just like our existing opt/debug builds but simply have that mozconfig option added.  (No tests needed -- similar to our "V" build for --enable-valgrind)

CCing some folks who've been involved with unified builds & cleaning directories of warnings/errors in the past. Thoughts?
Summary: Create (and periodically build) a configuration with unification disabled → Create (and periodically build) a configuration with unification (unified build) disabled in a whitelist of directories
Thanks. (I suppose what I'm asking for here is a generalized version of the "spidermonkey-sm-nonunified-linux64/debug" build that apparently already exists, as noted there, and a for-now more limited version of the configuration we removed in bug 1121000.)

At the end of bug 1442647 comment 18, ehsan says:
> If the ask here is to change our stance on non-unified builds and
> to ask for them to actually be a build configuration again that we
> care about to some extent, please provide some reason why this is desirable.

That is indeed what I'm asking for on this bug -- I think this should be a configuration we care about to some extent. (Perhaps tier 2, so that we don't have to backout immediately -- and maybe on a per-directory opt-in basis so that we've always got the option of un-whitelisting a directory to "fix" bustage if we run into bustage that's too much trouble to sort out. But by default.)

The reason is that we've seen several cases of latent brokenness popping up as late-breaking bustage around releases & merges, as noted under "Some very recent examples of problems" in comment 0.  This seems like the sort of thing we should be catching earlier (and automatically), one way or another.
Depends on: 1121000
(In reply to Daniel Holbert [:dholbert] from comment #4)
> (Perhaps tier 2, so that we
> don't have to backout immediately -- and maybe on a per-directory opt-in
> basis so that we've always got the option of un-whitelisting a directory to
> "fix" bustage if we run into bustage that's too much trouble to sort out.
> But by default.)

Sorry, I left that last thought in this paragraph unfinished.  I meant to say: But by default, the idea would be that bustage should be easy to diagnose & should be fixed by the developer or a sheriff with followups (or caught on Try), similar to the "V" valgrind build or "Windows MinGW" build, for example.
See Also: → 1442647
When we made the decision to axe non-unified builds 3 years ago, the concept of Tier 2 didn't exist. I think these would be a great use of that status, personally. As dbaron pointed out in the other bug, I see developers wasting too much time and/or getting backed out because they were unfortunate enough to land a change that perturbed the unified boundaries just enough to hit a latent issue. Tier 2 specifically has the guarantee of not getting patches backed out, but still allowing us to notice issues soon enough after the fact to get them fixed by someone.
Today the lack of this build configuration bit one of our interns: someone else had checked in code that did not compile when unified boundaries shifted and the intern's patch was backed out as collateral damage.

I would very much support adding a Tier 2 non-unified compile job.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.