Closed Bug 1442647 Opened 6 years ago Closed 3 years ago

Support non-unified builds via configure

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1725125

People

(Reporter: jorendorff, Assigned: jorendorff)

References

Details

Attachments

(2 files, 1 obsolete file)

Forgive me if this already exists somewhere. I looked in `configure --help` and didn't see anything.

Mozilla C++ hackers have to keep the non-unified builds working, which is fine, but it seems like the build system and automation only barely support building that way at all. When I break the non-unified build, I don't feel like I have a good option to reproduce the issue locally.

spidermonkey-sm-nonunified-linux64/debug works by modifying my moz.build files in place, using a regexp, lol: https://searchfox.org/mozilla-central/source/js/src/devtools/automation/autospider.py#171-172

Or I can set FILES_PER_UNIFIED_FILE to 1, but there are some differences between that and the other thing. Either way, modifying source files... it's not great. We have this whole mechanism for configuring a build, let's use that.
I'd love to write this patch, if anyone's willing to mentor.
Non-unified builds were removed as a valid configuration a while ago (see [1] and [2]). Was this decision reversed at some point?

[1] https://lists.mozilla.org/pipermail/dev-platform/2015-January/008301.html
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1121000
See Also: → 1121000
As for implementation, things have moved around since bug 1121000, so it wouldn't be a straightforward backout of that patch. Instead we'd need to add the --disable-unified-compilation flag in moz.configure instead of old-configure.in, and the check for 'MOZ_DISABLE_UNIFIED_COMPILATION' would now be in python/mozbuild/mozbuild/frontend/data.py in the UnifiedSources class instead of in recursivemake.py.
A push of mine was backed out this morning for breaking the non-unified SpiderMonkey build. This configuration exists.
gps/ehsan - should we be officially supporting non-unified builds? The SpiderMonkey one was added in bug 1273639, which seems to be at odds with bug 1121000.
Flags: needinfo?(gps)
Flags: needinfo?(ehsan)
Unified builds make it possible for one .cpp to contain source code that depends on the content of another .cpp. This is always accidental and a bug, but it won't be noticed until some later unrelated change to moz.build disturbs things -- latent brokenness.

It's good to be able to build in a mode that does not mask this kind of bug. Think of it as a static analysis mode, not a build configuration we would support for end users.
Assignee: nobody → jorendorff
Status: NEW → ASSIGNED
Comment on attachment 8955606 [details] [diff] [review]
Part 1: Re-add support for --disable-unified-compilation

Clearing r? -- I didn't mean to request a review until we hear from ehsan/gps. But here's my stab at it.
Attachment #8955606 - Flags: review?(mshal)
I think this is a decision for the C++ module owner. If they don't want to support unified builds then we shouldn't have them.
I created the Spidermonkey SM(nu) build because the spidermonkey team prefers to keep the code valid and accepts the overhead of doing so, and a spidermonkey-only nonunified build does not force anything upon gecko overall. It shouldn't be seen as a disagreement with gecko's policy of accepting nonunified bustage, because it isn't -- I'm not even sure which way I'd vote for Gecko. The cost/benefit equation is very very different between gecko and spidermonkey.

It would, however, be nice to have a better implementation, and to my non-build-peerish eye, jorendorff's looks perfect.

Now that tier 2 is a real thing (expected to fix but not necessarily immediately), gecko could reasonably reconsider the possibility of staying nonunified-clean.

(I should also note that the speed savings of unified builds are awesome, and I don't think anyone wants their default developer builds to be nonunified. At least, not after they've sat through a couple.)
Comment on attachment 8955608 [details] [diff] [review]
Part 2: Use --disable-unified-compilation in autospider

Review of attachment 8955608 [details] [diff] [review]:
-----------------------------------------------------------------

Given that this patch depends on the previous, I don't think there's any harm in reviewing it even if it never lands.
Attachment #8955608 - Flags: review+
Product: Core → Firefox Build System
Supporting non-unified builds is a decision for the C++ module owner to make.
Flags: needinfo?(gps)
ni?froydnj as acting C++ module owner.

Nathan, the SpiderMonkey team currently has a non-unified build on treeherder: SM(nu). We want to keep our code working this way (see comment 6 for why). We do not want to set policy for Gecko.

I want to (re-)add a configure option to help with this, not as a supported build mode for Gecko, but for JS team's use. The build team naturally doesn't want a --build-gecko-in-a-completely-broken-way flag without the C++ module owner weighing in. (We used to have this flag and Ehsan removed it.)

If it helps, I'd be happy to hack it so this flag is only available when configuring SpiderMonkey, not when configuring Gecko.
Flags: needinfo?(ehsan) → needinfo?(nfroyd)
Talked with Jason on IRC and confirmed that Ehsan is the person to be asking about this.  Maybe the modules wiki page should be updated now...
Flags: needinfo?(nfroyd) → needinfo?(ehsan)
(In reply to Jason Orendorff [:jorendorff] from comment #14)
> If it helps, I'd be happy to hack it so this flag is only available when
> configuring SpiderMonkey, not when configuring Gecko.

I think this would be totally reasonable, FWIW.
Attachment #8955606 - Attachment is obsolete: true
(In reply to Jason Orendorff [:jorendorff] from comment #0)
> Forgive me if this already exists somewhere. I looked in `configure --help`
> and didn't see anything.

The --disable-unified-compilation existed and was removed intentionally over three years ago as comment 2 mentioned.

> Mozilla C++ hackers have to keep the non-unified builds working

That is not true, there is no such requirement.  The only supported build configuration is unified compilation currently.  Other configurations (e.g. FILES_PER_UNIFIED_FILE=1) have been supported on a best effort basis depending on where in the code base you look, but overall most people haven't paid any attention to keep that configuration working beyond the occasional small fixes that people land when adding/removing files from UNIFIED_SOURCES.

> which is
> fine, but it seems like the build system and automation only barely support
> building that way at all.

That's quite intentional as the thread linked to above shows.  :-)  For a while after we got unified compilation we maintained both unified and non-unified build configurations so that all developers would gain confidence in the quality of the unified configuration and to ensure all remaining issues with it are solved.  Once that happened (Feb 2015) we went ahead and removed the non-unified configuration since it made little sense to continue to support two build configs where we would only use one in practice.

> When I break the non-unified build, I don't feel
> like I have a good option to reproduce the issue locally.
> 
> spidermonkey-sm-nonunified-linux64/debug works by modifying my moz.build
> files in place, using a regexp, lol:
> https://searchfox.org/mozilla-central/source/js/src/devtools/automation/
> autospider.py#171-172

It's kind of surprising to me that we have a build job testing this configuration to be honest.  Steve's objection's were known at the time since he also commented on the same thread and we had a discussion about it (see https://lists.mozilla.org/pipermail/dev-platform/2015-January/008337.html which is one of my responses to him) and it seems to me that at the time it wasn't clear if SpiderMonkey actually wants non-unified compilation.  Perhaps my memory is wrong but I don't ever remember someone closing the loop there, but FWIW glandium did suggest a better alternative at the time <https://lists.mozilla.org/pipermail/dev-platform/2015-January/008352.html>, if the only issue is this hairy code, then maybe we don't need a new configure flag.

Is that sufficient for your needs?

If the ask here is to change our stance on non-unified builds and to ask for them to actually be a build configuration again that we care about to some extent, please provide some reason why this is desirable.  Over the past three years we have gained a lot of experience in my opinion that show that decision made in Feb 2015 was a good decision overall, so I personally believe in it now even more than I did back then!
Flags: needinfo?(ehsan) → needinfo?(jorendorff)
> Is that sufficient for your needs?

Uhhhh, sure.

> If the ask here is to change our stance on non-unified builds and
> to ask for them to actually be a build configuration again that we
> care about to some extent, please provide some reason why this is
> desirable.

No, I'm not asking that, and in fact the current patch in this bug disables the flag when configuring Gecko -- it's only there when configuring SpiderMonkey.

We don't even want to use the binaries for anything, except to make sure each .cpp file is including the headers it uses and there aren't cpp->cpp dependencies. The SM(nu) build doesn't even run jit-tests. It's a cheap static analysis for us.

If it helps, I can rename the flag to something like --unsupported-disable-unified-compilation-danger-do-not-use-this-flag.
Flags: needinfo?(jorendorff)
Jason and I chatted on IRC about this.  He said that his only goal is to avoid the possibility of accidentally checking in a modification to the source file (such as stranding FILES_PER_UNIFIED_FILES=1 lines) when working on a fix for a non-unified build bustage, not suggesting a modification to Mozilla's supported build configurations.  I think that's a fair ask, and it's not really up to me to say whether the best way to do that is through adding a configure option or something else but I don't see a strong reason to avoid adding a SM specific facility to improve things here.  Build peers are better equipped to answer that question, I think, so I'd defer to glandium or another build peer on that.
Okay, at this point, I don't get it anymore. If the only non-hackish way to disable unified compilation is a configure flag that is 2km long and says "unsupported", why then are those type of builds supported in the form of a tier-2 taskcluster job? This seems backwards to me. Either it's unsupported and the jobs should be removed, or it's supported and the flag shouldn't say it isn't.
Well, this would be a configure option used by a single test job. I thought we already several build system features like this, that serve a specific purpose, some of them used in CI jobs, but which we would never use in a release build, don't support for end users, and which may well be broken in combinations that aren't tested.

We have Valgrind builds and static analysis builds because they're useful to developers. SM(nu) is useful to us in much the same way, just not as important. I don't get to choose what's useful to us.
I will try to be brief:

- thanks for that email thread, I still agree with everything both Ehsan and I said there. Unified builds are good. Supporting the option of non-unified gecko builds is unneeded.

- spidermonkey has different tradeoffs. We observed repeated (unified) bustage with non-tier-1 platforms because we use (or used?) conditional compilation in the makefile. The consensus on #jsapi is that we would prefer to maintain a tier-1 nonunified build *for spidermonkey only*. It has been relatively low cost to maintain. I am unaware of developers ever complaining that the fixes were unnecessary busywork. I don't think any of those are true for Gecko.

- my implementation of SM(nu) in autospider.py is awful. My initial attempt was to use something like glandium suggested (or maybe it was just FILES_PER_UNIFIED_FILE=1?) and I ran into tooling problems where I was getting realfile.cpp errors reported in unified.cpp. (It also reported the correct location, but whatever tooling it was looked primarily at the unified.cpp file. Might have been emacs error parsing.) It was a nuisance when fixing nonunified breakage. I don't know if that tooling limitation still exists.

- related, I wanted something that did not require touching the build system. I was signing up for maintaining an external hack, because the will of the build peers was to not have any pretense of supporting nonunified builds. I didn't consider the possibility of a js/src/configure-only solution, but I probably didn't have the stomach for arguing it at the time anyway.

- the tradeoffs for spidermonkey are different than for gecko. We have or had more conditionally-compiled source files. We are willing to maintain the nonunified build (of spidermonkey only). (Before I added SM(nu), people were regularly fixing it up, and complaining about it not being in automation.)

- SM(nu) is tier 1. At that time, tier 2 was indistinguishable from tier 3. Today, we might go for tier 2.

- I don't speak for all of the spidermonkey team. The collective opinion may be different now.

- if the existence of nonunified spidermonkey builds bothers build peers because it challenges the gecko stance on supporting nonunified compilation, I'm not going to fight against its removal. (I would find that decision unfortunate, to be sure, but it's the build peers' call.) Again, I can't speak for all of #jsapi.

- I believe the "2km long configure flag" was meant to be "I'm not entirely sure what the objections are, but if it would help for the flag to be long and ugly, we could make it long and ugly." I think it's off the table now; it sounds like people are saying that we should either have a simple JS-only --disable-unified, or we should remove SM(nu) entirely.
I think bug 1447458 comment 0 has some useful data points.  Right now we're spending relatively senior engineers' time on firedrills around beta merge and to help Linux distro packagers ship Firefox, because they're the people who understand the problems that are happening.  Instead of that, I'd rather have sheriffs bug people who don't "include what you use".
See Also: → 1447458
OK, I'll toss out an alternative proposal here.  If we're serious about the idea that nonunified is no longer supported, then we need to be serious that changing *other* build flags shouldn't cause bustage.  This, in turn, requires that no supported build flags or configurations should change the set of unified files compiled within a directory.  In other words, we should have the rule that we no longer conditionally build unified-build files within a directory based on a supported build flag -- we either (a) conditionally build directories and ensure that the files are in their own directories or (b) #ifdef out the entire contents of files rather than ceasing to build them or (c) (not preferred, since it hurts build performance) always build the files nonunified.

I don't particularly like this proposal -- I'd prefer having a nonunified build in CI as tier-2 -- but I think the current situation where supported build configurations *do* cause unification-related bustage because they change the set of files being built isn't acceptable (and we then drag senior people away to fix that), and we should move in one direction or the other.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #24)
> I think bug 1447458 comment 0 has some useful data points.  Right now we're
> spending relatively senior engineers' time on firedrills around beta merge
> and to help Linux distro packagers ship Firefox, because they're the people
> who understand the problems that are happening.  Instead of that, I'd rather
> have sheriffs bug people who don't "include what you use".

IMO the proper way to prevent firedrills around uplifts is to be running beta, release, etc build configurations in CI on trunk. Bug 1372697 is probably what you are looking for here. Now that the TC migration is complete, I may try to revive this effort.

If downstream Linux distro builds are causing firedrills as well, I would also suggest we stand up CI on trunk to cover these build configurations. Maybe it is only tier 2 and only runs once a day. But at least we'll have visibility into the problem before it causes a firedrill on uplift. We should be finding problems sooner so there is the opportunity to fix them at the time of regression, not potentially weeks later.

Regarding the substance of this bug, we need to make a decision as to whether:

a) we change the build system to prevent conditionally-included sources from being lumped in with non-conditional translation/compilation units
b) we stand up CI covering the variable build configurations that waste our people's time so problems don't turn into firedrills
c) we put our foot down and refuse to support certain build configurations [that aren't well-tested]

"c" is hard because if you are at that point, then you might as well say "if the build configuration isn't supported, why is it a configuration option." We end up making various components non-optional and this upsets downstream packagers and makes their lives a *lot* harder. In other words, we externalize the cost of packaging Firefox to others. This would likely lead to fewer downstream packagers, which isn't good for Firefox adoption. So this becomes more of a Product question than an Engineering question.

I like "a" as an elegant solution to the problem. Although I'm not sure how practical it is. We'd have to annotate each UNIFIED_SOURCES entry to indicate if it is conditional. Or we'd have to prevent multiple assignments to UNIFIED_SOURCES. Or we'd have to consult the AST and prevent assignments that occur from `if` blocks.

Aspects of "b" should likely be done regardless.
While I'm uncomfortable scope-creeping this bug to include more than spidermonkey, I did want to mention that "a" isn't a full fix. If a conditionally-compiled file contains 'using' or an #include needed by something in its unified compilation unit, then the build will still break if that whole file is #ifdef'ed out or just omitted from the compilation without shifting unification boundaries.

That is not an argument against, btw; the effects would be much more contained than they are today with shifting boundaries.

While I still think a spidermonkey-only decision ought to be straightforward, it may be best to decide the overall question in bug 1447458 and then worry about spidermonkey specifically if it still needs special consideration -- a tier 2 job for all nonunified-friendly whitelisted directories, as dholbert proposes in bug 1447458, would give us everything we need.

May I ask a question. Why are unified builds different for Firefox and Thunderbird? We had two bustages recently and one a while ago caused by unified builds. Each time FF compiled, but TB didn't. Here are the fixes:
https://hg.mozilla.org/mozilla-central/rev/3aec75953c28 - 12 Jan 2019, adding one missing include.
https://hg.mozilla.org/mozilla-central/rev/c9a17c8a3c3f - 3 Jan 2019 (incorrect fix)
https://hg.mozilla.org/mozilla-central/rev/f612a041c69c - 4 Jan 2019 (correct fix), adding six(!) missing includes
https://hg.mozilla.org/mozilla-central/rev/c40ca7a1bdd9 - 9 Mar 2017, adding one missing include.

I know, TB doesn't have any tier status, but it's stressful for the TB maintainer to fix and fix the bustage, get review and approval to fix it on M-C directly. The last two occurrences happened on the 23:00 merge and it took until after midnight to get it sorted out. Surely other downstream projects are also affected, see bug 1345771 comment #15.

Personally I find it particularly, well, let's say, "inconvenient", that the M-C code base is in a broken state, that is, it only compiles under certain "lucky" circumstances and may break at any moment when those circumstances change. Surely there must be a way to detect this bustage before it lands, like yesterday, well, in the AM hours of today, I added WindowProxyHolder.h to a file using WindowProxyHolder, a symbol 100% unknown in that file.

s/fix and fix/find and fix/.

(In reply to Jorg K (GMT+1) from comment #28)

May I ask a question. Why are unified builds different for Firefox and Thunderbird?

This is a valid question. Usually the answer is that some files are only compiled in a certain configuration, which shifts the unification boundaries.

For your situation, you need to figure out (hopefully with assistance from build people) why there is a difference and eliminate it. Otherwise, you're always going to be fighting an uphill battle, even if we decide to make a tier 2 build (as I hope we will.)

Personally I find it particularly, well, let's say, "inconvenient", that the M-C code base is in a broken state, that is, it only compiles under certain "lucky" circumstances and may break at any moment when those circumstances change. Surely there must be a way to detect this bustage before it lands, like yesterday, well, in the AM hours of today, I added WindowProxyHolder.h to a file using WindowProxyHolder, a symbol 100% unknown in that file.

The code is not in a broken state, this sort of breakage is the expected result of our chosen method of compilation. The code is valid when compiled unified in the configurations that are tested in CI. It's not a lucky circumstance, it's a known situation. It's not unlike requiring a certain library's headers to build; without that library installed, and furthermore a compatible version of that library installed, your build will fail. The constraint of unification, though subtle and annoying, buys us a huge compilation speed win and so is totally worth it.

We have chosen to not detect this bustage before it lands, to remove that sort of nuisance overhead from developers. Even if we had a tier 2 build, we would say that it's ok to land stuff that breaks it. I imagine that m-c would stay more or less working if we had that build, though -- sheriffs would try to get people to fix it before merging.

If I were to try tracking your problem down, I would suggest first comparing all of your $objdir/**/Unified*.cpp files against those from a plain m-c build. (You'd have to strip off the leading portion of the absolute paths, though.) A build peer might just know what is likely to be the problem, though; perhaps setting the product (or whatever other configuration settings you use?) is enough to add in or remove some files. I recommend filing a bug on this to see what help you can get.

Thanks, I followed your suggestion and filed bug 1519936.

Good news, this seems to have been fixed elsewhere, in bug 1725125 and bug 1725145.

mozconfig option:

ac_add_options --disable-unified-build

Of course that would break the build if we really disabled unified mode everywhere right now, so many directories opt out of honoring this by setting REQUIRES_UNIFIED_BUILD in their mozconfig. But presumably the number of directories that have to do that will go down over time.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE

(So: if I'm understanding correctly, you can force a given directory to build in non-unified mode by (a) removing REQUIRES_UNIFIED_BUILD from that directory's moz.build file, and (b) putting ac_add_options --disable-unified-build in your mozconfig.)

Thanks Daniel for pointing to this original bug. Your comments are correct, also we have some autoland tasks that perform hybrid build.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: