Last Comment Bug 705479 - Parallel builds (-jN) on Windows should fail early if gmake is being used
: Parallel builds (-jN) on Windows should fail early if gmake is being used
Status: RESOLVED FIXED
fixed-in-bs
:
Product: Firefox
Classification: Client Software
Component: Build Config (show other bugs)
: unspecified
: All Windows 7
: -- enhancement (vote)
: Firefox 11
Assigned To: Jon Buckley
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-26 13:34 PST by Jon Buckley
Modified: 2011-11-30 06:56 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Throw an error when parallel building (-jN) on Windows with gmake (1.16 KB, patch)
2011-11-27 12:55 PST, Jon Buckley
no flags Details | Diff | Review
Move checks to browser/build.mk (1.29 KB, patch)
2011-11-28 14:25 PST, Jon Buckley
khuey: review-
bugspam.Callek: feedback-
Details | Diff | Review
Allow multi-CPU builds to work with -j1 (1.19 KB, patch)
2011-11-29 08:13 PST, Jon Buckley
khuey: review+
Details | Diff | Review

Description Jon Buckley 2011-11-26 13:34:16 PST
I ran into the documented problem where doing a parallel build on Windows with gmake will randomly deadlock during the compile process. I got directed to use Pymake instead, which works great for me, but it'd be helpful for new developers if the compile process would stop with an error message if gmake is being used to do a parallel build.
Comment 1 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-26 17:04:00 PST
Are Thunderbird or Seamonkey still using VMs for build infra?  IIRC we did parallel make on gmake in VMs without deadlocking because the VMs are so damn slow.
Comment 2 Justin Wood (:Callek) 2011-11-27 04:15:07 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1)
> Are Thunderbird or Seamonkey still using VMs for build infra?  IIRC we did
> parallel make on gmake in VMs without deadlocking because the VMs are so
> damn slow.

SeaMonkey is, but we probably will be ending up moving away from it one of these years. And yes, SeaMonkey is still using -j3 on our VM's so I would appreciate at the least a way to override this error (or no error to be introduced). We do hit the deadlock once and a while, but it is rare.
Comment 3 Jon Buckley 2011-11-27 12:55:53 PST
Created attachment 577158 [details] [diff] [review]
Throw an error when parallel building (-jN) on Windows with gmake

Here's a first crack at a fix for this. It doesn't handle other Mozilla applications, but it does work for building Firefox on Windows and OS X. Is putting this logic in client.mk the right approach to fixing this, or is there a better place for it?
Comment 4 Justin Wood (:Callek) 2011-11-27 15:18:00 PST
(In reply to Jon Buckley [:jbuck] from comment #3)
> Created attachment 577158 [details] [diff] [review] [diff] [details] [review]
> Throw an error when parallel building (-jN) on Windows with gmake
> 
> Here's a first crack at a fix for this. It doesn't handle other Mozilla
> applications, but it does work for building Firefox on Windows and OS X. Is
> putting this logic in client.mk the right approach to fixing this, or is
> there a better place for it?

Strictly my opinion, IFF we do this, we should do it in the toplevel Makefile.in not client.mk

client.mk itself is not executed in parallel, and people don't need (and sometimes don't) start with client.mk for builds, but client.mk does always dive into top-level Makefile. Could also [instead] be done in config/rules.mk.

As I said above though, I'm against doing this without an "ignore"/"off" switch.
Comment 5 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-28 04:42:48 PST
Comment on attachment 577158 [details] [diff] [review]
Throw an error when parallel building (-jN) on Windows with gmake

Lets put this in browser/build.mk instead.
Comment 6 Ted Mielczarek [:ted.mielczarek] 2011-11-28 04:51:11 PST
I think we could make this relatively foolproof by erroring iff:
1) You're using gmake
2) On Windows
3) And you have >1 core. (Since this is Windows-only, you can probably check the NUMBER_OF_PROCESSORS env var.)
Comment 7 Jon Buckley 2011-11-28 14:25:51 PST
Created attachment 577379 [details] [diff] [review]
Move checks to browser/build.mk

This patch moves the parallel build checks to browser/build.mk, and changes the check to NUMBER_OF_PROCESSORS instead.
Comment 8 Justin Wood (:Callek) 2011-11-28 14:33:29 PST
Comment on attachment 577379 [details] [diff] [review]
Move checks to browser/build.mk

This breaks compiling Firefox with gnuMake on windows when youhave multiple CPUs even when you don't -jN.  Which for example, would break current firefox build machines too.
Comment 9 Justin Wood (:Callek) 2011-11-28 14:36:11 PST
(In reply to Justin Wood (:Callek) from comment #8)
> Comment on attachment 577379 [details] [diff] [review] [diff] [details] [review]
>  Which for example, would break current firefox build machines too.

Err assumptions presented as fact -- fail.

I *think* Current firefox build machines actually have multiple processors assigned, but I did not verify that assumption. but my f- stands since it also breaks the "don't break for the easy/common case" for many users.

Since the deadlocks do not happen with -j1 or no -j at all.
Comment 10 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-28 14:37:54 PST
Comment on attachment 577379 [details] [diff] [review]
Move checks to browser/build.mk

Review of attachment 577379 [details] [diff] [review]:
-----------------------------------------------------------------

We shouldn't break -j1 on multiprocessor machines, even if it doesn't affect the build bots.
Comment 11 Jon Buckley 2011-11-29 08:13:19 PST
Created attachment 577623 [details] [diff] [review]
Allow multi-CPU builds to work with -j1

Now checks for the presence of -j in MAKEFLAGS. MAKEFLAGS doesn't get expanded on the first pass through the file, so this patch uses a workaround found on the make mailing list to search MAKEFLAGS: http://lists.gnu.org/archive/html/help-make/2009-05/msg00023.html
Comment 12 Jon Buckley 2011-11-29 08:30:39 PST
Just for some further clarification, MAKEFLAGS at that point in the file shows "w --jobserver-fds=3,4 -j", if you pass any -jN option via MOZ_MAKE_FLAGS, eg -j8, -j12. If you do not pass -jN in your MOZ_MAKE_FLAGS, then MAKEFLAGS will be "w".

If we need to test for specific -jN values (if someone does MOZ_MAKE_FLAGS=-j1), the only way I can think of right now is exporting the MOZ_MAKE_FLAGS variable in client.mk, so that browser/build.mk can see exactly what flag was set. Looking through mxr, this *might* be a concern with some of the build configs: http://mxr.mozilla.org/mozilla-central/search?string=MOZ_MAKE_FLAGS
Comment 13 Jon Buckley 2011-11-29 08:53:42 PST
Turns out I am being needlessly paranoid about my first patch :)

If you pass MOZ_MAKE_FLAGS="", MAKEFLAGS="w"
If you pass MOZ_MAKE_FLAGS="-j1", MAKEFLAGS="w"
If you pass MOZ_MAKE_FLAGS="-jN", where N is > 1, MAKEFLAGS="w --jobserver-fds=3,4 -j"
Comment 14 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-29 11:24:45 PST
Comment on attachment 577623 [details] [diff] [review]
Allow multi-CPU builds to work with -j1

Review of attachment 577623 [details] [diff] [review]:
-----------------------------------------------------------------

Woo.  Looks good to me.
Comment 15 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-29 13:35:29 PST
https://hg.mozilla.org/projects/build-system/rev/0e5dfc784411
Comment 16 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-11-30 06:56:44 PST
https://hg.mozilla.org/mozilla-central/rev/0e5dfc784411

Note You need to log in before you can comment on or make changes to this bug.