Closed Bug 1624670 Opened 4 years ago Closed 4 years ago

Bootstrap fails with 32-core / 64-thread processor

Categories

(Firefox Build System :: Bootstrap Configuration, defect)

x86_64
Windows 10
defect
Not set
normal

Tracking

(firefox76 fixed, firefox77 fixed)

RESOLVED FIXED
mozilla77
Tracking Status
firefox76 --- fixed
firefox77 --- fixed

People

(Reporter: kip, Assigned: glandium)

Details

Attachments

(3 files)

Ran mach bootstrap on a fresh mozilla-unified clone and fresh mozillabuild install.
Windows 10 running on a Threadripper 2990wx (32-core 64-thread).

This failed with an error:

ValueError: max_workers must be <= 61

Attached is longer tty output and the callstack.

This appears to be Windows-specific, as mach bootstrap succeeds on a Xeon 7210 (64-core / 256-thread) system running Linux.

Looked a bit deeper...

It appears that the problem is rooted in the 61 worker limitation of the Python3 concurrent.futures library:

https://docs.python.org/3.8/library/concurrent.futures.html

This is implemented using Win32 WaitForMultipleObject calls, which are not able to support more than 63 objects in a single call.

Now that there are consumer oriented, single-socket processors with up to 64-cores / 128-threads, this function call is obsolete for the purpose of synchronizing per-cpu instantiated workers. This is better done with IOCP's (https://docs.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports). It seems that even the latest version of concurrent.futures has this 61-worker limitation.

I was able to complete a bootstrap by hard-coding max_workers = 61 in python/mozbuild/mozbuild/frontend/reader.py:

        # max_workers = cpu_count()
        max_workers = 61

Easiest fix is probably just clamping the max_workers on Windows only.

Assignee: nobody → nfroyd

ProcessPoolExecutor will naturally default to the number of CPUs on
the machine and will also handle edge cases on Windows.

I'd love to hear about what kind of build times you get with that machine. :)

Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6e96e32b5f0
remove max_workers argument for ProcessPoolExecutor; r=dmajor
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76

So, fun fact, this actually didn't fix it entirely... because somehow the capping doesn't work properly in python.
See https://bugs.python.org/issue26903#msg365886

Assignee: nfroyd → mh+mozilla
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/be8356bf09d7
Cap ProcessPoolExecutor's max_workers to 60 on Windows. r=firefox-build-system-reviewers,rstewart
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: mozilla76 → mozilla77
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: