Open Bug 1483846 Opened 5 years ago Updated 4 months ago

Investigate clang LTO parallelism

Categories

(Firefox Build System :: General, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: glandium, Unassigned)

Details

The clang documentation about LTO says the default number of threads should be the number of virtual cores on the machine. I was on an interactive task instance on taskcluster, with 16 vcores per `nproc`, and ld.lld would max out at 800% cpu, which suggests 8 threads.

It would be worth investigating if we should manually set the value to that of `nproc`.
So I got a loaner, carried on an LTO build, removed libxul.so and relinked it:
- With the defaults, the link step took 8 minutes
- With -Wl,--thinlto-jobs=16, it took 6 minutes

We do use the number of processors as seen by configure for GCC LTO, it seems worth trying the same for clang.

I am wondering whether the jobs defaults for all of our linkers get set properly with LTO. I think the right thing magically happens for ld64 on our Mac cross-compiles. I am less certain whether it happens correctly for binutils ld on our Linux jobs, and judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs). Maybe we should investigate a little harder? (A separate "link" tier to clearly delineate what CPU usage looks like during linking?)

judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs).

COFF spells it as /opt:lldltojobs=N.

(In reply to :dmajor from comment #3)

judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs).

COFF spells it as /opt:lldltojobs=N.

Ah, indeed. But looking through lib/LTO/LTO.cpp and lib/Support/ThreadPool.cpp, I think the default of 0 gives completely bogus results.

To see whether it makes any difference, the trivial patch for Windows:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1014bed3cb2562b771c479c1146462c51681f110

Hmm, apparently there's also an lldltopartitions? It's not entirely clear to me what the difference is, although this may be a starting point: https://reviews.llvm.org/D29059#665077.

(In reply to Nathan Froyd [:froydnj] from comment #5)

To see whether it makes any difference, the trivial patch for Windows:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1014bed3cb2562b771c479c1146462c51681f110

OK, I'm not completely sure if this made a difference. The 64-bit shippable build completed in 70 minutes, with a resource utilization graph that looks like:

https://taskcluster-artifacts.net/F7DtTqx4QeyJyzoBV65uDA/0/public/build/build_resources.html

Three out of the last four builds on central, as of this writing, look like:

https://taskcluster-artifacts.net/MBnV5UPoSzqFPELxRAfbsA/0/public/build/build_resources.html
https://taskcluster-artifacts.net/AjmRGX-7QQqDrONLcolAPw/0/public/build/build_resources.html
https://taskcluster-artifacts.net/LtE55tpVRnu_P9qD0_qnSA/0/public/build/build_resources.html

and the times are somewhere in the 80+ minute range. So, similar graphs, with slightly more CPU usage before we drop off 100% utilization...I think we are winning?

There are other jobs on central that look like:

https://taskcluster-artifacts.net/QjfviU_0QTuoLHWYKS7kyQ/0/public/build/build_resources.html

and have build times similar to what the try push did. I don't know how the build times can vary so much; these builds shouldn't be sccached or anything like that, so we're just doing "how fast does a clean build go", and 20%ish variation on that seems...not great.

So I think something improved? I might be pushing from an old tree without some of glandium's recent build improvements, though.

(In reply to :dmajor from comment #6)

Hmm, apparently there's also an lldltopartitions? It's not entirely clear to me what the difference is, although this may be a starting point: https://reviews.llvm.org/D29059#665077.

I would like to understand what this option does too.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.