1483846 - Investigate clang LTO parallelism

Reporter

Description

6 years ago

The clang documentation about LTO says the default number of threads should be the number of virtual cores on the machine. I was on an interactive task instance on taskcluster with 16 vcores according to `nproc`, and ld.lld maxed out at 800% CPU, which suggests 8 threads. It would be worth investigating whether we should manually set the value to that of `nproc`.

Mike Hommey [:glandium]

Reporter

Comment 1

6 years ago

So I got a loaner, carried out an LTO build, removed libxul.so and relinked it:
- With the defaults, the link step took 8 minutes
- With -Wl,--thinlto-jobs=16, it took 6 minutes

We do use the number of processors as seen by configure for GCC LTO, so it seems worth trying the same for clang.
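As a rough illustration of "use the number of processors for clang too", here is a minimal Python sketch that builds the flag from the comment above. The helper name `thinlto_jobs_flag` is hypothetical, and `os.cpu_count()` is only an assumed stand-in for `nproc` (unlike `nproc`, it does not account for CPU affinity); this is not the actual configure code.

```python
import os


def thinlto_jobs_flag(jobs=None):
    """Return the clang driver flag asking ELF lld for `jobs` ThinLTO threads.

    Hypothetical helper for illustration only.
    """
    if jobs is None:
        # os.cpu_count() can return None when the count is undetermined,
        # so fall back to a single job in that case.
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    return f"-Wl,--thinlto-jobs={jobs}"


if __name__ == "__main__":
    # Prints the flag for however many CPUs this machine reports.
    print(thinlto_jobs_flag())
```

On the 16-vcore loaner described above, `thinlto_jobs_flag(16)` would give the same `-Wl,--thinlto-jobs=16` that was tested by hand.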

Nathan Froyd [:froydnj]

Comment 2

5 years ago

I am wondering whether the jobs defaults for all of our linkers get set properly with LTO. I think the right thing magically happens for ld64 on our Mac cross-compiles. I am less certain whether it happens correctly for binutils ld on our Linux jobs, and judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs). Maybe we should investigate a little harder? (A separate "link" tier to clearly delineate what CPU usage looks like during linking?)

:dmajor (Away)

Comment 3

5 years ago

> judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs).

COFF spells it as /opt:lldltojobs=N.
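Since the two lld flavors spell the same setting differently, a build script would need to pick the spelling per target. The following is only a sketch: the function name `lto_jobs_flag` and the `"elf"`/`"coff"` labels are made up for illustration, and only the two flag spellings come from this thread.

```python
def lto_jobs_flag(linker_flavor, jobs):
    """Return the LTO jobs option for the given lld flavor.

    `linker_flavor` is a hypothetical label: "elf" for ld.lld driven
    through the clang driver, "coff" for the COFF lld.
    """
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    if linker_flavor == "elf":
        # Passed through the compiler driver, hence the -Wl, prefix.
        return f"-Wl,--thinlto-jobs={jobs}"
    if linker_flavor == "coff":
        # Linker option as spelled for COFF lld.
        return f"/opt:lldltojobs={jobs}"
    raise ValueError(f"unknown linker flavor: {linker_flavor!r}")


if __name__ == "__main__":
    for flavor in ("elf", "coff"):
        print(flavor, lto_jobs_flag(flavor, 16))
```

Other linkers mentioned in this bug (ld64, binutils ld) are deliberately left out, because their spelling is not given here.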

Nathan Froyd [:froydnj]

Comment 4

5 years ago

(In reply to :dmajor from comment #3)

> > judging by resource usage graphs on our shippable Windows jobs, it doesn't happen at all there (COFF lld doesn't seem to expose --thinlto-jobs).
>
> COFF spells it as /opt:lldltojobs=N.

Ah, indeed. But looking through lib/LTO/LTO.cpp and lib/Support/ThreadPool.cpp, I think the default of 0 gives completely bogus results.

Nathan Froyd [:froydnj]

Comment 5

5 years ago

To see whether it makes any difference, the trivial patch for Windows:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1014bed3cb2562b771c479c1146462c51681f110

:dmajor (Away)

Comment 6

5 years ago

Hmm, apparently there's also an lldltopartitions? It's not entirely clear to me what the difference is, although this may be a starting point: https://reviews.llvm.org/D29059#665077.

Nathan Froyd [:froydnj]

Comment 7

5 years ago

(In reply to Nathan Froyd [:froydnj] from comment #5)

> To see whether it makes any difference, the trivial patch for Windows:
>
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=1014bed3cb2562b771c479c1146462c51681f110

OK, I'm not completely sure if this made a difference. The 64-bit shippable build completed in 70 minutes, with a resource utilization graph that looks like:

https://taskcluster-artifacts.net/F7DtTqx4QeyJyzoBV65uDA/0/public/build/build_resources.html

Three out of the last four builds on central, as of this writing, look like:

https://taskcluster-artifacts.net/MBnV5UPoSzqFPELxRAfbsA/0/public/build/build_resources.html
https://taskcluster-artifacts.net/AjmRGX-7QQqDrONLcolAPw/0/public/build/build_resources.html
https://taskcluster-artifacts.net/LtE55tpVRnu_P9qD0_qnSA/0/public/build/build_resources.html

and the times are somewhere in the 80+ minute range. So, similar graphs, with slightly more CPU usage before we drop off 100% utilization...I think we are winning?

There are other jobs on central that look like:

https://taskcluster-artifacts.net/QjfviU_0QTuoLHWYKS7kyQ/0/public/build/build_resources.html

and have build times similar to what the try push did. I don't know how the build times can vary so much; these builds shouldn't be sccached or anything like that, so we're just doing "how fast does a clean build go", and 20%ish variation on that seems...not great.

So I think something improved? I might be pushing from an old tree without some of glandium's recent build improvements, though.

(In reply to :dmajor from comment #6)

> Hmm, apparently there's also an lldltopartitions? It's not entirely clear to me what the difference is, although this may be a starting point: https://reviews.llvm.org/D29059#665077.

I would like to understand what this option does too.

Nathan Froyd [:froydnj]

Comment 8

5 years ago

New try push adding in lldltopartitions:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d2897ebc2e7166d1ab1b77157ebddb67a4850e8

BMO Automation

Updated

2 years ago

Severity: normal → S3

Categories

(Firefox Build System :: General, enhancement)

Tracking

(Not tracked)

People

(Reporter: glandium, Unassigned)

Security

(public)
