Closed Bug 1696729 Opened 5 years ago Closed 4 years ago

Loss of parallelism during cargo invocation during local debug build of TB

Categories

(Thunderbird :: Build Config, defect)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: ishikawa, Unassigned)

Details

Attachments

(3 files)

I am not sure if this happens on try-comm-server, but I notice a loss of parallelism during
a local build of the debug version of TB.

The attached screen dump shows that basically only one CPU is being used even though I specified "-j6".
Most of the ordinary C++ source trees are compiled with good parallelism.
This cargo command seems to take more than 5 minutes on my CPU, so the loss of parallelism noticeably lengthens the elapsed build time.

This is under
Debian GNU/Linux 64.
Compiling DEBUG version of comm-central source tree.

I have no way of seeing the parallelism on try-comm-server, but I suspect there may be a similar issue there.

Can't the Rust part of the build be parallel?

PS: It could be a time-consuming library creation. However, we can do better.
We could run OTHER compilation commands from the C++ source tree at the same time.
I wonder why we have to serialize this part and make it a bottleneck.

Yes, this is an issue and it does happen on builds in Taskcluster as well. This can be seen here: https://firefoxci.taskcluster-artifacts.net/Ot7vjFPeQviZOjzMsBdEWg/0/public/build/build_resources.html

Around 1500 seconds in, the build drops to a single CPU.

The same thing on a Firefox build: https://firefoxci.taskcluster-artifacts.net/Z4OiuCQlSmSIfh1Jq6z70g/0/public/build/build_resources.html

The culprit is gkrust. It takes a good 15 minutes just to link. I don't know enough about the dependency structure there to comment on whether build steps can be rearranged or not. I suspect that it needs to happen late in the build process so that the dependencies' object files are built first.

I'll also note that there's been a bug open since 2017 at https://github.com/rust-lang/rust/issues/43211 about the long link times specifically with the webrender, geckoservo, and gkrust libraries.

I see. So it happens on tryserver as well.

My suggestion, based on local observation, is this.
At least for the TB build, Rust compilation seems to happen a bit later than the C++ compilation (the C++ source tree of the C-C portion under the "./comm" subdirectory).
Thus, this library link comes late and then consumes CPU time on its own for a while.
I would move the Rust tree compilation earlier somehow. Then there is a possibility that the C++ compilation of the C-C and M-C trees can run in parallel with it.

As I wrote the sentence above, I realized I need to write down the following strange observation.
Whenever I modify a source file (a C++ or even a plain C file under ./comm/ldap), Rust compilation (at least for a subset of files) happens, and I have no idea why. This is really strange. Is there ANY Rust object that depends on the C/C++ compilation results of the C-C tree (?!)
This has been going on for quite a while, since last year (or even earlier?).
That explains why I notice the Rust library link running solo after I modify a C file under ./comm/ldap: somehow that change triggers the recompilation of a subset of the Rust code. It is really strange.

I needed to boot my PC and so I cannot access the linux image where the development takes place, but I can find out what rust compilation happens after the change of ldap source tree.

If we can move the link time to an earlier phase (removing this strange dependency of rust compilation on C/C++ source code change in ./comm would help), we can shorten the time when the rust lib linking runs solo. Thus the overall congestion of tryserver will be reduced (!) :-)
Maybe I am too optimistic.

The attached log is an indication of a symptom of unnecessary compilation of rust files.

I have run |mach build| with the appropriate environment variable settings, etc. three times in a row.
(The first run may have required the compilation after all, since it seemed to create new header files, etc. from xpidl files.)
So I ran the second |mach build|, AND the third |mach build|.
Actually, the second run also invoked rustc compilation.
But I am showing the log from the third |mach build| just in case I missed something.

There should NOT be a rustc compilation in this run.
But it happens.

Look for the section that starts with the following.


gmake[4]: Leaving directory '/NEW-SSD/moz-obj-dir/objdir-tb3/toolkit/mozapps/update/updater/updater-xpcshell'
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
       Fresh cfg-if v0.1.10
       Fresh unicode-xid v0.2.0
       Fresh autocfg v1.0.1 (/NEW-SSD/NREF-COMM-CENTRAL/mozilla/third_party/rust/autocfg)
       Fresh itoa v0.4.4
       Fresh bytes v0.5.3
warning: use of deprecated macro `try`: use the `?` operator instead
  --> /NEW-SSD/NREF-COMM-CENTRAL/mozilla/third_party/rust/itoa/src/lib.rs:74:5
   |
74 |     try!(wr.write_all(s.as_bytes()));
   |     ^^^
   |
   = note: `#[warn(deprecated)]` on by default

Why does rustc compilation happen? A complete mystery.
It should not.
These warning lines that get printed are the ones I described as very obnoxious in my e-mail to Wayne.
If they were shown once when the files are compiled, I would not have to see them every time I change an ldap source file, for example.
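One way to check whether rustc really ran in such a log is to tally cargo's status lines. In cargo's verbose output, a "Fresh" line means cargo considered that crate up to date, while "Compiling" marks a crate that was actually rebuilt. The following is a minimal Python sketch under that reading; the function name summarize_cargo_log is hypothetical, and the sample input is just the excerpt quoted above. If every crate is reported as Fresh, the warnings may be replayed from an earlier compilation rather than produced by a new rustc run, but that is an assumption, not something this log proves.

```python
import re
from collections import Counter

# Status lines printed by cargo in verbose mode, e.g. "       Fresh itoa v0.4.4".
STATUS_RE = re.compile(r"^\s*(Fresh|Compiling)\s+(\S+)\s+v(\S+)")


def summarize_cargo_log(lines):
    """Count crates cargo reported as up to date (Fresh) vs. rebuilt (Compiling).

    Hypothetical helper for illustration; not part of mach or cargo.
    """
    counts = Counter()
    crates = {"Fresh": [], "Compiling": []}
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            status, name, version = match.groups()
            counts[status] += 1
            crates[status].append(f"{name} {version}")
    return counts, crates


# The excerpt from the third |mach build| run, copied from the log above.
sample = """\
    Blocking waiting for file lock on package cache
       Fresh cfg-if v0.1.10
       Fresh unicode-xid v0.2.0
       Fresh autocfg v1.0.1 (/NEW-SSD/NREF-COMM-CENTRAL/mozilla/third_party/rust/autocfg)
       Fresh itoa v0.4.4
       Fresh bytes v0.5.3
warning: use of deprecated macro `try`: use the `?` operator instead
""".splitlines()

counts, crates = summarize_cargo_log(sample)
print("fresh:", counts["Fresh"], "compiling:", counts["Compiling"])
print("rebuilt crates:", crates["Compiling"])
```

Running the same function over a full build log (for example, one read with open(path).read().splitlines()) would list exactly which crates, if any, were rebuilt.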

Now, in the next comment, I will bring up what I see in the log when I touch a C source file in ./comm/ldap directory.
I think I get the compilation of the said C source file AND the unnecessary rust re-compilation (!).

Here is the log of the local build.
I changed a ./comm/ldap file.
Actually, I only popped and pushed my local Mercurial patch queue. (I know MQUEUE is outdated. But for an occasional patch submitter, it is indeed a bother to keep up with the latest glitzy tools. :-( )

Anyway, in the log, you can see the flurry of compilation of .c files. That was to be expected.

What I don't expect at all is the Rust compilation. It clutters the build log.
It should not be necessary. This |mach build| follows the three |mach build| runs I explained above.
No Rust source files or headers were touched, if I am not mistaken. (Or maybe the configuration is screwed up so that rustc compilation ALWAYS runs?)

Another strange observation: in this run, no library link is attempted.
Only the compilation is attempted?
And since I use sccache, it seems that sccache simply printed out the warnings and
used the cached objects for the unchanged Rust files.
However, the warning lines, etc. ARE a nuisance.
The compilation should NEVER be attempted at this stage IMHO.

Anyway, suppose the rustc compilation kicks in only AFTER all the C++/C compilations are finished, due to some not-so-well-documented dependency or maybe an outright bug that serializes C/C++ compilation and rustc compilation. Then I can see why the Rust object link runs solo at the end of a try-comm-server build, since the build there is basically a build from scratch.

I hope this helps somebody investigate the issue.

Funny, has no one from the FF community complained about this?

Don't people build locally anymore?
OR maybe a 4GHz CPU with lots of cores makes the build time short enough not to bother most developers (?)

Anyway, the try-comm-server farm can't be all 4GHz CPUs with many cores, so if we can solve this issue of Rust source files being compiled only AFTER the C/C++ source files, then we can probably lessen the congestion on the try build farm.

Yeah, I wonder if there's an open Firefox bug for this...

Type: enhancement → defect

The gist of it is: there is nothing else left to build that isn't waiting for what's currently compiling, and what's currently compiling is stuff that takes a while. That would usually be things like style, geckoservo, gkrust.
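To make that explanation concrete, here is a minimal Python sketch of a greedy job scheduler with six slots (as with "-j6"). Everything in it is an assumption for illustration: the durations are made up, the chain style → geckoservo → gkrust → link is a simplified stand-in for the real dependency graph, and simulate and critical_path_lengths are hypothetical helpers, not part of the build system. It compares starting the Rust chain after the C++ files with starting it first.

```python
import heapq


def simulate(tasks, jobs, priority):
    """Greedy list scheduling.

    tasks: {name: (duration_seconds, [dependency names])}
    jobs: number of parallel job slots (like -j6)
    priority: key function; ready tasks with smaller keys start first
    Returns (makespan, {name: (start, end)}).
    """
    waiting = {name: set(deps) for name, (_, deps) in tasks.items()}
    ready = [name for name, deps in waiting.items() if not deps]
    running = []  # min-heap of (end_time, name)
    spans = {}
    now = 0
    while ready or running:
        ready.sort(key=priority)
        # Fill free job slots with ready tasks.
        while ready and len(running) < jobs:
            name = ready.pop(0)
            end = now + tasks[name][0]
            heapq.heappush(running, (end, name))
            spans[name] = (now, end)
        # Advance time to the next task completion and release its dependents.
        now, finished = heapq.heappop(running)
        for name, deps in waiting.items():
            if finished in deps:
                deps.remove(finished)
                if not deps:
                    ready.append(name)
    return now, spans


def critical_path_lengths(tasks):
    """Longest chain of durations from each task to the end of the build."""
    dependents = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            dependents[dep].append(name)
    memo = {}

    def length(name):
        if name not in memo:
            memo[name] = tasks[name][0] + max(
                (length(d) for d in dependents[name]), default=0)
        return memo[name]

    return {name: length(name) for name in tasks}


# Hypothetical build: 12 independent C++ files plus a serial Rust chain.
cpp = [f"cpp{i:02d}" for i in range(12)]
tasks = {name: (10, []) for name in cpp}
tasks.update({
    "style": (60, []),
    "geckoservo": (40, ["style"]),
    "gkrust": (90, ["geckoservo"]),
    "link": (5, ["gkrust"] + cpp),
})

cp = critical_path_lengths(tasks)
# Alphabetical order happens to start the C++ files before "style".
rust_late, _ = simulate(tasks, jobs=6, priority=lambda n: n)
# Longest remaining chain first: the Rust chain starts at time 0.
rust_early, _ = simulate(tasks, jobs=6, priority=lambda n: -cp[n])
print("Rust started late: ", rust_late, "s")
print("Rust started early:", rust_early, "s")
print("Serial chain alone:", cp["style"], "s")
```

With these made-up numbers, starting the Rust chain first shortens the build from 215 to 195 seconds, which matches the reporter's suggestion. The build still cannot finish faster than the 195-second serial chain, so five of the six job slots sit idle for most of the run either way.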

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID