Closed Bug 1583999 Opened 1 year ago Closed 2 months ago

Make the CompileDB backend work with clangd

Categories

(Firefox Build System :: Developer Environment Integration, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: botond, Assigned: andi)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

clangd is a C++ language server that provides a lot of useful semantics-aware editing features for any editor that supports the Language Server Protocol.

Getting clangd to understand a codebase requires a compile_commands.json file. We have machinery to generate one, mach build-backend -b CompileDB.

Unfortunately, clangd doesn't work well with our compile_commands.json file due to unified compilation.

The entries in our compile_commands.json file look something like this:

{
"directory": "<objdir>/gfx/layers",
"command": "clang++ <flags> <objdir>/gfx/layers/Unified_gfx_layers1.cpp",
"file": "<srcdir>/gfx/layers/AnimationHelper.cpp"
},

Note that the value of the "file" key, which is the non-unified source file, does not match the actual input file to the compile command, which is the unified source file.

This confuses clangd, which expects the two to be the same. (Specifically, clangd will interpret the contents of the non-unified file as if it were located in the directory where the unified file is, which breaks resolution of relative include paths.)

We should fix our CompileDB backend to generate a compile_commands.json that clangd can consume.

Either of the following would work:

  • Disable unified compilation for the purposes of generating the CompileDB
  • Emit the unified source file under the "file" key as well, thereby having it match the actual source file used in the command.

I'd argue it's clangd that needs fixing.

(In reply to Mike Hommey [:glandium] from comment #1)

I'd argue it's clangd that needs fixing.

I would find it hard to defend our current format. The spec for compilation databases says:

The contracts for each field in the command object are:

  • directory: The working directory of the compilation. All paths specified in the command or file fields must be either absolute or relative to this directory.
  • file: The main translation unit source processed by this compilation step. This is used by tools as the key into the compilation database. There can be multiple command objects for the same file, for example if the same source file is compiled with different configurations.
  • command: The compile command executed. After JSON unescaping, this must be a valid command to rerun the exact compilation step for the translation unit in the environment the build system uses. Parameters use shell quoting and shell escaping of quotes, with ‘"’ and ‘\’ being the only special characters. Shell expansion is not supported.

Note in particular that file is "The main translation unit source processed by this compilation step." If we're doing a unified build, the non-unified file is not the "main translation unit source", it's just an included file.

(In reply to Botond Ballo [:botond] from comment #2)

(In reply to Mike Hommey [:glandium] from comment #1)

I'd argue it's clangd that needs fixing.

I would find it hard to defend our current format.

I tend to agree. Isn't the point of compile_commands.json to be consumed by tooling? We should do whatever the tooling needs (perhaps while pursuing improvements in parallel).

On the subject of the spec, it rather looks like unrecognized fields aren't allowed. If that's not correct, could we include a special, Mozilla-only, "non-unified" field that would capture the current "file" field so we don't lose that information?

Just FYI, since I happen to be looking at compile_commands.json for some sccache stuff right now: my DB has an entry with file: /Users/nalexander/Mozilla/objdirs/objdir-droid-compile/gfx/layers/Unified_cpp_gfx_layers0.cpp as well as ones for subpieces of that unified source. Is that expected?

FWIW, the attached simple change makes clangd work for me locally.

I'm not sure if this change is landable as written.

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

Just FYI, since I happen to be looking at compile_commands.json for some sccache stuff right now: my DB has an entry with file: /Users/nalexander/Mozilla/objdirs/objdir-droid-compile/gfx/layers/Unified_cpp_gfx_layers0.cpp as well as ones for subpieces of that unified source. Is that expected?

I suspect that's just a consequence of how the code to create compile_commands.json evolved. The original patch for the CompileDB backend in bug 904572 presumably just took the files that are actually built (the unified files) and output them to the database file. It wasn't until later that Ehsan changed the code to additionally output the source files that are included into the unified files in bug 1337874.

Ehsan, was there any reason that you didn't remove the unified files from the database, or were you just playing it safe? Do you have any objections to this patch (which additionally changes the "command" value for the entries for the original source files to reference their respective source file rather than their unified file)?

Flags: needinfo?(ehsan)

(In reply to Botond Ballo [:botond] [spotty responses until Feb 19] from comment #6)

I'm not sure if this change is landable as written.

If Ehsan has no objections to this change, you should presumably also remove the unified argument related code added in https://bug1337874.bmoattachments.org/attachment.cgi?id=8835011 since it's no longer needed.

(In reply to Jonathan Watt [:jwatt] from comment #7)

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

Just FYI, since I happen to be looking at compile_commands.json for some sccache stuff right now: my DB has an entry with file: /Users/nalexander/Mozilla/objdirs/objdir-droid-compile/gfx/layers/Unified_cpp_gfx_layers0.cpp as well as ones for subpieces of that unified source. Is that expected?

I suspect that's just a consequence of how the code to create compile_commands.json evolved. The original patch for the CompileDB backend in bug 904572 presumably just took the files that are actually built (the unified files) and output them to the database file. It wasn't until later that Ehsan changed the code to additionally output the source files that are included into the unified files in bug 1337874.

I believe that is accurate. Also please note that as far as I remember at the time this code was written there wasn't a spec for this file format, so what we generated was a best effort...

Ehsan, was there any reason that you didn't remove the unified files from the database, or were you just playing it safe? Do you have any objections to this patch (which additionally changes the "command" value for the entries for the original source files to reference their respective source file rather than their unified file)?

I vaguely remember that there was a usecase which required the in-tree .cpp files to be listed in the file keys in the JSON entries (IIRC it was you-complete-me, but I could be wrong.) I believe with this change we can no longer tell what command to run when a particular .cpp file is changed in tree to redo the build command for that file. But the same thing is true for any .h file. So perhaps that is fine.

I'm happy to leave making the call on the trade-off to the module owners, I don't have any objections on how to fix this one way or another personally, since I don't know what use cases we care about most these days.

Flags: needinfo?(ehsan)

Thanks, Ehsan.

There is considerable value in getting clangd to work with our source code, so If we're not sure whether there is value in keeping the Unified* filenames in the "command" field then I suggest we take Botond's changes. If we find that some tool needs to be able to use compile_commands.json to know how to rebuild then we can revisit this (perhaps adding a flag to mach build-backend to allow users to choose which of the two cases matters to them?).

Botond, do you want to make the changes I suggested in comment 8 and request review for this?

Flags: needinfo?(botond)

I feel the need to add some information here, we also use compilation_commands.json for our static-analysis tooling, like clang-tidy and Coverity.
Due to the fact that we have the unified* files in the command field we have some issues with checkers that rely on filtering based on isInMainFile from SourceManager. isInMainFile always uses as base the unified translation unit and this can lead to undesired behavior like we have with checker modernize-concat-nested-namespaces whereas we cannot used it because it relies on the logic provided by isInMainFile.
So from a static-analysis tooling perspective dropping the *unified* file and replacing it with the actual file from the file field would benefit us.

(In reply to Jonathan Watt [:jwatt] from comment #10)

Botond, do you want to make the changes I suggested in comment 8 and request review for this?

Sure.

Assignee: nobody → botond
Flags: needinfo?(botond)

Originally authored by Botond Ballo <botond@mozilla.com>.

See Also: → 1610204
Attachment #9129030 - Attachment description: Bug 1583999 - Use non-unified sources in CompileDB. r=glandium → Bug 1583999 - Use non-unified sources in CompileDB. r=jwatt

After speaking with :botond I've landed this.

Assignee: botond → bpostelnicu
Pushed by bpostelnicu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e343d3722e84
Use non-unified sources in CompileDB. r=jwatt
Pushed by bpostelnicu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/99fed1b628cb
Use non-unified sources in CompileDB. r=jwatt
Backout by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/30a534a6451c
Backed out changeset e343d3722e84 as requested by Andi on slack for landing with the wrong author name. CLOSED TREE
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Backed out as requested by Andi for braking static-analysis

Backout link: https://hg.mozilla.org/mozilla-central/rev/cd3fa71346c78a3ffafa0b25f3a6cfcf28914dfc

Status: RESOLVED → REOPENED
Flags: needinfo?(bpostelnicu)
Resolution: FIXED → ---
Target Milestone: mozilla75 → ---

So the issue here is:

  • Non-unified builds have some amount of include-what-you-use bustage due to not being exercised in CI.
  • For some consumers of the CompileDB backend, having some non-unified bustage is fine. For example, with clangd, it only affects files you open in the editor, and if those files have non-unified bustage you can fix it on an as-needed basis (as I have been doing).
  • Other consumers of the CompileDB backend, such as static analysis, however, need every source file to compile successfully (which is why the patch in this bug was backed out).

I think our options here are:

  1. Fix non-unified bustage tree-wide and keep things non-busted by exercising non-unified builds in CI.
  2. Add an option to the CompileDB backend to use either unified or non-unified builds. This way, static analysis builds can continue using unified builds, and clangd users can use non-unified builds.

I'm leaning towards option (2) as it's unclear if we can get the CI resources for (1).

(In reply to Botond Ballo [:botond] from comment #20)

I think our options here are:

  1. Fix non-unified bustage tree-wide and keep things non-busted by exercising non-unified builds in CI.
  2. Add an option to the CompileDB backend to use either unified or non-unified builds. This way, static analysis builds can continue using unified builds, and clangd users can use non-unified builds.

There is also:

  1. Contribute unified build support to clangd and keep the CompileDB backend unified-only.

(In reply to Botond Ballo [:botond] from comment #21)

(In reply to Botond Ballo [:botond] from comment #20)

I think our options here are:

  1. Fix non-unified bustage tree-wide and keep things non-busted by exercising non-unified builds in CI.
  2. Add an option to the CompileDB backend to use either unified or non-unified builds. This way, static analysis builds can continue using unified builds, and clangd users can use non-unified builds.

There is also:

  1. Contribute unified build support to clangd and keep the CompileDB backend unified-only.

Currently I'm working on this since this adds extra load on our static-analysis tasks. If we are to only add support for unified builds in clangd we will still have the problem where for static-analysis tasks we analyze many more files than we should have thus consuming cpu time in GCP.
So the idea here is:

  • fix all of the issues, this is time consuming
  • add a task on try that builds the non-unified tree. The purpose of this tasks is only to keep backing out changes that break the non-unified format. If we think that building is time consuming, and we want to strip this even further, we can rely on the compile error reporting provided by coverity during static-analysis, and thus reducing a lot the overhead.
Flags: needinfo?(bpostelnicu)

(In reply to Andi-Bogdan Postelnicu [:andi] from comment #22)

  • fix all of the issues, this is time consuming

I wonder if it's possible to use include-what-you-use to automate some of this work.

(In reply to Botond Ballo [:botond] from comment #23)

(In reply to Andi-Bogdan Postelnicu [:andi] from comment #22)

  • fix all of the issues, this is time consuming

I wonder if it's possible to use include-what-you-use to automate some of this work.

the problem is a bit more complicate than this, sorting includes is the easy problem, another aspect is related with using namespace XXYYZZ where in the the translation unit we don't have it bu in the larger unified translation unit it already exists used prior to out included unit so this resolves nicely.

Depends on: 1626190
Depends on: non-unified
Depends on: 1626640
No longer depends on: 1626640

This probably does not tell you more than you already know, but the following changes make several code completion engines work fairly well with the compilation database. Removing the first argument from command list, in case the argument is the path to ccache was not necessary for clangd or ccls, but some other engines, like CLion's own solution, failed, because it expected the first argument to always be the path to compiler. I hope, by the time this bug is closed, we will have found a solution for this bit as well, because a similar situation could happen with sccache.

diff --git a/python/mozbuild/mozbuild/compilation/database.py b/python/mozbuild/mozbuild/compilation/database.py
index 2460f1e4b20e..c73e4b203595 100644
--- a/python/mozbuild/mozbuild/compilation/database.py
+++ b/python/mozbuild/mozbuild/compilation/database.py
@@ -87,10 +87,9 @@ class CompileDBBackend(CommonBackend):
         for (directory, filename, unified), cmd in self._db.items():
             env = self._envs[directory]
             cmd = list(cmd)
-            if unified is None:
-                cmd.append(filename)
-            else:
-                cmd.append(unified)
+            if cmd[0].endswith("ccache"):
+                cmd = cmd[1:]
+            cmd.append(filename)
             variables = {
                 'DIST': mozpath.join(env.topobjdir, 'dist'),
                 'DEPTH': env.topobjdir,
@@ -151,8 +150,6 @@ class CompileDBBackend(CommonBackend):
         # For unified sources, only include the unified source file.
         # Note that unified sources are never used for host sources.
         for f in obj.unified_source_mapping:
-            self._build_db_line(obj.objdir, obj.relsrcdir, obj.config, f[0],
-                                obj.canonical_suffix)
             for entry in f[1]:
                 self._build_db_line(obj.objdir, obj.relsrcdir, obj.config,
                                     entry, obj.canonical_suffix, unified=f[0])

(In reply to Subhamoy [:ssengupta][he/him] from comment #25)

Removing the first argument from command list, in case the argument is the path to ccache was not necessary for clangd or ccls, but some other engines, like CLion's own solution, failed, because it expected the first argument to always be the path to compiler. I hope, by the time this bug is closed, we will have found a solution for this bit as well, because a similar situation could happen with sccache.

If you're interested in submitting a patch for this, I think such a patch would be accepted -- it's good to have things like CLion work out of the box.

(Posting the patch in a dependent bug would likely be easiest.)

I think we can close this since we are building a separate compile_commands.json that is mainly used by clangd.
The added command is:

./mach build-backend -b Clangd
See Also: → 1656740
Status: REOPENED → RESOLVED
Closed: 7 months ago2 months ago
Resolution: --- → WORKSFORME
Attachment #9129030 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.