Closed Bug 1413738 Opened 7 years ago Closed 7 years ago

Land Searchfox C++ indexer in mozilla-central

Categories

(Webtools :: Searchfox, defect)

defect
Not set
normal

Tracking

(firefox59 fixed)

RESOLVED FIXED
Tracking Status
firefox59 --- fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(2 files, 1 obsolete file)

There are big advantages to running the C++ indexing aspect of Searchfox in TaskCluster.

1. It will be pretty easy to do the indexing on all platforms that support the Clang plugin, which is much better than the Linux-only approach that Searchfox and DXR currently use.

2. It will be a lot easier for people to develop on Searchfox since it will be in-tree.

3. Indexing is much less likely to fail when build requirements change (e.g., needing a new version of Rust or the compiler) since this sort of maintenance already gets done for TaskCluster builds.

This bug will just take care of the first step: landing the Searchfox indexer code in mozilla-central. I have patches to add TaskCluster builds, but I think it makes sense to do that in a separate bug.
Attached patch patch (obsolete) — Splinter Review
I'm just looking for some initial feedback on this. It piggybacks on top of the static analysis plugin for simplicity. I'm not sure if that's a good idea long-term.

I don't have a good sense of how easy this code is to read. It definitely needs more comments for the way it deals with templates, but I'd like to get some early feedback on the rest of it.

This also includes some integration into the build process so that a .zip file with all the analysis information gets generated and uploaded as a build artifact.

To test this, you can compile with:
--enable-clang-plugin
--enable-mozsearch-plugin

All the generated analysis data will end up in $OBJDIR/mozsearch_index.

(In theory, Searchfox is just a domain name and mozsearch is the name of the code that does the indexing. That might be confusing, though, and I'm happy to change it.)
Attachment #8924360 - Flags: review?(nika)
Also, I should mention that this doesn't work on Windows yet and I don't know why. The plugin compiles, but it segfaults immediately. I haven't been able to try it locally because I suck at Windows.
Comment on attachment 8924360 [details] [diff] [review]
patch

Review of attachment 8924360 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't done a deep review of all 1660 lines of MozsearchIndexer.cpp - I can do that at some point if you want but I don't think that's what you were going for.

In general I think it would be good to keep the indexer code separate from the rest of the plugin - not in terms of which dylib is created, but rather in terms of folders. Perhaps we could have a mozsearch subdirectory under clang-plugin, with a readme explaining what it does and how to turn it on?

I'd also like to not build this component when we're just running the clang plugin, and to clarify that this is not a normal checker and is not built into clang-tidy.

I also have some comments about how components of mozsearch seem like they would be useful in other parts of the clang plugin. It doesn't have to happen now but it might be cool to make them more generally useful in the future :-).

::: build/clang-plugin/MozsearchIndexer.cpp
@@ +72,5 @@
> +{
> +    unsigned long hash = 5381;
> +
> +    for (const char* p = str; *p; p++) {
> +      hash = ((hash << 5) + hash) + *p; /* hash * 33 + c */

nit: Perhaps move comment onto the line before - it looks like you've commented out code here.

@@ +259,5 @@
> +  return mangled;
> +}
> +
> +// Implement some hacks on the mangled name of an identifier to get around the
> +// macro silliness that we use for strings in Gecko.

We don't do this macro silliness anymore! nsCString is now literally just a `using nsCString = nsTString<char>;`. We can probably get rid of this.
http://searchfox.org/mozilla-central/rev/423b2522c48e1d654e30ffc337164d677f934ec3/xpcom/string/nsStringFwd.h#62

@@ +365,5 @@
> +  virtual void Ifndef(SourceLocation loc, const Token &tok, const MacroDefinition& md) override;
> +};
> +
> +// A very basic JSON formatter that records key/value pairs and outputs a JSON
> +// object that contains only non-object data.

I'm inclined to pull this out into a separate file - a JSON formatter feels like a large enough component that it shouldn't be in the same file as the data collection.

@@ +1637,5 @@
> +    EnsurePath(args[1] + "/");
> +    outdir = GetAbsolutePath(args[1]);
> +    outdir += "/";
> +
> +    objdir = GetAbsolutePath(args[2]);

Getting the full path to the objdir and srcdir will be very useful for other analyses in the clang-plugin, so it might be nice at some point to, especially if we keep this inside of the clang plugin, refactor this into a place which is usable for other code.

::: build/clang-plugin/moz.build
@@ +18,5 @@
>      'ExplicitImplicitChecker.cpp',
>      'ExplicitOperatorBoolChecker.cpp',
>      'KungFuDeathGripChecker.cpp',
>      'MozCheckAction.cpp',
> +    'MozsearchIndexer.cpp',

Perhaps add a comment here clarifying that this file is intentionally excluded when building the clang-tidy plugin.

Let's also not build this when the mozsearch-plugin configure flag is not enabled.
Attachment #8924360 - Flags: review?(nika) → feedback+
Also, if you do move this into a subdirectory, it might be easier to split up some of the components into separate files, which could keep the code shorter & more clear :-).
Attached patch patch v2Splinter Review
OK, I fixed up the review comments and added some more comments around the template stuff.

Mike, can you look at the build system parts?
Attachment #8924360 - Attachment is obsolete: true
Attachment #8927092 - Flags: review?(nika)
Attachment #8927092 - Flags: review?(mh+mozilla)
Comment on attachment 8927092 [details] [diff] [review]
patch v2

Review of attachment 8927092 [details] [diff] [review]:
-----------------------------------------------------------------

Style nits: Currently the style of the indexer's source code appears to be a mish-mash of different styles. Some members are gecko-style mFoo named, some are lowercase, etc. 
It would be really nice if the style here could be consistent. The code in clang-plugin is currently all formatted under the LLVM style guidelines, so I would suggest formatting the code that way. You should be able to run clang-format to do that. 

I am not reviewing the algorithms you're using - I presume they work because searchfox works ^.^. I mostly have some style nits.

::: build/clang-plugin/moz.build
@@ +8,5 @@
>  
>  SOURCES += ['!ThirdPartyPaths.cpp']
>  
> +if CONFIG['ENABLE_MOZSEARCH_PLUGIN']:
> +    mozsearch_sources = [

Why can't you just UNIFIED_SOURCES += here instead of doing this weird thing with mozsearch_sources?

::: build/clang-plugin/mozsearch-plugin/JSONFormatter.cpp
@@ +22,5 @@
> +JSONFormatter::Escape(const std::string& input)
> +{
> +  bool needsEscape = false;
> +  for (char c : input) {
> +    if (c == '\\' || c == '"') {

Doesn't JSON require more characters than this to be escaped?

@@ +35,5 @@
> +
> +  std::string cur = input;
> +  cur = ReplaceAll(cur, "\\", "\\\\");
> +  cur = ReplaceAll(cur, "\"", "\\\"");
> +  return new std::string(cur);

Why are you allocating this on the heap? It seems like it would've been better to just actually directly move the string in. If you want it on the heap, can you switch this to return a std::unique_ptr<std::string> so that you don't have to manually drop it.

You might also want to std::move `cur` into the string constructor, as the move isn't implicit here.

::: build/clang-plugin/mozsearch-plugin/JSONFormatter.h
@@ +18,5 @@
> +  struct Property {
> +    const char* mName;
> +    const char* mLiteralValue;
> +    const std::string* mStringValue;
> +    const std::string* mEscapedStringValue;

Again, using std::unique_ptr<std::string> here would clarify things.

@@ +41,5 @@
> +
> +  ~JSONFormatter();
> +
> +  void Add(const char* name, const char* value);
> +  void Add(const char* name, const std::string& value);

The `const std::string&` argument scares me a little bit, as it's possible to pass a temporary here, and std::string has implicit conversions. I imagine you aren't messing it up anywhere here, but I'd feel safer if you either explicitly moved the string into the JSONFormatter, required a pointer, or required a non-const reference.

::: build/clang-plugin/mozsearch-plugin/MozsearchIndexer.cpp
@@ +118,5 @@
> +static bool
> +IsValidToken(std::string input)
> +{
> +  for (char c : input) {
> +    if (isspace(c) || c == '"' || c == '\\') {

There are more characters than this which are invalid C++ tokens? Could you add a comment explaining why this doesn't have to be a complete list?

@@ +222,5 @@
> +      std::string absolute = GetAbsolutePath(filename);
> +      if (absolute.empty()) {
> +        absolute = filename;
> +      }
> +      FileInfo *info = new FileInfo(absolute);

Can you use unique_ptr<FileInfo> and make_unique here? As far as I can tell you're never freeing these right now.

@@ +392,5 @@
> +        return std::string("E_<") + GetMangledName(ctx, named) + ">_" + d2->getNameAsString();
> +      }
> +    }
> +
> +    assert(false);

I don't think we ever build the clang plugin with assertions enabled - so this probably will never get hit.

::: build/clang-plugin/mozsearch-plugin/README
@@ +7,5 @@
> +This plugin is enabled with the --enable-clang-plugin and
> +--enable-mozsearch-plugin mozconfig options. The output of the plugin
> +is stored in $OBJDIR/mozsearch_index.
> +
> +This code is not a checker, like other parts of the Mozilla clang

s/like/unlike
Attachment #8927092 - Flags: review?(nika)
Attached patch patch v3Splinter Review
Attachment #8927471 - Flags: review?(nika)
Attachment #8927471 - Flags: review?(mh+mozilla)
> Why can't you just UNIFIED_SOURCES += here instead of doing this weird thing with mozsearch_sources?

The sources have to be listed in alphabetical order or else the build system complains. This is what I came up with. I can also have three |UNIFIED_SOURCES +=| sections, but I thought this would be cleaner.
(In reply to Bill McCloskey (:billm) from comment #8)
> > Why can't you just UNIFIED_SOURCES += here instead of doing this weird thing with mozsearch_sources?
> 
> The sources have to be listed in alphabetical order or else the build system
> complains. This is what I came up with. I can also have three
> |UNIFIED_SOURCES +=| sections, but I thought this would be cleaner.

I'm pretty sure that you can add sources in multiple UNIFIED_SOURCES blocks. Each time you write += you need to have the list on the RHS sorted, but you can do multiple appends which don't need to be sorted relative to each other.
Comment on attachment 8927471 [details] [diff] [review]
patch v3

Review of attachment 8927471 [details] [diff] [review]:
-----------------------------------------------------------------

r+ other than the build system stuff - I'm pretty sure you can use multiple UNIFIED_SOURCES += calls and avoid the grossness.
Attachment #8927471 - Flags: review?(nika) → review+
(In reply to Nika Layzell [:mystor] from comment #10)
> r+ other than the build system stuff - I'm pretty sure you can use multiple
> UNIFIED_SOURCES += calls and avoid the grossness.

You can indeed do this.
Comment on attachment 8927471 [details] [diff] [review]
patch v3

Review of attachment 8927471 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/clang-plugin.m4
@@ +162,5 @@
> +    fi
> +
> +    dnl We use this construct rather than $_objdir to avoid getting /js/src in the
> +    dnl path when compiling JS code.
> +    OBJDIR="$(dirname $(dirname $(dirname $CLANG_PLUGIN)))"

Please call that something else than OBJDIR.

::: build/clang-plugin/moz.build
@@ +8,5 @@
>  
>  SOURCES += ['!ThirdPartyPaths.cpp']
>  
> +if CONFIG['ENABLE_MOZSEARCH_PLUGIN']:
> +    mozsearch_sources = [

You can just do UNIFIED_SOURCES += [] here, as mentioned in previous comments.
Attachment #8927471 - Flags: review?(mh+mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/abe5e28c4990
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: