Closed Bug 1490144 Opened 6 years ago Closed 7 months ago

Add Java and kotlin analysis using scip-java

Categories

(Webtools :: Searchfox, enhancement)

enhancement
Not set
normal

Tracking

(firefox121 fixed)

RESOLVED FIXED
Tracking Status
firefox121 --- fixed

People

(Reporter: botond, Assigned: nicolas.guichard)

References

Details

Attachments

(4 files)

We have Java code in the tree that's used by Firefox for Android / GeckoView. 

It would be really nice if cross-reference features like "Go to declaration" worked for this Java code in Searchfox.
See bug 1341646 for the equivalent with Rust.
Summary: Support "Go to declaration" in Java code → Add Java analysis
Kotlin would be nice too since an increasing amount of our Android code is in kotlin. I'm gonna lump it in here because most likely the way to implement this is to parse the class files and analyze that, and that should work for both Java and Kotlin.
Summary: Add Java analysis → Add Java and kotlin analysis
Assignee: nobody → m_kato
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Sorry. I added keyword syntax highligher. but I don't add source analyzer. So I reopen this. (Kotlin doesn't have AST API yet)

Assignee: m_kato → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1633260

From a maintenance perspective, we probably should consider using https://github.com/sourcegraph/scip-java since it provides both Java and Kotlin support and all indications are that sourcegraph and the scip-java upstream will continue to maintain it (9 releases in the past year). :m_kato's work on a Java AST-based indexer in https://github.com/mozsearch/mozsearch/pull/314 (Bug 1633260) is nothing short of amazing, but the active searchfox contributor base does not have the expertise/availability to maintain an additional indexer and the reality is that in the event searchfox is retired, sourcegraph would be the likely successor, in which case it is preferable to already be using scip-java.

No longer depends on: 1633260
See Also: → 1633260
Summary: Add Java and kotlin analysis → Add Java and kotlin analysis using scip-java

I have a test branch at https://github.com/nicolas-guichard/mozsearch/tree/scip-java which adds scip-java to the container and uses it to index java & kotlin code.
It works without much trouble on just gradle inited examples. I've yet to see how it does on bigger examples.

In that branch I took the shortcut of removing RLS save-analysis altogether to turn rust-analyze into a generic scip-analyze, but I'll rebase on your version which adds tree-sitter and per-language customization once that's stable.

Awesome! I've been doing some preparatory landings in preparation for getting the scip-indexer changes up for review, so I should be able to put that up for review soon and it sounds like you probably will have the right context already to help review it! That said, since you'll be actively contributing to it after it lands, it may make sense to plan that you'd actually fix any problems, and with review comments being primarily to call out policy discussions we'd want to discuss or note where you see a problem that you'll plan to fix.

For your stack, it's great that you're adding podman and nix support! For nix, I'm not really familiar with nix, although I understand it's popular and something I should be more familiar with! It would be great to split those out into their own pull request(s), and in particular since I'm not familiar with nix, it would be great if you could make sure to[1]:

  • Provide brief context about what the .envrc file is in the commit message. (In particular, since it's a fairly generic dotfile name, it'd be good for the history for the file to help provide that context.)
  • Add some additional context to the docs about what nix is and when people would want to use it. It might make sense to split this out into a separate markdown file under docs/ with a pointer. Specifically, I think a lot of existing Firefox contributors may be in a similar situation as me where we could benefit from some pointers about nix and how using it might be useful with searchfox and whether one can usefully use it on Ubuntu/Fedora/etc. or whether one would really only want to use it if switching to nix OS.

1: I of course understand that patches you have in a WIP branch are usually not in the state they'd be put up for review! Also, in this case I think I/we would benefit from the extra context around nix in particular. For a lot of searchfox changes commit messages don't necessarily be too verbose since a fairly deep level of familiarity can be assumed.

Oh, and to avoid you being blocked, it probably makes sense to experiment with the https://github.com/mozsearch/mozsearch/blob/master/docs/testing-checks.md mechanism and adding test coverage. Note that the checks will generate markdown "explanations" under tests/tests/checks/explanations/ that are currently .gitignored, but they can be fairly useful for checks that involve multiple pipeline stages since they will dump the state between the pipelines.

Also, there's the potential to enhance the explainer pages to be more than the JSON conversion that pipeline_explainer.liquid does right now. query_results.liquid is the root of a somewhat complex liquid templating hierarchy right now, although the types it deals with are different from the explainers since the explanations are consuming tokio tracing output and I haven't made an attempt to make them particularly useful.

:m_kato's work on a Java AST-based indexer in https://github.com/mozsearch/mozsearch/pull/314 (Bug 1633260) is nothing short of amazing, but the active searchfox contributor base does not have the expertise/availability to maintain an additional indexer and the reality is that in the event searchfox is retired, sourcegraph would be the likely successor, in which case it is preferable to already be using scip-java.

That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.

(In reply to Makoto Kato [:m_kato] from comment #9)

That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.

Can you clarify whether you are suggesting that https://github.com/javaparser/javaparser is unsuitable or that scip-java is unsuitable?

I should note that:

  • Searchfox isn't particularly concerned about the time the analysis jobs take for a given language; we currently allow 5 hours for the code coverage jobs we depend on to run.
    • That said, there are always potentially synergies between Gecko developers' VS Code/IDE use-cases and searchfox use-cases, but it's not a primary decision-making factor. (Noting that the existence of the LSIF LSP and the fact that SCIP can easily be translated to LSIF is part of the reason we're leaning towards using SCIP in cases where it makes sense and provides sufficient fidelity.)
  • My understanding from the mobile team at this time is that in general the codebase is largely Kotlin and most/all new code is expected be Kotlin. (And https://github.com/javaparser/javaparser doesn't seem to mention Kotlin at all?)
Flags: needinfo?(m_kato)

(In reply to Nicolas Guichard from comment #6)

In that branch I took the shortcut of removing RLS save-analysis altogether to turn rust-analyze into a generic scip-analyze, but I'll rebase on your version which adds tree-sitter and per-language customization once that's stable.

PRs are up at https://github.com/mozsearch/mozsearch/pull/663 and https://github.com/mozsearch/mozsearch-mozilla/pull/216 if you'd like to take a look at these along with Emilio.

After playing with this on the mozilla-central repo:

scip-java index automatic mode does not work on Gradle/Android projects so doesn't index most of mozilla-central Java/Kotlin code (it only indexes the mobile/android/annotations subproject, which does not load any Android Gradle plugin). This is https://github.com/sourcegraph/scip-java/issues/177.
Instead we have to use semanticdb-javac and semanticdb-kotlinc directly, which output one .semanticdb file per .java/.kt file. Then scip-java index-semanticdb can aggregate all those into one .scip.

This raises the question of where to run the index-semanticdb step, we can either:

  1. run the index-semanticdb step on the mozilla-central builder, and copy only a single .scip artifact to the searchfox builder; or
  2. copy all the semanticdb files over to the searchfox builder and run index-semanticdb there.

It doesn't make a big difference size-wise: the (uncompressed) semanticdb files weight 40 MiB, the aggregated index.scip weights 32 MiB.
It doesn't make a big difference time-wise either:

  • I didn't measure the semanticdb plugins overhead, but with both plugins enabled mach clobber && mach build takes < 4m
  • scip-java index-semanticdb takes < 4s (using 660% CPU according to time, it seems to be well multi-threaded!)

My heart leans toward option 2 to have more scip-related tasks on the Searchfox side. As a bonus we don't have to install scip-java into the mozilla-central builder at all and Gradle takes care of fetching semanticdb-javac and semanticdb-kotlinc for us.
What do you think?

(In reply to Nicolas Guichard from comment #12)

My heart leans toward option 2 to have more scip-related tasks on the Searchfox side. As a bonus we don't have to install scip-java into the mozilla-central builder at all and Gradle takes care of fetching semanticdb-javac and semanticdb-kotlinc for us.
What do you think?

Thank you for the excellent context, especially quantifying the file sizes and runtime. I agree that option 2 seems optimal at this time.

In addition, it seems like having the .semanticdb files available would be desirable for debugging or allowing other tooling to use it like a metals remote language server. While it's my hope that SCIP provides or can be enhanced to provide the full feature set we desire and exposes the full fidelity of semanticdb, if it ends up that we can provide a better experience by processing the semanticdb files as well or instead of SCIP, it's nice to have that flexibility.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #10)

(In reply to Makoto Kato [:m_kato] from comment #9)

That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.

Can you clarify whether you are suggesting that https://github.com/javaparser/javaparser is unsuitable or that scip-java is unsuitable?

My javaparser that uses Github's JavaParser. I don't test scip-java.

  • My understanding from the mobile team at this time is that in general the codebase is largely Kotlin and most/all new code is expected be Kotlin. (And https://github.com/javaparser/javaparser doesn't seem to mention Kotlin at all?)

That doesn't support Kotlin.

Flags: needinfo?(m_kato)

When the Mozsearch plugin is enabled and when we compile the android
target, this uses the semanticdb-javac and semanticdb-kotlinc compiler
plugins to generate semanticdb files during the compilation process.

In order to index all files, all files need to be compiled at least
once, so this adds a mach android compile-all command to ensure that,
otherwise some examples and AndroidTests were not necessarily compiled.
Note that the AndroidTests do not have a release configuration so will
not be indexed when we build in release mode. The existing searchfox
mozconfigs are all set to debug so this should not be an issue.

To build the android-gradle-dependencies toolchain, all dependencies
must be accessible from the root build.gradle, so this also adds a flag
--download-all-gradle-dependencies to ignore conditional dependency
uses in gradle code.

Assignee: nobody → nicolas.guichard
Blocks: 1865048
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/9f244834bdc9
Add Java and Kotlin code indexing using semanticdb compiler plugins r=nalexander,asuth,emilio,geckoview-reviewers,owlish
Status: REOPENED → RESOLVED
Closed: 5 years ago7 months ago
Resolution: --- → FIXED

With the landing, I'm correctly seeing the most recent android searchfox job produce a target.mozsearch-java-index.zip with the expected contents of a bunch of .semanticdb files rooted under META-INF/semanticdb. Woo!

Duplicate of this bug: 1633260

This is the update from https://github.com/mozsearch/mozsearch/pull/672.
Mozsearch doesn't expect the pretty field on super/override
relationships anymore. It doesn't hurt being there, but let's clean it
up anyway.

Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/5fd82c1c094a
Mozsearch indexer: remove pretty field from super/override relationships. r=asuth
See Also: 1633260
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: