Add Java and kotlin analysis using scip-java
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(firefox121 fixed)
Tracking | Status | |
---|---|---|
firefox121 | --- | fixed |
People
(Reporter: botond, Assigned: nicolas.guichard)
References
Details
Attachments
(4 files)
Comment 1•6 years ago
|
||
Comment 2•6 years ago
|
||
Comment 3•5 years ago
|
||
Ah, I already fixed https://github.com/mozsearch/mozsearch/commit/79b30baf88e955918a6701355900f4dee18a2c12
Comment 4•5 years ago
|
||
Sorry. I added keyword syntax highligher. but I don't add source analyzer. So I reopen this. (Kotlin doesn't have AST API yet)
Comment 5•2 years ago
|
||
From a maintenance perspective, we probably should consider using https://github.com/sourcegraph/scip-java since it provides both Java and Kotlin support and all indications are that sourcegraph and the scip-java upstream will continue to maintain it (9 releases in the past year). :m_kato's work on a Java AST-based indexer in https://github.com/mozsearch/mozsearch/pull/314 (Bug 1633260) is nothing short of amazing, but the active searchfox contributor base does not have the expertise/availability to maintain an additional indexer and the reality is that in the event searchfox is retired, sourcegraph would be the likely successor, in which case it is preferable to already be using scip-java.
Assignee | ||
Comment 6•1 year ago
|
||
I have a test branch at https://github.com/nicolas-guichard/mozsearch/tree/scip-java which adds scip-java to the container and uses it to index java & kotlin code.
It works without much trouble on just gradle init
ed examples. I've yet to see how it does on bigger examples.
In that branch I took the shortcut of removing RLS save-analysis altogether to turn rust-analyze into a generic scip-analyze, but I'll rebase on your version which adds tree-sitter and per-language customization once that's stable.
Comment 7•1 year ago
|
||
Awesome! I've been doing some preparatory landings in preparation for getting the scip-indexer changes up for review, so I should be able to put that up for review soon and it sounds like you probably will have the right context already to help review it! That said, since you'll be actively contributing to it after it lands, it may make sense to plan that you'd actually fix any problems, and with review comments being primarily to call out policy discussions we'd want to discuss or note where you see a problem that you'll plan to fix.
For your stack, it's great that you're adding podman and nix support! For nix, I'm not really familiar with nix, although I understand it's popular and something I should be more familiar with! It would be great to split those out into their own pull request(s), and in particular since I'm not familiar with nix, it would be great if you could make sure to[1]:
- Provide brief context about what the .envrc file is in the commit message. (In particular, since it's a fairly generic dotfile name, it'd be good for the history for the file to help provide that context.)
- Add some additional context to the docs about what nix is and when people would want to use it. It might make sense to split this out into a separate markdown file under
docs/
with a pointer. Specifically, I think a lot of existing Firefox contributors may be in a similar situation as me where we could benefit from some pointers about nix and how using it might be useful with searchfox and whether one can usefully use it on Ubuntu/Fedora/etc. or whether one would really only want to use it if switching to nix OS.
1: I of course understand that patches you have in a WIP branch are usually not in the state they'd be put up for review! Also, in this case I think I/we would benefit from the extra context around nix in particular. For a lot of searchfox changes commit messages don't necessarily be too verbose since a fairly deep level of familiarity can be assumed.
Comment 8•1 year ago
|
||
Oh, and to avoid you being blocked, it probably makes sense to experiment with the https://github.com/mozsearch/mozsearch/blob/master/docs/testing-checks.md mechanism and adding test coverage. Note that the checks will generate markdown "explanations" under tests/tests/checks/explanations/
that are currently .gitignored, but they can be fairly useful for checks that involve multiple pipeline stages since they will dump the state between the pipelines.
Also, there's the potential to enhance the explainer pages to be more than the JSON conversion that pipeline_explainer.liquid does right now. query_results.liquid is the root of a somewhat complex liquid templating hierarchy right now, although the types it deals with are different from the explainers since the explanations are consuming tokio tracing output and I haven't made an attempt to make them particularly useful.
Comment 9•1 year ago
|
||
:m_kato's work on a Java AST-based indexer in https://github.com/mozsearch/mozsearch/pull/314 (Bug 1633260) is nothing short of amazing, but the active searchfox contributor base does not have the expertise/availability to maintain an additional indexer and the reality is that in the event searchfox is retired, sourcegraph would be the likely successor, in which case it is preferable to already be using scip-java.
That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.
Comment 10•1 year ago
•
|
||
(In reply to Makoto Kato [:m_kato] from comment #9)
That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.
Can you clarify whether you are suggesting that https://github.com/javaparser/javaparser is unsuitable or that scip-java is unsuitable?
I should note that:
- Searchfox isn't particularly concerned about the time the analysis jobs take for a given language; we currently allow 5 hours for the code coverage jobs we depend on to run.
- That said, there are always potentially synergies between Gecko developers' VS Code/IDE use-cases and searchfox use-cases, but it's not a primary decision-making factor. (Noting that the existence of the LSIF LSP and the fact that SCIP can easily be translated to LSIF is part of the reason we're leaning towards using SCIP in cases where it makes sense and provides sufficient fidelity.)
- My understanding from the mobile team at this time is that in general the codebase is largely Kotlin and most/all new code is expected be Kotlin. (And https://github.com/javaparser/javaparser doesn't seem to mention Kotlin at all?)
Comment 11•1 year ago
|
||
(In reply to Nicolas Guichard from comment #6)
In that branch I took the shortcut of removing RLS save-analysis altogether to turn rust-analyze into a generic scip-analyze, but I'll rebase on your version which adds tree-sitter and per-language customization once that's stable.
PRs are up at https://github.com/mozsearch/mozsearch/pull/663 and https://github.com/mozsearch/mozsearch-mozilla/pull/216 if you'd like to take a look at these along with Emilio.
Assignee | ||
Comment 12•1 year ago
|
||
After playing with this on the mozilla-central repo:
scip-java index
automatic mode does not work on Gradle/Android projects so doesn't index most of mozilla-central Java/Kotlin code (it only indexes the mobile/android/annotations
subproject, which does not load any Android Gradle plugin). This is https://github.com/sourcegraph/scip-java/issues/177.
Instead we have to use semanticdb-javac
and semanticdb-kotlinc
directly, which output one .semanticdb
file per .java
/.kt
file. Then scip-java index-semanticdb
can aggregate all those into one .scip
.
This raises the question of where to run the index-semanticdb
step, we can either:
- run the index-semanticdb step on the mozilla-central builder, and copy only a single
.scip
artifact to the searchfox builder; or - copy all the semanticdb files over to the searchfox builder and run index-semanticdb there.
It doesn't make a big difference size-wise: the (uncompressed) semanticdb files weight 40 MiB, the aggregated index.scip weights 32 MiB.
It doesn't make a big difference time-wise either:
- I didn't measure the semanticdb plugins overhead, but with both plugins enabled
mach clobber && mach build
takes < 4m scip-java index-semanticdb
takes < 4s (using 660% CPU according to time, it seems to be well multi-threaded!)
My heart leans toward option 2 to have more scip-related tasks on the Searchfox side. As a bonus we don't have to install scip-java into the mozilla-central builder at all and Gradle takes care of fetching semanticdb-javac and semanticdb-kotlinc for us.
What do you think?
Comment 13•1 year ago
|
||
(In reply to Nicolas Guichard from comment #12)
My heart leans toward option 2 to have more scip-related tasks on the Searchfox side. As a bonus we don't have to install scip-java into the mozilla-central builder at all and Gradle takes care of fetching semanticdb-javac and semanticdb-kotlinc for us.
What do you think?
Thank you for the excellent context, especially quantifying the file sizes and runtime. I agree that option 2 seems optimal at this time.
In addition, it seems like having the .semanticdb
files available would be desirable for debugging or allowing other tooling to use it like a metals remote language server. While it's my hope that SCIP provides or can be enhanced to provide the full feature set we desire and exposes the full fidelity of semanticdb, if it ends up that we can provide a better experience by processing the semanticdb files as well or instead of SCIP, it's nice to have that flexibility.
Comment 14•1 year ago
|
||
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #10)
(In reply to Makoto Kato [:m_kato] from comment #9)
That is used by GitHub's AST parser, but it is too slow for large projects as long as I test, so I don't think that it is better for us. Although I look for alternative "fast" AST tools or plugin for Java compiler, I cannot find it.
Can you clarify whether you are suggesting that https://github.com/javaparser/javaparser is unsuitable or that scip-java is unsuitable?
My javaparser that uses Github's JavaParser. I don't test scip-java.
- My understanding from the mobile team at this time is that in general the codebase is largely Kotlin and most/all new code is expected be Kotlin. (And https://github.com/javaparser/javaparser doesn't seem to mention Kotlin at all?)
That doesn't support Kotlin.
Assignee | ||
Comment 15•1 year ago
|
||
When the Mozsearch plugin is enabled and when we compile the android
target, this uses the semanticdb-javac and semanticdb-kotlinc compiler
plugins to generate semanticdb files during the compilation process.
In order to index all files, all files need to be compiled at least
once, so this adds a mach android compile-all
command to ensure that,
otherwise some examples and AndroidTests were not necessarily compiled.
Note that the AndroidTests do not have a release configuration so will
not be indexed when we build in release mode. The existing searchfox
mozconfigs are all set to debug so this should not be an issue.
To build the android-gradle-dependencies toolchain, all dependencies
must be accessible from the root build.gradle, so this also adds a flag
--download-all-gradle-dependencies to ignore conditional dependency
uses in gradle code.
Updated•1 year ago
|
Assignee | ||
Comment 16•1 year ago
|
||
Assignee | ||
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
Comment 19•1 year ago
|
||
bugherder |
Comment 20•1 year ago
|
||
With the landing, I'm correctly seeing the most recent android searchfox job produce a target.mozsearch-java-index.zip
with the expected contents of a bunch of .semanticdb
files rooted under META-INF/semanticdb
. Woo!
Assignee | ||
Comment 22•1 year ago
|
||
This is the update from https://github.com/mozsearch/mozsearch/pull/672.
Mozsearch doesn't expect the pretty field on super/override
relationships anymore. It doesn't hurt being there, but let's clean it
up anyway.
Comment 23•1 year ago
|
||
Comment 24•1 year ago
|
||
bugherder |
Description
•