Open Bug 1740290 Opened 3 years ago Updated 11 months ago

Consider using SCIP for JS indexing and to benefit from typescript / tsserver inference process

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

Converting a brief investigation from Jun 14, 2021 into a bug:

I think it likely makes sense to try to move our JS indexing to the TypeScript backend used by VS Code. There's also an interesting effort to create a "language server index format" (https://lsif.dev/) which could potentially be an easy way to help searchfox ingest languages it doesn't currently do a great job on, with https://github.com/sourcegraph/lsif-node being the JS/TS one.

(Presumably searchfox could do a better job in various places by having a better awareness of path mapping after we do the URL resolution stuff Nika had proposed, etc.)

It sounds like VS Code talks directly to tsserver (https://github.com/Microsoft/TypeScript/wiki/Standalone-Server-%28tsserver%29), but there are language server wrappers like https://github.com/theia-ide/typescript-language-server and the lsif-node thing.

https://microsoft.github.io/language-server-protocol/specifications/lsif/0.5.0/specification/ has some good examples of what the LSIF data output can look like, which seems like something that could be nicely re-processed as part of a searchfox pipeline.

...

It looks like a transformer (ex: https://github.com/longlho/ts-transform-system-import) (requires custom compiler wrapper per https://github.com/microsoft/TypeScript/issues/14419) or just preprocessing things ourselves could let us rewrite things such that:

  • const { foo, bar } = ChromeUtils.import("resource:///modules/Blah.jsm"); becomes import { foo, bar } from "RESOURCE/modules/Blah.jsm"
  • We establish a tsconfig/jsconfig paths mapping so that those URLs all properly map to the source paths.
  • We re-write const EXPORTED_SYMBOLS = ["foo", "bar"]; to export { foo, bar }.
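As a rough illustration of the "preprocessing things ourselves" option, here is a minimal regex-based sketch of the first and third rewrites above. The function names are hypothetical, "RESOURCE" is just the placeholder prefix from the example, and a real implementation would want an actual parser: this does not handle renamed destructuring like `{ foo: bar }`, multi-line edge cases, or non-`resource:///` URLs. The tsconfig/jsconfig paths mapping is not shown.

```typescript
// Sketch only: regex-based rewriting, not a real parser. All names here are
// hypothetical and exist purely for illustration.

/** Map a resource:/// URL to the placeholder-prefixed module specifier. */
function toModuleSpecifier(url: string): string {
  return url.replace(/^resource:\/\/\//, "RESOURCE/");
}

/** Rewrite `const { a, b } = ChromeUtils.import("...");` into an ES import. */
function rewriteChromeUtilsImports(source: string): string {
  const pattern =
    /const\s*\{\s*([^}]*?)\s*\}\s*=\s*ChromeUtils\.import\(\s*"([^"]+)"\s*\);?/g;
  return source.replace(
    pattern,
    (_match: string, names: string, url: string) =>
      `import { ${names} } from "${toModuleSpecifier(url)}";`
  );
}

/** Rewrite `const EXPORTED_SYMBOLS = ["a", "b"];` into `export { a, b };`. */
function rewriteExportedSymbols(source: string): string {
  const pattern = /(?:const|var|let)\s+EXPORTED_SYMBOLS\s*=\s*\[([^\]]*)\];?/g;
  return source.replace(pattern, (_match: string, list: string) => {
    const names = list
      .split(",")
      .map((item) => item.trim().replace(/^["']|["']$/g, ""))
      .filter((name) => name.length > 0);
    return `export { ${names.join(", ")} };`;
  });
}

/** Apply both rewrites to the text of a single .jsm file. */
function preprocessJsm(source: string): string {
  return rewriteExportedSymbols(rewriteChromeUtilsImports(source));
}
```

The output would only be fed to the indexer; the files in the tree would stay as they are.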
See Also: → 1761287
See Also: → 1761627
See Also: → 1536835
Depends on: 1775130

We would now use https://github.com/sourcegraph/scip-typescript instead of lsif-node since it provides more information and because of the synergy from :emilio implementing SCIP support in bug 1761287.

As part of this change we would likely:

  • Change from our symbol soup model to something akin to my proposal in bug 1499066.
    • Right now, any definition of a function foo will be mapped to #foo. A class method SomeClass.foo will also be mapped as #foo and #SomeClass.foo. However, thanks to imports/exports and the analysis typescript will already be doing, it's possible for us to do significantly better since we can know the actual imported file.
    • In bug 1775130 I also propose that we can use the work already done with eslint to better support JS "script" files where the work done to understand what globals are available can help us do similar resolution. The one complication is that "script" JS files are more like mix-ins which means that a given token may actually be referring to multiple distinct symbol definitions because of the different contexts in which the script is evaluated. For example, we have IndexedDB tests that are run under both xpcshell and mochitests, and so any given helper invocation will actually be referencing 2 potentially different implementations.
  • Be able to support JSX, as TypeScript seems to optionally support it.
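To make the symbol soup problem concrete, here is a small sketch of the current naming next to an import-aware alternative. `classicSymbols` mirrors the `#foo` / `#SomeClass.foo` behavior described above; `fileQualifiedSymbol` and `candidateDefinitions` use an invented `path#name` format purely for illustration, which is not the format proposed in bug 1499066. Any paths passed in are placeholders.

```typescript
// Sketch only: illustrative helpers, not searchfox's actual code.

/** Current behavior: a bare "#name", plus "#Owner.name" for class methods. */
function classicSymbols(name: string, owner?: string): string[] {
  const symbols = [`#${name}`];
  if (owner !== undefined) {
    symbols.push(`#${owner}.${name}`);
  }
  return symbols;
}

/** Hypothetical alternative: qualify the symbol by the file that defines it. */
function fileQualifiedSymbol(
  definingPath: string,
  name: string,
  owner?: string
): string {
  const local = owner !== undefined ? `${owner}.${name}` : name;
  return `${definingPath}#${local}`;
}

// "Script" files are evaluated in more than one context, so a single token can
// have one candidate definition per context.
type ScriptContext = "xpcshell" | "mochitest";

/** Return every definition a helper call could refer to, one per context. */
function candidateDefinitions(
  helper: string,
  definingPathByContext: Record<ScriptContext, string>
): string[] {
  const contexts: ScriptContext[] = ["xpcshell", "mochitest"];
  return contexts.map((context) =>
    fileQualifiedSymbol(definingPathByContext[context], helper)
  );
}
```

With the classic scheme, two unrelated `foo` functions in different files both land on `#foo`. With a file-qualified scheme they stay distinct, and a helper used from a shared test file resolves to a list of candidates instead of a single symbol.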
Depends on: 1761287
See Also: 1761287 → 1499066
Summary: Consider using LSIF (Language Server Index Format) for JS indexing and to benefit from typescript / tsserver inference process → Consider using SCIP for JS indexing and to benefit from typescript / tsserver inference process

I use valid TypeScript JSDoc in my JavaScript code in Gecko, so that I can leverage the in-editor hints. I'm guessing if you are using the typescript language server that these would get picked up by this system. The next step from there would be to type check the results to see if they are accurate.

(In reply to Greg Tatum [:gregtatum] from comment #2)

> I use valid TypeScript JSDoc in my JavaScript code in Gecko, so that I can leverage the in-editor hints. I'm guessing if you are using the typescript language server that these would get picked up by this system. The next step from there would be to type check the results to see if they are accurate.

Yeah, these get picked up when they're syntactically correct[1], but at least as exposed in scip-typescript right now, the net result unfortunately isn't particularly useful.

1: One of my experiments was to use the devtools/ subtree, but while there are some places where the syntax is right (ex: @param {Debugger.Object} object), there are also a ton of cases where the syntax is wrong (ex: @param Debugger.Environment aEnvironment).
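For anyone wanting to gauge how much of a subtree uses the brace syntax, a quick check along these lines could help. This is a minimal sketch with a hypothetical function name; it only tests whether an `@param` tag carries a `{...}` type before the parameter name and makes no attempt to validate the type expression itself.

```typescript
// Sketch only: a line-level heuristic, not a JSDoc parser.

/** True when an @param tag has a brace-delimited type before the name. */
function hasBracedParamType(tagLine: string): boolean {
  return /@param\s+\{.+\}\s+\S+/.test(tagLine);
}

// The two examples from the footnote above.
const bracedExample = hasBracedParamType("@param {Debugger.Object} object");
const bareExample = hasBracedParamType(
  "@param Debugger.Environment aEnvironment"
);
```

Here `bracedExample` is true and `bareExample` is false, since the second tag has no braces around the type.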

I came up with a candidate plan for introducing a concept of symbol confidences in my review comment at https://github.com/mozsearch/mozsearch/pull/628#issuecomment-1548575643 that I think could be useful as a mechanism for incrementally migrating towards using scip-typescript for everything. It would allow us to potentially run both the existing "classic" JS analyzer on everything plus "scip-typescript" on things that it adds something for, and the UI could potentially surface the differences in quality in a prioritized fashion. That said, there would probably also be a lot of upside to finishing up the MVP of the "query" UI and its faceting, as path and/or subsystem faceting would also go a long way to letting people filter out things they're not interested in.
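As a very rough sketch of what running both analyzers and prioritizing by confidence could look like: everything below is an assumption made for illustration, including the record shape, the numeric confidence scale, and the function name. None of it is taken from the linked review comment or from the mozsearch code.

```typescript
// Sketch only: hypothetical record shape and ordering rule.

type Analyzer = "classic-js" | "scip-typescript";

interface SymbolRecord {
  symbol: string;
  analyzer: Analyzer;
  /** Assumed scale: a higher number means a more trustworthy result. */
  confidence: number;
}

/** Combine results from both analyzers, most confident first. */
function mergeByConfidence(
  classic: SymbolRecord[],
  scip: SymbolRecord[]
): SymbolRecord[] {
  return classic
    .concat(scip)
    .sort((a, b) => b.confidence - a.confidence);
}
```

The UI could then show the head of the merged list prominently and collapse lower-confidence hits, while still keeping the classic results around as a fallback.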
