Closed Bug 1775130 Opened 2 years ago Closed 10 months ago

The eslint taskcluster job should emit metadata about the global contexts and imports of JS files to support searchfox JS indexing

Categories

(Developer Infrastructure :: Lint and Formatting, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: asuth, Unassigned)

References

(Blocks 1 open bug)

Details

In bug 1740290 I'm hoping to massively improve Searchfox's JS indexing support (which is currently very soupy[1]) by changing the indexer to use https://github.com/sourcegraph/scip-typescript which is based on the https://github.com/Microsoft/TypeScript/wiki/Standalone-Server-%28tsserver%29 used by VS Code for JS and TS language server purposes.

For JS/TS files that exist under a jsconfig.json or tsconfig.json tree, indexing is straightforward. But for the rest of the JS codebase, the hardest part of doing this is knowing what type of global a JS file is executing in and any symbols that have already been defined in that global.

Very happily, the amazing work to get eslint working in the tree has already accomplished most of what needs to be done, with eslint-plugin-mozilla providing the heuristics combined with manual annotations so that it seems like globals.js and friends could emit metadata about the files they see. They could emit this as a build artifact that searchfox's follow-on JS indexing[2] can ingest and use to do a bunch of hacks like inject import statements at the top of the JS files that will then have their line numbers be subtracted off or use a (temporarily?) forked version of the scip-typescript indexer that knows how to use the information. This might involve some typescript definition files to help the typescript analysis flow type information as it indexes.

If there's a way that we could help scip-typescript and tsserver understand the global information without extra custom logic/JSON files, that's even better of course[3]. I haven't seen indications of support for this in the jsconfig.json docs or tsconfig.json docs and ref but I'm still learning more about this space.

1: Soupy in the sense that because searchfox has not known about imports before, it effectively assumes that all JS in the tree runs in one big global, including properties/attributes.

2: Right now the searchfox indexing AWS jobs handle things like analyzing JS, but the ideal situation would be that all language analysis happens in mozilla-central taskcluster jobs (like happens for C++ and we're hoping to have happen for XPIDL and IPDL soon). So generating the metadata about JS files would be an improvement. And in general it would be preferable for there to be (potentially new) taskcluster tasks that generate the SCIP analysis data (which can be converted to LSIF). I do expect the SCIP generation phase to involve a bunch of iteration that would be decoupled from mozilla-central at least initially, but once it stabilizes, would be worth considering moving in-tree.

3: While it's great for searchfox to have super-powers and for people to want to use it, I think it's most ideal to support VS Code and other IDEs via the Language Server Protocol, as well as pernosco via LSIF. So pipelines should ideally involve as few searchfox-specific formats/behaviors as possible.

Product: Firefox Build System → Developer Infrastructure
See Also: → 1789329
See Also: → 1525189

I think the work relating to enabling typescript annotations might be more immediately useful for this effort, but pinging Mark for his thoughts here either way.

Flags: needinfo?(standard8)

If I'm understanding correctly, this is basically looking for a list of files with the relevant environments/globals that are available to those files? Would we need to know where the globals came from?

I think it also depends on what information we need for that - is it just what the globals are available to the file, or can it include the globals from the file as well? What about imports and resource:// and chrome:// uris? (xref bug 1548309).

One thing to be aware of is that in a lot of situations our ESLint config is doing a "best guess". We generally know what .sys.mjs files are (though it looks like we get it wrong for child actors), as well as .worker.(m)js. We're pretty sure we know what a lot of .mjs files are.

.js files are a big guess, based on their file location, as you've seen, a significant proportion of the tests are wrong, though we could probably fix that with an extension of bug 1447034. We assume a lot of things are in the browser window. We guess about their globals/imports based on what we're told (e.g. import-globals-from).

It would probably be a lot easier for ESLint & others if everything was a module (or at least all our content code was), then we'd explicitly know which files are used. Unfortunately the performance implications of that are unknown, and it'd probably take a massive refactor to sort out the circular dependencies etc.

Sorry, that's probably a bit of a rant, but I wanted to be clear that it isn't easy to get right, and we'll probably have some wrong data in there.

Flags: needinfo?(standard8) → needinfo?(bugmail)

(In reply to Mark Banner (:standard8) from comment #2)

If I'm understanding correctly, this is basically looking for a list of files with the relevant environments/globals that are available to those files? Would we need to know where the globals came from?

I had a longer reply, but I think the tl;dr is probably:

  • I definitely wouldn't want searchfox to reinvent the wheel for the m-c domain knowledge that already lives in eslint-plugin-mozilla, and that was most of my rationale in filing this bug.
  • Much of my interest here was in cases like the random test JS files that are potentially loaded in many different permutations. The IndexedDB test files that are loaded as xpcshell test payloads, mochitest payloads, plus weird hacky proxied-worker mochitest payloads are a worst-case example of this, but there are plenty of other test files loaded in non-obvious ways.
  • It sounds like with the changes related to the removals of environments a lot of my rationale here is perhaps mooted/obsoleted.

I think I probably need to figure out where we are in terms of the following things before establishing next-steps:

  • scip-typescript's quality of output after our recent enhancements to searchfox's SCIP ingestion
    • This is somewhat decoupled from :Gijs' point about typescript annotations being the highest yield effort; for searchfox's purposes if we can't get useful data out of what typescript knows via scip-typescript then there's still blocking work in the searchfox space to do there to get that data.
  • possibilities of our code coverage runs; in particular, if we enable bug 1508237 to get us coverage data on a per-test basis, we could also use that to explicitly know the dynamic load behaviors that happen. Or more specifically, if we're capturing the coverage data of tests, that largely tells us what contexts they were loaded into. Bug 1503417 seems to be a key aspect of that. (It's already the case that the coverage data identifies the realms and what scripts were loaded into them, but the coverage data is only dumped after the tests are done, so the test realms have come into existence and been destroyed again before we try and dump them.)

I'm going to resolve this incomplete for now since this bug isn't actionable yet and it probably won't be actionable in its current form.

Status: NEW → RESOLVED
Closed: 10 months ago
Flags: needinfo?(bugmail)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.