Support using tree-sitter as a tokenizer and source of nesting data
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(Not tracked)
People
(Reporter: asuth, Unassigned)
Currently we have custom in-tree tokenizers for CSS, C-like languages, and tag-like languages. (Rust is ironically a C-like language! :) tree-sitter is a parser that is already used by https://github.com/mozilla/rust-code-analysis/ to power various Mozilla machinery like https://github.com/mozilla/bugbug that tries to figure out which functions were changed by patches, etc. There's even a custom tree-sitter grammar dialect for Mozilla-specific macrology and similar constructs in C++. There's also a tree-sitter playground that can be used to show the resulting AST/parse tree for a variety of languages.
Switching to tree-sitter could let us have a potentially more resilient tokenizer with more language support than we can maintain ourselves, while also potentially allowing us to:
- Derive our `position: sticky` nesting from tree-sitter rather than depending on the semantic analysis to provide it. This would allow `position: sticky` nesting across more languages and on non-trunk revisions of files. The original choice to do `position: sticky` via the analysis pass was made for reasons of simplicity and because we already had an AST, but it is not strictly superior.
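To make the nesting idea concrete, here is a minimal sketch of how sticky ranges could be collected from a parse tree. It deliberately avoids the real tree-sitter API: `Node` is a hypothetical, simplified stand-in for a tree-sitter node (reduced to a kind string and a line range), and both `CONTAINER_KINDS` and the "skip single-line nodes" rule are assumptions for illustration, not what searchfox does today.

```rust
/// Hypothetical, simplified stand-in for a parse-tree node.
/// A real tree-sitter node exposes a kind and start/end positions;
/// this struct models only what the sketch needs.
#[derive(Debug)]
struct Node {
    kind: &'static str,
    start_line: usize, // 0-based, inclusive
    end_line: usize,   // 0-based, inclusive
    children: Vec<Node>,
}

/// Node kinds treated as nesting containers (assumed list, for illustration).
const CONTAINER_KINDS: &[&str] = &[
    "namespace_definition",
    "class_specifier",
    "function_definition",
];

/// A range whose first line could stick while its body scrolls past.
#[derive(Debug, PartialEq)]
struct NestingRange {
    kind: &'static str,
    start_line: usize,
    end_line: usize,
    depth: usize,
}

/// Walk the tree depth-first, recording multi-line containers and how deeply
/// each one is nested inside other containers.
fn collect_nesting(node: &Node, depth: usize, out: &mut Vec<NestingRange>) {
    // Single-line containers have no body to scroll through, so skip them.
    let is_container =
        CONTAINER_KINDS.contains(&node.kind) && node.end_line > node.start_line;
    if is_container {
        out.push(NestingRange {
            kind: node.kind,
            start_line: node.start_line,
            end_line: node.end_line,
            depth,
        });
    }
    let child_depth = if is_container { depth + 1 } else { depth };
    for child in &node.children {
        collect_nesting(child, child_depth, out);
    }
}

/// A tiny hand-built tree: a namespace holding a class with two methods,
/// one of which fits on a single line.
fn sample_tree() -> Node {
    let method = Node { kind: "function_definition", start_line: 5, end_line: 9, children: vec![] };
    let one_liner = Node { kind: "function_definition", start_line: 11, end_line: 11, children: vec![] };
    let class = Node { kind: "class_specifier", start_line: 3, end_line: 17, children: vec![method, one_liner] };
    let namespace = Node { kind: "namespace_definition", start_line: 1, end_line: 19, children: vec![class] };
    Node { kind: "translation_unit", start_line: 0, end_line: 20, children: vec![namespace] }
}

fn main() {
    let mut ranges = Vec::new();
    collect_nesting(&sample_tree(), 0, &mut ranges);
    for r in &ranges {
        println!("depth {} {} lines {}-{}", r.depth, r.kind, r.start_line, r.end_line);
    }
}
```

Because this needs only a syntactic parse and no semantic analysis, the same walk would work for any revision of a file and for any language with a grammar, which is the main appeal described above.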
Comment 1 • 2 years ago (Reporter)
In terms of what's in-tree now:
- In my enhancements to teach scip-indexer about nesting, it learned how to use tree-sitter to provide the sticky information. However, as noted in comment 0, this was done at the level of the analysis data abstraction rather than at the tokenizer level.
- The WIP hyperblame support's cst_tokenizer.rs operates at the tokenizer level. While it will initially be used to derive the traditional line-centric blame view from the underlying token-centric representation, a follow-up could evolve things so that we use the tree-sitter tokenizer entirely for tokenization purposes (and still join on the semantic data).
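As a rough illustration of deriving a line-centric view from token-centric data, the sketch below collapses per-token blame into per-line blame. Everything here is hypothetical: `BlamedToken`, the numeric `rev` field (larger meaning newer), and the "newest token wins" policy are assumptions chosen for the example and are not taken from cst_tokenizer.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical token-level blame record: the line a token sits on and the
/// revision that last touched it (larger number = newer, by assumption).
#[derive(Debug, Clone, Copy)]
struct BlamedToken {
    line: usize,
    rev: u32,
}

/// Collapse token-centric blame into one revision per line.
/// Assumed policy: a line is attributed to the newest revision among its tokens.
fn line_blame(tokens: &[BlamedToken]) -> BTreeMap<usize, u32> {
    let mut per_line = BTreeMap::new();
    for tok in tokens {
        let newest = per_line.entry(tok.line).or_insert(tok.rev);
        *newest = (*newest).max(tok.rev);
    }
    per_line
}

fn main() {
    let tokens = [
        BlamedToken { line: 0, rev: 3 },
        BlamedToken { line: 0, rev: 7 }, // a later change touched one token on line 0
        BlamedToken { line: 1, rev: 2 },
    ];
    for (line, rev) in line_blame(&tokens) {
        println!("line {line}: rev {rev}");
    }
}
```

Other policies (for example, attributing the line to the revision that touched the most tokens) would fit the same shape; the point is only that the line view is a projection of the richer token view.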