Bug 1756018 Comment 1 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Andrew Sutherland [:asuth] (he/him)

on 2022-02-18 11:30:38 PST

There are a few levels of support possible here:
1. Treat the files as JS and let the tokenizer get confused when it sees the JSX blocks.  This would look like the change to support .mjs at https://github.com/mozsearch/mozsearch/pull/381
2. Teach the tokenizer to deal with the JSX blocks by adding some additional smarts to the [tokenizer](https://github.com/mozsearch/mozsearch/blob/master/tools/src/tokenize.rs) and then update the [language mapping](https://github.com/mozsearch/mozsearch/blob/master/tools/src/languages.rs).  There are separate tokenizers for tag-like and C-like languages, and there are already cases where transitions are made between the two, although I think the transition is usually from tag-like into C-like.
3. Add semantic support.  This would most likely want to happen under the auspices of bug 1740290 where we could leverage existing semantic analysis tools and their AST.  This could potentially also moot the prior two steps if using LSIF could let us emit a new type of "raw syntax" record or similar which could allow us to bypass the tokenizer when we have the more detailed stream available.  However, without the above tokenizer support, this would only allow syntax highlighting for the tip of the tree; older revisions and diffs/etc. would not have any syntax highlighting available because they would entirely lack semantic analysis data.

I'd be very happy to mentor someone through any of this work, as there are no engineering resources available to work on this otherwise.

Revision 1 by

Andrew Sutherland [:asuth] (he/him)

on 2022-02-18 11:32:35 PST

There are a few levels of support possible here:
1. Treat the files as JS and let the tokenizer get confused when it sees the JSX blocks.  This would look like the change to support .mjs at https://github.com/mozsearch/mozsearch/pull/381
2. Teach the tokenizer to deal with the JSX blocks by adding some additional smarts to the [tokenizer](https://github.com/mozsearch/mozsearch/blob/master/tools/src/tokenize.rs) and then update the [language mapping](https://github.com/mozsearch/mozsearch/blob/master/tools/src/languages.rs).  There are separate tokenizers for tag-like and C-like languages, and there are already cases where transitions are made between the two, although I think the transition is usually from tag-like into C-like.
3. Add semantic support.  This would most likely want to happen under the auspices of bug 1740290, where we could leverage existing semantic analysis tools that understand the JSX syntax, along with their AST.  (Our existing analysis uses the Reflect API and would fail to parse a file with JSX in it, so a change like https://github.com/mozsearch/mozsearch/pull/382 that we made for mjs would not work.)  This could potentially also moot the prior two steps if using LSIF let us emit a new type of "raw syntax" record or similar that would allow us to bypass the tokenizer when we have the more detailed stream available.  However, without the above tokenizer support, this would only allow syntax highlighting for the tip of the tree; older revisions, diffs, etc. would not have any syntax highlighting available because they would entirely lack semantic analysis data.
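
To make option 2 a bit more concrete, here is a rough, standalone sketch of the C-like-into-tag-like transition. It is not based on the actual contents of `tokenize.rs`; `Mode`, `starts_jsx`, and `split_regions` are hypothetical names invented for illustration, and the heuristic for "does this `<` start a JSX element" is an assumption that a real implementation would need to refine.

```rust
/// Hypothetical label for which tokenizer would handle a region of source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    CLike,
    TagLike,
}

/// Assumed heuristic: a `<` starts a JSX element if it is followed by a
/// letter and the previous non-whitespace character puts us in expression
/// position. This misses cases such as `return <div/>` and fragments (`<>`).
fn starts_jsx(prev_significant: Option<char>, next: Option<char>) -> bool {
    let next_ok = matches!(next, Some(c) if c.is_ascii_alphabetic());
    let prev_ok = match prev_significant {
        None => true,
        Some(c) => "(=,?:[{&|!>".contains(c),
    };
    next_ok && prev_ok
}

/// Splits `src` into consecutive regions, each labeled with the mode that
/// would tokenize it. Limitations of this sketch: `{...}` expressions inside
/// JSX are not handed back to the C-like mode, and a `>` or `/>` inside an
/// attribute string would confuse the depth tracking.
pub fn split_regions(src: &str) -> Vec<(Mode, String)> {
    let chars: Vec<char> = src.chars().collect();
    let mut regions: Vec<(Mode, String)> = Vec::new();
    let mut current = String::new();
    let mut mode = Mode::CLike;
    let mut depth: usize = 0; // number of currently open JSX elements
    let mut prev_significant: Option<char> = None;

    for (i, &c) in chars.iter().enumerate() {
        let next = chars.get(i + 1).copied();
        match mode {
            Mode::CLike => {
                if c == '<' && starts_jsx(prev_significant, next) {
                    // Flush the C-like text seen so far and switch modes.
                    if !current.is_empty() {
                        regions.push((Mode::CLike, std::mem::take(&mut current)));
                    }
                    mode = Mode::TagLike;
                    depth = 1;
                }
                current.push(c);
                if !c.is_whitespace() {
                    prev_significant = Some(c);
                }
            }
            Mode::TagLike => {
                current.push(c);
                match c {
                    // `</name>` closes an element.
                    '<' if next == Some('/') => depth = depth.saturating_sub(1),
                    // `<name` opens a nested element.
                    '<' => depth += 1,
                    // `/>` closes a self-closing element.
                    '/' if next == Some('>') => depth = depth.saturating_sub(1),
                    // The `>` that ends the outermost element ends the region.
                    '>' if depth == 0 => {
                        regions.push((Mode::TagLike, std::mem::take(&mut current)));
                        mode = Mode::CLike;
                        prev_significant = Some('>');
                    }
                    _ => {}
                }
            }
        }
    }
    if !current.is_empty() {
        regions.push((mode, current));
    }
    regions
}
```

With this sketch, `split_regions("const x = <b>hi</b>;")` yields a C-like region, a tag-like region for `<b>hi</b>`, and a trailing C-like region for `;`, while a comparison like `a < b` stays in a single C-like region.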

I'd be very happy to mentor someone through any of this work, as there are no engineering resources available to work on this otherwise.