Open Bug 1781179 Opened 5 months ago Updated 5 months ago

Improve Searchfox's C++ syntax/semantic highlighting


(Webtools :: Searchfox, enhancement)



(Not tracked)


(Reporter: botond, Unassigned)


Searchfox today highlights C++ code by classifying tokens into a relatively small number of categories (e.g. syn_type, syn_def, syn_string, syn_reserved, syn_comment, probably a few others I'm missing).

I would like to propose expanding this to a richer set of semantic token kinds (and possibly token modifiers), to make C++ code easier to read and understand visually.

As a point of comparison, and a potential target to aim for, here are clangd's token kinds and token modifiers, which are themselves based on (and lightly extend) the ones specified in the Language Server Protocol.
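To make the proposal a bit more concrete, here is a minimal sketch of what a richer token classification could look like. The kind and modifier names are a subset of the standard LSP semantic token names. The `tok_`/`mod_` class prefixes, the fallback table, and the assumption that `syn_def` marks definitions are all hypothetical, not Searchfox's actual scheme.

```python
from dataclasses import dataclass

# Subset of the standard LSP semantic token types and modifiers.
LSP_TOKEN_TYPES = {
    "namespace", "class", "enum", "typeParameter", "parameter", "variable",
    "property", "enumMember", "function", "method", "macro", "keyword",
    "comment", "string", "number", "operator",
}
LSP_TOKEN_MODIFIERS = {
    "declaration", "definition", "readonly", "static", "deprecated", "abstract",
}

# Hypothetical mapping from rich kinds back to today's coarse categories.
COARSE_FALLBACK = {
    "class": "syn_type",
    "enum": "syn_type",
    "typeParameter": "syn_type",
    "string": "syn_string",
    "keyword": "syn_reserved",
    "comment": "syn_comment",
}


@dataclass(frozen=True)
class SemanticToken:
    text: str
    kind: str
    modifiers: frozenset = frozenset()


def css_classes(token, rich=True):
    """Return the CSS classes to put on a token's span."""
    if token.kind not in LSP_TOKEN_TYPES:
        raise ValueError(f"unknown token kind: {token.kind}")
    unknown = token.modifiers - LSP_TOKEN_MODIFIERS
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")
    if rich:
        # One class for the kind, plus one per modifier (hypothetical names).
        return [f"tok_{token.kind}"] + [f"mod_{m}" for m in sorted(token.modifiers)]
    # Coarse mode: collapse to the existing small set of categories.
    classes = []
    if token.kind in COARSE_FALLBACK:
        classes.append(COARSE_FALLBACK[token.kind])
    if "definition" in token.modifiers:
        classes.append("syn_def")  # assumption: syn_def marks definitions
    return classes
```

Keeping a coarse fallback next to the rich classes would let existing styling continue to work while new themes opt in to the extra kinds.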

As a major fan of syntax highlighting, I am very on board with this! We should probably also file a bug on how to actually leverage this styling. During the dark theme work, an initial idea was that we could just borrow devtools' themes as a basis, but if this moves us into the territory of adopting VS Code themes or needing to come up with our own more extensive themes for the extra token types, we should probably start thinking about that. I am planning to add a static settings page soon, so we can probably risk growing into a feature matrix that's the Cartesian product of ["light", "dark"], ["Just what I'm used to, thanks!", "All the colors!"], but we probably don't want to go much beyond that.
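For a sense of scale, the feature matrix described above can be enumerated directly. This is only a sketch: the two lists come from the comment, while the function name is made up for illustration.

```python
from itertools import product

BASE_THEMES = ["light", "dark"]
DETAIL_LEVELS = ["Just what I'm used to, thanks!", "All the colors!"]


def theme_matrix():
    """Enumerate every (base theme, detail level) combination."""
    return list(product(BASE_THEMES, DETAIL_LEVELS))


for base, detail in theme_matrix():
    print(f"{base}: {detail}")
```

Two axes with two options each give four stylesheets to maintain. Every additional two-valued axis would double that, which is the argument for not going much beyond this.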

An interesting design question here is how much Searchfox should emit into the "syntax" field of the analysis "source" records, and how much we should just derive from the "structured" records, which provide canonical information about symbols and have the benefit of reflecting any global analyses, like propagation of MOZ_CAN_RUN_SCRIPT. We currently have not adapted the style-choosing logic to leverage this for C++, but we could. (Also, that logic would need to be updated for any changes we make here anyway.)
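As a rough illustration of the "derive it from the structured records" option, the style lookup could prefer the structured record for a symbol and fall back to the "syntax" field only when none exists. The record shapes below (one JSON object per line, with "sym", "kind", and a comma-separated "syntax") are simplified assumptions for the sketch, and both function names are hypothetical.

```python
import json


def index_structured(lines):
    """Build a symbol -> structured record map from analysis lines."""
    by_sym = {}
    for line in lines:
        rec = json.loads(line)
        if rec.get("structured") == 1:
            by_sym[rec["sym"]] = rec
    return by_sym


def style_for(source_rec, structured_by_sym):
    """Pick a style name for one source record.

    Prefers the canonical "kind" from the structured record; otherwise
    falls back to the first entry of the source record's "syntax" field.
    """
    structured = structured_by_sym.get(source_rec.get("sym"))
    if structured is not None and "kind" in structured:
        return structured["kind"]
    syntax = source_rec.get("syntax", "")
    return syntax.split(",")[0] if syntax else None
```

With this split, the "source" records would only need to carry enough to identify the symbol, and anything a global analysis learns about that symbol would show up in the styling without re-emitting per-token data.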

The structured record formal hierarchy section tries to capture everything that can be emitted, but the emitStructuredInfo implementations for RecordDecl, FunctionDecl, and FieldDecl, and their shared common call-site, are straightforward.

One general argument in favor of relying on the "structured" records is analysis file size. Right now nsGlobalWindowInner.cpp is 7,989 lines and has a 4.6M analysis file against a source file size of 264K. While the size isn't a huge issue in itself, it's potentially important for a possible magic feature where we map from fulltext regexp searches back to the underlying semantic tokens. There, less analysis data to process is arguably better, at least until we change how we store the compressed analysis files to allow line-centric random access.
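To put those numbers in proportion, here is the back-of-the-envelope arithmetic, assuming "M" and "K" mean binary megabytes and kilobytes (the decimal reading gives slightly smaller figures):

```python
SOURCE_LINES = 7989
ANALYSIS_BYTES = 4.6 * 1024 * 1024  # "4.6M" analysis file (assumed binary units)
SOURCE_BYTES = 264 * 1024           # "264K" source file (assumed binary units)

# How many times larger the analysis file is than the source file.
size_ratio = ANALYSIS_BYTES / SOURCE_BYTES

# Average amount of analysis data emitted per source line.
analysis_bytes_per_line = ANALYSIS_BYTES / SOURCE_LINES

print(f"analysis/source ratio: {size_ratio:.1f}x")
print(f"analysis bytes per source line: {analysis_bytes_per_line:.0f}")
```

So the analysis file is on the order of 17 to 18 times the size of the source, or around 600 bytes of analysis per source line. Any extra per-token data in the "source" records would multiply across every token.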
