Open Bug 1727789 Opened 3 years ago Updated 7 months ago

Prototype semantic linkage mechanism

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

References

(Blocks 2 open bugs)

Details

This is a spin-off of a review action item from bug 1641372 (https://github.com/mozsearch/mozsearch/pull/408#discussion_r596683311) that led me to post https://bugzilla.mozilla.org/show_bug.cgi?id=1641372#c3, which I reproduce below in its entirety but may update over time. I'll make another comment right after this one.

Linkage

Problem Statement

An important issue raised by :kats in the github review is that the IPC linkage mechanism implemented in the branch in crossref.rs could potentially be made more generic / higher level, covering cases like the preferences use-case described in bug 1699048 or perhaps the URL mapping discussed in bug 1697671. However, in this comment I'm only going to deal with the IDL cases, though I think the general idea here should generalize to the preferences situation.

When digging into this a bit more, it became clear that in addressing this we could potentially also simplify the implementations of our IDL-related processing. Much of the complexity in the XPIDL analyzer and the IPDL analyzer involves having them determine what C++ analysis header file to load (:billm's explanation) and then establish a mapping between the expected "pretty" human-readable identifier (ex: "Class::Method") and the underlying mangled symbol (which the analyzers don't understand; they depend on there being no overloads to get it right). This is work that could instead be handled by the cross-referencing process and/or a new linker process.

IDL analyzer simplification

For XPIDL/IPDL/WebIDL, the analyzer/indexer could be simplified such that it emits analysis records that express:

  • The IDL token definitions, including the peek ranges that cover the preceding comments.
  • The expected linkage identifiers for each "slot" that relates to the symbol. This can then be used by the linker to determine the actual (mangled) symbol(s).
    • The linkage identifiers would have a structured scheme like an array of { symbol: "exact_symbol" } / { symbol_prefix: "ZBlahBlah" } / { pretty: "Foo::Bar" } / { raw_prototype: "void blah(int blah, char blah) or something clever" } objects.
    • This would allow the XPIDL compiler to indicate C++ bindings via "pretty" name and rust bindings via symbol (which is actually the same as pretty right now), supporting both in a single pass.
  • IDL Slot schemes could be:
    • XPIDL: getter (for C++/Rust), setter (for C++/Rust), attribute (for JS given our current analysis), method
    • IPDL: send, recv
    • WebIDL: getter, setter, method, enabling_pref, enabling_func.
  • There'd probably also be a binding slot on each interface that allows a direct mapping from the interface to the binding classes.
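
To make the record shape above more concrete, here is a minimal sketch of one emitted record and a linker step that resolves its linkage identifiers. Everything named here is an assumption for illustration, not searchfox's actual schema: the field names (`sym`, `pretty`, `slots`), the `XPIDL_nsIFoo_bar` symbol, the `resolve_linkage` and `link_record` helpers, and the hand-written `KNOWN_SYMBOLS` table that stands in for what cross-referencing would have learned from the C++ and JS analysis. The `raw_prototype` form is deliberately left unresolved.

```python
import json

# Hypothetical analysis record for one XPIDL attribute. Each slot carries a
# list of linkage identifiers rather than already-resolved mangled symbols.
record = {
    "sym": "XPIDL_nsIFoo_bar",   # made-up symbol in an XPIDL namespace
    "pretty": "nsIFoo.bar",
    "slots": {
        "getter": [{"pretty": "nsIFoo::GetBar"}],
        "setter": [{"pretty": "nsIFoo::SetBar"}],
        "attribute": [{"symbol": "#bar"}],
    },
}

# Stand-in for the linker's view of the world: symbol -> pretty identifier.
# The mangled names are illustrative only.
KNOWN_SYMBOLS = {
    "_ZN6nsIFoo6GetBarEPi": "nsIFoo::GetBar",
    "_ZN6nsIFoo6SetBarEi": "nsIFoo::SetBar",
    "#bar": "bar",
}


def resolve_linkage(identifier, known):
    """Resolve one linkage identifier to a sorted list of known symbols."""
    if "symbol" in identifier:
        sym = identifier["symbol"]
        return [sym] if sym in known else []
    if "symbol_prefix" in identifier:
        prefix = identifier["symbol_prefix"]
        return sorted(s for s in known if s.startswith(prefix))
    if "pretty" in identifier:
        wanted = identifier["pretty"]
        return sorted(s for s, pretty in known.items() if pretty == wanted)
    # raw_prototype (or anything unrecognized) is not handled in this sketch.
    return []


def link_record(rec, known):
    """Map each slot name to the de-duplicated symbols its identifiers resolve to."""
    linked = {}
    for slot, identifiers in rec["slots"].items():
        resolved = set()
        for identifier in identifiers:
            resolved.update(resolve_linkage(identifier, known))
        linked[slot] = sorted(resolved)
    return linked


linked = link_record(record, KNOWN_SYMBOLS)
print(json.dumps(linked, indent=2))
```

Note that a `pretty` identifier can resolve to several symbols when overloads exist, which is why each slot maps to a list; how the linker should disambiguate in that case is left open here.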

Slots, Kinds, Symbols and UX

UX Status Quo

Searchfox currently really only has:

  • From context menu:
    • Go to definition (for which we know there's only one definition, otherwise this option is hidden)
    • Search for everything about the unioned set of symbols associated with this token that all share the same pretty identifier. (There may be multiple searches for cases like constructors where implicitly invoked field constructors end up collapsed onto the same token, but inherently have different pretty identifiers.)
      • This is how the current XPIDL bindings work: We associate the .idl file tokens with the C++ symbol name for the binding we scrape from the binding header file, plus the wildly-unscoped JS property that the method/attribute would be indexed under as implemented or consumed.
  • The search UI:
    • Search for this pretty identifier and then tell me everything about the symbols associated with the pretty identifier.
    • Fulltext search that is largely separate from all this semantic stuff but highly useful and powerful in its own right and serves as a backstop for when the semantic stuff falls down.

In the search results, we facet and prioritize the results by (target) "kind"; on trunk these are: def, decl, idl, use, assign.
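
As a rough illustration of kind-based faceting, the sketch below groups hits by kind and returns the groups in a fixed order. Treating the order in which the kinds are listed above as the priority order is an assumption, as are the hit dictionaries and the `facet_by_kind` helper.

```python
from collections import defaultdict

# Assumed priority: the order in which the kinds are listed in the text.
KIND_PRIORITY = ["def", "decl", "idl", "use", "assign"]


def facet_by_kind(hits):
    """Group hits by their "kind" and return (kind, hits) pairs in priority order.

    Hits whose kind is not in KIND_PRIORITY are dropped in this sketch.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["kind"]].append(hit)
    return [(kind, grouped[kind]) for kind in KIND_PRIORITY if kind in grouped]


hits = [
    {"kind": "use", "path": "dom/Foo.cpp", "line": 40},
    {"kind": "def", "path": "dom/Foo.cpp", "line": 12},
    {"kind": "idl", "path": "dom/nsIFoo.idl", "line": 7},
    {"kind": "use", "path": "dom/Bar.cpp", "line": 99},
]
facets = facet_by_kind(hits)
```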

What Slots Enable

The additional slots potentially enable new direct access context menu options as well as additional faceting. For example, for the above:

  • XPIDL:
    • Menu:
      • Go to {getter/setter/method} implementation. This is distinct from "go to definition" which would be in the IDL file.
    • Search Faceting:
      • Uses of {getter/setter/attribute} as distinct items instead of all combined together. Although it might make sense to mix them together by default but provide an inline control in the sub-heading like Uses ([x] getter [x] setter [x] attribute).
  • IPDL:
    • Menu:
      • Go to recv implementation. (This would be directly accessible from the IPDL file and any calls to the Send method.)
    • Diagramming: It really helps the auto-diagramming logic to be able to understand what calls have IPC semantics and the send-recv pairings.
  • WebIDL:
    • Menu:
      • Go to {getter/setter/method} implementation. This is distinct from "go to definition" which would be in the WebIDL file.
      • Go to enabling pref "foo.bar" (The presence of the menu option indicating the use of an enabling pref is probably most useful; maybe this also wants to be an icon?)
      • Go to enabling func.
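
The menu options above could be derived mechanically from a symbol's resolved slots. The following is only a sketch: the `SLOT_MENU_LABELS` table, the `menu_items` helper, and the example symbol names are hypothetical, and hiding an option when a slot resolves to anything other than exactly one target is an assumption that mirrors how "go to definition" behaves today.

```python
# Hypothetical mapping from slot name to a context menu label. "{target}" is
# filled in only for labels that mention their target, like enabling prefs.
SLOT_MENU_LABELS = {
    "getter": "Go to getter implementation",
    "setter": "Go to setter implementation",
    "method": "Go to method implementation",
    "recv": "Go to recv implementation",
    "enabling_pref": 'Go to enabling pref "{target}"',
    "enabling_func": "Go to enabling func",
}


def menu_items(slots):
    """Return (label, target) pairs for slots with exactly one resolved target."""
    items = []
    for slot, targets in slots.items():
        label = SLOT_MENU_LABELS.get(slot)
        if label is None or len(targets) != 1:
            continue  # unknown slot, or ambiguous/missing target: hide the option
        target = targets[0]
        items.append((label.format(target=target), target))
    return items


# Example: a WebIDL attribute gated behind a pref (all names are made up).
webidl_slots = {
    "getter": ["_ZN3Foo3BarEv"],
    "setter": [],
    "enabling_pref": ["foo.bar"],
}
items = menu_items(webidl_slots)
```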

One Symbol Per Token / Abstract Symbols

A big conceptual change I'd been pursuing in the fancy branch was that rather than having each token being associated with a set of symbols, the token would explicitly be associated with a single symbol and have explicit relationships to the related symbols. That is, right now, there is no symbol that corresponds exactly to the given XPIDL token; it only references the presumed JS binding and the presumed C++ binding symbols. With this change we would create an explicit symbol namespace for XPIDL interfaces and the token in the .idl file would be associated with only that symbol.

That XPIDL-namespaced symbol would then reference the symbols of all the bindings. The C++ binding's analysis file would explicitly only have its own C++ symbol associated with the token. However, the cross-referencing process would establish these symbol relationships via the slots. The C++ binding symbol would have an "idl" slot that references the XPIDL-namespaced symbol, etc.
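
A minimal sketch of how cross-referencing might establish both directions of this relationship follows. The symbol names and the `crossref_slots` helper are hypothetical; the only thing taken from the text above is that the IDL symbol references its bindings through slots and each binding gets an "idl" slot pointing back.

```python
def crossref_slots(idl_symbols):
    """Build a symbol -> {slot: [symbols]} table with back-references.

    idl_symbols maps each IDL-namespaced symbol to its resolved binding slots.
    Every binding symbol gains an "idl" slot that points back at the IDL symbol.
    """
    table = {}
    for idl_sym, slots in idl_symbols.items():
        entry = table.setdefault(idl_sym, {})
        for slot, bindings in slots.items():
            entry.setdefault(slot, []).extend(bindings)
            for binding in bindings:
                table.setdefault(binding, {}).setdefault("idl", []).append(idl_sym)
    return table


# Made-up symbols: one XPIDL attribute with a C++ getter and a JS property.
table = crossref_slots({
    "XPIDL_nsIFoo_bar": {
        "getter": ["_ZN6nsIFoo6GetBarEPi"],
        "attribute": ["#bar"],
    },
})
```

With a table like this, the token in the .idl file only needs to carry `XPIDL_nsIFoo_bar`, and the search and context menu code can follow the slots in either direction.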

The search mechanism and context menu popups would all know how to pierce these relationships and potentially pre-compute/pre-aggregate the information as necessary.

The core action items for this bug will be a prototype which first:

  • Changes some subset of the in-tree mozilla-central XPIDL, IPDL, and WebIDL to generate linkage data for searchfox.
    • It won't be necessary to have full implementations for every piece in order to get eyes on this and opinions about whether the changes are reasonable and the benefits are reasonable.
  • Changes searchfox to perform the linkage step for the subset.
  • Demonstrates a few searchfox-tool based "checks" queries that usefully answer some kind of code question by following the linkage and which can intuitively map to some kind of context menu support or maybe a top-level search query.

From there I think we can evaluate how well this works and consider the specific context-menu features to implement, etc. (How the context menus get implemented could change if we move from the array-based jumps/searches ANALYSIS_DATA emission in the HTML versus just a keyed-by-symbol dictionary/map, but the structured analysis changes need to land first before that can happen.)

Blocks: 1732341
Blocks: 1699048
See Also: → 1733217

:m_kato provided some information about JNI bindings in https://github.com/mozsearch/mozsearch/pull/314#issuecomment-1005626355 that looks like an example of more complicated linkages:

  1. RegisterNatives which takes a table like this media video capture logic, but where we seem to have standardized on a custom template glue mechanism:
  2. Here's the C++ Java_org_mozilla_gecko_mozglue_GeckoLoader_loadGeckoLibsNative that seems to correspond to org/mozilla/gecko/mozglue/GeckoLoader.java's loadGeckoLibsNative

I don't think this will be the first thing to hook up, but it's a very interesting real-world scenario! Edit: And likely this is something where some kind of rules engine would be appropriate so that some kind of declarative schema could be processed by the crossref mechanism as it digests symbols and then establishes those links in a second pass once it's seen all the symbols.
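
For the second case above, the two-pass idea can be sketched with a single hard-coded rule based on the standard JNI native method naming convention ("Java_", the escaped fully-qualified class name, "_", the escaped method name). The `jni_symbol` and `link_jni` helpers and the input shapes are hypothetical, a real rules engine would presumably read such rules from a declarative schema rather than code, and this sketch ignores overloaded native methods (which append an escaped signature) as well as RegisterNatives tables.

```python
def jni_escape(name):
    """Escape a Java name following the JNI native method naming convention.

    Only handles characters in the Basic Multilingual Plane.
    """
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")      # package separators become underscores
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))
    return "".join(out)


def jni_symbol(java_class, method):
    """Expected C symbol for a non-overloaded native method."""
    return "Java_" + jni_escape(java_class) + "_" + jni_escape(method)


def link_jni(java_natives, cpp_symbols):
    """Link Java native methods to C++ symbols after all symbols are known."""
    seen = set(cpp_symbols)                      # pass 1: digest every symbol
    links = {}
    for java_class, method in java_natives:      # pass 2: apply the rule
        expected = jni_symbol(java_class, method)
        if expected in seen:
            links[(java_class, method)] = expected
    return links


links = link_jni(
    [("org.mozilla.gecko.mozglue.GeckoLoader", "loadGeckoLibsNative")],
    ["Java_org_mozilla_gecko_mozglue_GeckoLoader_loadGeckoLibsNative", "main"],
)
```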

See Also: → 1758394
See Also: → 1761627
See Also: → 1790683
See Also: → 1800008

To update this bug a bit:

  • searchfox now has an explicit concept of both "binding slots" and "ontology slots"; this is most fully realized on my hobby stack branch, which will be landed to trunk during this half:
    • ontology-mapping.toml defines metadata that the cross-referencing logic uses as defined/ingested in ontology_mapping.rs.
      • This is distinct from per-file-info.toml which tells searchfox about artifacts it can ingest about files in the tree. Currently, this information is things like "this is a test file and we know these facts about this test file" and "coverage for every line in the file". But one could imagine that per-file-info might end up being told about files that contain per-symbol information, blurring the line between these different toml config files.
    • analysis.rs is home to the relevant types and useful comments; search on "Binding" and "Ontology".
  • ipdl-analyze.rs understands how to generate "bindingSlots" JSON output. It's still doing its own symbol resolution.
  • idl-analyze.py for XPIDL likewise generates "bindingSlots" data and continues to do its own symbol resolution.
  • Use of tree-sitter on my hobby stack branch has gone extraordinarily well and the tree-sitter query language has proven quite powerful. Given that Firefox developers usually will also benefit from having at least syntax highlighting for custom languages, it could make sense to use tree-sitter, its query language, and some custom glue logic for some languages rather than trying to insert the analysis generation into the in-tree code generators. And for cases where the code generators are not in-tree, this might make even more sense.