Closed Bug 1641372 Opened 5 years ago Closed 4 years ago

Land structured analysis record indexing and cross-referencing support from fancy branch

Tracking

(firefox93 fixed)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox93

---

fixed

People

(Reporter: asuth, Assigned: asuth)

References

(Regressed 1 open bug)

Details

Attachments

(4 files)

Bug 1641372 - Emit structured records. r=kats 4 years ago Andrew Sutherland [:asuth] (he/him) 48 bytes, text/x-phabricator-request		Details \| Review
mozsearch changes 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-github-pull-request		Details \| Review
additional router s/consumes/callees/ rename fixes from review renames 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-github-pull-request		Details \| Review
router.py python3 fixes due to drift 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-github-pull-request		Details \| Review

Andrew Sutherland [:asuth] (he/him)

Assignee

Description

•

5 years ago

The fancy branch's indexer changes have largely stabilized, so it's worth thinking about landing them, especially as the conceptual changes are likely necessary in order to deal with the increased use of templates in the mozilla-central C++ codebase in a non-naive way. (The naive way results in scaling problems that also impact usability.)

The primary analysis-related changes are these:

A new "structured" record type is introduced that provides hierarchical information about symbols. Documentation is currently accessible at https://github.com/asutherland/mozsearch/blob/fancy/docs/analysis.md#structured-records.
- Information is not yet generated for templates, but it's on the short-list. Whether or not to wait for the template enhancements is mainly a question of whether human validation would rather be doing A/B checking for regressions or an overall evaluation of places we are still missing analysis coverage.
visitIdentifier in the C++ indexer no longer operates in terms of multiple symbols for a single token. This exclusively was used for C++ method overrides. (A separate symbol list existed for "contextsym" purposes, but this just gets flattened to a string and is unused. Although I hadn't quite realized this before, so I think I'm likely to stop it from generating multiple symbols as it's almost certainly breaking the "Consumes" derivation.)
"source" records gain type and typesym fields which allow for clicking on a token in the source view to directly let you know the type of something and to go to the definition of the type. This also works for no_crossref symbols like locals, and is especially useful for local variables that are declared as auto and for which searchfox provides no means of determining the type. These are documented at https://github.com/asutherland/mozsearch/blob/fancy/docs/analysis.md#experimental--in-flux
crossref.rs does some extra stuff:
- A "Consumes" relationship is established that's derived from "Uses" and their enclosing "contextsym" which is doing the using. For diagramming of inter-method control flow, we need to know in-edges and out-edges. "Uses" provides for in-edges to a function/method, and "Consumes" provides for the out-edges.
- A linkage phase is added for the processing of "structured" records.
  - "structured" records only point "up" to super-classes and overrides on their own, never down. This phase establishes the list of subclasses for a symbol, as well as who overrides it.
  - IDL relationships are also propagated. For example, the IPDL analysis phase (on the "fancy" branch) has the IPDL method definition explicitly call out its (C++) srcsym ("SendFoo") and (C++) targetsym ("RecvFoo"). During the linkage stage, the C++ Send and Recv methods are then annotated with their reciprocal type (targetsym and srcsym, respectively), as well as their idlsym. This specific mechanism will likely need some additional parameterization by binding language since we also have rust bindings for IPDL.
router.py has a "sorch" endpoint that basically is the fix for bug 1499059, providing results with a per-symbol granularity. This provides the means for fixing bug 1466692. This would want to be folded back into the "search" endpoint, as we can easily enhance the search UI for this.

The multiple symbols change is the big one. The general situation is:

The rust analysis-processing logic continues to support multiple symbols, which is used by merge-analyses.rs to handle cases where a symbol name differs for 2 source records that have the same location and "pretty" (demangled, arguments elided) string.
This means that where a method would previously have generated a single "source" record with multiple symbols and multiple "target" records, it will now generate only a source record with a single symbol and a single "target" record.
Having only the single symbol in the "source" record means that the searches menu either needs help to build the search symbol list at formatting time or needs the search page to perform the overrides traversal itself.
- Arguably a UX improvement here would be for the context menu to provide explicit search options for the method and the methods it's overriding as explicitly different options. This would necessitate access to the crossref database in format.rs or just a more structured cheat sheet than the jumps information provides. I'd argue for crossref database access.
Having only the single "target" record means that the "identifiers" mapping file only maps the pretty name for a symbol to its directly corresponding symbol. This means router.py would need to perform this traversal itself or we would want to intentionally change the search results here.
- Arguably the current behavior isn't great, with bug 1466692 as a concrete example of that. (But as proposed there, delineating the method from the method it's overriding might be the best of both worlds, and that is something that can be done and explicitly surfaced to the user.)

One conclusion I think I've reached from fancy branch experimentation is that it likely makes sense to dump the jumps/searches array in favor of a map keyed by symbol name, with the map including no_crossref symbols for the file so that the type/typesym information can be surfaced. The jumps/searches array produces a lot of redundant results (the only difference between different incidences of a symbol index-wise is the filtration of a jump for which the current symbol is the definition), so any bloat from extra structured information could potentially offset. This would also enable "go to declaration" as an option that I've found pretty useful on the fancy branch.

Disclaimer: None of this will happen before June 30th ;)

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

4 years ago

Depends on: 1698236

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 1

•

4 years ago

Attached file Bug 1641372 - Emit structured records. r=kats — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 2

•

4 years ago

Attached file mozsearch changes — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 3

•

4 years ago

•

Edited

Linkage

Problem Statement

An important issue raised by :kats in the github review is that the IPC linkage mechanism implemented in the branch in crossref.rs could potentially be made more generic / higher level, covering cases like the preferences use-case described in bug 1699048 or perhaps the URL mapping discussed in bug 1697671. However, in this comment I'm only going to deal with the IDL cases, though I think the general idea here should generalize to the preferences situation.

When digging into this a bit more, it became clear that when addressing this we could potentially also simplify the implementations of our IDL-related processing. Much of the complexity in the XPIDL analyzer and the IPDL analyzer involves having them determine what C++ analysis header file to load (:billm's explanation) and then to establish a mapping between the expected "pretty" human-readable identifier (ex: "Class::Method") and the underlying mangled symbol (which the analyzers don't understand and depend on there being no overloads to get it right). This is work that could instead be handled by the cross-referencing and/or new linker process.

IDL analyzer simplification

For XPIDL/IPDL/WebIDL, the analyzer/indexer could be simplified such that it emits analysis records that express:

The IDL token definitions, including the peek ranges that cover the preceding comments.
The expected linkage identifiers for each "slot" that relates to the symbol. This can then be used by the linker to determine the actual (mangled) symbol(s).
- The linkage identifiers would have a structured scheme like an array of { symbol: "exact_symbol" } / { symbol_prefix: "ZBlahBlah" } / { pretty: "Foo::Bar" } / { raw_prototype: "void blah(int blah, char blah) or something clever" } objects.
- This would allow the XPIDL compiler to indicate C++ bindings via "pretty" name and rust bindings via symbol (which is actually the same as pretty right now), supporting both in a single pass.
IDL Slot schemes could be:
- XPIDL: getter (for C++/Rust), setter (for C++/Rust), attribute (for JS given our current analysis), method
- IPDL: send, recv
- WebIDL: getter, setter, method, enabling_pref, enabling_func.
There'd probably also be a binding slot on each interface that allows a direct mapping from the interface to the binding classes.

Slots, Kinds, Symbols and UX

UX Status Quo

Searchfox currently really only has:

From context menu:
- Go to definition (for which we know there's only one definition, otherwise this option is hidden)
- Search for everything about the unioned set of symbols associated with this token that all shared the same pretty identifier. (There may be multiple searches for cases like constructors where implicitly invoked field constructors end up collapsed onto the same token, but inherently have different pretty identifiers.)
  - This is how the current XPIDL bindings work: We associate the .idl file tokens with the C++ symbol name for the binding we scrape from the binding header file, plus the wildly-unscoped JS property that the method/attribute would be indexed under as implemented or consumed.
The search UI:
- Search for this pretty identifier and then tell me everything about the symbols associated with the pretty identifier.
- Fulltext search that is largely separate from all this semantic stuff but highly useful and powerful in its own right and serves as a backstop for when the semantic stuff falls down.

In the search results, we facet and prioritize the result by (target) "kind", on trunk these are: def, decl, idl, use, assign.

What Slots Enable

The additional slots potentially enable new direct access context menu options as well as additional faceting. For example, for the above:

XPIDL:
- Menu:
  - Go to {getter/setter/method} implementation. This is distinct from "go to definition" which would be in the IDL file.
- Search Faceting:
  - Uses of {getter/setter/attribute} as distinct items instead of all combined together. Although it might make sense to mix them together by default but provide an inline control in the sub-heading like Uses ([x] getter [x] setter [x] attribute).
IPDL:
- Menu:
  - Go to recv implementation. (This would directly accessible from the IPDL file and any calls to the Send method.)
- Diagramming: It really helps the auto-diagramming logic to be able to understand what calls have IPC semantics and the send-recv pairings.
WebIDL:
- Menu:
  - Go to {getter/setter/method} implementation. This is distinct from "go to definition" which would be in the WebIDL file.
  - Go to enabling pref "foo.bar" (The presence of the menu option indicating the use of an enabling pref is probably most useful; maybe this also wants to be an icon?)
  - Go to enabling func.

One Symbol Per Token / Abstract Symbols

A big conceptual change I'd been pursuing in the fancy branch was that rather than having each token being associated with a set of symbols, the token would explicitly be associated with a single symbol and have explicit relationships to the related symbols. That is, right now, there is no symbol that corresponds exactly to the given XPIDL token; it only references the presumed JS binding and the presumed C++ binding symbols. With this change we would create an explicit symbol namespace for XPIDL interfaces and the token in the .idl file would be associated with only that symbol.

That XPIDL-namespaced symbol would then reference the symbols of all the bindings. The C++ binding's analysis file would explicitly only have its own C++ symbol associated with the token. However, the cross-referencing process would establish these symbol relationships via the slots. The C++ binding symbol would have an "idl" slot that references the XPIDL-namespaced symbol, etc.

The search mechanism and context menu popups would all know how to pierce these relationships and potentially pre-compute/pre-aggregate the information as necessary.

Phabricator Automation

Updated

•

4 years ago

Attachment #9208998 - Attachment description: Bug 1641372 - Emit structured records. → Bug 1641372 - Emit structured records. r=kats

Pulsebot

Comment 4

•

4 years ago

Pushed by bugmail@asutherland.org: https://hg.mozilla.org/integration/autoland/rev/601185a5693a Emit structured records.

Cristian Brindusan [:cbrindusan]

Comment 5

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/601185a5693a

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox93: --- → fixed

Resolution: --- → FIXED

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 6

•

4 years ago

Attached file additional router s/consumes/callees/ rename fixes from review renames — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 7

•

4 years ago

Attached file router.py python3 fixes due to drift — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

3 years ago

Regressions: 1728376

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

3 years ago

Regressions: 1729164

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

3 years ago

Regressions: 1729371

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

3 years ago

Blocks: 1574258

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

2 years ago

Blocks: 1499059

You need to log in before you can comment on or make changes to this bug.