Closed Bug 1729164 Opened 3 years ago Closed 2 years ago

[structured analysis search regression fixes] Fold single/low cardinality override relationships into search results without extra clicks

Categories

(Webtools :: Searchfox, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1772720

People

(Reporter: asuth, Assigned: asuth)

References

(Regression)

Details

Attachments

(1 file)

Problem / Context

In discussion about bug 1728376 and other potential UX enhancements for existing use-cases where the old behavior of the transitive closure on overrides (:mccr8's excellent characterization) was useful (as opposed to the overload case of bug 1466692), :nika raised the scenario of nsIDocShell methods where we expect there to be be a single implementation/override of the methods.

In that case, for example, searching for nsIDocShell::loadURI we expect for there to be a single C++ pure virtual definition (formerly a declaration prior to bug 1728376 altering this behavior) and then a single override on nsDocShell. We expect the uses to be of the pure virtual function, but the useful definition to be on the override.

Prior to the structured analysis landing, all of the results would have been included together, which was optimal for this use-case. While there wins from the behavior change, they were primarily in eliminating the unconditional inclusions of "cousin" relationships, not in hiding the parent/child relationships. The current behavior partitions the two useful pieces of information onto separate search pages, which is not helpful and is a regression.

Minimal Solution

Alter expand_keys so that:

  • Maybe: Move 'meta' processing to occur prior to the key translation step to allow for the easier introduction of other new-keyed results. It could also make sense to skip this and just do the merging post-transform.
  • Have the meta merge invocations (possibly just inside merge_defs_from_symbols_as consider if the number of symbol_names is small enough that it just makes sense to directly merge the looked-up-single symbol's crossref data in its entirety instead of excerpting the definitions under the override/class-relationship new derived kinds.

Stretch goal:

  • As :mstange raises in https://bugzilla.mozilla.org/show_bug.cgi?id=1466692#c3 it's appropriate for the changes here to try and ensure that the additional results included via symbol relationship traversal are prioritized below the direct search hits. Right now there's no mechanism for that, but it seems doable. Likely advisable to get test coverage for the search query results first, though.

Rambling / Scope-Creep

As part of the longer term plans to do fancier things with search and move from router.py to a searchfox-tool-based/informed rust based search mechanism, I've done some thinking-out-loud below. None of the below needs to (or should) block the above. And any experimentation can happen in parallel with the existing search. (Unfortunately the core representation change for overrides did need to be an all-or-nothing change, although I probably should have just had the server do the transitive closure thing on the backend, but I think the current batch of UX enhancements while addressing the unintended regressions is a really useful set of changes on net.)

Diagram / Trade-Offs Discussion

I'm attaching a diagram I mocked up from some higher level (scope creep) meta discussion I wanted to have in the area but which is relevant to the trade-offs here.

Search Pipeline

The search pipeline as implemented by router.py (ignoring fulltext) is basically:

  • Map identifiers to symbols performing prefix match. (With structured analysis, this phase now maps identifiers directly to their symbol and does not merge/union the overrides with what they're overriding. See bug 1466692 for a case where this was bad.)
  • Loads the symbol data which is aggregated by kind (def/decl/use/etc.)
  • Put all the data in one big bag and then prioritize the results by kind.

Automated Merging Concerns

Structured analysis provides us with edges between symbols that we can traverse to merge into the bag. The general concerns with this automatic merging are:

  • Making the "bag" too big.
    • The current search logic caps the amount of work done/results returned. We don't want to hit this overload case and end up eliding results.
    • Smaller result sets are easier to visually scan/parse.
  • Our prefix matching approach over a single lexical space means that while matching what the user is actively searching for, we may also be experiencing false matches over similarly named code in other parts of the code-base. (Namespace may help mitigate this if provided.)
    • (Naive) automatic traversal of these graph edges has the potential to vastly increase the number of false positives in the result set.

Mitigations

There are a bunch of ways to address these problems:

  • Faceting via interactive UI. Right now there's no way to refine the search results other than changing the underlying search, but we can very much allow the user to discard results that come from different sub-trees, that are on an interface we're not interested in, are in the wrong language, that haven't changed recently, etc.
  • The automatic traversal obviously doesn't need to be naive. The heuristic already proposed up at the top is for a single override or low cardinality, but this could be further modulated by knowing the size of the existing result set. If we know there's already N first-level matches, in theory it could be good for the user to tell us which of those N they care about.
    • This implies some level of faceting at the query level, if not at the UI level. We do somewhat have this now via the search upsell mechanism introduced in bug 1728376 to expose the overrides/superclass/subclass relationships as symbol searches which is a zoom-in style faceting, but which are limited in that they inherently only show a single node in our symbol graph.
  • Higher level overviews of the contents of the search results as a diagram or a ul/li tree (as a more compact and more accessible representation).
    • The attached diagram for example attempts to convey a graph where the relationships between the symbols including aggregations of the number of callers of methods is expressed at a high level.
      • The diagram could be interactively clickable, potentially as a Thunderbird global-search style interaction where each click can be used to specify inclusion (must have) or exclusion (must not have). These clickable interactions can also be expressed in the ul/li tree-view as checkboxes or combo-boxes.
      • Note that while the interaction of the diagram is hypothetical, the ul/li tree-view is not; I tried to make sure when prototyping on the fancy branch that the graph layout mechanism could always produce a dual of the visual graph as a proper HTML tree.
Depends on: 1729178
Regressed by: 1641372
See Also: → 1729371
See Also: → 1772720

A number of mitigations landed as part of bug 1728376 and I had some WIPs for this, but there was a lot of complexity creep and in general I think it's better to just address this via the "query" endpoint (tracked by bug 1772720) since it has explicitly grown the logic to deal with situations like this. That's not to say someone can't try and enhance router.py, it's just likely to be a non-trivial amount of work and result in UX regressions.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Duplicate of bug: 1772720
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: