Open Bug 1779340 Opened 3 years ago Updated 2 years ago

Figure out how to semantically represent function/method argument lists

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

Currently searchfox doesn't understand argument lists for function calls, which makes it hard to do a semantic search like "show me all the nsObserverService notifications for 'quit-application'". One can try to approximate it with a regexp, but that will miss cases where defines are used and cases where the call spans multiple lines.
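To make the regexp limitation concrete, here is a small sketch. The C++ snippet, the QUIT_TOPIC macro, and the pattern are all made up for illustration; the point is only that a line-oriented pattern finds one of three calls that are semantically the same.

```python
import re

# Hypothetical C++ source; the macro name and call sites are invented.
SOURCE = '''
#define QUIT_TOPIC "quit-application"

obs->NotifyObservers(nullptr, "quit-application", nullptr);
obs->NotifyObservers(nullptr, QUIT_TOPIC, nullptr);
obs->NotifyObservers(
    nullptr,
    "quit-application",
    nullptr);
'''

# A naive line-oriented pattern, as one might type into a regexp search box.
PATTERN = re.compile(r'NotifyObservers\(.*"quit-application"')


def naive_hits(source: str) -> list[int]:
    """Return 1-based line numbers whose text matches the naive pattern."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if PATTERN.search(line)
    ]


hits = naive_hits(SOURCE)
print(hits)
```

Only the single-line call with the literal string is reported; the call through the macro and the multi-line call are both missed.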

Important notes:

  • Argument list "target" records do have a "contextsym" of the function/method they are an argument for!
  • The argument variable names do get "no_crossref" "source" records for both the def (in the header) and the decl (in the cpp file). However, the "sym"s differ between the two because the location portion of the hash, mangleLocation(Decl->getLocation()), inherently differs. We could correct this.
    • These do not have "contextsym"s because source records don't have those.

Use Case Brainstorms

What we do here potentially depends on the use-cases, so I'm going to edit this bug on an ongoing basis as I come up with new ideas.

I think the main question is really how far we push any information into the cross-referencing database in terms of pre-computation versus just doing things on-the-fly as a filtering post-pass. Note that we already know there are cases like the observer service where we want a semantic linkage mechanism like I propose in bug 1727789 which would inherently build on the mechanism discussed by this bug.

In particular, I think it makes sense to consider our analysis files as a first-class source of data available to us at query time. While our current gzip compression mechanism precludes random access to the files, it's entirely possible for us to use something like https://github.com/lz4/lz4/blob/dev/examples/dictionaryRandomAccess.md to allow efficient random-access seeking as an incremental enhancement.
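A minimal sketch of the random-access idea, assuming independently compressed blocks plus an offset index. It uses Python's zlib as a stand-in rather than the lz4 dictionary scheme linked above, and the block size, record contents, and function names are all hypothetical.

```python
import zlib


def compress_blocks(lines: list[bytes], lines_per_block: int = 2):
    """Compress lines in independent blocks and return (blob, index).

    Each index entry is (byte_offset, byte_length, first_line_number).
    """
    blob = bytearray()
    index = []
    for start in range(0, len(lines), lines_per_block):
        chunk = zlib.compress(b"".join(lines[start:start + lines_per_block]))
        index.append((len(blob), len(chunk), start))
        blob += chunk
    return bytes(blob), index


def read_line(blob: bytes, index, line_no: int, lines_per_block: int = 2) -> bytes:
    """Decompress only the block that holds the 0-based line_no."""
    offset, length, first = index[line_no // lines_per_block]
    block = zlib.decompress(blob[offset:offset + length])
    return block.splitlines(keepends=True)[line_no - first]


# Made-up analysis records, one JSON object per line.
records = [
    b'{"loc":"1:0","sym":"A"}\n',
    b'{"loc":"2:4","sym":"B"}\n',
    b'{"loc":"3:8","sym":"C"}\n',
]
blob, index = compress_blocks(records)
print(read_line(blob, index, 2))
```

Small independent blocks compress worse than one gzip stream; a shared dictionary, as in the lz4 example, is one way to recover some of that loss, which this sketch leaves out.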

Grouping / Faceting Search Results of method calls

The search for nsIObserverService::notifyObservers has 313 results right now. These could be rapidly filtered by presenting a table at the top of the results that could list all of the aTopic strings sorted alphabetically including piercing #defines. The first arg, nsISupports aSubject could be categorized too into nullptr/non-nullptr or potentially the types of the innermost argument. For example, it's common to do something like ToSupports(mWindow) where we explicitly know the type is nsCOMPtr<class nsPIDOMWindowOuter> in the source record plus we have the full sym of F_<T_AudioPlaybackRunnable>_mWindow which we have structured analysis data for.
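As a rough sketch of the faceting table, assume each call result already carries its argument expressions as text and that a map of #define expansions is available. The records, paths, and the TOPIC_QUIT macro below are invented for illustration.

```python
from collections import Counter

# Hypothetical macro expansions used to "pierce" #defines.
DEFINES = {"TOPIC_QUIT": '"quit-application"'}

# Hypothetical call results with per-argument expression text.
CALLS = [
    {"path": "a.cpp", "args": ["nullptr", '"quit-application"', "nullptr"]},
    {"path": "b.cpp", "args": ["ToSupports(mWindow)", "TOPIC_QUIT", "nullptr"]},
    {"path": "c.cpp", "args": ["nullptr", '"xpcom-shutdown"', "nullptr"]},
]


def facet_by_arg(calls, defines, arg_index):
    """Count calls per argument value, expanding known defines, sorted by value."""
    counts = Counter()
    for call in calls:
        if arg_index >= len(call["args"]):
            continue  # argument omitted, for example a default value
        expr = call["args"][arg_index]
        counts[defines.get(expr, expr)] += 1
    return sorted(counts.items())


def facet_nullness(calls, arg_index):
    """Split calls into nullptr / non-nullptr for the given argument."""
    counts = Counter()
    for call in calls:
        is_null = call["args"][arg_index] == "nullptr"
        counts["nullptr" if is_null else "non-nullptr"] += 1
    return dict(counts)


topic_facets = facet_by_arg(CALLS, DEFINES, 1)
subject_facets = facet_nullness(CALLS, 0)
print(topic_facets)
print(subject_facets)
```

Here the macro-based call is grouped with the literal "quit-application" call, which is the behavior the table would need for aTopic.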

This in the sidebar when looking at a method.

We could just do this in the sidebar automatically for context, if people want it.

Follow/search this value (through function calls)

One can imagine clicking on a value in a function call argument list or even a local and then having searchfox excerpt the lines where the variable is referenced in the current function, plus following the value through function calls. This could save a lot of manual traversal time as one could skim the uses and see where the value is used in conditionals or saved to other values.

Representation Thoughts

The net goal here is to encode enough structured info for definitions, and enough of an AST representation of calls for uses, to understand:

  • What token range / which analysis record(s) correspond to each argument.
  • Literals. We currently don't emit anything for strings, although if they involve, for example, operator""_ns (which wraps strings into an nsLiteralString/nsLiteralCString) as in "foo"_ns or u"bar"_ns, we will see the operator usage.
  • (Useful) structure amongst the records/tokens that make up the argument expression.
    • A simple heuristic would be that the rightmost token in the expression token range is probably the end of a traversal, but the operator""_ns example is effectively a postfix operator. (Although we can tell this from the symbol name, _Zli3_nsPKDsy where https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.operator-name explicitly tells us the "li" is for operator "" and so the heuristic could identify this.)
    • It's not clear that understanding the expression's AST is actually preferable for general search over a more soupy/union model, however.
      • If I want to express that I'm interested in observer notifications involving Document instances (ex: observerService->NotifyObservers(ToSupports(aDoc))), then seeing an object of that type anywhere in an arg would be notable/useful. (Likewise if we did any primitive soup propagation via assignments / impact of conditionals and their tests so temporaries wouldn't obscure things.) Also, it seems that a human authoring a query would be much more likely to start from something simple like NotifyObservers arg:Document, which should be enough to find the results they want without having to do extra legwork. However, we could still support something like arg0:Document or argn:0:Document to allow further restriction of the search.
        • As noted in comment 0, a filtering post-pass would be the right strategy for searching, but we could pre-compute the arg-soup and include it on the function/method call use in the crossref to limit our need to filter the analysis records directly except for more specific searches.
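The soup model and the arg:/arg0: filters discussed above can be sketched as follows. This assumes a hypothetical pre-computed "arg_soup" on each call use (one set of type names per argument); the field name, the records, and the helper functions are illustrative only, and only the arg:Type and argN:Type spellings are handled.

```python
import re


def parse_term(term: str):
    """Parse 'arg:Type' or 'argN:Type' into (index or None, type name)."""
    match = re.fullmatch(r"arg(\d*):(\w+)", term)
    if match is None:
        raise ValueError(f"unsupported term: {term!r}")
    index = int(match.group(1)) if match.group(1) else None
    return index, match.group(2)


def call_matches(call: dict, term: str) -> bool:
    """True if the wanted type appears in any arg, or in the given arg only."""
    index, wanted = parse_term(term)
    soups = call["arg_soup"]
    if index is None:
        return any(wanted in soup for soup in soups)
    return index < len(soups) and wanted in soups[index]


def search(calls, term):
    """Filtering post-pass: return locations of calls matching the term."""
    return [call["loc"] for call in calls if call_matches(call, term)]


# Hypothetical NotifyObservers call uses with a type-name soup per argument.
CALLS = [
    {"loc": "a.cpp:10", "arg_soup": [{"Document", "nsISupports"}, set(), set()]},
    {"loc": "b.cpp:20", "arg_soup": [set(), set(), {"Document"}]},
    {"loc": "c.cpp:30", "arg_soup": [{"nsPIDOMWindowOuter"}, set(), set()]},
]

any_arg = search(CALLS, "arg:Document")
first_arg = search(CALLS, "arg0:Document")
print(any_arg, first_arg)
```

The unindexed form is a union over all arguments, so the author of a query does not need to know which parameter carries the value; the indexed form only narrows the same check to one position.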

It seems like the simplest approach for the analysis records is then:

  • Source records for function/method uses will gain an optional "args" property whose value is an array of "SourceRange" values, which is the same type we use for "nestingRange" and looks like "startLine:startCol-endLine:endCol".
    • Each index in the array corresponds to the range for the given argument by ordinal. The array is allowed to end early when default values are involved.
  • Structured records for function/method definitions would gain an "args" property whose value is an array of objects with keys/values:
    • name: The human readable argument name.
    • type: The human-readable version of the argument type as defined, basically just what's in the source. This is not the same as the "pretty" of the "typesym" if a "typesym" is present, because the "typesym" will inherently lose all kinds of qualifiers.
    • typesym: The symbol of the type of the argument. This is handled identically to how we derive the typesym in visitIdentifier currently. It's likely this could change in the future, as right now this is just the mangled MaybeType.getCanonicalType()->getAsTagDecl().
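To illustrate the proposed shapes, here is a sketch that parses the "startLine:startCol-endLine:endCol" form and shows what the two "args" properties might look like. Only the "args" property and its name/type/typesym keys come from the proposal above; the symbol strings, the positions, and the SourceRange class itself are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRange:
    start_line: int
    start_col: int
    end_line: int
    end_col: int

    @classmethod
    def parse(cls, text: str) -> "SourceRange":
        """Parse 'startLine:startCol-endLine:endCol'."""
        start, end = text.split("-")
        start_line, start_col = start.split(":")
        end_line, end_col = end.split(":")
        return cls(int(start_line), int(start_col), int(end_line), int(end_col))


# Hypothetical source record for a call use; one range per argument by ordinal.
# The array may be shorter than the parameter list when defaults are used.
use_record = json.loads(
    '{"sym": "hypothetical_NotifyObservers_sym",'
    ' "args": ["10:23-10:42", "10:44-11:20"]}'
)

# Hypothetical structured record for the definition. The second argument has
# no typesym because a plain "const char*" has no tag declaration.
def_record = {
    "sym": "hypothetical_NotifyObservers_sym",
    "args": [
        {"name": "aSubject", "type": "nsISupports*", "typesym": "T_nsISupports"},
        {"name": "aTopic", "type": "const char*"},
    ],
}

ranges = [SourceRange.parse(text) for text in use_record["args"]]
# Pair each argument range with the parameter it fills, by ordinal.
paired = [(arg["name"], rng) for arg, rng in zip(def_record["args"], ranges)]
print(paired)
```

Joining the use's ranges against the definition's "args" by index is what would let a query map "the aTopic argument" to a concrete token range.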