Figure out how to semantically represent function/method argument lists
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(Not tracked)
People
(Reporter: asuth, Unassigned)
Currently searchfox doesn't understand argument lists for function calls, which makes it hard to semantically do a search like "show me all the nsObserverService notifications for 'quit-application'". One can try to approximate it with a regexp, but that will miss cases where defines are used or cases that are multi-line.
Important notes:
- Argument list "target" records do have a "contextsym" of the function/method they are an argument for!
- The argument variable names do get "no_crossref" "source" records for both the def (in the header) and the decl (in the cpp file), but the "sym"s are different for both because the location portion of the hash, mangleLocation(Decl->getLocation()), inherently differs, but we could correct this.
- These do not have "contextsym"s because source records don't have those.
Use Case Brainstorms
What we do here potentially depends on the use-cases, so I'm going to edit this bug on an ongoing basis as I come up with new ideas.
I think the main question is really how far we push any information into the cross-referencing database in terms of pre-computation versus just doing things on-the-fly as a filtering post-pass. Note that we already know there are cases like the observer service where we want a semantic linkage mechanism like I propose in bug 1727789 which would inherently build on the mechanism discussed by this bug.
In particular, I think it makes sense to consider our analysis files as a first-class source of data available to us at query time. While our current gzip compression mechanism precludes random access to the files, it's entirely possible for us to use something like https://github.com/lz4/lz4/blob/dev/examples/dictionaryRandomAccess.md to allow efficient random-access seeking as an incremental enhancement.
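To make the random-access idea concrete, here is a minimal sketch of block-wise compression with an offset index. It deliberately swaps in Python's standard zlib for LZ4 and does not use a shared dictionary, so it only illustrates the seeking pattern, not the linked LZ4 example. The function names, the block size, and the sample records are all assumptions made for illustration.

```python
import zlib

BLOCK_LINES = 2  # hypothetical block size; a real file would use far larger blocks


def compress_blocks(lines, block_lines=BLOCK_LINES):
    """Compress lines in independent blocks and record (offset, length) per block."""
    blob = bytearray()
    index = []
    for i in range(0, len(lines), block_lines):
        chunk = zlib.compress("\n".join(lines[i:i + block_lines]).encode("utf-8"))
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_line(blob, index, line_no, block_lines=BLOCK_LINES):
    """Fetch one 0-based line by decompressing only the block that contains it."""
    offset, length = index[line_no // block_lines]
    block = zlib.decompress(blob[offset:offset + length]).decode("utf-8")
    return block.split("\n")[line_no % block_lines]


# Invented stand-ins for analysis records, one JSON object per line.
sample_lines = ['{"rec": 0}', '{"rec": 1}', '{"rec": 2}', '{"rec": 3}', '{"rec": 4}']
sample_blob, sample_index = compress_blocks(sample_lines)
```

The trade-off is the usual one: smaller blocks mean cheaper seeks but a worse compression ratio, which is the gap a shared dictionary is meant to narrow.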
Grouping / Faceting Search Results of method calls
The search for nsIObserverService::notifyObservers has 313 results right now. These could be rapidly filtered by presenting a table at the top of the results that could list all of the aTopic strings sorted alphabetically, including piercing #defines. The first arg, nsISupports aSubject, could be categorized too into nullptr/non-nullptr or potentially the types of the innermost argument. For example, it's common to do something like ToSupports(mWindow) where we explicitly know the type is nsCOMPtr<class nsPIDOMWindowOuter> in the source record plus we have the full sym of F_<T_AudioPlaybackRunnable>_mWindow which we have structured analysis data for.
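As a rough sketch of the faceting table, assume each call site has already been reduced to a record holding its resolved argument values (with any #define already pierced). The record shape, the field names "path", "line" and "args", and the sample data are hypothetical and do not reflect anything searchfox emits today.

```python
from collections import Counter

# Hypothetical pre-resolved call sites for NotifyObservers(aSubject, aTopic, aData).
sample_calls = [
    {"path": "dom/a.cpp", "line": 10, "args": ["nullptr", "quit-application", "nullptr"]},
    {"path": "dom/b.cpp", "line": 22, "args": ["ToSupports(mWindow)", "audio-playback", "nullptr"]},
    {"path": "dom/c.cpp", "line": 35, "args": ["nullptr", "quit-application", "nullptr"]},
]


def facet_by_arg(calls, index):
    """Count distinct values of the argument at `index`, sorted alphabetically."""
    counts = Counter(c["args"][index] for c in calls if index < len(c["args"]))
    return sorted(counts.items())


def facet_subject(calls):
    """Bucket the first argument into nullptr / non-nullptr."""
    counts = Counter(
        "nullptr" if c["args"][0] == "nullptr" else "non-nullptr" for c in calls
    )
    return dict(counts)
```

Here facet_by_arg(sample_calls, 1) would drive the aTopic table, and facet_subject(sample_calls) the nullptr/non-nullptr split for aSubject.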
This in the sidebar when looking at a method.
We just do this in the sidebar automatically for context if people want it.
Follow/search this value (through function calls)
One can imagine clicking on a value in a function call argument list or even a local and then having searchfox excerpt the lines where the variable is referenced in the current function, plus following the value through function calls. This could save a lot of manual traversal time as one could skim the uses and see where the value is used in conditionals or saved to other values.
Comment 1•3 years ago
Representation Thoughts
The net goal here is to encode enough structured info for definitions, and enough of an AST representation of calls for uses, to understand:
- What token range / which analysis record(s) correspond to each argument.
- Literals. We currently don't emit anything for strings, although if they involve, for example, operator""_ns (which wraps strings into an nsLiteralString/nsLiteralCString) as in "foo"_ns or u"bar"_ns, we will see the operator usage.
- (Useful) structure amongst the records/tokens that make up the argument expression.
  - A simple heuristic would be that the rightmost token in the expression token range is probably the end of a traversal, but the operator""_ns example is effectively a postfix operator. (Although we can tell this from the symbol name, _Zli3_nsPKDsy, where https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.operator-name explicitly tells us the "li" is for operator "", and so the heuristic could identify this.)
  - It's not clear, however, that understanding the expression's AST is actually preferable for general search over a more soupy/union model.
    - If I want to express that I'm interested in observer notifications involving Document instances (ex: observerService->NotifyObservers(ToSupports(aDoc))), then seeing an object of that type anywhere in an arg would be notable/useful. (Likewise if we did any primitive soup propagation via assignments / impact of conditionals and their tests so temporaries wouldn't obscure things.) Also, it seems that a human authoring a query would be much more likely to start from something simple like NotifyObservers arg:Document, which should be enough to find the results they want without having to do extra legwork. However, we could still support something like arg0:Document or argn:0:Document to allow further restriction of the search.
    - As noted in comment 0, a filtering post-pass would be the right strategy for searching, but we could pre-compute the arg-soup and include it on the function/method call use in the crossref to limit our need to filter the analysis records directly except for more specific searches.
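The arg:Document, arg0:Document and argn:0:Document query forms could be matched against a pre-computed arg-soup along these lines. This is only a sketch under assumptions: the soup is modelled as one set of type names per argument, and the parser, the function names, and the sample soup are hypothetical rather than an existing searchfox query syntax.

```python
import re

# Accepts "arg:Type", "arg<N>:Type" and "argn:<N>:Type".
ARG_TERM = re.compile(r"^arg(?:(\d+)|n:(\d+))?:(.+)$")


def parse_arg_term(term):
    """Return (index or None, type name) for one query term."""
    m = ARG_TERM.match(term)
    if m is None:
        raise ValueError(f"not an arg term: {term!r}")
    index = m.group(1) or m.group(2)
    return (int(index) if index is not None else None, m.group(3))


def call_matches(arg_soup, term):
    """arg_soup holds one set of type names per argument, in call order."""
    index, wanted = parse_arg_term(term)
    if index is None:
        # Unrestricted form: the type may appear anywhere in any argument.
        return any(wanted in types for types in arg_soup)
    return index < len(arg_soup) and wanted in arg_soup[index]


# Invented soup for NotifyObservers(ToSupports(aDoc), "some-topic", nullptr).
sample_soup = [{"Document", "nsISupports"}, {"char"}, set()]
```

Keeping the soup as plain sets per argument means the unrestricted arg: form is a union test, while the indexed forms only narrow which set is consulted.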
It seems like the simplest approach for the analysis records is then:
- Source records for function/method uses will gain an optional "args" property whose value is an array of "SourceRange" values, which is the same type we use for "nestingRange" and looks like "startLine:startCol-endLine:endCol".
- Each index in the array corresponds to the range for the given argument by ordinal. The array is allowed to end early when default values are involved.
- Structured records for function/method definitions would gain an "args" property whose value is an array of objects with keys/values:
  - name: The human-readable argument name.
  - type: The human-readable version of the argument type as defined, like basically just what's in the source. This is not the same as the "pretty" of the "typesym" if a "typesym" is present, because the "typesym" will inherently lose all kinds of qualifiers.
  - typesym: The symbol of the type of the argument. This is handled identically to how we derive the typesym in visitIdentifier currently. It's likely this could change in the future, as right now this is just the mangled MaybeType.getCanonicalType()->getAsTagDecl().
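A small sketch of how the two proposed "args" properties might be consumed together: parse each "startLine:startCol-endLine:endCol" range on the use, then pair it by ordinal with the definition's argument metadata. Only the "args", "name", "type" and "typesym" keys and the range format come from the proposal above; the sample function, its symbols, and the helper names are invented for illustration.

```python
def parse_source_range(text):
    """Parse "startLine:startCol-endLine:endCol" into a tuple of four ints."""
    start, end = text.split("-")
    start_line, start_col = (int(p) for p in start.split(":"))
    end_line, end_col = (int(p) for p in end.split(":"))
    return (start_line, start_col, end_line, end_col)


def pair_args(use_record, def_record):
    """Pair each argument range on a use with the definition's argument name.

    zip() stops at the shorter list, which covers the case where the use's
    array ends early because trailing arguments took their default values.
    """
    ranges = use_record.get("args", [])
    return [
        (arg["name"], parse_source_range(rng))
        for arg, rng in zip(def_record["args"], ranges)
    ]


# Hypothetical records for an imaginary Foo::Bar(aDoc, aTopic, aFlags = 0).
sample_def = {
    "args": [
        {"name": "aDoc", "type": "Document*", "typesym": "T_Document"},
        {"name": "aTopic", "type": "const char*"},
        {"name": "aFlags", "type": "uint32_t"},
    ]
}
sample_use = {"args": ["12:10-12:14", "12:16-13:4"]}
```

With these samples, pair_args(sample_use, sample_def) yields entries for aDoc and aTopic only, since aFlags was defaulted at the call site.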