Implement "search-files" searchfox-tool command to approximate router.py's `search_files` grep-based implementation
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(Not tracked)
People
(Reporter: asuth, Assigned: asuth)
Attachments
(1 file)
For the new Rust "query" mechanism to reach parity with the "search" endpoint currently implemented by router.py, we need path searching.
router.py currently accomplishes this by shelling out to grep after transforming searchfox's glob syntax into a regexp. In that glob syntax, `**` is a wildcard that allows `/`, `*` is a wildcard that forbids `/`, `?` becomes regexp `.`, and normal dots are escaped.
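The glob translation described above can be sketched roughly as follows. This is an illustration of the stated rules only, not router.py's actual code: the names `glob_to_regex` and `search_files` are hypothetical, and the unanchored, case-insensitive matching is an assumption chosen to resemble a plain grep over a list of paths.

```python
import re


def glob_to_regex(glob: str) -> str:
    """Translate a searchfox-style path glob into regexp source (sketch)."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            out.append(".*")  # `**` may cross `/` boundaries
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")  # `*` stays within one path segment
            i += 1
        elif glob[i] == "?":
            out.append(".")  # `?` becomes regexp `.`
            i += 1
        else:
            # Everything else is literal, so a normal dot becomes `\.`
            out.append(re.escape(glob[i]))
            i += 1
    return "".join(out)


def search_files(paths, glob):
    """Return the paths matching the glob, in process rather than via grep.

    Assumption: the match is unanchored and case-insensitive.
    """
    pattern = re.compile(glob_to_regex(glob), re.IGNORECASE)
    return [p for p in paths if pattern.search(p)]


if __name__ == "__main__":
    example_paths = [
        "dom/base/nsDocument.cpp",
        "dom/base/test/test_doc.html",
        "js/src/jit/Ion.cpp",
    ]
    print(search_files(example_paths, "dom/**.cpp"))
```

With these rules, `dom/*.cpp` does not match `dom/base/nsDocument.cpp` because `*` cannot cross the `/` after `base`, whereas `dom/**.cpp` does.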
I believe livegrep/codesearch actually returns files that match the path pattern. (The types imply this, but there aren't a lot of docs and I haven't dug into it.) So the big question is whether to try to use that, if suitable.
The answer for now is "no". Rationale:
- The pipeline model that query uses assumes nodes have only a single output, with provision for junctions later on, and our data types are largely homogeneous at least until we get to the "compile results" phase, which is where we plan to take in the list of matching files, the list of matching semantic lookups, and the list of full-text lines and merge them into the traditional searchfox heterogeneous result set. Having search-files be its own standalone operation avoids needing to add support for multiple outputs from a pipeline node or complicating things with heterogeneous result sets earlier on.
- The core "grep" happening here is trivial to implement and mechanically not a lot of work; additionally, we can easily avoid shelling out to grep, with its process launch overhead, and can pre-cache stuff, etc.
- It's quite likely that we could end up wanting to filter files based on metadata, like whether they are test files and, for test files, whether they are enabled/disabled/etc. These extra axes to filter on benefit from having a dedicated command rather than a complicated data flow where we first run a path search and then pipe that into a filtering stage which has to first do lookups, especially when it might make sense to use the filtering to define the initial set of paths we then regex against (not that the regex work is a big concern).
- If we wanted to swap out the fulltext search engine for some reason, this makes it less of a hassle to experiment with that. (Having said that, I think we have zero interest in doing this.)
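To make the rationale above more concrete, here is a minimal sketch of a standalone file search that keeps a pre-cached file list and applies metadata filters before the path regexp. Everything here is hypothetical: `FileInfo`, `FileIndex`, and the `is_test` / `test_enabled` fields are illustrative stand-ins for whatever metadata might eventually be available, not the actual searchfox-tool data model.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical per-file record; the metadata fields are assumptions."""
    path: str
    is_test: bool = False
    test_enabled: bool = True


class FileIndex:
    """Holds the file list in memory so no process launch is needed per query."""

    def __init__(self, files: Iterable[FileInfo]):
        self._files = list(files)  # loaded once, reused across searches

    def search(
        self,
        path_regex: str,
        is_test: Optional[bool] = None,
        test_enabled: Optional[bool] = None,
    ) -> List[str]:
        # Narrow by metadata first; this defines the initial set of paths.
        candidates = self._files
        if is_test is not None:
            candidates = [f for f in candidates if f.is_test == is_test]
        if test_enabled is not None:
            candidates = [f for f in candidates if f.test_enabled == test_enabled]
        # Then run the path regexp over whatever remains.
        pattern = re.compile(path_regex, re.IGNORECASE)
        return [f.path for f in candidates if pattern.search(f.path)]
```

Because the command produces a single homogeneous list of paths, it fits the single-output node assumption, and the result can be merged with the other result types later in the "compile results" phase.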
Comment 1 • 3 years ago (Assignee)
This is now implemented in the graphviz-serve branch as https://github.com/asutherland/mozsearch/commit/d22ef4d60897a27ffb3b0902bd488babb1ddea57 and should be landing by the end of the weekend as part of a larger stack.
Comment 2 • 2 years ago (Assignee)
Updated • 2 years ago (Assignee)