Open Bug 1749381 Opened 3 years ago

Consider augmenting target records and thereby crossref data with timestamps of the most recent contributing changes to allow sorting/filtering results by recency or grouping by release

Categories

(Webtools :: Searchfox, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)


Details

One of the pipeline filters I brainstormed in https://bugzilla.mozilla.org/show_bug.cgi?id=1707282#c0 was a predicate like blame --since=fx80. While this could be implemented as a filter that maps records to lines and then uses those lines to look up the blame information, it is also something that could be precomputed. Specifically, either during crossref or prior to it (for greater parallelism, because crossref.rs is currently single-threaded), we could effectively apply the most "current"[1] blame record to the lines that a target record spans. In the case of functions/methods, the nestingRange from the source record could be used to reflect the most recent change to the method, provided we also have the indexer emit the nestingRange (or a similar concept) on target records.
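A minimal sketch of that precomputation, assuming per-file blame is already available as one entry per line carrying a commit timestamp. BlameLine, TargetRecord and most_recent_change are hypothetical names, not existing searchfox types; the start/end span stands in for a target record's line or, for methods, a nestingRange-like extent emitted by the indexer.

```rust
use std::collections::HashMap;

/// Hypothetical per-line blame entry: the revision that last touched a line
/// and that revision's commit timestamp (seconds since the Unix epoch).
#[derive(Clone, Debug, PartialEq)]
struct BlameLine {
    rev: String,
    timestamp: i64,
}

/// Hypothetical target record covering lines `start..=end` (1-based) of `path`.
/// For functions/methods the span would come from a nestingRange-like extent.
struct TargetRecord {
    path: String,
    start: usize,
    end: usize,
}

/// Returns the most recent blame entry among the lines the record spans,
/// or None if the file is unknown or the span falls outside it.
fn most_recent_change<'a>(
    blame: &'a HashMap<String, Vec<BlameLine>>,
    rec: &TargetRecord,
) -> Option<&'a BlameLine> {
    let lines = blame.get(&rec.path)?;
    // Convert the 1-based inclusive span into a 0-based slice range, clamped to the file.
    let lo = rec.start.saturating_sub(1);
    let hi = rec.end.min(lines.len());
    if lo >= hi {
        return None;
    }
    lines[lo..hi].iter().max_by_key(|b| b.timestamp)
}

fn main() {
    let mut blame = HashMap::new();
    blame.insert(
        "dom/Foo.cpp".to_string(),
        vec![
            BlameLine { rev: "aaa".into(), timestamp: 100 },
            BlameLine { rev: "bbb".into(), timestamp: 300 },
            BlameLine { rev: "ccc".into(), timestamp: 200 },
        ],
    );
    let rec = TargetRecord { path: "dom/Foo.cpp".into(), start: 1, end: 3 };
    let newest = most_recent_change(&blame, &rec).unwrap();
    println!("{} {}", newest.rev, newest.timestamp); // prints "bbb 300"
}
```

Because each record only needs its own file's blame, this step could run per file in parallel before the single-threaded crossref merge.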

If we had this information available in the crossref database, the search UI could easily do things like the following without affecting query performance, beyond the marginal cost of shipping around the extra data (which would be obtained via sequential I/O):

  • Sort uses (calls) by when they were added.
  • Leave the sort unchanged, but show small Firefox version-number badges next to uses that changed in the last N versions.
  • Allow for filtering results by recency, such as limiting results to things that changed in the last N releases.
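Once hits carry a timestamp, sorting, badging and "last N releases" filtering become cheap in-memory operations. The sketch below assumes a hypothetical table mapping each Firefox version to the timestamp at which its Nightly cycle began, so a change is attributed to the latest cycle that had started by its timestamp; Hit, release_for and recent_hits are illustrative names only.

```rust
/// Hypothetical crossref hit augmented with the timestamp of its most recent
/// contributing change.
#[derive(Debug, Clone)]
struct Hit {
    path: String,
    line: usize,
    timestamp: i64,
}

/// Maps a timestamp to a release using a hypothetical table of
/// (Firefox version, Nightly cycle start timestamp), sorted ascending.
fn release_for(releases: &[(u32, i64)], ts: i64) -> Option<u32> {
    // Count the cycles that had begun by `ts`; the last of them is the release.
    let idx = releases.partition_point(|&(_, start)| start <= ts);
    if idx == 0 {
        None
    } else {
        Some(releases[idx - 1].0)
    }
}

/// Keeps hits whose latest change landed in the last `n` releases
/// (counting `current` itself), sorted newest first.
fn recent_hits(hits: &[Hit], releases: &[(u32, i64)], current: u32, n: u32) -> Vec<Hit> {
    let cutoff = (current + 1).saturating_sub(n);
    let mut out: Vec<Hit> = hits
        .iter()
        .filter(|h| release_for(releases, h.timestamp).map_or(false, |v| v >= cutoff))
        .cloned()
        .collect();
    out.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
    out
}

fn main() {
    let releases = [(80, 1_000), (81, 2_000), (82, 3_000)];
    let hits = vec![
        Hit { path: "a.cpp".into(), line: 10, timestamp: 1_500 },
        Hit { path: "b.cpp".into(), line: 20, timestamp: 3_500 },
        Hit { path: "c.cpp".into(), line: 30, timestamp: 2_500 },
    ];
    // Prints "b.cpp:20 [fx82]" then "c.cpp:30 [fx81]"; the fx80 hit is filtered out.
    for h in recent_hits(&hits, &releases, 82, 2) {
        // The badge is just the release the hit's latest change landed in.
        println!("{}:{} [fx{}]", h.path, h.line, release_for(&releases, h.timestamp).unwrap());
    }
}
```

Badging without re-sorting would simply call release_for on each hit and leave the original order alone.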

Note that what I'm proposing above would be entirely separate from supporting this in fulltext search. That would presumably still require separate lookups, and in fact I think we'd want to quantify the latency cost of those lookups, especially since I'm interested in whether the show-html pipeline stage could be fast enough on the server for us to provide syntax-highlighted results via excerpting in the first place. If we have the HTML data, we could then have the blame strip metadata as well. (Currently the metadata is the revision rather than a timestamp, but we could bake the timestamp into the HTML and/or excerpt the relevant contents of a potential future (JS) map.)

1: Currently, searchfox attempts to skip marked revisions for blame purposes in output-file, but this is believed to have a non-trivial cost. It's not clear we'd actually want to do the skipping in this case, but if we did, we'd probably want to compute it only once and then store the result for the benefit of output-file. Transient disk space during indexing is not currently a concern.
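If the skipping were done here, the compute-once-and-store idea might look roughly like the following. This is a deliberate simplification: it assumes each line's revision history (newest first) is already known, which sidesteps the hard part of real blame skipping (mapping lines across the skipped revision's diff). effective_blame is a hypothetical name, not an existing searchfox function.

```rust
use std::collections::HashSet;

/// Hypothetical: `histories[i]` lists the revisions that touched line i, newest
/// first (as if obtained by repeatedly following blame into parent revisions).
/// Returns, per line, the newest revision not in `skip`; if every entry is
/// skipped, it keeps ordinary blame (the newest revision). Empty history -> None.
fn effective_blame<'a>(histories: &'a [Vec<String>], skip: &HashSet<String>) -> Vec<Option<&'a str>> {
    histories
        .iter()
        .map(|hist| {
            hist.iter()
                .find(|rev| !skip.contains(*rev))
                .or_else(|| hist.first())
                .map(|s| s.as_str())
        })
        .collect()
}

fn main() {
    let histories = vec![
        vec!["reformat".to_string(), "feature".to_string()],
        vec!["bugfix".to_string(), "reformat".to_string()],
        vec!["reformat".to_string()],
    ];
    let mut skip = HashSet::new();
    skip.insert("reformat".to_string());
    // Computed once during indexing; the result could then be stored for output-file.
    let blame = effective_blame(&histories, &skip);
    println!("{:?}", blame); // prints [Some("feature"), Some("bugfix"), Some("reformat")]
}
```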
