Add automated sanity checks on generated searchfox index
Categories
(Webtools :: Searchfox, enhancement)
Tracking
(Not tracked)
People
(Reporter: kats, Assigned: asuth)
It would be great if we could define a set of checks that get run after the indexer is done indexing, and that could emit warnings or fail the indexer entirely depending on which checks fail. Checks would include things like "does rust code have working semantic analysis data".
The intent here is to catch regressions like bug 1593833 as soon as they happen, when we can look at the last day's worth of changes in m-c and hopefully identify what caused it. Otherwise we might not notice for a few days, and then it's not obvious how long it's been broken, and fixing it gets harder. Plus, of course, we can reject deploying an index that is too broken.
Comment 1 • Assignee (asuth) • 5 years ago
Agreed. I suspect you'll have already beaten me to an approach, but maybe we could do something like adding a check script to each tree definition that would either call out to helper scripts or bash functions that can abstract away the details of the checks. By abstracting them away, we could potentially also re-use the checks when the web-server starts up as an additional end-to-end test against the web server. (Now that we also configure nginx for caching, it could also be abused to prime common searches.)
The check primitives would basically be:
- There's an analysis file for file FOO.EXT and it has a symbol definition for T_Foo.
  - Indexing check would verify the analysis file exists and that a jq filter locates the symbol definition.
  - Indexing check would verify the output file exists and grep that data-symbols="T_Foo exists in the output.
  - Runtime check would fetch the output file from the server and grep that data-symbols="T_Foo exists in the output.
  - Runtime check would run a symbol search on T_Foo and use jq to verify that the definition is present in the search results.
  - Runtime check would run a fully qualified path constraining search for the symbol (which limits us to fulltext only) and verify that the definition shows up there as well, either via naive substring check or by cross-correlating the result of the jq output from the previous check.
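The two indexing-time primitives above could be sketched roughly as follows. This is only an illustration, written in Python rather than the shell/jq/grep form discussed here. The record layout (one JSON object per line with "target", "kind" and "sym" keys), the function names and the sample data are all assumptions made for the example, not the actual mozsearch check implementation.

```python
import json


def analysis_defines_symbol(analysis_text, symbol):
    """Return True if any record in the analysis text defines `symbol`.

    Assumes newline-delimited JSON where a definition record carries
    "target", "kind": "def" and "sym" keys (an assumption for this sketch).
    """
    for line in analysis_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the whole check
        if not isinstance(record, dict):
            continue
        if "target" not in record or record.get("kind") != "def":
            continue
        # "sym" may hold several comma-separated symbols (assumed).
        if symbol in str(record.get("sym", "")).split(","):
            return True
    return False


def html_mentions_symbol(html_text, symbol):
    """Mirror the grep check: look for data-symbols="SYMBOL as a substring."""
    return 'data-symbols="' + symbol in html_text


# Hypothetical sample data standing in for FOO.EXT's analysis and output files.
SAMPLE_ANALYSIS = "\n".join([
    '{"loc": "00001:4-9", "source": 1, "syntax": "def,type", "pretty": "type Foo", "sym": "T_Foo"}',
    '{"loc": "00001:4-9", "target": 1, "kind": "def", "pretty": "Foo", "sym": "T_Foo"}',
    '{"loc": "00007:2-7", "target": 1, "kind": "use", "pretty": "Foo", "sym": "T_Foo"}',
])
SAMPLE_HTML = '<span data-symbols="T_Foo">Foo</span>'
```

A runtime variant would presumably reuse html_mentions_symbol on a page fetched from the web server instead of a file read from the index directory.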
And so our goal would be to pick a bunch of symbols spread over the different analysis engines that aren't likely to churn. This could also be accomplished via a JSON file or TOML file or something instead of a helper script, but that's not consistent with precedent and shell scripts certainly allow calling out to a tool that consumes such configs anyways.
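For comparison, the config-file alternative mentioned above might look something like the sketch below, paired with the warn-or-fail behaviour asked for in the bug description. The JSON schema ("symbol", "engine", "severity"), the T_Bar symbol and the run_checks helper are hypothetical, invented for this illustration. Python is used only for convenience.

```python
import json

# Hypothetical config: one entry per symbol that is unlikely to churn.
CHECKS_JSON = """
[
  {"symbol": "T_Foo", "engine": "rust", "severity": "fail"},
  {"symbol": "T_Bar", "engine": "cpp", "severity": "warn"}
]
"""


def run_checks(config_text, symbol_is_present):
    """Run each configured check and return (exit_code, messages).

    `symbol_is_present` is a caller-supplied callable taking a symbol name
    and returning True or False; how it looks the symbol up is out of scope.
    """
    exit_code = 0
    messages = []
    for check in json.loads(config_text):
        if symbol_is_present(check["symbol"]):
            continue
        severity = check.get("severity", "fail")
        messages.append(
            f"{severity}: {check['engine']} symbol {check['symbol']} not found"
        )
        if severity == "fail":
            exit_code = 1  # a hard failure should block deploying the index
    return exit_code, messages
```

A shell check script could still call a tool like this and propagate its exit code, which keeps the per-tree script convention intact.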
Relatedly, maybe we should also take my "fancy" branch change at https://github.com/asutherland/mozsearch/commit/8e84f5dedef36d45c8de83583b96a19c9c9233a7#diff-315f016f5d8cf09394064584d7b7f731 which exposes the analysis files and repo-files and objdir-files which would save people from needing to log into the server to check analysis results. I'm not sure it would ever be particularly useful in a functionality check, though.
Comment 2 • Reporter (kats) • 5 years ago
I hadn't thought about implementation detail but what you're describing makes a ton of sense. And yeah that fancy branch change sounds like a good one to land independently of everything else, I can think of a few instances where that would have been useful to have :)
Comment 3 • Assignee (asuth) • 4 years ago
Going to do a quick first pass at this to validate my fix for bug 1593833.
Comment 4 • Assignee (asuth) • 4 years ago
Initial checks and infrastructure landed as part of the fixes for bug 1593833 via https://github.com/mozsearch/mozsearch/pull/299 and https://github.com/mozsearch/mozsearch-mozilla/pull/87. Right now there are only rust checks in https://github.com/mozsearch/mozsearch-mozilla/blob/master/mozilla-central/check but I filed bug 1640723 for next steps.