See https://github.com/mozsearch/mozsearch-mozilla#how-searchfoxorg-stays-up-to-date for more details on parts of this, but the rough concept of searchfox's daily update is:
- Taskcluster's nightly cron job runs searchfox analysis jobs that produce artifacts for each repository of interest.
- Searchfox AWS Lambda jobs spin up indexer machines that download the analysis artifacts for all 4 platforms we support and perform the rest of the indexing. (After recent changes,) the indexing happens on the instance's local disk and the results are saved to an EBS volume. Each indexer is responsible for multiple repositories.
- The indexer detaches its EBS volume, spins up a web-server that mounts the EBS volume, then updates the load-balancer to point at that web-server.
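The hand-off in that last step could be sketched roughly as below. All of the IDs and the target-group ARN are hypothetical placeholders, and the script only prints the aws CLI commands it would run (drop the `run` wrapper's `echo` for a real invocation):

```shell
#!/bin/sh
# Sketch of the indexer -> web-server hand-off. Every ID below is a
# made-up placeholder, not a real searchfox resource.
VOLUME_ID="vol-0123456789abcdef0"      # EBS volume holding the fresh index
WEB_INSTANCE_ID="i-0fedcba9876543210"  # freshly launched web-server instance
TARGET_GROUP_ARN="arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/searchfox/0123456789abcdef"

run() { echo "+ $*"; }   # dry-run wrapper: print the command instead of executing it

handoff() {
    # 1. Detach the index volume from the indexer...
    run aws ec2 detach-volume --volume-id "$VOLUME_ID"
    # ...and attach it to the web-server instance.
    run aws ec2 attach-volume --volume-id "$VOLUME_ID" \
        --instance-id "$WEB_INSTANCE_ID" --device /dev/sdf
    # 2. Point the load balancer's target group at the new web-server.
    run aws elbv2 register-targets --target-group-arn "$TARGET_GROUP_ARN" \
        --targets "Id=$WEB_INSTANCE_ID"
}

handoff
```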
It would be nice to alter the steps to be:
- Taskcluster's nightly cron job runs searchfox analysis jobs that produce artifacts for each repository of interest AND a dependent indexing job that processes all of those artifacts and produces the analysis byproducts as an artifact.
- The webserver (which is always around) either receives a webhook trigger or notices via a periodic cron job that there is a new indexer run for a repository it cares about. It downloads the analysis data into a new directory, checks out the matching searchfox revision and builds its deps, then spins up the webservers for the repository. Once the webservers are up, it updates the reverse proxy [1] to point at the new webservers and nukes the old directory for the repository.
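The per-repository directory swap described above could be made atomic with a per-revision directory plus a symlink flip, so the reverse proxy never sees a half-written tree. This is a hypothetical sketch (the paths, the `current` symlink name, and GNU coreutils' `mv -T` are all assumptions):

```shell
#!/bin/sh
# Hypothetical sketch: each indexer run lands in its own directory under
# the repository root, and a `current` symlink is flipped once the new
# webservers are up. Assumes GNU coreutils for `mv -T`.
swap_index() {
    repo_root="$1"   # e.g. /index/mozilla-central (made-up layout)
    new_rev="$2"     # name of the indexer run just downloaded
    new_dir="$repo_root/$new_rev"
    old_dir=$(readlink "$repo_root/current" 2>/dev/null || true)

    # Flip the symlink atomically: build it aside, then rename over.
    ln -s "$new_dir" "$repo_root/current.tmp"
    mv -T "$repo_root/current.tmp" "$repo_root/current"

    # Nuke the directory for the previous run, if there was one.
    if [ -n "$old_dir" ] && [ "$old_dir" != "$new_dir" ]; then
        rm -rf "$old_dir"
    fi
}
```

Usage would be something like `swap_index /index/mozilla-central 2023-11-04-run`, invoked after the new webservers pass their health checks.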
The benefits to this approach would be:
- Avoid the availability zone mismatch problems of the current architecture. Bug 1598046 is an example of this happening: the indexer starts out in AZ (Availability Zone) 'a' and so the EBS volume is created in that AZ as well, but then we can't start a new webserver in 'a'.
- This would potentially allow for on-demand searchfox indexes of specific try builds or specific older tree states for a repository. Bug 1463888 wanted this for local purposes, but the ideal end state is to not require anything local to happen. Note that there are hg/git logistical issues in that bug that would still need to be addressed.
The main downside to this is that there would be a lot more things that could go wrong on the webserver, especially if each analysis directory could be using a different searchfox version.
We could attempt an incremental approach, however, where the indexing runs first look for a tarball to download from taskcluster artifacts, bypassing the local indexing; if the taskcluster indexing build failed, we fall back to traditional indexing and the existing webserver. The AWS load balancer is capable of more complicated health checks than we're currently using, so we should be able to create a setup where a new-style server and an old-style server coexist until the new-style server is proven reliable.
1: Currently we use nginx, but dynamic reconfiguration seems to require the API module that's only part of its commercial subscription. Something like envoy, which provides dynamic updates via an API, or a fancy rust-based solution, would be nice.
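For illustration, an envoy setup could fetch its backend clusters dynamically from an xDS management server that searchfox would run, so flipping to a new webserver is an API push rather than a config reload. This is a hypothetical, incomplete bootstrap fragment, not a working config:

```yaml
# Hypothetical envoy bootstrap fragment (not complete): the clusters
# (per-repository webserver backends) come from an xDS server over gRPC.
dynamic_resources:
  cds_config:              # dynamic cluster discovery
    api_config_source:
      api_type: GRPC
      transport_api_version: V3
      grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster
static_resources:
  clusters:
    - name: xds_cluster    # the management server searchfox would run
      type: STRICT_DNS
      connect_timeout: 1s
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: 127.0.0.1, port_value: 18000 }
```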