Implement "search-text" searchfox-tool local-only command to talk to the livegrep codesearch server using Tokio-stack https://github.com/hyperium/tonic
Categories: Webtools :: Searchfox, enhancement
Tracking: Not tracked
People: Reporter: asuth; Assigned: asuth
Attachments: 1 file
In order to implement the full searchfox experience in rust, we need to be able to talk gRPC to the livegrep codesearch server. We'll use https://github.com/hyperium/tonic and expose this as a new "search-text" command in searchfox-tool / insta checking. "text-search" would be more natural, but we already have "search-identifiers", and it's arguably more user-friendly for the searchfox-tool `--help` output to have the various searches clustered together. I also expect to add "search-files" soon, but I'm trying to be responsible about the scope of bugs and patches, so that will get its own bug.
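To make the shape of this concrete, here is a minimal sketch of what the tonic client side could look like, assuming bindings generated from livegrep's `livegrep.proto` (package `livegrep`, service `CodeSearch` with a `Search` rpc taking a `Query`). The port and the exact result field names are illustrative assumptions, not taken from the actual patch:

```rust
// Sketch only: module/message names assume tonic-build output for a proto
// `package livegrep;` defining `service CodeSearch { rpc Search(Query)
// returns (CodeSearchResult); }`.
pub mod livegrep {
    tonic::include_proto!("livegrep");
}

use livegrep::code_search_client::CodeSearchClient;
use livegrep::Query;

pub async fn search_text(pattern: &str) -> Result<(), Box<dyn std::error::Error>> {
    // The address/port is illustrative; the real one would come from the
    // searchfox config.
    let mut client = CodeSearchClient::connect("http://127.0.0.1:8082").await?;
    let request = tonic::Request::new(Query {
        line: pattern.to_string(), // regexp matched against file contents
        ..Default::default()
    });
    let response = client.search(request).await?;
    // Field names here (path, line_number, line) are assumptions about the
    // generated SearchResult message.
    for result in &response.get_ref().results {
        println!("{}:{}: {}", result.path, result.line_number, result.line);
    }
    Ok(())
}
```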
Comment 1 • 3 years ago
I'm pursuing this now. I'm struggling a little bit with how to best deal with revision control for livegrep's `config.proto` and `livegrep.proto` files.
Currently, our web-server provisioning looks like this:
- `git clone -b mozsearch-version5 https://github.com/mozsearch/livegrep` inside the VM at `~/livegrep`
- bazel build that (inside the VM)
- `sudo pip3 install grpcio grpcio-tools` (inside the VM)
- directly invoke `python3 -m grpc_tools.protoc` to build the python bindings inside the VM at `~/livegrep-grpc3/`, and then put that on the python path
That all largely works fine for python because of the lazy runtime compilation model, but at the expense of the VS Code IDE (outside of the VM), where we get yellow squiggles under the `grpc`, `livegrep_pb2`, and `livegrep_pb2_grpc` imports because VS Code can't see any of those files and their byproducts. It's also impossible for anyone to use our `codesearch.py` module outside of the VM, but this is not generally a problem because they also can't talk to the livegrep codesearch server outside of the VM.
For our rust `tools/` dir, the compilation model really wants access to the proto files and the bindings. While I do try to only run the cargo build process inside the VM, VS Code with rust-analyzer is generally able to figure out what's going on as long as the rust versions are similar enough inside and outside the VM, without me having pushed any buttons. Also, in theory it would be nice for someone to be able to fully build the crate outside the VM as long as they've checked out the `tools/ipdl_parser` submodule.
General options seem to be:
- Ruled out per the above: stick with the provisioning idiom.
- Contribute a rust crate to livegrep to expose the bindings via a crate. I'm not sure if this is idiomatic or not, and in general it seems like it might disproportionately increase the maintenance burden for livegrep and defeat the point of such a clean API boundary.
- Add livegrep as a git submodule so that we can directly reference the proto files from their origin source tree. For sanity this probably does not want to live under `tools/` but instead at something like `deps/livegrep`. This could potentially simplify provisioning, but I'm definitely not touching that as a first step.
  - In general I've found git submodules to be a pain, but most of that pain has come from high-churn submodules, because `git pull` won't update the submodule checkouts unless you do `git pull --recurse-submodules` or have set the `submodule.recurse` config flag to true. In that case, everything's good; otherwise one needs to do `git submodule update --init`. When people fail to update the submodule, their tree status will look like they have reverted the submodule change, and it's easy to end up committing that change.
  - Our churn on `ipdl_parser` and `livegrep` is so low, and our committer base so small, that I'm not sure any of this is a problem, even if we couldn't just put `git config submodule.recurse true` in our provisioning script or in our installation instructions, which already include `git submodule update --init`.
- Just check the `.proto` files in, flagging their origin and authorship as best possible. This seems messy at best unless the proto files had explicit per-file licensing/etc. as part of upstream.
So my tentative plan here is to add a new livegrep git submodule but not change the provisioning process at this time; I will file a follow-up bug to track that idea.
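As a rough illustration of why the submodule helps the rust build: with the protos reachable at a stable path, a `build.rs` along these lines could generate the bindings at compile time. The `deps/livegrep` location and the `src/proto/` layout inside the livegrep tree are assumptions for this sketch, not verified paths, and the API shown is the tonic-build `configure()/compile()` surface from the 0.x series:

```rust
// build.rs — sketch; adjust paths to wherever the livegrep submodule and
// its proto files actually live.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // we only need the client side
        .compile(
            &["../deps/livegrep/src/proto/livegrep.proto"],
            &["../deps/livegrep/src/proto"],
        )?;
    Ok(())
}
```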
Comment 2 • 3 years ago
I've probably implemented this, but am running into a testing logistical issue related to the codesearch daemon not being active when we run the local index checks. Deferring the check to the web server check phase is not appropriate, because we do want to be using the `local_index` server rather than the `remote_server`, since the pipeline-server will be running in `local_index` mode itself.
My tentative plan for the indexer phase, in terms of `mkindex.sh`, is to:
- assume/require that the web servers aren't running and that therefore no one is running any codesearch instances; maybe even have it invoke `killall`/`pkill` directly.
- start up codesearch itself around its call to `check-index.sh` and clean it up after (on success; on failure the potential killall above might handle it at the (re)start of indexing, or maybe we do an error trap or whatnot). See the sketch after this list for the general shape.
This might help approximate a setup where systemd or some other daemon driver process is responsible for starting codesearch, noticing if it crashes, and restarting it. Said daemon driver might also then be able to kick/restart the pipeline-server as well. And then we avoid router.py or pipeline-server.rs having to be in charge of the codesearch process. We haven't had router.py fall over in a long long time now, so there's been no need for such changes, but as I move to expand searchfox's capabilities, there's a real chance for occasional OOM-related breakages and it would be nice to have automated self-healing for this. I feel like I've looked into tools to help with this in the past so I'll look at digging up my notes for that.
Comment 3 • 3 years ago
It appears my notes about process spawning/keep-aliving were at https://bugzilla.mozilla.org/show_bug.cgi?id=1630646#c6 where I determined systemd is probably the best option available.
Some relevant crates that jump out immediately in this space are:
- sd-notify: minimal MIT/Apache crate that exposes the sd_notify API for updating systemd about the status of a service.
- systemd: LGPL-2.1+ (with gcc-exception-2.0); full libsystemd integration.
- killjoy: GPL-3+ standalone tool that is a dbus glue layer (and so doesn't actually need to be linked to any searchfox code).
It's not clear we'd need any of these, though, as external heartbeat check scripts are likely more reliable and would avoid platform-coupling/additional deps.
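For scale, sd-notify's integration surface really is tiny; under a unit configured with `Type=notify`, readiness reporting would look roughly like this sketch (shown only for reference, not something this bug commits to adopting):

```rust
use sd_notify::NotifyState;

fn main() {
    // ... bring up the gRPC server / load the index first ...

    // Tell systemd we're ready; `false` leaves NOTIFY_SOCKET in the
    // environment. This is a harmless no-op when not run under systemd.
    let _ = sd_notify::notify(false, &[NotifyState::Ready]);

    // If the unit also set WatchdogSec=, periodic keep-alives would use:
    // let _ = sd_notify::notify(false, &[NotifyState::Watchdog]);
}
```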
Comment 4 • 3 years ago
For now, to deal with the codesearch startup issue for tests, my plan is:
- Add a `__main__` handler to `codesearch.py` so that it can be directly run to daemonize a codesearch instance, and also verify that the server has started up and is responsive by sending an `InfoRequest` message over the `Info` rpc method. This should let us avoid depending on the sleep(5), which it seems would needlessly slow the dev loop. (A sketch of the readiness probe follows below.)
- I'll also add some logic to kill any existing codesearch for the given port, and a specific "shutdown" variant that just does the killing without the startup.
Comment 5 • 3 years ago
search-text impl commit is https://github.com/mozsearch/mozsearch/pull/505/commits/d4a2f02191a248d03021a10d4a040c5d4a811307
Updated • 3 years ago