Closed Bug 1463888 Opened 6 years ago Closed 4 years ago

Support processing searchfox indexes created by taskcluster try builds for local serving

Categories: Webtools :: Searchfox, enhancement
Type: enhancement
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: asuth; Assigned: kats

This is basically just bug 1425597 (use the taskcluster searchfox indices) plus some additional support to deal with the try server and the fact that try pushes will be behind the current mozilla-central tip (which :kats is already proposing dealing with in https://bugzilla.mozilla.org/show_bug.cgi?id=1425597#c5).

The main complication is the standard mozilla-central hg/git situation[1].  As I try to do this manually-ish right now, it's taking several hours for git-cinnabar to pull the revision I want from try, since it's dealing with a gecko-dev repo without pre-existing graft metadata.  It sounds like git-cinnabar 0.5's ability to sync metadata might help bring some kind of closure to the hg/git situation soon.

Note that I'm not proposing we deal with the hosting side of things at this time.  I think being able to point people at the searchfox repo and tell them how to get vagrant hosting searchfox against their specific try build with less than an hour of processing time would be a sufficient first step.  It would be future work to have taskcluster also run the analysis processing step and then build on the "one-click loaner"/"create interactive task" infrastructure so that the user doesn't have to do anything locally other than ssh into the VM for keepalive purposes.


Relatedly, some things I noticed that might be sharp edges we could file down:
- gecko-dev.tar and gecko-blame.tar could probably benefit from being pre-compressed rather than depending on the server to do on-the-fly gzip encoding (if that even happens?).
- It could make sense to invert the logic around KEEP_WORKING in indexer-setup.sh so that automation would pass CLEAN_WORKING=1 instead of the script nuking the working directory's contents by default.  Alternately, the default could vary based on apparent interactivity (tty versus pipe, plus shell flag checks); a rough sketch of that follows below.  This avoids constantly having to re-download all the files and re-compute cinnabar metadata due to small oversights.
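For illustration, a minimal sketch of what an interactivity-based default might look like in indexer-setup.sh.  This is an assumption about how it could be wired up, not the current script: only KEEP_WORKING is an existing variable, and WORKING stands in for whatever the script calls its working directory.

```
# Hypothetical sketch: keep the working directory by default for interactive
# runs, and start from a clean slate by default under automation (no tty).
if [ -t 1 ]; then
  : "${KEEP_WORKING:=1}"   # stdout is a terminal; assume a developer at the keyboard
else
  : "${KEEP_WORKING:=0}"   # stdout is a pipe/file; assume automation
fi

if [ "$KEEP_WORKING" != "1" ]; then
  rm -rf "${WORKING:?}"/*   # WORKING is assumed to be set earlier in the script
fi
```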

1: :kats' thread at https://groups.google.com/d/msg/mozilla.dev.platform/kqMX4H6Iw5M/ehtXT_MeAgAJ and :myk's updated doc at https://wiki.mozilla.org/GitHub/Gecko_Repositories cover the situation pretty well.
Depends on: 1464297
My naive attempt to pull try revisions turned out poorly.  I filed https://github.com/glandium/git-cinnabar/issues/172 in the process, so now confused people like myself will get faster failures.

I've filed bug 1464297 blocking this bug to switch to using a git-cinnabar mirror like https://github.com/mozilla/gecko since I think it's clear git-cinnabar has won the git-hg conversion war.
I'm wondering what your use case is here. It sounds like you want to be able to have searchfox indexing capabilities on code that's not yet committed to m-c. Is that it, or is the use case more complex than that? Specifically, is the "try push" required for something else, as opposed to just adding the necessary mozconfig incantation to build the index locally? I think that might be a technically simple approach if it satisfies your needs.
It's worth explicitly mentioning that the "analysis" and "blame" components of searchfox are more or less independent, so if all you're after is the analysis then we can skip the whole blame business, which means we don't need to futz around with the details of the underlying VCS mechanism. That is, it should work whether you're using git, cinnabar, or even mercurial.
The use-case is code review of large, complex patches.  For example, I'm about to undertake a 28-part review of the next-generation localstorage implementation on bug 1286798 and am finishing up a 13-part review of the e10s-ification of SharedWorker support.

In particular, the searchfox index is useful because I'm prototyping some tooling support along the lines of what I showed in bug 1339243, but building on other manual graphviz experience to try to make it easy to curate diagrams from searchfox data for understanding an implementation, like the manually created https://www.visophyte.org/blog/wp-content/uploads/2017/03/service-worker-multi.svg and https://www.visophyte.org/blog/wp-content/uploads/2017/03/intercepted-channels.svg.  The idea is that you can create diagrams to aid your understanding as you explore an implementation using searchfox, then save the diagram for others, maybe folding it into some documentation.  Because searchfox has the semantic understanding underlying the diagram, it can also say "hey, this diagram is no longer accurate because these things no longer exist", rather than being a suspicious UML diagram stashed on wiki.mozilla.org that may or may not correspond to reality.


Local indexes work too, but building on the try job is nice because:
- I can get indexes for platforms I don't regularly build on or can't build on.  The Windows and OS X jobs did fail this time, but had they succeeded they would have been useful in some cases.  For me, having the fennec/android builds indexed would likely be more useful.
- The builds take a long time and tie up hardware.
- It makes it easier to further productize searchfox as a tool for people who aren't (would-be) searchfox contributors.


I concur that the blame is separable from the analysis, but it's super handy for figuring out which patch I need to comment on from the tree state, or what's part of the patch versus what hasn't been modified.  In general, the searchfox UI is way better for this than using hg.mozilla.org.  (The exception, of course, is hg's ability to let you select a range of lines and see how they change over time.  That is magic and usually a good point to make the transition off of searchfox.)
I haven't tried this, but now that we're pulling artifacts from taskcluster I think getting a local searchfox instance for a try build is relatively straightforward (at least more than it was before).

Let's say I have written patches X, Y, Z on top of a clone of m-c at base revision B. It doesn't matter whether you use hg, cinnabar, or gecko-dev to write the patches initially. Then you do a try push with those patches and include the searchfox jobs. You also need to apply the patches X, Y, and Z on top of a gecko-dev clone at B, and push that to github (or anywhere publicly available). If you originally wrote the patches on a gecko-dev clone then this is trivial; otherwise you have to cherry-pick them over. Now you have a try push which produces the analysis artifacts, and you have a publicly available gecko-dev clone at the same revision.
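For illustration, a rough sketch of the gecko-dev half of that flow; the branch name, fork URL, and the <B-git>/<X>.patch style placeholders are all hypothetical:

```
# Hypothetical sketch: publish a gecko-dev branch containing patches X, Y, Z on base B.
git clone https://github.com/mozilla/gecko-dev
cd gecko-dev
git checkout -b try-index <B-git>        # the gecko-dev commit corresponding to B
# If the patches were written in an hg/cinnabar clone, bring them over first, e.g.:
git am /path/to/X.patch /path/to/Y.patch /path/to/Z.patch
# Publish the branch anywhere publicly reachable, e.g. a personal fork:
git push git@github.com:<you>/gecko-dev try-index
```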

Modify the mozilla-central/setup file in the mozsearch-mozilla repo like so (a rough sketch of the end result follows the steps):
1. At [1] add commands to fetch your gecko-dev clone. e.g. `git fetch https://github.com/staktrace/gecko-dev Z`
2. At [2] set REVISION to `try.revision.<hash>` where `hash` is your try push hg revision.
3. At [3] replace the awk incantation with a straight up `INDEXED_GIT_REV=<hash>` where `hash` is the gecko-dev hash for Z. 
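Putting those three steps together, a minimal sketch of what the locally edited portion of the setup script could look like; the hashes are placeholders for your own revisions:

```
# Hypothetical excerpt of a locally modified mozilla-central/setup.
# Step 1: fetch the publicly hosted gecko-dev commit for Z.
git fetch https://github.com/staktrace/gecko-dev <Z-git-hash>
# Step 2: point the artifact download at the try push.
REVISION=try.revision.<try-hg-hash>
# Step 3: index the gecko-dev commit for Z instead of the awk-derived tip.
INDEXED_GIT_REV=<Z-git-hash>
```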

After that, if you run the regular indexing steps, it should just work. It will download the taskcluster artifacts from the try push and use the source/blame info from the gecko-dev clone for the rest.

If this works, we can easily make this more ergonomic by exposing stuff as env vars or adding a wrapper script that prompts you for the relevant information or whatever.

[1] https://github.com/mozsearch/mozsearch-mozilla/blob/47b631d3983b34861936bdefa2ff321d38f36eab/mozilla-central/setup#L50
[2] https://github.com/mozsearch/mozsearch-mozilla/blob/47b631d3983b34861936bdefa2ff321d38f36eab/mozilla-central/setup#L64
[3] https://github.com/mozsearch/mozsearch-mozilla/blob/47b631d3983b34861936bdefa2ff321d38f36eab/mozilla-central/setup#L69
Thanks for providing these steps!  I'm back around to needing/wanting a searchfox-indexed try build and I will be "try"-ing these out in the next day or two!  I'll provide an update back as to how it went.
FWIW, I tried the steps in comment 5, and after a few minor bumps (needed a newer vagrant version (2.1 seems good), needed a newer ruby version (2.5 seems good), needed to make enough disk space available (75 GB or so)), I have a local searchfox instance up and running, and it seems to be working well!

Now that we have switched to grafted cinnabar metadata, the steps for this have changed (but should be even simpler). I intend to add a command to the top-level Makefile that will make it Real Easy (TM) to locally index a try build, which should be sufficient to close out this bug.

Assignee: nobody → kats

Part 2: https://github.com/mozsearch/mozsearch/pull/286

I think it might be worth making KEEP_WORKING=1 the default since gecko-dev and gecko-blame are so massive and take so long to download. In production it runs from a clean slate anyway, so KEEP_WORKING=1 shouldn't make any difference there. That's a separate bug though.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)

> I think it might be worth making KEEP_WORKING=1 the default since gecko-dev and gecko-blame are so massive and take so long to download. In production it runs from a clean slate anyway, so KEEP_WORKING=1 shouldn't make any difference there. That's a separate bug though.

Indeed. Maybe we should also consider compressing the tarballs? I understand lz4 (https://github.com/lz4/lz4#benchmarks) to be appropriate for cases like this where we don't want to pay the encoding time for optimal compression but wouldn't mind some level of compression. Having just run lz4 locally on a 6.8G gecko-dev.tar, it compressed to 5.2G (lz4 reports 76.54% of original) in 27 seconds of wall-clock time. This was on my local machine with an older SATA SSD (not a fancy NVMe drive) and with gecko-dev.tar not cached at all. I repeated the process to different output files to see the effect of caching/improved I/O: the 2nd run took 24.5s and the 3rd run 9.1s, suggesting that on the indexers with NVMe, compressing as the tar is created should be very fast. (Note that this was all single-core; the lz4 repo implies that lz4 can parallelize, but the default command does not. The benchmark numbers are per-core, and nothing precludes using multiple cores in parallel.)
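As a concrete illustration of "compressing as the tar is created", the kind of pipeline I have in mind looks roughly like this; the directory path shown is a placeholder, not the actual indexer layout:

```
# Hypothetical sketch: write a pre-compressed tarball in one pass rather than
# producing gecko-dev.tar and compressing it afterwards.
tar -C /path/to/index -cf - gecko-dev | lz4 > gecko-dev.tar.lz4

# And on the consuming side, decompress while extracting:
lz4 -d -c gecko-dev.tar.lz4 | tar -xf -
```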

Yeah, we should consider that too. I've spun off bug 1621324 and bug 1621325 for these things. Closing this bug as done as I've merged the PRs.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED