Closed Bug 1634770 Opened 4 years ago Closed 4 years ago

Optimize build-blame tool

Categories

(Webtools :: Searchfox, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kats, Assigned: kats)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

In bug 1627532 I rewrote the transform_repo.py script in Rust, but it was basically a straight port from Python to Rust. There's room for performance improvement by separating the CPU-intensive part (calling into libgit2 to compute diffs) from the I/O part (writing the blame tree into the blame repo), and having the CPU part run for different revs on different threads, feeding their results back to the main thread, which does all the writing to disk on a single thread. As libgit2 is not totally threadsafe, doing this will require some ad-hoc architecting rather than the standard Rust parallelization toolkits.
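A rough sketch of the shape this split could take (not the actual patch): worker threads each hold their own libgit2 handle and push computed results over a channel to a single writer thread. compute_blame() and write_to_blame_repo() are hypothetical placeholders for the real diff computation and blame-repo writing, and the sketch pretends each rev can be processed independently, which glosses over the real tool's ordering constraints.

```rust
use std::sync::mpsc;
use std::thread;

fn compute_blame(_repo: &git2::Repository, rev: &str) -> String {
    // Placeholder: the real tool calls into libgit2 here to compute diffs.
    format!("blame data for {}", rev)
}

fn write_to_blame_repo(rev: &str, data: &str) {
    // Placeholder: the real tool writes a blame tree into the blame repo here.
    println!("writing {} ({} bytes)", rev, data.len());
}

fn main() {
    let revs: Vec<String> = std::env::args().skip(1).collect();
    let (tx, rx) = mpsc::channel();

    // CPU part: each worker opens its own repository handle, since libgit2
    // objects are not safe to share freely across threads.
    let num_workers = 4;
    let chunk_size = revs.len() / num_workers + 1;
    let mut handles = Vec::new();
    for chunk in revs.chunks(chunk_size) {
        let tx = tx.clone();
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || {
            let repo = git2::Repository::open(".").expect("open repo");
            for rev in chunk {
                let blame = compute_blame(&repo, &rev);
                tx.send((rev, blame)).expect("send result");
            }
        }));
    }
    drop(tx); // close the channel once all workers hold their own senders

    // I/O part: a single thread (here, main) does all the writing.
    for (rev, blame) in rx {
        write_to_blame_repo(&rev, &blame);
    }
    for h in handles {
        h.join().unwrap();
    }
}
```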

I expect this to provide a significant speedup, and it's worth doing because that speedup should unlock additional features that involve more frequent rebuilds of the blame repo.

I wrote code to parallelize the computation part. It provides a modest speedup on the nss repo (total blame-build time goes down from around 135 seconds to around 105 seconds), but it seems to slow things down when running on gecko-dev, and I'm not sure why. I'll have to do some profiling/instrumentation. Patches are currently at https://github.com/staktrace/mozsearch/commits/rsblame2 (the topmost commit is the most interesting).

The approach I had implemented used rayon to do some of the parallelization work, but that meant each "compute task" had to re-open the git repository, which apparently slows things down quite a bit. Using a persistent compute thread that keeps a repo object around does give some speedup. I'm trying to see whether adding a few more persistent compute threads (with some manual scheduling of compute tasks instead of using rayon) will speed things up further, or whether I'll run into contention. Right now, with one I/O thread and one compute thread, we're still compute-bound.
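For illustration, a minimal sketch of the persistent-compute-thread idea, assuming a hypothetical compute_diff() helper: the thread opens the repository once and reuses that handle for every task it receives over a channel, rather than re-opening the repository per task.

```rust
use std::sync::mpsc;
use std::thread;

fn compute_diff(_repo: &git2::Repository, rev: &str) -> String {
    // Placeholder for the real libgit2 diff computation.
    format!("diff for {}", rev)
}

// Spawn a compute thread that opens the repository once and keeps the handle
// alive for every task it processes, instead of re-opening it per task.
fn spawn_compute_thread(
    repo_path: String,
    tasks: mpsc::Receiver<String>,
    results: mpsc::Sender<(String, String)>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let repo = git2::Repository::open(&repo_path).expect("open repo");
        for rev in tasks {
            let diff = compute_diff(&repo, &rev);
            if results.send((rev, diff)).is_err() {
                break; // the I/O side has shut down
            }
        }
    })
}
```

The caller would feed revs into the tasks channel and drain the results channel on the I/O side; the point is only that the Repository outlives individual tasks.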

https://github.com/mozsearch/mozsearch/pull/354

This parallelizes the work using a simple home-rolled threadpool with round-robin scheduling of compute tasks. It gives a pretty good speedup (2-4x depending on the repo) and has room for further improvement. Current profiles show the bottleneck is now the main I/O thread, with the get_path call still taking up 21% of the time, even though I already removed some usages of it. I'm attaching a profile that can be loaded in https://profiler.firefox.com (but apparently not shared from there).
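For reference, a minimal sketch of the round-robin scheme (not the code in the PR), with hypothetical compute_blame()/write_blame() placeholders: each worker gets its own task and result channels, revs are dealt out in rev order, and the single I/O thread reads results back in the same round-robin order, so writes happen in rev order on one thread.

```rust
use std::sync::mpsc;
use std::thread;

fn compute_blame(_repo: &git2::Repository, rev: &str) -> String {
    format!("blame for {}", rev) // placeholder for the real diff/blame work
}

fn write_blame(rev: &str, blame: &str) {
    println!("writing {} ({} bytes)", rev, blame.len()); // placeholder write
}

fn run(revs: Vec<String>, num_workers: usize) {
    let mut task_txs = Vec::new();
    let mut result_rxs = Vec::new();
    let mut handles = Vec::new();

    // Home-rolled pool: each worker keeps its own repository handle open
    // and has dedicated task/result channels.
    for _ in 0..num_workers {
        let (task_tx, task_rx) = mpsc::channel::<String>();
        let (result_tx, result_rx) = mpsc::channel::<String>();
        task_txs.push(task_tx);
        result_rxs.push(result_rx);
        handles.push(thread::spawn(move || {
            let repo = git2::Repository::open(".").expect("open repo");
            for rev in task_rx {
                result_tx.send(compute_blame(&repo, &rev)).expect("send result");
            }
        }));
    }

    // Deal tasks round-robin, then drop the senders so workers terminate.
    for (i, rev) in revs.iter().enumerate() {
        task_txs[i % num_workers].send(rev.clone()).expect("send task");
    }
    drop(task_txs);

    // Collect results in the same round-robin order, preserving rev order,
    // and do all the writing on this single I/O thread.
    for i in 0..revs.len() {
        let blame = result_rxs[i % num_workers].recv().expect("recv result");
        write_blame(&revs[i], &blame);
    }
    for h in handles {
        h.join().unwrap();
    }
}
```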

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
