I spent some time writing code to do this and it seems to work, but it is really slow, and it only fixes some of the problems. Other problems remain because lines early in a hunk can get split or merged, and that throws off the blame for the rest of the lines in the hunk.
I have an idea on how to do this better: for each revision to ignore, we should compute an accurate mapping from the "after" state to the "before" state, and save that in a tarball on S3 so that we don't have to recompute it on every run. Instead we would just compute the mapping for any new revisions added to the list.
In order to compute the mapping accurately, I think we want to take the "before" code and the "after" code from the hunk and recursively do longest-substring matches to compute a character mapping. Then, for a given line in the "after" text, go through each character on the line that has a corresponding "before" character, and pick the containing "before" line that was most recently modified. This should build an accurate line map, which we would save to disk and put in S3. But it's going to be relatively expensive to compute.
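A rough sketch of what I mean, as a standalone Rust snippet (not real implementation code): the longest-common-substring search here is a quadratic brute force standing in for a fast suffix-tree-based one, and since this sketch has no per-line blame data, it picks the containing "before" line by simple majority vote among the mapped characters rather than by modification recency.

```rust
use std::collections::HashMap;

/// Longest common substring of `a` and `b`: returns (pos_in_a, pos_in_b, len).
/// Quadratic brute force for clarity; a suffix-tree construction (Ukkonen's
/// algorithm) would make this fast.
fn longest_common(a: &[u8], b: &[u8]) -> (usize, usize, usize) {
    let mut best = (0, 0, 0);
    for i in 0..a.len() {
        for j in 0..b.len() {
            let mut k = 0;
            while i + k < a.len() && j + k < b.len() && a[i + k] == b[j + k] {
                k += 1;
            }
            if k > best.2 {
                best = (i, j, k);
            }
        }
    }
    best
}

/// Recursively match `before` against `after`: take the longest common
/// substring, record the character correspondence, then recurse on the
/// unmatched text to either side of it.
fn map_chars(before: &[u8], after: &[u8], boff: usize, aoff: usize, out: &mut Vec<Option<usize>>) {
    let (bi, ai, len) = longest_common(before, after);
    if len == 0 {
        return;
    }
    for k in 0..len {
        out[aoff + ai + k] = Some(boff + bi + k);
    }
    map_chars(&before[..bi], &after[..ai], boff, aoff, out);
    map_chars(&before[bi + len..], &after[ai + len..], boff + bi + len, aoff + ai + len, out);
}

/// Byte offset of the start of each line.
fn line_starts(text: &[u8]) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, &c) in text.iter().enumerate() {
        if c == b'\n' && i + 1 < text.len() {
            starts.push(i + 1);
        }
    }
    starts
}

/// Index of the line containing byte position `pos`.
fn line_at(starts: &[usize], pos: usize) -> usize {
    match starts.binary_search(&pos) {
        Ok(i) => i,
        Err(i) => i - 1,
    }
}

/// For each "after" line, collect the "before" lines its mapped characters
/// came from and pick one (here: majority vote; None if nothing matched).
fn line_map(before: &str, after: &str) -> Vec<Option<usize>> {
    let (b, a) = (before.as_bytes(), after.as_bytes());
    let mut cmap = vec![None; a.len()];
    map_chars(b, a, 0, 0, &mut cmap);
    let bstarts = line_starts(b);
    let astarts = line_starts(a);
    let mut result = Vec::with_capacity(astarts.len());
    for (i, &start) in astarts.iter().enumerate() {
        let end = if i + 1 < astarts.len() { astarts[i + 1] } else { a.len() };
        let mut votes: HashMap<usize, usize> = HashMap::new();
        for pos in start..end {
            if let Some(bpos) = cmap[pos] {
                *votes.entry(line_at(&bstarts, bpos)).or_insert(0) += 1;
            }
        }
        result.push(votes.into_iter().max_by_key(|&(_, n)| n).map(|(line, _)| line));
    }
    result
}

fn main() {
    // "hello world" was split across two lines; both halves map back to line 0,
    // which is exactly the case a plain hunk-level diff gets wrong.
    let m = line_map("hello world\n", "hello\nworld\n");
    println!("{:?}", m); // prints [Some(0), Some(0)]
}
```

The split/merge cases from above are what this handles: a line split maps both new lines to the one old line, and a merge maps the one new line to whichever old line contributed the most characters.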
The bcmp crate should help with the recursive substring matching part, since it has an implementation of Ukkonen's algorithm, which is what we want for fast substring matching.