Open Bug 1878998 Opened 2 years ago Updated 9 months ago

Enable C++ semantic indexing for comm-central and provide hybrid blame support for comm-central and mozilla-central under mozilla/

Categories: Webtools :: Searchfox, task
Tracking: Not tracked
Status: ASSIGNED
People: Reporter: rjl, Assigned: asuth

Details

Thunderbird is moving to a monthly release schedule, the same as Firefox's. Part of this effort has been reviving the comm-release repository.

It would be very helpful to have comm-beta and comm-release available on Searchfox.

Also, there is now Rust code being added to Thunderbird. I'd like to get the semantic analysis going for Rust as well as C++.

(In reply to Rob Lemley [:rjl] from comment #0)

> Also, there is now Rust code being added to Thunderbird. I'd like to get the semantic analysis going for Rust as well as C++.

Do you know if Thunderbird is able to commit to contributing an ongoing level of engineering and maintenance resources to searchfox? Indexing comm-central has been pretty cheap[1] and painless because it's almost entirely full-text. Without the semantic data it hasn't been much of a maintenance burden, since there has been little that could break[2] and we never really had to address the unusual tree/build setup. Semantic support would significantly change that calculus, especially given the upcoming hyperblame and Phabricator-integration features, where the tree complications would be a major concern.

If it's not possible to commit ongoing resources but a higher-quality searchfox experience is desired, it might make sense for Thunderbird to stand up its own searchfox instance that doesn't track the mozsearch trunk but instead updates its tracking branch periodically, like the Igalia WebKit search used to do.

1: More so for the comm- trees that don't have the mozilla/ checkout; the mozilla/ checkout in comm-central is a concern but is currently the status quo.

2: The caveat is that because we always try to run JS analysis, and I implemented a sort of quality ratchet, there was in fact some JS stuff I've had to deal with, but otherwise it hasn't been too bad.
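For readers unfamiliar with the term in footnote 2: a quality ratchet is a check that tolerates a known set of existing failures, rejects any new ones, and shrinks that tolerated set as failures get fixed, so quality can only move in one direction. The sketch below illustrates the general idea only; the function name `apply_ratchet` and the per-file granularity are hypothetical, and this is not searchfox's actual JS-analysis ratchet, which the comment does not describe.

```python
from typing import Set, Tuple


def apply_ratchet(current_failures: Set[str], allowed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical quality ratchet over files that failed analysis.

    Returns (regressions, new_allowed):
      - regressions: files failing now that were not already tolerated;
        a non-empty set should fail the job.
      - new_allowed: the tolerated set tightened to drop files that now pass,
        so a fixed file cannot silently regress later.
    """
    regressions = current_failures - allowed
    new_allowed = allowed & current_failures
    return regressions, new_allowed


if __name__ == "__main__":
    allowed = {"a.js", "b.js"}
    # a.js got fixed: no regressions, and a.js drops out of the allowlist.
    regressions, allowed = apply_ratchet({"b.js"}, allowed)
    print(sorted(regressions), sorted(allowed))  # [] ['b.js']
    # a.js breaks again: it is now reported as a regression.
    regressions, allowed = apply_ratchet({"a.js", "b.js"}, allowed)
    print(sorted(regressions), sorted(allowed))  # ['a.js'] ['b.js']
```

The key design choice is that the allowlist is recomputed from the intersection, so it can only shrink; any growth in failures surfaces as a regression instead of being absorbed.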

Flags: needinfo?(sancus)
Duplicate of this bug: 1656066

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #1)

> (In reply to Rob Lemley [:rjl] from comment #0)
>
>> Also, there is now Rust code being added to Thunderbird. I'd like to get the semantic analysis going for Rust as well as C++.

> Do you know if Thunderbird is able to commit to contributing an ongoing level of engineering and maintenance resources to searchfox?

The answer to this is a qualified Yes, we likely can commit some resources. I'd want more specifics on what they need to do and how much time it will take.

This will have to wait until we're done with some hiring, though, so we can come back to it next quarter or so.

Flags: needinfo?(sancus)

(In reply to Andrei Hajdukewycz [:sancus] from comment #3)

> The answer to this is a qualified Yes, we likely can commit some resources. I'd want more specifics on what they need to do and how much time it will take.

I think the broad sketch of a situation that would be workable is:

  • We put the comm-central repositories in their own 2 configs, which means they get their own servers, and if they fall over, it doesn't interfere with any m-c trees or any of the other random stuff we index. That is, if a searchfox indexing job fails, we let the existing web server continue to run with stale data. (This would not be a change; that's how it already works. It's just the arrangement of trees into config files that changes.)
    • Including the newly requested trees, this gives 4 semantically indexed trees (comm-central, comm-beta, comm-release, and comm-esr128) plus the 6 non-semantic ESRs, plus any pre-comm-esr60 trees you might want.
    • Cost-wise, this is currently 2 x t3.2xlarge running 24/7 for the web server instances at $486 (2 x $243) a month, and 2 x m5d.2xlarge indexers running an aggregate of, let's say, ~10 hours a day between the 2 runs for $137.48 a month. All costs are on-demand in us-west-2.
      • Note that I am looking at some enhancements that might have us moving to m5d.2xlarge for web serving for full fidelity, which would increase the aggregate web-serving cost to $659.92 a month. (This of course does not capture any Taskcluster costs for the semantic indexing runs, etc.)
    • The concern here isn't the cost so much as isolating the comm-* trees so breakage does not interfere with m-c trees. But I think it does make sense to provide clarity on the costs.
  • Thunderbird would provide someone responsible for keeping the comm-* config jobs running. In general, searchfox is quite stable and does not require a lot of babysitting, but the basic idea is that if comm-* trees break, someone from the Thunderbird team is already watching the mailing list that job failures get sent to, as well as the searchfox Bugzilla component, and will notice the breakage and address it.
    • I don't know that there would be a regular time commitment here, but Thunderbird would want someone who could spend a few hours here and there as things fall over. This also assumes that the comm-central tree is generally green and generally tracking mozilla-central.
    • That said, it might be nice for that person to have a few hours a week to help make sure that any enhancements m-c gets can also be carried over to comm-* when they need additional configuration. For example, I am hoping to add support for understanding preferences and our C++ bindings in H2; when I land that, I will have done the hookup for m-c but not for c-c.
  • All comm-* indexed trees that want semantic indexing with blame support need to be normalized into a single git tree that searchfox consumes. This doesn't mean that the canonical comm-central repository needs to be git and use git-subtree to integrate mozilla-central into comm-central, although that does seem like it could be a nice option once m-c migrates to git. It just means that the comm-central/setup script creates the effective equivalent before passing it off to build-blame. This can be accomplished by any number of VCS sync/conversion mechanisms as long as they produce a single git tree.
    • This is really the most important point, since it avoids creating comm-*-specific complexity that would be a drag on searchfox enhancements. I should note that there was some talk in the past of providing better support for git submodules, but for sanity I am ruling that out.
    • This will likely require some initial effort to get working. I don't know where m-c is in its git migration at this point, but this would only really become an issue when I finish up bug 1517978, which I really hope to get to in H2, so it isn't an immediate issue.
    • If there is a transform step rather than consuming a canonical repo, this also probably implies some minor enhancement work by the Thunderbird contributor to help searchfox figure out how to generate links to the canonical hosted revision control service.
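The monthly figures above can be reproduced with simple arithmetic. In this sketch the hourly rates are assumptions back-derived from the quoted totals (for example, $243 / 730 h is about $0.333/h) and should be checked against current AWS pricing; the 730-hour month is the convention AWS commonly uses for monthly estimates. The ~10 aggregate indexer hours are treated as 10 instance-hours per day in total.

```python
# Assumed on-demand rates in USD per instance-hour, back-derived from the
# totals quoted above; verify against current AWS pricing before relying on them.
HOURLY_RATE = {
    "t3.2xlarge": 0.3328,
    "m5d.2xlarge": 0.452,
}
HOURS_PER_MONTH = 730  # 365 * 24 / 12, a common monthly-estimate convention


def monthly_cost(instance_type: str, instance_hours_per_day: float) -> float:
    """Monthly on-demand cost for a given number of instance-hours per day."""
    days_per_month = HOURS_PER_MONTH / 24
    return HOURLY_RATE[instance_type] * instance_hours_per_day * days_per_month


# 2 web servers running 24/7.
web_t3 = monthly_cost("t3.2xlarge", 2 * 24)
# Indexers: ~10 instance-hours per day in aggregate across the 2 runs.
indexer = monthly_cost("m5d.2xlarge", 10)
# The possible switch of both web servers to m5d.2xlarge.
web_m5d = monthly_cost("m5d.2xlarge", 2 * 24)

print(f"web (t3.2xlarge):  ${web_t3:.2f}")   # $485.89
print(f"indexer:           ${indexer:.2f}")  # $137.48
print(f"web (m5d.2xlarge): ${web_m5d:.2f}")  # $659.92
```

Under these assumptions, moving web serving to m5d.2xlarge adds roughly $174 a month ($659.92 - $485.89).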

We do have https://github.com/thunderbird/comm-unified-l10n, which looks funny on the main branch but does have comm-central, comm-beta, comm-release, comm-esr128, and comm-esr115. It's updated daily via scheduled GH action using git-cinnabar. I suppose that it could be modified to run via webhook or something so it's always current.

Its purpose in life is to be a source for l10n strings to do quarantine stuff on GitHub. That said, I've no reason to doubt its stability or accuracy. I do plan to make it go away once the switch to GitHub happens.

(In reply to Rob Lemley [:rjl] from comment #5)

> We do have https://github.com/thunderbird/comm-unified-l10n, which looks funny on the main branch but does have comm-central, comm-beta, comm-release, comm-esr128, and comm-esr115. It's updated daily via scheduled GH action using git-cinnabar. I suppose that it could be modified to run via webhook or something so it's always current.

To clarify what I poorly phrased as "normalized into a single git tree": it wasn't about having all the branches in a single repo, although that is nice to have. My concern is that I would want the "mozilla/" subdirectory to be present in the same git tree as "mail/". That doesn't have to be a public tree that people can clone; it could be something synthetically generated locally by the searchfox indexers and propagated between indexing runs (which is what already happens for the source and blame trees, although the source trees usually do correspond to a public tree at this point).
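A minimal model of the tree described above: a comm-central snapshot with a mozilla-central snapshot mounted under mozilla/, so that mail/ and mozilla/ live in one tree. Trees here are a hypothetical flat mapping from path to blob id, and `synthesize_tree` is an illustrative name, not searchfox code. Real tooling could build the equivalent with git plumbing (for example `git read-tree --prefix=mozilla/` followed by `git write-tree` and `git commit-tree`), producing one synthetic commit per (comm-central, mozilla-central) revision pair.

```python
from typing import Dict, Mapping

# Hypothetical flat tree representation: repository-relative path -> blob id.
Tree = Mapping[str, str]


def synthesize_tree(cc_tree: Tree, mc_tree: Tree, prefix: str = "mozilla/") -> Dict[str, str]:
    """Overlay a mozilla-central tree under `prefix` inside a comm-central tree."""
    if not prefix.endswith("/"):
        raise ValueError("prefix must be a directory path ending in '/'")
    combined: Dict[str, str] = {}
    for path, blob in cc_tree.items():
        # comm-central normally gets mozilla/ from a separate checkout, so a
        # tracked path under the prefix would mean the two trees collide.
        if path.startswith(prefix):
            raise ValueError(f"comm-central already contains {path!r}")
        combined[path] = blob
    for path, blob in mc_tree.items():
        combined[prefix + path] = blob
    return combined


# Hypothetical example snapshots (paths and blob ids are illustrative).
cc = {"mail/base/content/example.js": "1a2b", "moz.build": "3c4d"}
mc = {"dom/base/Document.cpp": "5e6f"}
for path in sorted(synthesize_tree(cc, mc)):
    print(path)
```

Refusing collisions rather than letting one side win keeps the synthetic tree an exact union of the two sources, which matters if blame for each path is later attributed to exactly one upstream history.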

Since bug 1970809 covers adding the missing trees, I'm going to repurpose this bug to be solely about providing a full-fidelity experience for comm-central, in terms of both blame and turning on the semantic indexing.

A relevant change since comment 4 is a tentative plan to add web-standards indexing in a single composite tree, which entails teaching searchfox how to deal with multiple source trees. That situation is simpler than comm-central's since there's no nesting, just sibling trees. However, it could be nice for firefox-main (formerly mozilla-central) to have a synthetic git tree backing __GENERATED__ with its own history/blame, which is basically the same situation as comm-central.
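Both layouts mentioned above, sibling trees and a tree nested under a prefix, can be described by one table of mount points resolved by longest-prefix match; the resolved repository is also the one whose history a hybrid blame would consult for that path. The sketch below is hypothetical: the mount tables, repository names such as "html-spec", and the `route` function are illustrative assumptions, not searchfox's configuration format.

```python
from typing import List, Optional, Tuple

# Hypothetical mount table entry: (path prefix in the composite tree, source repo).
# "" mounts a repo at the root; other prefixes must end in "/".
Mount = Tuple[str, str]


def route(path: str, mounts: List[Mount]) -> Optional[Tuple[str, str]]:
    """Map a composite-tree path to (source repo, path within that repo).

    The longest matching prefix wins, so a nested mount such as "mozilla/"
    takes precedence over a root mount "".
    """
    matches = [m for m in mounts if path.startswith(m[0])]
    if not matches:
        return None
    prefix, repo = max(matches, key=lambda m: len(m[0]))
    return repo, path[len(prefix):]


# Nested: comm-central at the root with mozilla-central under mozilla/.
COMM_CENTRAL = [("", "comm-central"), ("mozilla/", "mozilla-central")]
# Siblings: hypothetical spec repositories side by side, nothing at the root.
WEB_STANDARDS = [("html/", "html-spec"), ("dom/", "dom-spec")]
# Nested again: a hypothetical synthetic history backing __GENERATED__.
FIREFOX_MAIN = [("", "firefox-main"), ("__GENERATED__/", "generated-history")]
```

For example, `route("mozilla/dom/base/Document.cpp", COMM_CENTRAL)` returns `("mozilla-central", "dom/base/Document.cpp")`, while a path with no matching mount in the sibling layout returns `None`.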

I'm going to assign this to myself for now, although it will probably be several weeks before I get to it.

Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Summary: Add comm-beta and comm-release to Searchfox → Enable C++ semantic indexing for comm-central and provide hybrid blame support for comm-central and mozilla-central under mozilla/