Closed Bug 1617567 Opened 4 years ago Closed 4 years ago

Stand up text search for older revisions of m-c / older ESRs

Categories

(Webtools :: Searchfox, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Gijs, Assigned: kats)

Occasionally I want to know when a thing went away. I know some string I can use to find the thing, but it's not in m-c anymore, so no hits when looking at current m-c. Searchfox has esr60 and esr68, but for older things, I have to use DXR and then I am sad.

It would be nice if we had text search for older revisions.

Type: defect → enhancement

There was some discussion of this feature in Matrix. The conversation starts here: https://view.matrix.org/room/!vDKxYNxlsZYvjSWGBh:mozilla.org/?anchor=$5Mqlih6YyDa9BAuqQNrs3LhCGSvsV5gxvs8hzPXqPL4&offset=0
Note that for some reason the anchoring UI likes to show the selected message as the last message on the screen, which I find weird, but in any event, if you click "show newer messages" you get to the actual conversation.

My plan here is to create another searchfox config, called mozilla-archived.json or similar, for repos that are not updating any more. This includes esr60 and any older ESR branches that we want to add. This will NOT get run daily, since the code isn't changing and there's no point. Instead it would be triggered manually and the web server would just stick around forever or until we decide we want to retrigger it (e.g. if the generated HTML changes significantly).

This should also reduce the runtime of the current mozilla-releases indexer since ESR60 will get offloaded.

Will work on this as my next searchfox background-ish task.

Assignee: nobody → kats

Restating my understanding of the rationale and cost ramifications:

  • Any indexing task we move out of the daily indexing jobs represents both an explicit cost savings in indexer machine run-time and a hard-to-quantify but likely real benefit in terms of Firefox developers seeing up-to-date code faster. Note that there's a weird non-linearity where the indexers seem to slow down the longer they run, so removing an indexing job from the daily run may actually save more total time than that job took on its own.
    • ESR60 currently takes about 50 minutes, but we'll round up to an hour for the purposes of discussion, especially since more recent indexes do take longer because they have more files.
  • Creating an additional web server target does have a tangible cost, but because webservers run 24/7, reserved instances make sense for them. Also, we can host many indexes on a target, so the decision to create a new target is less about the marginal cost/benefit of esr60 being extracted versus the benefit from being able to host possibly older revisions and/or versions like esr68 as they age out.
  • A 1-year reserved instance t3.large costs $1.17 a day, so it's not a lot of money.
  • Indexer time is currently $0.452/hr and reserved t3.large is $0.049/hr, which means every hour of indexer time saved covers 9.2 hours of t3.large web server operation. A full day of web server operation therefore needs about 24 / 9.2 ≈ 2.6 indexer-hours saved per day, so an additional web server pays for itself once about 3 mozilla-esr60-sized indexer jobs are moved off the daily run.
    • Also, 1 hour of engineer time easily covers more than a month of operating costs.
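The arithmetic in the bullets above can be checked with a short sketch. The two hourly rates are the ones quoted in the comment; the function names are made up for illustration and are not part of any searchfox tooling.

```python
# Hourly rates quoted in the comment above, in USD.
INDEXER_RATE = 0.452     # indexer machine
WEB_SERVER_RATE = 0.049  # 1-year reserved t3.large


def web_server_daily_cost(web_rate=WEB_SERVER_RATE):
    """Cost of keeping one web server up for 24 hours."""
    return web_rate * 24


def web_hours_per_indexer_hour(indexer_rate=INDEXER_RATE, web_rate=WEB_SERVER_RATE):
    """Hours of web server operation paid for by one saved indexer hour."""
    return indexer_rate / web_rate


def break_even_indexer_hours_per_day(indexer_rate=INDEXER_RATE, web_rate=WEB_SERVER_RATE):
    """Indexer hours that must be saved each day to pay for one web server."""
    return web_server_daily_cost(web_rate) / indexer_rate


if __name__ == "__main__":
    print(f"web server cost per day:       ${web_server_daily_cost():.2f}")
    print(f"web hours per indexer hour:    {web_hours_per_indexer_hour():.1f}")
    print(f"break-even indexer hours/day:  {break_even_indexer_hours_per_day():.1f}")
```

With esr60 rounded up to an hour per run, the break-even figure of roughly 2.6 indexer-hours per day is where the "3 esr60-sized jobs" estimate comes from.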

Note that although in normal operation our automation leaves around a backup web server, the logic only does this if the web server is less than 1.5 days old, so we don't have to worry about manually triggered indexing runs doubling costs by leaving around an out-of-date web server.
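A minimal sketch of that age check, assuming the decision only depends on when the outgoing web server was launched. The function and constant names are hypothetical; the real automation may be structured quite differently.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the comment above: only servers younger than
# 1.5 days are kept around as a backup.
MAX_BACKUP_AGE = timedelta(days=1.5)


def should_keep_as_backup(launched_at, now):
    """Return True if the outgoing web server is fresh enough to keep."""
    return now - launched_at < MAX_BACKUP_AGE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    daily = now - timedelta(hours=24)    # replaced by the next daily run
    archived = now - timedelta(days=90)  # replaced by a rare manual run
    print(should_keep_as_backup(daily, now))
    print(should_keep_as_backup(archived, now))
```

Under this rule a manually retriggered archive index replaces a server that is weeks or months old, so the old server is not kept and the cost does not double.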

Thanks for providing that cost calculus! I had just assumed it would be better to save indexer time without doing a detailed analysis.

PRs so far for this bug:
https://github.com/mozsearch/mozsearch-mozilla/pull/74
https://github.com/mozsearch/mozsearch/pull/281

These just stand up the extra target and move ESR60 to it. Adding older ESRs should be pretty easy on top of this. Gijs, how old do you usually go with ESRs? Do we want to go all the way back to ESR17 (the oldest ESR branch on gecko-dev)?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)

These just stand up the extra target and move ESR60 to it. Adding older ESRs should be pretty easy on top of this. Gijs, how old do you usually go with ESRs? Do we want to go all the way back to ESR17 (the oldest ESR branch on gecko-dev)?

I think the oldest that'd be useful is the Firefox 3.5 or 4 release, though admittedly I don't go back that far that often. Looking at the other comments here, I don't know if there is significant cost to each ESR we'd keep this way; if there is, I'd suggest keeping every other one. Does that help?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(kats)

Yep. I'll add esr17, esr32, esr45 which are alternating ESRs. If you can find me a changeset that is "Firefox 4 release" or "Firefox 3.5 release" I can add that too, although I don't see anything that looks like that at https://hg.mozilla.org/releases/mozilla-release/tags which is where I would expect to see it.

Flags: needinfo?(kats)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)

Yep. I'll add esr17, esr32, esr45 which are alternating ESRs. If you can find me a changeset that is "Firefox 4 release" or "Firefox 3.5 release" I can add that too, although I don't see anything that looks like that at https://hg.mozilla.org/releases/mozilla-release/tags which is where I would expect to see it.

https://hg.mozilla.org/releases/mozilla-release/rev/FIREFOX_4_0_1_RELEASE might be the right thing?

Looks like DXR just lists mozilla1.9.1 / .2 / mozilla2.0, which tbh I'm not sure where those come from...

(In reply to :Gijs (he/him) from comment #8)

https://hg.mozilla.org/releases/mozilla-release/rev/FIREFOX_4_0_1_RELEASE might be the right thing?

Ah yeah that looks right. I wasn't expecting it to come after FIREFOX_5* chronologically.

We might want to overhaul the help.html list of repositories to have a table that indicates what analyses/indexing is available for each repo. I presume for most of the ones we're adding we'll just get JS/XPIDL + Fulltext?

Or maybe this is something we should just have each tree indicate on the search results? Like where we currently say "Number of results: 136 (maximum is 1000)" we could say "Number of results: 136 (maximum is 1000) from searching Fulltext + JS/XPIDL Analysis but no C++/Rust Analysis". (The search mode would also change based on the use of "symbol"/"re"/"id"/"default"/"pathre" in the query.)

At least to start with, this might come from the tree config JSON explicitly having an array of analyses that are known to be available or not available. Hm, actually, if it's in the tree config, we could probably use it as part of templating the root help.html. (In the future we could potentially automate some of this, especially if we have some concept of "canary" symbols whose presence we check for in a tree to indicate that a given phase generally worked. Although I suppose we'd still want explicit expectations so we could generate alerts when a mismatch occurs.)
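To make the proposal concrete, here is a rough sketch of how an explicit per-tree list of analyses could drive both the results message and a help.html table. The "analyses" key, the tree names, and the function names are all assumptions for illustration; none of them are existing searchfox config options.

```python
# Every search mode a tree could offer, in display order.
ALL_MODES = ["Fulltext", "JS/XPIDL Analysis", "C++/Rust Analysis"]

# Hypothetical tree config. The "analyses" array is the explicit list
# suggested in the comment above, not a real searchfox setting.
TREE_CONFIG = {
    "mozilla-central": {
        "analyses": ["Fulltext", "JS/XPIDL Analysis", "C++/Rust Analysis"],
    },
    "mozilla-esr17": {
        "analyses": ["Fulltext", "JS/XPIDL Analysis"],
    },
}


def describe_results(tree, count, maximum=1000):
    """Build the results line, naming available and missing search modes."""
    available = TREE_CONFIG[tree]["analyses"]
    missing = [mode for mode in ALL_MODES if mode not in available]
    text = f"Number of results: {count} (maximum is {maximum})"
    text += " from searching " + " + ".join(available)
    if missing:
        text += " but no " + " or ".join(missing)
    return text


def help_table_rows():
    """Return one (tree, [yes/no per mode]) row per tree for help.html."""
    return [
        (tree, ["yes" if mode in cfg["analyses"] else "no" for mode in ALL_MODES])
        for tree, cfg in TREE_CONFIG.items()
    ]


if __name__ == "__main__":
    print(describe_results("mozilla-esr17", 136))
    for tree, cells in help_table_rows():
        print(tree, cells)
```

Keeping the list explicit in the config, rather than inferring it, matches the point about wanting stated expectations so that a mismatch can raise an alert.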

The general idea would be to help make sure that expectations are properly set for use of the older repositories and even modern repositories. (And as a side-effect reduce the number of bugs filed, not that that has been a problem.)

I think a simple table on help.html should be fine for now.

Also, it looks like FIREFOX_4_0_1_RELEASE is not present in the mozilla-unified repository, and therefore I can't map it to a git version in gecko-dev, assuming it's in there at all. I'm not sure what process was used to create mozilla-unified and why it seems to have some tags but not others.

In fact, it's not in gecko-dev at all. So I'm going to skip that, at least for now.

https://github.com/mozsearch/mozsearch-mozilla/pull/76 has one change that will need to run through the indexer before I can actually add the repos to the mozilla-archived.json config.

The blame building has been going for 2.5 hours and is nowhere near done. If it causes an indexer timeout, I might revert that last PR and try rewriting the transform script in Rust or something.

I backed out the PR in https://github.com/mozsearch/mozsearch-mozilla/pull/77 - there were at least two problems with it. One was that I neglected to manually create the branch in the gecko-blame repo, which made me run afoul of my own warning. The other, which I realized afterwards, was that I used esr32 which isn't a thing, I meant to use esr31. But it didn't even get to that point.

While I was looking into that, I noticed the make-crontab.py script might be broken, as the indexer didn't have a self-shutdown crontab entry. Also, the main.sh script that calls make-crontab.py doesn't set -e (intentionally, but we can improve that), and the output from that script gets lost (also fixable). Together, these made the make-crontab.py failure very silent. I'll file a followup bug about improving those bits.
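The two failure modes described above (a non-zero exit being ignored and the output being discarded) can be illustrated with a small wrapper. This is only a Python sketch of the idea; the actual fix would presumably live in main.sh, and the run_logged helper and log path are hypothetical.

```python
import subprocess
import sys


def run_logged(cmd, log_path):
    """Run cmd, append its stdout and stderr to log_path, raise on failure."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error output in the same log
        text=True,
    )
    # Write the output before checking the exit code, so the log
    # survives even when the command fails.
    with open(log_path, "a") as log:
        log.write(result.stdout)
    # Raises subprocess.CalledProcessError on a non-zero exit status,
    # which is roughly what `set -e` gives a shell script.
    result.check_returncode()
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; a real caller would invoke make-crontab.py here.
    print(run_logged([sys.executable, "-c", "print('crontab written')"], "setup.log"))
```

The design choice is to log first and fail second, so a broken step is both loud and diagnosable.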

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)

I backed out the PR in https://github.com/mozsearch/mozsearch-mozilla/pull/77 - there were at least two problems with it. One was that I neglected to manually create the branch in the gecko-blame repo, which made me run afoul of my own warning. The other, which I realized afterwards, was that I used esr32 which isn't a thing, I meant to use esr31. But it didn't even get to that point.

Fixed these problems and relanded: https://github.com/mozsearch/mozsearch-mozilla/pull/78

I made changes to help.html to use a table, see https://kats.searchfox.org/ for what it looks like. It's ugly, please suggest improvements, or edit the webserver directly to try changes.

Current changes are in:
https://github.com/mozsearch/mozsearch-mozilla/pull/80
https://github.com/mozsearch/mozsearch/pull/284

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED