1617567 - Stand up text search for older revisions of m-c / older ESRs

Reporter

Description

•

4 years ago

Occasionally I want to know when a thing went away. I know some string I can use to find the thing, but it's not in m-c anymore, so no hits when looking at current m-c. Searchfox has esr60 and esr68, but for older things, I have to use DXR and then I am sad.

It would be nice if we had text search for older revisions.

:Gijs (he/him)

Reporter

Updated

•

4 years ago

Type: defect → enhancement

Andrew Sutherland [:asuth] (he/him)

Comment 1

•

4 years ago

There was some discussion of this feature in Matrix. The conversation starts at: https://view.matrix.org/room/!vDKxYNxlsZYvjSWGBh:mozilla.org/?anchor=$5Mqlih6YyDa9BAuqQNrs3LhCGSvsV5gxvs8hzPXqPL4&offset=0
Note that the anchoring UI, for some reason, shows the selected message as the last message on the screen, which I find weird. In any event, clicking "show newer messages" gets you to the actual conversation.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 2

•

4 years ago

My plan here is to create another searchfox config, called mozilla-archived.json or similar, for repos that are not updating any more. This includes esr60 and any older ESR branches that we want to add. This will NOT get run daily, since the code isn't changing and there's no point. Instead it would be triggered manually and the web server would just stick around forever or until we decide we want to retrigger it (e.g. if the generated HTML changes significantly).

This should also reduce the runtime of the current mozilla-releases indexer since ESR60 will get offloaded.
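
A rough sketch of the split described above, not the real config format: the "updating" flag and the flat "trees" list are hypothetical simplifications, and only the two file names come from this bug.

```python
import json

# Hypothetical inventory: the "updating" flag is not a real config field.
TREES = [
    {"name": "mozilla-esr68", "updating": True},
    {"name": "mozilla-esr60", "updating": False},
]

def split_configs(trees):
    """Group tree names by whether they still need a daily re-index."""
    daily = [t["name"] for t in trees if t["updating"]]
    archived = [t["name"] for t in trees if not t["updating"]]
    return {
        "mozilla-releases.json": {"trees": daily},   # indexed daily
        "mozilla-archived.json": {"trees": archived},  # triggered manually
    }

print(json.dumps(split_configs(TREES), indent=2))
```

Trees that no longer change land in the manually triggered config, so the daily run only pays for trees that can actually differ from yesterday.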

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 3

•

4 years ago

Will work on this as my next searchfox background-ish task.

Assignee: nobody → kats

Andrew Sutherland [:asuth] (he/him)

Comment 4

•

4 years ago

Restating my understanding of the rationale and cost ramifications:

Any indexing task we move out of the daily indexing jobs represents both an explicit cost savings in indexer machine run-time and a hard-to-quantify but likely real benefit in terms of Firefox developers seeing up-to-date code faster. Note that the indexers seem to slow down non-linearly the longer they run, so removing an indexing job may actually save more time than the job itself takes.
- ESR60 currently takes about 50 minutes, but we'll round up to an hour for the purposes of discussion, especially since more recent indexes take longer because they have more files.
Creating an additional web server target does have a tangible cost, but because web servers run 24/7, reserved instances make sense for them. Also, we can host many indexes on a target, so the decision to create a new target is less about the marginal cost/benefit of extracting esr60 than about the benefit of being able to host possibly older revisions and/or versions like esr68 as they age out.
A 1-year reserved instance t3.large costs $1.17 a day, so it's not a lot of money.
Indexer time is currently $0.452/hr and a reserved t3.large is $0.049/hr, which means every hour of indexer time saved covers 9.2 hours of t3.large web server operation. An additional web server therefore pays for itself once it takes over 3 mozilla-esr60-sized daily indexer jobs.
- Also, 1 hour of engineer time easily covers more than a month of operating costs.
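
The arithmetic above can be checked with a short script. This is only a sketch using the prices quoted in this comment. The variable names are made up for illustration, and the one-hour job length is the rounded-up ESR60 figure.

```python
import math

# Prices quoted in the comment above (USD per hour).
INDEXER_RATE = 0.452      # indexer machine
WEB_SERVER_RATE = 0.049   # 1-year reserved t3.large

# Assumption: one mozilla-esr60-sized indexing job, rounded up to an hour.
JOB_HOURS = 1.0

# Web server hours paid for by each indexer hour saved.
hours_covered = INDEXER_RATE / WEB_SERVER_RATE

# Daily cost of keeping the extra web server up around the clock.
web_server_per_day = WEB_SERVER_RATE * 24

# Daily indexer hours that must be saved to cover that cost.
break_even_hours = web_server_per_day / INDEXER_RATE
break_even_jobs = math.ceil(break_even_hours / JOB_HOURS)

print(f"1 indexer hour covers {hours_covered:.1f} web server hours")
print(f"web server costs ${web_server_per_day:.2f} per day")
print(f"break-even: {break_even_hours:.1f} indexer hours/day, "
      f"i.e. {break_even_jobs} jobs")
```

With the rounded hourly rate, the daily cost comes out about a cent above the quoted $1.17, presumably because that figure was derived from an unrounded rate. The break-even still lands on 3 one-hour jobs per day.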

Note that although in normal operation our automation leaves a backup web server around, it only does so if that web server is less than 1.5 days old. So we don't have to worry about manually triggered indexing runs doubling costs by leaving an out-of-date web server running.
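
The age check can be sketched as follows. Only the 1.5-day threshold comes from this comment. The function name, its arguments, and the example dates are hypothetical and not taken from the actual automation.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the comment above; everything else is hypothetical.
MAX_BACKUP_AGE = timedelta(days=1.5)

def keep_as_backup(launched_at: datetime, now: datetime) -> bool:
    """Return True if the outgoing web server is young enough to keep."""
    return now - launched_at < MAX_BACKUP_AGE

# Arbitrary example timestamps.
now = datetime(2020, 3, 1, 12, 0, tzinfo=timezone.utc)
daily = now - timedelta(days=1)       # replaced by the next daily run
archived = now - timedelta(days=30)   # replaced by a manual re-index

print(keep_as_backup(daily, now))     # True
print(keep_as_backup(archived, now))  # False
```

A web server for an archived config will almost always be older than the threshold by the time it is replaced, so it is shut down rather than kept as a backup.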

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

4 years ago

Thanks for providing that cost calculus! I had just assumed it would be better to save indexer time without doing a detailed analysis.

PRs so far for this bug:
https://github.com/mozsearch/mozsearch-mozilla/pull/74
https://github.com/mozsearch/mozsearch/pull/281

These just stand up the extra target and move ESR60 to it. Adding older ESRs should be pretty easy on top of this. Gijs, how old do you usually go with ESRs? Do we want to go all the way back to ESR17 (the oldest ESR branch on gecko-dev)?

Flags: needinfo?(gijskruitbosch+bugs)

:Gijs (he/him)

Reporter

Comment 6

•

4 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)

> These just stand up the extra target and move ESR60 to it. Adding older ESRs should be pretty easy on top of this. Gijs, how old do you usually go with ESRs? Do we want to go all the way back to ESR17 (the oldest ESR branch on gecko-dev)?

I think the oldest that'd be useful is the Firefox 3.5 or 4 release, though admittedly I don't go back that far that often. Looking at the other comments here, I don't know if there is significant cost to each ESR we'd keep this way; if there is, I'd suggest keeping every other one. Does that help?
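
As a sketch of what "every other one" would select, assuming the ESR major versions from esr17 through esr68 are 17, 24, 31, 38, 45, 52, 60 and 68 (the list and the helper name are illustrative, not read from any repository):

```python
# Assumed ESR major versions from esr17 (the oldest branch mentioned
# above) through esr68.
ESR_VERSIONS = [17, 24, 31, 38, 45, 52, 60, 68]

def every_other(versions):
    """Keep alternating entries, starting with the oldest."""
    return versions[::2]

kept = [f"esr{v}" for v in every_other(ESR_VERSIONS)]
print(kept)  # ['esr17', 'esr31', 'esr45', 'esr60']
```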

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(kats)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 7

•

4 years ago

Yep. I'll add esr17, esr32, esr45 which are alternating ESRs. If you can find me a changeset that is "Firefox 4 release" or "Firefox 3.5 release" I can add that too, although I don't see anything that looks like that at https://hg.mozilla.org/releases/mozilla-release/tags which is where I would expect to see it.

Flags: needinfo?(kats)

:Gijs (he/him)

Reporter

Comment 8

•

4 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)

> Yep. I'll add esr17, esr32, esr45 which are alternating ESRs. If you can find me a changeset that is "Firefox 4 release" or "Firefox 3.5 release" I can add that too, although I don't see anything that looks like that at https://hg.mozilla.org/releases/mozilla-release/tags which is where I would expect to see it.

https://hg.mozilla.org/releases/mozilla-release/rev/FIREFOX_4_0_1_RELEASE might be the right thing?

Looks like DXR just lists mozilla1.9.1 / .2 / mozilla2.0, which tbh I'm not sure where those come from...

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 9

•

4 years ago

(In reply to :Gijs (he/him) from comment #8)

> https://hg.mozilla.org/releases/mozilla-release/rev/FIREFOX_4_0_1_RELEASE might be the right thing?

Ah yeah that looks right. I wasn't expecting it to come after FIREFOX_5* chronologically.

Andrew Sutherland [:asuth] (he/him)

Comment 10

•

4 years ago

We might want to overhaul the help.html list of repositories to have a table that indicates what analyses/indexing is available for each repo. I presume for most of the ones we're adding we'll just get JS/XPIDL + Fulltext?

Or maybe this is something we should just have each tree indicate on the search results? Like where we currently say "Number of results: 136 (maximum is 1000)" we could say "Number of results: 136 (maximum is 1000) from searching Fulltext + JS/XPIDL Analysis but no C++/Rust Analysis". (The search mode would also change based on the use of "symbol"/"re"/"id"/"default"/"pathre" in the query.)

At least to start with, this might come from the tree config JSON explicitly having an array of analyses that are known to be available or not available. Hm, actually, if it's in the tree config, we could probably use it as part of templating the root help.html. (In the future we could potentially automate some of this, especially if we have some concept of "canary" symbols whose presence we check for in a tree to indicate that a given phase generally worked. Although I suppose we'd still want explicit expectations so we could generate alerts when a mismatch occurs.)

The general idea would be to help make sure that expectations are properly set for use of the older repositories and even modern repositories. (And as a side-effect reduce the number of bugs filed, not that that has been a problem.)
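
A minimal sketch of the results-line idea, assuming each tree's config carried an explicit "analyses" array. That key, the tree entries, and the function name are all hypothetical; none of them exist in the current config.

```python
# Hypothetical: "analyses" is the explicit per-tree array suggested above,
# not an existing mozsearch config field.
ALL_ANALYSES = ["Fulltext", "JS/XPIDL Analysis", "C++/Rust Analysis"]

TREES = {
    "mozilla-central": {"analyses": list(ALL_ANALYSES)},
    # Assumed coverage for an archived tree, per the guess above.
    "mozilla-esr17": {"analyses": ["Fulltext", "JS/XPIDL Analysis"]},
}

def results_summary(tree: str, count: int, maximum: int = 1000) -> str:
    """Build the result-count line, naming which analyses were searched."""
    available = TREES[tree]["analyses"]
    missing = [a for a in ALL_ANALYSES if a not in available]
    text = f"Number of results: {count} (maximum is {maximum})"
    text += " from searching " + " + ".join(available)
    if missing:
        text += " but no " + " or ".join(missing)
    return text

print(results_summary("mozilla-esr17", 136))
```

The same per-tree array could feed a help.html table, since both views only need to know which analyses each tree is expected to have.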

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

4 years ago

I think a simple table on help.html should be fine for now.

Also, it looks like FIREFOX_4_0_1_RELEASE is not present in the mozilla-unified repository, and therefore I can't map it to a git version in gecko-dev, assuming it's in there at all. I'm not sure what process was used to create mozilla-unified and why it seems to have some tags but not others.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 12

•

4 years ago

In fact, it's not in gecko-dev at all. So I'm going to skip that, at least for now.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 13

•

4 years ago

https://github.com/mozsearch/mozsearch-mozilla/pull/76 has one change that will need to run through the indexer before I can actually add the repos to the mozilla-archived.json config.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 14

•

4 years ago

The blame building has been going for 2.5 hours and is nowhere near done. If it causes an indexer timeout I might revert that last PR and try rewriting the transform script in Rust or something.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

4 years ago

I backed out the PR in https://github.com/mozsearch/mozsearch-mozilla/pull/77 - there were at least two problems with it. One was that I neglected to manually create the branch in the gecko-blame repo, which made me run afoul of my own warning. The other, which I realized afterwards, was that I used esr32, which isn't a thing; I meant to use esr31. But it didn't even get to that point.

While looking into that, I noticed the make-crontab.py script might be broken, since the indexer didn't have a self-shutdown crontab entry. Also, the main.sh script that calls make-crontab.py doesn't set -e (intentionally, but we can improve that), and the output from that script gets lost (also fixable), so the make-crontab.py failure was very silent. I'll file a followup bug about improving those bits.
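
To illustrate the two fixes mentioned above (stop on failure, keep the output), here is a Python sketch of the pattern. main.sh itself is a shell script, so this is an analogy, not a patch. The helper name, the log path, and the stand-in failing command are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile

def run_logged(cmd, log_path):
    """Run cmd, append its output to log_path, and raise if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    # Raises CalledProcessError on a non-zero exit status.
    result.check_returncode()
    return result

# Stand-in for a failing script; this is not the real make-crontab.py.
failing = [sys.executable, "-c",
           "import sys; print('no crontab written'); sys.exit(1)"]
log_path = os.path.join(tempfile.gettempdir(), "make-crontab.log")

try:
    run_logged(failing, log_path)
except subprocess.CalledProcessError as err:
    print(f"script failed with exit status {err.returncode}")
```

The output is written to the log before the exit status is checked, so a failure leaves evidence behind instead of disappearing.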

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 16

•

4 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)

> I backed out the PR in https://github.com/mozsearch/mozsearch-mozilla/pull/77 - there were at least two problems with it. One was that I neglected to manually create the branch in the gecko-blame repo, which made me run afoul of my own warning. The other, which I realized afterwards, was that I used esr32, which isn't a thing; I meant to use esr31. But it didn't even get to that point.

Fixed these problems and relanded: https://github.com/mozsearch/mozsearch-mozilla/pull/78

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

4 years ago

Depends on: 1620796

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 17

•

4 years ago

https://github.com/mozsearch/mozsearch-mozilla/pull/79

I deployed the new ESR repos:
https://searchfox.org/mozilla-esr45/source
https://searchfox.org/mozilla-esr31/source
https://searchfox.org/mozilla-esr17/source

Working on the update to help.html

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 18

•

4 years ago

I made changes to help.html to use a table; see https://kats.searchfox.org/ for what it looks like. It's ugly, so please suggest improvements, or edit the webserver directly to try changes.

Current changes are in:
https://github.com/mozsearch/mozsearch-mozilla/pull/80
https://github.com/mozsearch/mozsearch/pull/284

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

4 years ago

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → FIXED

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: Gijs, Assigned: kats)

Security

(public)
