Closed Bug 1842632 Opened 2 years ago Closed 2 years ago

Searchfox doesn't display the right files on mozilla-beta

Categories

(Webtools :: Searchfox, defect)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: diannaS, Assigned: kats)

References

Details

Attachments

(1 file)

STR:
(Trying to see the changes from bug 1842075 (this bug should only be on mozilla-central) )
-Visit https://searchfox.org/mozilla-beta/source/browser/base/content/browser.js#9855
-Notice the file doesn't match what is currently on the tip of beta
https://hg.mozilla.org/releases/mozilla-beta/file/tip/browser/base/content/browser.js#l9915

It seems search fox is displaying the mozilla-central file instead

This indeed seems similar to the see-also'd bug 1840841; on July 9th the config2 indexer sent an email with:

Searchfox produced warnings during indexing! The first 50 warnings:

  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3/dist-packages/mercurial/url.py", line 449, in https_open
    return self.do_open(self._makeconnection, req)
  File "/usr/lib/python3/dist-packages/mercurial/keepalive.py", line 253, in do_open
    raise urlerr.urlerror(err)
urllib.error.URLError: <urlopen error TLS/SSL connection has been closed (EOF) (_ssl.c:997)>
error: could not fetch esr17
+ inhibit_upload
+ touch /mnt/index-scratch/mozilla-beta/INHIBIT_UPLOAD
+ return 1
+ handle_tree_error 'tree setup script'
+ local 'msg=tree setup script'
+ echo 'warning: Tree '\''mozilla-beta'\'' error: tree setup script'
warning: Tree 'mozilla-beta' error: tree setup script

Given that hg.mozilla.org seems frequently unreliable, the simplest mitigation may be to change the "on_error" directives for all m-c branches in https://github.com/mozsearch/mozsearch-mozilla/blob/master/config2.json to "halt" like has been done for m-c trunk in https://github.com/mozsearch/mozsearch-mozilla/blob/master/config1.json.

I think things may have self-corrected on the subsequent config2 run, as clicking on the hash from the "showing" line just below the search box that shows the tip revision when browsing "/source/" of https://searchfox.org/mozilla-beta/commit/71168bf626d7b1def63b264a1ff4fbea00b66f5e provides an appropriate looking commit and the link of https://hg.mozilla.org/releases/mozilla-beta/rev/5e3818f9b90d68871a703f28d7b2874e8043a7e0 seems correct if clicked on. (It's likely that in the broken state experienced, a mozilla-beta link would still be generated, but it would be invalid if clicked on assuming mozilla-beta doesn't have access to the underlying commits.)

I looked into the code a bit more and confirmed my suspicion that this is basically bad fallback handling of this particular error. What happens is that this line errors out, causing the remainder of the script to get skipped, but then the line here logs the error and continues. So the rest of the indexing stuff proceeds, but the git repo is still checked out at the master branch. This ends up rendering the m-c code instead of the relevant branch.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #1)

Given that hg.mozilla.org seems frequently unreliable, the simplest mitigation may be to change the "on_error" directives for all m-c branches in https://github.com/mozsearch/mozsearch-mozilla/blob/master/config2.json to "halt" like has been done for m-c trunk in https://github.com/mozsearch/mozsearch-mozilla/blob/master/config1.json.

Yeah this seems like the best solution, at least for now. It does mean that any time there's an error pulling from hg.m.o like this, all the other repos on that indexer don't get updated either. Which is maybe acceptable, given that it seems intermittent and the other repos aren't as important as m-c.

Another option I considered was to check out the git/blame repos to the last-known commit on the relevant branch before trying to pull from hg.m.o. So that would ensure that even if the fetch fails and the rest of the script aborts, at least the fallback behaviour renders content from the right branch, even if it's not the latest commit on that branch for which we have searchfox. However this would result in misalignment between the searchfox artifacts we fetch from taskcluster (which use $INDEXED_HG_REV) and the git repo state (which would potentially be some other revision) and that will likely cause other problems.

I guess another option would be to always halt the indexer if any repo errors out during the setup phase, by modifying the handle_tree_error clause here. That might be better because (a) it will handle similar problems in other repos where the setup phase fails, and (b) will still continue if a later phase (say rust analysis) fails for one of the m-c repos.

Attached file GitHub Pull Request
Assignee: nobody → kats

Went with that approach, but would like a second opinion before merging. It's hard to test too since I can't trigger the failure mode on demand.

Thank you!

(In reply to Kartikaya Gupta (email:kats@mozilla.staktrace.com) from comment #5)

Went with that approach, but would like a second opinion before merging. It's hard to test too since I can't trigger the failure mode on demand.

I think your choice to make setup errors fatal was definitely the right one. My decision for it to keep going was somewhat dubious in the first place, but I never really imagined that hg.mozilla.org would be a consistent problem where the failure mode is extra bad. I think bugs like bug 1730923 about sourceforge were primarily on my mind, and their failure modes wouldn't involve branch jumping (which I don't think occurred to me).

It seems like if we were going to make enhancements here, the next steps might be:

  1. Add some hg or git wrappers or shell retry blocks (or function helpers or using GNU parallel like in this stackexchange post around the revision control stuff involving remote servers. We could end up waiting potentially a long time for the retries, but I don't think the cost of the indexer taking an extra hour to fail is remotely a concern at this point.
  2. Take a page from treeherder/taskcluster and have a concept of a Purple, infrastructure exception which would re-trigger the run. This is potentially a bit scary since self re-triggering opens up new possibilities of pathological failure modes. Also, at some point, one is just reinventing taskcluster or other similar tools. Also, in general this just ends up being a very resource-intensive sleep delay since usually all we need to do is wait a bit and retry the git/hg option.

That said, I don't think it's actually necessary to make any enhancements. The current failure rate seems relatively low, it was just that the failure mode where it was the wrong branch was extra misleading and a much worse outcome than stale data for a day.

Agreed. I think for now I'll close this bug and if we see this erroring out frequently we can look into the enhancements. I still have bug 1841389 open against the hg.m.o component to hopefully get this addressed on the server side, but there hasn't been any activity there.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: