Closed Bug 1686981 Opened 4 years ago Closed 5 months ago

Optimize searchfox taskcluster cron jobs for less frequently updated branches like ESR and release

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(firefox137 fixed)

RESOLVED FIXED

People

(Reporter: asuth, Assigned: jcristau)

Attachments

(3 files)

Currently the searchfox-index jobs in https://searchfox.org/mozilla-central/source/.cron.yml are run daily for the mozilla-beta, mozilla-release, and mozilla-esr78 branches. This helps ensure that we run the jobs at most once a day even when branches are experiencing a lot of pushes, but also results in us running the jobs at least once a day, even if there have been no pushes. Since the taskcluster searchfox jobs use the in-tree MozsearchIndexer.cpp rather than downloading it from an external location (or anything else from the mozsearch repo), the jobs are effectively deterministic and so extra runs of the job are wasteful beyond making sure we never encounter a situation where artifacts expire.

It would be good to address this inefficiency.

Skip running the indexing tasks if they already ran on the same revision
and the previous decision task expires in over a week.
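The rule described in this commit message can be sketched as below. This is only an illustration of the stated condition, not the in-tree implementation: the function name, its parameters, and the `MIN_REMAINING` constant are hypothetical, and the real code would obtain the previous revision and expiry from Taskcluster rather than take them as arguments.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold, taken from the wording "expires in over a week".
MIN_REMAINING = timedelta(days=7)


def should_skip_indexing(current_revision, previous_revision, previous_expires, now):
    """Return True when the indexing tasks can be skipped.

    All parameters are illustrative stand-ins for values the cron
    decision task would look up itself.
    """
    if previous_revision is None or previous_revision != current_revision:
        # No earlier run, or the branch has moved on: index again.
        return False
    # Skip only while the previous decision task has more than a week left.
    return previous_expires - now > MIN_REMAINING


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_skip_indexing("abc123", "abc123", now + timedelta(days=30), now))
```

Note that this check only looks at the expiry of the previous decision task, not at the expiry of the tasks it scheduled.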

Assignee: nobody → jcristau
Status: NEW → ASSIGNED
Pushed by jcristau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7516d1ebb8e1 optimize searchfox taskcluster cron jobs. r=taskgraph-reviewers,gbrown
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

My mental model of the expiry logic in the PR here was that we would re-run searchfox tasks if it looks like the artifacts have less than a week left on them. But in bug 1930345 we saw that this isn't the case; here's the Nov 11 cron task deciding not to re-trigger the searchfox jobs whose artifacts had expired.

Looking more closely, it looks like the logic is actually finding some earlier decision-searchfox-index task. But the first cron decision task apparently ever run for that revision still has an expiry 9 months from now, and the decision task from the other day that decided not to index things still has a ~full year left. I'm not sure I understand which route wins there, but none of them will regenerate the searchfox artifacts (which appear to expire after 3 months).

Should I file a new bug for this / is there a correct idiom we can copy for this purpose? Thanks!

Flags: needinfo?(jcristau)

Hmm, there are 2 things here:

  • with the change here we check the expiration date of the previous searchfox-index cron task; that can't work, as that task is pretty much always going to be from the previous day. We should look up the last jobs that were actually scheduled.
  • indexing tasks like https://firefox-ci-tc.services.mozilla.com/tasks/Znd1gn9cQteWOXqQETsO-A (and their artifacts) expire after a year, like the cron task. But the searchfox-index cron task also schedules a couple of source-test tasks, which do seem to have shorter expiries; are these the ones that caused the issue for cypress?
Status: RESOLVED → REOPENED
Flags: needinfo?(jcristau)
Resolution: FIXED → ---

(In reply to Julien Cristau [:jcristau] from comment #5)

Ah, yeah, the first specific error reported was for bugzilla-components.json and I erroneously assumed everything was expiring, but in fact only some of the fetches failed:

curl: (22) The requested URL returned error: 404
parallel: This job failed:
curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.cypress.revision.3f80fbb5cea5ff9c54b727195e9df32731321411.source.source-bugzilla-info/artifacts/public/components-normalized.json -o bugzilla-components.json    || curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.cypress.latest.source.source-bugzilla-info/artifacts/public/components-normalized.json -o bugzilla-components.json
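The failing command above tries the revision-specific index path first and falls back to the `latest` path. A minimal sketch of how those two URLs are formed, based only on the pattern visible in this log line: the helper name and its parameters are hypothetical, and the `.source.` infix is taken from this one job and may not apply to every artifact searchfox fetches.

```python
INDEX_ROOT = "https://firefox-ci-tc.services.mozilla.com/api/index/v1/task"


def artifact_urls(project, revision, job, artifact):
    """Return candidate URLs in fetch order: revision-specific, then `latest`.

    Hypothetical helper; the real list is built by fetch-tc-artifacts.sh.
    """
    return [
        f"{INDEX_ROOT}/gecko.v2.{project}.revision.{revision}.source.{job}/artifacts/{artifact}",
        f"{INDEX_ROOT}/gecko.v2.{project}.latest.source.{job}/artifacts/{artifact}",
    ]


if __name__ == "__main__":
    for url in artifact_urls(
        "cypress",
        "3f80fbb5cea5ff9c54b727195e9df32731321411",
        "source-bugzilla-info",
        "public/components-normalized.json",
    ):
        print(url)
```

Both URLs resolve through the index, so the fetch returns 404 once the indexed task or its artifact has expired.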

We saw 11 errors reported but only the one specific URL got logged; the fetches we issue are accumulated in fetch-tc-artifacts.sh. I'm attaching an example downloads.lst file that we build there, from a successful run yesterday. The 11 errors looked like this:

+ parallel --halt now,fail=1
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
parallel: This job failed:
Attachment #9437782 - Attachment mime type: application/octet-stream → text/plain

On repos that rarely get pushes, we don't necessarily want to re-run
indexing every day. However we need to make sure artifacts from
downstream tasks remain available for searchfox to download.

The previous approach had two issues:

  • the cron task looked at the standard taskgraph index path to find its
    previous run. That normally would have been the previous day's task,
    so the expiry logic would never kick in
  • some of the downstream tasks (searchfox kind) have medium
    expiration policy, while others (source-test) use the default, so
    assuming that everything relevant would expire at the same time as the
    cron task itself was broken

With this change, the searchfox cron task:

  • gets indexed at
    gecko.v2.{project}.revision.{revision}.searchfox-index only if it does
    schedule jobs
  • looks up the previously indexed task at that location, and checks if
    any of the tasks it scheduled are about to expire
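
The revised check can be sketched as follows. This is a simplified illustration rather than the code that landed: the function names, the `EXPIRY_MARGIN` value, and the idea of passing expiry times in as a list are all assumptions, and the real cron task would query Taskcluster for the indexed task and the tasks it scheduled.

```python
from datetime import datetime, timedelta, timezone

# Assumed margin for "about to expire"; the commit message does not give a number.
EXPIRY_MARGIN = timedelta(days=7)


def index_path(project, revision):
    """Index path where the cron task is recorded when it schedules jobs."""
    return f"gecko.v2.{project}.revision.{revision}.searchfox-index"


def needs_reindex(scheduled_task_expiries, now, margin=EXPIRY_MARGIN):
    """Decide whether indexing must run again for an unchanged revision.

    scheduled_task_expiries: expiry datetimes of the tasks scheduled by the
    previously indexed cron task, or None/empty if nothing is indexed.
    """
    if not scheduled_task_expiries:
        # Nothing indexed for this revision yet: schedule the jobs.
        return True
    # The earliest expiry decides, since downstream kinds use different policies.
    return min(scheduled_task_expiries) - now <= margin


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    expiries = [now + timedelta(days=365), now + timedelta(days=3)]
    print(index_path("cypress", "3f80fbb5cea5ff9c54b727195e9df32731321411"))
    print(needs_reindex(expiries, now))
```

Taking the minimum over all scheduled tasks is what addresses the second issue listed above: one short-lived source-test task is enough to trigger a new run.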
Pushed by jcristau@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b645aa2710dd (take 2) - don't optimize out searchfox index jobs if any downstream task is close to expiring r=taskgraph-reviewers,bhearsum
Status: REOPENED → RESOLVED
Closed: 1 year ago → 5 months ago
Resolution: --- → FIXED
Blocks: 1967518