Closed Bug 1884364 Opened 2 months ago Closed 2 months ago

reduce json-automationrelevance queries to hg.mozilla.org

Categories

(Firefox Build System :: Task Configuration, enhancement)

enhancement

Tracking

(firefox126 fixed)

RESOLVED FIXED
126 Branch
Tracking Status
firefox126 --- fixed

People

(Reporter: bhearsum, Assigned: ahal)

References

(Regressed 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

hg.mozilla.org is under great stress lately; we should stop making unnecessary queries to it. This endpoint in particular seems to be queried unconditionally every time we run ./mach taskgraph tasks or any other downstream commands. This means that we are often querying it in places where it is absolutely not necessary. As a specific example, I noticed it being queried when we run release decision tasks which definitely do not need the information it provides.

We make the call in this function, which is memoized, so we will only query it once per process (unless we get an error that we retry for).

That function is called in two places:

I suggest that we find a way to stop hitting this endpoint as part of transforms. This will avoid it being called in things like release decision tasks and other action tasks, and likely other scenarios as well (such as running taskgraph locally, and probably some other scenarios in automation I'm not considering).

@ahal - I'm needinfo'ing you because you generally have clever ideas about this sort of thing, but also because you're the defacto expert on the taskgraph-diff tasks.

@jmaher - You're the defacto expert on test configuration. Are there ways we can avoid running this transform, or defering it to a later point (like an optimization or morph)? Even if we temporarily made the chunks fixed to reduce hg load, that might be a good thing.

get_changed_files, which is used by two optimization strategies, a function only used by tests, and a function that is called by a transform to determine the number of chunks of something to run. Of these, the latter is the only concerning one. Because it is called as part of a transform, it will necessarily be run in virtually any scenario that taskgraph runs in.

Correct here: this actually only calls to hg.m.o for non-Try repos. I still think it would be useful to further limit or defer this - but it's not as critical because of this.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #0)

  • set_base_revision_in_tgdiff, which is a transform used by the taskgraph-diff tasks in the source-test kind. This is another place where we hit this endpoint virtually any time that taskgraph runs.

@ahal - I'm needinfo'ing you because you generally have clever ideas about this sort of thing, but also because you're the defacto expert on the taskgraph-diff tasks.

I think I found a good solution to this part. I'm moving it out to https://bugzilla.mozilla.org/show_bug.cgi?id=1884386.

Depends on: 1884386

What if we turned files-changed into a parameter rather than something that needs to be computed during graph generation?

Decision tasks would still need to use json-automationrelevance in decision.py, but we can avoid it in all other cases:

  1. When ./mach taskgraph locally we could rely on local vcs to populate this. E.g, use all draft commits that are ancestors of the current commit.
  2. When running locally but with --parameters task-id=<id> (or similar), you'd get the exact files-changed that the decision task computed as it's just baked into the parameters.
  3. We would hardcode dummy files changed into all the taskcluster/test/params, so the taskgraph-diff task wouldn't compute anything either. Or if we still wanted to use real files changed (which might be necessary for testing optimizations and such), we could have ./mach taskgraph insert this parameter on the fly (either using local VCS or the value in the Decision task's parameters, but never using json-automationrelevance).
  4. Transforms would just use what's in the params and also wouldn't need to compute anything.

After writing this out, this seems like a no-brainer to me! Can you think of anything I might be missing?

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #0)

That function is called in two places:

@jmaher - You're the defacto expert on test configuration. Are there ways we can avoid running this transform, or defering it to a later point (like an optimization or morph)? Even if we temporarily made the chunks fixed to reduce hg load, that might be a good thing.

Whoops, forgot to needinfo, heh.

Flags: needinfo?(jmaher)

the two spots:

  1. files-changed.py :: check - this appears to only be used by unittests; I suspect we could live without this, or keep it as this is only a unittest; we should confirm that
  2. perfile.py :: per_file_number_of_chunks - this is specifically for test-verify/test-coverage. The point of this is to find test files that were changed so we can run test-verify/coverage on those. This is why these tasks are only run periodically, when certain files are changed. If we didn't do this here, we wouldn't know the number of chunks (no files == zero chunks) and the tasks wouldn't be scheduled.

I am not sure about moving the logic from perfile.py to somewhere else. Could we live without test-coverage/test-verify- sure we could. It appears for perfile.py stuff this only invokes the files_changed when not on try; maybe we could limit this to run on try and central and skipping autoland?

Other suggestions are welcome.

Flags: needinfo?(jmaher)

I was having problems getting 'responses' to work with the unittest style tests
and decided it would be easier to just convert it to Pytest.

We use hg.m.o's json-automationrelevance endpoint for a variety of reasons
such as getting the files changed for optimization purposes, or finding the
base revision for diff purposes. But this endpoint is slow and puts undue load
on hg.mozilla.org if queried too often.

The helper function that fetches this is memoized, so in theory we should only
ever make this request once per graph generation. However, there are still cases
where we request this unnecessarily:

  1. When running ./mach taskgraph locally, we first fetch
    json-automationrelevance and then fall back to fetching it locally if the
    revision wasn't found. I believe the reason for this is to be able to generate
    identical graphs as produced by CI.

  2. When specifying multiple parameters (so graphs are generated in parallel),
    the memoize won't cache across processes, so we make the request once per
    parameter set.

  3. Any other time we generate tasks outside the context of a Decision task (e.g
    ./mach try), as there are transforms that call this function.

By turning files_changed into a parameter, we can ensure that this value gets
"frozen" by the Decision task and it will never need to be recomputed. E.g, you
could use -p task-id=<decision id> and you'd still get the files_changed
value that Decision task computed. This means, that for all non-Decision use
cases we can rely on local VCS to give us our changed files.

This should greatly cut back on the number of queries being made to hg.m.o.

Assignee: nobody → ahal
Attachment #9390304 - Attachment description: WIP: Bug 1884364 - Convert gecko_taskgraph/test/test_decision.py to pytest format, r?#taskgraph-reviewers! → Bug 1884364 - Convert gecko_taskgraph/test/test_decision.py to pytest format, r?#taskgraph-reviewers!
Status: NEW → ASSIGNED
Attachment #9390305 - Attachment description: WIP: Bug 1884364 - Create a new 'files_changed' parameter, r?#taskgraph-reviewers! → Bug 1884364 - Create a new 'files_changed' parameter, r?#taskgraph-reviewers!

Previously we were using json-automationrelevance to compute the base_rev.
But recently we improved the Decision task to be smarter about determining
this, so the value already baked into the parameters should be sufficient.

Note that bug 1884386 already removed the transform that was calling
get_json_automationrelevance. Rather than re-adding it, this patch uses the
task_context transforms to grab the value from parameters instead.

This additionally adds the --force-local-files-changed flag to the command,
as we might miss something if we just used the fake values baked into the test
params.

See Also: → 1885036
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0c89ad084b0d
Convert gecko_taskgraph/test/test_decision.py to pytest format, r=taskgraph-reviewers,gbrown
https://hg.mozilla.org/integration/autoland/rev/85241d2d265c
Create a new 'files_changed' parameter, r=taskgraph-reviewers,releng-reviewers,jcristau
https://hg.mozilla.org/integration/autoland/rev/d6232354e9f3
Add an option to |mach taskgraph| to force using locally changed files, r=taskgraph-reviewers,jcristau
https://hg.mozilla.org/integration/autoland/rev/60a6ed6de4ae
Re-enable tgdiff task and force local files changed, r=taskgraph-reviewers,jcristau

Backed out for causing py3 unit test failures

Backout link: https://hg.mozilla.org/integration/autoland/rev/719fcbd9cc2a0730856fd730ef31f17f6c00a005

Push with failures

Failure log

taskcluster/test/test_mach_try_auto.py::test_only_important_manifests PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_scheduled[mochitest-browser-chrome] TEST-UNEXPECTED-FAIL
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no shippable builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no fuzzing builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no ccov builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no build-signing] TEST-UNEXPECTED-FAIL
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no upload-symbols] PASSED
Flags: needinfo?(ahal)
See Also: → 1886196
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3a11f386f93
Convert gecko_taskgraph/test/test_decision.py to pytest format, r=taskgraph-reviewers,gbrown
https://hg.mozilla.org/integration/autoland/rev/6bae790de5de
Create a new 'files_changed' parameter, r=taskgraph-reviewers,releng-reviewers,jcristau
https://hg.mozilla.org/integration/autoland/rev/0c6f696169f0
Add an option to |mach taskgraph| to force using locally changed files, r=taskgraph-reviewers,jcristau
https://hg.mozilla.org/integration/autoland/rev/5ed9438f3dc0
Re-enable tgdiff task and force local files changed, r=taskgraph-reviewers,jcristau
https://hg.mozilla.org/integration/autoland/rev/dc701f4ef6c4
Use local files_changed in Taskcluster integration tests, r=taskgraph-reviewers,bhearsum
Regressions: 1886230

Just wanted to come in here and say thanks for taking this on to help with the load issues on hg.mozilla.org, it is much appreciated.

Attachment #9391776 - Attachment is obsolete: true

Np! Has it made a noticeable difference?

Flags: needinfo?(ahal)
See Also: → 1891768
Regressions: 1896978
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: