reduce json-automationrelevance queries to hg.mozilla.org
Categories
(Firefox Build System :: Task Configuration, enhancement)
Tracking
(firefox126 fixed)
Tracking | Status | |
---|---|---|
firefox126 | --- | fixed |
People
(Reporter: bhearsum, Assigned: ahal)
References
(Regressed 1 open bug)
Details
Attachments
(4 files, 1 obsolete file)
hg.mozilla.org is under great stress lately; we should stop making unnecessary queries to it. This endpoint in particular seems to be queried unconditionally every time we run ./mach taskgraph tasks
or any other downstream commands. This means that we are often querying it in places where it is absolutely not necessary. As a specific example, I noticed it being queried when we run release decision tasks which definitely do not need the information it provides.
We make the call in this function, which is memoized, so we will only query it once per process (unless we get an error that we retry for).
That function is called in two places:
- get_changed_files, which is used by two optimization strategies, a function only used by tests, and a function that is called by a transform to determine the number of chunks of something to run. Of these, the latter is the only concerning one. Because it is called as part of a transform, it will necessarily be run in virtually any scenario that taskgraph runs in.
- set_base_revision_in_tgdiff, which is a transform used by the
taskgraph-diff
tasks in thesource-test
kind. This is another place where we hit this endpoint virtually any time that taskgraph runs.
I suggest that we find a way to stop hitting this endpoint as part of transforms. This will avoid it being called in things like release decision tasks and other action tasks, and likely other scenarios as well (such as running taskgraph locally, and probably some other scenarios in automation I'm not considering).
@ahal - I'm needinfo'ing you because you generally have clever ideas about this sort of thing, but also because you're the defacto expert on the taskgraph-diff
tasks.
@jmaher - You're the defacto expert on test configuration. Are there ways we can avoid running this transform, or defering it to a later point (like an optimization or morph)? Even if we temporarily made the chunks fixed to reduce hg load, that might be a good thing.
Reporter | ||
Comment 1•2 months ago
|
||
get_changed_files, which is used by two optimization strategies, a function only used by tests, and a function that is called by a transform to determine the number of chunks of something to run. Of these, the latter is the only concerning one. Because it is called as part of a transform, it will necessarily be run in virtually any scenario that taskgraph runs in.
Correct here: this actually only calls to hg.m.o for non-Try repos. I still think it would be useful to further limit or defer this - but it's not as critical because of this.
Reporter | ||
Comment 2•2 months ago
|
||
(In reply to bhearsum@mozilla.com (:bhearsum) from comment #0)
- set_base_revision_in_tgdiff, which is a transform used by the
taskgraph-diff
tasks in thesource-test
kind. This is another place where we hit this endpoint virtually any time that taskgraph runs.
@ahal - I'm needinfo'ing you because you generally have clever ideas about this sort of thing, but also because you're the defacto expert on the
taskgraph-diff
tasks.
I think I found a good solution to this part. I'm moving it out to https://bugzilla.mozilla.org/show_bug.cgi?id=1884386.
Assignee | ||
Comment 3•2 months ago
|
||
What if we turned files-changed
into a parameter rather than something that needs to be computed during graph generation?
Decision tasks would still need to use json-automationrelevance
in decision.py
, but we can avoid it in all other cases:
- When
./mach taskgraph
locally we could rely on local vcs to populate this. E.g, use all draft commits that are ancestors of the current commit. - When running locally but with
--parameters task-id=<id>
(or similar), you'd get the exact files-changed that the decision task computed as it's just baked into the parameters. - We would hardcode dummy files changed into all the
taskcluster/test/params
, so thetaskgraph-diff
task wouldn't compute anything either. Or if we still wanted to use real files changed (which might be necessary for testing optimizations and such), we could have./mach taskgraph
insert this parameter on the fly (either using local VCS or the value in the Decision task's parameters, but never usingjson-automationrelevance
). - Transforms would just use what's in the params and also wouldn't need to compute anything.
After writing this out, this seems like a no-brainer to me! Can you think of anything I might be missing?
Reporter | ||
Comment 4•2 months ago
|
||
(In reply to bhearsum@mozilla.com (:bhearsum) from comment #0)
That function is called in two places:
- get_changed_files, which is used by two optimization strategies, a function only used by tests, and a function that is called by a transform to determine the number of chunks of something to run. Of these, the latter is the only concerning one. Because it is called as part of a transform, it will necessarily be run in virtually any scenario that taskgraph runs in.
@jmaher - You're the defacto expert on test configuration. Are there ways we can avoid running this transform, or defering it to a later point (like an optimization or morph)? Even if we temporarily made the chunks fixed to reduce hg load, that might be a good thing.
Whoops, forgot to needinfo, heh.
Comment 5•2 months ago
|
||
the two spots:
- files-changed.py :: check - this appears to only be used by unittests; I suspect we could live without this, or keep it as this is only a unittest; we should confirm that
- perfile.py :: per_file_number_of_chunks - this is specifically for test-verify/test-coverage. The point of this is to find test files that were changed so we can run test-verify/coverage on those. This is why these tasks are only run periodically, when certain files are changed. If we didn't do this here, we wouldn't know the number of chunks (no files == zero chunks) and the tasks wouldn't be scheduled.
I am not sure about moving the logic from perfile.py to somewhere else. Could we live without test-coverage/test-verify- sure we could. It appears for perfile.py stuff this only invokes the files_changed when not on try; maybe we could limit this to run on try and central and skipping autoland?
Other suggestions are welcome.
Assignee | ||
Comment 6•2 months ago
|
||
I was having problems getting 'responses' to work with the unittest style tests
and decided it would be easier to just convert it to Pytest.
Assignee | ||
Comment 7•2 months ago
|
||
We use hg.m.o's json-automationrelevance
endpoint for a variety of reasons
such as getting the files changed for optimization purposes, or finding the
base revision for diff purposes. But this endpoint is slow and puts undue load
on hg.mozilla.org if queried too often.
The helper function that fetches this is memoized, so in theory we should only
ever make this request once per graph generation. However, there are still cases
where we request this unnecessarily:
-
When running
./mach taskgraph
locally, we first fetch
json-automationrelevance
and then fall back to fetching it locally if the
revision wasn't found. I believe the reason for this is to be able to generate
identical graphs as produced by CI. -
When specifying multiple parameters (so graphs are generated in parallel),
the memoize won't cache across processes, so we make the request once per
parameter set. -
Any other time we generate tasks outside the context of a Decision task (e.g
./mach try
), as there are transforms that call this function.
By turning files_changed
into a parameter, we can ensure that this value gets
"frozen" by the Decision task and it will never need to be recomputed. E.g, you
could use -p task-id=<decision id>
and you'd still get the files_changed
value that Decision task computed. This means, that for all non-Decision use
cases we can rely on local VCS to give us our changed files.
This should greatly cut back on the number of queries being made to hg.m.o
.
Updated•2 months ago
|
Updated•2 months ago
|
Assignee | ||
Comment 8•2 months ago
|
||
Assignee | ||
Comment 9•2 months ago
|
||
Previously we were using json-automationrelevance
to compute the base_rev.
But recently we improved the Decision task to be smarter about determining
this, so the value already baked into the parameters should be sufficient.
Note that bug 1884386 already removed the transform that was calling
get_json_automationrelevance
. Rather than re-adding it, this patch uses the
task_context
transforms to grab the value from parameters instead.
This additionally adds the --force-local-files-changed
flag to the command,
as we might miss something if we just used the fake values baked into the test
params.
Comment 10•2 months ago
|
||
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0c89ad084b0d Convert gecko_taskgraph/test/test_decision.py to pytest format, r=taskgraph-reviewers,gbrown https://hg.mozilla.org/integration/autoland/rev/85241d2d265c Create a new 'files_changed' parameter, r=taskgraph-reviewers,releng-reviewers,jcristau https://hg.mozilla.org/integration/autoland/rev/d6232354e9f3 Add an option to |mach taskgraph| to force using locally changed files, r=taskgraph-reviewers,jcristau https://hg.mozilla.org/integration/autoland/rev/60a6ed6de4ae Re-enable tgdiff task and force local files changed, r=taskgraph-reviewers,jcristau
Comment 11•2 months ago
|
||
Backed out for causing py3 unit test failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/719fcbd9cc2a0730856fd730ef31f17f6c00a005
taskcluster/test/test_mach_try_auto.py::test_only_important_manifests PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_scheduled[mochitest-browser-chrome] TEST-UNEXPECTED-FAIL
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no shippable builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no fuzzing builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no ccov builds] PASSED
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no build-signing] TEST-UNEXPECTED-FAIL
[task 2024-03-14T16:28:52.831Z] taskcluster/test/test_mach_try_auto.py::test_tasks_are_not_scheduled[no upload-symbols] PASSED
Assignee | ||
Comment 12•2 months ago
|
||
Comment 13•2 months ago
|
||
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d3a11f386f93 Convert gecko_taskgraph/test/test_decision.py to pytest format, r=taskgraph-reviewers,gbrown https://hg.mozilla.org/integration/autoland/rev/6bae790de5de Create a new 'files_changed' parameter, r=taskgraph-reviewers,releng-reviewers,jcristau https://hg.mozilla.org/integration/autoland/rev/0c6f696169f0 Add an option to |mach taskgraph| to force using locally changed files, r=taskgraph-reviewers,jcristau https://hg.mozilla.org/integration/autoland/rev/5ed9438f3dc0 Re-enable tgdiff task and force local files changed, r=taskgraph-reviewers,jcristau https://hg.mozilla.org/integration/autoland/rev/dc701f4ef6c4 Use local files_changed in Taskcluster integration tests, r=taskgraph-reviewers,bhearsum
Comment 14•2 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d3a11f386f93
https://hg.mozilla.org/mozilla-central/rev/6bae790de5de
https://hg.mozilla.org/mozilla-central/rev/0c6f696169f0
https://hg.mozilla.org/mozilla-central/rev/5ed9438f3dc0
https://hg.mozilla.org/mozilla-central/rev/dc701f4ef6c4
Comment 15•2 months ago
|
||
Just wanted to come in here and say thanks for taking this on to help with the load issues on hg.mozilla.org, it is much appreciated.
Updated•2 months ago
|
Assignee | ||
Comment 16•2 months ago
|
||
Np! Has it made a noticeable difference?
Assignee | ||
Updated•2 months ago
|
Description
•