Pushes to try with --trigger-tests should make us schedule TC test jobs as many times as requested

RESOLVED FIXED in Firefox 47

Status

Testing
General
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: armenzg, Assigned: armenzg)

Tracking

unspecified
mozilla47
Points:
---

Firefox Tracking Flags

(firefox47 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
We currently accomplish this on the Buildbot side by having a pulse listener watch for '--times' in the try syntax and schedule the extra jobs.

Is there something we can put in a task definition to schedule a job multiple times instead of just once?
Or should we create one task for each of the N times indicated with --times?

We could also change the pulse monitor to do this instead of doing it through the Gecko decision task.

Preference? Recommendations?

Comment 1

2 years ago
When "--times" is specified, does it schedule *all* jobs that many times?  If so, it sounds like the integration component that schedules the decision task (mozilla-taskcluster) would just do so N times, and you'd get a graph for each of them.  I don't think we should add them all to the same graph, because there is an upper limit to the number of tasks that can be in a graph.
(Assignee)

Comment 2

2 years ago
We schedule the test jobs N times. Not the builds.

Comment 3

2 years ago
We should be able to do this within the decision task: parse out that try flag and, when iterating over the tests, add each one to the graph N times.
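A minimal sketch of that idea, assuming the flag is declared the way the reviewed diff declares it (`--rebuild`, `type=int`, `default=1`; later renamed `--trigger-tests`). The helpers `parse_repeat_count` and `expand_tests` are hypothetical, not the actual decision-task code: builds are added once and every test is appended N times. With the try message and suites from comment 12 this yields 1 build + 5 suites × 3 = 16 tasks, matching the count reported there.

```python
import argparse
import shlex


def parse_repeat_count(message):
    """Return how many times each test job should be scheduled (hypothetical helper)."""
    # Only the text after "try:" is try syntax.
    _, _, syntax = message.partition("try:")
    parser = argparse.ArgumentParser(add_help=False)
    # Same declaration as in the reviewed diff (commit_parser.py).
    parser.add_argument("--rebuild", dest="rebuild", type=int, default=1)
    # parse_known_args ignores the other try options (-b, -p, -u, -t, ...).
    args, _unknown = parser.parse_known_args(shlex.split(syntax))
    return args.rebuild


def expand_tests(builds, tests, times):
    """Builds are added once; each test is appended `times` times (hypothetical helper)."""
    graph = list(builds)
    for test in tests:
        for _ in range(times):
            graph.append(test)
    return graph


if __name__ == "__main__":
    msg = "try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3"
    suites = [
        "[TC] Linux64 mochitest-plain e10s 3",
        "[TC] Linux64 mochitest-plain 3",
        "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)",
        "[TC] Linux64 mochitest-browser-chrome M(bc3)",
        "[TC] Linux64 mochitest-chrome 3",
    ]
    tasks = expand_tests(["[TC] Linux64 Dbg"], suites, parse_repeat_count(msg))
    print(len(tasks))
```

Note that the tasks still need distinct task IDs before submission; the list above only models how many entries end up in the graph.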
(Assignee)

Comment 4

2 years ago
What is the upper limit of tasks for a graph?

We want to run an experiment this weekend with 100 jobs per task.
I think I can create graphs from mozci. If I know the limit I will make sure not to create graphs bigger than that.

On another note, would it be better to create independent tasks instead of graphs of tasks?
Assignee: nobody → armenzg

Comment 5

2 years ago
I was told that the upper limit would be somewhere around 1300 tasks but you could experience issues before that is reached depending on the size of the graph.

These tests would need to be part of the same graph as the build so that they wait for the build to complete before running.  With our current scheduler, a task cannot require a task from another graph to be completed before running.
1300 tasks: is that the total number of jobs we can run for a given push, or is it per platform?  One of the goals of switching to TaskCluster is being able to run in smaller chunks, which would imply about 250 jobs for each platform; linux32/64/asan opt/debug already puts us at 1250, and that is just for Linux.  Our current chunking, which is limited by Buildbot builder names, is not ideal and leaves us with about 50 jobs per platform (except for Android, which has more!).

Comment 7

2 years ago
Jonas, is the rough limit of 1300 tasks for all tasks within a graph or the number of tasks that could be submitted to extend the graph at a single time?

If I recall correctly, the task limit is related to the size of the document we can store in Azure table storage: beyond about 1300 tasks it becomes too large to be stored, but I might be mistaken.  I'll let Jonas weigh in.
Flags: needinfo?(jopsen)
(Assignee)

Comment 8

2 years ago
I believe we could schedule a graph per platform if we wanted to (a build with its associated tests).
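A rough sketch of the per-platform split suggested above, assuming tasks are available as `(platform, name)` pairs. `split_by_platform` is a hypothetical helper, and the 1300 figure is the approximate limit quoted in comment 5, not an official constant. Each platform's build and its tests stay in the same graph, so the "tests wait for the build" dependency from comment 5 is preserved.

```python
from collections import defaultdict

# Approximate limit quoted in this bug; not an official TaskCluster constant.
APPROX_MAX_TASKS_PER_GRAPH = 1300


def split_by_platform(tasks):
    """Group (platform, name) pairs into one graph per platform (hypothetical helper)."""
    graphs = defaultdict(list)
    for platform, name in tasks:
        graphs[platform].append(name)
    # Fail early instead of submitting a graph the scheduler may reject.
    for platform, names in graphs.items():
        if len(names) > APPROX_MAX_TASKS_PER_GRAPH:
            raise ValueError(
                f"{platform}: {len(names)} tasks exceeds ~{APPROX_MAX_TASKS_PER_GRAPH}"
            )
    return dict(graphs)
```

Splitting this way trades one large graph for several smaller ones, which only helps if no single platform's build plus its repeated tests exceeds the limit on its own.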
(Assignee)

Comment 9

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d246e5981927
(Assignee)

Comment 10

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ded063aded6e
(Assignee)

Comment 11

2 years ago
Created attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

Review commit: https://reviewboard.mozilla.org/r/32797/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32797/
(Assignee)

Comment 12

2 years ago
Hi garndt, I believe my patch is correct; however, the Gecko decision task discards the extra tasks.

If you look at my local output [1], you will see that mochitest-chrome 3 appears three times; however, the scheduled graph lists it only once [2]. Counting how many tasks should have been scheduled gives 16 [3], but I only see 6 on the graph.

Would you mind having a look at what happens on the scheduling side?



[1]
armenzg@armenzg-thinkpad:~/repos/mozilla-central$ ./mach taskcluster-graph --pushlog-id=107023 --project=try '--message=try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3' --owner=armenzg@mozilla.com --level=1 --revision-hash=25014ae86fb9e850ff07ffa4b829b8f6be06455d --extend-graph | grep "mochitest-chrome 3"
Querying URL for pushdate: None/json-pushes?changeset=None
Error querying pushinfo for repository 'None' revision 'None'
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3",

[2] https://tools.taskcluster.net/task-graph-inspector/#chSsS1XkQ36QlHs1eJbpkg/

[3]
armenzg@armenzg-thinkpad:~/repos/mozilla-central$ ./mach taskcluster-graph --pushlog-id=107023 --project=try '--message=try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3' --owner=armenzg@mozilla.com --level=1 --revision-hash=25014ae86fb9e850ff07ffa4b829b8f6be06455d --extend-graph | grep '"name"' | grep "TC"
Querying URL for pushdate: None/json-pushes?changeset=None
Error querying pushinfo for repository 'None' revision 'None'
                    "name": "[TC] Linux64 Dbg", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3",
Flags: needinfo?(garndt)
Each of those tasks must have a unique task ID; otherwise the queue thinks the same task is being submitted again and ignores it.
Flags: needinfo?(garndt)
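To illustrate why repeated entries collapse, here is a small sketch assuming slugid-style task IDs (a random v4 UUID encoded as unpadded URL-safe base64, 22 characters). `new_task_id` only approximates that format, and `submit` is a hypothetical stand-in for the queue that keeps the first task seen for a given ID; it is not the TaskCluster client API.

```python
import base64
import uuid


def new_task_id():
    """22-character URL-safe ID from a random UUID (approximates the slugid format)."""
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


def submit(queue, task_id, task):
    """Hypothetical queue stand-in: a repeated task ID is treated as the same task."""
    queue.setdefault(task_id, task)


if __name__ == "__main__":
    task = {"name": "[TC] Linux64 mochitest-chrome 3"}
    queue = {}
    reused = new_task_id()
    for _ in range(3):
        submit(queue, reused, task)       # same ID three times: one entry
    for _ in range(3):
        submit(queue, new_task_id(), task)  # fresh ID each time: three more
    print(len(queue))
```

The practical rule is simply to generate a fresh ID inside the repeat loop rather than once per test definition.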
(Assignee)

Comment 14

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=831171e83b91
(Assignee)

Comment 15

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=70b759d1a66e
> Jonas, is the rough limit of 1300 tasks for all tasks within a graph
Max number of tasks in a graph, regardless of how many times you call extend.

Note: the big-graph scheduler won't have this limitation, so the limitation is going away.
Flags: needinfo?(jopsen)
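Since the limit applies to the whole graph regardless of how many times it is extended, a caller has to track the cumulative total rather than each batch. A minimal sketch, assuming the approximate 1300 figure from this thread; `GraphSizeTracker` is a hypothetical helper, not part of any TaskCluster client.

```python
# Approximate figure from this bug, not an official constant.
APPROX_MAX_TASKS = 1300


class GraphSizeTracker:
    """Hypothetical helper: counts one graph's tasks across the initial
    submission and every later extend call."""

    def __init__(self, limit=APPROX_MAX_TASKS):
        self.limit = limit
        self.total = 0

    def add(self, batch_size):
        # The limit is on the cumulative total, not on each extend batch.
        if self.total + batch_size > self.limit:
            raise RuntimeError(
                f"graph would hold {self.total + batch_size} tasks "
                f"(limit ~{self.limit})"
            )
        self.total += batch_size
        return self.total
```

For example, an initial graph of 16 tasks followed by two extends of 700 tasks each would be rejected on the second extend (16 + 700 + 700 = 1416).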
Will the big-graph scheduler require changes to how we currently hack the graph?  What is the timeline for big-graph?
(Assignee)

Updated

2 years ago
Attachment #8713652 - Attachment description: MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. → MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. r=garndt
Attachment #8713652 - Flags: review?(garndt)
(Assignee)

Comment 18

2 years ago
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32797/diff/1-2/
(Assignee)

Updated

2 years ago
Summary: Pushes to try with --times should make us schedule test jobs as many times as requested → Pushes to try with --rebuild should make us schedule TC test jobs as many times as requested
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

https://reviewboard.mozilla.org/r/32797/#review29681

Just a nit about the name of the option, but I don't have a better name to suggest, and I think we can get away with not needing to copy the test_task each time.

::: testing/taskcluster/mach_commands.py:545
(Diff revision 2)
> +                        test_task = copy.deepcopy(test_task)

Is there a need to do any kind of copy of this or couldn't we just update the task ID and append it?

::: testing/taskcluster/taskcluster_graph/commit_parser.py:26
(Diff revision 2)
> -def escape_whitspace_in_brackets(input_str):
> +def escape_whitespace_in_brackets(input_str):

ah thanks for catching this!

::: testing/taskcluster/taskcluster_graph/commit_parser.py:262
(Diff revision 2)
> +    parser.add_argument('--rebuild', dest='rebuild', type=int, default=1)

This option being called "rebuild" makes it seem like we are actually rebuilding something when really we're just running the tests N times.
Attachment #8713652 - Flags: review?(garndt) → review+
(Assignee)

Comment 20

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c42f0fa7b6d
(Assignee)

Comment 21

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f49d9421358
(Assignee)

Comment 22

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef04f3a5948a
(Assignee)

Comment 23

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4c8e0fe852d
(Assignee)

Comment 24

2 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=14cd0e7aad77
(Assignee)

Comment 25

2 years ago
(In reply to Greg Arndt [:garndt] from comment #19)
..
> ::: testing/taskcluster/mach_commands.py:545
> (Diff revision 2)
> > +                        test_task = copy.deepcopy(test_task)
> 
> Is there a need to do any kind of copy of this or couldn't we just update
> the task ID and append it?
> 
Yes; otherwise all three tasks end up with the task ID I assigned last.
The same task object is referenced from the graph each time it is appended, hence the need for a deepcopy.
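A small sketch of the aliasing problem described above, assuming a simplified task shape with a top-level `taskId` (the real graph entries contain more fields). `repeat_test_task` is a hypothetical helper. Without a copy, all appended entries are the same dict, so they all show the last ID; with `copy.deepcopy` each entry is independent. A shallow `copy.copy` would be enough if only the top-level `taskId` changed, but deepcopy also keeps nested fields from being shared between the copies.

```python
import copy


def repeat_test_task(test_task, times, make_task_id, use_deepcopy=True):
    """Append `times` entries for one test task, each meant to have its own taskId."""
    graph_tasks = []
    for i in range(times):
        # Without a copy every entry is the same dict object, so each new
        # taskId overwrites the previous one on all entries.
        task = copy.deepcopy(test_task) if use_deepcopy else test_task
        task["taskId"] = make_task_id(i)
        graph_tasks.append(task)
    return graph_tasks


if __name__ == "__main__":
    base = {"task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}}
    shared = repeat_test_task(base, 3, lambda i: f"id-{i}", use_deepcopy=False)
    print([t["taskId"] for t in shared])  # every entry shows the last ID

    fresh = {"task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}}
    copied = repeat_test_task(fresh, 3, lambda i: f"id-{i}")
    print([t["taskId"] for t in copied])  # one distinct ID per entry
```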

> ::: testing/taskcluster/taskcluster_graph/commit_parser.py:262
> (Diff revision 2)
> > +    parser.add_argument('--rebuild', dest='rebuild', type=int, default=1)
> 
> This option being called "rebuild" makes it seem like we are actually
> rebuilding something when really we're just running the tests N times.

I was reusing the name chmanchester used in his trigger bot.
I believe --trigger-tests is the compromise we came up with.

I will land with that naming and file bugs for the other tools.
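A hypothetical sketch of how the renamed option could be declared with argparse while still accepting the older `--rebuild` spelling, using argparse's support for several option strings on one argument. Whether the landed patch kept such an alias is not stated in this bug; `make_try_parser` and `trigger_count` are illustrative names only.

```python
import argparse


def make_try_parser():
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical: new spelling listed first, "--rebuild" kept as an alias.
    # Both spellings store into the same destination.
    parser.add_argument("--trigger-tests", "--rebuild", dest="trigger_tests",
                        type=int, default=1)
    return parser


def trigger_count(args):
    """Return the requested repeat count, ignoring unrelated try options."""
    known, _unknown = make_try_parser().parse_known_args(args)
    return known.trigger_tests


if __name__ == "__main__":
    print(trigger_count(["-b", "d", "-p", "linux64", "--trigger-tests", "5"]))
    print(trigger_count(["--rebuild", "2"]))
    print(trigger_count(["-p", "linux64"]))  # default when the flag is absent
```

Keeping an alias avoids breaking tools (such as the trigger bot) that already emit the old spelling, at the cost of two names for one behavior.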
(Assignee)

Updated

2 years ago
Summary: Pushes to try with --rebuild should make us schedule TC test jobs as many times as requested → Pushes to try with --trigger-tests should make us schedule TC test jobs as many times as requested
(Assignee)

Updated

2 years ago
Attachment #8713652 - Attachment description: MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. r=garndt → MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt
(Assignee)

Comment 26

2 years ago
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32797/diff/2-3/
https://reviewboard.mozilla.org/r/32797/#review29779
(Assignee)

Comment 28

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/de6d4626e415e1aa93dc00209197e6daeaf15ddd
Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt
(Assignee)

Comment 29

2 years ago
https://reviewboard.mozilla.org/r/32797/#review29681

> This option being called "rebuild" makes it seem like we are actually rebuilding something when really we're just running the tests N times.

I replied to this in the bug. We will use --trigger-tests instead.

Comment 30

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/de6d4626e415
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(Assignee)

Comment 31

2 years ago
I want to give triggerbot the ability to schedule as many TC jobs as required.
As we add more platforms, the current fix will not handle a very high number of tasks.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
:armen, can we outline the issues here?  Are we blocked on big-graph support?  Are there other related bugs?
Flags: needinfo?(armenzg)
(Assignee)

Comment 33

2 years ago
For now I will close this again.
We have a working solution with the first version we landed.

I've filed bug 1250988 to take my work in progress and implement it with pulse_actions.
It will help with adding full TaskCluster support in mozci/pulse_actions.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(armenzg)
Resolution: --- → FIXED