Closed Bug 1243039 Opened 8 years ago Closed 8 years ago

Pushes to try with --trigger-tests should make us schedule TC test jobs as many times as requested

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(firefox47 fixed)

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: armenzg, Assigned: armenzg)

Attachments

(1 file)

We currently accomplish this on the Buildbot side by having a pulse listener watch for '--times' in the try syntax and scheduling extra jobs.

Is there something we can put in a task definition to schedule a job multiple times instead of just once?
Or should we create as many tasks as the number passed to --times?

We can also change the pulse monitor to do this instead of through the gecko decision.

Preference? Recommendations?
When "--times" is specified, does it schedule *all* jobs that many times? If so, it sounds like the integration component that schedules the decision task (mozilla-taskcluster) would just do so N times. You'll get a graph for each of them. I would think we shouldn't add them all to the same graph, because there is an upper limit to the number of tasks that can be in a graph.
We schedule the test jobs N times. Not the builds.
We should be able to do this within the decision task: parse out that try flag and, when iterating over the tests, add each one to the graph N times.
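A minimal sketch of that decision-task step, assuming a simplified try-syntax parser. Only the `--rebuild` option (name, type, default) comes from the patch reviewed later in this bug; `parse_try_message` and `expand_tests` are hypothetical helpers, not the real commit_parser.py code.

```python
import argparse
import shlex


def parse_try_message(message):
    """Parse the part of a commit message after 'try:' into options (simplified)."""
    _, _, rest = message.partition("try:")
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", dest="build_types")
    parser.add_argument("-p", dest="platforms")
    parser.add_argument("-u", dest="tests")
    parser.add_argument("-t", dest="talos")
    # Same option name, type and default as the --rebuild option in the patch on this bug.
    parser.add_argument("--rebuild", dest="rebuild", type=int, default=1)
    # Ignore any try flags this sketch does not model.
    opts, _unknown = parser.parse_known_args(shlex.split(rest))
    return opts


def expand_tests(test_names, times):
    """Repeat every test name `times` times, keeping the original order."""
    return [name for name in test_names for _ in range(times)]


opts = parse_try_message("try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3")
print(opts.rebuild)  # 3
print(expand_tests(opts.tests.split(","), opts.rebuild))
# ['mochitest-3', 'mochitest-3', 'mochitest-3']
```

Each repeated entry would still need its own task definition and task ID before it goes into the graph.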
What is the upper limit of tasks for a graph?

We want to run an experiment this weekend with 100 jobs per task.
I think I can create graphs from mozci. If I know the limit, I will make sure not to create graphs bigger than that.

On another note, would it be better to create independent tasks instead of graphs of tasks?
Assignee: nobody → armenzg
I was told that the upper limit would be somewhere around 1300 tasks, but you could experience issues before that is reached, depending on the size of the graph.

These tests would need to be in the same graph as the build so that they wait for the build to complete before running. With our current scheduler you cannot require a task from another graph to be completed before running.
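A sketch of why build and tests have to share a graph, assuming a task-graph layout where each entry has `taskId`, `requires` and a nested `task` with `metadata.name` (field names modelled loosely on the graph JSON the decision task prints; treat them as assumptions). The hypothetical `unresolved_requirements` check flags any dependency that lives outside the graph, which is the case the current scheduler cannot handle.

```python
import uuid


def new_task_id():
    # Stand-in for a real TaskCluster slugid.
    return uuid.uuid4().hex


def build_with_tests(build_name, test_names):
    """Return a graph where every test task requires the build task."""
    build_id = new_task_id()
    tasks = [{"taskId": build_id, "requires": [], "task": {"metadata": {"name": build_name}}}]
    for name in test_names:
        tasks.append({
            "taskId": new_task_id(),
            "requires": [build_id],  # wait for the build in this same graph
            "task": {"metadata": {"name": name}},
        })
    return {"tasks": tasks}


def unresolved_requirements(graph):
    """List (taskId, requiredId) pairs whose dependency is not in this graph."""
    known = {t["taskId"] for t in graph["tasks"]}
    return [(t["taskId"], dep) for t in graph["tasks"]
            for dep in t["requires"] if dep not in known]


graph = build_with_tests("[TC] Linux64 Dbg", ["[TC] Linux64 mochitest-chrome 3"])
print(unresolved_requirements(graph))  # []
```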
1300 tasks: is that the total number of jobs we can run for a given push, or is it per platform? One of the goals of switching to TaskCluster is being able to run in smaller chunks, which would imply about 250 jobs for each platform. Linux32/64/ASan opt/debug alone already puts us at 1250, and that is just for Linux. Our current chunking, which is limited by Buildbot builder names, is not ideal and leaves us with about 50 jobs per platform (except for Android, which has more!).
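The budget arithmetic above, worked through under the assumption that everything lands in one graph. The five-configuration breakdown is one reading of "linux32/64/asan opt/debug" that yields the 1250 figure; it is an assumption, not a list from the bug.

```python
GRAPH_LIMIT = 1300        # approximate per-graph limit quoted in this bug
JOBS_PER_PLATFORM = 250   # target chunking from the comment above

# Assumed breakdown of "linux32/64/asan opt/debug" that yields 1250 jobs.
linux_configs = ["linux32 opt", "linux32 debug", "linux64 opt",
                 "linux64 debug", "linux64-asan opt"]

linux_jobs = JOBS_PER_PLATFORM * len(linux_configs)
print(linux_jobs)                        # 1250
print(GRAPH_LIMIT - linux_jobs)          # 50 tasks left for every other platform
print(GRAPH_LIMIT // JOBS_PER_PLATFORM)  # 5 platform configurations fit in one graph
```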
Jonas, is the rough limit of 1300 tasks for all tasks within a graph or the number of tasks that could be submitted to extend the graph at a single time?

If I recall correctly, the task limit is related to the size of the document we can store in Azure Table Storage: after about 1300 tasks it becomes too large to be stored, but I might be mistaken. I'll let Jonas weigh in.
Flags: needinfo?(jopsen)
I believe we could schedule a graph per platform if we wanted to (a build with its associated tests).
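A sketch of the per-platform split, assuming each task record carries a hypothetical `platform` key and that the roughly 1300-task limit applies to each graph separately. The task counts below are illustrative only, not real job numbers.

```python
from collections import defaultdict

GRAPH_LIMIT = 1300  # approximate; see the discussion above


def split_by_platform(tasks):
    """Group task dicts into one would-be graph (build plus tests) per platform."""
    graphs = defaultdict(list)
    for task in tasks:
        graphs[task["platform"]].append(task)
    return dict(graphs)


def oversized(graphs, limit=GRAPH_LIMIT):
    """Return the platforms whose graph would exceed the limit, sorted by name."""
    return sorted(p for p, ts in graphs.items() if len(ts) > limit)


# Illustrative counts only.
tasks = ([{"platform": "linux64", "name": "build"}] +
         [{"platform": "linux64", "name": f"test-{i}"} for i in range(250)] +
         [{"platform": "android", "name": "build"}] +
         [{"platform": "android", "name": f"test-{i}"} for i in range(1400)])
graphs = split_by_platform(tasks)
print({p: len(ts) for p, ts in graphs.items()})  # {'linux64': 251, 'android': 1401}
print(oversized(graphs))                         # ['android']
```

Splitting per platform keeps each build in the same graph as its tests, so the dependency constraint mentioned earlier still holds.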
Hi garndt, I believe my patch is correct; however, the gecko decision task discards the extra tasks.

If you look at my local output [1], you will see that mochitest-chrome 3 appears three times; however, the scheduled graph lists it only once [2]. If I count how many tasks should have been scheduled, the number is 16 [3], yet I only see 6 on the graph.

Would you mind having a look at what happens on the scheduling side?



[1]
armenzg@armenzg-thinkpad:~/repos/mozilla-central$ ./mach taskcluster-graph --pushlog-id=107023 --project=try '--message=try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3' --owner=armenzg@mozilla.com --level=1 --revision-hash=25014ae86fb9e850ff07ffa4b829b8f6be06455d --extend-graph | grep "mochitest-chrome 3"
Querying URL for pushdate: None/json-pushes?changeset=None
Error querying pushinfo for repository 'None' revision 'None'
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3",

[2] https://tools.taskcluster.net/task-graph-inspector/#chSsS1XkQ36QlHs1eJbpkg/

[3]
armenzg@armenzg-thinkpad:~/repos/mozilla-central$ ./mach taskcluster-graph --pushlog-id=107023 --project=try '--message=try: -b d -p linux64 -u mochitest-3 -t none --rebuild 3' --owner=armenzg@mozilla.com --level=1 --revision-hash=25014ae86fb9e850ff07ffa4b829b8f6be06455d --extend-graph | grep '"name"' | grep "TC"
Querying URL for pushdate: None/json-pushes?changeset=None
Error querying pushinfo for repository 'None' revision 'None'
                    "name": "[TC] Linux64 Dbg", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain e10s 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-plain 3", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome e10s M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-browser-chrome M(bc3)", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3", 
                    "name": "[TC] Linux64 mochitest-chrome 3",
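A diagnostic sketch along the lines of the grep above: count task names and distinct task IDs in the generated graph JSON. It assumes the same hypothetical layout (`tasks[].taskId`, `tasks[].task.metadata.name`); a gap between the two numbers means several entries share a task ID.

```python
import json
from collections import Counter


def summarize_graph(graph_json):
    """Return (tasks per name, number of distinct taskIds) for a task-graph JSON string."""
    graph = json.loads(graph_json)
    names = Counter(t["task"]["metadata"]["name"] for t in graph["tasks"])
    distinct_ids = len({t["taskId"] for t in graph["tasks"]})
    return names, distinct_ids


example = json.dumps({"tasks": [
    {"taskId": "A", "task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}},
    {"taskId": "A", "task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}},
    {"taskId": "B", "task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}},
]})
names, distinct_ids = summarize_graph(example)
print(sum(names.values()), distinct_ids)  # 3 2
```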
Flags: needinfo?(garndt)
Each of those tasks must have a unique task ID; otherwise, the queue will think the same task is being submitted again and ignore it.
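A toy model of the behaviour described here, not the real TaskCluster queue (whose rules for resubmitted task IDs are richer than this). `slug_like_id` builds a 22-character URL-safe ID from a random UUID, similar to slugid's v4 form; `ToyQueue` is hypothetical.

```python
import base64
import uuid


def slug_like_id():
    """22-character URL-safe ID from a random UUID, similar to slugid's v4 form."""
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


class ToyQueue:
    """Toy model: a resubmitted taskId is treated as the same task and ignored."""

    def __init__(self):
        self.tasks = {}

    def submit(self, task_id, definition):
        if task_id in self.tasks:
            return False  # looks like the same task again; dropped
        self.tasks[task_id] = definition
        return True


queue = ToyQueue()
shared_id = slug_like_id()
for _ in range(3):
    queue.submit(shared_id, {"name": "[TC] Linux64 mochitest-chrome 3"})
print(len(queue.tasks))  # 1

for _ in range(3):
    queue.submit(slug_like_id(), {"name": "[TC] Linux64 mochitest-chrome 3"})
print(len(queue.tasks))  # 4
```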
Flags: needinfo?(garndt)
> Jonas, is the rough limit of 1300 tasks for all tasks within a graph
Max number of tasks in a graph, regardless of how many times you call extend.

Note: the big-graph scheduler won't have this limitation, so the limit is going away.
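Because the limit applies to the whole graph rather than to each extend call, a tool like mozci would need to track the running total. A minimal sketch with a hypothetical `GraphBudget` class, using the approximate 1300 figure from this bug:

```python
class GraphBudget:
    """Track tasks across the initial submission and every extend call."""

    def __init__(self, limit=1300):  # approximate limit from this bug
        self.limit = limit
        self.total = 0

    def can_add(self, count):
        return self.total + count <= self.limit

    def add(self, count):
        if not self.can_add(count):
            raise ValueError(
                f"graph would hold {self.total + count} tasks; limit is {self.limit}")
        self.total += count


budget = GraphBudget()
budget.add(1000)            # initial graph
print(budget.can_add(250))  # True  (1250 in total)
print(budget.can_add(400))  # False (1400 in total), even as a separate extend call
```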
Flags: needinfo?(jopsen)
Will the big-graph scheduler require changes to how we currently hack the graph? What is the timeline for big-graph?
Attachment #8713652 - Attachment description: MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. → MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. r=garndt
Attachment #8713652 - Flags: review?(garndt)
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32797/diff/1-2/
Summary: Pushes to try with --times should make us schedule test jobs as many times as requested → Pushes to try with --rebuild should make us schedule TC test jobs as many times as requested
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

https://reviewboard.mozilla.org/r/32797/#review29681

Just a nit about the name of the option, but I don't have a better name to suggest, and I think we can get away with not needing to copy the test_task each time.

::: testing/taskcluster/mach_commands.py:545
(Diff revision 2)
> +                        test_task = copy.deepcopy(test_task)

Is there a need to do any kind of copy of this or couldn't we just update the task ID and append it?

::: testing/taskcluster/taskcluster_graph/commit_parser.py:26
(Diff revision 2)
> -def escape_whitspace_in_brackets(input_str):
> +def escape_whitespace_in_brackets(input_str):

ah thanks for catching this!

::: testing/taskcluster/taskcluster_graph/commit_parser.py:262
(Diff revision 2)
> +    parser.add_argument('--rebuild', dest='rebuild', type=int, default=1)

This option being called "rebuild" makes it seem like we are actually rebuilding something when really we're just running the tests N times.
Attachment #8713652 - Flags: review?(garndt) → review+
(In reply to Greg Arndt [:garndt] from comment #19)
..
> ::: testing/taskcluster/mach_commands.py:545
> (Diff revision 2)
> > +                        test_task = copy.deepcopy(test_task)
> 
> Is there a need to do any kind of copy of this or couldn't we just update
> the task ID and append it?
> 
Yes; otherwise, all three tasks end up with the task ID from the last time I set that field.
Each appended entry is a reference to the same object in the graph, hence the need for a deepcopy.
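A sketch of the aliasing problem, using a simplified task dict (the layout and helper names are hypothetical, not the code in mach_commands.py). Appending the same dict three times leaves every entry with the last task ID written; deep-copying first gives each entry its own.

```python
import copy
import uuid


def repeat_by_reference(test_task, times):
    """Buggy: appends the same dict `times` times, so every entry shares one taskId."""
    tasks = []
    for _ in range(times):
        test_task["taskId"] = uuid.uuid4().hex
        tasks.append(test_task)
    return tasks


def repeat_with_deepcopy(test_task, times):
    """Each entry is an independent copy with its own taskId."""
    tasks = []
    for _ in range(times):
        clone = copy.deepcopy(test_task)
        clone["taskId"] = uuid.uuid4().hex
        tasks.append(clone)
    return tasks


def make_template():
    return {"taskId": None,
            "task": {"metadata": {"name": "[TC] Linux64 mochitest-chrome 3"}}}


print(len({t["taskId"] for t in repeat_by_reference(make_template(), 3)}))   # 1
print(len({t["taskId"] for t in repeat_with_deepcopy(make_template(), 3)}))  # 3
```

A shallow copy would also give distinct top-level task IDs, but the nested `task` dict would still be shared, so any per-copy change inside it would leak into every copy; deepcopy avoids that.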

> ::: testing/taskcluster/taskcluster_graph/commit_parser.py:262
> (Diff revision 2)
> > +    parser.add_argument('--rebuild', dest='rebuild', type=int, default=1)
> 
> This option being called "rebuild" makes it seem like we are actually
> rebuilding something when really we're just running the tests N times.

I was reusing the name chmanchester used in his trigger bot.
I believe --trigger-tests is the compromise we came up with.

I will land with that naming and file bugs for the other tools.
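One way to rename an option without breaking existing users is to register the old name as an alias. This is a hypothetical sketch; whether the landed change kept `--rebuild` as an alias is not stated in this bug.

```python
import argparse

parser = argparse.ArgumentParser(add_help=False)
# Hypothetical: the new name plus the old one kept as an alias.
parser.add_argument("--trigger-tests", "--rebuild", dest="trigger_tests",
                    type=int, default=1)

print(parser.parse_known_args(["--trigger-tests", "3"])[0].trigger_tests)  # 3
print(parser.parse_known_args(["--rebuild", "2"])[0].trigger_tests)        # 2
print(parser.parse_known_args([])[0].trigger_tests)                        # 1
```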
Summary: Pushes to try with --rebuild should make us schedule TC test jobs as many times as requested → Pushes to try with --trigger-tests should make us schedule TC test jobs as many times as requested
Attachment #8713652 - Attachment description: MozReview Request: Bug 1243039 - Allow on try to schedule TC test jobs multiple times. r=garndt → MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt
Comment on attachment 8713652 [details]
MozReview Request: Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32797/diff/2-3/
https://hg.mozilla.org/integration/mozilla-inbound/rev/de6d4626e415e1aa93dc00209197e6daeaf15ddd
Bug 1243039 - Allow, on try, to schedule TaskCluster test jobs multiple times. DONTBUILD. r=garndt
https://reviewboard.mozilla.org/r/32797/#review29681

> This option being called "rebuild" makes it seem like we are actually rebuilding something when really we're just running the tests N times.

I replied to this in the bug. We will use --trigger-tests instead.
https://hg.mozilla.org/mozilla-central/rev/de6d4626e415
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
I want to give triggerbot the ability to schedule as many TC jobs as required.
As we add more platforms, the current fix will not handle a very high number of tasks.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
:armen, can we outline what issues we have here? Are we blocked on big-graph support? Are there other related bugs?
Flags: needinfo?(armenzg)
For now I will close this again.
We have a working solution with the first version we landed.

I've filed bug 1250988 to take my work in progress and implement it with pulse_actions.
It will help with adding full TaskCluster support in mozci/pulse_actions.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Flags: needinfo?(armenzg)
Resolution: --- → FIXED
