1254325 - Change the runnable_jobs API to include TaskCluster jobs as well

Reporter

Description

•

9 years ago

I want to define this project as a Google Summer of Code project [1]. This is a feature that developers really like, and we need to make it work for TaskCluster. This is the equivalent of bug 1194830. It depends on bug 1232005, which would give us the ability to know which tasks could have been scheduled. [1] https://docs.google.com/document/d/1TDF-UKYb16wxvAtqdzCFauTvJXSx_fjf7i1f8F3l7bk/edit#heading=h.fsx45ke2f3gd

Kalpesh Krishna [:martianwars]

Assignee

Comment 1

•

9 years ago

jmaher, armenzg: Hi! I would like to start working on this! I went through the docs briefly. What sort of setup would I need? My treeherder is set up :) martianwars

Flags: needinfo?(jmaher)

Flags: needinfo?(armenzg)

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

9 years ago

There are a few parts here:
1) getting the list of available jobs from taskcluster to schedule
2) having pulse actions take the selected job(s) from a treeherder request initiated in the UI and resolve scheduling via the taskcluster api
3) working on taskcluster and the gecko decision task to accept incoming requests and make it happen
4) worrying about dependencies (request a test job, make sure the build job exists or is scheduled)
If you are fluent in treeherder, I would look at pulse actions: https://github.com/mozilla/pulse_actions. Likewise, see the interface for buildbot (https://github.com/mozilla/mozilla_ci_tools) as an example. And finally, read up on Taskcluster and the gecko decision task.

Flags: needinfo?(jmaher)

Flags: needinfo?(armenzg)

Kalpesh Krishna [:martianwars]

Assignee

Updated

•

9 years ago

Assignee: nobody → kalpeshk2011

Kalpesh Krishna [:martianwars]

Assignee

Comment 3

•

9 years ago

So I guess I'll begin work on this. My first task here is to fetch all_tasks.json (you can see the file here: https://tools.taskcluster.net/task-inspector/#PzmhiGZ8TMiLxeNf0IZtbg/0). I will be following the approach used for allthethings.json.

Kalpesh Krishna [:martianwars]

Assignee

Comment 4

•

9 years ago

A conversation between camd and me.
<martianwars> camd, wlach: a bit confused here. When exactly is this task triggered? https://github.com/mozilla/treeherder/blob/master/treeherder/etl/tasks/buildapi_tasks.py#L39
12:24 AM <•camd> martianwars: so that's a celery task
12:24 AM <•camd> it's scheduled on a "celery beat"
12:24 AM <•camd> that's specified in the settings.py file
12:24 AM <•camd> I believe it's set to happen once/day right now
12:25 AM <martianwars> camd: yeah correct, it's set to happen once a day
12:26 AM <martianwars> camd: thanks!
12:29 AM <•camd> martianwars: you bet! :)
12:32 AM <martianwars> camd: so the problem here is, all_tasks.json isn't on any public link. I would have to fetch the URL from the UI (which isn't hard to produce), send it to the Django API and use python's requests or something
12:32 AM <martianwars> camd: it's push specific in other words
12:33 AM <martianwars> camd: is this possible? if not, we can use the "latest" of the day
12:34 AM <•camd> martianwars: you shouldn't have to fetch it from the UI. You have access to the whole db from the service.
12:34 AM <•camd> from the python side, that is
12:35 AM <•camd> But I think I see your point
12:35 AM <martianwars> camd: no, I'll just get the URL from the UI, send it to the python side, and python fetches it realtime I mean. Do you like this approach? IMO, it'll be too slow
12:35 AM <•camd> you need to fetch that file from one of the jobs in the push that's a taskcluster job
12:35 AM mcote|afk → mcote
12:35 AM <•camd> martianwars: oh I see.
12:35 AM <•camd> I'm not sure how slow that would be. might not be a problem at all
12:36 AM <•camd> martianwars: that's worth trying as that might be simpler. then you wouldn't have to do the fetch on the service side at all
12:37 AM <martianwars> camd: I'll just check up to see how different this file is push to push
12:39 AM <martianwars> camd: it seems to have revision data, so I think it's better to get it each time "Add New Jobs" is pressed or something
12:41 AM <•camd> martianwars: yeah, it kind of seems like you would have the api just use ANY taskcluster job in the push, and send that up
12:41 AM <•camd> then python would open the file and send the job info down with it.
12:42 AM <martianwars> camd: so should I fetch the file using "requests" in python?
12:42 AM <•camd> martianwars: well, I'd have to look closer at the code to determine the actual workflow. But sounds like you're on a good track
12:43 AM <•camd> martianwars: yes, that's the right way
12:44 AM <martianwars> camd: alright thanks! Also, I can see all the pushes in local.treeherder.mozilla.org (identical to the actual treeherder.mozilla.org). Is that a problem?
12:44 AM <martianwars> I didnt need to open another terminal for this ^
12:45 AM <•camd> martianwars: if you are running "celery -A treeherder worker -B" then that's what you'd get, yeah.
12:45 AM <•camd> there's a celery beat task called "pushlog" that gets all the pushes locally, just like for production
12:45 AM camd → •camd|lunch

Flags: needinfo?(armenzg)

Armen [:armenzg]

Reporter

Comment 5

•

9 years ago

I read the conversation, but the approach is still a bit confusing to me. camd: what do you think of what I write below?

If it helps, here are some links to the tasks and to the artifact:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=gecko&selectedJob=3866852&revision=c3f5e6079284
https://tools.taskcluster.net/task-inspector/#C4UJf0PhSleAkNDckZENvg/0

The following are just some ideas on how this would work in my head.

Steps:
* User clicks "add new jobs"
* UI or backend grabs all_tasks.json, which is an artifact of the gecko decision task
** Treeherder has the info about where this file is hosted

Steps:
* Push appears on Treeherder
* Gecko decision completes
* Treeherder notices this, fetches and caches the all_tasks.json information, and is thus ready for when the user clicks "add new jobs"
** OR it puts the data in some sort of API similar to runnable

Steps:
* Push appears on Treeherder
* Gecko decision submits the all_tasks.json information to Treeherder via pulse or an API, so it is ready for when the user clicks "add new jobs"
** OR it puts the data in some sort of API similar to runnable
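To make the first two approaches concrete, here is a minimal sketch of fetching the artifact on demand and keeping it in a small in-memory cache. Everything named here is an assumption for illustration: the URL template, the function name, and the `fetch` callable (which in a real service could wrap `requests.get(url).text`) are hypothetical and not Treeherder code.

```python
import json
from typing import Callable, Dict

# Hypothetical URL template. The real artifact location is whatever
# Treeherder records for the gecko decision task of the push.
ARTIFACT_URL = "https://example.invalid/{task_id}/0/public/all_tasks.json"

# In-memory cache keyed by decision task id (one entry per push).
_cache: Dict[str, dict] = {}


def fetch_all_tasks(decision_task_id: str, fetch: Callable[[str], str]) -> dict:
    """Return the parsed all_tasks.json for one push's decision task.

    `fetch` takes a URL and returns the response body as text. The file
    is downloaded only the first time a given task id is requested.
    """
    if decision_task_id not in _cache:
        url = ARTIFACT_URL.format(task_id=decision_task_id)
        _cache[decision_task_id] = json.loads(fetch(url))
    return _cache[decision_task_id]
```

Calling this when the user clicks "add new jobs" corresponds to the first set of steps; calling it when the decision task completes would pre-warm the cache as in the second.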

Flags: needinfo?(armenzg) → needinfo?(cdawson)

Cameron Dawson [:camd]

Comment 6

•

9 years ago

Hey Armen: yeah, those all sound like viable approaches. TBH, IMO (need another TLA here...) going with the first one would be best. We've done premature optimizations so many times that end up just not being necessary. I would think we should just grab that data in real-time and see how the performance is. If it's too slow, then we could try approach 2. Approach 2 is similar to how Alice did things, which is fine. But perhaps we could go with the simplest approach here and just optimize as needed. But I don't feel TOO strongly about it. :) If you prefer the pre-caching, then we can do that.

Flags: needinfo?(cdawson)

GitHub Autolander Bot

Comment 7

•

9 years ago

Attached file [treeherder] martiansideofthemoon:forceawakens > mozilla:master — Details

Kalpesh Krishna [:martianwars]

Assignee

Comment 8

•

9 years ago

Attached image Screenshot from 2016-05-18 02:31:09.png — Details

This is the set of jobs that I get (marked with yellow squares) :) Does it look good? Should there be any more jobs?

Attachment #8753534 - Flags: review?(armenzg)

Kalpesh Krishna [:martianwars]

Assignee

Comment 9

•

9 years ago

Attached image Screenshot from 2016-05-18 02:36:36.png — Details

This is the original set of jobs. So I guess this completes the first part of the PR, which was to allow TC runnable jobs to be shown in Treeherder. Now I need to study how the code works when we try to click on the yellow symbols.

Kalpesh Krishna [:martianwars]

Assignee

Comment 10

•

9 years ago

Comment on attachment 8751911 [details] [review] [treeherder] martiansideofthemoon:forceawakens > mozilla:master

camd, I hope this is okay :)

Attachment #8751911 - Flags: feedback?(cdawson)

Armen [:armenzg]

Reporter

Comment 11

•

9 years ago

Comment on attachment 8753534 [details] Screenshot from 2016-05-18 02:31:09.png

That looks about right :)

Attachment #8753534 - Flags: review?(armenzg) → review+

Armen [:armenzg]

Reporter

Comment 12

•

9 years ago

Hi Kalpesh,

Here's some info on what comes after you can produce the pulse messages from Treeherder.

Right now, for Buildbot, we get the list of builders selected and it is processed by "buildbot_graph_builder()":
https://github.com/mozilla/pulse_actions/blob/master/pulse_actions/handlers/treeherder_runnable.py#L91

In the case of Buildbot, a request can contain these:
* requests for one or more test builders (downstream jobs)
** The developer might have forgotten to select the required "build" for a specific test builder
** The developer might have chosen a test job for which there is no completed "build"; in that case pulse_actions requests the missing build to be scheduled
** The developer might have chosen a test job for which there is a running "build"; in that case we can't schedule the test job until the build completes (right now this is not possible for Buildbot, even through the Buildbot bridge)
* requests for one or more build builders (upstream jobs)
** There might be a running build; I don't know if we do or do not schedule it. I think we do
** There might not be a build scheduled

In the case of TaskCluster, if we receive the tasks that the developer requests plus the associated build job, we won't need to do any searching for the upstream job in pulse_actions. What I'm trying to say is that Treeherder should be able to know that if a selected job has the "required" field, it should also send the associated task for the upstream build.
* task N requires task A
* developer only selects task N
* Treeherder determines that both tasks should be sent over pulse

The alternative is to make pulse_actions go and search all_tasks.json to determine what other tasks task N needs (similar to what it does for Buildbot).
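The "task N requires task A" rule can be sketched as a small graph walk. This is only an illustration under assumptions: the `required` key and the dictionary shape below are hypothetical stand-ins for whatever the real artifact contains, and `expand_with_required` is not an existing Treeherder or pulse_actions function.

```python
from typing import Dict, Iterable, List


def expand_with_required(selected: Iterable[str],
                         task_graph: Dict[str, dict]) -> List[str]:
    """Return the selected labels plus every upstream task they require.

    Assumes each entry in `task_graph` may carry a "required" list of
    labels (a hypothetical shape, not the real artifact schema).
    Requirements are followed transitively.
    """
    seen = set()
    stack = list(selected)
    while stack:
        label = stack.pop()
        if label in seen:
            continue  # already handled; also protects against cycles
        seen.add(label)
        stack.extend(task_graph.get(label, {}).get("required", []))
    return sorted(seen)
```

With a graph where N requires A and the developer selects only N, the function returns both labels, which is the set Treeherder would send over pulse.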

Cameron Dawson [:camd]

Comment 13

•

9 years ago

Comment on attachment 8751911 [details] [review] [treeherder] martiansideofthemoon:forceawakens > mozilla:master

Looking good so far! Please need-info me if you have more questions and I'm not around on IRC. :)

Attachment #8751911 - Flags: feedback?(cdawson) → feedback+

Kalpesh Krishna [:martianwars]

Assignee

Comment 14

•

9 years ago

:camd, :armenzg, :garndt, :dustin, I'd like to outline the approach I plan to follow with respect to Treeherder. Dustin has landed changes that rename all_tasks.json to full-task-graph.json and adjust the JSON format. For an example artifact, see https://public-artifacts.taskcluster.net/SQas-oGSQaWoyFIPLeLBdg/0/public/full-task-graph.json. The keys of this file are task labels (taskLabels), each mapping to a task JSON. I intend to send these taskLabels along with the URL of the push in the pulse message. MozCI can download this file and easily get the corresponding tasks using these taskLabels. What are your thoughts?
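A rough sketch of that exchange follows: the producer sends only labels plus a URL, and the consumer resolves the labels against the downloaded file. The message field names (`full_task_graph_url`, `requested_labels`) and both function names are assumptions made for this example; they are not the actual pulse message schema or mozci API.

```python
from typing import Dict, List


def build_message(artifact_url: str, labels: List[str]) -> dict:
    """Producer side: describe the request with labels only (hypothetical fields)."""
    return {
        "full_task_graph_url": artifact_url,
        "requested_labels": list(labels),
    }


def resolve_tasks(message: dict, full_task_graph: Dict[str, dict]) -> Dict[str, dict]:
    """Consumer side: map each requested label to its task definition.

    `full_task_graph` is the already-downloaded and parsed JSON file,
    assumed to be keyed by task label. Unknown labels raise KeyError so
    a bad request fails loudly instead of being partially scheduled.
    """
    labels = message["requested_labels"]
    missing = [label for label in labels if label not in full_task_graph]
    if missing:
        raise KeyError(f"labels not in task graph: {missing}")
    return {label: full_task_graph[label] for label in labels}
```

Sending labels rather than full task definitions keeps the message small and leaves the file as the single source of truth for what a label means.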

Flags: needinfo?(garndt)

Flags: needinfo?(cdawson)

Flags: needinfo?(armenzg)

Greg Arndt [:garndt]

Comment 15

•

9 years ago

I think that by doing it this way, mozci should be able to locate the label in full-task-graph.json and know what to schedule. It would be nice if mozci could instead resubmit a decision task with these labels and have things just get scheduled, without mozci needing to make a decision, but that might be difficult or not possible yet.

Flags: needinfo?(garndt)

Kalpesh Krishna [:martianwars]

Assignee

Comment 16

•

9 years ago

Comment on attachment 8751911 [details] [review] [treeherder] martiansideofthemoon:forceawakens > mozilla:master

Hey Cameron, This is working for me locally (I hope I've removed all the debugging code). I'm not sure how to see the Pulse messages, but I can definitely see status code 200 for both add_runnable_jobs and runnable_jobs APIs.

Attachment #8751911 - Flags: review?(cdawson)

Armen [:armenzg]

Reporter

Comment 17

•

9 years ago

WRT pulse messages, are you publishing locally and want to listen to that? Or are you intending to listen to the production exchanges/topics?

Cameron Dawson [:camd]

•

9 years ago

I would also gain value from adding a UTC unix timestamp to the messages, so I can measure how late we are in processing them.
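A minimal sketch of the timestamp idea, assuming a `timestamp` field name (hypothetical) holding whole seconds since the Unix epoch. The optional `now` parameter exists only so the functions can be exercised deterministically.

```python
import time
from typing import Optional


def stamp(message: dict, now: Optional[float] = None) -> dict:
    """Return a copy of `message` with a Unix timestamp (UTC, whole seconds)."""
    stamped = dict(message)
    stamped["timestamp"] = int(time.time() if now is None else now)
    return stamped


def processing_delay(message: dict, now: Optional[float] = None) -> float:
    """Seconds elapsed between publishing and processing the message."""
    current = time.time() if now is None else now
    return current - message["timestamp"]
```

The consumer would call `processing_delay` when it picks a message up and log the result; clock skew between publisher and consumer hosts is ignored here.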

Cameron Dawson [:camd]

•

9 years ago

dustin and I spoke about this project in the RRA. For now, we would like to focus *only* on adding new jobs for Try. This means that the pulse_actions credentials will only be able to schedule Try jobs. On the TH PR, could we allow adding new TaskCluster jobs only on try and not on any other repo? Once TH uses TaskCluster for authentication, I would like to see the scm_level sent as part of the message so pulse_actions can determine whether the user can add new jobs on that push. Another approach would be for TH not to allow users to click "add new jobs" unless they have the right scm level (e.g. someone with try push level can only add new jobs to try pushes). camd: what do you think about adding UI restrictions on TH?

[1] From conversation with dustin:
> assume we can ensure that only task graphs the user could have created (are at or below their SCM level) can be extended,
> so a level-1 person can only modify level-1 pushes (try)
> -> then this service provides no capabilities not already present by pushing to that tree
>
> We'll accomplish this initially by only operating at level 1. Then when treeherder's integration with taskcluster has improved (and thus we can determine the user's permissions and/or scm level reliably), we can improve that.
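The permission rule in the quoted conversation can be written down as a pair of small checks. This is a sketch only: the function names are hypothetical, SCM levels are assumed to be plain integers, and the repository name "try" is taken from the discussion above.

```python
def can_extend_push(user_scm_level: int, push_scm_level: int) -> bool:
    """A user may extend only task graphs at or below their own SCM level."""
    return push_scm_level <= user_scm_level


def allowed_initially(repository: str) -> bool:
    """Initial rollout: operate only at level 1, i.e. only on try pushes."""
    return repository == "try"
```

Under this rule the service grants nothing a user could not already get by pushing to the tree themselves; the try-only check is the stopgap until the user's level can be determined reliably.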

Flags: needinfo?(cdawson)

Summary: Give Treeherder the ability to add TaskCluster jobs to pushes → Give Treeherder the ability to add TaskCluster jobs to try pushes

Armen [:armenzg]

Reporter

Comment 25

•

9 years ago

I want to document the use cases I want to see tested:
1) only a build job is requested
2) only a test job is requested for a completed build
3) only a test job is requested for a non-completed build
4) only a test job is requested for a currently running build

We should *not* be scheduling a new build if there is already one completed or currently running. We should *not* be creating the docker image tasks unless we really have to.
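The four use cases can be captured in a small decision function that is easy to unit test. This is a hedged sketch: the state names (`None`, "running", "completed") and the function itself are hypothetical, an explicitly requested build is assumed to always be scheduled, and docker image tasks are left out entirely.

```python
from typing import List, Optional


def tasks_to_schedule(requested_kind: str, build_state: Optional[str]) -> List[str]:
    """Decide which kinds of task to schedule for one request.

    requested_kind: "build" or "test".
    build_state: None when no build exists for the push, otherwise
    "running" or "completed" (hypothetical state names).
    """
    if requested_kind == "build":
        return ["build"]  # use case 1: the build was asked for explicitly
    if requested_kind != "test":
        raise ValueError(f"unknown kind: {requested_kind}")
    if build_state in ("running", "completed"):
        return ["test"]  # use cases 2 and 4: never schedule a second build
    return ["build", "test"]  # use case 3: the upstream build is missing
```

For a running build, the returned test task would still have to wait for the build to finish; how that waiting is expressed is outside this sketch.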

Cameron Dawson [:camd]

Comment 26

•

9 years ago

Armen-- Yeah, restricting the UI to only try is just fine. There's precedent there, actually. I think we had code like that where you could only cancel jobs (or was it all jobs?) on try unless you're a sheriff. I don't think you'd want to hide the menu item in that case, because you still want to be able to add BB jobs. But you could certainly do that in the controller or model. Just do the same check for sheriff'ness that is done for enabling the buttons in the Sheriff panel.

Flags: needinfo?(cdawson)

Armen [:armenzg]

Reporter

Comment 27

•

9 years ago

Thanks camd! martianwars: do the changes that camd and I discuss in comment 26 and comment 24 make sense to you?

Armen [:armenzg]

Reporter

Comment 28

•

9 years ago

Note from one of my emails:

I see in full-task-graph.json that build jobs depend on the docker build image task. However, if I recall correctly, we don't always need to build the docker image and can go straight to running the build (since we can reuse the docker build image that was built on another push). How can we know when we don't need to request building the docker build image?

----

dustin's reply is to use the optimization that lives in the tree. It is unlikely that our system will work well in pulse_actions unless we have access to the code in-tree. There are other messy ways of accomplishing this; however, it might make more sense to start investing in the "action tasks" idea. We should talk more about it in London.

Kalpesh Krishna [:martianwars]

Assignee

Comment 29

•

9 years ago

Comment on attachment 8751911 [details] [review] [treeherder] martiansideofthemoon:forceawakens > mozilla:master

I've addressed the small issues. I've added UTC timestamps to the Pulse messages. I've also restricted this to the try repository. @armenzg

Attachment #8751911 - Flags: review?(cdawson)

Kalpesh Krishna [:martianwars]

Assignee

Comment 30

•

9 years ago

By restricting to try, I mean that TC jobs will only show up on pressing "Add New Jobs" in the try repository. For other repositories, you can still use "Add New Jobs" to schedule buildbot jobs.

Cameron Dawson [:camd]

Comment 31

•

9 years ago

I'm sorry I haven't reviewed this yet. I will take a look at it tomorrow.

Cameron Dawson [:camd]

Armen [:armenzg]

Reporter

Comment 40

•

9 years ago

Can we keep this bug open until we're fully live? Or rename this bug and use a different one? For external viewers it might be confusing to see the bug marked as fixed.

Ed Morley [:emorley]

Comment 41

•

9 years ago

It is fully live, unless I'm missing something? If there is remaining work, I agree adjusting the bug summary to reflect what part was completed here would be good. (I'd prefer not to keep the bug open, it's harder to track what has and hasn't landed)

Armen [:armenzg]

Reporter

Comment 42

•

9 years ago

Proposed title for this bug: Add to Treeherder's try runnable API TaskCluster tasks
New bug: Enable Treeherder's ability to add TaskCluster jobs to try pushes
martianwars: does this work for you?

Flags: needinfo?(kalpeshk2011)

Kalpesh Krishna [:martianwars]

Assignee

•

9 years ago

Blocks: 1285924

Kalpesh Krishna [:martianwars]

Assignee

Updated

•

9 years ago

Flags: needinfo?(kalpeshk2011)

Armen [:armenzg]

Reporter

Updated

•

9 years ago

Blocks: 1284911

Ed Morley [:emorley]

Updated

•

9 years ago

Depends on: 1288053

[treeherder] martiansideofthemoon:forceawakens > mozilla:master 9 years ago GitHub Autolander Bot 47 bytes, text/x-github-pull-request	camd : review+ camd : feedback+	Details \| Review
Screenshot from 2016-05-18 02:31:09.png 9 years ago Kalpesh Krishna [:martianwars] 80.19 KB, image/png	armenzg : review+	Details
Screenshot from 2016-05-18 02:36:36.png 9 years ago Kalpesh Krishna [:martianwars] 19.69 KB, image/png		Details
[treeherder] martiansideofthemoon:icelandfootballislove > mozilla:master 9 years ago GitHub Autolander Bot 47 bytes, text/x-github-pull-request	camd : review+	Details \| Review