Closed Bug 1194830 Opened 4 years ago Closed 4 years ago

Integrate Try Extender with Treeherder

Categories

(Tree Management :: Treeherder, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adusca, Assigned: adusca)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently Try Extender has its own web app: http://try-extender.herokuapp.com/ . It would be better if it was integrated with Treeherder instead.

In the UI I'm imagining, users would be able to click on a '+' or 'Add more jobs' button on Treeherder. This button would then show every job that can be added, using the traditional TH UI for jobs per push. Then the user would select the jobs they want and click on a button to trigger those jobs. TH could send a pulse message for Pulse Actions to pick up and trigger the necessary jobs.

For this, we would need to:

- Teach TH what jobs can be added on a push
- Figure out a UI for selecting jobs that fits with TH
- Authentication: currently Try Extender allows the commit author and people with a '@mozilla.com' email address to extend a push, what should we do after the TH integration?
Currently I'm working on teaching TH what jobs can be added: https://github.com/adusca/treeherder/commit/92984abbd49b3b7bd4402f0d435a11f087c185c1

I'm using a hardcoded json file with the information about all possible builds and their symbols. It was generated by a script that uses a mixture of mozci and treeherder/etl/buildbot.py functions: https://github.com/adusca/mozilla_ci_tools/blob/th-exp/mozci/scripts/misc/th_dict.py . I believe it will be possible to get a real endpoint in TH's api to return something similar and be less hardcoded.

I still need to subtract the existing jobs from the possible jobs list, create a status 'possible' style for possible jobs (currently I'm just pretending they are 'running') and figure out a way to select jobs before moving to the backend.

Any feedback on the current patch, architectural decisions, API endpoints or UX suggestions?
Flags: needinfo?(emorley)
I don't really understand how mozci or tryextender works, or how this fits in with the EOL of buildbot and moving to taskcluster. I think Armen might have a better idea :-)
Flags: needinfo?(emorley) → needinfo?(armenzg)
Attached file Add button to show all possible jobs (obsolete) —
I think Alice is more interested on getting feedback on the TH code she points out.

Buildbot is still staying around. I will be able to schedule the jobs through TaskCluster instead of BuildAPI but the builders will still behave the same and builds 4hr will still show them there.

The idea is that a user presses the "extend button" on the UI.
TH would be able to show all possible jobs which can be scheduled.
The user selects the jobs it would like to see scheduled and submits.
TH sends over pulse a message that pulse_actions consumes.
pulse_actions submits the requests and the jobs will show up on TH like normal.

I believe we can have an API for TH to calculate possible buildbot jobs (I can elaborate more) but I believe her patch is more about the UI changes.

Perhaps I can bring this up in today's meetig.
Flags: needinfo?(armenzg)
Attachment #8650443 - Flags: feedback?(emorley)
Comment on attachment 8650443 [details]
Add button to show all possible jobs

I think chatting about the overall approach at the meeting might be good.
I think my main concern at the moment is with the hardcoded list and where that might be served? It seems that it's something that should be handled by mozci/taskcluster/buildapi rather than by treeherder? I seem to recall from our rough chat several months ago that we proposed having a separate view for this linked from the Treeherder UI? At the moment correlating the two (and getting Treeherder to generate this list) seems like it may be hard?
Attachment #8650443 - Flags: feedback?(emorley)
We worked on try-extender: http://try-extender.herokuapp.com
However, the developers prefer a more TH like view.
We could build it on our own side but we would try to make it a carbon copy of TH.
If that is the route, we might need to create a way for our own web app to keep up to date with TH (groupings and platforms for jobs - note: I asked yesterday on the mailing lists about this for another project not this).

Generating the list for each repo is not hard.
We can submit it somewhere for TH to feed from.

The code to generate *all* builders is this [1]:
python -c "from mozci.mozci import query_builders; import pprint; builders = query_builders(); pprint.pprint(sorted(builders))" > ~/public_html/permanent/all_builders.txt

We can query for repo_name as well.

This list can only change when there is a reconfig.

I will bring it up on today's meeting.

[1] http://people.mozilla.org/~armenzg/permanent/all_builders.txt
I needed a json that contained every possible try buildername in a format that treeherder would understand. To generate it, I got a list of every try buildername and used functions from buildbot.py [1] to get the data in the right format, the resulting json for try is [2].

Since to generate this I use a lot of treeherder functions and the only thing I get from mozci is the list of builders, I think we could make this a TH api endpoint per repo. Do you think I can get the loading of data in the ETL layer?

I worked a little more on my patch, now after clicking on '+' it is possible to click on every job you want and then click on 'Trigger new jobs' (currently that is just a placeholder flag button, but I'm working on improving this) and that sends a pulse message that Pulse Actions can act upon. Since it is still using the hardcoded json I don't consider it ready for a PR, but I would love feedback [3]. I have some UI screenshots in [4].

[1] https://github.com/mozilla/treeherder/blob/master/treeherder/etl/buildbot.py#L972
[2] https://github.com/adusca/treeherder/blob/try-extender/ui/th_try_graph.json
[3] https://github.com/mozilla/treeherder/compare/master...adusca:try-extender
[4] https://drive.google.com/folderview?id=0B7rHgvgC7s4ZflY4cmtSSUJjcHVqQktIMGljcE5nLW9jaWx5T21CQ3F5QllSVGtDQl9sSGM&usp=sharing
I was not sure about who should review this, if it is not you please redirect it.

This is a review request for the client-side code, and a feedback request for the API part. For the server side code, do you think it would be a reasonable solution to process an external JSON file in an ETL step, and serve the data through the /jobs/list_possible endpoint? Is it OK to create a new DB table for this?
Attachment #8650443 - Attachment is obsolete: true
Attachment #8650691 - Flags: review?(emorley)
Assignee: nobody → alicescarpa
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

Mauro might be a better reviewer for this :-)
Attachment #8650691 - Flags: review?(emorley) → review?(mdoglio)
Attachment #8650691 - Flags: review?(cdawson)
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

Clearing review for now.  I commented in the PR on several items.  Nice work so far!  :)
Attachment #8650691 - Flags: review?(cdawson)
(Alice will be offline until next week)

Alice, from our conversation this week, you mentioned that you only act on mozilla.com addresses; do I recall correctly?

This has the disadvantage that we don't notify through TH if the user's request will happen since we decide that under pulse_actions.

A way to fix this would be to prevent a user on TH from submitting a request if they don't have LDAP creds (bug 1032163) or only allow this feature to sheriffs + the owner of the push.
Flags: needinfo?(alicescarpa)
My workaround plan to deal with authentication was to do it on the Pulse Actions side by using the same logic that is currently in place for Try Extender: users can extend a push if they have a @mozilla.com email address or if they are the commit's author.

The main problem with this approach is that there is no way for TH to notify users that their requests were not processed. This also blocks real users from extending their pushes if they don't show up as the author on pushlog. I don't know how big of a problem that is, if it is too bad we can also just wait until TH has a "has LDAP" user class.
Flags: needinfo?(alicescarpa)
Better to get something out, until we do the TH LDAP authentication.
We then can limit the UI as to what it allows each user.
Status: NEW → ASSIGNED
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

:adusca r? me again when your branch is ready for a second pass
Attachment #8650691 - Flags: review?(mdoglio)
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

I think I made all of the changes I was planning to make before the new round of reviews :)
Attachment #8650691 - Flags: review?(mdoglio)
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

I did another round of review, please r? me again when ready.
Attachment #8650691 - Flags: review?(mdoglio)
I found the summary in bug 1196933 something I would like to keep around in this bug:
#########################
For bug 1194830 we are already creating the treherder.mozilla.org/api/project/{branch}/runnable_jobs endpoint that contains a list of all the current buildbot buildernames and their symbols on that branch. I believe that when it lands [1] the endpoint will be available to anyone to read from. This is what an entry for a job currently looks like:

{
            "build_system_type": "buildbot",
            "job_group_symbol": "T",
            "job_group_name": "Talos Performance",
            "platform_option": "pgo",
            "job_type_description": "fill me",
            "signature": "3508b6b5320de5575fbc9b814574528379ce3226",
            "result": "runnable",
            "ref_data_name": "Windows XP 32-bit mozilla-inbound pgo talos tp5o",
            "machine_platform_architecture": "x86",
            "job_type_id": 128,
            "build_platform": "windowsxp",
            "job_type_name": "Talos tp",
            "platform": "windowsxp",
            "state": "runnable",
            "machine_platform_id": 4,
            "build_os": "win",
            "option_collection_hash": "f69e1b00908837bf0550250abb1645014317e8ec",
            "job_type_symbol": "tp",
            "job_group_description": "fill me",
            "job_coalesced_to_guid": null,
            "machine_platform_os": "win",
            "device_id": 1,
            "build_architecture": "x86",
            "device_name": "unknown",
            "build_platform_id": 4,
            "job_group_id": 9
        },

I think it is possible to add any information that is on allthethings.json to this endpoint. The main problem is that right now its goal is to only have currently existing jobs.  This means that when a buildername disappears from buildbot it disappears from this endpoint. But I think it would be possible to go around that by creating a new dict key to represent if a job is active or not and changing the way we clean old data.
Attachment #8650691 - Flags: review?(mdoglio)
Hi adusca, I did a release of mozci.
Please let me know if it works for you.
Since I took the time to think it through it:

adusca:armenzg: How do I know if I should extend a graph or create a new one?
armenzg: I will be landing some code that answers your question partially
you have 3 cases if looking at one platform: a) build does not exists b) build is running c) build is completed
for A - you can submit a graph
for B - we can't do anything (except if the build was a TC task or a BBB job)
for C - we can create a graph of test jobs
 in any case, you only extend a graph if you have a TC task running
 for Buildbot builds which are running, we can't do much about it
 unless... we created a task which listens to a buildbot sendchange step over Pulse and that task creates all associated BBB tasks

adusca: armenzg:Do you think it would be possible to land push-extender in Treeherder even if we can't add jobs to currently running build jobs?
armenzg: adusca, I see value for it - yes
 if we moved Buildbot scheduling to in-tree we would not have this problem
 since every job would have a taskId

With regards to pulse listening, we would have to listen for builds.#.finished

https://tools.taskcluster.net/pulse-inspector/#!%28%28exchange:exchange/build,routingKeyPattern:build.%23.finished%29%29
This key would listen to all finishing builds (if it was a valid routing):
build.(repo_name)-(platform).(build#).finished

e.g.     build.try-macosx64.9032.finished

Under "payload/properties" we have all URLs as we need.
Blocks: 1194264
No longer blocks: 1194264
Depends on: 1194264
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

Moving the r? to emorley
Attachment #8650691 - Flags: review?(mdoglio) → review?(emorley)
Sorry for the delay with the review, a PR of this size (and that's hard to split up into smaller pieces) needs at least a half-day uninterrupted for me to page back in the comments here, previous discussions, comments on the PR etc - which I've unfortunately not yet had. Will try and get to it by the end of the week/weekend :-)
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

Cameron, I don't suppose you could take a glance at the UI parts, since you're more familiar with them? I'll look at them more briefly and focus on the service parts.

Armen, since you have a bit more context as to allthethings, and were involved in the discussions above, could you give some feedback on the PR to make sure the buildapi parts look ok?

Thanks :-)
Attachment #8650691 - Flags: review?(cdawson)
Attachment #8650691 - Flags: feedback?(armenzg)
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

WRT to allthethings.json, I've codified the generation of it with this piece of code:
https://hg.mozilla.org/build/braindump/file/default/community/generate_allthethings_json.sh#l41

We generate the file at most every 15 minutes (cronjob) but in reality only if the following repos are updated:
* buildbot-configs (production branch)
* buildbotcustom (production-0.8 branch)
* tools

If you want, I could let you know somehow if it has changed even though I 
believe you can just tell by checking if there is a newer file to download.

IIUC, the patch seems to fetch the file once a day:
https://github.com/mozilla/treeherder/pull/892/files#diff-26d4413823b415e0e1902bf845ff7067R234

That cadence would not be ideal as new builders can be added anytime during the day and we would display out-of-date data.
However, knowing how big this patch is already let's punt that for now.

This is going to be a game changer and congrats on the patch! It is very impressive :)
Attachment #8650691 - Flags: feedback?(armenzg) → feedback+
Sorry for my own delay.  But yes, I'll take a look at the UI parts.
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

I made a few nit comments for style and syntax.  Also one for a possible function name change.  But overall everything looks good to me.  I tested locally and the interaction seems to work well.  

That being said, I didn't actually get it to trigger a job (running locally) 

Alice - would you ping me when you get a chance and help me see if I'm just not running something I need?  Or is there some reason why it won't trigger from a local instance?
Flags: needinfo?(alicescarpa)
Attachment #8650691 - Flags: review?(cdawson) → review+
BTW - The branch could stand to be rebased on master.  :)
Pulse Actions will listen to these trigger requests on exchange/treeherder/v1/resultset-runnable-job-actions and trigger what needs to be triggered from there. Since the Pulse Actions part is not live yet (it is being worked on https://github.com/mozilla/pulse_actions/pull/19/files ) there is still no way of testing the triggering part :(. 

As soon as the Pulse Actions patch lands we can schedule a test. While running TH locally the messages are sent to exchange/{user}/v1/resultset-runnable-job-actions (user being the PULSE_EXCHANGE_NAMESPACE env variable [1]) so we would need to modify Pulse Actions to listen to that during the test.

[1] https://github.com/mozilla/treeherder/blob/master/treeherder/config/settings.py#L336
Flags: needinfo?(alicescarpa)
No longer depends on: 1194264
Depends on: 1220840
Alice I will be working on bug 1220840 to provide whatever you need for this bug.
Depends on: 1203552
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/5d9e430cacd912a41d0a887f6ac66a53874cab81
Bug 1194830: Add a runnable_job API endpoint

This creates a 'runnable_job' table in the database, as well as an API
endpoint at /api/project/{branch}/runnable_jobs listing all existing
buildbot jobs and their symbols. A new daily task 'fetch_allthethings' is
added to update the this table.

https://github.com/mozilla/treeherder/commit/d91cd4ba31d23bc1d4bebae5e040218584bacc72
Bug 1194830: Add a button to trigger new buildbot jobs

This adds a "Add new jobs" button that shows all buildbot jobs
that can be added as normal treeherder jobs (with a new CSS class).
After logged in users click on the jobs they want to trigger, they
can click on "Trigger new jobs" to send a pulse message that will
trigger the jobs.
Comment on attachment 8650691 [details] [review]
Add button to trigger new buildbot jobs

Looks good after the changes & rebase - thank you!

I deployed to stage & manually run the task using `./manage.py shell` on stage rabbitmq. Testing on a stage try push then worked well.

One additional change that can be done in a followup bug (could you file?): When the "No additional jobs found" message is shown (eg before the fetch task was run on stage), the show/hide toggle on the push action menu still gets changed - which means the user has to choose "Hide ..." then "show ..." to try again.
Attachment #8650691 - Flags: review?(emorley) → review+
Blocks: 1225351
Depends on: 1227552
I see that camd deployed something in bug 1227552; thanks!

The only place where I've seen this half-working is on mozilla-central.
In all other trees it says "No new jobs".

On mozilla-central, clicking on the + sign shows *some* jobs.
Appending &group_state=expanded shows more jobs but I think it might have a lot more missing.
http://people.mozilla.org/~armenzg/sattap/c8666e35.png
It's not deployed yet. See bug 1227552 comment 7.
It was mentioned earlier on IRC that this was deployed.
However, it seems that is not yet working. Maybe there is some other issue?

It seems to work if I load allizom.
As Alive mentioned on IRC, the task needs to run first. The task only runs once a day.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.