Closed Bug 1169320 Opened 9 years ago Closed 9 years ago

Add support for ingesting Taskcluster jobs via Pulse (to support local development)

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

References

Details

Attachments

(2 files)

PR (46 bytes, text/x-github-pull-request)
mdoglio: feedback+
wlach: feedback+
garndt: feedback+
emorley: feedback+
Renewed PR (47 bytes, text/x-github-pull-request)
Perhaps Task Cluster could post to Pulse.  Then we could have a process we run locally that loads from that and posts with the TreeherderClient to our local instance.
Depending on the use case, writing a management command to import job data (as we did for performance in bug 1163138) from an existing treeherder instance may also be an option.
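For illustration, a rough sketch of what such a management command could look like, assuming it pulls from the jobs API of an existing instance; the endpoint, parameters and response shape below are assumptions, not the bug 1163138 implementation:

# Hypothetical sketch only: endpoint and response shape are assumptions.
import requests
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Import recent job data from a live Treeherder instance (sketch)"

    def add_arguments(self, parser):
        parser.add_argument("--project", default="mozilla-central")
        parser.add_argument("--count", type=int, default=100)

    def handle(self, *args, **options):
        url = "https://treeherder.mozilla.org/api/project/%s/jobs/" % options["project"]
        response = requests.get(url, params={"count": options["count"]})
        response.raise_for_status()
        jobs = response.json().get("results", [])
        self.stdout.write("Fetched %d job records" % len(jobs))
        # Handing these records to the local ingestion path is left out of this sketch.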
Priority: -- → P3
See Also: → 1076989
Summary: develop workflow to get taskcluster jobs on a local instance → Develop workflow to get taskcluster jobs on a local instance (eg by listening to Pulse)
Component: Treeherder → Treeherder: Data Ingestion
Instead of TC emitting the Pulse data, would it make sense for TH to emit the data?
We would both process the data and re-broadcast it.

Wrt a management command, would it be able to import new data?

I think combining both ideas would give us:
* Previous data
* Incoming data

What do you think?
Assignee: nobody → cdawson
It sounds like I need to do this:

1. Find or set up an exchange for Task Cluster to publish to.
   i. There is a "taskcluster-treeherder" user owned by jlal.  Perhaps that was intended for this purpose?

2. Set up a queue each for production, stage, myself and any other devs who might want to subscribe, fed by that exchange.

3. Modify the Treeherder service so it consumes from whatever queue is specified.  (In the beginning, Treeherder had this.  I can take a look at some of the old commits to see how we did it, but there may be a lot of bitrot there.)  A rough consumer sketch for this step is below.
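Not the actual implementation, just a minimal kombu-based sketch of what step 3 could look like; the connection credentials, exchange name and queue name below are placeholders that would come from Pulse Guardian:

# Sketch only: credentials, exchange and queue names are placeholders.
from kombu import Connection, Exchange, Queue


def on_job_message(body, message):
    # Hand the job payload off to the ingestion path (e.g. a celery task).
    print("received job payload:", body)
    message.ack()


connection = Connection("amqps://PULSE_USER:PULSE_PASSWORD@pulse.mozilla.org:5671/")
# passive=True binds to an exchange someone else owns without redeclaring it.
exchange = Exchange("exchange/taskcluster-treeherder/jobs", type="topic", passive=True)
queue = Queue(
    "queue/PULSE_USER/treeherder-jobs",
    exchange=exchange,
    routing_key="#",
    durable=True,
)

with connection:
    with connection.Consumer(queue, callbacks=[on_job_message], accept=["json"]):
        while True:
            connection.drain_events()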
Armen-- I think, ideally, we would just convert our ingestion mechanism from Task Cluster posting directly to us to TC posting to Pulse, with us then ingesting from Pulse.  Any user could create their own queue off the same exchange to get the data.

This way, when buildbot goes away and we ONLY get data from Task Cluster, we are all set.
NI myself to not miss this in my backlog of bugmail. I will read this by EOD.
Flags: needinfo?(armenzg)
That works for me.
We can probably file a bug for the TaskCluster team to post both to TH (current approach) and to Pulse (new approach).
You can CC garndt, jonasfj and myself.

Could you please post a thread to the release.engineering mailing list about our intent to change the setup?
I'm hoping we'd catch anything we might have missed from releng or TC folks.

Once we have TH fed from Pulse, we can ask them to stop posting with the treeherder client.

I believe wlach's suggestion needs to be filed as a separate request. IIUC it is something we need regardless of changing our ingestion approach. Please correct me if needed!
Flags: needinfo?(armenzg)
Summary: Develop workflow to get taskcluster jobs on a local instance (eg by listening to Pulse) → Ingest Taskcluster jobs via Pulse rather than via the REST API (to support local development)
See Also: 1076989
I've duped bug 1169320 here - I think part of the solution created in this bug would let us have similar content on stage/prod - eg by using the same pulse stream for stage/prod, but having certain submissions marked as for Treeherder stage only, and others for both stage+prod.
Depends on: 1196804
Depends on: 1199291
Depends on: 1199364
Blocks: 1199511
No longer depends on: 1196804
No longer blocks: 1199511
Depends on: 1199511
Depends on: 1199506
No longer depends on: 1199511
See Also: → 1199511
Marking this as blocking bug 1176484, since it impacts our ability to stress-test Heroku.
Attached file PR
Attachment #8658990 - Flags: feedback?(mdoglio)
Attachment #8658990 - Flags: feedback?(emorley)
Attachment #8658990 - Flags: feedback?(wlachance)
Attachment #8658990 - Flags: feedback?(garndt)
Comment on attachment 8658990 [details] [review]
PR

Left some comments :-)
Attachment #8658990 - Flags: feedback?(emorley)
Attachment #8658990 - Flags: feedback?(mdoglio) → feedback+
Comment on attachment 8658990 [details] [review]
PR

I guess the one big piece of feedback I have is that I'm not really sure of the point of providing an API endpoint to validate against schema? Shouldn't we just put the schema somewhere well-known, document it, and let people validate their own stuff? This feels like scope creep in treeherder to me-- one more thing for us to test, validate, etc. I'd prefer if you left it out.

Aside from that I approve of the general direction this is going in. :)
Attachment #8658990 - Flags: feedback?(wlachance) → feedback+
I'm inclined to agree with Will. Could you explain some more of the context here - we may be missing something obvious :-)
Thanks for reviewing.  And I suppose this is a fair critique.  I added the validating endpoint just because it was so easy to do so.  It's not necessary for any of the ingestion work as I get the schema to validate against directly, not through an endpoint.

Just to be clear: do you prefer to have neither endpoint?  Or to leave the one to fetch the schema, but not to validate?
Flags: needinfo?(wlachance)
(In reply to Cameron Dawson [:camd] from comment #15)
> Thanks for reviewing.  And I suppose this is a fair critique.  I added the
> validating endpoint just because it was so easy to do so.  It's not
> necessary for any of the ingestion work as I get the schema to validate
> against directly, not through an endpoint.
> 
> Just to be clear: do you prefer to have neither endpoint?  Or to leave the
> one to fetch the schema, but not to validate?

I'd prefer neither endpoint. As an alternative, maybe document some stuff about the schema and how to validate against it yourself in the documentation (perhaps with a link to the raw schema file on github)?
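For what it's worth, the self-serve validation could be as small as this sketch; the schema path is a guess, and the payload shown is only illustrative:

# Validate a job payload yourself against the YML JSON Schema.
# The path below is a guess; point it at wherever the schema lives in the repo.
import jsonschema
import yaml

with open("schemas/pulse-job.yml") as f:
    schema = yaml.safe_load(f)

job_payload = {"project": "mozilla-central"}  # illustrative, incomplete payload

# Raises jsonschema.ValidationError with a helpful message if the payload
# doesn't match the schema.
jsonschema.validate(instance=job_payload, schema=schema)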
Flags: needinfo?(wlachance)
Comment on attachment 8658990 [details] [review]
PR

I updated this PR quite a bit based on feedback and even some thoughts of my own.  I broke it up into smaller chunks.  I hope it feels reasonable.  :)
Attachment #8658990 - Flags: review?(mdoglio)
Attachment #8658990 - Flags: review?(emorley)
Comment on attachment 8658990 [details] [review]
PR

I've left some more comments :-)

Given this is a 1000 line PR, and involves things like jsonschema and pulse, which require additional reading, I'm calling this an f+ rather than r+, since I've barely been able to scratch the surface in the time I've spent this evening (I'd already left for the weekend when the review request came in on Friday).

However I know this is a deliverable, and isn't used in production at the moment, so feel free to land this with f+ from a couple of us and we can iterate as we go forwards, to save blocking you on this :-)
Attachment #8658990 - Flags: review?(emorley) → feedback+
Comment on attachment 8658990 [details] [review]
PR

Sorry for the long overdue f?, just took a look at it.  I only know a little bit about some of the data we send, but looks good.  Just had a question about the group symbol and user for github projects in the PR.
Attachment #8658990 - Flags: feedback?(garndt) → feedback+
Thanks for the review comments, guys.  Good info that I'm assimilating.  I've already talked to mcote about the fact that I won't land this by EOQ (hey!  that's today!), and that's all cool.

Yeah, there's a lot to absorb here, and it will likely require some iterating once it starts to get some use.
Note to self:  Here is the Github api for getting the revisions (commits) for a PR:

https://api.github.com/repos/mozilla/treeherder/pulls/940/commits
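A minimal sketch of pulling that list down with requests (plain GitHub REST API, nothing Treeherder-specific):

import requests

url = "https://api.github.com/repos/mozilla/treeherder/pulls/940/commits"
response = requests.get(url)
response.raise_for_status()

for commit in response.json():
    # Each entry carries the SHA plus the full commit object.
    print(commit["sha"][:12], commit["commit"]["message"].splitlines()[0])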
With comment 18 from Ed, and a chat with him on Vidyo, I've got the go-ahead to merge this.  This code is dormant until we activate a worker for the ``store_pulse_jobs`` queue and a process to run the ``ingest_from_pulse`` mgmt command.  So it's benign.
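For reference, a rough sketch of what activating this locally might look like; the celery invocation is approximate and only shown as a comment:

# Rough sketch of turning the dormant pieces on in a local environment.
#
# A worker for the queue would normally be started from the shell, roughly:
#   celery -A treeherder worker -Q store_pulse_jobs   (approximate invocation)
#
# The Pulse listener itself is the new management command:
from django.core.management import call_command

call_command("ingest_from_pulse")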
Attached file Renewed PR
The old PR got kind of messed up with all the reviews/revisions/rebasing, and reviewable.io got quite confused.  I've left it non-obsoleted for reference, but this new PR has the merged code and I copied over a few comments from Mauro along with my answers.
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/d23435f8ed3e97187d5a46963be75390b8691249
Bug 1169320, 1199506 - Adds requirements, settings and JSON schema for Pulse Ingestion

https://github.com/mozilla/treeherder/commit/ff64f67247759ad165a55e3d355c6c2559385e04
Bug 1169320 - Adds ability to ingest Jobs from Pulse exchanges

This ingests jobs over a pulse stream if the resultset for the job
already exists.  There is a new management command called
``ingest_from_pulse`` which reads from a set of exchanges specified
in ``base.py`` or environment variables.  These will be loaded in a
local celery queue for ingestion into the DB.

The user is expected to create their own Pulse user with Pulse Guardian
to be able to ingest jobs as well as post to their own queue.

A JSON Schema in the form of a YML file is included to validate the jobs
prior to ingestion.  The resultset/push for a job must
already exist in Treeherder or the job will be skipped.

https://github.com/mozilla/treeherder/commit/35fb1309512a9b0bd0be9a5ec6c271d99528fdd4
Bug 1169320 - Tests against the JobLoader

https://github.com/mozilla/treeherder/commit/7510eb98d9c281c997130d11f05676ad384de575
Bug 1169320 - Documentation for pulse ingestion

This documentation instructs a user on how to set up their local
machine to ingest data from existing exchanges, as well as how to post
to their own exchange to test their jobs.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Cameron, as I can see in the code and docs, other tools besides TC could also use Pulse to send reports. Therefore each tool needs its own queue, right? What is the process for getting something added to TH staging and production for firefox-ui-tests? I wonder if it makes sense for us to use Pulse instead of the TH client.
Flags: needinfo?(cdawson)
Henrik--  Yep, you will definitely be able to do so.  Though the processes for this change are not "live" on stage or production yet.  I wanted to work with a few folks (you, Task Cluster, Autophone) to have you test it working on your local machine and be sure the schema has everything you need.  I'd rather make adjustments before going live the first time to prevent having to support backward compatibility too soon.  :)

So, in the immediate term, it would be great if you created your Pulse exchange with Pulse Guardian, wired things up in your local environment to get Treeherder listening and your tool posting, and reported back with any issues.  Also, please let me know if things aren't clear and I need to write more docs!!  :)

https://treeherder.readthedocs.org/en/latest/pulseload.html
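As a very rough sketch of the posting side (exchange name, routing key and payload below are placeholders; the real payload has to conform to the job schema):

# Placeholder sketch of posting a job to your own Pulse exchange with kombu.
from kombu import Connection, Exchange

connection = Connection("amqps://PULSE_USER:PULSE_PASSWORD@pulse.mozilla.org:5671/")
exchange = Exchange("exchange/PULSE_USER/v1/jobs", type="topic")

job_payload = {"project": "mozilla-central"}  # must conform to the job schema

producer = connection.Producer(serializer="json")
producer.publish(
    job_payload,
    exchange=exchange,
    routing_key="mozilla-central.job",  # placeholder routing key
    declare=[exchange],
)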

I want Task Cluster to get a chance to modify their tool (mozilla-taskcluster) to post to an exchange and test that.  They weren't able to actually test with this last quarter due to time constraints.  

That being said, once things go "live" on Stage (and eventually production), the process would be to file a "data ingestion" bug against Treeherder and assign it to me, and I'll add your info to the config.  Hmm, perhaps you could make a PR that adds the entry to ``base.py`` in Treeherder and assign me to review?  That might be better.
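Purely as a sketch, such an entry might look something like the following; the setting name and structure are a guess, so the real shape in ``base.py`` may well differ:

# Hypothetical sketch of an entry in Treeherder's settings; names may differ.
PULSE_DATA_INGESTION_SOURCES = [
    {
        # Exchange your tool publishes to (created via Pulse Guardian).
        "exchange": "exchange/fx-ui-tests/v1/jobs",
        # Routing keys Treeherder should bind to on that exchange.
        "destinations": ["#"],
        # Projects/repositories the submitted jobs belong to.
        "projects": ["mozilla-central"],
    },
]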

Thanks for your interest in this!  :)
Flags: needinfo?(cdawson)
Attachment #8658990 - Flags: review?(mdoglio)
No longer blocks: treeherder-heroku
Depends on: 1266229
Depends on: 1266584
Blocks: 1266229
No longer depends on: 1266229, 1266584
Summary: Ingest Taskcluster jobs via Pulse rather than via the REST API (to support local development) → Add support for ingesting Taskcluster jobs via Pulse (to support local development)