Closed Bug 1169320 Opened 9 years ago Closed 9 years ago

Add support for ingesting Taskcluster jobs via Pulse (to support local development)

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

References

Details

Attachments

(2 files)

PR (46 bytes, text/x-github-pull-request)
mdoglio: feedback+
wlach: feedback+
garndt: feedback+
emorley: feedback+
Renewed PR (47 bytes, text/x-github-pull-request)
Perhaps Task Cluster could post to Pulse.  Then we could have a process we run locally that loads from that and posts with the TreeherderClient to our local instance.
Depending on the use case, writing a management command to import job data (as we did for performance in bug 1163138) from an existing treeherder instance may also be an option.
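For illustration, a rough sketch of what such a management command could look like, assuming it pulls from the jobs API of an existing instance; the endpoint, parameters and response shape below are assumptions, not the bug 1163138 implementation:

# Hypothetical sketch only: endpoint and response shape are assumptions.
import requests
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Import recent job data from a live Treeherder instance (sketch)"

    def add_arguments(self, parser):
        parser.add_argument("--project", default="mozilla-central")
        parser.add_argument("--count", type=int, default=100)

    def handle(self, *args, **options):
        url = "https://treeherder.mozilla.org/api/project/%s/jobs/" % options["project"]
        response = requests.get(url, params={"count": options["count"]})
        response.raise_for_status()
        jobs = response.json().get("results", [])
        self.stdout.write("Fetched %d job records" % len(jobs))
        # Handing these records to the local ingestion path is left out of this sketch.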
Priority: -- → P3
See Also: → 1076989
Summary: develop workflow to get taskcluster jobs on a local instance → Develop workflow to get taskcluster jobs on a local instance (eg by listening to Pulse)
Component: Treeherder → Treeherder: Data Ingestion
Instead of TC emitting the Pulse data, would it make sense for TH to emit the data?
We would both process the data and re-broadcast it.

Wrt a management command, would it be able to import new data?

I think combining both ideas would give us:
* Previous data
* Incoming data

What do you think?
Assignee: nobody → cdawson
It sounds like I need to do this:

1. Find or set up an exchange for Task Cluster to publish to.
   i. There is a "taskcluster-treeherder" user owned by jlal.  Perhaps that was intended for this purpose?

2. Set up a queue each for production, stage, myself and any other devs who might want to subscribe, fed by that exchange.

3. Modify the Treeherder service so it consumes from whatever queue is specified.  (In the beginning, Treeherder had this.  I can take a look at some of the old commits to see how we did it, but there may be a lot of bitrot there.)  A rough consumer sketch for this step is below.
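Not the actual implementation, just a minimal kombu-based sketch of what step 3 could look like; the connection credentials, exchange name and queue name below are placeholders that would come from Pulse Guardian:

# Sketch only: credentials, exchange and queue names are placeholders.
from kombu import Connection, Exchange, Queue


def on_job_message(body, message):
    # Hand the job payload off to the ingestion path (e.g. a celery task).
    print("received job payload:", body)
    message.ack()


connection = Connection("amqps://PULSE_USER:PULSE_PASSWORD@pulse.mozilla.org:5671/")
# passive=True binds to an exchange someone else owns without redeclaring it.
exchange = Exchange("exchange/taskcluster-treeherder/jobs", type="topic", passive=True)
queue = Queue(
    "queue/PULSE_USER/treeherder-jobs",
    exchange=exchange,
    routing_key="#",
    durable=True,
)

with connection:
    with connection.Consumer(queue, callbacks=[on_job_message], accept=["json"]):
        while True:
            connection.drain_events()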
Armen-- I think, ideally, we would just convert our ingestion mechanism from Task Cluster posting directly to us to TC posting to Pulse, with us then ingesting from Pulse.  Any user could create their own queue off the same exchange to get the data.

This way, when buildbot goes away and we ONLY get data from Task Cluster, we are all set.
NI myself to not miss this in my backlog of bugmail. I will read this by EOD.
Flags: needinfo?(armenzg)
That works for me.
We can probably file a bug for the TaskCluster team to post both to TH (current approach) and to Pulse (new approach).
You can CC garndt, jonasfj and myself.

Could you please post a thread to the release.engineering mailing list about our intent to change the setup?
I'm hoping we'd catch anything we might have missed from releng or TC folks.

Once we have TH fed from Pulse, we can ask them to stop posting with the treeherder client.

I believe wlach's suggestion needs to be filed as a separate request. IIUC it is something we need regardless of changing our ingestion approach. Please correct me if needed!
Flags: needinfo?(armenzg)
Summary: Develop workflow to get taskcluster jobs on a local instance (eg by listening to Pulse) → Ingest Taskcluster jobs via Pulse rather than via the REST API (to support local development)
See Also: 1076989
I've duped bug 1169320 here - I think part of the solution created in this bug would let us have similar content on stage/prod - eg by using the same pulse stream for stage/prod, but having certain submissions marked as for Treeherder stage only, and others for both stage+prod.
Depends on: 1196804
Depends on: 1199291
Depends on: 1199364
Blocks: 1199511
No longer depends on: 1196804
No longer blocks: 1199511
Depends on: 1199511
Depends on: 1199506
No longer depends on: 1199511
See Also: → 1199511
Marking this as blocking bug 1176484, since it impacts our ability to stress-test Heroku.
Attached file PR
Attachment #8658990 - Flags: feedback?(mdoglio)
Attachment #8658990 - Flags: feedback?(emorley)
Attachment #8658990 - Flags: feedback?(wlachance)
Attachment #8658990 - Flags: feedback?(garndt)
Comment on attachment 8658990 [details] [review]
PR

Left some comments :-)
Attachment #8658990 - Flags: feedback?(emorley)
Attachment #8658990 - Flags: feedback?(mdoglio) → feedback+
Comment on attachment 8658990 [details] [review]
PR

I guess the one big piece of feedback I have is that I'm not really sure of the point of providing an API endpoint to validate against schema? Shouldn't we just put the schema somewhere well-known, document it, and let people validate their own stuff? This feels like scope creep in treeherder to me-- one more thing for us to test, validate, etc. I'd prefer if you left it out.

Aside from that I approve of the general direction this is going in. :)
Attachment #8658990 - Flags: feedback?(wlachance) → feedback+
I'm inclined to agree with Will. Could you explain some more of the context here - we may be missing something obvious :-)
Thanks for reviewing.  And I suppose this is a fair critique.  I added the validating endpoint just because it was so easy to do so.  It's not necessary for any of the ingestion work as I get the schema to validate against directly, not through an endpoint.

Just to be clear: do you prefer to have neither endpoint?  Or to leave the one to fetch the schema, but not to validate?
Flags: needinfo?(wlachance)
(In reply to Cameron Dawson [:camd] from comment #15)
> Thanks for reviewing.  And I suppose this is a fair critique.  I added the
> validating endpoint just because it was so easy to do so.  It's not
> necessary for any of the ingestion work as I get the schema to validate
> against directly, not through an endpoint.
> 
> Just to be clear: do you prefer to have neither endpoint?  Or to leave the
> one to fetch the schema, but not to validate?

I'd prefer neither endpoint. As an alternative, maybe document some stuff about the schema and how to validate against it yourself in the documentation (perhaps with a link to the raw schema file on github)?
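For what it's worth, the self-serve validation could be as small as this sketch; the schema path is a guess, and the payload shown is only illustrative:

# Validate a job payload yourself against the YML JSON Schema.
# The path below is a guess; point it at wherever the schema lives in the repo.
import jsonschema
import yaml

with open("schemas/pulse-job.yml") as f:
    schema = yaml.safe_load(f)

job_payload = {"project": "mozilla-central"}  # illustrative, incomplete payload

# Raises jsonschema.ValidationError with a helpful message if the payload
# doesn't match the schema.
jsonschema.validate(instance=job_payload, schema=schema)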
Flags: needinfo?(wlachance)
Comment on attachment 8658990 [details] [review]
PR

I updated this PR quite a bit based on feedback and even some thoughts of my own.  I broke it up into smaller chunks.  I hope it feels reasonable.  :)
Attachment #8658990 - Flags: review?(mdoglio)
Attachment #8658990 - Flags: review?(emorley)
Comment on attachment 8658990 [details] [review]
PR

I've left some more comments :-)

Given this is a 1000 line PR, and involves things like jsonschema and pulse, which require additional reading, I'm calling this an f+ rather than r+, since I've barely been able to scratch the surface in the time I've spent this evening (I'd already left for the weekend when the review request came in on Friday).

However I know this is a deliverable, and isn't used in production at the moment, so feel free to land this with f+ from a couple of us and we can iterate as we go forwards, to save blocking you on this :-)
Attachment #8658990 - Flags: review?(emorley) → feedback+
Comment on attachment 8658990 [details] [review]
PR

Sorry for the long overdue f?, just took a look at it.  I only know a little bit about some of the data we send, but looks good.  Just had a question about the group symbol and user for github projects in the PR.
Attachment #8658990 - Flags: feedback?(garndt) → feedback+
Thanks for the review comments, guys.  Good info that I'm assimilating.  I've already talked to mcote about the fact that I won't land this by EOQ (hey!  that's today!), and that's all cool.

Yeah, there's a lot to absorb here, and it will likely require some iterating once it starts to get some use.
Note to self:  Here is the Github api for getting the revisions (commits) for a PR:

https://api.github.com/repos/mozilla/treeherder/pulls/940/commits
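A minimal sketch of pulling that list down with requests (plain GitHub REST API, nothing Treeherder-specific):

import requests

url = "https://api.github.com/repos/mozilla/treeherder/pulls/940/commits"
response = requests.get(url)
response.raise_for_status()

for commit in response.json():
    # Each entry carries the SHA plus the full commit object.
    print(commit["sha"][:12], commit["commit"]["message"].splitlines()[0])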
With comment 18 from Ed, and a chat with him on Vidyo, I've got the go-ahead to merge this.  This code is dormant until we activate a worker for the ``store_pulse_jobs`` queue and a process to run the ``ingest_from_pulse`` mgmt command.  So it's benign.
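For reference, a rough sketch of what activating this locally might look like; the celery invocation is approximate and only shown as a comment:

# Rough sketch of turning the dormant pieces on in a local environment.
#
# A worker for the queue would normally be started from the shell, roughly:
#   celery -A treeherder worker -Q store_pulse_jobs   (approximate invocation)
#
# The Pulse listener itself is the new management command:
from django.core.management import call_command

call_command("ingest_from_pulse")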
Attached file Renewed PR
The old PR got kind of messed up with all the reviews/revisions/rebasing, and reviewable.io got quite confused.  I've left it non-obsoleted for reference, but this new PR has the merged code and I copied over a few comments from Mauro along with my answers.
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/d23435f8ed3e97187d5a46963be75390b8691249
Bug 1169320, 1199506 - Adds requirements, settings and JSON schema for Pulse Ingestion

https://github.com/mozilla/treeherder/commit/ff64f67247759ad165a55e3d355c6c2559385e04
Bug 1169320 - Adds ability to ingest Jobs from Pulse exchanges

This ingests jobs over a pulse stream if the resultset for the job
already exists.  There is a new management command called
``ingest_from_pulse`` which reads from a set of exchanges specified
in ``base.py`` or environment variables.  These will be loaded in a
local celery queue for ingestion into the DB.

The user is expected to create their own Pulse user with Pulse Guardian
to be able to ingest jobs as well as post to their own queue.

A JSON Schema in the form of a YML file is included to validate the jobs
prior to ingestion.  The resultset/push for a job must
already exist in Treeherder or the job will be skipped.

https://github.com/mozilla/treeherder/commit/35fb1309512a9b0bd0be9a5ec6c271d99528fdd4
Bug 1169320 - Tests against the JobLoader

https://github.com/mozilla/treeherder/commit/7510eb98d9c281c997130d11f05676ad384de575
Bug 1169320 - Documentation for pulse ingestion

This documentation instructs a user on how to set up their local
machine to ingest data from existing exchanges, as well as how to post
to their own exchange to test their jobs.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Cameron, as I can see in the code and docs, other tools besides TC could also use Pulse to send reports. Therefore each tool needs its own queue, right? What is the process for getting something added to TH staging and production for firefox-ui-tests? I wonder if it makes sense for us to use Pulse instead of the TH client.
Flags: needinfo?(cdawson)
Henrik--  Yep, you will definitely be able to do so.  Though the processes for this change are not "live" on stage or production yet.  I wanted to work with a few folks (you, Task Cluster, Autophone) to have you test it working on your local machine and be sure the schema has everything you need.  I'd rather make adjustments before going live the first time to prevent having to support backward compatibility too soon.  :)

So, in the immediate term, it would be great if you created your Pulse exchange with Pulse Guardian, wired things up in your local environment to get Treeherder listening and your tool posting, and reported back with any issues.  Also, please let me know if things aren't clear and I need to write more docs!!  :)

https://treeherder.readthedocs.org/en/latest/pulseload.html
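As a very rough sketch of the posting side (exchange name, routing key and payload below are placeholders; the real payload has to conform to the job schema):

# Placeholder sketch of posting a job to your own Pulse exchange with kombu.
from kombu import Connection, Exchange

connection = Connection("amqps://PULSE_USER:PULSE_PASSWORD@pulse.mozilla.org:5671/")
exchange = Exchange("exchange/PULSE_USER/v1/jobs", type="topic")

job_payload = {"project": "mozilla-central"}  # must conform to the job schema

producer = connection.Producer(serializer="json")
producer.publish(
    job_payload,
    exchange=exchange,
    routing_key="mozilla-central.job",  # placeholder routing key
    declare=[exchange],
)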

I want Task Cluster to get a chance to modify their tool (mozilla-taskcluster) to post to an exchange and test that.  They weren't able to actually test with this last quarter due to time constraints.  

That being said, once things go "live" on Stage (and eventually production), the process would be to file a "data ingestion" bug against Treeherder and assign it to me, and I'll add your info to the config.  Hmm, perhaps you could make a PR that adds the entry to ``base.py`` in Treeherder and assign me to review?  That might be better.
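Purely as a sketch, such an entry might look something like the following; the setting name and structure are a guess, so the real shape in ``base.py`` may well differ:

# Hypothetical sketch of an entry in Treeherder's settings; names may differ.
PULSE_DATA_INGESTION_SOURCES = [
    {
        # Exchange your tool publishes to (created via Pulse Guardian).
        "exchange": "exchange/fx-ui-tests/v1/jobs",
        # Routing keys Treeherder should bind to on that exchange.
        "destinations": ["#"],
        # Projects/repositories the submitted jobs belong to.
        "projects": ["mozilla-central"],
    },
]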

Thanks for your interest in this!  :)
Flags: needinfo?(cdawson)
Attachment #8658990 - Flags: review?(mdoglio)
No longer blocks: treeherder-heroku
Depends on: 1266229
Depends on: 1266584
Blocks: 1266229
No longer depends on: 1266229, 1266584
Summary: Ingest Taskcluster jobs via Pulse rather than via the REST API (to support local development) → Add support for ingesting Taskcluster jobs via Pulse (to support local development)