Closed Bug 1264074 Opened 8 years ago Closed 8 years ago

Use Pulse for creation of Github resultsets

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

Attachments

(3 files)

Currently we only handle auto-resultset generation for HG repos.  We should also do this for GitHub repos like gaia.

Task Cluster is doing this creation based on a GitHub webhook that gives them the revisions.  We'll need to decide what information each job must pass in so that we can fetch the list of revisions from GitHub.

For now, we can leave task cluster's resultset creation in place and tackle this later (possibly Q3 2016?).
This could be useful for Servo as well, which is based on github and is considering submitting performance data to treeherder (but not via taskcluster): https://github.com/servo/servo/issues/10452.
I believe this blocks/depends on a couple of bugs, could you set the deps? :-)
This is also interesting for the WebDev folks, it could save them a bunch of work on their end.

Maybe we should consider bumping up the priority of this: if there are at least 3 groups of people (Taskcluster, Servo, WebDev) who could benefit from it, it might be worth doing sooner rather than later.
We discussed in the meeting today that we could use github webhooks to tell us when a new resultset should be created:  https://developer.github.com/webhooks/

The events I'm thinking would apply here are:

1. pull_request
  - Any time a Pull Request is assigned, unassigned, labeled, unlabeled, opened, edited, closed, reopened, or synchronized (updated due to a new push in the branch that the pull request is tracking).

2. push
  - Any Git push to a Repository, including editing tags or branches. Commits via API actions that update references are also counted. This is the default event.

Not ALL the pull_request events would trigger this, but a few at least.
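To make the "not ALL the pull_request events" point concrete, here is a minimal sketch of such a filter. The function name and the chosen set of actions are assumptions for illustration, not Treeherder code. It relies only on GitHub sending the event name in the `X-GitHub-Event` header and the pull request action in the payload's `action` field, where the "synchronized" case appears as `synchronize`.

```python
# Hypothetical helper; the name and the action set are illustrative assumptions.
# These are the pull_request actions that change which commits are under test.
RESULTSET_PR_ACTIONS = {"opened", "reopened", "synchronize"}


def should_create_resultset(event, payload):
    """Return True if a GitHub webhook delivery should produce a resultset.

    `event` is the value of the X-GitHub-Event header and `payload` is the
    decoded JSON body of the delivery.
    """
    if event == "push":
        return True
    if event == "pull_request":
        # Label or assignment changes do not introduce new revisions.
        return payload.get("action") in RESULTSET_PR_ACTIONS
    return False
```

A real filter would probably also want to skip push events that only delete a branch or move a tag, which this sketch does not attempt.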

However, now that I think a bit more about this, I don't think it should just call our API directly.  I think it should write the info to a Pulse exchange.  That way, any Treeherder instance can subscribe to this (even locally) and get the info.
wrt the pulse exchange: I'd love to avoid having a service that the GH webhook talks to that then publishes to Pulse.  Hopefully the webhook would be able to do all that itself without us needing to host yet another service.  :)  But I have never created a webhook, so not sure of the limitations yet.
Task cluster has a webhook called taskcluster-github that posts all github pushes and PR changes to a pulse exchange.  Treeherder could subscribe to this exchange to get the info for new resultsets.

This project does not yet put the revisions into the pulse messages, but we can hopefully convince the owners to do so.  :)
Sounds like jonasfj thinks adding the revisions to those messages is a good idea.  So the next part of the project would be to have treeherder subscribe a channel to that exchange for github pushes.

I hope to jump on this in the second half of Q2, once I get us ingesting jobs via pulse from task cluster for my Q2 deliverable.
Assignee: nobody → cdawson
Attached file push Pulse message
Attached file PR pulse message
Summary: Handle autocreation of resultsets for github repos → Use Pulse for creation of Github resultsets
Priority: -- → P2
Depends on: 1291010
Depends on: 1290521
Comment on attachment 8771203 [details] [review]
[treeherder] mozilla:github-pulse-resultsets > mozilla:master

Hey Ed: This is a pretty big one.  If you'd like me to walk through it with you, I'm happy to do it.  :)  It should look fairly familiar from the pulse jobs PR though.  Thanks!!
Attachment #8771203 - Flags: review?(emorley)
Attachment #8771203 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/b2e5e714aab359c3dded9fc66af50cf27f261394
Bug 1264074 - Use Pulse for creation of Github resultsets (#1692)

* Bug 1264074 - Move to_timestamp function to a reusable location

* Bug 1264074 - Refactor JobConsumer to have a PulseConsumer super class

Much of what was in the JobConsumer is reusable by the upcoming
ResultsetConsumer.  So refactor those parts out so that each specific
consumer can reuse code as much as possible.

* Bug 1264074 - Add ability to ingest Github Resultsets via Pulse

This introduces a ResultsetConsumer and a read_pulse_resultsets
management command to ingest resultsets from the TaskCluster
github exchanges.

When a supported Github repo has a Pull Request created or
updated, or a push is made to master, then it will kick off a
Pulse message.  We will receive it and then fetch any additional
information we need from github's API and store the Resultset.

This follows a very similar pattern to the Job Pulse ingestion.

* Bug 1264074 - Old code/comments cleanup

* Bug 1264074 - Tests for the Github resultset pulse loader
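The superclass/subclass split described in the commit message could look roughly like the sketch below. Only the class names `PulseConsumer` and `ResultsetConsumer` come from the commit. The method names, the injected `fetch_commits` callable, the `store` sink, and the message fields `repository` and `head_sha` are all assumptions made for illustration; the merged code in the Treeherder repository is the authority on the real shapes.

```python
class PulseConsumer:
    """Behaviour shared by every consumer: process the body, then ack."""

    def on_message(self, body, message):
        self.handle(body)
        message.ack()  # acknowledge only after handling succeeded

    def handle(self, body):
        raise NotImplementedError


class ResultsetConsumer(PulseConsumer):
    """Turns a GitHub push/PR Pulse message into a stored resultset."""

    def __init__(self, store, fetch_commits):
        self.store = store                  # hypothetical: any list-like sink
        self.fetch_commits = fetch_commits  # hypothetical: wraps the GitHub API call

    def handle(self, body):
        # "repository" and "head_sha" are assumed field names, not the real schema.
        commits = self.fetch_commits(body["repository"], body["head_sha"])
        self.store.append({
            "repository": body["repository"],
            "revision": body["head_sha"],
            "revisions": [commit["sha"] for commit in commits],
        })
```

Passing the GitHub lookup in as a callable keeps the consumer testable without network access, which is one plausible reason to separate the shared acknowledgement logic from the per-message handling.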
Blocks: 1297208
Ah I see now why the Heroku deploy is failing - PULSE_RESULTSET_SOURCES was set to invalid json, but the compile step seems to then cache it, even though (a) the release failed, (b) `heroku config` insists it's not set (lies). Worse, it's not possible to unset it.

I've filed:
https://help.heroku.com/tickets/394841

The workaround is just to set `PULSE_RESULTSET_SOURCES` to valid JSON, since setting it to a new value works, even if unsetting doesn't. (done now)
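One way to make this kind of misconfiguration easier to diagnose is to parse the variable in one place and name it in the error. The helper below is a hypothetical sketch, not how Treeherder's settings actually read `PULSE_RESULTSET_SOURCES`, and the expected shape of that value is not shown in this bug.

```python
import json
import os


def load_json_setting(name, default):
    """Parse the environment variable `name` as JSON.

    Returns `default` when the variable is unset or empty, and raises a
    ValueError that names the variable when its value is not valid JSON.
    """
    raw = os.environ.get(name)
    if not raw:
        return default
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{name} is not valid JSON: {exc}") from exc
```

Failing with the variable's name in the message points straight at the bad config value instead of at a generic decode error somewhere in the release step.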
(In reply to Ed Morley [:emorley] from comment #16)
> I've filed:
> https://help.heroku.com/tickets/394841

Heroku have now fixed this :-)
Depends on: 1301739
Depends on: 1302529
This feature is now fixed, even if a few repos are not reporting to Pulse yet.  I'll follow up with them or in separate bugs, if need be.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1306155
Depends on: 1306157