Closed Bug 1170600 Opened 4 years ago Closed 10 months ago

Publish GitHub events to Pulse

Categories

(Developer Services :: General, task, P3)

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: gps, Assigned: gps)

References

Details

(Keywords: leave-open)

Attachments

(2 files)

I have an EC2 service that's been receiving org-wide GitHub webhook notifications since December for mozilla, rust, servo, and a bunch of other Mozilla organizations on GitHub.

I'd like to re-publish the hooks to Pulse. This will enable us to have one org-wide webhook instead of N.

Code for the aggregator lives at https://github.com/indygreg/github-webhooks-firehose if anyone is curious.

I /think/ I may need help from a Pulse admin to create an account and/or exchange for this publishing.
You can do that using pulse guardian (https://pulse.mozila.org). If you create a new user I believe you can use its credentials to create a new exchange.
Should be easier once the service is in our AWS account and under the control of version-control-tools.
Depends on: 1188464
Priority: -- → P3
I'm starting to have interest in getting github.com changes in addition to hgmo ones on pulse.

gps, what's your status here?
Flags: needinfo?(gps)
This hasn't been a priority for me. It would be trivial for me to hook up the system to Pulse. But since it isn't running on a monitored system, I'd feel uncomfortable actually deploying that.
Flags: needinfo?(gps)
Jonas, is this bug a dupe of your bug 1150287?
Blocks: 1294883
Flags: needinfo?(jopsen)
It's similar. Bug 1150287 is about doing this from taskcluster-github, which is afaik receiving org-wide hooks for a few GitHub orgs.
bstack knows more about the state of taskcluster-github as he has been preparing it for an outreachy participant.
Flags: needinfo?(jopsen)
This has become a Q4 mini project for me.

I don't need cooperation from Pulse people to implement this (yay self-service). So moving to Developer Services to raise visibility with Ops folk.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Component: Pulse → General
Product: Webtools → Developer Services
If automation is going to rely on this, we'll need monitoring and runbooks, and we'll need to make sure all of the ops folks have access. :dhouse has volunteered. :-)
The current state of things is that there is an ad-hoc EC2 instance in some random AWS account that's been receiving GitHub web hook HTTP requests for ~2 years. I just noticed that instance ran out of disk space and has not been accepting writes since the end of August :/

To productionize the current architecture, we'd likely have to build out multiple servers for redundancy. That's a lot of work.

So, I hacked together an AWS Lambda function hooked up to an API Gateway that simply copies the web hook HTTP request body into AWS S3. By storing in S3, we've created retention of GitHub change data so we can go back and analyze it later. We can also hook up additional notifications easily.

I'm currently just storing each event as a separate S3 key. I may want to change things to use Kinesis, as that is better suited for "streaming data." Let me keep poking around.
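
A minimal sketch of the "one S3 key per event" approach described above, not the code actually deployed. The key layout, the `store_event` and `build_event_key` helpers, and the API Gateway proxy-style `event` dict are assumptions made for illustration; only the bucket name comes from this bug.

```python
import json
from datetime import datetime, timezone


def build_event_key(event_type, delivery_id, received_at):
    """Return an S3 key that groups events by UTC date and event type."""
    day = received_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"{day}/{event_type}/{delivery_id}.json"


def store_event(s3_client, bucket, headers, body, received_at=None):
    """Copy one webhook request body into S3 as its own object."""
    received_at = received_at or datetime.now(timezone.utc)
    # GitHub identifies each delivery with these request headers.
    lowered = {k.lower(): v for k, v in headers.items()}
    event_type = lowered.get("x-github-event", "unknown")
    delivery_id = lowered.get("x-github-delivery", "unknown")
    key = build_event_key(event_type, delivery_id, received_at)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


def handler(event, context):
    """Lambda entry point, assuming an API Gateway proxy-style event dict."""
    import boto3  # imported lazily so the helpers above work without AWS

    key = store_event(boto3.client("s3"), "moz-github-events",
                      event.get("headers") or {}, event.get("body") or "")
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```

Passing the S3 client in as an argument keeps the storage logic testable with a stand-in object that records `put_object` calls.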

For those who want to poke at things, everything is in the moz-devservices account in us-west-2. Go to the API Gateway in the web console and everything should be linked. We're currently storing things in the "moz-github-events" S3 bucket. The data is not world readable because there could be private repos.

Let me keep poking around...
OK. I've got Kinesis hooked up.

The current architecture is:

  API Gateway -> Lambda --> SNS -> Lambda -> Pulse
                         \
                          -> Kinesis Firehose -> S3

You can't have multiple Lambda functions registered to the same API Gateway request. So for right now I have a single Lambda function writing to both SNS and Firehose. Then we have another Lambda function monitoring SNS and publishing to Pulse. I suppose I could publish to a Kinesis Stream instead of SNS. But Kinesis Streams feel a bit heavyweight. Plus, we could potentially expose a public SNS topic of public GitHub events so people can write their own consumers.
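
As a rough illustration of the single Lambda function writing to both SNS and Firehose, the fan-out step might look like the sketch below. The `fan_out` helper, the message envelope, and the newline-delimited record format are assumptions, not details taken from the deployed function; the two client calls mirror boto3's `publish` and `put_record`.

```python
import json


def fan_out(sns_client, firehose_client, topic_arn, stream_name, event_type, body):
    """Send one webhook payload to both SNS and Kinesis Firehose."""
    message = json.dumps({"event": event_type, "payload": json.loads(body)})

    # SNS delivers the message to subscribers, such as the Pulse publisher.
    sns_client.publish(TopicArn=topic_arn, Message=message)

    # Firehose batches records into S3 objects; the trailing newline keeps
    # one JSON document per line in the resulting files.
    firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": (message + "\n").encode("utf-8")},
    )
    return message
```

If the SNS publish raises, the Firehose write is skipped in this sketch, so a real function would need to decide whether to retry or to report failure back to GitHub.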

You can view the Pulse messages at https://tools.taskcluster.net/pulse-inspector/#!((exchange:exchange/github-webhooks/v2,routingKeyPattern:%23)).

All of this is very alpha and subject to change.
If anyone wants to send GitHub web hooks to the new endpoint, point it at https://74rtibg2p2.execute-api.us-west-2.amazonaws.com/prod/webhook.

At some point, we may want to get a better hostname.
Lars: github.com/servo and/or github.com/servo/servo has a webhook configured to point to https://github.com/organizations/mozilla/settings/hooks/3631976. That server is being end-of-lifed. Can you please reconfigure it to point to the URL in comment #11 instead?

(We may get a mozilla hostname in the future. But for now, not sending data to a server that's about to be shut down is a step in the right direction.)
Flags: needinfo?(larsberg)
Gene: could you please do the same thing as comment #17, but for the mozilla-services github org?
Flags: needinfo?(gene)
I've done so, but attempting to redeliver I'm getting:
```
Connection: keep-alive
Content-Length: 37
Content-Type: application/json
Date: Mon, 24 Oct 2016 21:21:04 GMT
Via: 1.1 441811a054e8d055b893175754efd0c3.cloudfront.net (CloudFront)
X-Amz-Cf-Id: vR4_jEX4B0VPRNkDc9HecVlPRmwt-3-7QoAn9bYEPAP6EjOM_i2ANQ==
x-amzn-RequestId: c5763f02-9a2f-11e6-835c-fb0d242c0b7d
X-Cache: Error from cloudfront
```

And:
```
{"message": "Unsupported Media Type"}
```
Flags: needinfo?(larsberg)
Depends on: 1312577
Per IRC, the issue with the servo webhook switchover was the content type. The webhook's content type needs to be configured as application/json.
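
For context, GitHub can deliver a webhook either as application/json or as application/x-www-form-urlencoded with the JSON in a `payload` form field. The endpoint in this bug only accepts the former. A receiver that wanted to tolerate both could do something like the following sketch, where `extract_payload` is a hypothetical helper and not part of the deployed code:

```python
import json
from urllib.parse import parse_qs


def extract_payload(content_type, body):
    """Decode a GitHub webhook body according to its Content-Type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type == "application/x-www-form-urlencoded":
        # Form deliveries carry the JSON document in a "payload" field.
        fields = parse_qs(body)
        return json.loads(fields["payload"][0])
    raise ValueError(f"unsupported media type: {media_type}")
```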
Comment on attachment 8802358 [details]
github-webhooks: AWS Lambda support for consuming GitHub web hooks (bug 1170600);

https://reviewboard.mozilla.org/r/86772/#review87854

Aside from the questions of deployment, this is actually awesome! Getting off of ec2 and on to lambda is win all around.

I'd like to get the deployment questions figured out.  I know that asking for Terraform or even ansible may be scope bloat, but I think picking a method of AWS infra deployment that everyone agrees on and sticking with it is important.  As you mentioned, there are a lot of deployment parts that need to be worked out (IAM, Kinesis, SNS, etc.).  These are all set up by hand via the console, so we'll need to get the state written out so we have 'infra as code' committed somewhere.

::: testing/vcttesting/deploy.py:95
(Diff revision 5)
>      extra = {'repos': repos}
>  
>      return run_playbook('hgmo-reclone-repos', extra_vars=extra,
>                          verbosity=verbosity)
> +
> +def github_lambda_deploy_package(pulse_password):

Can this be done with Terraform?  I know that boto has been used elsewhere, but after talking with :fubar, I think the preferred method of deployments would be Terraform first and ansible second.

::: testing/vcttesting/deploy.py:137
(Diff revision 5)
> +        return zf.getvalue()
> +    finally:
> +        shutil.rmtree(d)
> +
> +
> +def github_webhook_lambda(pulse_password):

Same here.  Can this be done via terraform?
(In reply to Gregory Szorc [:gps] from comment #11)
> If anyone wants to send GitHub web hooks to the new endpoint, point it at
> https://74rtibg2p2.execute-api.us-west-2.amazonaws.com/prod/webhook.
> 
> At some point, we may want to get a better hostname.

I think there should also be an auth token on gateway endpoint.  Right now, it is wide open to POST.
The mozilla-b2g GitHub org is also sending events to the old endpoint. I have no clue who is an owner of that org and can change appropriately.
(In reply to Jake Watkins [:dividehex] from comment #26)
> I think there should also be an auth token on gateway endpoint.  Right now,
> it is wide open to POST.

GitHub doesn't allow this out of the box with web hooks. Instead, GitHub allows you to define a secret when configuring the webhook. This is fed into HMAC and the digest is sent as an HTTP request header. The receiver can validate the digest and discard any events from unknown secret keys. Alternatively, the receiver could look at the org name and filter accordingly.
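
A sketch of the digest check described above, assuming the signature arrives as an `algo=hexdigest` header value (GitHub uses `X-Hub-Signature` for SHA-1 and `X-Hub-Signature-256` for SHA-256). The `signature_is_valid` helper is hypothetical and not something the deployed function is known to do.

```python
import hashlib
import hmac


def signature_is_valid(secret, body, signature_header):
    """Check a GitHub-style 'algo=hexdigest' signature against the raw body."""
    algo, sep, sent_digest = signature_header.partition("=")
    if not sep or algo not in ("sha1", "sha256"):
        return False
    expected = hmac.new(secret, body, getattr(hashlib, algo)).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"),
                               sent_digest.encode("utf-8"))
```

Both `secret` and `body` are bytes here, and the body must be the raw request bytes as received, since re-serializing the JSON would change the digest.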

The main thing strong auth would buy us is a buffer against unwanted service usage driving up our costs. For that, we may have to develop a custom "Integration" that customizes the delivery. Even then, it appears this relies on web hooks. So I don't think that buys us anything other than security through obscurity. Perhaps the best we can do is not publicly advertise the URL to which we send web hooks. But even then, any GitHub org member with access could discover this, so it isn't great security through obscurity :/
I had to recreate the API Gateway endpoint in order to make Terraform happy.

Lars: Please update the servo org's web hook to https://3abyt2fapj.execute-api.us-west-2.amazonaws.com/prod/webhook

(I hope to get a more permanent DNS entry for this service soon.)
Flags: needinfo?(larsberg)
Updated and a test event appears to have successfully been delivered to the new endpoint.
Flags: needinfo?(larsberg)
Jake: any chance we can get a shiny new mozops.net hostname for the GitHub web hooks ingestion point?

To hook this up to Amazon API Gateway, we'll need a custom domain name and base path mapping defined in Terraform. See https://www.terraform.io/docs/providers/aws/r/api_gateway_base_path_mapping.html. The Route53 CNAME can live in the account mozops.net is delegated to methinks (unless you want to delegate a subdomain to moz-devservices).

I think "github-webhooks-ingest" would be an acceptable hostname. I'm open to suggestions.
Flags: needinfo?(jwatkins)
Comment on attachment 8802358 [details]
github-webhooks: AWS Lambda support for consuming GitHub web hooks (bug 1170600);

https://reviewboard.mozilla.org/r/86772/#review90432

I agree with your assessment of the minimal API use, and thanks for bringing the infra into scope with Terraform.  Shipit!
Attachment #8802358 - Flags: review?(jwatkins) → review+
Comment on attachment 8802357 [details]
deploy: add boto3 to deploy environment;

https://reviewboard.mozilla.org/r/86770/#review90438

lgtm
Attachment #8802357 - Flags: review?(jwatkins) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/84f1645f7e40
github-webhooks: AWS Lambda support for consuming GitHub web hooks ; r=dividehex
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Let's keep this open to track monitoring, SSL cert, ...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Gregory Szorc [:gps] from comment #31)
> Jake: any chance we can get a shiny new mozops.net hostname for the GitHub
> web hooks ingestion point?
> 
> To hook this up to Amazon API Gateway, we'll need a custom domain name and
> base path mapping defined in Terraform. See
> https://www.terraform.io/docs/providers/aws/r/api_gateway_base_path_mapping.html.
> The Route53 CNAME can live in the account mozops.net is delegated to
> methinks (unless you want to delegate a subdomain to moz-devservices).
> 
> I think "github-webhooks-ingest" would be an acceptable hostname. I'm open
> to suggestions.

:gps, I've setup a policy page for using mozops.net just to get some foundation laid down.  It includes some examples for adding records to the hosted zone.  Let me know if this works for you.

https://mana.mozilla.org/wiki/display/DEVSERVICES/Using+mozops.net+for+FQDNs
Flags: needinfo?(jwatkins)
(In reply to Jake Watkins [:dividehex] from comment #38)
> :gps, I've setup a policy page for using mozops.net just to get some
> foundation laid down.  It includes some examples for adding records to the
> hosted zone.  Let me know if this works for you.
> 
> https://mana.mozilla.org/wiki/display/DEVSERVICES/Using+mozops.net+for+FQDNs

Yes, this works for me. I'm still not sure exactly what needs to be added where in terraform. Is this something you could help with?
Keywords: leave-open
Duplicate of this bug: 1188464
Bleh. This bug fell off my radar.

The GitHub -> Pulse pipeline (via AWS Lambda) seems to be working fine. We just don't have it formally productionized in an operational sense. We can probably do without a dedicated hostname and SSL cert for now. But it would be nice to have some kind of monitoring for this so someone gets paged if it stops working.

Anyway, for Servo VCS sync, I'll build in a fallback to poll GitHub periodically. I don't trust the delivery guarantees of GitHub webhooks and Pulse anyway (webhooks are susceptible to network hiccups, Pulse is susceptible to... AMQP). So in the absence of a robust, lossless queue, periodic polling to handle "lost" events is just good engineering design.
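
The reconciliation half of such a polling fallback could be as small as the sketch below. It assumes each event can be reduced to a stable identifier (for example a commit SHA) and that something else fetches `polled_ids` from GitHub; `find_missed` is a hypothetical name.

```python
def find_missed(seen_ids, polled_ids):
    """Return polled IDs never seen via webhooks, preserving polled order."""
    seen = set(seen_ids)
    missed = []
    for item in polled_ids:
        if item not in seen:
            # Remember it so a duplicate in the poll result is reported once.
            seen.add(item)
            missed.append(item)
    return missed
```

Anything returned by `find_missed` would then be processed as if its webhook had arrived late.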
:gps
   I'm not an administrator of the mozilla-services GitHub org. I'd recommend hitting up someone in Cloud Services.
Flags: needinfo?(gene)
The leave-open keyword is set, and there has been no activity for 6 months.
:gps, maybe it's time to close this bug?
Flags: needinfo?(gps)
:gps, if this is no longer needed, please let us know - the Servo GH organization is still sending all webhook events to:
https://3abyt2fapj.execute-api.us-west-2.amazonaws.com/prod/webhook
Status: REOPENED → RESOLVED
Closed: 3 years ago → 10 months ago
Flags: needinfo?(gps)
Resolution: --- → INACTIVE