Closed Bug 1155044 Opened 9 years ago Closed 9 years ago

Setup robust logging service, using ELK or heka, for new AWS infrastructure

Categories

(Socorro :: Infra, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jschneider, Assigned: jschneider)

Set up either an SQS-based or a syslog-based logging service for AWS.  I'm leaning toward ELK, but am open to thoughts!

After the service is stood up, we'll need to:
* set up log shipping on our staging infrastructure
* set up logstash inputs to accommodate incoming info
* determine whether we want system, puppet, auth, and syslog logs routed as well
* make all of the dashboards
Blocks: 1123833
I have got this started.  What I've done:

1) Created an m3.2xl Elasticsearch cluster, templated in us-west-2 and us-east-1, from these directions (https://jdotpz.github.io/#gh-weblog-1402498716135).  It can be scaled horizontally, with some care.  Made this into an autoscaling group: prod__loggins_elasticsearch-as
2) Created a loggins-elasticsearch-ec2-sg security group so that like servers can auto-discover each other with the AWS discovery plugin.
3) Created a prod__loggins_ec2 security group, more in line with naming standards, for the server that will host syslog collection over UDP 1514 and be reachable from an ELB via https/http.  This server proxies the Elasticsearch cluster, so the cluster itself will not be open to the public.
4) Created an ELB, prod--elb-for-logstash, to sit in front of Kibana, along with an associated prod__loggins_elb_sg security group allowing 80/443.
5) Created a logstash/kibana combo server, templated in us-west-2, which will process logs over UDP syslog and serve Kibana traffic through an nginx proxy protected by SSL and basic auth.  Made this into an autoscaling group that loads its servers into the prod--elb-for-logstash ELB.  Put this in a prod__loggins_ec2_sg security group, allowing UDP/TCP 1514 from anywhere (for now), and 80/443 traversal from the associated ELB.

Jeez, I probably did more than that, but that's as much as I can remember to document. :)
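
For anyone reviewing the ingress rules listed above, here is a minimal Python sketch that models them as data and answers "is this traffic allowed?". It is only an illustration, not the actual AWS configuration: the `Rule` class, the `INGRESS` table and `is_allowed` are hypothetical names, and the "from anywhere" source on the ELB group is an assumption (the comment only says 80/443 are allowed). The loggins-elasticsearch-ec2-sg group is left out because its ports are not stated here.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp" or "udp"
    port: int
    source: str  # ANYWHERE, or the name of the security group the traffic comes from


# Ingress rules as described in this comment; see the assumptions noted inline.
INGRESS = {
    # ELB in front of Kibana. Open-to-anywhere is an assumption.
    "prod__loggins_elb_sg": [
        Rule("tcp", 80, ANYWHERE),
        Rule("tcp", 443, ANYWHERE),
    ],
    # logstash/kibana combo server: syslog from anywhere (for now),
    # web traffic only from the ELB's security group.
    "prod__loggins_ec2_sg": [
        Rule("udp", 1514, ANYWHERE),
        Rule("tcp", 1514, ANYWHERE),
        Rule("tcp", 80, "prod__loggins_elb_sg"),
        Rule("tcp", 443, "prod__loggins_elb_sg"),
    ],
}


def is_allowed(group, protocol, port, source):
    """Return True if some ingress rule of `group` admits this traffic."""
    for rule in INGRESS.get(group, []):
        if (
            rule.protocol == protocol
            and rule.port == port
            and rule.source in (ANYWHERE, source)
        ):
            return True
    return False
```

With this model, `is_allowed("prod__loggins_ec2_sg", "tcp", 443, "prod__loggins_elb_sg")` is true, while the same request from an arbitrary address is rejected, which matches the intent that Kibana is only reachable through the ELB.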
Assignee: nobody → jschneider
DNS entries associated with this:

loggins-es.mocotoolsprod.net -- Elasticsearch master server, CNAMEd to the EC2 instance.
loggins.mocotoolsprod.net -- The end user end point for viewing Kibana and reports.  Pointed to the elb for kibana.
logshipper.mocotoolsprod.net -- Where we'll point rsyslog to for shipment of the logs.  Pointed to the ec2 address of the logstash/kibana instance.
Alright, got the node all configured.

* http://loggins.mocotoolsprod.net/#/dashboard/file/default.json is happytime with a default dashboard.  
* I've redone the AMI for the logstash/kibana combo server after I fixed the ES config in /etc/logstash/logstash_syslog.conf and /var/www/config.js.  I've applied that AMI to the autoscaling group.
* I smoked the hell out of a cigarette.
Next steps:  We should send it logs from nginx, and potentially syslog.

Rsyslog endpoint will be:  logshipper.mocotoolsprod.net on UDP 1514.  We can also use GELF if we want to be fancy, I'd just have to open up the ports.
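
To smoke-test the endpoint without touching an rsyslog config, something like the following Python sketch could be used. It relies only on the standard library's `logging.handlers.SysLogHandler`; the function name `make_syslog_logger`, the logger name and the "name: message" layout are assumptions for illustration and are not part of this setup.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host, port=1514, name="loggins-smoke-test"):
    """Return a logger that ships each record as one UDP syslog datagram."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_USER,
        socktype=socket.SOCK_DGRAM,  # UDP, matching the endpoint described above
    )
    # "name: message" is a common syslog convention, assumed here for readability.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep test messages out of the root logger
    logger.addHandler(handler)
    return logger
```

Calling `make_syslog_logger("logshipper.mocotoolsprod.net").info("hello from staging")` would send a single datagram to UDP 1514. Since UDP is fire-and-forget, a missing listener produces no error on the sending side, so the message should be confirmed in Kibana.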
Depends on: 1155071
See Also: → 1155071
Blocks: 1118288
No longer blocks: 1123833
We're currently logging all via syslog to our loggins.mocotoolsprod.net server (use https, let me know if you need the u/n).  

Next steps / Down the road
1) We want to set up logstash forwarder to enable better tagging/parsing of disparate log types
2) Set up good dashboards based on that tagged/parsed log stream
3) Set up alerting, possibly with hooks into Datadog
4) Look at scaling / hosted options for these logs/services.
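
To make item 1 concrete, here is a rough Python sketch of what "tagging/parsing of disparate log types" means: each raw line gets a `type` tag plus structured fields, so dashboards can filter on them. This is not logstash or logstash-forwarder configuration. The function `tag_line` and both regular expressions are illustrative assumptions, covering only the nginx "combined" access log layout and BSD-style syslog lines.

```python
import re

# nginx "combined" access log layout.
NGINX_COMBINED = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

# BSD-style syslog line: <PRI>Mmm dd hh:mm:ss host program[pid]: message
SYSLOG = re.compile(
    r'^<(?P<pri>\d{1,3})>(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) (?P<program>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$'
)


def tag_line(line):
    """Return a dict with a 'type' tag and whatever fields could be parsed."""
    match = NGINX_COMBINED.match(line)
    if match:
        return {"type": "nginx_access", **match.groupdict()}
    match = SYSLOG.match(line)
    if match:
        fields = match.groupdict()
        pri = int(fields.pop("pri"))
        # Syslog priority encodes facility * 8 + severity.
        fields["facility"] = pri >> 3
        fields["severity"] = pri & 7
        return {"type": "syslog", **fields}
    # Keep unparsed lines rather than dropping them, so nothing is lost.
    return {"type": "unparsed", "message": line}
```

Lines that match neither pattern are kept with the type `unparsed`, which makes gaps in the parsing rules visible on a dashboard instead of silently discarding data.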
(In reply to JP Schneider [:jp] from comment #5)
> We're currently logging all via syslog to our loggins.mocotoolsprod.net
> server (use https, let me know if you need the u/n).  

Please add these credentials to our shared LastPass.
Flags: needinfo?(jschneider)
I'm gonna change those creds (they came with my ami) and then I shall.
Flags: needinfo?(jschneider)
Shared in lastpass!
I simultaneously upsized the node since we were exercising that "demo" node a bit hard.
What's left before we can go live?
Flags: needinfo?(jschneider)
Closing this bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(jschneider)
Resolution: --- → FIXED