Closed Bug 1020596 Opened 10 years ago Closed 10 years ago

Get hekad running on Loop-Server Stage environment

Categories

(Hello (Loop) :: Server, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jbonacci, Unassigned)

References

Details

(Whiteboard: [qa+])

because it is not running ;-)
root      9757  0.5  0.0      0     0 ?        Zs   21:05   0:00 [hekad] <defunct>
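For anyone checking this on another box, here is a minimal sketch of spotting a defunct process from a `ps aux`-style line like the one above. It assumes the standard column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the helper name `is_zombie` is made up for illustration.

```python
def is_zombie(ps_line):
    """Return True if a `ps aux`-style line describes a zombie (defunct) process."""
    # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = ps_line.split(None, 10)
    # The STAT column is the eighth field; a leading "Z" marks a zombie.
    return len(fields) >= 8 and fields[7].startswith("Z")


if __name__ == "__main__":
    line = "root      9757  0.5  0.0      0     0 ?        Zs   21:05   0:00 [hekad] <defunct>"
    print(is_zombie(line))
```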

Should this be running and collecting log info?
Whiteboard: [qa+]
This should be collecting log info, yes. It's supposed to replace the statsd implementation, at least.

Benson (or Bob), can you confirm it's set up correctly?
Flags: needinfo?(bwong)
Flags: needinfo?(bobm)
Changed the title to be more accurate.
We need to get logging and aggregation all hooked up.
Summary: Investigate hekad on Loop-Server Stage environment → Get hekad running on Loop-Server Stage environment
:alexis what encoding (JSON?) and schema (fxa?) is loop outputting logs in? This is our preferred way:

- loop server outputs logs on stdout using JSON and a predefined schema
- circus writes stdout logs to a file 
- heka tails log file, parses and sends to our main heka aggregation point 
- we configure the aggregation point to do something with them, e.g. aggregate metrics, send to Elasticsearch, etc.
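
The first step of the flow above (one JSON object per line on stdout) could look roughly like the sketch below. It is written in Python only for illustration. The field names (`timestamp`, `severity`, `logger`, `message`, `fields`) are a hypothetical schema, not one agreed on in this bug, and `format_log_record` and `log` are made-up helper names.

```python
import json
import sys
import time


def format_log_record(message, severity="INFO", **fields):
    """Return one log event as a single-line JSON string.

    The keys used here are a hypothetical schema, not one defined in this bug.
    """
    record = {
        "timestamp": time.time(),
        "severity": severity,
        "logger": "loop-server",
        "message": message,
        "fields": fields,
    }
    # json.dumps escapes newlines inside strings, so each event stays on one line.
    return json.dumps(record, sort_keys=True)


def log(message, severity="INFO", **fields):
    """Write one JSON event per line to stdout, where a process manager can capture it."""
    sys.stdout.write(format_log_record(message, severity, **fields) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    log("request finished", path="/calls", status=200)
```

Keeping each event on a single line is what lets a tailing process split the file on newlines and decode every line independently.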
Flags: needinfo?(bwong)
We're currently using heka as a transport for statsd. We don't output any json to stdout, we're using sentry for logging instead.

What kind of JSON logs are you usually sending to stdout in other projects? (Also, it would be useful to have this way of doing things, and all our best practices when it comes to deployment, defined somewhere.)
Flags: needinfo?(bobm) → needinfo?(bwong)
Actually we have a statsd/graphite server that loop logs to directly. There is no heka there. We just pushed a change where the nginx logs will be sent via heka to Elasticsearch.

We don't have a standard schema for JSON logging as it is highly application dependent. However, OpSec does have a schema they use for application-level logs [1] for MozDef.


[1] http://mozdef.readthedocs.org/en/latest/usage.html#json-format
Flags: needinfo?(bwong)
:mostlygeek this is most unusual
Can you drop the Prod and Stage shared links to Graphite here so we can verify this is actually working?
This seems so .... empty:
https://graphite.shared.us-west-2.prod.mozaws.net

What's the Stage version of this link?
Also, hmmmm.... this appears to use Persona just to get the dashboard to show.
But Graphite has its own account system - do we need logins for graphite also?
The new loop-server w/ heka for nginx log shipping is deployed on stage now. Here are some appropriate URLs for monitoring stuff: 

- https://graphite.shared.us-east-1.stage.mozaws.net (statsd data)
- https://heka.shared.us-east-1.stage.mozaws.net (shared heka)
- https://kibana.shared.us-east-1.stage.mozaws.net (kibana for looking at the Elasticsearch data)

The prod endpoints when things are all hooked up are: 

- https://graphite.shared.us-west-2.prod.mozaws.net (statsd data)
- https://heka.shared.us-west-2.prod.mozaws.net (shared heka)
- https://kibana.shared.us-west-2.prod.mozaws.net (kibana for looking at the Elasticsearch data)

Also I decided to use our graphite/statsd stack for the statsd output. Shipping it through heka just added extra complexity.
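
For reference, sending a counter straight to statsd is just one UDP datagram in the `<bucket>:<value>|c` line format, which is why routing it through heka adds little. A minimal sketch, assuming a statsd daemon on the conventional UDP port 8125: the host default and the bucket name `loop.request.200` are placeholders, not values from this deployment.

```python
import socket


def format_counter(bucket, value=1):
    """Format a counter in the statsd line protocol: <bucket>:<value>|c"""
    return "{}:{}|c".format(bucket, value).encode("ascii")


def send_counter(bucket, value=1, host="127.0.0.1", port=8125):
    """Send one counter increment over UDP (fire and forget, no reply expected)."""
    # host is a placeholder; 8125 is the conventional statsd port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(bucket, value), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    send_counter("loop.request.200")
```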
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I get a "user not authorized" error on these links so far.
(As a note, they're working for me. Rémy, check with :whd about that.)
They all work for me, but here is one very important correction:
WRONG:
- https://sentry.shared.us-east-1.stage.mozaws.net 
- https://sentry.shared.us-west-2.prod.mozaws.net
CORRECT:
- http://sentry.shared.us-east-1.stage.mozaws.net 
- http://sentry.shared.us-west-2.prod.mozaws.net

:mostlygeek did we want the sentry links to be http or https?
OK, with Stage load running, this site gets populated now:
https://graphite.shared.us-east-1.stage.mozaws.net 
Look under Graphite to see new subcategories: carbon, stats, stats_counts, statsd

This site is updated in real time now:
https://heka.shared.us-east-1.stage.mozaws.net/

These are now working:
https://kibana.shared.us-east-1.stage.mozaws.net
https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/loop_http_status.json
The percentage error graph (colored circle) is very nice.

This is also showing updates...
https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/LoopHTTPStatus/outputs/LoopHTTPStatus.HTTPStatus.cbuf

This also looks good:
http://sentry.shared.us-east-1.stage.mozaws.net/loop/loop-stage/

Will assume same for Prod.
Status: RESOLVED → VERIFIED
Blocks: 1024222