Get hekad running on Loop-Server Stage environment

VERIFIED FIXED

Status

Hello (Loop)
Server
VERIFIED FIXED
4 years ago
4 years ago

People

(Reporter: jbonacci, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa+])

(Reporter)

Description

4 years ago
because it is not running ;-)
root      9757  0.5  0.0      0     0 ?        Zs   21:05   0:00 [hekad] <defunct>

Should this be running and collecting log info?
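
For reference, the `Zs` in the STAT column of the `ps` line above marks hekad as a zombie (defunct) process that is also a session leader. Below is a minimal Python sketch for spotting this state in `ps aux`-style output. It assumes the usual column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), and the helper name `is_zombie` is hypothetical:

```python
def is_zombie(ps_line):
    """Return True if a `ps aux`-style line describes a zombie process."""
    # Split into at most 11 fields so the COMMAND column keeps its spaces.
    fields = ps_line.split(None, 10)
    if len(fields) < 11:
        return False  # not a full `ps aux` row (assumed column layout)
    stat = fields[7]  # STAT column, e.g. "Zs", "Ss", "R+"
    return stat.startswith("Z")


line = "root      9757  0.5  0.0      0     0 ?        Zs   21:05   0:00 [hekad] <defunct>"
print(is_zombie(line))  # prints True
```

A zombie entry only means the parent has not reaped the exited process, so the check shows that hekad died, not why.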
(Reporter)

Updated

4 years ago
Whiteboard: [qa+]
This should be collecting log info, yes. It's supposed to replace the statsd implementation, at least.

Benson (or Bob), can you confirm it's set up correctly?
Flags: needinfo?(bwong)
Flags: needinfo?(bobm)
(Reporter)

Comment 2

4 years ago
Changed the title to be more accurate.
We need to get logging and aggregation all hooked up.
Summary: Investigate hekad on Loop-Server Stage environment → Get hekad running on Loop-Server Stage environment
:alexis what encoding (JSON?) and schema (fxa?) is loop outputting logs in? This is our preferred way:

- loop server outputs logs on stdout using JSON and a predefined schema
- circus writes stdout logs to a file 
- heka tails log file, parses and sends to our main heka aggregation point 
- we configure the aggregation point to do something with the data, e.g. aggregate metrics, send to elasticsearch, etc.
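
The first step of that pipeline (one JSON object per line on stdout) could look roughly like the Python sketch below. It is illustrative only: loop-server is not written in Python, and the field names (`timestamp`, `logger`, `event`, `fields`) are assumptions, not an agreed schema.

```python
import json
import sys
import time


def format_log(event, **fields):
    """Build one JSON log line; the schema here is purely illustrative."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch
        "logger": "loop-server",   # hypothetical logger name
        "event": event,
        "fields": fields,          # free-form, application-specific data
    }
    # sort_keys keeps the key order stable from line to line.
    return json.dumps(record, sort_keys=True)


def log(event, stream=sys.stdout, **fields):
    """Write one JSON object per line so a tailing process can parse it."""
    stream.write(format_log(event, **fields) + "\n")
    stream.flush()


log("request.summary", path="/calls", status=200)
```

Keeping each record on a single line is what lets a process supervisor write stdout to a file and a tailing agent parse it line by line.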
Flags: needinfo?(bwong)
We're currently using heka as a transport for statsd. We don't output any json to stdout, we're using sentry for logging instead.

What kind of JSON logs are you usually sending to stdout in other projects? (Also, it would be useful to have this way of doing things, and all our deployment best practices, defined somewhere.)
Flags: needinfo?(bobm) → needinfo?(bwong)
Actually we have a statsd/graphite server that loop logs to directly. There is no heka there. We just pushed a change where the nginx logs will be sent via heka to elasticsearch.

We don't have a standard schema for JSON logging as it is highly application dependent. However, OpSec does have a schema they use for application-level logs [1] for MozDef.


[1] http://mozdef.readthedocs.org/en/latest/usage.html#json-format
Flags: needinfo?(bwong)
(Reporter)

Comment 6

4 years ago
:mostlygeek this is most unusual
Can you drop the Prod and Stage shared links to Graphite here so we can verify this is actually working?
(Reporter)

Comment 7

4 years ago
This seems so .... empty:
https://graphite.shared.us-west-2.prod.mozaws.net

What's the Stage version of this link?
(Reporter)

Comment 8

4 years ago
Also, hmmmm.... this appears to use Persona just to get the dashboard to show.
But Graphite has its own account system - do we need logins for graphite also?
The new loop-server w/ heka for nginx log shipping is deployed on stage now. Here are some appropriate URLs for monitoring stuff: 

- https://graphite.shared.us-east-1.stage.mozaws.net (statsd data)
- https://heka.shared.us-east-1.stage.mozaws.net (shared heka)
- https://kibana.shared.us-east-1.stage.mozaws.net (kibana for looking at the elastic search data)

The prod endpoints when things are all hooked up are: 

- https://graphite.shared.us-west-2.prod.mozaws.net (statsd data)
- https://heka.shared.us-west-2.prod.mozaws.net (shared heka)
- https://kibana.shared.us-west-2.prod.mozaws.net (kibana for looking at the elastic search data)

Also I decided to use our graphite/statsd stack for the statsd output. Shipping it through heka just added extra complexity.
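
For anyone unfamiliar with what "statsd output" means on the wire: statsd metrics are small plain-text UDP datagrams. Here is a minimal Python sketch, assuming a statsd daemon listening on the conventional UDP port 8125. The host and the metric name are placeholders, not the real Loop configuration.

```python
import socket


def counter_payload(name, value=1):
    """Format a statsd counter increment, e.g. b'loop.calls:1|c'."""
    return "{}:{}|c".format(name, value).encode("ascii")


def timer_payload(name, millis):
    """Format a statsd timing sample in milliseconds, e.g. b'loop.req:12|ms'."""
    return "{}:{}|ms".format(name, millis).encode("ascii")


def send(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget one datagram; host and port are assumed values."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


# Hypothetical metric name, for illustration only.
send(counter_payload("loop.calls.created"))
```

Because the transport is UDP, the application never blocks on the metrics server, which is part of why sending directly to statsd is simpler than routing through another hop.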
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
I get a "user not authorized" error on these links so far.
(As a note, it's working for me; check with :whd about that, Rémy.)
(Reporter)

Comment 14

4 years ago
They all work for me, but here is one very important correction:
WRONG:
- https://sentry.shared.us-east-1.stage.mozaws.net 
- https://sentry.shared.us-west-2.prod.mozaws.net
CORRECT:
- http://sentry.shared.us-east-1.stage.mozaws.net 
- http://sentry.shared.us-west-2.prod.mozaws.net

:mostlygeek did we want the sentry links to be http or https?
(Reporter)

Comment 15

4 years ago
OK, with Stage load running, this site gets populated now:
https://graphite.shared.us-east-1.stage.mozaws.net 
Look under Graphite to see new subcategories: carbon, stats, stats_counts, statsd

This site is updated in real time now:
https://heka.shared.us-east-1.stage.mozaws.net/

These are now working:
https://kibana.shared.us-east-1.stage.mozaws.net
https://kibana.shared.us-east-1.stage.mozaws.net/index.html#/dashboard/file/loop_http_status.json
The percentage error graph (colored circle) is very nice.

This is also showing updates...
https://heka.shared.us-east-1.stage.mozaws.net/#sandboxes/LoopHTTPStatus/outputs/LoopHTTPStatus.HTTPStatus.cbuf

This also looks good:
http://sentry.shared.us-east-1.stage.mozaws.net/loop/loop-stage/

Will assume same for Prod.
Status: RESOLVED → VERIFIED
(Reporter)

Updated

4 years ago
Blocks: 1024222