Monitoring and Performance needs for Firefox Account in Stage and Production

VERIFIED FIXED

Status

Priority: P2
Severity: blocker
Status: VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: jbonacci, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa+])

Attachments

(3 attachments)

(Reporter)

Description

5 years ago
This bug tracks Dev, QA, and OPs needs for monitoring the correct data, including requirements for metrics.
(Reporter)

Updated

5 years ago
Blocks: 907494
Whiteboard: [qa+]
(Reporter)

Comment 2

5 years ago
This covers most non-metrics-specific needs: data generation and collection, logs, monitoring (general and for load), etc.

Associated GitHub links:
https://github.com/mozilla/fxa-auth-server/issues/292
https://github.com/mozilla/fxa-auth-server/issues/372
https://github.com/mozilla/fxa-auth-server/pull/376
https://github.com/mozilla/fxa-auth-server/issues/351
https://github.com/mozilla/fxa-auth-server/issues/349
https://github.com/mozilla/fxa-auth-server/issues/312
https://github.com/mozilla/fxa-auth-server/issues/17
https://github.com/mozilla/fxa-auth-server/pull/28
https://github.com/mozilla/fxa-auth-server/pull/159
https://github.com/mozilla/fxa-auth-server/issues/222
https://github.com/mozilla/fxa-auth-server/issues/205
https://github.com/mozilla/fxa-auth-server/issues/30


Note: relevant links from fxa-content-server and fxa-scrypt-helper are not included at this time.

Notes from the 12/5 meeting with Dev and OPs:
We will have a Heka-ES-Kibana setup for Stage and Production, all fed through an aggregated logger.
OPsView - what Persona uses for availability monitoring
CloudWatch - perf monitoring
(More options available if we continue to use Heka in AWS)
(Reporter)

Comment 3

5 years ago
Also, the api.md file has some good details about errors and error handling:
https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md
(Reporter)

Comment 4

5 years ago
Also useful is the following GitHub repo, which gives some current methods for deploying Dev and Load test environments, using Heka for data gathering, using a log aggregator, etc.
https://github.com/mozilla/fxa-deployment

Here is a working Heka Dashboard from a Dev site - so you can see the types of data tracked:
http://ec2-50-112-66-71.us-west-2.compute.amazonaws.com:4352/

I am also attaching 3 screen captures of a working Kibana dashboard for the load test stack.
The first two show the general layout of the dashboard and the types of data graphed or tabulated.
The third image shows a detail of one of the GETs.
(Reporter)

Comment 5

5 years ago
Created attachment 8344924 [details]
Kibana-LoadTestStack1.jpg
(Reporter)

Comment 6

5 years ago
Created attachment 8344925 [details]
Kibana-LoadTestStack2.jpg
(Reporter)

Comment 7

5 years ago
Created attachment 8344926 [details]
Kibana-LoadTestStack3.jpg
(Reporter)

Comment 8

5 years ago
Working OPs-style monitoring for Persona is documented here:
https://github.com/mozilla/identity-ops/wiki/Access%20Guide#monitoring

We will be starting with the same basic model for FxA, I believe...
One thing missing from our current Kibana setup is good latency monitoring, e.g. graphs of mean and peak request latency, DB query running time, etc.
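To make the latency ask concrete, here is a rough sketch of the per-route numbers such a graph would be built from. It is not taken from fxa-auth-server or any existing dashboard: the record fields `route` and `latency_ms` and the sample routes are hypothetical, and real log entries may be shaped differently.

```python
from collections import defaultdict
from statistics import mean


def summarize_latency(records):
    """Group request records by route and return mean and peak latency.

    Each record is assumed (hypothetically) to be a dict with a "route"
    string and a "latency_ms" number.
    """
    by_route = defaultdict(list)
    for rec in records:
        by_route[rec["route"]].append(rec["latency_ms"])
    return {
        route: {
            "mean_ms": mean(values),
            "peak_ms": max(values),
            "count": len(values),
        }
        for route, values in by_route.items()
    }


# Illustrative sample only; these routes and timings are made up.
sample = [
    {"route": "/example/a", "latency_ms": 120},
    {"route": "/example/a", "latency_ms": 180},
    {"route": "/example/b", "latency_ms": 15},
]
summary = summarize_latency(sample)
```

The same grouping could be applied to DB query timings if they are logged with a comparable duration field.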
(Reporter)

Comment 10

5 years ago
:rfkelly - as a POC, can this be set up quickly (or at all) for the load test stack?
Otherwise, we can certainly see what Persona has in terms of latency monitoring and go from there.
(Reporter)

Comment 11

5 years ago
Nice reference from the GeoLocation project for hekad --> carbon/graphite:
https://github.com/mozilla/ichnaea/pull/59/files
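For context on the carbon side of that pipeline: carbon's plaintext listener accepts one `<metric path> <value> <timestamp>` line per data point, by default on TCP port 2003. The sketch below only illustrates that wire format. It does not reproduce what the linked PR does (that is hekad configuration), and the metric path, helper names, and timestamp are hypothetical.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    """Build one line in carbon's plaintext format: 'path value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_carbon(lines, host, port=2003, timeout=5.0):
    """Send pre-formatted lines to a carbon plaintext listener over TCP.

    The host is whatever carbon instance is in use; 2003 is carbon's
    default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Hypothetical metric name and fixed timestamp, for illustration only.
line = format_carbon_line("fxa.loadtest.request.latency_ms", 150, timestamp=1386288000)
```

Nothing is sent until `send_to_carbon` is called with a real host, e.g. `send_to_carbon([line], "graphite.example.internal")`, where the hostname is a placeholder.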
(Reporter)

Comment 12

5 years ago
Another good location for OPs-specific issues and resolution:
https://github.com/mozilla-services/puppet-config
Blocks: 949267
Bumping priority here.
Severity: normal → blocker
Priority: -- → P2
(Reporter)

Comment 14

5 years ago
This is a work in progress.
Most of our monitoring is happening via Stackdriver.
Logging is set up to run through to the dashboard.
Work is still being done on that, and on adding more metrics and more items to track and log.
See the puppet-config GitHub repo...
Assignee: rfkelly → nobody
No longer blocks: 907494
(Reporter)

Comment 15

5 years ago
I think this is done.
I opened it, I am closing it.
We have Heka dashboards, Kibana dashboards, and Stackdriver now for Stage and Prod.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Reporter)

Updated

5 years ago
Status: RESOLVED → VERIFIED