InfluxDB Onboarding for performance team
Categories
(Cloud Services :: Operations: Metrics/Monitoring, task)
Tracking
(Not tracked)
People
(Reporter: bdekoz, Assigned: brian, Mentored)
Details
User Story
fenix-fe-perf, fenix-perf, performance, android, fenix
+++ This bug was initially created as a clone of Bug #1533837 +++
General
- What are the Mozilla email addresses of the people who need to manage dashboards and alerts for your team?
aerickson is also working on dashboards and alerts for another part of our team's coverage area.
- What is a brief description of your use case, or how are you planning to use InfluxDB?
We are collecting startup performance data using an instrumented test harness, currently named FNPRMS. Each day, the startup time of the current build is recorded for two metrics: time to view from an applink intent (the view metric) and time to main/home after onboarding (the main-post-onboard metric).
https://github.com/mozilla-mobile/FNPRMS
A server host in the SF office, with 4 Android phones attached, runs that harness every night against two Fenix binaries and logs the results to a local InfluxDB database. The dashboard built on that database works, but it needs to move to a host where others can access it.
- When do you want to start on your implementation?
Already logging locally on quint.corp.sfo1.mozilla.com.
Writes
- How many Data Sources do you have and what types are they? If you are monitoring hosts then how many hosts will you be monitoring?
Data sources: 4 phones x 2 binaries x 2 metrics = 16, each reporting once a day.
- What is your proposed architecture for ingesting data? E.G. where will you run telegraf, what forwards to telegraf vs writing directly to influxdb, etc.
We use cron to schedule the harness, then calculate per-device metric values and send them to the InfluxDB instance with curl, via the HTTP write API.
- How many values per second or minute will you be ingesting into the DB?
Once a day, at noon: one value for each data source above.
- If you are generating the data, what does your data look like (please enter a sample of the data if possible)?
Line protocol format, where:
METRIC is one of (view, main-pre-onboard, main-post-onboard)
DEVICE is one of (samsung-galaxy-s10, samsung-galaxy-s7, pixel-2-xl, pixel-4-xl)
PRODUCTID is one of (fennec-nightly, fenix-nightly)
# DBI is the database name, VALUE the measured startup time, DATE a Unix timestamp in seconds (matching precision=s)
DB="http://localhost:8086/write?db=${DBI}&precision=s"
DATA="${METRIC},device=${DEVICE},product=${PRODUCTID} value=${VALUE} ${DATE}"
curl -i -XPOST "${DB}" --data-binary "${DATA}"
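The three shell lines above can be wrapped into reusable functions. This is a minimal sketch, assuming the same variables (DBI, VALUE, DATE) and the InfluxDB 1.x /write endpoint; make_point and write_point are hypothetical helper names, not part of FNPRMS:

```shell
#!/usr/bin/env bash
# Sketch: build one InfluxDB 1.x line-protocol point and POST it to /write.
# Assumptions: the URL already carries db=${DBI}&precision=s, VALUE is the
# per-device startup time, DATE is a Unix timestamp in seconds.
set -euo pipefail

# make_point METRIC DEVICE PRODUCTID VALUE DATE -> prints one line-protocol point
make_point() {
  printf '%s,device=%s,product=%s value=%s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# write_point URL METRIC DEVICE PRODUCTID VALUE DATE -> POSTs that point
write_point() {
  local url="$1"
  shift
  # --fail turns HTTP errors into a non-zero exit code (a successful write returns 204)
  curl --silent --show-error --fail -XPOST "${url}" --data-binary "$(make_point "$@")"
}
```

Usage: `write_point "http://localhost:8086/write?db=${DBI}&precision=s" view pixel-4-xl fenix-nightly "${VALUE}" "${DATE}"`. Using --fail instead of -i means a rejected write shows up as a failed cron job rather than as headers in a log. Note that a bare number such as 1234 is stored as a float field; an integer field would need an i suffix (1234i).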
- How often will you be sending data to the database? Every few seconds, minutes, etc.
Once a day. We have serialized data going back to February 2020 that we will load in bulk at the beginning, or before the transfer.
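For the bulk load, the /write endpoint accepts many newline-separated points per request, so the history can be sent in batches instead of one curl call per point. A sketch, assuming the gzipped JSON archive has already been converted into a line-protocol file with second-precision timestamps (that conversion depends on the JSON layout and is not shown); backfill and its 5000-point default batch size are illustrative choices:

```shell
#!/usr/bin/env bash
# Sketch: backfill historical points in batches via the InfluxDB 1.x /write endpoint.
# Assumption: FILE holds one line-protocol point per line, timestamps in seconds,
# and URL includes precision=s to match.
set -euo pipefail

# backfill URL FILE [BATCH_SIZE]
backfill() {
  local url="$1" file="$2" batch="${3:-5000}"
  local tmpdir chunk
  tmpdir="$(mktemp -d)"
  # Split the file into chunks of $batch lines: chunk_aa, chunk_ab, ...
  split -l "${batch}" "${file}" "${tmpdir}/chunk_"
  for chunk in "${tmpdir}"/chunk_*; do
    # Skip the literal pattern if the input was empty and no chunks were made
    [ -e "${chunk}" ] || continue
    # @file sends the chunk verbatim, newlines included
    curl --silent --show-error --fail -XPOST "${url}" --data-binary "@${chunk}"
  done
  rm -rf "${tmpdir}"
}
```

Usage, with a hypothetical file name: `backfill "http://localhost:8086/write?db=${DBI}&precision=s" history.lp`.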
- How many unique series do you think you will have?
?
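One way to estimate this is to count the distinct measurement and tag-set combinations implied by the line format, since each one is a separate series. A sketch, assuming every combination of the metric, device and product lists is actually written and value is the only field:

```shell
#!/usr/bin/env bash
# Sketch: estimate series cardinality by enumerating every measurement/tag-set
# combination from the line format. Assumes all combinations get written and
# "value" is the only field key.
set -euo pipefail

METRICS=(view main-pre-onboard main-post-onboard)
DEVICES=(samsung-galaxy-s10 samsung-galaxy-s7 pixel-2-xl pixel-4-xl)
PRODUCTS=(fennec-nightly fenix-nightly)

# Print one series key (measurement,device=...,product=...) per line
list_series() {
  local m d p
  for m in "${METRICS[@]}"; do
    for d in "${DEVICES[@]}"; do
      for p in "${PRODUCTS[@]}"; do
        echo "${m},device=${d},product=${p}"
      done
    done
  done
}

echo "estimated series: $(list_series | wc -l | tr -d ' ')"
```

This prints `estimated series: 24`; counting only the two metrics listed under Writes, the same enumeration gives 16.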
Reads
- What sort of query volume do you anticipate having (queries/sec or min)?
Only Grafana dashboards query the data right now, so the volume depends on the number of consumers. I'd estimate there are fewer than 10 users of the dashboard.
- How many concurrent queries do you expect to have?
Fewer than 5.
- What types of queries do you expect to run (dashboards or analytics)?
Dashboards (Grafana).
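The kind of query a Grafana panel sends can also be issued directly against the InfluxDB 1.x /query endpoint, which is handy for checking what the dashboards will see. A sketch; the 30-day window, daily grouping and the function names are assumptions, not the team's actual panel definitions:

```shell
#!/usr/bin/env bash
# Sketch: a dashboard-style InfluxQL query sent to the InfluxDB 1.x /query endpoint.
# The window, grouping and names here are illustrative assumptions.
set -euo pipefail

# startup_query METRIC PRODUCTID -> prints an InfluxQL statement.
# Measurement names with hyphens (e.g. main-post-onboard) must be double-quoted.
startup_query() {
  printf "SELECT mean(\"value\") FROM \"%s\" WHERE \"product\" = '%s' AND time > now() - 30d GROUP BY time(1d), \"device\"" "$1" "$2"
}

# run_query BASE_URL DBI METRIC PRODUCTID -> prints the JSON response
run_query() {
  curl --silent --show-error --fail -G "$1/query" \
    --data-urlencode "db=$2" \
    --data-urlencode "q=$(startup_query "$3" "$4")"
}
```

Usage: `run_query http://localhost:8086 "${DBI}" main-post-onboard fenix-nightly`.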
- How many users will be using the dashboards at any point in time?
Fewer than 10.
- Do you have any requirements for response times on these queries?
Nothing out of the ordinary.
- What sort of other analytic or downsampling jobs do you expect to have?
None planned.
Retention
- How long do you need to retain the data at full resolution? For example, 15 days
I'm not sure about the questions in this section; the temptation is to say full resolution forever. These are my best guesses.
Full resolution would be handy for 365 days. We use this data to evaluate the performance of Fenix, so the same retention period that telemetry data uses would be reasonable, although since this is lab data there is no privacy concern. Because the data is also kept in a serialized format (gzipped JSON), I'm not sure the retention period matters much.
- Do you need to keep downsampled data for longer? If so, on what schedule? For example, 5 minute averages for 63 days, 1 hour averages for 366 days
1-day averages would be nice for one to two years.
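The retention answers above map onto InfluxDB 1.x retention policies plus a continuous query. A sketch, assuming a database named perf (hypothetical), one year at full resolution and two years of 1-day means; the policy and query names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: retention policies and a 1-day downsampling continuous query for
# InfluxDB 1.x, sent through /query. Database, policy and CQ names and the
# 365d/730d durations are assumptions, not an agreed configuration.
set -euo pipefail

DB_NAME="perf"  # hypothetical database name

# Full-resolution data for one year; DEFAULT so writes without an rp land here
RP_FULL="CREATE RETENTION POLICY \"one_year\" ON \"${DB_NAME}\" DURATION 365d REPLICATION 1 DEFAULT"
# Downsampled data kept for two years
RP_DOWN="CREATE RETENTION POLICY \"two_years\" ON \"${DB_NAME}\" DURATION 730d REPLICATION 1"
# Daily mean of every measurement, keeping all tags (GROUP BY *)
CQ_DAILY="CREATE CONTINUOUS QUERY \"cq_daily_mean\" ON \"${DB_NAME}\" BEGIN SELECT mean(\"value\") AS \"value\" INTO \"${DB_NAME}\".\"two_years\".:MEASUREMENT FROM /.*/ GROUP BY time(1d), * END"

# apply_statements BASE_URL -> POSTs each statement (CREATE requires POST)
apply_statements() {
  local stmt
  for stmt in "${RP_FULL}" "${RP_DOWN}" "${CQ_DAILY}"; do
    curl --silent --show-error --fail -XPOST "$1/query" --data-urlencode "q=${stmt}"
  done
}
```

Since each series receives one point per day, a 1-day mean equals the raw value unless the harness writes more than once a day, so the second policy mainly extends how long the data is kept. Continuous queries only process new intervals, so backfilled history would need a one-off SELECT ... INTO query to populate two_years.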
Processing, analytics, alerting and taking action
- What kind of processing, if any, is required on incoming data?
None.
- Do you do any alerting based on patterns, anomalies or thresholds crossed?
Yes, Grafana alerts are configured on the data.
Comment 1 • 6 years ago (Assignee)
Thanks for the details! This sounds fine to put in the existing relops database that aerickson is already using. Does that sound good to you? It has full resolution data for 1 year. The cardinality there is so low that we can certainly increase that to 2 years if needed.
Comment 2 • 6 years ago (Reporter)
Hmm. I'd prefer a separate instance, but will let Andrew weigh in.
Comment 3 • 6 years ago (Assignee)
Can you elaborate on why? There is resource overhead for each database we create. Ideally we'll just have a few per team, dictated by retention requirements.
Comment 4 • 6 years ago
- We're on separate teams (Relops vs Perf).
- Ben's team doesn't have any existing databases that I'm aware of.
- My data is around test worker uptime and performance, Ben's data is Firefox test data regarding performance.
Comment 5 • 6 years ago (Assignee)
Oh, I'm sorry! I misunderstood what you meant when you said "aerickson is also working on dashboards and alerts for another part of our team's coverage area" and thought that you were also on relops, Ben.
There is a releng database you can use. I will send you credentials via Slack.
Comment 6 • 6 years ago (Assignee)
Turns out I was still confused about the team. I created a new performance database and users, since that is the team Ben says he is on.
Comment 7 • 6 years ago (Reporter)
Ok! This is up and working now, thanks for your help.