Closed Bug 1639416 Opened 6 years ago Closed 6 years ago

InfluxDB Onboarding for performance team

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bdekoz, Assigned: brian, Mentored)

References

Details

User Story

fenix-fe-perf, fenix-perf, performance, android, fenix

Benjamin De Kosnik [:bdekoz]

Reporter

Description

•

6 years ago

•

Edited

+++ This bug was initially created as a clone of Bug #1533837 +++

General

What are the Mozilla email addresses of the people who need to manage dashboards and alerts for your team?

bdekoz@mozilla.com

aerickson is also working on dashboards and alerts for another part of our team's coverage area.

What is a brief description of your use case, or how are you planning to use InfluxDB?

We are collecting startup performance data using an instrumented test harness, currently named FNPRMS. Each day, the startup time of the current build is recorded for two metrics: time to view from an applink intent (view metric) and time to main/home after onboarding (main-post-onboard metric).

https://github.com/mozilla-mobile/FNPRMS

There is a server host in the SF office with four Android phones, running that harness every night against two Fenix binaries and logging results to a local InfluxDB database. The dashboard on that host works, but it needs to move to a host where others can access it.

See:
http://quint.corp.sfo1.mozilla.com:3000/d/nMQrd_RGk/andy-sandbox?panelId=2&edit&fullscreen&orgId=1&from=now-30d&to=now

When do you want to start on your implementation?

Already logging locally on quint.corp.sfo1.mozilla.com.

Writes

How many Data Sources do you have and what types are they? If you are monitoring hosts then how many hosts will you be monitoring?

data sources: 4 phones x 2 binaries x 2 metrics, each day

What is your proposed architecture for ingesting data? E.G. where will you run telegraf, what forwards to telegraf vs writing directly to influxdb, etc.

We use cron to schedule the harness, then calculate per-device metric values and send them to the InfluxDB instance through its HTTP write API using curl.

How many values per second or minute will you be ingesting into the DB?

Once a day at noon, each data source above.

If you are generating the data, what does your data look like (please enter a sample of the data if possible)?

line format:

for metric (view, main-pre-onboard, main-post-onboard)
for device (samsung-galaxy-s10, samsung-galaxy-s7, pixel-2-xl, pixel-4-xl)
for product (fennec-nightly, fenix-nightly)

DB="http://localhost:8086/write?db=${DBI}&precision=s"
DATA="${METRIC},device=${DEVICE},product=${PRODUCTID} value=${VALUE} ${DATE}"

curl -i -XPOST "${DB}" --data-binary "${DATA}"
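
As a rough illustration of how the three loops and the curl call above might fit together, the sketch below builds one line-protocol record per metric/device/product combination and posts them in a single request. The helper names `build_line`, `build_batch`, `post_batch`, and `lookup_value`, and the `DRY_RUN` switch, are assumptions made for this example and are not taken from FNPRMS; `lookup_value` is a placeholder that always prints 0. It also assumes the write endpoint accepts several newline-separated records per request, as the InfluxDB 1.x HTTP API does.

```shell
# Sketch only: helper names and DRY_RUN are hypothetical, not part of FNPRMS.

METRICS="view main-pre-onboard main-post-onboard"
DEVICES="samsung-galaxy-s10 samsung-galaxy-s7 pixel-2-xl pixel-4-xl"
PRODUCTS="fennec-nightly fenix-nightly"

# Format one record: measurement,tags field timestamp (seconds).
# Args: metric device product value epoch_seconds
build_line() {
  printf '%s,device=%s,product=%s value=%s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical placeholder: replace with the harness's real per-device result.
lookup_value() {
  echo "0"
}

# Emit one record for every metric/device/product combination.
# Args: epoch_seconds
build_batch() {
  stamp="$1"
  for metric in $METRICS; do
    for device in $DEVICES; do
      for product in $PRODUCTS; do
        build_line "$metric" "$device" "$product" \
          "$(lookup_value "$metric" "$device" "$product")" "$stamp"
      done
    done
  done
}

# Read records on stdin and post them in one request.
# When DRY_RUN is set, print the records instead of sending them.
post_batch() {
  db_url="http://localhost:8086/write?db=${DBI}&precision=s"
  if [ -n "${DRY_RUN:-}" ]; then
    cat
  else
    curl -i -XPOST "$db_url" --data-binary @-
  fi
}
```

Running `build_batch "$(date +%s)" | DRY_RUN=1 post_batch` prints the records without contacting the database. With the three metric names listed above, one run yields 24 records.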

How often will you be sending data to the database? Every few seconds, minutes, etc.

Once a day. We've got serialized data since February 2020 that we will populate in bulk at the beginning, or prior to transfer.
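
One possible shape for the bulk load, assuming the serialized data has already been converted to a line-protocol file (that conversion is not shown, since the JSON layout is not described here): split the file into fixed-size chunks and post each chunk separately. The names `split_into_chunks`, `post_chunk`, `backfill`, and `CHUNK_LINES` are hypothetical, and the chunk size of 5000 lines is an assumption rather than a requirement.

```shell
# Sketch only: function names and CHUNK_LINES are hypothetical.

CHUNK_LINES=5000

# Split a line-protocol file into chunk files in a fresh temp directory.
# Args: input_file. Prints the directory that holds the chunks.
split_into_chunks() {
  workdir="$(mktemp -d)"
  split -l "$CHUNK_LINES" "$1" "$workdir/chunk."
  echo "$workdir"
}

# POST one chunk file to the write endpoint.
# Args: chunk_file
post_chunk() {
  curl -sS -XPOST "http://localhost:8086/write?db=${DBI}&precision=s" \
    --data-binary "@$1"
}

# Split the input, post every chunk in order, and stop on the first failure.
# Args: input_file
backfill() {
  dir="$(split_into_chunks "$1")"
  for chunk in "$dir"/chunk.*; do
    post_chunk "$chunk" || return 1
  done
  rm -rf "$dir"
}
```

With only a few months of once-a-day data, a single request would probably also be enough; chunking mainly keeps each request small if the history grows.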

How many unique series do you think you will have?

Reads

What sort of query volume do you anticipate having (queries/sec or min)?

Only used by Grafana dashboards right now. Depends on the number of consumers. Right now I'd say there are fewer than 10 users of the dashboard.

How many concurrent queries do you expect to have?

Fewer than 5.

What types of queries do you expect to run (dashboards or analytics)?

Dashboards/grafana.

How many users will be using the dashboards at any point in time?

Fewer than 10.

Do you have any requirements for response times on these queries?

Not anything ridiculous.

What sort of other analytic or downsampling jobs do you expect to have?

None planned.

Retention

How long do you need to retain the data at full resolution? For example, 15 days

I'm not sure about the questions in this section... the temptation is to say full resolution forever. These are my best guesses.

Full resolution would be handy for 365 days. We use this data to evaluate the performance of Fenix, so whatever retention period telemetry data uses would work, but since this is lab data there is no privacy concern. Since there is already a serialized data format (gzipped JSON), I'm not quite sure it matters.

Do you need to keep downsampled data for longer? If so, on what schedule? For example, 5 minute averages for 63 days, 1 hour averages for 366 days

1-day averages would be nice for a year, or two years.
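
For reference, on a self-managed InfluxDB 1.x instance this kind of retention could be expressed as two retention policies plus a continuous query. The sketch below only prints the InfluxQL statements and, optionally, sends them to the `/query` endpoint. The database name `performance`, the policy names `one_year` and `two_years`, and the query name `cq_daily_mean` are hypothetical; on a hosted instance the administrator would normally set this up instead.

```shell
# Sketch only, assuming InfluxDB 1.x; all names below are hypothetical.

INFLUX_URL="http://localhost:8086"
DB_NAME="performance"

# Print the InfluxQL statements, one per line:
#  - full-resolution data kept for 365 days (default policy)
#  - a second policy that keeps data for 730 days
#  - a continuous query writing 1-day means into the longer policy
retention_statements() {
  cat <<EOF
CREATE RETENTION POLICY "one_year" ON "${DB_NAME}" DURATION 365d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "two_years" ON "${DB_NAME}" DURATION 730d REPLICATION 1
CREATE CONTINUOUS QUERY "cq_daily_mean" ON "${DB_NAME}" BEGIN SELECT mean("value") AS "value" INTO "${DB_NAME}"."two_years".:MEASUREMENT FROM /.*/ GROUP BY time(1d), * END
EOF
}

# Send each statement to the /query endpoint, stopping on the first failure.
apply_statements() {
  retention_statements | while IFS= read -r stmt; do
    curl -sS -XPOST "${INFLUX_URL}/query" --data-urlencode "q=${stmt}" || exit 1
  done
}
```

Because each series receives one point per day, the daily mean equals the raw value for now; the continuous query would only start to reduce data if the harness ran more often.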

Processing, analytics, alerting and taking action

What kind of processing, if any, is required on incoming data?

None.

Do you do any alerting based on patterns, anomalies or thresholds crossed?

Yes, grafana has alerts configured on the data.

Benjamin De Kosnik [:bdekoz]

Reporter

Updated

•

6 years ago

Summary: InfluxDB Onboarding for relops (fenix perf android stuff) → InfluxDB Onboarding for perf (fenix startup android stuff)

Brian Pitts

Assignee

Comment 1

•

6 years ago

Thanks for the details! This sounds fine to put in the existing relops database that aerickson is using already. Does that sound good to you? It has full resolution data for 1 year. The cardinality there is so low we can certainly increase that to 2 years if needed.

Flags: needinfo?(bdekoz)

Brian Pitts

Assignee

Updated

•

6 years ago

Status: NEW → ASSIGNED

Benjamin De Kosnik [:bdekoz]

Reporter

Comment 2

•

6 years ago

Hmm. I'd prefer a separate instance, but will let Andrew weigh in.

Flags: needinfo?(bdekoz) → needinfo?(aerickson)

Brian Pitts

Assignee

Comment 3

•

6 years ago

Can you elaborate on why? There is resource overhead for each database we create. Ideally we'll just have a few per team, dictated by retention requirements.

Andrew Erickson [:aerickson]

Comment 4

•

6 years ago

We're on separate teams (Relops vs Perf).
- Ben's team doesn't have any existing databases that I'm aware of.
- My data is around test worker uptime and performance; Ben's data is Firefox test data regarding performance.

Flags: needinfo?(aerickson)

Brian Pitts

Assignee

Comment 5

•

6 years ago

Oh, I'm sorry! I misunderstood what you meant when you said "aerickson is also working on dashboards and alerts for another part of our team's coverage area" and thought that you were also on relops, Ben.

There is a releng database you can use. I will send you credentials via Slack.

Brian Pitts

Assignee

Updated

•

6 years ago

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Brian Pitts

Assignee

Updated

•

6 years ago

Summary: InfluxDB Onboarding for perf (fenix startup android stuff) → InfluxDB Onboarding for performance team

Brian Pitts

Assignee

Comment 6

•

6 years ago

Turns out I was still confused about the team. I created a new performance database and users, since that is the team Ben says he is on.

Benjamin De Kosnik [:bdekoz]

Reporter

Comment 7

•

6 years ago

Ok! This is up and working now, thanks for your help.


Categories

(Cloud Services :: Operations: Metrics/Monitoring, task)


Security

(public)
