Fire emails on ping size budget monitoring alerts

NEW
Unassigned

Status

Product: Data Platform and Tools
Component: Monitoring & Alerting
Priority: P3
Severity: normal
Opened: 2 years ago
Updated: 7 months ago

People

(Reporter: gfritzsche, Unassigned)

Tracking

Details

(Whiteboard: [measurement:client:tracking])

(Reporter)

Description

2 years ago
(Mark Reid [:mreid] from bug 1249343, comment 2)
> We'd need to make several changes, but it does seem worthwhile to monitor
> the size of incoming pings.
> 
> The code that computes the aggregates for the budget dashboard also fires
> alerts if we exceed our estimates. Those alerts could be extended to do some
> detection of increases (or decreases) in size or volume of submissions.
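
A minimal sketch of the kind of extension described in the quote above: compare the current average ping size against a trailing baseline and flag a change in either direction. The function name, the 25% threshold, and the use of a plain mean as the baseline are assumptions for illustration, not the budget dashboard's actual logic.

```python
from statistics import mean


def detect_size_change(history, current, threshold_pct=25.0):
    """Return 'increase', 'decrease', or None.

    history: recent average ping sizes in bytes (hypothetical baseline window)
    current: average ping size in bytes for the latest window
    threshold_pct: assumed alert threshold, in percent of the baseline
    """
    if not history:
        return None  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    if baseline <= 0:
        return None  # avoid dividing by zero on a broken baseline
    change_pct = (current - baseline) / baseline * 100.0
    if change_pct >= threshold_pct:
        return "increase"
    if change_pct <= -threshold_pct:
        return "decrease"
    return None
```

The same comparison could be applied to submission counts instead of sizes to cover the "volume" case mentioned in the quote.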
(Reporter)

Comment 1

2 years ago
The histogram regression alerts have relatively stable regression detection. Could we lift something from there?

Comment 2

2 years ago
Georg, can we simply add a regular histogram on the client for the length of the uncompressed payload? Then we'd get alerting for free.
Points: --- → 2
Flags: needinfo?(gfritzsche)
Priority: -- → P3
(Reporter)

Comment 3

2 years ago
(In reply to Mark Reid [:mreid] from comment #2)
> Georg, can we simply add a regular histogram on the client for the length of
> the uncompressed payload? Then we'd get alerting for free.

Yes, but that wouldn't cover the "core" ping, which has a really constrained set of data points; bug 1249343 will request adding it to the budget monitor.
It would also be nice not to have to trust the clients, and instead monitor the actual incoming data, including metadata and headers.

We already have some "opt-in" measurements for "too big" pings on Fx Desktop (TELEMETRY_PING_SIZE_EXCEEDED_SEND, TELEMETRY_DISCARDED_SEND_PINGS_SIZE_MB). We could add more fine-grained ones and request making them opt-out.
Flags: needinfo?(gfritzsche)

Comment 4

9 months ago
:trink and :gfritzsche - does the recently-deployed doctype monitoring solve this use case?
Flags: needinfo?(mtrinkala)
Flags: needinfo?(gfritzsche)

Comment 5

9 months ago
This is what is currently firing.

####
Subject: Hindsight [analysis.moz_telemetry_doctype_monitor_crash#release] - size
MIME-Version: 1.0
Date: Mon, 24 Apr 2017 03:34:07 +0000
From: <hindsight@pipeline-cep.prod.mozaws.net>
To: AlertRecipients <noreply@example.com>
Content-Type: text/plain; charset="iso-8859-1"
X-Mailer: LuaSocket 3.0-rc1
Message-ID: <0101015b9e060a8d-dffaa9ca-66c1-4eca-9e35-571606054795-000000@us-west-2.amazonses.com>
X-SES-Outgoing: 2017.04.24-54.240.27.113
Feedback-ID: 1.us-west-2.9obwqSuHxAmNPKpejVDo3cEAmnSHOVLO3+B/64gdyXQ=:AmazonSES

Hostname: pipeline-cep.prod.mozaws.net
Pid: 54712

The average message size has changed by 101.995% (current avg: 16812B)

graph: https://pipeline-cep.prod.mozaws.net/dashboard_output/graphs/analysis.moz_telemetry_doctype_monitor_crash.size.html
###

Acceptable? Sadly we won't know what the new average size of the crash ping will actually be until well after the 53 roll-out as the average size continues to increase.
Flags: needinfo?(mtrinkala)
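
For a sense of scale, the alert above can be inverted to estimate the earlier average. This assumes the monitor reports change as (current - previous) / previous * 100, which the email does not state. Under that assumption the previous average would have been roughly 8.3 kB, meaning the crash ping about doubled in size.

```python
def implied_previous_avg(current_avg, change_pct):
    """Invert an assumed percent-change formula:
    change_pct = (current - previous) / previous * 100
    """
    return current_avg / (1.0 + change_pct / 100.0)


# Values taken from the alert email: 101.995% change, current avg 16812 B.
previous = implied_previous_avg(16812, 101.995)
print(f"implied previous average: {previous:.0f} B")
```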
(Reporter)

Comment 6

9 months ago
This is great already.
Would it be hard to get a per-docType monitor going?
AFAICT, currently we can't tell which docType is changing based on this monitor alone.
Flags: needinfo?(gfritzsche)
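
One way to picture a per-docType monitor: keep a separate average per docType and compare each against its own baseline, so an alert names the docType that moved. The sketch below assumes messages arrive as (docType, size in bytes) pairs and uses a hypothetical 25% threshold; it is not how the Hindsight doctype monitor is implemented.

```python
from collections import defaultdict


def average_size_by_doctype(messages):
    """messages: iterable of (doc_type, size_bytes) pairs (assumed shape)."""
    totals = defaultdict(lambda: [0, 0])  # doc_type -> [byte sum, count]
    for doc_type, size in messages:
        totals[doc_type][0] += size
        totals[doc_type][1] += 1
    return {dt: total / count for dt, (total, count) in totals.items()}


def changed_doctypes(baseline_msgs, current_msgs, threshold_pct=25.0):
    """Return {doc_type: percent change} for docTypes past the threshold."""
    baseline = average_size_by_doctype(baseline_msgs)
    current = average_size_by_doctype(current_msgs)
    changed = {}
    for doc_type, cur_avg in current.items():
        base_avg = baseline.get(doc_type)
        if not base_avg:
            continue  # new docType or zero baseline: nothing to compare
        pct = (cur_avg - base_avg) / base_avg * 100.0
        if abs(pct) >= threshold_pct:
            changed[doc_type] = pct
    return changed


# Made-up sample data: only the "crash" docType changes noticeably.
baseline_window = [("crash", 8000), ("crash", 8600), ("core", 500)]
current_window = [("crash", 16800), ("core", 510)]
result = changed_doctypes(baseline_window, current_window)
```

With the sample data above, only "crash" ends up in `result`, which is the information a single combined monitor cannot provide.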

Updated

7 months ago
Component: Metrics: Pipeline → Monitoring & Alerting
Product: Cloud Services → Data Platform and Tools

Comment 7

7 months ago
You can monitor this for any configured docType. Trink, where are the monitored doctypes configured?
Flags: needinfo?(mtrinkala)