Closed Bug 1252050 Opened 8 years ago Closed 3 years ago

Fire emails on ping size budget monitoring alerts

Categories

(Data Platform and Tools :: Monitoring & Alerting, defect, P3)

Points: 2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gfritzsche, Unassigned)

Details

(Whiteboard: [measurement:client:tracking])

(Mark Reid [:mreid] from bug 1249343, comment 2)
> We'd need to make several changes, but it does seem worthwhile to monitor
> the size of incoming pings.
> 
> The code that computes the aggregates for the budget dashboard also fires
> alerts if we exceed our estimates. Those alerts could be extended to do some
> detection of increases (or decreases) in size or volume of submissions.
The histogram regression alerts have some relatively stable regression detection; we might be able to lift something from there?
Georg, can we simply add a regular histogram on the client for the length of the uncompressed payload? Then we'd get alerting for free.
Points: --- → 2
Flags: needinfo?(gfritzsche)
Priority: -- → P3
(In reply to Mark Reid [:mreid] from comment #2)
> Georg, can we simply add a regular histogram on the client for the length of
> the uncompressed payload?  Then we'd get alerting for free.

Yes, but that wouldn't cover the "core" ping, which has a really constrained set of data points (bug 1249343 will request adding it to the budget monitor).
It would also be nice not to have to trust the clients, and instead monitor the actual incoming data, including metadata and headers.

We already have some opt-in measurements for "too big" pings on Firefox Desktop (TELEMETRY_PING_SIZE_EXCEEDED_SEND, TELEMETRY_DISCARDED_SEND_PINGS_SIZE_MB); we could add more fine-grained ones and request making them opt-out.
Flags: needinfo?(gfritzsche)
:trink and :gfritzsche - does the recently-deployed doctype monitoring solve this use case?
Flags: needinfo?(mtrinkala)
Flags: needinfo?(gfritzsche)
This is what is currently firing.

####
Subject: Hindsight [analysis.moz_telemetry_doctype_monitor_crash#release] - size
MIME-Version: 1.0
Date: Mon, 24 Apr 2017 03:34:07 +0000
From: <hindsight@pipeline-cep.prod.mozaws.net>
To: AlertRecipients <noreply@example.com>
Content-Type: text/plain; charset="iso-8859-1"
X-Mailer: LuaSocket 3.0-rc1
Message-ID: <0101015b9e060a8d-dffaa9ca-66c1-4eca-9e35-571606054795-000000@us-west-2.amazonses.com>
X-SES-Outgoing: 2017.04.24-54.240.27.113
Feedback-ID: 1.us-west-2.9obwqSuHxAmNPKpejVDo3cEAmnSHOVLO3+B/64gdyXQ=:AmazonSES

Hostname: pipeline-cep.prod.mozaws.net
Pid: 54712

The average message size has changed by 101.995% (current avg: 16812B)

graph: https://pipeline-cep.prod.mozaws.net/dashboard_output/graphs/analysis.moz_telemetry_doctype_monitor_crash.size.html
###

Acceptable? Sadly we won't know what the new average size of the crash ping will actually be until well after the 53 roll-out as the average size continues to increase.
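
For illustration only, a minimal Python sketch of the kind of check behind the alert text above. It is not the Hindsight monitor's code: the function names, the baseline window, and the 50% threshold are all assumptions, and so is the definition of "changed by" as a relative change against a baseline average.

```python
from statistics import mean


def percent_change(baseline_avg: float, current_avg: float) -> float:
    """Relative change of current_avg versus baseline_avg, in percent."""
    if baseline_avg <= 0:
        raise ValueError("baseline average must be positive")
    return (current_avg - baseline_avg) / baseline_avg * 100.0


def size_alert(baseline_sizes, current_sizes, threshold_pct=50.0):
    """Return an alert message if the average size moved by at least
    threshold_pct (hypothetical default) in either direction, else None."""
    baseline_avg = mean(baseline_sizes)
    current_avg = mean(current_sizes)
    change = percent_change(baseline_avg, current_avg)
    if abs(change) < threshold_pct:
        return None
    return (
        f"The average message size has changed by {change:.3f}% "
        f"(current avg: {current_avg:.0f}B)"
    )
```

With a fixed baseline, a size that keeps growing after a release would keep triggering this check, which is the concern raised above about not knowing the new steady-state average yet.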
Flags: needinfo?(mtrinkala)
This is great already.
Would it be hard to get a per-docType monitor going?
AFAICT, currently we can't tell which docType is changing based on this monitor alone.
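
A rough Python sketch of what per-docType tracking could look like, so that a size change can be attributed to one docType. The class and method names are hypothetical and do not reflect how the deployed monitor is implemented or configured.

```python
from collections import defaultdict


class DocTypeSizeTracker:
    """Keeps a running message count and byte total per docType."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(int)

    def add(self, doc_type: str, size_bytes: int) -> None:
        """Record one incoming message of the given docType."""
        self._count[doc_type] += 1
        self._total[doc_type] += size_bytes

    def averages(self) -> dict:
        """Average message size in bytes for each docType seen so far."""
        return {d: self._total[d] / self._count[d] for d in self._count}
```

Each docType's average could then be compared against its own baseline rather than against one global figure.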
Flags: needinfo?(gfritzsche)
Component: Metrics: Pipeline → Monitoring & Alerting
Product: Cloud Services → Data Platform and Tools
You can monitor this for any configured docType. Trink, where are the monitored doctypes configured?
Flags: needinfo?(mtrinkala)

We have a working process to manage size now. Also, legacy telemetry is migrating to Glean in the foreseeable future.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED