Bug 1843406 Comment 2 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Looking at the alerts for July 27, we have (harvested by selecting them from the dashboard, copying them here, and manually sorting):
```
nightly	Linux	moments_ping_volume
nightly	Mac	moments_ping_volume
nightly	Windows	moments_ping_volume
nightly	Mac	other_ping_volume
nightly	Mac	spotlight_ping_volume
nightly	Linux	spotlight_ping_volume
nightly	Linux	infobar_ping_volume
aurora	Mac	moments_ping_volume
aurora	Windows	moments_ping_volume
aurora	Linux	moments_ping_volume
beta	Windows	moments_ping_volume
beta	Linux	moments_ping_volume
beta	Mac	moments_ping_volume
esr	Mac	moments_ping_volume
esr	Windows	moments_ping_volume
esr	Linux	moments_ping_volume
esr	Mac	null_ping_volume
esr	Mac	cfr_ping_volume
esr	Mac	spotlight_ping_volume
esr	Linux	infobar_ping_volume
esr	Windows	whats_new_panel_ping_volume
release	Windows	whats_new_panel_ping_volume
release	Linux	whats_new_panel_ping_volume
release	Mac	whats_new_panel_ping_volume
release	Windows	unknown_keys_volume
release	Linux	moments_ping_volume
	Windows	whats_new_panel_ping_volume
Other	Mac	moments_ping_volume
Other	Linux	moments_ping_volume
Other	Mac	other_ping_volume
Other	Mac	null_ping_volume
Other	Mac	spotlight_ping_volume
Other	Mac	whats_new_panel_ping_volume
Other	Windows	whats_new_panel_ping_volume
```
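
The harvesting step above was done by hand. As a rough sketch of how the same tab-separated rows could be parsed and tallied per metric, something like the following might work. The column names (`channel`, `os_name`, `metric`), the function names, and the choice to treat an empty first column as a null channel are all assumptions made for illustration; only a handful of rows from the list are included.

```python
from collections import Counter

# A few rows copied from the list above. Columns are tab-separated:
# normalized_channel, OS, metric. The row with an empty first column
# is assumed to be the null-channel alert.
RAW_ALERTS = (
    "nightly\tLinux\tmoments_ping_volume\n"
    "nightly\tMac\tmoments_ping_volume\n"
    "nightly\tMac\tspotlight_ping_volume\n"
    "release\tWindows\tunknown_keys_volume\n"
    "\tWindows\twhats_new_panel_ping_volume\n"
    "Other\tMac\tmoments_ping_volume\n"
)


def parse_alerts(raw):
    """Turn pasted dashboard rows into (channel, os_name, metric) tuples."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on tabs without stripping first, so an empty channel survives.
        channel, os_name, metric = line.split("\t")
        rows.append((channel or None, os_name, metric))
    return rows


def count_by_metric(rows):
    """Count how many (channel, OS) pairs alerted for each metric."""
    return Counter(metric for _, _, metric in rows)


if __name__ == "__main__":
    for metric, count in count_by_metric(parse_alerts(RAW_ALERTS)).most_common():
        print(metric, count)
```

A per-metric tally like this makes it easier to spot when one metric, such as `moments_ping_volume`, is alerting across nearly every channel and OS at once.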

Let's go through them in order:
* Nightly: the only alert across all three OSes is `moments_ping_volume`... which is totally legit. There was an order of magnitude drop on July 22.
* Nightly: Mac `other_ping_volume`: There should not be any `other` pings at all.
    * *Proposal:* File and fix a bug for this, like bug 1844360
* Nightly: Linux/Mac `spotlight_ping_volume` - The titanic shift in volume for spotlight pings began on July 13, which is now outside the alert window. These two OSes are alerting because the absolute number of these pings is so low that smallish perturbations can trigger the alert.
    * *Proposal:* Not sure. This is a Low Volume Case. Maybe this would flatten out if we looked at it normalized by client volume? Maybe it wouldn't. Maybe we find some way to save alerting on Nightly for messages that are more stable in volume.
* Nightly: Linux `infobar_ping_volume` - Wow, there's usually just 0 of these. So every time there's more than 0 of them, the alert will fire. An extreme instance of the Low Volume Case? Or worth looking into, since it doesn't have Mac-like volumes and we should expect it to?
    * *Proposal:* File an investigation bug to see if Linux infobars are special. If they are, fold this into Low Volume Case. If they aren't, fix 'em.
* Aurora: `moments_ping_volume` - Again, totally legit due to a cliff edge on July 22.
* Beta: `moments_ping_volume` - A third time: totally legit due to a cliff edge on July 22.
* ESR: `moments_ping_volume` - A fourth time: totally legit due to a cliff edge on July 22.
* ESR: Mac `null`, `cfr`, and `spotlight` - starts July 16 and so almost certainly reflects the increase in overall population due to migration from release to ESR. We can ignore this.
* ESR: Linux `infobar_ping_volume` - Huh, another instance of "Are Linux infobars special?"
* ESR: Windows `whats_new_panel_ping_volume` - Unlike the other messages spiking due to old Windows' migration from release to ESR, the What's New Panel spiked and this alert is for the subsequent levelling down. Could be due to the "channel switch updates cause client_id reset" incident (now thought to be resolved). Wait and see on this one.
* Release: Windows/Mac/Linux all spiking on `whats_new_panel_ping_volume`, huh. This is the post-release-week levelling-off of messages. We see these happen more sharply on other channels where releases go out unfettered - on the release channel, they're drawn out enough for the window to get properly used to them being here.
    * *Proposal:* Education fix. This is something to expect, and is totally cool. Worth it to have the regular cadence of alertness so that we catch weirdness between releases.
* Release: Windows `unknown_keys_volume` - This is bug 1844360.
* Release: Linux `moments_ping_volume` - It exhibits a large double-spike... which is weird. What's weirder is that all of the OSes are exhibiting this, and only Linux is small enough to alert on it on this day (the others, with two spikes, have their averages sufficiently high to allow for the 27th's low to fall inside the tolerance). I don't know what's causing this.
    * *Proposal:* File an investigation bug. Maybe some messaging's been going out this past week or two?
* We're alerting on `null` `normalized_channel`
    * *Proposal:* Filter out null `normalized_channel` from monitoring and alerts. The volumes are so small that _any_ data that slips through will alert, and there's nothing we intend to do about it.
* We're alerting on `Other` `normalized_channel`
    * *Proposal:* The same as for `null`, only bigger.
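
The last two proposals amount to dropping rows before they reach monitoring. A minimal sketch of that filter, assuming alerts are available as `(channel, os_name, metric)` tuples and that a null channel shows up as `None` or an empty string (both assumptions, as are the names `EXCLUDED_CHANNELS` and `drop_unactionable`; the real dashboard query may look nothing like this):

```python
# Channels we would not act on, per the proposals above. Treating null as
# None or "" is an assumption about how the data would be represented.
EXCLUDED_CHANNELS = {None, "", "Other"}


def drop_unactionable(alerts):
    """Keep only alerts whose normalized_channel is one we would act on."""
    return [alert for alert in alerts if alert[0] not in EXCLUDED_CHANNELS]


if __name__ == "__main__":
    sample = [
        ("release", "Linux", "moments_ping_volume"),
        (None, "Windows", "whats_new_panel_ping_volume"),
        ("Other", "Mac", "spotlight_ping_volume"),
    ]
    for alert in drop_unactionable(sample):
        print(alert)
```

In practice this would more likely be a `WHERE` clause in whatever query feeds the dashboard; the Python form is only meant to pin down which rows go away.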

In conclusion: Mostly reactions to real things happening, which is exactly what we want from alerts. A couple of things that might warrant additional investigation in follow-ups. A few things we should definitely update the dashboard to filter out. And then there's the Low Volume Case... how to deal with legitimately-low and legitimately-noisy volumes of messages? This is probably the one dashboard-inherent quirk we should consider adjusting.
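
To make the Low Volume Case concrete, here is a toy sketch. It assumes a hypothetical rule that alerts when today's count differs from the mean of a trailing window by more than a fixed fraction. The dashboard's actual rule, window length, and tolerance are not described here, so `would_alert`, its `tolerance` parameter, and all of the numbers are illustrative only.

```python
from statistics import mean


def would_alert(history, today, tolerance=0.5):
    """Hypothetical rule: alert when today's count is more than `tolerance`
    (as a fraction of the trailing mean) away from the trailing mean."""
    baseline = mean(history)
    if baseline == 0:
        # A window of all zeros: anything above 0 counts as a deviation.
        return today != 0
    return abs(today - baseline) / baseline > tolerance


if __name__ == "__main__":
    # Usually-zero series (like Linux infobars): a single ping alerts.
    print(would_alert([0, 0, 0, 0, 0, 0, 0], 1))
    # Low-volume series: 4 extra pings is a large relative change.
    print(would_alert([4, 6, 5, 3, 7, 5, 5], 9))
    # High-volume series: the same 4 extra pings is lost in the noise.
    print(would_alert([4000, 6000, 5000, 3000, 7000, 5000, 5000], 5004))
```

Under this assumed rule the same absolute perturbation alerts on the small series and not on the large one, which is the behaviour seen on Nightly and ESR above. Normalizing by client volume, as floated earlier, would change the units but would only help if the normalized series is itself less noisy.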

But before we get to that, this is just one day's alerts. I shall perform this same analysis on Monday to see if we get spurious weekend alerts or in case some other avenues of discussion crop up when examining multiple days' alert loads.
