In bug 1382596 DLC downloads failed for all clients. We noticed this 2 weeks later "accidentally" when looking at the telemetry dashboard. Re:dash allows us to setup alerts. This should be pretty straight-forward with the telemetry we already collect. We just need to decide who to notify (maybe a mailing list?).
I've setup two alerts in re:dash: * Content download error rate > 25% (Previously we have been at ~15%, currently we are at ~21% while we recover) * Catalog sync error > 10% (Currently we are at ~5%) If one of the alarms triggers I'll get an email. Additionally Jason has setup CDN error rate monitoring in bug 1383092.