Closed
Bug 1236838
Opened 9 years ago
Closed 5 years ago
(primarily 404) monitoring of mobile.cdn.mozilla.net and downloadable content (DLC)
Categories
(Cloud Services :: Operations: Metrics/Monitoring, task)
Tracking
(Not tracked)
RESOLVED
INACTIVE
People
(Reporter: sebastian, Assigned: jschneider)
References
Details
With "Downloadable Content (DLC)" we are starting to download default fonts from mobile.cdn.mozilla.net (meta bug 1194338) instead of shipping them with Fennec (To reduce the app size and eventually update content independently from releases). The implementation is now ready for a test in Nightly (bug 1233799).
Before we go ahead I was wondering:
* Do we monitor mobile.cdn.mozilla.net in any way?
* Do we get notified if the clients would do unexpected things? (For example lots of 404s)
* Is there some way to see usage in general? If I could see Nightly clients successfully download the fonts then that would be nice; But that's not critical in any way.
Reporter
Updated•9 years ago
Summary: Monitoring of mobile.cdn.mozilla.net and download content (DLC) → Monitoring of mobile.cdn.mozilla.net and downloadable content (DLC)
Comment 1•9 years ago
Shyam, can you help answer these questions? We don't have access to the kanban board, so we can't see the status of any work there. Thanks!
Flags: needinfo?(smani)
(In reply to :Margaret Leibovic from comment #1)
> Shyam, can you help answer these questions? We don't have access to the
> kanban board, so we can't see the status of any work there. Thanks!
Our kanban board is set up as a mirror of Bugzilla, and has no "status of work" information other than what's in Bugzilla.
(In reply to Sebastian Kaspari (:sebastian) from comment #0)
> With "Downloadable Content (DLC)" we are starting to download default fonts
> from mobile.cdn.mozilla.net (meta bug 1194338) instead of shipping them with
> Fennec (To reduce the app size and eventually update content independently
> from releases). The implementation is now ready for a test in Nightly (bug
> 1233799).
>
> Before we go ahead I was wondering:
> * Do we monitor mobile.cdn.mozilla.net in any way?
> * Do we get notified if the clients would do unexpected things? (For example
> lots of 404s)
> * Is there some way to see usage in general? If I could see Nightly clients
> successfully download the fonts then that would be nice; But that's not
> critical in any way.
MOC handles monitoring, so I'm pushing this request over to them; they can help you set up whatever monitoring we have available.
Metrics would have the usage statistics for this domain, if any are available. They may need time to set up the metrics, so reach out to them if you need them and we'll make it happen.
Assignee: server-ops-webops → nobody
Component: WebOps: Other → MOC: Service Requests
Flags: needinfo?(smani)
QA Contact: smani → lypulong
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2399]
Comment 4•9 years ago
:bobm, this is the monitoring bug.
To clarify, assuming that we can trust our CDN provider(s), and assuming that we have stable steps for deploying and QAing new content, the most likely error here is about missing files or a catalog/content mismatch between Kinto and S3.
CCing :tarek to think about that some.
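A minimal sketch of how the catalog/content mismatch described above could be detected, assuming the list of paths referenced by the catalog and the list of object keys in the bucket have already been fetched by some other means. The function names and the example paths are hypothetical and are not taken from the DLC implementation.

```python
def find_missing_files(catalog_paths, stored_keys):
    """Return paths listed in the catalog that have no stored object (likely 404s)."""
    return sorted(set(catalog_paths) - set(stored_keys))


def find_orphaned_files(catalog_paths, stored_keys):
    """Return stored objects that no catalog entry references."""
    return sorted(set(stored_keys) - set(catalog_paths))


# Hypothetical example data: one font is in the catalog but was never uploaded.
catalog = ["fonts/example-regular.ttf", "fonts/example-bold.ttf"]
bucket = ["fonts/example-regular.ttf", "fonts/old-font.ttf"]

missing = find_missing_files(catalog, bucket)
orphaned = find_orphaned_files(catalog, bucket)
print("missing:", missing)
print("orphaned:", orphaned)
```

Running a check like this after each content deployment would catch missing files before clients start requesting them.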
Summary: Monitoring of mobile.cdn.mozilla.net and downloadable content (DLC) → (primarily 404) monitoring of mobile.cdn.mozilla.net and downloadable content (DLC)
Component: MOC: Service Requests → Operations: Metrics/Monitoring
Product: Infrastructure & Operations → Cloud Services
QA Contact: lypulong
Assignee
Comment 6•9 years ago
We've got pretty reliable telemetry from AWS CloudWatch on:
Requests – Number of requests for all HTTP methods and for both HTTP and HTTPS requests
BytesDownloaded – Number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests
BytesUploaded – Number of bytes uploaded to your origin with CloudFront using POST and PUT requests
TotalErrorRate – Percentage of all requests for which the HTTP status code is 4xx or 5xx
4xxErrorRate – Percentage of all requests for which the HTTP status code is 4xx
5xxErrorRate – Percentage of all requests for which the HTTP status code is 5xx
We can route that to Datadog, New Relic, etc., whichever the audience needs, as we usually do.
From there, we can hang alerts and notifications as needed.
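As an illustration of hanging an alert on one of these metrics, here is a rough sketch that builds the parameters for a CloudWatch alarm on the CloudFront 4xxErrorRate metric. The distribution ID, threshold, evaluation window, alarm name, and SNS topic ARN are all placeholder assumptions, not values from this bug.

```python
def build_4xx_alarm(distribution_id, threshold_percent, sns_topic_arn):
    """Build keyword arguments for a CloudWatch alarm on CloudFront's 4xxErrorRate."""
    return {
        "AlarmName": f"cloudfront-4xx-{distribution_id}",  # hypothetical naming scheme
        "Namespace": "AWS/CloudFront",
        "MetricName": "4xxErrorRate",
        "Dimensions": [
            {"Name": "DistributionId", "Value": distribution_id},
            {"Name": "Region", "Value": "Global"},
        ],
        "Statistic": "Average",
        "Period": 300,  # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,  # consecutive breaching periods before alarming (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic with an email subscription
    }


# Placeholder values for illustration only.
params = build_4xx_alarm(
    "EXAMPLEDISTID", 5.0, "arn:aws:sns:us-east-1:123456789012:example-topic"
)
print(params["MetricName"], params["Threshold"])

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

The call is left as a comment so the sketch runs without AWS access; CloudFront metrics are published in the us-east-1 region.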
Reporter
Comment 7•9 years ago
(In reply to JP Schneider [:jp] from comment #6)
> We've got pretty reliable telemetry from AWS Cloudwatch on:
> Requests – Number of requests for all HTTP methods and for both HTTP and
> HTTPS requests
> BytesDownloaded – Number of bytes downloaded by viewers for GET, HEAD,
> and OPTIONS requests
> BytesUploaded – Number of bytes uploaded to your origin with CloudFront
> using POST and PUT requests
> TotalErrorRate – Percentage of all requests for which the HTTP status
> code is 4xx or 5xx
> 4xxErrorRate – Percentage of all requests for which the HTTP status code
> is 4xx
> 5xxErrorRate – Percentage of all requests for which the HTTP status code
> is 5xx
Do we have all this now after the CDN move or do we need to specifically configure those? I'm wondering what has to happen before I can enable the font downloader in Nightly (bug 1233799).
Flags: needinfo?(jschneider)
Assignee
Comment 8•9 years ago
Good morning and happy Friday!
We currently are collecting metrics, but not yet alerting on anything.
Depending on the audience receiving the monitor/alert on 4xx/5xx rates, we can get the metrics and alerts where they need to go...so where is that? :)
We can hook Datadog into this metric very easily, or provide email/PagerDuty notifications and alerts straight from AWS, etc. Is that going to MOC, or someone else?
Flags: needinfo?(jschneider) → needinfo?(rnewman)
Comment 9•9 years ago
(In reply to JP Schneider [:jp] from comment #8)
> We can hook Datadog into this metric very easily, or provide email/pagerduty
> notifications and alerts straight from AWS, etc... Is that going to MOC,
> or someone else?
I guess there are two sets here: stuff that got screwed up operationally (e.g., accidentally changing permissions or deleting a bucket), for which the MOC or someone in product-ops should be aware, and stuff that's a mismatch between client and server.
Betting that the latter is our first port of call, an email alert to mobile-frontend@mozilla.com when we bump over a noise threshold is probably a good starting point. Email is likely to be an adequate channel.
Let's figure out what that noise threshold is after a week or two on Nightly.
Thanks, JP!
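One possible way to derive the noise threshold from a week or two of Nightly data is to take the mean of the observed 4xx rates plus a multiple of their standard deviation. This is only a sketch of one common heuristic; the multiplier `k` and the sample values are assumptions, and the thread does not settle on a method.

```python
import statistics


def noise_threshold(baseline_rates, k=3.0):
    """Return mean + k standard deviations of the baseline 4xx error rates."""
    mean = statistics.mean(baseline_rates)
    spread = statistics.pstdev(baseline_rates)
    return mean + k * spread


def should_alert(current_rate, baseline_rates, k=3.0):
    """True when the current rate exceeds the threshold derived from the baseline."""
    return current_rate > noise_threshold(baseline_rates, k)


# Hypothetical daily 4xx error rates (percent) collected during the Nightly test.
baseline = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]
print("threshold:", noise_threshold(baseline))
print("alert at 4.0%:", should_alert(4.0, baseline))
```

A larger `k` makes the email alert quieter at the cost of reacting later to a real increase in 404s.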
Flags: needinfo?(rnewman)
Updated•5 years ago
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE