Closed
Bug 999028
Opened 10 years ago
Closed 7 years ago
Telemetry Analysis Job for Loop ICE Reports
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: abr, Assigned: whd)
References
Details
(Whiteboard: [est:4h][p=.5, s=mlpnightly2, c=loop-general] [SvcOps])
We need to write a telemetry analysis module for aggregating information about ICE failures in the Loop client. See http://mreid-moz.github.io/blog/2013/11/06/current-state-of-telemetry-analysis/ for a description of the analysis tools.

The format of the data is described at https://wiki.mozilla.org/Loop/Telemetry#ICE_Failures (this report generation should only handle records with "report":"ice failure"), and the initial data to extract is described in the final bullet under "Nature of Data":

> For initial analysis, we could probably do with something as simple as a
> report that says "on date, there were x failures, broken down as follows:
> failed: failed count, disconnected: disconnected count," and then lets us list
> all the failures for a given date/reason pair, ultimately allowing us to
> download the log to analyze. As we get experience with how things tend to
> break, we might want to refine this some, but it's a good start.

Note that the actual contents of, e.g., the statistics field, the SDP, and the log files need to be treated as confidential.
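For illustration only, the per-date breakdown described above could be sketched roughly as follows. This is not the analysis job itself: it assumes one JSON record per input line, and the field names `date` and `connectionstate` are assumptions made for the example (only the `"report":"ice failure"` filter comes from this bug). The function name `summarize_ice_failures` is hypothetical.

```python
import json
from collections import Counter, defaultdict


def summarize_ice_failures(lines):
    """Count ICE failure records per date and per failure reason.

    `lines` is an iterable of JSON strings, one record per line.
    The "date" and "connectionstate" field names are assumptions
    for this sketch, not the confirmed Loop telemetry schema.
    """
    counts = defaultdict(Counter)
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            # Skip lines that are not valid JSON.
            continue
        # Only handle records flagged as ICE failure reports.
        if not isinstance(record, dict) or record.get("report") != "ice failure":
            continue
        date = record.get("date", "unknown")
        reason = record.get("connectionstate", "unknown")
        counts[date][reason] += 1
    # Convert to plain dicts so the result serializes cleanly.
    return {date: dict(reasons) for date, reasons in counts.items()}
```

Only counts leave this function; the statistics, SDP, and log contents are never copied into the summary, which fits the confidentiality note above.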
Updated•10 years ago
Whiteboard: [est:4h] → [est:4h]p=.5
Updated•10 years ago
Whiteboard: [est:4h]p=.5 → [est:4h][p=.5, s=mlpnightly2, c=loop-general]
Comment 1•10 years ago
Is there an update on who you were working on this with? We don't want to lose this bug, but we aren't sure where it is.
Flags: needinfo?(adam)
Reporter
Comment 2•10 years ago
I believe that EKR was working with Ben Brittain to do a first cut at this work. Ben -- is that correct? Should we assign this bug to you?
Flags: needinfo?(adam) → needinfo?(ben)
Comment 3•10 years ago
Hey Adam -- Do we have an owner for this? Or do I need to find one?
Flags: needinfo?(adam)
Reporter
Comment 4•10 years ago
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #3)
> Hey Adam -- Do we have an owner for this? Or do I need to find one?

We don't have an owner any more. The original plan was to have Ben Brittain do this, although I believe he's gone back to school now.
Flags: needinfo?(ben)
Flags: needinfo?(adam)
Comment 5•10 years ago
Would this be something that the Metrics team could do, or do you know if we have a specific team for Telemetry? The ICE data is being gathered and now needs telemetry work to generate reports. The bug description covers the initial reports needed and links to the write-up of how the data is formatted.
Flags: needinfo?(sguha)
Flags: needinfo?(kparlante)
Comment 6•10 years ago
There is no one telemetry team. Our experience with Telemetry reporting is limited. I'm roping in Ali here as he has some experience with the Telemetry JS API. He can help Katie design and implement the dashboards.
Flags: needinfo?(sguha)
Flags: needinfo?(kparlante)
Comment 7•10 years ago
Ali, would you be able to help Katie with this?
Flags: needinfo?(aalmossawi)
Comment 9•10 years ago
Mark, I'm sending this one your way. If I understand correctly we have two tasks:
- Create a telemetry analysis job for aggregating this data (via http://telemetry-dash.mozilla.org/)
- Make the aggregated data available to our custom dashboard via the Telemetry JS API (which we need to do to make the data available to our partners)

The second one is sorta captured here: https://bugzilla.mozilla.org/show_bug.cgi?id=1073516
Assignee: nobody → mreid
Flags: needinfo?(mreid)
Updated•10 years ago
Flags: needinfo?(mreid)
Comment 10•10 years ago
Can the aggregate data be public (ie. web-facing)? If so, then we can publish the results to a web-facing S3 bucket (per the usual for a telemetry analysis job). If not, we typically put the results in a private bucket, and we'll need to sort out a mechanism for sync'ing it over to the dashboard.
Flags: needinfo?(kparlante)
Comment 11•10 years ago
Let's treat this as private/confidential. :whd can handle the access control/mechanism for syncing it to the dashboard.
Flags: needinfo?(kparlante)
Comment 12•10 years ago
What format would be most convenient for use by the Loop dashboard? I currently have a job that outputs a small json file for each day containing a summary of the failures by type, as well as a gzip'd tsv file with the full payloads for detailed inspection.
Comment 13•10 years ago
It would be great if we could have one json file instead of one per day, something like:

[
  {
    "date": "2014-10-16",
    "failureA": 0,
    "failureB": 0,
    "failureC": 0
  },
  {
    "date": "2014-10-17",
    "failureA": 0,
    "failureB": 0,
    "failureC": 0
  }
]

We're not going to make the full payloads available via the dashboard; we can give access to specific devs (:abr, others?)
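As a rough sketch of the single-file layout requested above, per-day summaries could be folded into one date-sorted array like this. The input shape (a mapping from date to per-type counts) and the name `combine_daily_summaries` are assumptions for the example, not the format the job actually used.

```python
import json


def combine_daily_summaries(daily):
    """Merge per-day failure counts into one list of rows sorted by date.

    `daily` maps a date string to a {failure_type: count} dict
    (a hypothetical input shape for this sketch). Every row gets
    every failure type, with 0 filled in where a day had none.
    """
    failure_types = sorted({t for counts in daily.values() for t in counts})
    rows = []
    for date in sorted(daily):
        row = {"date": date}
        for failure_type in failure_types:
            row[failure_type] = daily[date].get(failure_type, 0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = {
        "2014-10-17": {"failed": 3},
        "2014-10-16": {"failed": 1, "disconnected": 2},
    }
    print(json.dumps(combine_daily_summaries(example), indent=2))
```

Filling missing types with 0 keeps every row the same shape, which tends to make charting code on the dashboard side simpler.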
Comment 14•10 years ago
Ok, I'll update the format. How many days' history should I include? It should be pretty small, but I don't like to generate files that grow forever.
Comment 15•10 years ago
180 days
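One way the 180-day cap could be applied when rewriting the combined file is sketched below. It assumes rows carry a dashed `date` field ("YYYY-MM-DD") and reads "180 days" as the most recent 180 calendar days including today; both are assumptions, and `trim_history` is a hypothetical name.

```python
from datetime import date, timedelta


def trim_history(rows, today, max_days=180):
    """Keep only rows from the most recent `max_days` days.

    Assumes each row has a "date" field in YYYY-MM-DD form and
    that the window includes `today` itself (an interpretation
    chosen for this sketch).
    """
    cutoff = today - timedelta(days=max_days)
    return [row for row in rows if date.fromisoformat(row["date"]) > cutoff]
```

Trimming on every run keeps the file bounded, which addresses the concern about generating files that grow forever.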
Comment 16•10 years ago
Ok, data is now being saved to a single combined file: s3://telemetry-private-analysis/loop_failures/data/failures_by_type.json This will require AWS credentials to copy it over to the dashboard web server. The job is currently scheduled to run at 14:00UTC to populate data for the previous day. It should take far less than 1 hour to run, so fetching it at 15:00UTC should be safe. Note that the full per-day detail will still be generated in case you want to make the full payloads available later on.
Comment 17•10 years ago
I created (and merged) a PR for the analysis code here: https://github.com/mozilla/telemetry-server/pull/86
Comment 18•10 years ago
Excellent! Thanks for including the PR. Assigning to :whd to move to the metrics box so the dashboard can access it.
Assignee: mreid → whd
Assignee
Comment 19•10 years ago
:mreid minor issue with the data, the date looks like "20141016" instead of "2014-10-16". :relud has sorted out the cross-IAM stuff for me, so I'm now setting up the metrics box to pull the data down at 15:00UTC and make it available via the dashboard.
Assignee
Comment 20•10 years ago
https://github.com/mozilla-services/puppet-config/pull/981
https://github.com/mozilla-services/svcops/pull/313

Available at: https://metrics.services.mozilla.com/loop-server-dashboard/data/loop_failures_by_type.json
Comment 21•10 years ago
(In reply to Wesley Dawson [:whd] from comment #19)
> :mreid minor issue with the data, the date looks like "20141016" instead of
> "2014-10-16".

Right - the data is stored without the dashes. If it's a problem let me know and I'll add them before exporting.
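If the dashes do need to be added at export time, a small conversion along these lines would probably be enough. This is a sketch, not the code in the job; `add_dashes` is a hypothetical helper name.

```python
from datetime import datetime


def add_dashes(compact_date):
    """Convert a compact date like "20141016" to "2014-10-16".

    Raises ValueError if the input is not exactly eight digits
    or does not name a real calendar date.
    """
    if len(compact_date) != 8 or not compact_date.isdigit():
        raise ValueError("expected YYYYMMDD, got %r" % (compact_date,))
    return datetime.strptime(compact_date, "%Y%m%d").strftime("%Y-%m-%d")
```

Parsing through `datetime` rather than slicing the string also rejects impossible dates, so a malformed key is caught before it reaches the dashboard.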
Comment 22•10 years ago
A graph of ICE failures is now on the dashboard:
https://metrics.services.mozilla.com/loop-server-dashboard/
https://github.com/mozilla/loop-server-dashboard/pull/10

:abr, the log data is in gzip'd tsv files in s3; :whd is going to give you credentials to access them as a short-term solution. Instead of building a bespoke dashboard for accessing the log files, we should route this data to kibana or sentry (probably sentry), but we may need to wait for the transition to the new pipeline.
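For anyone given access to the detail files, reading a gzip'd tsv once it has been downloaded could look roughly like this. The column layout of the real files is not described in this bug, so the sketch just returns raw rows; `read_gzipped_tsv` is a hypothetical name, and fetching from S3 is left out entirely.

```python
import csv
import gzip
import io


def read_gzipped_tsv(raw_bytes):
    """Decompress gzip'd TSV bytes and return a list of rows.

    Each row is a list of string fields. Quote characters are
    left untouched (QUOTE_NONE) so that JSON payloads embedded
    in a column survive unchanged.
    """
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Since the payloads are to be treated as confidential, a helper like this should only run where the credentials and data are already permitted to live.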
Updated•7 years ago
Whiteboard: [est:4h][p=.5, s=mlpnightly2, c=loop-general] → [est:4h][p=.5, s=mlpnightly2, c=loop-general] [SvcOps]
Comment 23•7 years ago
Loop infrastructure was decommissioned in bug 1307378. I don't think this is still needed. Please reopen if it is.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Updated•6 years ago
Product: Webtools → Data Platform and Tools
Updated•1 year ago
Component: Telemetry Dashboards (TMO) → General