Bug 1213417
Opened 9 years ago
Closed 9 years ago
Derived dataset request: basic metadata and crash info
Categories: Cloud Services Graveyard :: Metrics: Pipeline (defect, P1)
Tracking: (Not tracked)
Status: RESOLVED FIXED
People
(Reporter: benjamin, Assigned: trink)
User Story
Derived from the unified telemetry "main" pings:

FILTERS:
- environment.build.applicationName == "Firefox"
- environment.update.channel in ("nightly", "aurora", "beta", "release")

INCLUDE VERBATIM:
- clientId
- environment.build.version
- environment.build.buildId
- environment.build.architecture
- environment.update.channel
- environment.os.name
- environment.os.version
- environment.os.servicePackMajor (may be missing/NULL)
- environment.os.servicePackMinor (may be missing/NULL)
- environment.settings.locale
- environment.activeExperiment.id
- environment.activeExperiment.branch
- <IP-based-geocountry>
- payload.info.reason
- payload.info.timezoneOffset
- payload.info.subsessionStartDate
- payload.info.subsessionLength

RENAMES:
- rename "payload.histograms.SUBPROCESS_ABNORMAL_ABORT.plugin.sum" to "aborts-plugin"
- rename "payload.histograms.SUBPROCESS_ABNORMAL_ABORT.content.sum" to "aborts-content"
- rename "payload.histograms.SUBPROCESS_ABNORMAL_ABORT.gmplugin.sum" to "aborts-gmplugin"
- rename "payload.histograms.SUBPROCESS_CRASHES_WITH_DUMP.plugin.sum" to "crashesdetected-plugin"
- rename "payload.histograms.SUBPROCESS_CRASHES_WITH_DUMP.pluginhang.sum" to "hangsdetected-plugin"
- rename "payload.histograms.SUBPROCESS_CRASHES_WITH_DUMP.content.sum" to "crashesdetected-content"
- rename "payload.histograms.SUBPROCESS_CRASHES_WITH_DUMP.gmplugin.sum" to "crashesdetected-gmplugin"

For each of SUBPROCESS_CRASH_SUBMIT_ATTEMPT and SUBPROCESS_CRASH_SUBMIT_SUCCESS:
- rename "payload.histograms.SUBPROCESS_CRASH_SUBMIT_X.main.sum" to "crash-N-main"
- rename "payload.histograms.SUBPROCESS_CRASH_SUBMIT_X.plugin.sum" to "crash-N-plugin"
- rename "payload.histograms.SUBPROCESS_CRASH_SUBMIT_X.content.sum" to "crash-N-content"

CALCULATED FIELDS:
- number of active addons: len(environment.addons.activeAddons)
- most recent version of Flash: max([plugin.version.split('.') for plugin in environment.addons.activePlugins if plugin.name == 'Shockwave Flash'])
I would like a derived dataset as specified in the "user story" which I can then use either within Spark or a heka filter to create some redshift rollup tables for usage and crash metrics.
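The two calculated fields in the user story can be sketched in Python. This is a non-authoritative sketch that assumes the ping's environment section has been deserialized into a plain dict; note it compares Flash version components numerically, since the pseudocode's plain string split would sort "9" above "10".

```python
# Sketch of the two CALCULATED FIELDS, assuming a "main" ping's
# environment section deserialized into a plain Python dict.

def active_addon_count(environment):
    # number of active addons: len(environment.addons.activeAddons)
    return len(environment.get("addons", {}).get("activeAddons", {}))

def latest_flash_version(environment):
    # most recent version of Shockwave Flash among the active plugins.
    # Compare components as integers so that "11.2.0" sorts above "9.0.1";
    # a lexicographic string comparison would get this wrong.
    versions = []
    for plugin in environment.get("addons", {}).get("activePlugins", []):
        if plugin.get("name") != "Shockwave Flash":
            continue
        try:
            versions.append(tuple(int(p) for p in plugin["version"].split(".")))
        except (KeyError, ValueError):
            continue  # skip plugins with malformed or missing version strings
    if not versions:
        return None
    return ".".join(str(p) for p in max(versions))
```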
Reporter
Updated•9 years ago
User Story: (updated)
Comment 1•9 years ago
:bsmedberg, are you taking this? Do you need us to do this for you in the next 2 weeks?
Flags: needinfo?(benjamin)
Priority: -- → P2
Reporter
Updated•9 years ago
User Story: (updated)
Flags: needinfo?(benjamin)
Reporter
Comment 2•9 years ago
I don't know when I'll be able to take this. I'd like to but other things are still trumping.
Assignee
Comment 4•9 years ago
I assume you mean keyedHistograms for all the SUBPROCESS_* entries. Also, for the executive summary, plugin hangs are currently counted as payload.keyedHistograms.SUBPROCESS_ABNORMAL_ABORT.plugin.sum; is there a reason we are counting them differently here?
Flags: needinfo?(benjamin)
Reporter
Comment 5•9 years ago
Yes, keyedHistograms. When you say plugin hangs are currently counted as payload.keyedHistograms.SUBPROCESS_ABNORMAL_ABORT.plugin.sum, that's not correct: that counts all plugin crashes (and probably some other aborts). ABNORMAL_ABORT can't tell the difference between a crash-abort and a hang-abort.
Flags: needinfo?(benjamin)
Assignee
Comment 6•9 years ago
The current behaviour was reviewed and approved in the executive summary meeting with the metrics team. There was some discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1131834, but it is unclear why we went with SUBPROCESS_ABNORMAL_ABORT (I believe it was because we wanted the executive summary to include all plugin failures). Since the name is misleading, we should rename it or change the extraction so it only represents the hangs. I can change it; just file a bug with the desired modification.
Assignee
Comment 7•9 years ago
I have been trolling the logs for PROCESS_CRASH_SUBMIT_ATTEMPT, since there are no SUBPROCESS versions defined in the code (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#8511). A definitive list of crash types cannot be gleaned from the code (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/crashes/CrashManager.jsm#1189), and I haven't seen anything but an empty object {} in the live data. Some docs or sample data would be useful.
Flags: needinfo?(gfritzsche)
Assignee
Comment 8•9 years ago
Just coded it to handle main, plugin, and content.
Flags: needinfo?(gfritzsche)
Assignee
Comment 9•9 years ago
Available for experimentation on the pipeline-redshift.dev.mozaws.net cluster (see :dthornton for creds). The following table is a test load of the Oct 4 data.

pipeline=# \d main_summary
                       Table "public.main_summary"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 timestamp                    | timestamp without time zone |
 subsessiontimestamp          | timestamp without time zone |
 clientid                     | character(36)               |
 buildversion                 | character varying(32)       |
 buildid                      | character(14)               |
 buildarchitecture            | character varying(32)       |
 channel                      | character varying(7)        |
 os                           | character varying(7)        |
 osversion                    | character varying(32)       |
 ossericepackmajor            | character varying(32)       |
 ossericepackminor            | character varying(32)       |
 locale                       | character varying(32)       |
 activeexperimentid           | character varying(32)       |
 activeexperimentbranch       | character varying(32)       |
 country                      | character varying(5)        |
 reason                       | character varying(32)       |
 subsessionlength             | integer                     |
 timezoneoffset               | integer                     |
 pluginhangs                  | integer                     |
 aborts_plugin                | integer                     |
 aborts_content               | integer                     |
 aborts_gmplugin              | integer                     |
 crashesdetected_plugin       | integer                     |
 crashesdetected_content      | integer                     |
 crashesdetected_gmplugin     | integer                     |
 crash_submit_attempt_main    | integer                     |
 crash_submit_attempt_content | integer                     |
 crash_submit_success_main    | integer                     |
 crash_submit_success_content | integer                     |
 active_addons                | integer                     |
 flash_version                | character varying(16)      |

pipeline=# select * from main_summary limit 1;
(sample row; values in the same column order as the table definition above)
2015-10-04 00:00:01.2746 | 2015-09-06 04:00:00 | f18cef1c-6e6c-40ea-af28-91617b273d83 | 42.0a2 | 20150818004007 | x86 | aurora | Windows | 10.0 | 0 | 0 | en-US | | | US | shutdown | 879 | -240 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
(1 row)
Reporter
Comment 10•9 years ago
Excellent. Followup questions: Is `timestamp` the date the ping was received? Questions about the sort key and distribution style for the Redshift data: I presume the sort key is either `timestamp` or `subsessiontimestamp`; is that correct? Is the distribution style KEYed on clientID? Is it possible to query this same dataset in Spark?
Flags: needinfo?(mtrinkala)
Assignee
Comment 11•9 years ago
- timestamp is the date/time we received the data and serves as the sort key
- no indexes have been created since the primary use cases haven't been defined
- the database tables will be created as a sequence of time series tables (one per day) and hidden with a UNION ALL view
- Spark looks like it has some tools, but I am not a Spark user... deferring to others
Flags: needinfo?(mtrinkala)
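The "one table per day, hidden behind a UNION ALL view" layout described above could be stitched together roughly like this. The table-naming convention and view DDL below are illustrative assumptions, not the pipeline's actual conventions:

```python
# Hypothetical sketch of the per-day time series tables plus the
# UNION ALL view described in comment 11. Names are assumptions.
from datetime import date, timedelta

def daily_table_names(start, days):
    # e.g. main_summary_20151004, main_summary_20151005, ...
    return ["main_summary_" + (start + timedelta(n)).strftime("%Y%m%d")
            for n in range(days)]

def union_all_view(tables, view="main_summary"):
    # Hide the daily tables behind one view so queries see a single table.
    selects = "\nUNION ALL\n".join("SELECT * FROM " + t for t in tables)
    return "CREATE VIEW {} AS\n{};".format(view, selects)
```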
Assignee
Comment 12•9 years ago
(In reply to Mike Trinkala [:trink] from comment #11)
- Redshift disables indexes, so none will ever be added... sadness
Reporter
Comment 13•9 years ago
Yeah, I wasn't worried about the indexes, just about the cluster sharding type. For some of the queries I want to run, sharding (distribution key) the database by clientID can improve query performance.
Reporter
Comment 14•9 years ago
Tried this today. Here are my notes and requests, from most to least important:

A (critical): missing fields: crash_submit_attempt_plugin, crash_submit_success_plugin
B (critical): Is `pluginhangs` the name of payload.histograms.SUBPROCESS_CRASHES_WITH_DUMP.pluginhang.sum?
C (important): The tables need to be distributed on the clientid key, instead of evenly. That will allow "active users over day/week/month" queries to be done on distributed nodes instead of a giant unified join.
D (minor): Store subsessiontimestamp as a date instead of a timestamp (and rename it to subsessiondate).
E (trivial): ossericepackmajor and ossericepackminor are misspelled; they should be "servicepack".

Can you have at least changes A-C in place and data backfilled for the entire month of October by sometime early next week? If the redshift bits are too complex, I think I can deal with this just as well in Spark, which I think means S3 output instead of Redshift.
Flags: needinfo?(mtrinkala)
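The reasoning behind request C can be illustrated with a toy model: under KEY distribution, every row for a given client hashes to the same node, so each node can count its own distinct clients and the totals can simply be summed, with no cross-node shuffle. This is an illustration of the idea only, not Redshift's actual hash function or node layout:

```python
# Toy model of why distributing rows on clientid helps the
# "active users" queries in comment 14. Node count is arbitrary.
import zlib
from collections import defaultdict

def assign_node(clientid, nodes=4):
    # Stable hash -> node, mimicking a KEY distribution style.
    return zlib.crc32(clientid.encode("utf-8")) % nodes

def active_users(rows, nodes=4):
    # rows: (clientid, activity_date) pairs. Each node counts its own
    # distinct clients; summing the per-node counts is correct only
    # because KEY distribution puts each client on exactly one node.
    per_node = defaultdict(set)
    for clientid, _day in rows:
        per_node[assign_node(clientid, nodes)].add(clientid)
    return sum(len(clients) for clients in per_node.values())
```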
Assignee
Comment 15•9 years ago
A is easy, B yes, C is already done in the latest version, D is easy, E is already fixed (the changes will land today). Yeah, we will be able to fire up a cluster of machines and backfill October (FYI, the column names have changed a bit in the new version to make them consistent across all derived outputs).
Flags: needinfo?(mtrinkala)
Comment 16•9 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #10)
> Is it possible to query this same dataset in spark?

It is. The data on S3 can be converted to Parquet files for fast access with Spark, see [1].

[1] https://mail.mozilla.org/pipermail/fhr-dev/2015-November/000660.html
Comment 17•9 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #16)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #10)
> > Is it possible to query this same dataset in spark?
>
> It is. The data on S3 can be converted to Parquet files for fast access with
> Spark, see [1].
>
> [1] https://mail.mozilla.org/pipermail/fhr-dev/2015-November/000660.html

This data set is not currently stored on S3 - it's in a Redshift database. It would be interesting to see if we can use something like [2] to load Redshift into dataframes, but it requires Spark 1.4+ (we're currently on 1.3 AIUI).
Assignee
Comment 18•9 years ago
The full October data is now available in Redshift.
Assignee
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard