Closed
Bug 1232453
Opened 9 years ago
Closed 8 years ago
Modify Spark telemetry to allow for python notebook analysis of addonHistograms
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: thills, Unassigned)
Hi Roberto, As we discussed, addonHistograms needs to be included in the types of histograms that can be analyzed. IIRC, keyedHistograms are already included, but addonHistograms are not. Thanks, -tamara
Comment 1•9 years ago
For testing purposes, please provide a set of filters for get_pings that returns an RDD of submissions containing histograms within addonHistograms.
Flags: needinfo?(thills)
Updated•9 years ago
Component: Telemetry → Metrics: Pipeline
Product: Toolkit → Cloud Services
Updated•9 years ago
Assignee: rvitillo → nobody
Comment 2•9 years ago (Reporter)
Hi Roberto,

Thank you again for the *huge* tip on pings.filter! Here is what I did in my IPython notebook to get an RDD with a submission that contains an addonHistogram:

    pings = get_pings(sc, app="FirefoxOS", channel="default", submission_date="20151217", doc_type="OTHER", schema="v4")
    pings = pings.filter(lambda p: p.get("id") == "0b69d27e-ed66-466c-9930-7556183f7b9b")
    pings.first()

Here is part of what prints out after pings.first():

    ... u'payload': {u'addonHistograms': {u'CONTACTS': {u'DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL': {u'bucket_count': 3,
        u'histogram_type': 4,
        u'range': [1, 2],
        u'sum': 1,
        u'sum_squares_hi': 0,
        u'sum_squares_lo': 1,
        u'values': {u'0': 1, u'1': 0}}}},
      u'keyedHistograms': ...
Flags: needinfo?(thills)
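The filter above selects a single ping by its id. A more general predicate that keeps any ping with a non-empty addonHistograms section could look like the sketch below. It assumes pings are plain dicts shaped like the output above (payload → addonHistograms → addon name → histogram name). The helper names has_addon_histograms and iter_addon_histograms are hypothetical and are not part of python_moztelemetry.

```python
def has_addon_histograms(ping):
    """Return True if the ping's payload has a non-empty addonHistograms section."""
    payload = ping.get("payload") or {}
    return bool(payload.get("addonHistograms"))


def iter_addon_histograms(ping):
    """Yield (addon, histogram_name, histogram) triples from a single ping."""
    addons = (ping.get("payload") or {}).get("addonHistograms") or {}
    for addon, histograms in addons.items():
        for name, histogram in histograms.items():
            yield addon, name, histogram


# Sample ping trimmed from the output quoted in this bug (assumed layout).
sample_ping = {
    "id": "0b69d27e-ed66-466c-9930-7556183f7b9b",
    "payload": {
        "addonHistograms": {
            "CONTACTS": {
                "DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL": {
                    "bucket_count": 3,
                    "histogram_type": 4,
                    "range": [1, 2],
                    "sum": 1,
                    "values": {"0": 1, "1": 0},
                }
            }
        }
    },
}

triples = list(iter_addon_histograms(sample_ping))

# In a notebook, the predicate would be passed to the RDD directly:
#   pings = pings.filter(has_addon_histograms)
```

Because the predicate only touches plain dicts, it can be checked locally on a sample ping before running it against the full RDD.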
Comment 3•8 years ago (Reporter)
Hi Roberto, Just following up to see if there is anything else needed from us on this one? Thanks, -tamara
Flags: needinfo?(rvitillo)
Comment 4•8 years ago
Our current implementation requires a histogram definition file, such as Histograms.json, to return a pandas Series given a JSON description of a histogram. Where can the definitions of those histograms be found? In the meantime, you should use the raw JSON descriptions of the histograms.
Flags: needinfo?(rvitillo)
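As a rough illustration of working with a raw JSON description when no definition file is available, the sketch below reads the bucket counts straight from the "values" field. It assumes the field layout shown in comment 2. The helper histogram_buckets is hypothetical and is not part of python_moztelemetry.

```python
def histogram_buckets(histogram):
    """Return (bucket, count) pairs from a raw histogram, sorted by numeric bucket."""
    values = histogram.get("values", {})
    # Bucket keys arrive as strings in the JSON, so convert them before sorting.
    return sorted((int(bucket), count) for bucket, count in values.items())


# Raw description copied from the ping output in comment 2.
raw_histogram = {
    "bucket_count": 3,
    "histogram_type": 4,
    "range": [1, 2],
    "sum": 1,
    "sum_squares_hi": 0,
    "sum_squares_lo": 1,
    "values": {"0": 1, "1": 0},
}

buckets = histogram_buckets(raw_histogram)
total_count = sum(count for _, count in buckets)
```

Only the buckets present in "values" are returned. Filling in the remaining empty buckets would need the histogram definition, which is the missing piece described above.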
Comment 6•8 years ago
(In reply to Tamara Hills [:thills] from comment #2)
> pings = get_pings(sc, app="FirefoxOS", channel="default",
> submission_date="20151217", doc_type="OTHER", schema="v4")

Shouldn't we get those pings out of the "OTHER" bucket (see doc_type)? Mark, did we already have a bug on this?
Flags: needinfo?(mreid)
Comment 7•8 years ago (Reporter)
Hi, just to update: I had a conversation with rvitillo, and he suggested that I try using a pandas Series for this, so I'm currently working on that. He also suggested that I reach out to mreid about getting the change made to https://github.com/mozilla/python_moztelemetry/blob/master/moztelemetry/spark.py so that we can graph the addonHistograms.
Flags: needinfo?(thills)
Comment 8•8 years ago
I've filed a separate bug for splitting these docs out of "OTHER".
Flags: needinfo?(mreid)
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard