Modify Spark telemetry to allow for python notebook analysis of addonHistograms

Status

RESOLVED FIXED
3 years ago
2 months ago

People

(Reporter: thills, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Hi Roberto,

This is a follow-up to what we talked about: addonHistograms needs to be included in the types of histograms that can be analyzed. IIRC, the analysis already includes keyedHistograms, but not addonHistograms.

Thanks,

-tamara
For testing purposes, please provide a set of filters for get_pings that returns an RDD of submissions containing histograms within addonHistograms.
Flags: needinfo?(thills)
Component: Telemetry → Metrics: Pipeline
Product: Toolkit → Cloud Services
Assignee: rvitillo → nobody
(Reporter)

Comment 2

3 years ago
Hi Roberto,

Thank you again for the *huge* tip on pings.filter!

Here is what I did in my IPython notebook to get an RDD containing a submission with an addonHistogram:

pings = get_pings(sc, app="FirefoxOS", channel="default", submission_date="20151217", doc_type="OTHER", schema="v4")

pings = pings.filter(lambda p: p.get("id") == "0b69d27e-ed66-466c-9930-7556183f7b9b")

pings.first()

Here is part of what pings.first() prints:

...
u'payload': {u'addonHistograms': {u'CONTACTS': {u'DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL': {u'bucket_count': 3,
     u'histogram_type': 4,
     u'range': [1, 2],
     u'sum': 1,
     u'sum_squares_hi': 0,
     u'sum_squares_lo': 1,
     u'values': {u'0': 1, u'1': 0}}}},
  u'keyedHistograms':
...
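
A more general filter than matching a single ping id could be sketched as the predicate below. It assumes only the ping layout shown in the output above (payload, then addonHistograms, then add-on name, then histogram name). `has_addon_histograms` is a hypothetical helper written for illustration, not part of moztelemetry.

```python
def has_addon_histograms(ping):
    """Return True if the ping has at least one histogram under payload.addonHistograms."""
    payload = ping.get("payload")
    if not isinstance(payload, dict):
        return False
    addon_histograms = payload.get("addonHistograms")
    if not isinstance(addon_histograms, dict):
        return False
    # Each add-on name maps to a dict of histogram name -> histogram description;
    # an add-on entry with no histograms does not count.
    return any(histograms for histograms in addon_histograms.values())
```

In the notebook this would be applied the same way as the id filter, e.g. `pings.filter(has_addon_histograms)`.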
Flags: needinfo?(thills)
(Reporter)

Comment 3

3 years ago
Hi Roberto,

Just following up to see if there is anything else needed from us on this one?

Thanks,
-tamara
Flags: needinfo?(rvitillo)
Our current implementation requires a histogram definition file, such as Histograms.json, to return a pandas Series given a JSON description of a histogram. Where can the definitions of those histograms be found?

In the meantime you should use the raw JSON descriptions of the histograms.
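
As a rough sketch of working with the raw descriptions directly, the snippet below walks a ping's addonHistograms and reads each histogram's `values` mapping without needing a definition file. The helper names `iter_addon_histograms` and `bucket_counts` are hypothetical, and the sample ping is trimmed from the output quoted in comment 2; the real ping contains many more fields.

```python
def iter_addon_histograms(ping):
    """Yield (addon, histogram_name, description) for each addon histogram in a ping."""
    addon_histograms = (ping.get("payload") or {}).get("addonHistograms") or {}
    for addon, histograms in addon_histograms.items():
        for name, description in histograms.items():
            yield addon, name, description


def bucket_counts(description):
    """Return the raw `values` mapping as (bucket, count) pairs sorted by bucket."""
    values = description.get("values", {})
    # Bucket keys arrive as strings in the JSON, so convert them before sorting.
    return sorted((int(bucket), count) for bucket, count in values.items())


# Sample ping trimmed from the output quoted in comment 2.
sample_ping = {
    "payload": {
        "addonHistograms": {
            "CONTACTS": {
                "DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL": {
                    "bucket_count": 3,
                    "histogram_type": 4,
                    "range": [1, 2],
                    "sum": 1,
                    "values": {"0": 1, "1": 0},
                }
            }
        }
    }
}

for addon, name, description in iter_addon_histograms(sample_ping):
    print(addon, name, bucket_counts(description))
```

If a pandas Series is wanted for plotting, the pairs can be passed along as `pandas.Series(dict(bucket_counts(description)))`, though the bucket labels then come only from the ping itself rather than from a histogram definition.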
Flags: needinfo?(rvitillo)

Comment 5

3 years ago
Hello Tamara, do you have what you need?
Flags: needinfo?(thills)
(In reply to Tamara Hills [:thills] from comment #2)
> pings = get_pings(sc, app="FirefoxOS", channel="default",
> submission_date="20151217", doc_type="OTHER", schema="v4")

Shouldn't we get those pings out of the "OTHER" bucket (see doc_type)?
Mark, did we already have a bug on this?
Flags: needinfo?(mreid)
(Reporter)

Comment 7

3 years ago
Hi, just an update: I had a conversation with rvitillo, and he suggested that I try using a pandas Series for this, so I'm currently working on that. He also suggested that I reach out to mreid about getting a change made to https://github.com/mozilla/python_moztelemetry/blob/master/moztelemetry/spark.py so that we can graph the addonHistograms.
Flags: needinfo?(thills)
I've filed a separate bug for splitting these docs out of "OTHER".
Flags: needinfo?(mreid)

Updated

3 years ago
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED

Updated

2 months ago
Product: Cloud Services → Cloud Services Graveyard