Closed Bug 1232453 Opened 9 years ago Closed 8 years ago

Modify Spark telemetry to allow for python notebook analysis of addonHistograms

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: thills, Unassigned)

Hi Roberto,

This is a follow-up to what we talked about: addonHistograms needs to be included in the types of histograms that can be analyzed. IIRC, keyedHistograms are already included, but addonHistograms are not.

Thanks,

-tamara
For testing purposes, please provide a set of filters for get_pings that returns an RDD with submissions that contain histograms within addonHistograms.
Flags: needinfo?(thills)
Component: Telemetry → Metrics: Pipeline
Product: Toolkit → Cloud Services
Assignee: rvitillo → nobody
Hi Roberto,

Thank you again for the *huge* tip on pings.filter!

Here is what I did in my IPython notebook to get an RDD containing a submission with an addonHistogram:

pings = get_pings(sc, app="FirefoxOS", channel="default", submission_date="20151217", doc_type="OTHER", schema="v4")

pings = pings.filter(lambda p: p.get("id") == "0b69d27e-ed66-466c-9930-7556183f7b9b")

pings.first()

Here is part of what pings.first() prints:

...
u'payload': {u'addonHistograms': {u'CONTACTS': {u'DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL': {u'bucket_count': 3,
     u'histogram_type': 4,
     u'range': [1, 2],
     u'sum': 1,
     u'sum_squares_hi': 0,
     u'sum_squares_lo': 1,
     u'values': {u'0': 1, u'1': 0}}}},
  u'keyedHistograms':
...
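
The filter above selects a single ping by id. As a rough sketch of a more general filter, the predicate below keeps any ping whose payload has a non-empty addonHistograms section. The function name has_addon_histograms is hypothetical (it is not part of moztelemetry), and the sample ping is a trimmed copy of the output above so the snippet runs without Spark.

```python
def has_addon_histograms(ping):
    """Return True if the ping's payload holds at least one add-on histogram.

    Hypothetical helper, not part of moztelemetry. It assumes the layout shown
    in the output above: payload -> addonHistograms -> add-on name ->
    histogram name -> raw histogram.
    """
    payload = ping.get("payload") or {}
    addon_histograms = payload.get("addonHistograms") or {}
    # An add-on entry counts only if it actually contains histograms.
    return any(addon_histograms.values())


# Trimmed copy of the ping printed above, used as local test data.
sample_ping = {
    "id": "0b69d27e-ed66-466c-9930-7556183f7b9b",
    "payload": {
        "addonHistograms": {
            "CONTACTS": {
                "DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL": {
                    "bucket_count": 3,
                    "histogram_type": 4,
                    "range": [1, 2],
                    "sum": 1,
                    "values": {"0": 1, "1": 0},
                }
            }
        },
        "keyedHistograms": {},
    },
}

print(has_addon_histograms(sample_ping))
```

In the notebook this could be passed to the RDD the same way as the lambda above, i.e. pings.filter(has_addon_histograms), assuming each ping is a plain dict as in the printed output.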
Flags: needinfo?(thills)
Hi Roberto,

Just following up to see if there is anything else needed from us on this one?

Thanks,
-tamara
Flags: needinfo?(rvitillo)
Our current implementation requires a histogram definition file, such as Histograms.json, to return a pandas Series given a JSON description of a histogram. Where can the definitions of those histograms be found?

In the meantime, you should use the raw JSON descriptions of the histograms.
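
A minimal sketch of working with the raw descriptions directly, without a definition file. Both helper names (iter_addon_histograms, raw_bucket_counts) are hypothetical and not part of moztelemetry; the sample ping mirrors the structure posted in comment #2, and the keys of "values" are assumed to be bucket lower bounds serialized as strings.

```python
def iter_addon_histograms(ping):
    """Yield (addon, histogram_name, raw_histogram) for each add-on histogram.

    Hypothetical helper; assumes payload -> addonHistograms -> add-on name ->
    histogram name -> raw histogram, as in the ping from comment #2.
    """
    payload = ping.get("payload") or {}
    for addon, histograms in (payload.get("addonHistograms") or {}).items():
        for name, raw in histograms.items():
            yield addon, name, raw


def raw_bucket_counts(raw_histogram):
    """Return (bucket, count) pairs sorted numerically by bucket.

    The keys of "values" are assumed to be bucket lower bounds stored as
    strings, so they are converted to int before sorting.
    """
    values = raw_histogram.get("values", {})
    return sorted((int(bucket), count) for bucket, count in values.items())


# Local test data shaped like the ping from comment #2.
sample_ping = {
    "payload": {
        "addonHistograms": {
            "CONTACTS": {
                "DEVTOOLS_HUD_CUSTOM_TELEMETRY-GAIA-CONTACTS-IMPORT-GMAIL": {
                    "bucket_count": 3,
                    "histogram_type": 4,
                    "range": [1, 2],
                    "sum": 1,
                    "sum_squares_hi": 0,
                    "sum_squares_lo": 1,
                    "values": {"0": 1, "1": 0},
                }
            }
        }
    }
}

for addon, name, raw in iter_addon_histograms(sample_ping):
    print(addon, name, raw_bucket_counts(raw))
```

The sorted pairs can then be plotted or summed across pings as they are; converting them to a pandas Series is optional and does not need Histograms.json.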
Flags: needinfo?(rvitillo)
Hello Tamara, do you have what you need?
Flags: needinfo?(thills)
(In reply to Tamara Hills [:thills] from comment #2)
> pings = get_pings(sc, app="FirefoxOS", channel="default",
> submission_date="20151217", doc_type="OTHER", schema="v4")

Shouldn't we get those pings out of the "OTHER" bucket (see doc_type)?
Mark, did we already have a bug on this?
Flags: needinfo?(mreid)
Hi, just an update: I had a conversation with rvitillo, and he suggested that I try to use a pandas Series for this, so I'm currently working on that. He also suggested that I reach out to mreid about getting a change made to https://github.com/mozilla/python_moztelemetry/blob/master/moztelemetry/spark.py so that we can graph the addonHistograms.
Flags: needinfo?(thills)
I've filed a separate bug for splitting these docs out of "OTHER".
Flags: needinfo?(mreid)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard