Bug 1325653
Opened 9 years ago
Closed 8 years ago
Dataset API should provide consistent view of raw telemetry pings in telemetry-batch-view
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: harter, Unassigned)
Details
The example payload in the Longitudinal test suite has a few fields whose keys use dot notation to express hierarchy (e.g. "payload.histograms" stands in for {payload: {histograms: ...}} [1]).
This leads to some surprising behavior. For example:
> payload.get("payload").get("histograms")
causes the tests to throw an error, while:
> payload.get("payload.histograms")
retrieves the parent histogram JSON. This is how the current version of the code parses parent histograms [2].
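To make the mismatch concrete, here is a minimal sketch that models the flattened test payload as a plain Scala `Map[String, Any]`. The object name, the field contents, and the `nestedLookup` helper are hypothetical stand-ins, not the actual test fixture or Longitudinal code; the sketch only illustrates that a dotted key is one flat key, so walking the hierarchy one level at a time never reaches it.

```scala
object DottedKeyLookup {
  // Hypothetical, simplified stand-in for the Longitudinal test payload (not the real fixture):
  // the parent histograms sit under the literal top-level key "payload.histograms",
  // while other data lives in a genuinely nested "payload" object.
  val payload: Map[String, Any] = Map(
    "payload.histograms" -> Map("GC_MS" -> Map("sum" -> 42)),
    "payload" -> Map("info" -> Map("reason" -> "shutdown"))
  )

  // Walks the tree one key at a time, like payload.get("payload").get("histograms").
  // Returns None as soon as a step is missing or is not an object.
  def nestedLookup(root: Map[String, Any], path: String*): Option[Any] =
    path.foldLeft(Option[Any](root)) {
      case (Some(node: Map[_, _]), key) => node.asInstanceOf[Map[String, Any]].get(key)
      case _                            => None
    }
}

// Flat lookup finds the histograms; the nested walk does not.
// DottedKeyLookup.payload.get("payload.histograms")                               // Some(Map(GC_MS -> Map(sum -> 42)))
// DottedKeyLookup.nestedLookup(DottedKeyLookup.payload, "payload", "histograms")  // None
```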
We do not use this notation for scalars. Instead, we build a fully nested JSON payload [3], and accordingly the scalars are parsed from JSON with the `\` operator [4].
This caused some difficulty in Bug 13363800. Specifically, it would be nice to automatically pull both histograms and keyedHistograms from a single location known to hold histograms (e.g. payload.processes.content). However, if we try something like:
> val content = payload \ "payload" \ "processes" \ "content"
> content \ "histograms"
> content \ "keyedHistograms"
We'll get an error, since the histograms are stored under the flat key "payload.processes.content.histograms" rather than nested inside the payload JSON object.
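Until the layout is made consistent, one possible workaround is to treat the flat keys as paths and collect everything stored under a given dotted prefix. The sketch below assumes the fields are available as a `Map[String, Any]` keyed by their dotted names; `PrefixView.under` is a hypothetical helper, not part of telemetry-batch-view.

```scala
object PrefixView {
  // Hypothetical JSON representation: objects are Map[String, Any].
  type Json = Map[String, Any]

  // Hypothetical helper: gathers every flat field whose key starts with
  // `prefix + "."` and re-keys it by the remainder, e.g.
  // "payload.processes.content.histograms" -> "histograms".
  def under(fields: Json, prefix: String): Json = {
    val dotted = prefix + "."
    fields.collect { case (key, value) if key.startsWith(dotted) => key.stripPrefix(dotted) -> value }
  }
}
```

This only finds fields that were split out at exactly that prefix; anything still embedded inside a parent JSON object would be missed, so it papers over the inconsistency rather than removing it.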
[0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L20
[1] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L184
[2] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L729
[3] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L186
[4] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L819
Comment 1•9 years ago
This is due to how we process the data in our streaming pipeline. Among other things, Heka messages are composed of a list of fields (key-value pairs) [1]. Initially, the whole JSON blob resides under the "payload" field of a message [2]. Heka parses that blob and saves parts of it (like payload.histograms) in individual fields of a new message [3], so that downstream consumers don't have to re-parse the whole thing later on.
This clearly causes some pain during analysis, though. In Python-land we solved it by providing a recombined view over the split pings [4]. We could do something similar in telemetry-batch-view.
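A recombined view along the lines of the Python parser [4] could be sketched in Scala by splitting each dotted field name into a path and deep-merging the resulting branches into one nested tree. This is a minimal sketch, not a verbatim port of python_moztelemetry: it assumes the Heka fields have already been parsed into `Map[String, Any]` values (in real messages the JSON sub-blobs arrive as strings), and `RecombinePing` is a hypothetical name.

```scala
object RecombinePing {
  // Hypothetical JSON representation: objects are Map[String, Any], leaves are anything else.
  type Json = Map[String, Any]

  // Recursively merges `b` into `a`. When both sides hold an object under the
  // same key, the objects are merged; otherwise the value from `b` wins.
  def deepMerge(a: Json, b: Json): Json =
    b.foldLeft(a) { case (acc, (key, value)) =>
      (acc.get(key), value) match {
        case (Some(left: Map[_, _]), right: Map[_, _]) =>
          acc.updated(key, deepMerge(left.asInstanceOf[Json], right.asInstanceOf[Json]))
        case _ =>
          acc.updated(key, value)
      }
    }

  // Turns a flat field set such as {"payload.histograms": {...}, "payload": {...}}
  // into a single nested tree. split('.') takes a Char, so '.' is matched literally,
  // not as a regex. Caveat: keys that legitimately contain dots would be split too.
  def recombine(fields: Json): Json =
    fields.foldLeft(Map.empty[String, Any]) { case (acc, (dottedKey, value)) =>
      // "a.b.c" -> Map("a" -> Map("b" -> Map("c" -> value)))
      val branch = dottedKey.split('.').toList.foldRight(value)((key, inner) => Map(key -> inner))
      deepMerge(acc, branch.asInstanceOf[Json])
    }
}
```

Because object values are deep-merged, the result does not depend on the order in which the flat fields are visited (except when two fields assign conflicting non-object values to the same path). With such a view, the payload.processes.content lookup from the description becomes an ordinary nested walk that exposes both histograms and keyedHistograms.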
[1] https://github.com/mozilla-services/heka/blob/versions/0.10/message/message.proto#L50
[2] https://github.com/mozilla-services/data-pipeline/blob/50b26837b7b9b5c60bed2091e139c30674c7f62e/heka/sandbox/decoders/extract_telemetry_dimensions.lua#L286
[3] https://github.com/mozilla-services/data-pipeline/blob/50b26837b7b9b5c60bed2091e139c30674c7f62e/heka/sandbox/decoders/extract_telemetry_dimensions.lua#L198
[4] https://github.com/mozilla/python_moztelemetry/blob/a4a3a8c1d4bcb7cbc6ab44257a08f098988a4b80/moztelemetry/heka_message_parser.py#L23
Updated•9 years ago
Summary: Consider refactoring Longitudinal test payload → Provide consistent view of raw telemetry pings
Updated•9 years ago
Summary: Provide consistent view of raw telemetry pings → Dataset API should provide consistent view of raw telemetry pings in telemetry-batch-view
Updated•8 years ago
Points: --- → 3
Priority: -- → P3
Comment 2•8 years ago
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard