Closed Bug 1325653 Opened 9 years ago Closed 8 years ago

Dataset API should provide consistent view of raw telemetry pings in telemetry-batch-view

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P3)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: harter, Unassigned)

The example payload in the Longitudinal test suite has a few fields which use dot notation to specify hierarchy (e.g. "payload.histograms" to specify {payload: {histograms: ...}} [1]). This causes some oddness. For example:

> payload.get("payload").get("histograms")

causes the tests to throw an error, while:

> payload.get("payload.histograms")

retrieves the parent histogram JSON. This is how we parse parent histograms in the current version of the code [2].

We do not use this notation for scalars. Instead we build a full JSON payload [3]. Accordingly, parsing the scalars from JSON uses the `\` operator [4].

This caused some difficulty in Bug 13363800. Specifically, it would be nice to automatically pull both histograms and keyedHistograms from a single location known to hold histograms (e.g. payload.processes.content). However, if we try something like:

> val content = payload \ "payload" \ "processes" \ "content"
> content \ "histograms"
> content \ "keyedHistograms"

we'll get an error, since histograms are stored under the key "payload.processes.content.histograms", not under the payload JSON object.

[0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L20
[1] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L184
[2] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L729
[3] https://github.com/mozilla/telemetry-batch-view/blob/master/src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala#L186
[4] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L819
This is due to how we process the data in our streaming pipeline. Among other things, Heka messages are composed of a list of fields (key-value pairs) [1]. Initially, the whole JSON blob resides under the "payload" field of a message [2]. Heka parses that blob and saves parts of it (like payload.histograms) in individual fields of a new message [3], so that downstream consumers do not have to re-parse the whole blob.

This clearly causes some pain during analysis, though. The way we solved it in Python-land is to provide a recombined view over the split pings [4]. We could do something similar in telemetry-batch-view.

[1] https://github.com/mozilla-services/heka/blob/versions/0.10/message/message.proto#L50
[2] https://github.com/mozilla-services/data-pipeline/blob/50b26837b7b9b5c60bed2091e139c30674c7f62e/heka/sandbox/decoders/extract_telemetry_dimensions.lua#L286
[3] https://github.com/mozilla-services/data-pipeline/blob/50b26837b7b9b5c60bed2091e139c30674c7f62e/heka/sandbox/decoders/extract_telemetry_dimensions.lua#L198
[4] https://github.com/mozilla/python_moztelemetry/blob/a4a3a8c1d4bcb7cbc6ab44257a08f098988a4b80/moztelemetry/heka_message_parser.py#L23
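To make the "recombined view" idea concrete, here is a minimal sketch in Python of how flat, dot-named fields could be folded back into one nested document. It is not the python_moztelemetry implementation. The function name `recombine`, the sample field contents (`GC_MS`, `subsessionLength`), and the assumption that split-out parts arrive as JSON object strings are all hypothetical and chosen only for illustration.

```python
import json


def recombine(fields):
    """Rebuild one nested dict from flat fields with dot-separated names."""
    ping = {}
    # Visit parents before children so "payload" exists before "payload.histograms".
    for name in sorted(fields, key=lambda key: key.count(".")):
        value = fields[name]
        # Assumption: split-out sub-documents arrive as JSON object strings.
        if isinstance(value, str) and value.lstrip().startswith("{"):
            value = json.loads(value)

        *parents, leaf = name.split(".")
        node = ping
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Create (or replace a non-object with) an intermediate object.
                child = {}
                node[part] = child
            node = child

        existing = node.get(leaf)
        if isinstance(existing, dict) and isinstance(value, dict):
            existing.update(value)  # shallow merge into the existing object
        else:
            node[leaf] = value
    return ping


# Hypothetical split ping: the main blob plus parts stored under dotted keys.
fields = {
    "payload": '{"info": {"subsessionLength": 3600}}',
    "payload.histograms": '{"GC_MS": {"sum": 42}}',
    "payload.processes.content.histograms": '{"GC_MS": {"sum": 7}}',
    "payload.processes.content.keyedHistograms": "{}",
}
ping = recombine(fields)
content = ping["payload"]["processes"]["content"]
```

With a view like this, both `content["histograms"]` and `content["keyedHistograms"]` can be read from the same nested location, which is the access pattern the first comment was asking for.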
Summary: Consider refactoring Longitudinal test payload → Provide consistent view of raw telemetry pings
Summary: Provide consistent view of raw telemetry pings → Dataset API should provide consistent view of raw telemetry pings in telemetry-batch-view
Points: --- → 3
Priority: -- → P3
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Cloud Services → Cloud Services Graveyard