Closed Bug 1346317 Opened 7 years ago Closed 7 years ago

python_moztelemetry: Add support for Heka messages with field names containing "." that are NOT json strings.

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, enhancement, P3)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: mreid, Unassigned)

Details

The current Heka message parsing code assumes that any item in the Fields array in the message with a "." in its name contains a JSON string.

This is a fairly safe assumption for Telemetry messages, and has served us quite well for the past 2 years or so.

It causes problems when trying to use the Dataset API with other data sources that use "." in their field names but whose field values are not JSON strings[1].
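To illustrate the failure mode, here is a minimal sketch. It is not the actual python_moztelemetry parsing code: `parse_fields_naive`, the dict-based field layout, and the sample field names and values are all hypothetical stand-ins for the assumption described above.

```python
import json


def parse_fields_naive(fields):
    """Hypothetical stand-in: treat every dotted field name as a JSON string."""
    meta, parsed = {}, {}
    for name, value in fields.items():
        if "." in name:
            # Assumption under test: the value is JSON.
            # json.loads raises ValueError (JSONDecodeError) when it is not.
            parsed[name] = json.loads(value)
        else:
            meta[name] = value
    return meta, parsed


# A Telemetry-style message: the dotted field really does hold JSON.
telemetry_like = {"docType": "main", "payload.info": '{"reason": "daily"}'}
meta, parsed = parse_fields_naive(telemetry_like)

# A message from another source: dotted name, plain string value.
other_source = {"docType": "sync_log", "fields.path": "/1.5/12345/storage"}
try:
    parse_fields_naive(other_source)
    naive_failed = False
except ValueError:
    naive_failed = True
```

In this sketch the Telemetry-style message parses cleanly, while the second message fails as soon as the dotted field is reached.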

Once we have improved the message parsing test coverage in bug 1346312, we should improve and generalize the JSON handling code.

One way would be to start using the "representation" attribute of Heka Field entries to flag these fields as JSON explicitly.
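A rough sketch of what that could look like, assuming a simplified `Field` record rather than the real Heka protobuf class. The `Field` dataclass, `parse_fields_by_representation`, and the sample values are hypothetical; only the idea of checking the representation instead of the field name comes from the proposal.

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Heka Field entry."""
    name: str
    value: str
    representation: str = ""


def parse_fields_by_representation(fields):
    """Decode a field as JSON only when it is explicitly flagged as JSON."""
    doc = {}
    for field in fields:
        if field.representation == "json":
            doc[field.name] = json.loads(field.value)
        else:
            # A "." in the name no longer matters; the value stays a string.
            doc[field.name] = field.value
    return doc


fields = [
    Field("payload.info", '{"reason": "daily"}', representation="json"),
    Field("fields.path", "/1.5/12345/storage"),
]
doc = parse_fields_by_representation(fields)
```

The trade-off is that producers have to set the flag, so existing messages without it would need a fallback.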

Another way would be to put all the fields into the "meta" section as strings, then lazily try to parse them if/when client code tries to access that part of the document.
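The lazy approach might be sketched as follows. `LazyFields` is a hypothetical helper, not an existing python_moztelemetry class: it keeps every value as the raw string and only attempts a JSON parse on first access, falling back to the string when parsing fails.

```python
import json
from collections.abc import Mapping


class LazyFields(Mapping):
    """Hypothetical read-only mapping that parses values on first access."""

    def __init__(self, raw):
        self._raw = dict(raw)   # field name -> raw string value
        self._cache = {}        # field name -> parsed (or raw) value

    def __getitem__(self, name):
        if name not in self._cache:
            value = self._raw[name]  # raises KeyError for unknown fields
            try:
                self._cache[name] = json.loads(value)
            except (TypeError, ValueError):
                # Not JSON: hand back the original string unchanged.
                self._cache[name] = value
        return self._cache[name]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


lazy = LazyFields({
    "payload.info": '{"reason": "daily"}',
    "fields.path": "/1.5/12345/storage",
})
info = lazy["payload.info"]
path = lazy["fields.path"]
```

One caveat with this design: a plain string that happens to be valid JSON, such as "123" or "true", would come back as a number or boolean, so the explicit flag is less ambiguous.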

[1] There is some code to work around this issue in https://github.com/mozilla/mozilla-reports/blob/master/etl/sync_log.kp/knowledge.md
Depends on: 1346312
Points: --- → 2
Priority: -- → P3
Bug 1348337 added the "json" representation for such fields, and it should be available once the updated code is deployed.
Depends on: 1255748
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Product: Cloud Services → Cloud Services Graveyard