python_moztelemetry: Add support for Heka messages with field names containing "." that are NOT JSON strings.

RESOLVED INCOMPLETE

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P3
Severity: normal
Status: RESOLVED INCOMPLETE
Reported: 8 months ago
Modified: a month ago

People

Reporter: mreid
Assignee: Unassigned

Tracking

(Depends on: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 months ago
The current Heka message parsing code assumes that any entry in a message's Fields array whose name contains a "." holds a JSON string.

This is a fairly safe assumption for Telemetry messages, and has served us quite well for the past 2 years or so.

However, this assumption breaks when the Dataset API is used with other data sources whose field names contain "." but whose values are not JSON strings[1].
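To make the failure mode concrete, here is a simplified, hypothetical model of the assumption described above. It is not the actual python_moztelemetry parser: fields are assumed to be plain dicts with "name" and "value" keys rather than Heka protobuf Field objects, and parse_fields_naive is an illustrative name.

```python
import json


def parse_fields_naive(fields):
    """Hypothetical model of the current assumption: any field whose
    name contains "." is treated as a JSON string."""
    doc = {"meta": {}}
    for field in fields:
        name, value = field["name"], field["value"]
        if "." in name:
            # Dotted names become nested keys, and the value is assumed
            # to be JSON, so json.loads raises ValueError if it is not.
            *parents, leaf = name.split(".")
            target = doc
            for part in parents:
                target = target.setdefault(part, {})
            target[leaf] = json.loads(value)
        else:
            # Undotted names are kept as plain strings in "meta".
            doc["meta"][name] = value
    return doc
```

A Telemetry-style field such as {"name": "payload.info", "value": '{"a": 1}'} parses fine, but an invented sync-style field such as {"name": "engine.name", "value": "bookmarks"} makes json.loads raise a ValueError, which is the kind of failure this bug describes.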

Once we have improved the message parsing test coverage in bug 1346312, we should improve and generalize the JSON handling code.

One way would be to start using the "representation" attribute of Heka Field entries, explicitly flagging these fields as JSON.
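A minimal sketch of the explicit-flag approach, assuming fields are plain dicts with "name", "value" and an optional "representation" key (a simplification of the Heka Field message), and assuming the flag value is the string "json". The function name parse_fields_by_representation is hypothetical.

```python
import json


def parse_fields_by_representation(fields):
    """Hypothetical sketch: only fields flagged with representation "json"
    are decoded and nested by their dotted name; everything else is kept
    verbatim in "meta" under its full name."""
    doc = {"meta": {}}
    for field in fields:
        name, value = field["name"], field["value"]
        if field.get("representation") == "json":
            # Explicitly flagged: decode and nest under the dotted path.
            *parents, leaf = name.split(".")
            target = doc
            for part in parents:
                target = target.setdefault(part, {})
            target[leaf] = json.loads(value)
        else:
            # No "json" flag: treat the value as an opaque string, even if
            # the name contains ".".
            doc["meta"][name] = value
    return doc
```

Keeping unflagged dotted fields in "meta" under their full name (for example "engine.name") avoids guessing at nesting for sources that never intended "." as a path separator.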

Another way would be to store all the fields in the "meta" section as raw strings, then lazily try to parse them only if and when client code accesses that part of the document.
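The lazy approach could look roughly like the following sketch. LazyJSONMeta is a hypothetical read-only mapping, not an existing python_moztelemetry class: it holds every field as its raw string and only attempts JSON decoding the first time a key is read, caching the result.

```python
import json
from collections.abc import Mapping


class LazyJSONMeta(Mapping):
    """Hypothetical wrapper: keeps raw string values and tries to decode
    each one as JSON only on first access."""

    def __init__(self, raw_fields):
        self._raw = dict(raw_fields)
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            raw = self._raw[key]  # raises KeyError for unknown fields
            try:
                self._cache[key] = json.loads(raw)
            except ValueError:
                # Not JSON (e.g. a plain string value): return it unchanged.
                self._cache[key] = raw
        return self._cache[key]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)
```

One trade-off of try-and-fall-back parsing: a plain string that happens to be valid JSON (such as "123") is decoded to a number, which the explicit "representation" flag would avoid.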

[1] There is some code to work around this issue in https://github.com/mozilla/mozilla-reports/blob/master/etl/sync_log.kp/knowledge.md
(Reporter)

Updated

8 months ago
Depends on: 1346312

Updated

7 months ago
Points: --- → 2
Priority: -- → P3
(Reporter)

Comment 1

7 months ago
Bug 1348337 added the "json" representation for such fields; it should be available once the updated code is deployed.

Updated

7 months ago
Depends on: 1255748
Closing abandoned bugs in this product per https://bugzilla.mozilla.org/show_bug.cgi?id=1337972
Status: NEW → RESOLVED
Last Resolved: a month ago
Resolution: --- → INCOMPLETE