Closed Bug 1248845 Opened 9 years ago Closed 9 years ago

Investigate get_records inconsistency with object size and compression types

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whd, Assigned: rvitillo)

Following up on bug #1231410. I've uploaded various versions of the data, the parameters being snappy/no snappy and 50-250MB chunks / single object. Here are the results of calling get_records on each variation:

In [3]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-single"); records.count()
Out[3]: 32371
In [4]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-single"); records.count()
Out[4]: 1838
In [5]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-100mb"); records.count()
Out[5]: 34300
In [6]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-100mb"); records.count()
Out[6]: 32828
In [7]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-250mb"); records.count()
Out[7]: 14497
In [8]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-250mb"); records.count()
Out[8]: 31476
In [9]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-50mb"); records.count()
Out[9]: 34327
In [10]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-50mb"); records.count()
Out[10]: 34327

$ heka-cat -format count output.log
Input:output.log Offset:0 Match:TRUE Format:count Tail:false Output:
Processed: 34327, matched: 34327 messages

So when the chunks are substantially smaller than _chunk_size (200MB), we see all the records, but the larger the object size, the fewer records are returned, and the problem is more apparent with snappy-encoded records.
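To make the shortfall easier to compare across variations, here is a small sketch that tabulates the counts reported above against the heka-cat total. The numbers are copied from the session in this comment; the names `observed`, `EXPECTED` and `missing_records` are only illustrative and are not part of get_records or any telemetry library.

```python
# Counts copied from the get_records session above; heka-cat reported 34327.
EXPECTED = 34327

observed = {
    "protobuf-single": 32371,
    "snappy-single": 1838,
    "protobuf-100mb": 34300,
    "snappy-100mb": 32828,
    "snappy-250mb": 14497,
    "protobuf-250mb": 31476,
    "snappy-50mb": 34327,
    "protobuf-50mb": 34327,
}


def missing_records(counts, expected):
    """Return how many records each variation is short of the expected total."""
    return {name: expected - count for name, count in counts.items()}


if __name__ == "__main__":
    missing = missing_records(observed, EXPECTED)
    # Worst offenders first.
    for name in sorted(observed, key=observed.get):
        count = observed[name]
        print(f"{name:16s} {count:6d}  missing {missing[name]:6d}  "
              f"({count / EXPECTED:.1%} returned)")
```

Sorted this way, only the two 50MB variations return every record, while snappy-single returns roughly 5% and snappy-250mb roughly 42% of the expected total.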
:rvitillo, can you take a look at this? I will continue to look at this next week, but you might be able to figure it out in a more timely fashion.
Flags: needinfo?(rvitillo)
I can take this.
Flags: needinfo?(rvitillo)
Assignee: nobody → rvitillo
Points: --- → 2
Priority: -- → P2
Blocks: 1255748
I did a bit of digging around, and here are the initial results: https://gist.github.com/Uberi/a3a92bb011c7f3b0e8dc677c91471a10 It seems like telemetry.utils.heka_message.unpack can't backtrack properly in the Snappy files. For some reason, it works for Protobuf. I'll need to look at it some more to find the exact cause, but I suspect this is a bug in the Snappy library.
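As a rough illustration of the general failure mode (not the actual telemetry.utils.heka_message code, and not a diagnosis of the Snappy case, whose exact cause was not pinned down here), the toy sketch below shows how a chunked reader can silently lose records that straddle chunk boundaries unless the partial record is carried over to the next chunk. The framing (a separator byte, a 4-byte length, then the payload) and all function names are hypothetical and chosen only for this example.

```python
import struct

SEPARATOR = b"\x1e"  # toy record separator; this framing is hypothetical


def frame(payloads):
    """Concatenate payloads as: separator, 4-byte big-endian length, payload."""
    return b"".join(SEPARATOR + struct.pack(">I", len(p)) + p for p in payloads)


def parse(buf):
    """Return (complete payloads, unparsed tail starting at a separator)."""
    records, pos = [], 0
    while True:
        start = buf.find(SEPARATOR, pos)
        if start < 0 or start + 5 > len(buf):
            break  # no further record, or its header is cut off
        (length,) = struct.unpack(">I", buf[start + 1:start + 5])
        end = start + 5 + length
        if end > len(buf):
            break  # payload is cut off by the end of the buffer
        records.append(buf[start + 5:end])
        pos = end
    tail = buf[start:] if start >= 0 else b""
    return records, tail


def read_dropping_tail(data, chunk_size):
    """Parse each chunk independently; records split across chunks are lost."""
    records = []
    for offset in range(0, len(data), chunk_size):
        found, _ = parse(data[offset:offset + chunk_size])
        records.extend(found)
    return records


def read_with_carry(data, chunk_size):
    """Prepend the unparsed tail of the previous chunk to the next one."""
    records, carry = [], b""
    for offset in range(0, len(data), chunk_size):
        found, carry = parse(carry + data[offset:offset + chunk_size])
        records.extend(found)
    return records


if __name__ == "__main__":
    # ASCII payloads, so the separator byte never appears inside a record.
    payloads = [b"record-%04d" % i for i in range(1000)]
    data = frame(payloads)
    print("carry-over:", len(read_with_carry(data, 100)))
    print("dropping tail:", len(read_dropping_tail(data, 100)))
    print("single chunk:", len(read_dropping_tail(data, len(data))))
```

In this toy, a reader whose chunk size is at least the size of the whole input never hits a boundary and returns every record, which loosely mirrors the observation that objects well below _chunk_size were read completely.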
In the end we decided to remove Heka file chunking altogether, since it doesn't work correctly with Snappy encoding. Chunking was originally introduced to reduce memory pressure. Since then, some configuration changes have landed that deal with that issue in a more general way, so chunking should no longer be required. https://github.com/mozilla/python_moztelemetry/pull/62 https://github.com/mozilla/telemetry-tools/pull/5
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard