Closed
Bug 1248845
Opened 9 years ago
Closed 9 years ago
Investigate get_records inconsistency with object size and compression types
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Cloud Services Graveyard
Metrics: Pipeline
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: whd, Assigned: rvitillo)
Following up on bug #1231410.
I've uploaded several versions of the data, varying the compression (Snappy vs. none) and the object layout (50-250MB chunks vs. a single object). Here are the results of calling get_records on each variation:
In [3]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-single"); records.count()
Out[3]: 32371
In [4]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-single"); records.count()
Out[4]: 1838
In [5]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-100mb"); records.count()
Out[5]: 34300
In [6]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-100mb"); records.count()
Out[6]: 32828
In [7]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-250mb"); records.count()
Out[7]: 14497
In [8]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-250mb"); records.count()
Out[8]: 31476
In [9]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-snappy-50mb"); records.count()
Out[9]: 34327
In [10]: records = get_records(sc, "telemetry-webrtc", submissionDate="20160101-protobuf-50mb"); records.count()
Out[10]: 34327
$ heka-cat -format count output.log
Input:output.log Offset:0 Match:TRUE Format:count Tail:false Output:
Processed: 34327, matched: 34327 messages
So when the chunks are substantially smaller than _chunk_size (200MB) we see all the records, but the larger the object size, the fewer records are returned, and the problem is more pronounced with snappy-encoded records.
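For reference, the per-variant comparison above could be scripted roughly as follows. This is a sketch rather than the original notebook: the import path for get_records is an assumption, and the expected total of 34327 comes from the heka-cat output.

    # Minimal sketch, assuming a SparkContext `sc` and the get_records helper
    # used in the session above; the import path is an assumption.
    from moztelemetry.spark import get_records

    EXPECTED = 34327  # total messages reported by heka-cat on the source log

    variants = [
        "20160101-protobuf-single", "20160101-snappy-single",
        "20160101-protobuf-100mb",  "20160101-snappy-100mb",
        "20160101-protobuf-250mb",  "20160101-snappy-250mb",
        "20160101-protobuf-50mb",   "20160101-snappy-50mb",
    ]

    for variant in variants:
        count = get_records(sc, "telemetry-webrtc", submissionDate=variant).count()
        status = "OK" if count == EXPECTED else "MISSING %d" % (EXPECTED - count)
        print("%-28s %6d  %s" % (variant, count, status))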
Reporter
Comment 1 • 9 years ago
:rvitillo, can you take a look at this? I will continue to look at this next week, but you might be able to figure it out in a more timely fashion.
Flags: needinfo?(rvitillo)
Assignee
Updated • 9 years ago
Assignee: nobody → rvitillo
Assignee
Updated • 9 years ago
Points: --- → 2
Priority: -- → P2
I did a bit of digging around, and here are the initial results:
https://gist.github.com/Uberi/a3a92bb011c7f3b0e8dc677c91471a10
It seems like telemetry.utils.heka_message.unpack can't backtrack properly in Snappy-encoded files; for some reason it works fine for protobuf.
I'll need to look at it some more to find the exact cause, but I suspect a bug in the Snappy library.
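For context, here is a minimal, hypothetical sketch of the kind of chunked reading with backtracking involved. The framing (one separator byte plus a one-byte length) and the function name are invented for illustration and are not the actual heka_message implementation; the relevant point is that the parser carries partially read bytes across chunk boundaries, and any layer between it and the raw stream (such as a Snappy decompressor that buffers input internally) can make that carried-over state inconsistent.

    SEP = b"\x1e"                     # invented one-byte record separator for this sketch
    CHUNK_SIZE = 200 * 1024 * 1024    # the _chunk_size mentioned above

    def read_framed_records(fileobj, chunk_size=CHUNK_SIZE):
        """Yield length-prefixed records, reading the file in fixed-size chunks.
        A record that straddles a chunk boundary is kept in `pending` and
        completed on the next read ("backtracking")."""
        pending = b""
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            buf = pending + chunk
            pos = 0
            while pos + 2 <= len(buf) and buf[pos:pos + 1] == SEP:
                length = buf[pos + 1]          # 1-byte length field in this sketch
                end = pos + 2 + length
                if end > len(buf):
                    break                      # partial record: wait for the next chunk
                yield buf[pos + 2:end]
                pos = end
            pending = buf[pos:]
        # If a compression wrapper around `fileobj` buffers input internally,
        # the bytes held in `pending` no longer line up with the underlying
        # stream position, and records are silently dropped.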
Assignee
Comment 5 • 9 years ago
In the end we decided to remove Heka file chunking altogether, since it doesn't work correctly with Snappy encoding. It was originally introduced to reduce memory pressure; since then, configuration changes have landed that address that issue in a more general way, so chunking should no longer be required.
https://github.com/mozilla/python_moztelemetry/pull/62
https://github.com/mozilla/telemetry-tools/pull/5
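Conceptually, the change in the PRs above amounts to something like the sketch below: read the whole object up front and parse it in a single pass, so no record ever straddles a chunk boundary. It reuses the hypothetical read_framed_records from the earlier sketch and is illustrative only, not the actual diff.

    import io

    def read_all_records(fileobj):
        """Read the whole object up front and parse it in one pass, so no
        backtracking across chunk boundaries is needed. Memory pressure is
        handled by configuration elsewhere rather than by slicing the file."""
        data = fileobj.read()  # single read instead of CHUNK_SIZE slices
        return list(read_framed_records(io.BytesIO(data), chunk_size=len(data) or 1))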
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard