Closed
Bug 1271640
Opened 8 years ago
Closed 8 years ago
Calculate the exact number of Telemetry records on S3
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Assigned: trink)
Details
Attachments
(1 file)
2.87 MB, application/x-compressed-tar
When querying for a relatively large number of records using different methods, I keep seeing slightly different record counts. I'd like to use Heka/Hindsight to confirm the actual record counts on S3 (as well as check for any corrupt streams) so that I can validate the counts that come from the Scala and Python code.

Ideally, I'd like to count the exact number of records under the following S3 prefixes:

telemetry-2/20160401/telemetry/4/main/Firefox
telemetry-2/20160402/telemetry/4/main/Firefox

That will make it easy to compare against the counts in the "main_summary" derived dataset.
Assignee
Updated•8 years ago
Assignee: nobody → mtrinkala
Status: NEW → ASSIGNED
Points: --- → 1
Priority: -- → P2
Assignee
Updated•8 years ago
Priority: P2 → P1
Assignee
Comment 1•8 years ago
schema.json was the file selection criteria
tsv: column 1 filename, column 2 number of messages in the file

cnts-20160401.tsv
total files = 166338
total messages = 396151215

cnts-20160402.tsv
total files = 152984
total messages = 303040220
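For anyone who wants to re-derive the totals above from the attached TSV files, here is a minimal Python sketch. It assumes each file is tab-separated with no header row, with the file name in column 1 and the message count in column 2, as described in the comment. The function name `summarize_counts` and the sample rows are hypothetical and only for illustration.

```python
import csv
import io


def summarize_counts(tsv_file):
    """Return (total_files, total_messages) for a two-column TSV.

    Assumed layout: column 1 is the file name, column 2 is the number
    of messages in that file, and there is no header row.
    """
    total_files = 0
    total_messages = 0
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        total_files += 1
        total_messages += int(row[1])
    return total_files, total_messages


# Hypothetical sample rows; real names and counts come from the attachment.
sample = io.StringIO("file-a\t3\nfile-b\t5\n")
print(summarize_counts(sample))
```

To run it against a real file, pass an open handle instead of the sample, for example `open("cnts-20160401.tsv", newline="")`, and compare the result with the totals reported in the comment.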
Comment 2•8 years ago
Mark checked that the Scala implementation returns the correct number of records, while I did the same for the Python one. We can both confirm that the record counts match for the given dates.
Assignee
Updated•8 years ago
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard