Bug 1271640
Opened 9 years ago
Closed 9 years ago
Calculate the exact number of Telemetry records on S3
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Assigned: trink)
Attachments (1 file): 2.87 MB, application/x-compressed-tar
When querying a relatively large number of records using different methods, I keep seeing slightly different record counts.
I'd like to use Heka/Hindsight to confirm the actual record counts on S3 (and to check for any corrupt streams) so that I can validate the counts that come from the Scala and Python code.
Ideally, I'd like to count the exact number of records under the following S3 prefixes:
telemetry-2/20160401/telemetry/4/main/Firefox
telemetry-2/20160402/telemetry/4/main/Firefox
That will make it easy to compare against the counts in the "main_summary" derived dataset.
Updated • 9 years ago (assignee)
Assignee: nobody → mtrinkala
Status: NEW → ASSIGNED
Points: --- → 1
Priority: -- → P2
Updated • 9 years ago (assignee)
Priority: P2 → P1
Comment 1 • 9 years ago (assignee)
Files were selected using schema.json as the selection criteria.
TSV format: column 1 is the file name, column 2 is the number of messages in that file.
cnts-20160401.tsv
total files = 166338
total messages = 396151215
cnts-20160402.tsv
total files = 152984
total messages = 303040220
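As an illustration, the per-day totals above can be re-derived from each TSV file by counting its rows (files) and summing column 2 (messages). The following is a minimal Python sketch, assuming the files are tab-separated with no header row; the function name `summarize_counts` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def summarize_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_files, total_messages) for a two-column TSV.

    Assumes column 1 is the file name and column 2 is the number of
    messages in that file, with no header row. Blank lines are skipped.
    """
    total_files = 0
    total_messages = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # csv.reader yields [] for blank lines
        _filename, count = row[0], int(row[1])
        total_files += 1
        total_messages += count
    return total_files, total_messages


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the attached TSV files.
    sample = io.StringIO("example/file-a\t10\nexample/file-b\t32\n")
    print(summarize_counts(sample))  # (2, 42)
```

Passing an open handle to one of the attached files, e.g. `summarize_counts(open("cnts-20160401.tsv"))`, should yield a (files, messages) pair to compare against the totals listed above.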
Comment 2 • 9 years ago
Mark checked that the Scala implementation returns the correct number of records, and I did the same for the Python one. We can both confirm that the record counts match for the given dates.
Updated • 9 years ago (assignee)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 7 years ago
Product: Cloud Services → Cloud Services Graveyard