Closed Bug 1271640 Opened 8 years ago Closed 8 years ago

Calculate the exact number of Telemetry records on S3

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mreid, Assigned: trink)

Details

Attachments

(1 file)

Counts per file for each day, 8 years ago, Mike Trinkala [:trink], 2.87 MB, application/x-compressed-tar

Mark Reid [:mreid]

Reporter

Description

8 years ago

When querying a relatively large number of records using different methods, I keep seeing slightly different record counts.

I'd like to use Heka/Hindsight to confirm the actual record counts on S3 (as well as check for any corrupt streams) so that I can validate counts that come from the Scala and Python code.

Ideally, I'd like to count the exact number of records under the following S3 prefixes:

telemetry-2/20160401/telemetry/4/main/Firefox
telemetry-2/20160402/telemetry/4/main/Firefox

That will make it easy to compare against the counts in the "main_summary" derived dataset.

Mike Trinkala [:trink]

Assignee

Updated

8 years ago

Assignee: nobody → mtrinkala

Status: NEW → ASSIGNED

Points: --- → 1

Priority: -- → P2

Mike Trinkala [:trink]

Assignee

Updated

8 years ago

Priority: P2 → P1

Mike Trinkala [:trink]

Assignee

Comment 1

8 years ago

Attached file: Counts per file for each day

schema.json was the file selection criterion.

TSV format: column 1 is the filename, column 2 is the number of messages in the file.
cnts-20160401.tsv
  total files    = 166338
  total messages = 396151215
cnts-20160402.tsv
  total files    = 152984
  total messages = 303040220
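
The totals above can be recomputed from a per-file TSV with a few lines of Python. This is a minimal sketch, not the tool used to produce the attachment: it assumes each TSV has no header row and exactly the two columns described (filename, message count). The function name `summarize_counts` and the sample rows are hypothetical.

```python
import csv
import io


def summarize_counts(tsv_file):
    """Return (total_files, total_messages) for a filename<TAB>count TSV.

    Assumes no header row; column 1 is the filename and column 2 is the
    number of messages in that file.
    """
    total_files = 0
    total_messages = 0
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        total_files += 1
        total_messages += int(row[1])
    return total_files, total_messages


# Hypothetical sample rows standing in for a real cnts-YYYYMMDD.tsv file.
sample = io.StringIO("fileA\t3\nfileB\t5\n")
print(summarize_counts(sample))  # prints (2, 8)
```

For a real file, something like `with open("cnts-20160401.tsv", newline="") as f: summarize_counts(f)` would apply, and the result could then be compared with the totals listed above.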

Roberto Agostino Vitillo (:rvitillo)

Comment 2

8 years ago

Mark checked that the Scala implementation returns the correct number of records while I did the same for the Python one. We can both confirm that the number of records matches for the given dates.

Mike Trinkala [:trink]

Assignee

Updated

8 years ago

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

BMO Automation

Updated

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
