Bug 1246137: Add crash pings and data from nightly and aurora to "telemetry-sample".
Opened 9 years ago • Closed 9 years ago
Category: Cloud Services Graveyard :: Metrics: Pipeline (defect, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rvitillo; Assigned: mreid
Whiteboard: loasis
Description:
Mark, this depends on your work to add nightly and aurora to telemetry-release.
Updated (Reporter) • 9 years ago
Flags: needinfo?(mreid)
Comment 1 (Assignee) • 9 years ago
I propose that we do the following:
- Rename 'telemetry-release' to 'telemetry-sample'
- Extend it to include data for all channels (not just release+beta)
- Limit it to 10% of the clientId space using the 'sampleId' field
- Consider dropping any dimensions we don't need.
If / when we move to a longitudinal dataset containing 100% of the data, we can simply use the "raw" telemetry-2 dataset instead, since we won't be selecting by sampleId.
Question - is 10% enough? Are we likely to want more than 10% but less than 100%?
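As a minimal sketch of the proposed cut, assuming sampleId is a stable bucket in [0, 100) derived from a hash of clientId (the specific hash and bucket count here are assumptions, not quoted from the pipeline code):

```python
import zlib

SAMPLE_BUCKETS = 100   # assumed bucket count
SAMPLE_CUTOFF = 10     # keep 10% of the clientId space

def sample_id(client_id: str) -> int:
    """Map a clientId to a stable bucket in [0, SAMPLE_BUCKETS)."""
    return zlib.crc32(client_id.encode("utf-8")) % SAMPLE_BUCKETS

def keep_for_telemetry_sample(ping: dict) -> bool:
    """Keep pings from any channel whose client falls in the 10% sample."""
    return sample_id(ping["clientId"]) < SAMPLE_CUTOFF
```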
Flags: needinfo?(mreid) → needinfo?(rvitillo)
Comment 2 (Reporter) • 9 years ago
(In reply to Mark Reid [:mreid] from comment #1)
> I propose that we do the following:
> - Rename 'telemetry-release' to 'telemetry-sample'
> - Extend it to include data for all channels (not just release+beta)
> - Limit it to 10% of the clientId space using the 'sampleId' field
> - Consider dropping any dimensions we don't need.
I am OK with this, but it depends on what fraction of profiles we ultimately want to end up with in the longitudinal dataset.
> If / when we move to a longitudinal dataset containing 100% of the data, we
> can simply use the "raw" telemetry-2 dataset instead, since we won't be
> selecting by sampleId.
>
> Question - is 10% enough? Are we likely to want more than 10% but less than
> 100%?
We are not going to use more than 10% of the data in the short term, but I can't make that decision for other teams.
Flags: needinfo?(rvitillo)
Updated • 9 years ago
Assignee: nobody → mreid
Points: --- → 2
Priority: -- → P1
Comment 3 (Reporter) • 9 years ago
Crash pings should also be stored on S3 in the same structure as main pings.
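A minimal sketch of what "the same structure" could mean, assuming docType is simply another partition dimension in the S3 key; the dimension names and ordering below are illustrative, not the documented telemetry-sample layout:

```python
# Hypothetical key builder; dimension names and ordering are assumptions.
def s3_key(submission_date: str, doc_type: str, channel: str, sample_id: int) -> str:
    return "/".join([submission_date, doc_type, channel, str(sample_id)])

print(s3_key("20160215", "main", "nightly", 42))   # main ping partition
print(s3_key("20160215", "crash", "nightly", 42))  # crash pings land beside it
```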
Updated (Reporter) • 9 years ago
Summary: Add nightly and aurora channels to longitudinal datasets. → Add crash pings and data from nightly and aurora to "telemetry-sample".
Updated (Reporter) • 9 years ago
Whiteboard: loasis
Comment 4 (Assignee) • 9 years ago
Sent a PR to implement the above changes:
https://github.com/mozilla-services/puppet-config/pull/1779
Comment 5 (Reporter) • 9 years ago
We can't generate an updated longitudinal dataset, as "telemetry-release" stopped updating on the 12th and there are only a handful of dates in "telemetry-sample".
Mark, could you please backfill telemetry-sample back to the 15th of November?
Flags: needinfo?(mreid)
Comment 7 • 9 years ago
As jobs complete, the data is being uploaded by a separate process to the prod bucket, starting at 20160212 and going backwards.
Comment 8 (Assignee) • 9 years ago
I just had a look and it appears the data is backfilled to Nov 15th.
I noticed that all the records have "UNKNOWN" for sampleId though... any idea why? It was working previously in telemetry-release, and I didn't change that field in the PR.
Flags: needinfo?(whd)
Comment 9 (Assignee) • 9 years ago
Sorry, all the *files* on S3, not the records themselves. The actual data contains correct sampleIds.
Comment 10 • 9 years ago
I have taken a look at this and think I have determined the cause. I copied the code I used to set up the backfill jobs from one of :mreid's old backfill scripts:
s3://telemetry-analysis-code-2/jobs/whd-sample-backfill-1/telemetry-sample-backfill-0.1.tar.gz
This pulls in a heka build from
http://people.mozilla.org/~mreid/heka-20150918-0_11_0-linux-amd64.tar.gz
which most unfortunately was built three days before the commit that added support for non-string fields for S3 dimensions:
https://github.com/mozilla-services/data-pipeline/commit/f02b014e674fbe62f47844da02a369a94053ee08
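A sketch of the assumed pre-fix behaviour described above: with the older heka build, only string field values could be used as S3 path dimensions, so a numeric sampleId fell back to a placeholder (the helper below is illustrative, not the actual heka code):

```python
def dimension_component(value) -> str:
    # Assumed pre-fix behaviour: only string field values are usable as
    # S3 path dimensions; anything else falls back to a placeholder.
    if isinstance(value, str):
        return value
    return "UNKNOWN"

print(dimension_component("nightly"))  # "nightly"
print(dimension_component(42))         # "UNKNOWN" (e.g. an integer sampleId)
```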
Additionally, I noticed we botched the sampleId filtering logic (|| instead of &&), so we're sending the full data stream instead of a 10% sample:
https://github.com/mozilla-services/puppet-config/pull/1779/files
This might explain why it is requiring so many machines to process. I'm going to file a PR to fix the sampleId filtering, but we are probably going to need to do the whole backfill process again, with a newer heka and a fixed message matcher.
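Expressed as a sketch (the real matcher lives in the puppet-config PR linked above, and which condition sat on the other side of the operator is an assumption here), the error amounts to an OR where an AND was intended:

```python
def matches_buggy(ping: dict) -> bool:
    # || where && was intended: every ping with an accepted docType passes,
    # regardless of sampleId, so the full stream gets written out.
    return ping["docType"] in ("main", "crash") or ping["sampleId"] < 10

def matches_fixed(ping: dict) -> bool:
    # Both conditions must hold: accepted docType AND client in the 10% sample.
    return ping["docType"] in ("main", "crash") and ping["sampleId"] < 10

ping = {"docType": "main", "sampleId": 73}
print(matches_buggy(ping))  # True  (leaks through)
print(matches_fixed(ping))  # False (correctly excluded)
```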
Flags: needinfo?(whd)
Comment hidden (off-topic)
Comment 12 • 9 years ago
The backfill completed this weekend.
Comment 13 (Reporter) • 9 years ago
The data looks OK so far :)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard