Closed Bug 1246137 Opened 9 years ago Closed 9 years ago

Add crash pings and data from nightly and aurora to "telemetry-sample".

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: mreid)

References

Details

(Whiteboard: loasis)

Mark, this depends on your work on adding nightly and aurora to telemetry-release.
Flags: needinfo?(mreid)
I propose that we do the following:
- Rename 'telemetry-release' to 'telemetry-sample'
- Extend it to include data for all channels (not just release+beta)
- Limit it to 10% of the clientId space using the 'sampleId' field
- Consider dropping any dimensions we don't need.

If / when we move to a longitudinal dataset containing 100% of the data, we can simply use the "raw" telemetry-2 dataset instead, since we won't be selecting by sampleId.

Question - is 10% enough? Are we likely to want more than 10% but less than 100%?
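For illustration only, here is a minimal Python sketch of the 10% selection described above. The field location ("meta"/"sampleId") and the assumption that sampleId is an integer in [0, 99] derived deterministically from clientId are mine, not taken from the pipeline code.

```python
# Illustrative sketch, not the pipeline's actual code: select a 10% sample of
# pings by sampleId, assuming sampleId is an integer in [0, 99].

def in_ten_percent_sample(ping):
    """Return True if this ping falls in the hypothetical 10% sample."""
    sample_id = ping.get("meta", {}).get("sampleId")  # field location is assumed
    return isinstance(sample_id, int) and 0 <= sample_id < 10

pings = [
    {"meta": {"sampleId": 3}},   # kept
    {"meta": {"sampleId": 42}},  # dropped
]
sampled = [p for p in pings if in_ten_percent_sample(p)]
print(len(sampled))  # 1
```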
Flags: needinfo?(mreid) → needinfo?(rvitillo)
(In reply to Mark Reid [:mreid] from comment #1)
> I propose that we do the following:
> - Rename 'telemetry-release' to 'telemetry-sample'
> - Extend it to include data for all channels (not just release+beta)
> - Limit it to 10% of the clientId space using the 'sampleId' field
> - Consider dropping any dimensions we don't need.

I am OK with this, but it depends on what fraction of profiles we ultimately want to end up with in the longitudinal dataset.

> If / when we move to a longitudinal dataset containing 100% of the data, we
> can simply use the "raw" telemetry-2 dataset instead, since we won't be
> selecting by sampleId.
>
> Question - is 10% enough? Are we likely to want more than 10% but less than
> 100%?

We are not going to use more than 10% of the data in the short term, but I can't make that decision for other teams.
Flags: needinfo?(rvitillo)
Assignee: nobody → mreid
Points: --- → 2
Priority: -- → P1
Crash pings should also be stored in the same S3 structure as main pings.
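As a purely hypothetical sketch of what "the same structure" could mean, the snippet below builds an S3 key where crash and main pings share the partitioning dimensions and docType distinguishes them. The dimension names, their order, and the prefix are assumptions for illustration, not the actual telemetry-sample layout.

```python
# Hypothetical S3 key layout; dimension names and order are illustrative only.

def s3_key(prefix, meta):
    dims = [
        meta["submissionDate"],  # e.g. "20160212"
        meta["docType"],         # "main" or "crash"
        meta["appName"],
        meta["channel"],         # release, beta, aurora, nightly
        meta["appVersion"],
    ]
    return "/".join([prefix] + dims)

print(s3_key("telemetry-sample", {
    "submissionDate": "20160212", "docType": "crash",
    "appName": "Firefox", "channel": "nightly", "appVersion": "47.0a1",
}))
```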
Summary: Add nightly and aurora channels to longitudinal datasets. → Add crash pings and data from nightly and aurora to "telemetry-sample".
Whiteboard: loasis
Blocks: 1246954
Blocks: 1245490
Sent a PR to implement the above changes: https://github.com/mozilla-services/puppet-config/pull/1779
We can't generate an updated longitudinal dataset, as "telemetry-release" stopped updating on the 12th and there are only a handful of dates in "telemetry-sample". Mark, could you please backfill telemetry-sample back to the 15th of November?
Flags: needinfo?(mreid)
:whd is backfilling the data presently
Flags: needinfo?(mreid)
As jobs complete, the data is uploaded by a separate process to the prod bucket, starting at 20160212 and going backwards.
I just had a look and it appears the data is backfilled to Nov 15th. I noticed that all the records have "UNKNOWN" for sampleId though... any idea why? It was working previously in telemetry-release, and I didn't change that field in the PR.
Flags: needinfo?(whd)
Sorry, all the *files* on S3, not the records themselves. The actual data contains correct sampleIds.
I have taken a look at this and think I have determined the cause.

I copied the code I used to set up the backfill jobs from one of :mreid's old backfill scripts:

s3://telemetry-analysis-code-2/jobs/whd-sample-backfill-1/telemetry-sample-backfill-0.1.tar.gz

This pulls in a heka build from http://people.mozilla.org/~mreid/heka-20150918-0_11_0-linux-amd64.tar.gz which, most unfortunately, was built three days before the commit that added support for non-string fields for S3 dimensions: https://github.com/mozilla-services/data-pipeline/commit/f02b014e674fbe62f47844da02a369a94053ee08

Additionally, I noticed we botched the sampleId filtering logic (|| instead of &&), so we're sending the full data stream instead of a 10% sample: https://github.com/mozilla-services/puppet-config/pull/1779/files

This might explain why it is requiring so many machines to process. I'm going to file a PR to fix the sampleId filtering, but we are probably going to need to do the whole backfill process again, with a newer heka and a fixed message matcher.
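To make the filtering bug concrete, here is a hedged Python sketch (not the actual Heka message matcher from puppet-config) of why swapping && for || keeps the full stream instead of a 10% sample. The predicate shapes, field names, and docType list are assumptions for illustration.

```python
# Illustrative sketch of the botched filter described above, not the real
# message matcher. Assume each ping has an integer sampleId in [0, 99] and we
# only want docTypes "main" and "crash".

def matches_buggy(ping):
    # With ||: any ping with a wanted docType passes even when its sampleId is
    # outside [0, 10), so effectively the full stream is kept.
    return ping["sampleId"] < 10 or ping["docType"] in ("main", "crash")

def matches_fixed(ping):
    # With &&: only pings that are both in the 10% sample and of a wanted
    # docType pass.
    return ping["sampleId"] < 10 and ping["docType"] in ("main", "crash")

ping = {"sampleId": 87, "docType": "main"}
assert matches_buggy(ping)      # slips through the buggy matcher
assert not matches_fixed(ping)  # correctly dropped by the fixed matcher
```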
Flags: needinfo?(whd)
Blocks: 1251398
Blocks: 1251580
The backfill completed this weekend.
The data looks OK so far :)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard